World: r3wp
[Parse] Discussion of PARSE dialect
BrianH 17-Oct-2009 [4522x2] | Gabriele, these changes can be backported to R2 in the form of a rule compiler that generates (unreadable) R2 parse rules. |
Pekr, we still need complementing to be enhanced. Even Carl has said so. | |
Maxim 17-Oct-2009 [4524x3] | but wouldn't work with remark ;-) |
one situation which complement can't handle very well (RAM-wise): union charset "a" complement charset "b" | |
you end up with a full codepoint bitset, minus one byte, whether it is complemented or not | |
BrianH 17-Oct-2009 [4527x2] | Maxim, Remark could be adjusted to use the rule compiler. For that matter, Remark could use R2/Forward (which needs some work, but is already better than R2 on its own). |
Maxim, that is what Pekr was talking about. That is planned to be fixed. | |
Maxim 17-Oct-2009 [4529x4] | a rule compiler doesn't adapt very well to self-modifying rules |
laden with many paren expressions and a stack on top of it. | |
the rule I am writing now actually does JIT rule compilation... hairy to debug :-) | |
since I use binding to map inner rules which are also constructed on the fly but have to be pushed and popped from the stack as I traverse data... it's a lot of fun :-D | |
BrianH 17-Oct-2009 [4533x2] | If the self-modifying rules are strung-together basic blocks, you can use the rule compiler to generate the blocks. And the R3 changes make self-modifying rules less necessary, so you can have even larger basic blocks. |
Of course the *result* of the compilation would be self-modifying rules :) | |
Maxim 17-Oct-2009 [4535x2] | really, the problem is not the parsing itself... it's getting the darn rules to generate the proper rules hehehe. |
and it's not simple parsing since I use parsing index manipulation, which is also dictated by the source data it encounters. it's like swatting flies using a fly swatter at the end of a rope, while riding a roller coaster which changes layout every time you ride it ;-) | |
BrianH 17-Oct-2009 [4537] | Which is what a rule compiler does :) Actually, it sounds like you could adapt the tricks of the rule compiler to *your* rule compiler, which would let you use the new operations in your rule source and have the workarounds generated in the output. |
Maxim 17-Oct-2009 [4538x2] | well, build it and I will try it ;-) |
I promise. | |
BrianH 17-Oct-2009 [4540] | It's on my list... |
Maxim 17-Oct-2009 [4541x3] | my deadline is to have a site working by this week... unless this darned bug I am trying to kill kills me first. |
doh... when you're too close to the tree... you can't see the forest... I was using the TO parse command on a rule... this obviously won't work.... | |
(it only accepts a string... dummy :-) | |
Pekr 18-Oct-2009 [4544] | ah, got a reply on Chat from Carl regarding complementing: "Re #5718: Pekr, that's a good question, and I think the answer must be YES. We need to be able to complement bitmaps in a nice way. Otherwise, Unicode bitmaps, even if simply used on ASCII chars, would take a lot of memory. This change should be listed on the project sheet, and if not, I'll add it there." |
Chris 22-Oct-2009 [4545x3] | Is there any advantage in breaking up charsets that represent a large varied range of the 16-bit character space? For example, XML names are defined as below (excluding > 2 ** 16), but are most commonly limited to the ASCII-friendly subset:
    w1: charset [
        #"A" - #"Z" #"_" #"a" - #"z" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)"
        #"^(F8)" - #"^(02FF)" #"^(0370)" - #"^(037D)" #"^(037F)" - #"^(1FFF)"
        #"^(200C)" - #"^(200D)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)"
        #"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
    ]
    w+: charset [
        #"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z" #"^(B7)"
        #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(037D)"
        #"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(203F)" - #"^(2040)"
        #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)"
        #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
    ]
    word: [w1 any w+] |
(sorry if that looks messy) | |
Both w1 and w+ appear to be very large values. Would it be smart to perhaps do: [[aw1 | w1] any [aw+ | w+]], where 'aw1 and 'aw+ are limited to ASCII values? | |
Steeve 22-Oct-2009 [4548x4] | Use R3 (and its optimized complemented bitsets) |
Anyway, a bitset with a length of 2 ** 16 is not so huge in memory (only 16kb) | |
64 Kb, sorry | |
So W1 + W+ = 128 Kb. Is this a problem? | |
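For reference, the arithmetic behind those figures can be sketched in Python. This assumes a flat bitset holding exactly one bit per codepoint with no header overhead, which the thread does not confirm for R3; `flat_bitset_bytes` is a hypothetical helper name.

```python
def flat_bitset_bytes(codepoints):
    """Bytes needed for a flat bitset with one bit per codepoint (assumed layout)."""
    return (codepoints + 7) // 8


one_set = flat_bitset_bytes(2 ** 16)
print(one_set)                # 8192 bytes
print(one_set * 8 // 1024)    # 64 kilobits
print(2 * one_set)            # 16384 bytes for two such bitsets
```

Under that assumption, "64 Kb" and "128 Kb" hold if Kb is read as kilobits; in bytes the two charsets together would come to about 16 KB.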
Chris 22-Oct-2009 [4552] | That's what I'm asking. Complemented bitsets wouldn't make a difference here though as the excluded range is of similar scope, right? |
Steeve 22-Oct-2009 [4553x2] | It seems |
if the size is a problem you can build a function to test each range. But it will be slow | |
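A rough Python sketch of the "test each range" idea, using the ranges from Chris's w1 charset. The names `W1_RANGES`, `W1_STARTS` and `in_w1` are made up for illustration, and the binary search is one possible way to do the lookup, not something proposed in the thread.

```python
from bisect import bisect_right

# Inclusive codepoint ranges copied from the w1 charset above.
W1_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
]
# Range start points, sorted ascending, used as the search keys.
W1_STARTS = [lo for lo, _ in W1_RANGES]


def in_w1(ch):
    """Return True if the single character ch falls inside one of the ranges."""
    cp = ord(ch)
    # Index of the last range whose start is <= cp, or -1 if there is none.
    i = bisect_right(W1_STARTS, cp) - 1
    return i >= 0 and cp <= W1_RANGES[i][1]
```

This keeps only 14 pairs in memory instead of a full bitset, at the cost of a search per character, which is the trade-off Steeve describes.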
Chris 22-Oct-2009 [4555x3] | Not size, efficiency. |
Allowing 'into to look inside strings can break current usage of 'into, requiring [and any-block! into ...] | |
An example: a nested d: [k v] structure where 'k is a word and 'v is 'd or any other type:
    data: [k [k "s"]]
In R2, you can validate with:
    d: [word! [into d | skip]]
Now you have to specify:
    d: [word! [and any-block! into d | skip]]
otherwise you get an error if 'v is a string! | |
Sunanda 25-Oct-2009 [4558] | I guess parse can do this too? http://stackoverflow.com/questions/1621906/is-there-a-way-to-split-a-string-by-every-nth-seperator-in-python |
Will 25-Oct-2009 [4559] | is R2/Forward available for download? thx |
Geomol 25-Oct-2009 [4560x2] | Sunanda, one way:
    >> out: clear []
    >> parse "this-is-a-string" [mark1: any [thru "-" [to "-" | to end] mark2: (append out copy/part mark1 mark2) skip mark1:]]
    >> out
    == ["this-is" "a-string"] |
Another:
    >> out: parse "this-is-a-string" "-"
    >> forall out [change/part out rejoin [out/1 "-" out/2] 2]
    >> out
    == ["this-is" "a-string"] | |
Steeve 25-Oct-2009 [4562] | R3 one liner ;-) >> map-each [a b] parse "this-is-a-string" "-" [ajoin [a #"-" b]] |
Graham 26-Oct-2009 [4563] | Rebol doesn't have lines :) |
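For comparison with the REBOL answers above, here is a minimal Python sketch of the same split, since the linked question asks about Python. `split_every_nth` is a hypothetical helper name, not taken from the Stack Overflow thread.

```python
def split_every_nth(text, sep, n):
    """Split text at every n-th occurrence of sep, keeping the inner separators."""
    parts = text.split(sep)
    # Rejoin the pieces in groups of n; a short final group is kept as-is.
    return [sep.join(parts[i:i + n]) for i in range(0, len(parts), n)]


print(split_every_nth("this-is-a-string", "-", 2))  # ['this-is', 'a-string']
```

Like Geomol's second version, it splits on every separator first and then rejoins the pieces in groups.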
BrianH 26-Oct-2009 [4564x2] | Chris, there can be an advantage in R3 to breaking up a bitset into more than one bitset on occasion, mostly memory savings. However, it might not work as well as you might like since offset and/or sparse bitsets aren't supported. Bitsets that involve high codepoints will take a lot of RAM no matter what you do. |
Will, R2/Forward is already available for download in DevBase (R3 chat). It is a little outdated though, since I had to take a break to rewrite R3's module system. I'll catch up when I get the chance. The percentage of R3 that I can emulate has gone down drastically since the last update, since R3 has made a lot of changes to basic datatype behavior since then. We'll see what we can do. | |
Steeve 26-Oct-2009 [4566x2] | Something funny. I spent an hour debugging a parsing rule, only to finally understand this: never name a rule LIMIT. The LIMIT keyword is apparently reserved for future use in parse. |
(in R3) | |
Pekr 26-Oct-2009 [4568x2] | :-) |
I thought it is not implemented yet, hence no reservation? | |
Steeve 26-Oct-2009 [4570] | if you just try to use it, your parsing may crash. So, it's doing nothing but it's here. |
Pekr 26-Oct-2009 [4571] | Hmm, you are right .... But we might need a better error message, no?
    >> test: ["123"] parse "123" [test]
    == true
    >> limit: ["123"] parse "123" [limit]
    ** Script error: PARSE - invalid rule or usage of rule: end!
    ** Where: parse
    ** Near: parse "123" [limit] |