World: r3wp
[Parse] Discussion of PARSE dialect
older newer | first last |
Maxim 17-Oct-2009 [4535x2] | really, the problem is not the parsing itself... its getting the darn rules to generate the proper rules hehehe. |
and its not simple parsing since I use parsing index manipulation, which is also dictated by the source data in encounters. its like swatting flies using a fly swatter at the end of a rope, while riding a roller coster which changes layout every time you ride it ;-) | |
BrianH 17-Oct-2009 [4537] | Which is what a rule compiler does :) Actually, it sounds like you could adapt the tricks of the ruule compiler to *your* rule compiler, which would let you use the new operations in your rule source and have the workarounds generated in the output. |
Maxim 17-Oct-2009 [4538x2] | well, build it and I will try it ;-) |
I promise. | |
BrianH 17-Oct-2009 [4540] | It's on my list... |
Maxim 17-Oct-2009 [4541x3] | my deadline is to have a site working by this week... unless this darned bug I am trying to kill doesn't kill me first. |
doh... when you're too close to the tree... you can't see the forest... I was using TO parse command on a rule ... this obviously won't work.... | |
(it only accepts a string... dummy :-) | |
Pekr 18-Oct-2009 [4544] | ah, got reply on Chat from Carl towards complementing: Re #5718: Pekr, that's a good question, and I think the answer must be YES. We need to be able to complement bitmaps in a nice way". Otherwise, Unicode bitmaps, even if simply used on ASCII chars, would take a lot of memory. This change should be listed on the project sheet, and if not, I'll add it there." |
Chris 22-Oct-2009 [4545x3] | Is there any advantage in breaking up charsets that represent a large varied range of the 16-bit character space? For example, XML names are defined as below (excluding > 2 ** 16), but are most commonly limited to the ascii-friendly subset: w1: charset [ #"A" - #"Z" #"_" #"a" - #"z" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(02FF)" #"^(0370)" - #"^(037D)" #"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)" ] w+: charset [ #"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z" #"^(B7)" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(037D)" #"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(203F)" - #"^(2040)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)" ] word: [w1 any w+] |
(sorry if that looks messy) | |
Both w1 and w+ appear to be very large values. Would it be smart to perhaps do: [[aw1 | w1] any [aw+ | w+]] Where 'aw1 and 'aw+ are limited to ascii values? | |
Steeve 22-Oct-2009 [4548x4] | Uses R3 (and his optimized complemented bitsets) |
Anyway, a bitset with a length of 2 ** 16 is not so huge in memory (only 16kb) | |
64 Kb , sorry | |
So W1 + W+ = 128Kb Is this a problem ? | |
Chris 22-Oct-2009 [4552] | That's what I'm asking. Complemented bitsets wouldn't make a difference here though as the excluded range is of similar scope, right? |
Steeve 22-Oct-2009 [4553x2] | It seems |
if the size is a problem you can build a function to test each range. But It will be slow | |
Chris 22-Oct-2009 [4555x3] | Not size, efficiency. |
Allowing 'into to look inside strings can break current usage of 'into, requiring [and any-block! into ...] | |
An example: a nested d: [k v] structure where 'k is a word and 'v is 'd or any other type: data: [k [k "s"]] R2, you can validate with d: [word! [into d | skip]] Now you have to specify: d: [word! [and any-block! into d | skip]] otherwise you get an error if 'v is a string! | |
Sunanda 25-Oct-2009 [4558] | I guess parse can do this too? http://stackoverflow.com/questions/1621906/is-there-a-way-to-split-a-string-by-every-nth-seperator-in-python |
Will 25-Oct-2009 [4559] | is R2/Forward available for download? thx |
Geomol 25-Oct-2009 [4560x2] | Sunanda, one way: >> out: clear [] >> parse "this-is-a-string" [mark1: any [thru "-" [to "-" | to end] mark2: (append out copy/part mark1 mark2) skip mark1:]] >> out == ["this-is" "a-string"] |
Another: >> out: parse "this-is-a-string" "-" >> forall out [change/part out rejoin [out/1 "-" out/2] 2] >> out == ["this-is" "a-string"] | |
Steeve 25-Oct-2009 [4562] | R3 one liner ;-) >> map-each [a b] parse "this-is-a-string" "-" [ajoin [a #"-" b]] |
Graham 26-Oct-2009 [4563] | Rebol doesn't have lines :) |
BrianH 26-Oct-2009 [4564x2] | Chris, there can be an advantage in R3 to breaking up a bitset into more that one bitset on occasion, mostly memory savings. However, it might not work as well as you might like since offset and/or sparse bitsets aren't supported. Bitsets that involve high codepoints will take a lot of RAM no matter what you do. |
Will, R2/Forward is already available for download in DevBase (R3 chat). It is a little outdated though, since I had to take a break to rewrite R3's module system. I'll catch up when I get the chance. The percentage of R3 that I can emulate has gone down drastically since the last update, since R3 has made a lot of changes to basic datatype behavior since then. We'll see what we can do. | |
Steeve 26-Oct-2009 [4566x2] | Something funny. I spent an hour debugging a parsing rule. To finally understand this. Never name a rule, LIMIT. LIMIT keyword is reserved for a further use in parse apparently. |
(in R3) | |
Pekr 26-Oct-2009 [4568x2] | :-) |
I thought it is not implemented yet, hence no reservation? | |
Steeve 26-Oct-2009 [4570] | if you just try to use it, your parsing may crash. So, it's doing nothing but it's here. |
Pekr 26-Oct-2009 [4571x2] | Hmm, you are right .... But we might need better error message, no? >> test: ["123"] parse "123" [test] == true >> limit: ["123"] parse "123" [limit] ** Script error: PARSE - invalid rule or usage of rule: end! ** Where: parse ** Near: parse "123" [limit] |
posted to Chat/R3/Parse group ... | |
BrianH 26-Oct-2009 [4573x2] | Keywords that are *planned* to be added should definitely be reserved. |
Otherwise adding them would be difficult. | |
Steeve 26-Oct-2009 [4575] | But it should return a proper error message as Pekr noticed it. |
BrianH 26-Oct-2009 [4576] | Agreed :) |
Robert 8-Nov-2009 [4577x2] | I have used www.antlr.org stuff several years ago with C/C++ target. It's a very cool parser generator toolkit. Just took a look again. It has emitters for different languages. Maybe one of the parse gurus here can take a look if we can do a REBOL emitter. |
IMO that would be really nice. | |
JoshF 17-Nov-2009 [4579x4] | Hi! I'm trying to use REBOL's parse to make a simple calculator dialect. However, I'm having trouble with escaping entities (I think)... Here's my first try (that worked): |
>> parse [3 + 2] [some [integer! (print "number") | ['+ | '- ] (print "op")]] number op number == true | |
>> parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | '/ ] (print "op")]] ** Syntax Error: Invalid word-lit -- ' ** Near: (line 1) parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | '/ ] (print "op")]] | |
The second one failed when I tried to extend the dialect with multiply (*) and divide (/). After further experimentation, it seems that you can't escape the "/". Google has not been helpful here... Does anybody have any ideas? I could parse for just a word! instead of the +, -, etc., but I wanted parse to do the work of deciding what was a valid operation or not. Sorry for the multiple messages, I'm still trying to figure this client out... Thanks for any advice! | |
Ladislav 17-Nov-2009 [4583] | JoshF: Rebol load does not parse the '/, but you can do: as-lit-word: func ['word [any-word!]] [to lit-word! word] lit-div: as-lit-word / parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | lit-div] (print "op")]] |
JoshF 17-Nov-2009 [4584] | Ha! Black magic! That works a champ Ladislav, thanks very much! I had tried >> tdiv: to-word "/" == / >> parse [3 / 2] [some [integer! (print "number") | ['+ | '- | '* | tdiv ] (print "op )]] But had gotten the same error. What makes yours work? |
older newer | first last |