World: r3wp


[Parse] Discussion of PARSE dialect

Davide 3-Dec-2008 [3283x2]	ok this is my first attempt for a basic expression parser
	stack: copy []
	push: func [x] [insert stack x]
	pop: does [take stack]
	term: [
	    pos: paren! :pos into [expr] (push rejoin ["(" pop ")"])
	    | set t string! (push mold t)
	    | set sign opt ['+ | '-] set t number! (if none? sign [sign: '+] push mold (t * to-integer join sign "1"))
	    | set gp path! (push rejoin [{$('#} first gp {').attr('} second gp {')}])
	]
	oper: compose ['+ | '- | '* | (to-lit-word "/")]
	operrule: [set o oper (push o)]
	expr: [term any [operrule term (t2: pop o: pop t1: pop push rejoin [t1 o t2])]]
	I'm sure it's not optimized nor correct, so any correction is welcome.
Davide 3-Dec-2008 [3283x2]	It can parse expressions like: [- 1 * (a/b + 2)] and it emits a JavaScript-compatible string. (It is part of a Comet server written in REBOL.)
Oldes 4-Dec-2008 [3285x3]	instead of:
	set sign opt ['+ | '-] set t number! (if none? sign [sign: '+] push mold (t * to-integer join sign "1")) |
	I would use something like:
	set t number! (push mold t) |
	'- set t number! (push mold negate t) |
	(Which is not perfect as well)
	because the above will not work for expressions like: [ - (1 + 1)]
	also I'm not sure, if the mold is needed
Brock 4-Dec-2008 [3288x5]	I am having some parse difficulties. I have the below code...
	records: read/lines http://www.geocities.com/[kalef-:-rogers-:-com]/samples.txt
	lang-rule: [
	    thru "language%3a" copy language to "+%2b" |
	    thru "language%3a" copy language to "&resultview" |
	    thru "language%3a" copy language to "&"
	    ;thru "language=" copy language to "&" |
	]
	rec: 1
	foreach record records [
	    language: copy ""
	    parse record [lang-rule]
	    print [rec tab language]
	    rec: rec + 1
	]
	The problem is in the parse rule lang-rule. It seems that only two of the three alternatives work at any one time with the data I have loaded. (Note the last alternative is commented out.)
	If I change the order of the alternatives and place the pipe or bar '|' as appropriate, I still can't get all three of them to work together.
	I expect the output to be either the text 'en' or 'fr' (for English or French) for each record; however, for records 52 & 66 the parse is not ending properly with the current setup. If you change the order of the alternatives, then other records will not work as expected. Any ideas?
Oldes 4-Dec-2008 [3293]	lang-rule: [ thru "language%3a" copy language ["en" | "fr"] to end ]
Chris 4-Dec-2008 [3294]	You could dehex first? That would make things more consistent...
Brock 4-Dec-2008 [3295x2]	thanks for the suggestions. I never thought of using the actual text I was expecting as an option like you suggested Oldes.
Brock 4-Dec-2008 [3295x2]	Okay Oldes, your solution works, but why does my code fail, any ideas? I have other scenarios that follow this same sort of structure but do not have a simple two word expected result. I've been able to handle these so far, simply by changing the order and moving the 'stop' word of the Ampersand to the bottom of the rule options. [I'm trying Chris's option now]
Davide 4-Dec-2008 [3297]	I'm here again! Is it possible to write a rule that matches any word! except those that are in a block? Example:
	list: ['for 'while 'show]
	parse block [any [word! NOT IN list]]
	If there's no direct approach, is there a workaround?
Oldes 4-Dec-2008 [3298x6]	Brock: it's because in rows 52 and 66 you find "+%2b". It's not next to the lang value you want, but it's there, so the first rule is true.
	You can use this for any lang-id with 2 chars:
	lang-chars: charset [#"a" - #"z"]
	lang-rule: [ thru "language%3a" copy language 2 lang-chars to end ]
	I would not use dehex, as with dehex you parse the data twice, so it must be slower.
	You can use this if there may be url-encoded data and not encoded as well:
	lang-chars: charset [#"a" - #"z"]
	lang-rule: [ thru "language" ["%3a" | "="] copy language 2 lang-chars to end ]
	btw. you can also use: parse "language=xx" lang-rule
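A minimal REBOL 2 sketch of why the first alternative misfires and how the two-character rule avoids it. The record string below is hypothetical (it is not taken from Brock's data file); it only assumes that "+%2b" can appear somewhere after the language value rather than directly behind it.

```rebol
REBOL []

; Hypothetical record: "+%2b" occurs later in the URL, not right after the language value.
record: "q=test&language%3aen&resultview=1&x=a+%2bb"

; First alternative of the original rule: TO scans forward to the first "+%2b", wherever it is,
; so everything up to that point is copied.
language: none
parse/all record [thru "language%3a" copy language to "+%2b" to end]
probe language   ; prints "en&resultview=1&x=a"

; Rule from the discussion: copy exactly two lowercase letters right after the marker.
lang-chars: charset [#"a" - #"z"]
lang-rule: [thru "language" ["%3a" | "="] copy language 2 lang-chars to end]
parse/all record lang-rule
probe language   ; prints "en"
```

Because the first alternative succeeds, the later alternatives are never tried for such a record, which is why reordering the alternatives only moves the problem to other records.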
	Davide.. in R2:
	list: ['for | 'while | 'show]
	parse [for each while end] [any [list | set w word! (probe w)]]
	In R3 it should be better I think.
Chris 4-Dec-2008 [3304]	dehex, agreed -- until the speed trade off becomes worth it. Which may be if you are wanting to get more than just the language from the string.
Brock 4-Dec-2008 [3305x2]	Oldes, thanks for ripping into that with so many options. To your first response - of course, didn't notice that. Your last response is not obvious to me what that does, I'll need to look at that more.
Brock 4-Dec-2008 [3305x2]	Dehex isn't an issue for me really. I am only taking a very small percentage of records. So in the big picture, it's not a significant slow-down. The process this is attached to runs daily on a group of text files totalling less than 10 MB in size.
Jerry 6-Dec-2008 [3307]	I was parsing something. I got this:
	** Script Error: Internal limit reached
	I still don't know what's wrong. Anybody?
sqlab 6-Dec-2008 [3308]	Too many recursions. Maybe the rule is too complex or you get an infinite loop. Just show your rule and the problem. For sure someone will help.
Davide 6-Dec-2008 [3309]	Thanks Oldes, I've tried your hint, but it made my parse rule too complex, so I've used an additional word as discriminant. Hope that R3 will improve this.
Jerry 7-Dec-2008 [3310]	sqlab, you are right. There is an infinite loop. I am fixing it. It's a C++ parser, so the rule is very complicated.
Maxim 24-Dec-2008 [3311x3]	Paul asked: "Question for you regarding parse. How do you force parse to return false immediately without processing the rest of the string or block once it evaluates a paren?"
	example:
	parse/all s [some ["12345" here: (print "*" here: tail here) :here skip]]
	Basically, in the paren, you assign the tail of the data being parsed, and force parse to move to it, then try going beyond... the skip makes it return false, otherwise it returns true.
	oops, the example forgot the string to parse... hehehe
	>> s: "12345 12345 12345"
	== "12345 12345 12345"
	>> parse/all s [some ["12345" here: (print "" here: tail here) :here]]
	== true
	>> parse/all s [some ["12345" here: (print "" here: tail here) :here skip]]
	== false
[unknown: 5] 24-Dec-2008 [3314x2]	Not exactly what I'm talking about.
[unknown: 5] 24-Dec-2008 [3314x2]	Well maybe...
Steeve 24-Dec-2008 [3316]	parse/all s [some ["12345" (print "*") end skip]]
	Is that not enough?
[unknown: 5] 24-Dec-2008 [3317]	Well the skip is going to produce false unless you know how much to skip
Maxim 24-Dec-2008 [3318x3]	the here: tail here is just the code you assign IF something you want to check programmatically matched.
	now the question, paul, is do you want the WHOLE parse to fail, or just that rule? up to the paren?
	or do you want to do, basically, look ahead assertion?
BrianH 24-Dec-2008 [3321x2]	The skip will produce false because it is attempting to skip past the end.
BrianH 24-Dec-2008 [3321x2]	The phrase [end skip] will always fail and return false.
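A small REBOL 2 sketch of the [end skip] idiom. The word always-fail is a name chosen here for illustration; it is not something defined by PARSE.

```rebol
REBOL []

; END matches only at the tail of the input, where SKIP has nothing left to consume,
; so the two together can never both succeed.
always-fail: [end skip]

probe parse "abc" [always-fail]                 ; false: END does not match at the head
probe parse "abc" ["abc" always-fail]           ; false: END matches, then SKIP fails
probe parse "abc" ["abc" [always-fail | none]]  ; true: the failure is local, the alternative still runs
```

The last line shows that the idiom fails only the rule it appears in; the whole parse returns false only if no enclosing alternative can still consume the input to the end.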
[unknown: 5] 24-Dec-2008 [3323x2]	What I'm talking about is this:
	words: [this that blah]
	result: parse [this that bleh blah] [some [set w word! (unless find words w [do something to return false here])]]
[unknown: 5] 24-Dec-2008 [3323x2]	result should evaluate to false in my example
BrianH 24-Dec-2008 [3325]	For now, you want a continuation variable. Like this:
	result: parse [this that bleh blah] [some [set w word! (cont: unless find words w [[end skip]]) cont]]
Steeve 24-Dec-2008 [3326x2]	continue: none ;means continue is ok
	stop: [end skip]
	result: parse [this that bleh blah] [some [set w word! (unless find words w [continue: stop]) continue]]
Steeve 24-Dec-2008 [3326x2]	ahah, same idea
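The continuation-variable idea can be wrapped in a function so that it is easy to try on both a passing and a failing block. This is only a sketch: the function name all-known? is hypothetical, and it uses an empty block (a rule that always succeeds) as the "continue" value instead of none, which is an assumption made here to keep the example independent of how PARSE treats none values.

```rebol
REBOL []

words: [this that blah]

; Hypothetical helper: true only if every item in DATA is a word listed in WORDS.
all-known?: func [data [block!] /local w cont] [
    parse data [
        some [
            set w word!
            ; Choose the next rule at run time: [] always succeeds, [end skip] always fails.
            (cont: either find words w [[]] [[end skip]])
            cont
        ]
    ]
]

probe all-known? [this that blah]        ; true
probe all-known? [this that bleh blah]   ; false
```

In the failing case the SOME loop stops at bleh, the input is not fully consumed, and PARSE returns false.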
[unknown: 5] 24-Dec-2008 [3328x2]	Brian, I thought I tried that.
[unknown: 5] 24-Dec-2008 [3328x2]	checking that one hold on.
Maxim 24-Dec-2008 [3330x2]	I was working on another idea where only THAT rule failed !
Maxim 24-Dec-2008 [3330x2]	for a general failure, the above works for sure.
[unknown: 5] 24-Dec-2008 [3332]	Yeah, that will work, Brian. I think I was setting the word that I had placed at the end, and that is why it failed before.