World: r3wp

[Parse] Discussion of PARSE dialect

BrianH
1-Dec-2010
[5383]
If it is not restored automatically on failure, backtracking and 
alternation, then that is a problem that needs a ticket submitted 
for it.
Steeve
1-Dec-2010
[5384x2]
Agreed, it would be a nice improvement
but it may slow down parse, no?
BrianH
1-Dec-2010
[5386]
Not much, just one more pointer assignment at alternation.
Steeve
1-Dec-2010
[5387]
Ladislav, are you lost in translation?
Or are you crying :-)
BrianH
1-Dec-2010
[5388x4]
It fails. Here is the test code that I will put in the ticket:
>> a: "a" b: "b" parse a [:b "b" (print true) fail | "a"]
true
== false  ; should be true
>> a: "a" b: "b" parse a [:b "b" (print true) fail | "b"]
true
== true  ; should be false
So, half of the request succeeds: You can set the position to another 
series. I wonder if you can change series types from string to block.
Yup, you can.
It is not a simple problem though, as not only would you have to 
add a series reference to the fallback state but you would need to 
make those series references visible to the garbage collector so 
they won't be freed; backtracking to a freed series would be bad.
Steeve
1-Dec-2010
[5392x2]
Parse is freeing its own allocated resources currently, what would 
that be a problem to pursue?
*why would that be...
BrianH
1-Dec-2010
[5394]
What if someone runs RECYCLE in a paren? It would need to know what 
to not collect.
Steeve
1-Dec-2010
[5395]
I mean, Parse must use a sort of stack to keep the backtracking references. 
The series will not be freed until Parse destroys its stack.
BrianH
1-Dec-2010
[5396]
Right now it is a stack of integers (position) and a single pointer 
(series reference). To do this it would need to be a stack of series 
references too, and the collector would need to be informed of its 
existence so it could scan it for references.
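
(Aside: a minimal Python sketch of the point above. It is a toy model, not PARSE's actual implementation; `Cursor`, `try_alternatives`, `switch_then_fail` and the `save_series` flag are hypothetical names introduced for illustration. It assumes alternation saves a fallback state and restores it before each alternative, and shows the difference between saving only the index and saving the series reference as well, using the two test cases BrianH posted.)

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    series: str      # the input currently being parsed
    index: int = 0   # position inside that input


def literal(text):
    """Rule that matches `text` at the current position."""
    def rule(cur):
        if cur.series.startswith(text, cur.index):
            cur.index += len(text)
            return True
        return False
    return rule


def switch_then_fail(other, text):
    """Model of `:b "b" fail`: jump to another series, match, then fail."""
    def rule(cur):
        cur.series, cur.index = other, 0
        literal(text)(cur)
        return False
    return rule


def try_alternatives(cur, alternatives, save_series):
    """Try each alternative, restoring the fallback state before each one."""
    saved_series, saved_index = cur.series, cur.index
    for alt in alternatives:
        cur.index = saved_index
        if save_series:
            # The extra pointer assignment: restore the series as well.
            cur.series = saved_series
        if alt(cur):
            return True
    return False


def parse(series, alternatives, save_series):
    """Succeed only if an alternative matches and the input is consumed."""
    cur = Cursor(series)
    ok = try_alternatives(cur, alternatives, save_series)
    return ok and cur.index == len(cur.series)


# [:b "b" fail | "a"] and [:b "b" fail | "b"] applied to a: "a", b: "b"
rule_a = [switch_then_fail("b", "b"), literal("a")]
rule_b = [switch_then_fail("b", "b"), literal("b")]

print(parse("a", rule_a, save_series=False))  # index-only fallback
print(parse("a", rule_a, save_series=True))   # series + index fallback
print(parse("a", rule_b, save_series=False))
print(parse("a", rule_b, save_series=True))
```

In this model the index-only fallback reproduces the results reported in the test code (false where true was expected, and true where false was expected), while saving the series reference alongside the index gives the expected answers. The garbage-collector concern is outside the scope of the sketch.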
Steeve
1-Dec-2010
[5397]
That's why I said previously, it may slow down the whole process.
BrianH
1-Dec-2010
[5398]
Yup. The ticket needs to be made either way. If it is rejected it 
will serve as documentation of the issue.
Ladislav
1-Dec-2010
[5399x2]
It can cause problems with backtracking though
 - actually, it can't, as can be demonstrated easily
(when implemented properly, of course)
BrianH
1-Dec-2010
[5401]
Submitted as #1787, with the "when implemented properly" workarounds 
that Ladislav was mentioning. Note: Just because there is a solution 
to a problem doesn't make it not a problem - it just makes it a problem 
that can be solved.
Ladislav
1-Dec-2010
[5402]
aha, so, now the get-words can set parse to a different series (INTO 
does that as well!), but, what is restored, is just the index, not 
the series... (except for the return from INTO, when the series is 
restored as well)
BrianH
1-Dec-2010
[5403x2]
Yup. A half-solution, but we have workarounds for the other half 
:)
One interesting thing is that you can switch from string to block 
parsing and back mid-rule using series switching :)
Ladislav
1-Dec-2010
[5405]
Well, since it has been solved for INTO, it should suffice to use 
the already existing INTO solution
BrianH
1-Dec-2010
[5406x2]
Yup, that would be preferred. And please mention that in a ticket 
comment to #1787 :)
Otherwise I will mention this in a comment and attribute the idea 
to you :)
Ladislav
1-Dec-2010
[5408]
so, Oldes, you should try this, which should be the exact equivalent 
of your rule, except for the fact, that it does not call Parse recursively:

some [
    thru {<h2><a} thru ">" copy name to {<}
    ; copy the DOC
    copy doc to {^/ </div>}
    ; remember the DOC-END
    doc-end:
    ; switch to DOC parsing
    :doc
    thru {<pre class="code">} copy code to {</pre} (
        probe name
        probe code
    )
    any [
        thru {<h5>} copy arg to {<}
        thru {<ol><p>} copy arg-desc to {</p></ol>}
        (printf ["  * " 10 " - "] reduce [arg arg-desc])
    ]
    ; switch to original input
    :doc-end
]
BrianH
1-Dec-2010
[5409]
Thanks, Ladislav :)
Steeve
2-Dec-2010
[5410]
"Submitted as #1787, with the "when implemented properly" workarounds 
that Ladislav was mentioning. Note: Just because there is a solution 
to a problem doesn't make it not a problem - it just makes it a problem 
that can be solved."

Geez, I'm not a sissy, but I pointed out the workaround from the beginning. 
Sometimes I just have the weird feeling I'm not trusted enough.
Sorry, I'll stop the whining now :-)
Ladislav
2-Dec-2010
[5411x4]
Yes, Steeve, I know, that this has been discussed a while ago. Nevertheless, 
it is worth the effort to have it in a comment to the ticket.
(does not matter much to me who puts it in, though)
I just wanted to make sure to point at INTO, since it is already 
implemented, and working fine.
(and doing the same thing, at least in principle)
BrianH
2-Dec-2010
[5415x3]
Yes, and a good point it was too.
Steeve, I'm sure that the reason it was so easy for me to come up 
with workarounds off the top of my head on a weak-brain day was because 
I had seen them before when you pointed them out and didn't remember 
it directly. In any case, I'm sure your stuff was great.
The ticket was for documentation purposes, as well as a request. 
It was to summarize the conversation from before.
Oldes
2-Dec-2010
[5418x2]
Steeve, Ladislav... sorry, but your version is not working. The main 
SOME rule finds only one match and then stops. Maybe I should give 
you a simple test string so you could test it first.
hm.. it works on a simple test, don't know why it stops for my real 
data.
Ladislav
2-Dec-2010
[5420]
that is interesting, (my version differs from Steeve's), but should 
be as similar to your version as possible
Oldes
2-Dec-2010
[5421x2]
My simplified test is:
parse test: {[{1}{2}][{3}{4}][]} [
    some [
        thru {[}
        ; copy the DOC
        copy doc to {]}
        ; remember the DOC-END
        doc-end:
        ; switch to DOC parsing
        :doc
        (print "start")
        any [
            thru "{" copy n to "}" (probe n)
        ]
        ; switch to original input
        (print "end")
        :doc-end
    ]
]
that's working as expected.
I understand the principle, but as I say, on the real file it stops.
Ladislav
2-Dec-2010
[5423x3]
may be a Parse bug, e.g.
so, it is worth testing
What does "stops" mean, BTW?
Oldes
2-Dec-2010
[5426]
you can test real data as well:)

print "loading data"
data: read/string http://www.imagemagick.org/api/magick-image.php
ask "parsing version 1"
parse/all data [
    some [
        thru {<h2><a} thru ">" copy name to {<}
        doc-start: to {^/ </div>} doc-end: :doc-start
        [
            thru {<pre class="code">} copy code to {</pre} (
                probe name
                probe code
            )
            any [
                thru {<h5>}
                here: if (lesser? index? here index? doc-end)
                copy arg to {<}
                thru {<ol><p>} copy arg-desc to {</p></ol>}
                (printf ["  * " 10 " - "] reduce [arg arg-desc])
            ]
        ]
        :doc-end
    ]
]
ask "parsing version 2"
parse/all data [
    some [
        thru {<h2><a} thru ">" copy name to {<}
        ; copy the DOC
        copy doc to {^/ </div>}
        ; remember the DOC-END
        doc-end:
        ; switch to DOC parsing
        :doc
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        ; switch to original input
        :doc-end
    ]
]
Ladislav
2-Dec-2010
[5427x2]
aha, it needs to be written this way:

parse/all data [
    some [
        thru {<h2><a} thru ">" copy name to {<}
        ; copy the DOC
        copy doc to {^/ </div>}
        ; remember the DOC-END
        doc-end:
        ; switch to DOC parsing
        ; we need OPT to be able to switch back
        :doc opt [
            thru {<pre class="code">} copy code to {</pre} (
                probe name
                probe code
            )
            any [
                thru {<h5>} copy arg to {<}
                thru {<ol><p>} copy arg-desc to {</p></ol>}
                (printf ["  * " 10 " - "] reduce [arg arg-desc])
            ]
        ]
        ; switch to original input
        :doc-end
    ]
]
Since OPT was needed, it is evident that the "inner parse" sometimes 
fails, which does not look desirable and may deserve your attention, 
Oldes.
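
(Aside: a minimal Python sketch of why the inner rule had to be made optional. It is a toy model under stated assumptions, not a translation of the PARSE rule: the bracketed sample format, the `pre:` marker and the names `inner` and `parse_sections` are hypothetical. The outer loop cuts out one section, runs an inner pass on that copy only, and then resumes at the remembered end of the section. When the inner pass is mandatory, one section without the marker ends the whole loop; when it is optional, the loop still switches back and carries on.)

```python
import re


def inner(doc):
    """Inner pass over one section: needs a 'pre:' marker, then collects {items}.

    Returns None when the marker is missing (the inner pass "fails").
    """
    marker = doc.find("pre:")
    if marker == -1:
        return None
    return re.findall(r"\{([^}]*)\}", doc[marker + len("pre:"):])


def parse_sections(text, inner_optional):
    """Visit each [...] section; return the inner result for each one visited."""
    results = []
    pos = 0
    while True:
        start = text.find("[", pos)        # skip through the section opener
        if start == -1:
            break
        end = text.find("]", start + 1)    # find the section end
        if end == -1:
            break
        items = inner(text[start + 1:end])  # inner pass works on the copy only
        if items is None and not inner_optional:
            break                          # mandatory inner pass failed: loop ends
        results.append(items)
        pos = end                          # switch back to the outer input
    return results


SAMPLE = "[pre:{1}{2}][{3}][pre:{4}]"

print(parse_sections(SAMPLE, inner_optional=False))
print(parse_sections(SAMPLE, inner_optional=True))
```

With the mandatory inner pass only the first section is reported; with the optional one all three sections are visited and the second shows up as `None`, which also makes the malformed section easy to spot.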
Oldes
2-Dec-2010
[5429x2]
I've got it.. there is a missing <pre> in the third doc so that's why 
Steeve's version fails.
(I wonder what they use to document the ImageMagick project.. it 
does not look like fully automated documentation. There are also 
some typos in the spec names.)
Steeve
14-Jan-2011
[5431]
I'm working on an incremental lexer able to perform line-by-line 
analysis of any plain text document.

The idea is to allow editing without having to reparse the whole document.

The syntactical rules will be regular parse rules, easy to understand 
and to modify, to facilitate the creation of different document models.

Of course, the first target is a REBOL parser, but the make-doc format 
is also on my short-term list.

If anyone already has deep thoughts about the subject, please share 
your opinions.
I will come up with a prototype soon enough.
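
(Aside: a minimal Python sketch of one common way to do line-by-line incremental lexing. It is not Steeve's design, which is not described here. The assumption is that the lexer state carried from one line to the next can be cached per line; after an edit, lexing restarts at the edited line and stops as soon as a line ends in the same state as before. The brace-depth state, `lex_line` and `IncrementalLexer` are hypothetical stand-ins for real lexing rules.)

```python
def lex_line(line, depth):
    """Lex one line starting at brace depth `depth`.

    Returns (tokens, end_depth). Tokens only record brace events here;
    a real lexer would emit words, strings, numbers and so on.
    """
    tokens = []
    for ch in line:
        if ch == "{":
            depth += 1
            tokens.append(("open", depth))
        elif ch == "}" and depth > 0:
            tokens.append(("close", depth))
            depth -= 1
    return tokens, depth


class IncrementalLexer:
    def __init__(self, lines):
        self.lines = list(lines)
        self.tokens = [None] * len(self.lines)
        self.end_state = [None] * len(self.lines)  # cached state after each line
        self.relex_from(0)

    def relex_from(self, i):
        """Re-lex from line i until a line's end state is unchanged.

        Returns the number of lines that had to be lexed again.
        """
        count = 0
        state = self.end_state[i - 1] if i > 0 else 0
        while i < len(self.lines):
            toks, new_state = lex_line(self.lines[i], state)
            unchanged = new_state == self.end_state[i]
            self.tokens[i], self.end_state[i] = toks, new_state
            count += 1
            if unchanged:
                break  # later lines start in the same state as before
            state = new_state
            i += 1
        return count

    def edit(self, i, text):
        """Replace line i and re-lex only as far as necessary."""
        self.lines[i] = text
        return self.relex_from(i)


lx = IncrementalLexer(["a: 1", "s: {multi", "line}", "b: 2"])
print(lx.end_state)
print(lx.edit(0, "a: 42"))  # number of lines lexed again
print(lx.edit(1, "s: 3"))
```

An edit that leaves a line's end state untouched costs one line of lexing; an edit that opens or closes a multi-line construct propagates only until the cached states match again.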
BrianH
14-Jan-2011
[5432]
Will the REBOL parser be using the R3 incremental parser, TRANSCODE?