r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Fork
28-Dec-2009
[4751x3]
Hm. Version: 2.100.96.2.5. I quit and restarted.
And it stopped doing that.  I'll see if I can get it to do it again.
Is a sequence of things one of the complex rules that you can't use 
in a thru?
BrianH
28-Dec-2009
[4754x2]
Yes. You can express a sequence of characters in a string as a string 
literal, but not a sequence of types in a block. You are going to 
need first sets and the other LL tricks for that.
Fortunately typesets work for block parsing like bitsets do for string 
parsing, so first sets are easy.
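BrianH's first-set idea can be illustrated outside REBOL. Below is a minimal Python model, not R3 internals. It assumes a block is a Python list, a datatype test such as `integer!` maps to a Python type, and a typeset maps to a tuple of types. `first_set`, `matches_at`, and `thru_sequence` are hypothetical names. The first set of a sequence rule is the set of types its first element may have, so the scan only attempts the full sequence at positions whose value falls in that set.

```python
from typing import Optional, Sequence, Tuple, Type

# A sequence rule: one tuple of acceptable types per block element.
# Illustrative model only, not how R3 PARSE is implemented.
TypeSeq = Sequence[Tuple[Type, ...]]


def first_set(rule: TypeSeq) -> Tuple[Type, ...]:
    """Types the first element of a match may have (a 'typeset')."""
    return rule[0]


def matches_at(block: list, pos: int, rule: TypeSeq) -> bool:
    """True if block[pos:] starts with values of the types in `rule`."""
    if pos + len(rule) > len(block):
        return False
    return all(isinstance(block[pos + i], types) for i, types in enumerate(rule))


def thru_sequence(block: list, rule: TypeSeq) -> Optional[int]:
    """Index just past the first match of `rule`, or None (a parse failure)."""
    first = first_set(rule)
    for pos, value in enumerate(block):
        # Cheap first-set test: only try the whole sequence where it can start.
        if isinstance(value, first) and matches_at(block, pos, rule):
            return pos + len(rule)
    return None
```

For example, scanning `[1, 2, "a", 3]` for an integer followed by a string fails at position 0 (the 1 is followed by another integer) and succeeds at position 1, returning 3.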
Fork
28-Dec-2009
[4756]
>> parse [a b c] [(value: none) copy value to 3 skip to end (probe value)]
[a b]
== true


>> parse [a b c] [(value: none) copy value thru 3 skip to end (probe value)]
[a b]
== true
Pekr
28-Dec-2009
[4757]
brian - so we can use things like any-string! or other typesets to 
match?
Fork
28-Dec-2009
[4758]
Should the latter be [a b c] ?
Pekr
28-Dec-2009
[4759x3]
I would expect that ...
>> parse [a b c][?? 3 skip ??]
3: [a b c]
end!: []
== true
to/thru were reimplemented to allow multiple options. There are cases
where they are not supposed to work, but in the above case I would regard
it as a bug... unless some guru finds a theory showing us why it should
be regarded as the correct result :-)
BrianH
28-Dec-2009
[4762]
Fork, the fact that both of those examples work incorrectly instead 
of throwing an error is a bug in PARSE. It should be CureCoded.
Fork
28-Dec-2009
[4763x3]
FYI still seeing some erratic behavior with ?? at head of the parse 
rule
>> parse [a b c] [?? copy value thru 1 skip to end]
co? : [a b c]
== true
(That question mark was not visible in the terminal; it showed up when
I pasted here.)
BrianH
28-Dec-2009
[4766]
Seems like a Unicode to ANSI translation error.
Fork
28-Dec-2009
[4767]
Indeterminate, e.g. just ran it again and:
BrianH
28-Dec-2009
[4768]
But no such characters should be output by ??
Fork
28-Dec-2009
[4769]
>> parse [a b c] [?? copy value thru 1 skip to end]
coo:: [a b c]
== true
BrianH
28-Dec-2009
[4770]
Definitely another bug. CureCode it.
Fork
28-Dec-2009
[4771]
Well, I should find a way to reproduce it before doing that.  Left 
a note about how getting a CureCode account didn't work the other 
day.
kcollins
29-Dec-2009
[4772]
Fork, are you seeing these outputs "coo", "thte", etc. on a Linux
build of R3? I have seen similar corrupted output with Linux R3 when
testing TCP client code, as documented in CureCode #1322.
Ladislav
29-Dec-2009
[4773x3]
Regarding the QUOTE keyword: the original proposal was to treat blocks,
as in quote [1 2], as sequences of elements, not as embedded blocks.
Wouldn't you prefer that behaviour?
Re the THRU problem: you can use


    parse [1 2 3] [?? while [integer! block! accept | skip | reject] ?? integer!]
I overlooked that you used the STRING! datatype:


    parse [1 2 3] [?? while [integer! string! accept | skip | reject] ?? integer!]
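Ladislav's workaround is a search loop: try the sequence at the current position; if it matches, ACCEPT leaves the WHILE successfully just past the match; otherwise SKIP advances one value and the loop retries; when SKIP itself fails at the end of the block, REJECT makes the WHILE fail. A Python model of that control flow follows. It is a sketch, not R3 code; `seq_of_types` and `while_accept_skip_reject` are hypothetical names, and a rule is modelled as a function from a position to a new position or `None`.

```python
from typing import Callable, Optional

# A rule takes (block, pos) and returns the new position, or None on failure.
Rule = Callable[[list, int], Optional[int]]


def seq_of_types(*types: type) -> Rule:
    """Model of a rule like `integer! string!`: consecutive values of these types."""
    def rule(block: list, pos: int) -> Optional[int]:
        for t in types:
            if pos >= len(block) or not isinstance(block[pos], t):
                return None
            pos += 1
        return pos
    return rule


def while_accept_skip_reject(body: Rule) -> Rule:
    """Model of `while [body accept | skip | reject]`, i.e. THRU for any rule."""
    def rule(block: list, pos: int) -> Optional[int]:
        while True:
            end = body(block, pos)
            if end is not None:   # body ACCEPT: leave the loop with success
                return end
            if pos < len(block):  # | SKIP: advance one value and retry
                pos += 1
            else:                 # | REJECT: nothing left to skip, fail
                return None
    return rule
```

With `[1, 2, 3]` and an integer-then-string sequence, the loop runs off the end and fails, which matches the failure Ladislav's STRING! version is expected to produce on that input.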
Fork
29-Dec-2009
[4776x2]
Ladislav: I didn't realize you could use "while" as the second argument
to copy; I thought it only worked with to and thru...
kcollins: I'm using OS X, and I still haven't found a way to reproduce
it. It comes and goes.
Ladislav
29-Dec-2009
[4778x2]
COPY should accept any rule, not just the ones you mentioned
e.g. 

    parse [a b c] [?? copy value thru 1 skip to end]   

should preferably have been

    parse [a b c] [?? copy value 1 skip to end]
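Because COPY simply records the span covered by whatever rule follows it, `copy value 1 skip` needs no TO or THRU. A minimal Python model of that idea, assuming list indexes stand in for parse positions; `copy_rule` and `skip_n` are hypothetical names, not REBOL functions.

```python
from typing import Callable, Optional, Tuple

# A rule takes (block, pos) and returns the new position, or None on failure.
Rule = Callable[[list, int], Optional[int]]


def skip_n(n: int) -> Rule:
    """Model of `n skip`: advance over exactly n values."""
    def rule(block: list, pos: int) -> Optional[int]:
        return pos + n if pos + n <= len(block) else None
    return rule


def copy_rule(rule: Rule) -> Callable[[list, int], Optional[Tuple[int, list]]]:
    """Model of `copy value <rule>`: capture the span ANY rule matched."""
    def run(block: list, pos: int) -> Optional[Tuple[int, list]]:
        end = rule(block, pos)
        if end is None:
            return None              # the rule failed, so COPY fails too
        return end, block[pos:end]   # new position and the copied values
    return run
```

`copy_rule(skip_n(1))` on `["a", "b", "c"]` from position 0 returns `(1, ["a"])`, mirroring `copy value 1 skip` setting VALUE to `[a]`.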
Pekr
30-Dec-2009
[4780]
What is the difference between BREAK and ACCEPT? Both "break" out 
of the rule, both with success (IMO).
Ladislav
30-Dec-2009
[4781]
Carl made a distinction in the R3 blog, but as far as I can tell they
currently work the same, so the only difference I see is that ACCEPT
is more self-explanatory.
Carl
31-Dec-2009
[4782x2]
Right: synonyms.
I'm still running into some problems with PARSE... mainly from the 
expectation of what ANY and SOME should do.

For example:
>> parse "" [any [copy tmp to end]]
>> tmp
== ""
Steeve
31-Dec-2009
[4784]
what do you expect in this case ?
Carl
31-Dec-2009
[4785x3]
In the rewrite of DECODE-CGI, that behavior of ANY forces me to write:

parse "" [any [end break | copy tmp to end]]


This seems wrong to me if we define ANY as a MATCHing function, not 
as a LOOP function. This topic has been debated a bit between a few 
of us, but I think it deserves more attention.
In other words, is ANY smart about the input?  If there is no input, 
why should it even try?


Of course, in the past we've used ANY a bit like WHILE -- as a LOOPing 
method, not really as a MATCHing method.
It's a small thing, and maybe too late to change. I wanted to point 
it out.
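Carl's LOOP versus MATCH distinction can be made concrete with a small Python model. It is a sketch, not REBOL's implementation, and `any_rule`, `check_input_first`, `to_end`, and `copy_to_end_into` are hypothetical names. A LOOP-style ANY tries its body even when no input remains, so a zero-length `copy tmp to end` still fires and leaves `tmp` as an empty string; a MATCH-style ANY checks for remaining input first and never runs the body.

```python
from typing import Callable, List, Optional

# A rule takes (text, pos) and returns the new position, or None on failure.
Rule = Callable[[str, int], Optional[int]]


def to_end(text: str, pos: int) -> Optional[int]:
    """Model of `to end`: succeeds even when no input is left."""
    return len(text)


def copy_to_end_into(captured: List[str]) -> Rule:
    """Model of `copy tmp to end`; each copy is appended to `captured`."""
    def rule(text: str, pos: int) -> Optional[int]:
        end = to_end(text, pos)
        captured.append(text[pos:end])
        return end
    return rule


def any_rule(body: Rule, check_input_first: bool = False) -> Rule:
    """Model of ANY. check_input_first=False is the LOOP reading (the body is
    tried even at the end of input); True is the MATCH reading. Either way
    the loop stops when the body fails or does not advance."""
    def rule(text: str, pos: int) -> Optional[int]:
        while True:
            if check_input_first and pos >= len(text):
                return pos           # no input left: don't even try
            end = body(text, pos)
            if end is None or end == pos:
                return pos           # failed or stalled: ANY still succeeds
            pos = end
    return rule


loop_copies: List[str] = []
any_rule(copy_to_end_into(loop_copies))("", 0)
print(loop_copies)   # [''] : like tmp == "" in Carl's example

match_copies: List[str] = []
any_rule(copy_to_end_into(match_copies), check_input_first=True)("", 0)
print(match_copies)  # [] : the copy never runs, so tmp is not set
```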
Steeve
31-Dec-2009
[4788x2]
We have so many alternatives that I don't see this as a burden
any [and skip copy tmp to end]
any [copy tmp [skip to end]]
etc...
Carl
31-Dec-2009
[4790]
There are a few ways to do it, but that is not my point.
Steeve
31-Dec-2009
[4791]
I see your point, but what if the ANY block contains production rules?

parse "" [any [and skip copy tmp to end break | insert "1" and insert "2"]]

(I know, stupid example.)
Gregg
31-Dec-2009
[4792x2]
We have some cool new parse enhancements, some of them really, really
nice. What I think will add the most value to PARSE--and maybe this is
just me--are practical examples, idioms, and best practices.
For example


- Parsing an input that has nested structures, and how to collect 
the values you want.
- Showing the user where the parse failed.
- How to avoid infinite parse loops.
- How to safely modify the input stream.

More advanced examples would be great too of course.
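For Gregg's second item, one simple approach is to have the matcher return the position where it stopped and then point at that spot in the input. The sketch below is Python, not PARSE, and matches only a sequence of literal tokens; `match_sequence` and `report` are hypothetical helpers. Like PARSE, it treats leftover input as a failure and reports where the unconsumed part begins.

```python
from typing import List, Tuple


def match_sequence(text: str, tokens: List[str]) -> Tuple[bool, int]:
    """Match literal tokens in order. Returns (success, position), where
    position is where matching stopped, so it can be shown to the user."""
    pos = 0
    for token in tokens:
        if not text.startswith(token, pos):
            return False, pos        # failed here: this is what to report
        pos += len(token)
    return pos == len(text), pos     # leftover input also counts as failure


def report(text: str, pos: int) -> str:
    """Render the input with a caret under the failure position."""
    return text + "\n" + " " * pos + "^"


ok, pos = match_sequence("GET /a.html", ["GET", " ", "/b"])
if not ok:
    print(report("GET /a.html", pos))
```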
Pekr
1-Jan-2010
[4794]
Carl, the first "error" in the parse rewrite of SOME/ANY is the automatic
protection against non-advancing input. It is like writing in BASIC

10 Print "Hello"
20 goto 10


... and not expecting it to run forever because some magical internal
mechanism kicks in. If I write code that could cause an infinite loop,
then so be it. For me it causes the opposite reaction: SOME/ANY are not
safe to use, so let us use WHILE instead...


Something like parse str [some [to "abc"]] is so obvious and
self-explanatory that not looping forever almost feels like a parse
error. But even if I don't like it, maybe most such infinite loops are
harder to notice, so the prevention might actually be OK, I don't know.
As for me, though, I would probably prefer some internal capability to
detect such a case, and a debug option to show the last rule/position
where it happens...


I am not fluent enough with parse theory, but maybe it also relates 
to your loop vs matching note above ...
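Pekr's preferred alternative, failing loudly and reporting where the loop stalled instead of quietly ending it, might look like the Python sketch below. This is not how R3 behaves; `some_strict`, `StalledLoop`, and `to_literal` are hypothetical names, and `to_literal` models `to "abc"` by moving to the next occurrence without consuming it, which is exactly why `some [to "abc"]` stops advancing.

```python
from typing import Callable, Optional

# A rule takes (text, pos) and returns the new position, or None on failure.
Rule = Callable[[str, int], Optional[int]]


class StalledLoop(Exception):
    """Raised when a loop body succeeds without consuming any input."""


def to_literal(lit: str) -> Rule:
    """Model of `to "abc"`: move to the next occurrence, don't consume it."""
    def rule(text: str, pos: int) -> Optional[int]:
        i = text.find(lit, pos)
        return None if i < 0 else i
    return rule


def some_strict(body: Rule, name: str) -> Rule:
    """SOME-like loop that raises on a non-advancing body, naming the rule
    and position, instead of silently stopping (hypothetical debug mode)."""
    def rule(text: str, pos: int) -> Optional[int]:
        end = body(text, pos)
        if end is None:
            return None              # SOME needs at least one match
        while True:
            if end == pos:
                raise StalledLoop(f"rule {name!r} did not advance at position {pos}")
            pos = end
            end = body(text, pos)
            if end is None:
                return pos           # body stopped matching: SOME succeeds
    return rule
```

Running `some_strict(to_literal("abc"), 'to "abc"')` on `"xxabc"` moves to position 2 once, then raises `StalledLoop` because the next iteration stays at position 2.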
BrianH
6-Jan-2010
[4795x5]
BenBran:
Not sure where to put this, so asking here:


I downloaded a web script and it has a snippet I don't understand:
buffer: make string! 1024         ;; contains the browser request
file: "index.html"
parse buffer ["get" ["http"  |   "/ "  |  copy file to " " ]]

what does:

copy file to " "

mean or do?
tia
The copy and to are parse operations. COPY copies the data covered 
by the next operation, the TO. TO covers the data from the current 
parse position until the first instance it can find of its argument.
So, copy file to " " is the equivalent of this regular REBOL code:

    file: if find data " " [copy/part data find data " "]
Sort of. The actual code is a little more complex, more like this:

    either tmp: find data " " [file: if 0 < offset? data tmp [copy/part data tmp]] [break]

Here BREAK stands for a PARSE match failure, and FILE is set to NONE
for a zero-length match.
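BrianH's REBOL equivalent translates fairly directly into a Python model. This sketches the semantics as he describes them for R3 at the time (including NONE for a zero-length copy); it is not a reference implementation. `copy_to` is a hypothetical helper, and positions are string indexes.

```python
from typing import Optional, Tuple


def copy_to(data: str, pos: int, delim: str) -> Optional[Tuple[int, Optional[str]]]:
    """Model of `copy file to delim`, following BrianH's description.

    Returns None if delim is not found (the parse match failure, i.e. the
    BREAK in his REBOL version). Otherwise returns the new position, which
    is the start of delim because TO does not consume it, and the copied
    text, which is None for a zero-length match.
    """
    found = data.find(delim, pos)
    if found < 0:
        return None
    copied = data[pos:found] if found > pos else None
    return found, copied
```

On `"/a.html HTTP/1.1"` from position 0 with `" "` as the delimiter, this returns `(7, "/a.html")`: the copy stops just before the space, and the parse position is left on it.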
BenBran
6-Jan-2010
[4800]
I get what's happening now. If I compare buffer and file I see the
clipped text:

>> probe file
== "index.html"

>> probe buffer
{GET /a.html HTTP/1.1
Host: localhost

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10

Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-US
Accept-Encoding: gzip, deflate
Connection: keep-alive
Address: 127.0.0.1}

>> probe parse buffer ["get" ["http" | "/ " | copy file to " "]]
== false

>> probe file
== "/a.html"
 
Should I have been able to see the results instead of == false?
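The `== false` is expected: PARSE returns TRUE only if its rules match all the way to the end of the input. Here the rule finishes right after the COPY and leaves the rest of the request unconsumed, so PARSE returns FALSE even though COPY has already set FILE. Ending the rule with `to end` (for example `parse buffer ["get" ["http" | "/ " | copy file to " "] to end]`) lets it consume the remainder. The Python sketch below models only that full-match requirement; `literal`, `to_end`, and `parse_all` are hypothetical helpers, and `literal` is case-insensitive because PARSE string matching is case-insensitive by default.

```python
from typing import Callable, List, Optional

# A rule takes (text, pos) and returns the new position, or None on failure.
Rule = Callable[[str, int], Optional[int]]


def literal(lit: str) -> Rule:
    """Case-insensitive literal match, like a string inside a PARSE rule."""
    def rule(text: str, pos: int) -> Optional[int]:
        chunk = text[pos:pos + len(lit)]
        return pos + len(lit) if chunk.lower() == lit.lower() else None
    return rule


def to_end(text: str, pos: int) -> Optional[int]:
    """Model of `to end`: consume whatever input is left."""
    return len(text)


def parse_all(text: str, rules: List[Rule]) -> bool:
    """Run rules in sequence; succeed only if they finish at the end of input."""
    pos = 0
    for rule in rules:
        end = rule(text, pos)
        if end is None:
            return False
        pos = end
    return pos == len(text)   # unconsumed input means PARSE returns FALSE
```

Matching only `"get"` against `"GET /a.html HTTP/1.1"` returns `False` because input remains; adding `to_end` as a final rule makes it return `True`.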