World: r3wp

[Parse] Discussion of PARSE dialect

BrianH
29-Jun-2006
[1064]
Reichart, I figured as much (hence the "dry" comment). I'll look 
over the Wikibook and see if I can help.
Volker
29-Jun-2006
[1065]
Your points are ok, I only wanted to try something a bit shorter.
BrianH
29-Jun-2006
[1066]
Volker, it still might be a good point that you can skip a step with 
parse, depending on the listener. Parse is more of a compiler-interpreter 
really. The real point I was making was about the lookahead.
Volker
29-Jun-2006
[1067]
I can plug in handcrafted parsers with some cocos too.
JaimeVargas
29-Jun-2006
[1068]
I agree, Brian: parse allows you to write an interpreter easily. Regarding 
compilation, I guess it does that too, but the problem is more difficult.
Volker
29-Jun-2006
[1069]
Aah. A compiler-compiler produces source code to be compiled, but 
you can interpret data with it.
BrianH
29-Jun-2006
[1070]
Most compiler-compilers have fixed lookahead. Backtracking is equivalent 
to unlimited lookahead.
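
A minimal sketch of the backtracking point, in REBOL 2 syntax. The input strings and rules are made up for illustration. When an alternative fails, parse rewinds the input to where that alternative started and tries the next one, no matter how much input the failed attempt had already consumed. Both calls should print true.

```rebol
; Hypothetical inputs and rules, for illustration only.
; The first alternative consumes "aa" and then fails on "b",
; so parse rewinds to the start and tries the second alternative.
print parse "aaab" ["aa" "b" | "aaa" "b"]

; The failed attempt may consume any amount of input before
; parse gives up on it and falls back to the next alternative.
digits: charset "0123456789"
print parse "12345x" [some digits "y" | some digits "x"]
```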
Volker
29-Jun-2006
[1071]
I guess that depends on the coco. The point is: a BNF by default, 
and code inside the rules, instead of putting things in vars and processing 
them later. IMHO.
BrianH
29-Jun-2006
[1072x2]
Jaime, I meant that parse is itself an interpreter, not a compiler. 
It interprets compiler specs (or interpreter specs, etc.).
Volker, I've used a lot of compiler-compilers before and reviewed 
many more, and unlimited lookahead or backtracking are rare.
JaimeVargas
29-Jun-2006
[1074]
Brian, in this you are right: parse is an interpreter that 
allows easy construction of other interpreters, with the emphasis 
on DSLs.
Volker
29-Jun-2006
[1075]
Then the advantages of parse are being like a compiler-compiler 
and having unlimited lookahead etc.?
BrianH
29-Jun-2006
[1076x3]
Yup :)
I'm not sure whether not having a separate tokenizer is a plus or 
a minus, though.
I guess you could think of block parsing as using load as a tokenizer.
Volker
29-Jun-2006
[1079x2]
IMHO that would add overhead for the simple things.
and you can use parse to tokenize first?
BrianH
29-Jun-2006
[1081]
Two rounds of parsing, one for tokenizing and one to parse? Interesting. 
That would work if you don't have control over the source syntax 
- otherwise load works pretty well for simple languages.
Volker
29-Jun-2006
[1082]
That's where I got the idea: tokenize first and use the block parser :)
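
A rough sketch of the "load as a tokenizer" idea, assuming the source syntax happens to be loadable REBOL. The mini-language, its command names, and the variable names are all hypothetical.

```rebol
; Hypothetical mini-language; the command names are made up.
source: "move 10 turn 90 move 5"

; LOAD acts as the tokenizer: the string becomes a block of
; words and integers.
tokens: load source

; Block parsing then matches on values and datatypes, not characters.
parse tokens [
    some [
        'move set n integer! (print ["move by" n])
      | 'turn set n integer! (print ["turn by" n])
    ]
]
```

If the source syntax is not loadable, the first step would have to be a string parse that builds the token block itself, which is the two-round approach mentioned above.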
BrianH
29-Jun-2006
[1083]
I've been using that approach for XML processing.
Volker
29-Jun-2006
[1084]
sounds good. if one finds a good tokenized representation. I am not 
an xml-guru :(
BrianH
29-Jun-2006
[1085x2]
My next personal project is to go through the XML/XSL/REST specs 
and create exactly that. I already have an efficient structure, I 
just need to fill out the semantics to support the complete logical 
model of XML.
I am also not an XML guru, but I will be by the time I'm done :)
Volker
29-Jun-2006
[1087]
After I read "go through the XML/XSL/REST specs" I thought so. I'm 
undecided whether I'd prefer to run away or participate out of curiosity.
BrianH
29-Jun-2006
[1088x2]
Well, I know enough to know where to look to figure out the rest.
Still, "run away" is a common and sensible reaction to XML.
Volker
29-Jun-2006
[1090]
*nod*
BrianH
29-Jun-2006
[1091]
Later, I must run errands...
Volker
29-Jun-2006
[1092]
cu
Gordon
29-Jun-2006
[1093]
I'm a bit stuck because this parse stops after the first iteration. 
Can anyone give me a hint as to why it stops after one line?

Here is some code:

data: read to-file Readfile

print length? data
224921


d: parse/all data [thru QuoteStr copy Note to QuoteStr thru QuoteStr thru quotestr
    copy Category to QuoteStr thru QuoteStr thru quotestr copy Flag to QuoteStr
    thru newline (print index? data)]
1
== false


Data contains hundreds of "memos" in a CSV file with three fields: 
Memo, Category and Flag ("0"|"1"). All fields are enclosed in quotes 
and separated by commas.
  

It would be real simple if the Memo field didn't contain double quoted 
words; then 
parse data none
would even work; but alas many memos contain other "words".
It would even be simple if the memos didn't contain commas, then
parse data "," or parse/all data ","
would work; but alas many memos contain commas in the body.
JaimeVargas
29-Jun-2006
[1094]
Is every field quoted?
MikeL
29-Jun-2006
[1095]
Gordon, can you post a few short lines of the data?
Izkata
29-Jun-2006
[1096]
if QuoteStr = "\"", then this looks like it to me:
"Note", "Category", "Flag"
"Note", "Category", "Flag"

But you don't have a loop or anything - try this:
d: parse/all data [
   some [
      thru QuoteStr copy Note to QuoteStr thru QuoteStr thru quotestr
      copy Category to QuoteStr thru QuoteStr thru quotestr copy Flag to QuoteStr
      thru newline (print index? data)
   ]
]
Gordon
29-Jun-2006
[1097]
Jaime: Yes, every field is quoted.

Izkata:  Sorry, I left that out.
QuoteStr: to-char 34
probe QuoteStr
==  #"^""
Izkata
29-Jun-2006
[1098]
hm, I was thinking in C++.... very unusual for me lol
Gordon
29-Jun-2006
[1099]
Do you need to loop?  I thought parse looped by itself
ie: data: parse data none
Izkata
29-Jun-2006
[1100x2]
not as far as I know
This change in the parse looks like it works:

>> data: {"Note", "Category", "Flag"
{    "Note", "Category", "Flag"
{    "Note", "Category", "Flag"
{    "Note", "Category", "Flag"
{    }
== {"Note", "Category", "Flag"
"Note", "Category", "Flag"
"Note", "Category", "Flag"
"Note", "Category", "Flag"
}
>> QuoteStr: to-char 34
== #"^""
>> d: parse/all data [
[    some [
[        X: thru QuoteStr copy Note to QuoteStr thru QuoteStr thru quotestr
[        copy Category to QuoteStr thru QuoteStr thru quotestr copy Flag to QuoteStr
[        thru newline (print index? :X)
[        ]
[    ]
1
29
57
85
== true
Gordon
29-Jun-2006
[1102x2]
Okay, trying it now.  I see that the phrase: "print index? data" 
stays stuck on "1".  


I see that you have posted a new example.  I'll try that.  Be right 
back.
I'm pretty sure that you are right in that I have to loop through 
the "Data".  That was my big stumbling block and the rest is just 
logic to figure out.  Thanks a bunch.
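
A small sketch of why "print index? data" stays at 1 while a position marker advances. The word data always refers to the head of the string, and parse does not move it. A set-word inside the rule (pos: here, a made-up name) records the current input position instead. The sample data is invented.

```rebol
; Made-up sample data: two short lines.
data: {"a", "b"^/"c", "d"^/}

parse/all data [
    some [
        pos:                 ; remember where this record starts
        thru newline
        (print [index? data index? pos])
    ]
]
```

The first number printed on each line should stay at 1, while the second moves to the start of each record.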
Izkata
29-Jun-2006
[1104]
No problem  (I'm glad I could actually help  '^^ )
Gordon
29-Jun-2006
[1105x2]
In the phrase.  "Print index :x", what does putting a colon before 
a variable do again?
Oops I meant "Print index? :x"
Izkata
29-Jun-2006
[1107]
Not sure - I remember seeing it in others' parse rules, so I just 
put it there and it worked  '^^
Take it out and see what happens lol
Gordon
29-Jun-2006
[1108]
:)
Izkata
29-Jun-2006
[1109]
I think it was like get-word or something
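
For reference, a short sketch of what a get-word does in plain REBOL 2 code. For a string the colon makes no visible difference; it only matters when the word refers to something that would otherwise be evaluated, such as a function. The names greet and say are made up.

```rebol
x: "abc"
print index? x      ; plain word: fetches the string
print index? :x     ; get-word: same result for a string

; The difference shows up with functions.
greet: does [print "hello"]
greet               ; plain word: calls the function
say: :greet         ; get-word: fetches the function without calling it
say                 ; calls it through the new name
```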
BrianH
29-Jun-2006
[1110x3]
; Did you try this?
data: read/lines to-file Readfile
fields: [note category flag]
foreach x data [
    set fields parse x ","
    ; do something
]
In particular, remember not to use parse/all
>> parse {"Hello, World", "Blah"} ","
== ["Hello, World" "Blah"]
Gordon
29-Jun-2006
[1113]
Hi BrianH;

  Yes, I did try that, and the problem was that even though I specified 
  the "," as the delimiter, it came across an embedded quote #"^"" 
  and split the input at the quote.  REBOL shouldn't have split it 
  up that way, to my understanding.  I will post some simple data to 
  test.