World: r3wp

[Parse] Discussion of PARSE dialect

Oldes
1-Feb-2009
[3466x2]
In real life, for example, for syntax highlighting of a complex HTML 
page with mixed CSS and JS (etc.), with separate lexers for each language.
I think I must use a stack to store the lexers. The above is not 
enough.
Maarten
2-Feb-2009
[3468x3]
This weekend I got an interesting idea: algebraic (and recursive) 
data types are well known for their ability to implement parsers. 
And they are a great data modeling tool.

E.g: 

data Bill = Name BankAccount | 
                   Company CreditCard

data CreditCard = CVC2 CCNumber CCExpiryDate 


However, the opposite also holds, i.e. you can model a data domain using 
named parse rules without actions just as easily. Now, what if you 
combined two dialects: one to define data structures and a separate 
one to attach actions?

E.g.

Post: [ message [string!] author [string!] timestamp [date!] ]
Comments: [ some Post ]
Blog: [ 1 Post Comments ]


action 'JSON 'Post [  .... the action to convert the Post to JSON here ...]

action 'XHTML 'Post [ ..... the action to convert Post to XHTML here...]

process some-data 'JSON

-> this gives back the data processed as for the JSON actions. It 
is a bit SAX-like, with the difference that this models classes of 
actions and separates them from the data instead of scattering some 
loose actions. And the data modeling still holds.
To sum it all up: "dynamic (pluggable) parse actions"
Then make actions for data to go to JSON, XML, XHTML, back and forth 
to a database,....
[unknown: 5]
2-Feb-2009
[3471]
It's a great idea, Maarten.
Maarten
2-Feb-2009
[3472]
Yes, it could make dialects fly.
Chris
2-Feb-2009
[3473]
Trying to understand: given the above, you could do? -

	>> process ["Post" "me" 2/2/09 "Comment" "you" 2/2/09] 'JSON
	== {... some JSON ...}
	>> process [1 2 3 4] 'JSON
	== none


Also, how would the data be available to the action code? Like this? --

	action 'REBOL 'Post [
		mold compose [what (message) who (author) when (timestamp)]
	]
Oldes
2-Feb-2009
[3474]
I really like REBOL when I'm able to do things like:
c1: context [
    n: 1
    lexer: [
        copy x 1 skip (
            prin reform ["in context:" n "=> "] probe x
            if x = "." [root-lexer: c2/lexer]
        )
        | end skip
    ]
]
c2: context [
    n: 2
    d: charset "0123456789"
    lexer: [
        copy x some d (
            prin reform ["in context:" n "=> "] probe x
            root-lexer: c1/lexer
        )
        | end skip
    ]
]
root-lexer: c1/lexer
parse "abcd.123efgh" [ some [() root-lexer]]
Maarten
3-Feb-2009
[3475x2]
Chris:

1) Yes, actually, that would be the idea

2) I think the data dialect would be a strict subset of parse, forcing 
you to use set-word/parse-rule pairs. Hence, the set-words are available 
in the action.
e.g.:

post: [ message: [string!] timestamp: [date!] ]  would make message 
and timestamp magically available in the action
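
Note: a minimal sketch of the "pluggable parse actions" idea discussed above, assuming REBOL 2. The names post-rule, actions and process are made up for illustration, and the captured fields are passed to each action as plain function arguments rather than being made "magically available" as Maarten proposes. The 'xhtml action does no escaping.

```rebol
REBOL [Title: "Pluggable parse actions (sketch)"]

; Data description only: what one post looks like, with no actions inside.
; (post-rule, actions and process are hypothetical names for this sketch.)
msg: author: stamp: none
post-rule: [set msg string! set author string! set stamp date!]

; Action sets, keyed by output format. Each one receives the captured fields.
actions: reduce [
    'rebol func [message author timestamp] [
        mold compose [what (message) who (author) when (timestamp)]
    ]
    'xhtml func [message author timestamp] [
        rejoin ["<p>" message " (" author ", " timestamp ")</p>"]
    ]
]

; Parse DATA with the shared rule and apply the action set named by FORMAT.
; Returns a block of results, or none if the format is unknown or the
; data does not match the rule.
process: func [data [block!] format [word!] /local handler out] [
    if not handler: select actions format [return none]
    out: copy []
    either parse data [
        some [post-rule (append out handler msg author stamp)]
    ] [out] [none]
]

probe process ["Post" "me" 2/2/09 "Comment" "you" 2/2/09] 'rebol
probe process [1 2 3 4] 'rebol
```

The data rule is written once and the output format is chosen at call time, which is the separation the thread is after. A fuller version would generate the rule and the bindings from the set-word spec.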
Graham
9-Feb-2009
[3477x2]
For those of Scottish descent, does this work for you?


fix-scots: func [ result /local rule][
    rule: [ thru " Mc" mark: skip ( uppercase/part skip result -1 + index? mark 1) ]
    parse result [ some rule ]
    result
]
Or, are there some other funny capitalization rules I need to do?
Chris
9-Feb-2009
[3479]
Mc and Mac.
Steeve
9-Feb-2009
[3480]
uh !? what are those skips ???
Graham
9-Feb-2009
[3481x3]
But I see Macdonald ... and not MacDonald ..
skip not necessary ..
how do you decide whether it's MacDonald or Macdonald??
Steeve
9-Feb-2009
[3484]
indeed :)
Chris
9-Feb-2009
[3485x2]
I'd say MacDonald, but I'm not one, so don't know.
One side of my family have the convenient Ross, the other dropped 
the Mc to leave Gill (back in time somewhere)
Graham
9-Feb-2009
[3487x3]
They call it a big Mac not a big Mc ... odd
when it's McDonalds
I guess they're being inclusive
Chris
9-Feb-2009
[3490]
As far as I'm aware, Mc and Mac are interchangeable.
Graham
9-Feb-2009
[3491x2]
In legal documents? Interesting.
I'm grabbing my phone book ....
BrianH
9-Feb-2009
[3493]
My family switched away from the Scottish spelling too, back in the 
19th century when that branch came to the US.
Chris
9-Feb-2009
[3494]
Didn't say that, just usage.
BrianH
9-Feb-2009
[3495]
Each family picks one spelling and sticks with it nowadays, mostly 
because of those legal documents.
Graham
9-Feb-2009
[3496x2]
Yep, my phone book has the Macleans between the Mcleans
so the alphabetical ordering system they're using treats mc and mac 
the same
Chris
9-Feb-2009
[3498]
B: from what name?
BrianH
9-Feb-2009
[3499x2]
Phone book sorting - that's really complex :(
Halle
Chris
9-Feb-2009
[3501]
Sounds nordic...
BrianH
9-Feb-2009
[3502x2]
To Hawley, the English spelling. To reduce prejudice in the US.
It's old Celtic.
Graham
9-Feb-2009
[3504x2]
Apple MacIntosh ??
I think I'll skip Macs
Chris
9-Feb-2009
[3506]
As opposed to MacKintosh.
Steeve
9-Feb-2009
[3507]
you can't guess, you need the list of all clans :)
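
Note: a minimal sketch of fix-scots without the extra skips, assuming REBOL 2. The name fix-mc is made up here. Since mark already points at the character after " Mc", it can be upper-cased directly. Only "Mc" is handled, because "Mac" cannot be guessed without a list of names. Like the original, it changes the string in place and misses a name at the very start of the string.

```rebol
REBOL [Title: "Capitalise the letter after Mc (sketch)"]

; fix-mc is a hypothetical name; the rule is the one from fix-scots,
; minus the skip and the index arithmetic.
fix-mc: func [name [string!] /local mark] [
    parse/all name [
        ; after each " Mc", upper-case the single character at MARK
        any [thru " Mc" mark: (uppercase/part mark 1)]
    ]
    name
]

probe fix-mc copy "Angus Mcdonald and Flora Mcleod"
```

parse/all is used so that the leading space in " Mc" is matched literally. String matching in parse is case-insensitive unless /case is given, so " mc" is picked up as well.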
Janko
14-Feb-2009
[3508x4]
hi, it's me again with parse problems... I need this concretely 
to parse out web-page meta tags, but I distilled the problem out 
of it to a minimal example.
doc1: "start A 1 end start B 2 end"  how can you get the value 2 out?
It works with A because it's first, but with B the rule doesn't match 
at the start and parse doesn't try again further along.

>> parse doc1 [ "start" "A" copy R to "end" (print R) to end ]
 1
== true
>> parse doc1 [ "start" "B" copy R to "end" (print R) to end ]
== false
I thought it would recheck if I put it into something like SOME [ ], 
but it doesn't:

parse doc1 [ SOME [ "start" "B" copy R to "end" (print R) to end ] ]
kib2
14-Feb-2009
[3512]
Maybe? parse/all doc1 [ thru "B" copy number to "end" (print number) ]
But I'm beginning with parse, so I'm not an expert
Janko
14-Feb-2009
[3513x2]
This would work in this case, but I need to get "2" only if the sequence 
before it is exactly "start" "B", i.e. "start" "B" XX "end". There 
can be a "B" in other places of the string and it mustn't take that. 
(I am used to using thru and to as well, but I mustn't use them in this 
case, as they might just skip to some other "B".)
>> doc1: "start A 1 end xyz B 2 end" ;; in this case it must not take 2
== "start A 1 end xyz B 2 end"

>> parse doc1 [ "start" thru "B" copy R to "end" (print R) to end ] ;; but it will, that's why I can't use to\thru
 2
== true
Anton
14-Feb-2009
[3515]
some ["start" ["A" | "B"] copy R to "end" "end"]
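
Note: a minimal sketch of one way to pick out only the "B" record, assuming REBOL 2. The helper name find-b is made up here. A parse rule only matches at the current input position, so wrapping it in SOME does not make parse search for it. Adding a "| skip" alternative moves one character forward whenever the record rule fails, so the rule is retried at every position.

```rebol
REBOL [Title: "Scan for a start B ... end record (sketch)"]

doc1: "start A 1 end start B 2 end"
doc2: "start A 1 end xyz B 2 end"

; find-b is a hypothetical helper: it returns the value of the first
; "start B <value> end" record, or none when there is no such record.
find-b: func [doc [string!] /local val found] [
    found: none
    parse/all doc [
        any [
            "start B " copy val to " end" (if none? found [found: val])
            | skip    ; no record at this position: advance and retry
        ]
    ]
    found
]

probe find-b doc1
probe find-b doc2
```

With /all the spaces are matched literally, so a stray "B" that is not directly preceded by "start " is never taken, which is the case doc2 is meant to check.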