World: r3wp
[Parse] Discussion of PARSE dialect
Steeve 1-Feb-2009 [3464] | Not really Oldes... but what is your purpose? Isn't that a little obfuscated again? You said it's just an example, but why can't you use the normal way? I would like to know...
    parse "..." [ some [ #"." lexer2 | lexer1 ] ] |
Oldes 1-Feb-2009 [3465x3] | No... I mean the rules inside my real lexers (which decide that it's required to change the main rule) are more complicated. |
In real life, for example, for syntax highlighting of a complex HTML page with mixed CSS and JS (etc.), with separate lexers for each language. | |
I think that I must use a stack to store the lexers. The above is not enough. | |
Maarten 2-Feb-2009 [3468x3] | This weekend I got an interesting idea: algebraic (and recursive) data types are well known for their ability to implement parsers. And they are a great data modeling tool. E.g.:
    data Bill = Name BankAccount | Company CreditCard
    data CreditCard = CVC2 CCNumber CCExpiryDate
However, the opposite also holds, i.e. you can model a data domain using named parse rules without actions just as easily. Now, what if you combined two dialects: one to define data structures and a separate one to attach actions? E.g.
    Post: [ message [string!] author [string!] timestamp [date!] ]
    Comments: [ some posts]
    blog [ 1 post comments]
    action 'JSON 'Post [ .... the action to convert the Post to JSON here ...]
    action 'XHTML 'POST [ ..... the action to convert Post to XHTML here...]
    process some-data 'JSON
-> this gives back the data processed as for the JSON actions. It is a bit SAX-like, with the difference that this models classes of actions and separates them from the data instead of scattering some loose actions. And the data modeling still holds. |
To sum it all up: "dynamic (pluggable) parse actions" | |
Then make actions for data to go to JSON, XML, XHTML, back and forth to a database,.... | |
[unknown: 5] 2-Feb-2009 [3471] | It's great idea Maarten. |
Maarten 2-Feb-2009 [3472] | Yes, it could make dialects fly. |
Chris 2-Feb-2009 [3473] | Trying to understand: given the above, you could do? -
    >> process ["Post" "me" 2/2/09 "Comment" "you" 2/2/09] 'JSON
    == {... some JSON ...}
    >> process [1 2 3 4] 'JSON
    == none
Also, how would the data be available to the action code? Like this? --
    action 'REBOL 'Post [ mold compose [what (message) who (author) when (timestamp)] ] |
Oldes 2-Feb-2009 [3474] | I really like REBOL when I'm able to do things like:
    c1: context [
        n: 1
        lexer: [
            copy x 1 skip (
                prin reform ["in context:" n "=> "]
                probe x
                if x = "." [root-lexer: c2/lexer]
            )
            | end skip
        ]
    ]
    c2: context [
        n: 2
        d: charset "0123456789"
        lexer: [
            copy x some d (
                prin reform ["in context:" n "=> "]
                probe x
                root-lexer: c1/lexer
            )
            | end skip
        ]
    ]
    root-lexer: c1/lexer
    parse "abcd.123efgh" [ some [() root-lexer]] |
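Oldes's example switches between two lexers by reassigning root-lexer. His earlier remark about needing a stack could be sketched roughly as follows. This is only an illustration, assuming REBOL 2; the names lexer-stack, current, push-lexer, pop-lexer, outer and inner are hypothetical and do not come from the thread. Entering a nested language pushes its lexer, and leaving it pops back to whichever lexer was active before.

```rebol
REBOL []

lexer-stack: copy []   ; innermost lexer is the last item (hypothetical name)
current: none          ; rule the main loop dispatches to (hypothetical name)

; Make RULE the active lexer and remember it on the stack.
push-lexer: func [rule [block!]] [
    insert/only tail lexer-stack rule
    current: rule
]

; Drop the active lexer and return to the one below it.
pop-lexer: does [
    remove back tail lexer-stack
    current: last lexer-stack
]

; Two toy lexers: "<" enters the inner one, ">" leaves it again.
outer: [
    "<" (push-lexer inner)
    | copy x skip (print ["outer:" x])
]

inner: [
    ">" (pop-lexer)
    | copy x skip (print ["inner:" x])
]

push-lexer outer
; The empty paren mirrors Oldes's main loop; CURRENT is looked up on each pass.
parse/all "ab<cd>ef" [some [() current]]
```

With more than two languages, each lexer only needs to know which lexer to push; the pop always restores the previous one without naming it.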
Maarten 3-Feb-2009 [3475x2] | Chris: 1) Yes, actually, that would be the idea. 2) I think the data dialect would be a strict subset of parse, forcing you to use set-word/parse-rule pairs. Hence, the set-words are available in the action. |
e.g.: post: [ message: [string!] timestamp: [date!] ] would make message and timestamp magically available in the action | |
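A minimal sketch of the "pluggable actions" idea, assuming REBOL 2. It is not Maarten's design: the data rule is written with plain parse SET rules rather than the proposed set-word dialect, only a single Post rule is handled, and add-action (a stand-in for the proposed action), actions, process and post-rule are hypothetical names chosen for illustration.

```rebol
REBOL []

; Data rule: describes the shape of a post and names its fields.
; Plain parse SET is used here instead of the proposed set-word dialect.
post-rule: [set message string! set author string! set timestamp date!]

actions: copy []   ; flat list of format-name / action-body pairs

; Register BODY as the action to run for FORMAT.
add-action: func [format [word!] body [block!]] [
    append actions format
    append/only actions body
]

; Validate DATA against the rule, then run the action chosen by FORMAT.
; Returns none when the data does not match or no action is registered.
process: func [data [block!] format [word!] /local body] [
    if all [
        parse data post-rule
        body: select actions format
    ][
        do body
    ]
]

; The words set by the rule are visible inside the action body.
add-action 'rebol [
    mold compose [what (message) who (author) when (timestamp)]
]

probe process ["Hello" "me" 2-Feb-2009] 'rebol
probe process [1 2 3 4] 'rebol
```

The second call should yield none, matching the behaviour Chris asked about, because the block of integers does not satisfy post-rule. Registering another format only adds an entry to actions; the data rule stays untouched.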
Graham 9-Feb-2009 [3477x2] | For those of Scottish descent, does this work for you?
    fix-scots: func [ result /local rule][
        rule: [ thru " Mc" mark: skip ( uppercase/part skip result -1 + index? mark 1) ]
        parse result [ some rule ]
        result
    ] |
Or, are there some other funny capitalization rules I need to do? | |
Chris 9-Feb-2009 [3479] | Mc and Mac. |
Steeve 9-Feb-2009 [3480] | uh !? what are those skips ??? |
Graham 9-Feb-2009 [3481x3] | But I see Macdonald ... and not MacDonald .. |
skip not necessary .. | |
how do you decide whether it's MacDonald or Macdonald?? | |
Steeve 9-Feb-2009 [3484] | indeed :) |
Chris 9-Feb-2009 [3485x2] | I'd say MacDonald, but I'm not one, so don't know. |
One side of my family have the convenient Ross, the other dropped the Mc to leave Gill (back in time somewhere) | |
Graham 9-Feb-2009 [3487x3] | They call it a big Mac not a big Mc ... odd |
when it's McDonalds | |
I guess they're being inclusive | |
Chris 9-Feb-2009 [3490] | As far as I'm aware, Mc and Mac are interchangeable. |
Graham 9-Feb-2009 [3491x2] | In legal documents? Interesting. |
I'm grabbing my phone book .... | |
BrianH 9-Feb-2009 [3493] | My family switched away from the Scottish spelling too, back in the 19th century when that branch came to the US. |
Chris 9-Feb-2009 [3494] | Didn't say that, just usage. |
BrianH 9-Feb-2009 [3495] | Each family picks one spelling and sticks with it nowadays, mostly because of those legal documents. |
Graham 9-Feb-2009 [3496x2] | Yep, my phone book has the Macleans between the Mcleans |
so the alphabetical ordering system they're using treats mc and mac the same | |
Chris 9-Feb-2009 [3498] | B: from what name? |
BrianH 9-Feb-2009 [3499x2] | Phone book sorting - that's really complex :( |
Halle | |
Chris 9-Feb-2009 [3501] | Sounds nordic... |
BrianH 9-Feb-2009 [3502x2] | To Hawley, the English spelling. To reduce prejudice in the US. |
It's old Celtic. | |
Graham 9-Feb-2009 [3504x2] | Apple MacIntosh ?? |
I think I'll skip Macs | |
Chris 9-Feb-2009 [3506] | As opposed to MacKintosh. |
Steeve 9-Feb-2009 [3507] | you can't guess, you need the list of all clans :) |
Janko 14-Feb-2009 [3508x4] | hi, it's me again with parse problems... I need this concretely to parse out web-page meta tags.. but I distilled the problem out of it to a minimal example.. |
doc1: "start A 1 end start B 2 end" how can you get the value 2 out? | |
It works with A because it's first, but because it enters the parse with it and then doesn't match, it doesn't test again for the B:
    >> parse doc1 [ "start" "A" copy R to "end" (print R) to end ]
    1
    == true
    >> parse doc1 [ "start" "B" copy R to "end" (print R) to end ]
    == false | |
I thought it would recheck if I put it into something like SOME [ ], but it doesn't:
    parse doc1 [ SOME [ "start" "B" copy R to "end" (print R) to end ] ] | |
kib2 14-Feb-2009 [3512] | Maybe?
    parse/all doc1 [ thru "B" copy number to "end" (print number) ]
But I'm beginning with parse, so I'm not an expert. |
Janko 14-Feb-2009 [3513] | This would work in this case, but I need to get "2" only if the sequence before it is exactly the previous two: "start" "B" XX "end" ... there can be "B" in other places of the string and it mustn't take that (I am used to using thru and to too, but I mustn't use them in this case for this reason, as it might just skip to some "B") |
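One possible way to meet that requirement, sketched under the assumptions of REBOL 2 and single spaces between the tokens. A rule inside SOME is only tried at the current position, so when "start" "B" fails at the head of the string, nothing moves the input forward. Adding a `| skip` alternative advances one character on each failed attempt, so the exact sequence is retried at every position without jumping to an unrelated "B".

```rebol
REBOL []

doc1: "start A 1 end start B 2 end"

parse/all doc1 [
    any [
        ; the exact prefix must match right here, then copy up to " end"
        "start B " copy r to " end" (print r)
        ; otherwise move forward one character and try again
        | skip
    ]
]
```

With doc1 as given, this should print 2 once. The literal strings include their spaces because parse/all does not skip whitespace; input with variable spacing would need a more tolerant rule.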