r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Janko
31-Jan-2009
[3451]
uh, that is some advanced parse :) .. I will need a couple of days 
to think it through
Oldes
31-Jan-2009
[3452]
this one is better:

convert-input: func [input [string!] /local output space eol not-eol tmp li-rule][
	probe input
	output: copy ""

	space: charset " ^-"
	eol:   charset "^/^M"
	not-eol: complement eol

	li-rule: [
		[
			#"-" any space (append output {<li class="minus">})
			|
			#"+" any space (append output {<li class="plus">})
		]
		copy tmp any not-eol (
			; tmp is none for an empty item, but the <li> must still be closed
			append output join any [tmp ""] "</li>"
		)
	]

	parse/all input [
		opt li-rule
		some [
			() ;<-- to be able to escape from the parse loop if it ever loops infinitely
			copy tmp some eol (append output tmp)
			[
				li-rule
				|
				copy tmp some not-eol (if tmp [append output tmp])
				| end
			]
		]
	]
	output
]

probe convert-input {+ start
- line 1
+ line 2
+ line 3

- line 4
+ line 5
end}
Steeve
31-Jan-2009
[3453]
hmm... is that not enough ?

convert: func [input /local out data get-line][
	out: make string! length? input
	get-line: [copy data [thru newline | to end]]
	parse/all input [
		any [
			end break
			| #"+" get-line (append out rejoin [{<li class="plus">} trim data "</li>"])
			| #"-" get-line (append out rejoin [{<li class="minus">} trim data "</li>"])
			| get-line (append out data)
		]
	]
	out
]
Oldes
31-Jan-2009
[3454]
Yes... if you don't want to teach Janko how to use charsets with parse.
Steeve
31-Jan-2009
[3455]
Even with charsets, don't use obfuscated parsing rules when they aren't needed.
Brock
31-Jan-2009
[3456x2]
I'll try to explain complement. I like to think of a charset as a list of valid chars that can be tested for. However, say you need all characters of the alphabet minus a few. Instead of defining multiple ranges of characters, as in charset [#"A" - #"F" #"H" - #"K" #"M" - #"T" #"V" - #"Z" #"a" - #"z" #"0" - #"9"], which effectively skips the chars G, L and U, you could simply write complement charset "GLU", which excludes those three characters from the charset but includes all others (every other character, not only letters and digits).
If there's something more specific or a technically better way to state the above, please add your input.
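A small sketch of the two approaches under REBOL 2 semantics (the names `alnum-no-glu` and `not-glu` are just illustrative): ranges inside `charset` have to be written as a block of `#"A" - #"F"` pairs, and the complement keeps every character except G, L and U, including spaces and punctuation.

```rebol
; letters and digits except G, L and U, built from explicit ranges
alnum-no-glu: charset [
	#"A" - #"F" #"H" - #"K" #"M" - #"T" #"V" - #"Z"
	#"a" - #"z" #"0" - #"9"
]
; every character except G, L and U (spaces, punctuation, control chars included)
not-glu: complement charset "GLU"

probe parse/all "HOT42" [some alnum-no-glu]    ; true
probe parse/all "GOT" [some alnum-no-glu]      ; false: G was skipped by the ranges
probe parse/all "HOT 42!" [some alnum-no-glu]  ; false: space and ! were never in the ranges
probe parse/all "HOT 42!" [some not-glu]       ; true: the complement keeps space and !
probe parse/all "GLUE" [some not-glu]          ; false
```

So the two are not interchangeable: the complement is the better fit when "anything but these few" is really what you mean.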
PeterWood
1-Feb-2009
[3458x2]
Try http://en.wikipedia.org/wiki/Complement_(set_theory)
Though the Rebol help refers to the one's complement: http://en.wiktionary.org/wiki/one%27s_complement
Janko
1-Feb-2009
[3460x2]
Very interesting, both versions (Oldes and Steeve), thanks a lot... I think I understand most of it now.
Thanks for the explanation of complement, I understand it now.
Tomc
1-Feb-2009
[3462]
With complement on a charset, you define the characters you do not want; the result is every other character.
Oldes
1-Feb-2009
[3463]
Is there a better way to change the main parse rule during parsing, like this? (just a simple example... in real life the lexers would be more complicated :)
d: charset "0123456789"

lexer1: [copy x 1 skip (probe x if x = "." [lexer: lexer2]) | end skip]
lexer2: [copy x some d (probe x lexer: lexer1) | end skip] 
lexer: lexer1
parse "abcd.123efgh" [ some [() lexer]]
Steeve
1-Feb-2009
[3464]
Not really, Oldes... but what is your purpose? Isn't that a little obfuscated again?

You said it's just an example, but why can't you use the normal way? I would like to know...

parse "..." [
	some [
		#"." lexer2
		| lexer1
	]
]
Oldes
1-Feb-2009
[3465x3]
No... I mean the rules inside my real lexers (which decide when the main rule needs to change) are more complicated.
In real life, for example: syntax highlighting of a complex HTML page with mixed CSS and JS (etc.), with a separate lexer for each language.
I think I must use a stack to store the lexers. The above is not enough.
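A possible shape for that stack, sketched in REBOL 2 with hypothetical names (`push-lexer`, `pop-lexer`, `text-lexer`, `num-lexer` are not from the original): each lexer pushes the current rule before switching and pops it when its section ends, so a nested section returns to whichever lexer entered it. Plain reassignment such as `lexer: lexer1` always jumps back to one fixed lexer instead.

```rebol
digit: charset "0123456789"
out: copy []     ; collected tokens
stack: copy []   ; lexers to return to
lexer: none      ; the rule the main loop runs

push-lexer: func [rule [block!]] [
	append/only stack lexer   ; remember the current lexer
	lexer: rule
]
pop-lexer: does [
	lexer: last stack         ; back to whichever lexer pushed us
	remove back tail stack
]

; hypothetical lexers: text-lexer emits single chars and enters num-lexer on "[";
; num-lexer collects digit runs, can nest on "[" and leaves on "]"
text-lexer: [
	"[" (push-lexer num-lexer)
	| copy x skip (append out x)
]
num-lexer: [
	"[" (push-lexer num-lexer)
	| "]" (pop-lexer)
	| copy x some digit (append out to integer! x)
]

lexer: text-lexer
probe parse/all "a[1[2]3]b" [some [() lexer]]   ; true
probe out   ; ["a" 1 2 3 "b"]
```

The 3 comes out as an integer because the inner "]" pops back to the outer num-lexer rather than to text-lexer.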
Maarten
2-Feb-2009
[3468x3]
This weekend I got an interesting idea: algebraic (and recursive) data types are well known as a basis for implementing parsers, and they are a great data modeling tool.

E.g: 

data Bill = Name BankAccount | 
                   Company CreditCard

data CreditCard = CVC2 CCNumber CCExpiryDate 


However, the opposite also holds, i.e. you can model a data domain just as easily using named parse rules without actions. Now, what if you combined two dialects: one to define data structures and a separate one to attach actions?

E.g.

Post: [ message [string!] author [string!] timestamp [date!] ]
Comments: [ some post ]
Blog: [ 1 post comments ]


action 'JSON 'Post [ ... the action to convert the Post to JSON here ... ]

action 'XHTML 'Post [ ... the action to convert the Post to XHTML here ... ]

process some-data 'JSON

-> this gives back the data processed by the JSON actions. It is a bit SAX-like, with the difference that this models classes of actions and separates them from the data instead of scattering loose actions around. And the data modeling still holds.
To sum it all up: "dynamic (pluggable) parse actions"
Then make actions to take the data to JSON, XML, XHTML, back and forth to a database, ...
[unknown: 5]
2-Feb-2009
[3471]
It's a great idea, Maarten.
Maarten
2-Feb-2009
[3472]
Yes, it could make dialects fly.
Chris
2-Feb-2009
[3473]
Trying to understand: given the above, could you do this?

	>> process ["Post" "me" 2/2/09 "Comment" "you" 2/2/09] 'JSON
	== {... some JSON ...}
	>> process [1 2 3 4] 'JSON
	== none


Also, how would the data be available to the action code? Like this?

	action 'REBOL 'Post [
		mold compose [what (message) who (author) when (timestamp)]
	]
Oldes
2-Feb-2009
[3474]
I really like REBOL when I'm able to do things like:
c1: context [
	n: 1
	lexer: [
		copy x 1 skip (
			prin reform ["in context:" n "=> "] probe x
			if x = "." [root-lexer: c2/lexer]
		)
		| end skip
	]
]
c2: context [
	n: 2
	d: charset "0123456789"
	lexer: [
		copy x some d (
			prin reform ["in context:" n "=> "] probe x
			root-lexer: c1/lexer
		)
		| end skip
	]
]
root-lexer: c1/lexer
parse "abcd.123efgh" [some [() root-lexer]]
Maarten
3-Feb-2009
[3475x2]
Chris:

1) Yes, actually, that would be the idea.

2) I think the data dialect would be a strict subset of parse, forcing you to use set-word/parse-rule pairs. Hence the set-words are available in the action.
e.g.:

post: [ message: [string!] timestamp: [date!] ] would make message and timestamp magically available in the action.
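One way the idea might look with plain REBOL 2 parse, as a rough sketch: `post`, `actions` and `process` are hypothetical names, and the model uses `set` in block parsing rather than the set-word/rule pairs described above. The rule only names the fields; the actions live in a separate table keyed by output format, and the REBOL action is the one Chris suggested.

```rebol
; data model: a plain parse rule that only names the fields (no actions)
post: [set message string! set author string! set timestamp date!]

; actions kept apart from the model, one entry per output format
actions: [
	REBOL [mold compose [what (message) who (author) when (timestamp)]]
	XHTML [rejoin [{<div class="post"><p>} message {</p><em>} author {</em></div>}]]
]

; run the model over the data, firing the action for the chosen format
; on every match; returns none when the data does not fit the model
process: func [data [block!] format [word!] /local out][
	out: copy []
	either parse data [some [post (append out do select actions format)]] [out] [none]
]

probe process ["hello" "me" 2-Feb-2009] 'XHTML
probe process [1 2 3 4] 'XHTML   ; none
```

Adding a JSON or XML output would then mean adding one more entry to `actions`, without touching the `post` model.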
Graham
9-Feb-2009
[3477x2]
For those of Scottish descent, does this work for you?


fix-scots: func [ result /local rule][

    rule: [ thru " Mc" mark: skip ( uppercase/part skip result -1 + index? mark 1) ]
    parse result [ some rule ]
    result
]
Or are there some other funny capitalization rules I need to handle?
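The same rule can be written without the index arithmetic, because `mark:` already holds a position inside `result`. A sketch assuming REBOL 2, where `parse` is case-insensitive by default (so `" Mc"` also matches `" MC"`); like the original, it only fires after a space and does not touch Mac:

```rebol
fix-scots: func [
	"Uppercase the letter that follows a space plus Mc, e.g. Mcleod -> McLeod"
	result [string!] /local mark
][
	parse/all result [
		; mark points into result itself, so uppercase it in place
		any [thru " Mc" mark: (uppercase/part mark 1)]
	]
	result
]

probe fix-scots copy "Angus Mcleod"   ; "Angus McLeod"
```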
Chris
9-Feb-2009
[3479]
Mc and Mac.
Steeve
9-Feb-2009
[3480]
uh !? what are those skips ???
Graham
9-Feb-2009
[3481x3]
But I see Macdonald ... and not MacDonald ..
skip not necessary ..
how do you decide whether it's MacDonald or Macdonald??
Steeve
9-Feb-2009
[3484]
indeed :)
Chris
9-Feb-2009
[3485x2]
I'd say MacDonald, but I'm not one, so don't know.
One side of my family have the convenient Ross, the other dropped 
the Mc to leave Gill (back in time somewhere)
Graham
9-Feb-2009
[3487x3]
They call it a big Mac not a big Mc ... odd
when it's McDonalds
I guess they're being inclusive
Chris
9-Feb-2009
[3490]
As far as I'm aware, Mc and Mac are interchangeable.
Graham
9-Feb-2009
[3491x2]
In legal documents? Interesting.
I'm grabbing my phone book ....
BrianH
9-Feb-2009
[3493]
My family switched away from the Scottish spelling too, back in the 
19th century when that branch came to the US.
Chris
9-Feb-2009
[3494]
Didn't say that, just usage.
BrianH
9-Feb-2009
[3495]
Each family picks one spelling and sticks with it nowadays, mostly 
because of those legal documents.
Graham
9-Feb-2009
[3496x2]
Yep, my phone book has the Macleans between the Mcleans
so the alphabetical ordering system they're using treats mc and mac 
the same
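A guess at how a phone book might file Mc and Mac together, sketched in REBOL 2: `sort-key` is a hypothetical helper that reads a leading Mc as Mac before comparing, and `sort/compare` uses it. It covers only this one case, and names that end up with equal keys (Maclean and Mclean here) may land in either order.

```rebol
; sort key: a leading "Mc" is read as "Mac" so Mclean and Maclean file together
sort-key: func [name [string!]][
	either find/match name "Mc" [join "Mac" skip name 2] [copy name]
]

names: ["Mclean" "Macleod" "Maclean" "Mabry" "Madison"]
sort/compare names func [a b][(sort-key a) < (sort-key b)]
probe names
```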
Chris
9-Feb-2009
[3498]
B: from what name?
BrianH
9-Feb-2009
[3499x2]
Phone book sorting - that's really complex :(
Halle