World: r3wp


[Parse] Discussion of PARSE dialect

BrianH 29-Dec-2008 [3419]	The test code won't be in the wiki.
PeterWood 29-Dec-2008 [3420]	That doesn't appear logical to me. In his blog Carl specifically stated that proposals without test code would not be considered. You are saying the opposite.
BrianH 29-Dec-2008 [3421x3]	He didn't say that to me, nor did he specify any format for the test cases in his initial version of the proposals wiki.
	We will have test cases once the test case syntax is specified.
	They won't go in the wiki though, at least not the main page. The page is too big already.
Janko 31-Jan-2009 [3424]	Hi, I am asking for some help with parse again... are there any detailed docs with examples about parse?
Josh 31-Jan-2009 [3425]	One that I used when I was learning was Brett's http://www.codeconscious.com/rebol/parse-tutorial.html
Graham 31-Jan-2009 [3426x2]	Brett has lots of examples on parse
	oops ... snap!
[unknown: 5] 31-Jan-2009 [3428]	http://www.rebol.com/docs/core23/rebolcore-15.html
Janko 31-Jan-2009 [3429x6]	aha, I remember I learned a lot from that green page too.. thanks for the links so far, I will read the pages and hopefully I will find something related to the problems I have
	thanks Paul for your link too, I couldn't find that page on Google (I did find Brett's one)
	the last problem I had, and Steeve and Oldes proposed solutions... I got Steeve's one but I don't get what "complement charset" in Oldes's does..
	    >> str: "a.b.c.d!e?f. " chars: complement charset ".!?"
	    >> parse str [any chars tmp: to end (uppercase tmp)] str
	    == "a.B.C.D!E?F. "
	I think my problem is of this kind: http://www.mail-archive.com/[rebol-list-:-rebol-:-com]/msg16347.html
	or in terms of Brett's examples:
	    == true
	    >> a: copy "dog cat" parse a [ ANY [ thru "dog" (print 1) | thru "cat" (print 2) ] ]
	    1
	    2
	    == true
	    >> a: copy "cat dog" parse a [ ANY [ thru "dog" (print 1) | thru "cat" (print 2) ] ]
	    1
	    == true
	basically a similar problem to last time, as I see now.. so by looking at those mailing list answers I have 2 solutions ... I use parse 3 times on a string.. or maybe I use Ladislav's parseen, which he said solves this.. but I don't yet know how :)
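A minimal sketch of one way around the ordering problem above, assuming REBOL 2 `parse`: match each keyword at the current position and fall back to `skip`, rather than letting `thru` jump over the other keyword. The word `found` is only an illustrative name, not something from the thread.

```rebol
a: copy "cat dog"
found: copy []
parse/all a [
    any [
        "dog" (append found 1)      ; keyword right at the current position
        | "cat" (append found 2)
        | skip                      ; otherwise advance by one character
    ]
]
probe found
```

Because no alternative can jump ahead of the others, the keywords are collected in the order they occur in the input: for "cat dog" that is 2 and then 1, and swapping the two words in the input swaps the result.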
[unknown: 5] 31-Jan-2009 [3435]	What do you want to accomplish?
Janko 31-Jan-2009 [3436]	=heading=
[unknown: 5] 31-Jan-2009 [3437]	is that your answer?
Janko 31-Jan-2009 [3438x6]	no .. I am writing example
	THIS WORKS IF IN THIS ORDER
	        =heading=
	        {comment some comment}
	        - line 1
	        - line 2
	        ------------->
	        <h1>heading</h1>
	        <p>comment some comment</p>
	        <li>line 1</li>
	        <li>line 2</li>
	    THIS DOESN'T WORK
	        =heading=
	        {comment some comment}
	        =heading=
	        - line 1
	        - line 2
	        =heading=
	        {comment some comment}
	    ADDITIONAL (SIMILAR) PROBLEM
	        - line 1
	        + line 2
	        + line 3
	        - line 4
	        + line 5
	        ----------------->
	        <li class="a">line 1</li>
	        <li class="a">line 2</li>
	        ...
	------------> this arrow means that I convert that to that
	basically it seems to me right now that PARSE is mega powerful for anything that comes in a somewhat PREDEFINED order, like dialects and many other things (I could write multiple HTML extraction programs with it for some search project I was making without hitting this limitation - it was predefined order too).. but it seems to get limited with things that repeat or alternate at random etc.
	oops, my last example with lists was bad
	again, ADDITIONAL (SIMILAR) PROBLEM
	        - line 1
	        + line 2
	        + line 3
	        - line 4
	        + line 5
	        ----------------->
	        <li class="minus">line 1</li>
	        <li class="plus">line 2</li>
	        <li class="plus">line 3</li>
	        <li class="minus">line 4</li>
	        ...
Oldes 31-Jan-2009 [3444x3]	Complement:
	    >> c1: charset "1"
	    == make bitset! 64#{AAAAAAAAAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=}
	    >> find c1 "1"
	    == true
	    >> find c1 "2"
	    == none
	    >> c2: complement c1
	    == make bitset! 64#{/////////f////////////////////////////////8=}
	    >> find c2 "2"
	    == true
	    >> find c2 "1"
	== none
	>> ? complement
	    USAGE:
	        COMPLEMENT value
	    DESCRIPTION:
	        Returns the one's complement value.
	        COMPLEMENT is an action value.
	    ARGUMENTS:
	        value -- (Type: logic number char tuple binary string bitset image)
	    >> ? union
	    USAGE:
	        UNION set1 set2 /case /skip size
	    DESCRIPTION:
	        Creates a new set that is the union of the two arguments.
	        UNION is a native value.
	    ARGUMENTS:
	        set1 -- first set (Type: series bitset)
	        set2 -- second set (Type: series bitset)
	    REFINEMENTS:
	        /case -- Use case-sensitive comparison
	        /skip -- Treat the series as records of fixed size
	            size -- (Type: integer)
	    >>
Janko 31-Jan-2009 [3447]	Oldes thanks, I have looked at the docs of complement, but the fact is that I don't know the meaning of the word itself in "Returns the one's complement value." .. I imagine it returns everything except the values you give it, but that seems strange?
Oldes 31-Jan-2009 [3448x3]	convert-input: func[input [string!] /local stops rest opened-tags b e][
	        probe input
	        space: charset " ^-"
	        stops: charset "-+^/"
	        rest: complement stops
	        opened-li?: false
	        parse/all input [
	            some [
	                () ;<-- to be able escape from the parse loop if there is any infinite loop
	                b: #"^/" e: (
	                    if opened-li? [
	                        e: change/part b "</li>^/" 1
	                        opened-li?: false
	                    ]
	                ) :e
	                b: [
	                    #"-" any space e: (
	                        e: change/part b {<li class="minus">} e
	                        opened-li?: true
	                    )
	                    | #"+" any space e: (
	                        e: change/part b {<li class="plus">} e
	                        opened-li?: true
	                    )
	                ] :e
	                | to #"^/"
	                | end
	            ]
	        ]
	        if opened-li? [ append input "</li>" ]
	        input
	    ]
	    probe convert-input {^/- line 1^/+ line 2^/+ line 3^/- line 4^/+ line 5}
	Now I see that the above example will require newline at start of the input. And that I'm not using the 'stops and 'rest at all:)
	but if you use something like: any rest it will give you any chars which are not defined in the 'stops charset
Janko 31-Jan-2009 [3451]	uh, that is some advanced parse :) .. I will need a couple of days to think it through
Oldes 31-Jan-2009 [3452]	this one is better:
	    convert-input: func[input [string!] /local output space eol not-eol tmp][
	        probe input
	        output: copy ""
	        space: charset " ^-"
	        eol: charset "^/^M"
	        not-eol: complement eol
	        li-rule: [
	            [
	                #"-" any space (append output {<li class="minus">})
	                | #"+" any space (append output {<li class="plus">})
	            ]
	            copy tmp any not-eol (
	                if tmp [append output join tmp "</li>"]
	            )
	        ]
	        parse/all input [
	            opt li-rule
	            some [
	                () ;<-- to be able escape from the parse loop if there is any infinite loop
	                copy tmp some eol (append output tmp)
	                [
	                    li-rule
	                    | copy tmp some not-eol (if tmp [append output tmp])
	                    | end
	                ]
	            ]
	        ]
	        output
	    ]
	    probe convert-input {+ start^/- line 1^/+ line 2^/+ line 3^/- line 4^/+ line 5^/end}
Steeve 31-Jan-2009 [3453]	hmm... is that not enough ?
	    convert: func [input /local out data get-line][
	        out: make string! length? input
	        get-line: [copy data [thru newline | to end]]
	        parse/all input [
	            any [
	                end break
	                | #"+" get-line (append out rejoin [{<li class="plus">} trim data "</li>"])
	                | #"-" get-line (append out rejoin [{<li class="minus">} trim data "</li>"])
	                | get-line (append out data)
	            ]
	        ]
	        out
	    ]
Oldes 31-Jan-2009 [3454]	Yes.. if you don't want to teach Janko, how to use charsets with parse.
Steeve 31-Jan-2009 [3455]	even with charsets, don't use obfuscated parsing rules when it's not requested.
Brock 31-Jan-2009 [3456x2]	I'll try to explain complement. I like to think of a charset as a list of valid chars that can be tested for. However, say you need all characters of the alphabet minus a few. Instead of defining multiple ranges of characters, something like A-F H-K M-T V-Z a-z 0-9, which effectively skips the chars G, L & U, you could simply state complement charset "GLU", which would exclude these three characters from the charset but include all others.
	If there's something more specific or a technically better way to state the above, please add your input
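A short sketch of the two approaches described above, assuming REBOL 2. Character ranges in a charset are written in block form; the words `letters-minus-glu` and `not-glu` are illustrative names only.

```rebol
; explicit ranges: the uppercase letters except G, L and U
letters-minus-glu: charset [#"A" - #"F" #"H" - #"K" #"M" - #"T" #"V" - #"Z"]

; complement: every character except G, L and U
; (this also includes digits, punctuation, whitespace and so on)
not-glu: complement charset "GLU"

probe find not-glu #"A"                 ; in the set
probe find not-glu #"G"                 ; excluded
probe find not-glu #"!"                 ; in the set as well
probe find letters-minus-glu #"!"       ; not in the explicit ranges
probe parse/all "ABC" [some not-glu]    ; every char is accepted
probe parse/all "AGC" [some not-glu]    ; stops at "G", so parse fails
```

The two sets are not the same: the complement accepts anything that was not listed, which is usually what is wanted for rules like "everything up to a delimiter".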
PeterWood 1-Feb-2009 [3458x2]	Try http://en.wikipedia.org/wiki/Complement_(set_theory)
	Though the Rebol Help refers to the one's complement - http://en.wiktionary.org/wiki/one%27s_complement
Janko 1-Feb-2009 [3460x2]	Very interesting, both versions (Oldes and Steeve), thanks a lot.. I think I understood most of it now
	Thanks for the explanation of complement, I understand it now
Tomc 1-Feb-2009 [3462]	complement on charsets is defining what is not in the set you want.
Oldes 1-Feb-2009 [3463]	Is there any better way to change the main parse rules during parse than this one? (just a simple example.. in real life the lexers would be more complicated :)
	    d: charset "0123456789"
	    lexer1: [copy x 1 skip (probe x if x = "." [lexer: lexer2]) | end skip]
	    lexer2: [copy x some d (probe x lexer: lexer1) | end skip]
	    lexer: lexer1
	    parse "abcd.123efgh" [ some [() lexer]]
Steeve 1-Feb-2009 [3464]	Not really Oldes... but what is your purpose? Isn't that a little obfuscated again? You said it's just an example, but why can't you use the normal way? I would like to know...
	    parse "..." [ some [ #"." lexer2 | lexer1 ] ]
Oldes 1-Feb-2009 [3465x3]	No... I mean the rules inside my real lexers (which decide that it's required to change the main rule) are more complicated.
	In real life, for example, syntax highlighting of a complex HTML page with mixed CSS and JS (etc.), with separate lexers for each language.
	I think that I must use a stack to store the lexers. The above is not enough.
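A rough sketch of what a stack of lexers could look like, assuming REBOL 2 and the same `some [() lexer]` driver as in the example above. The helpers `push-lexer` and `pop-lexer` and the words `stack`, `out`, `text-lexer` and `number-lexer` are hypothetical names made up for this illustration; this is not Oldes's actual code.

```rebol
digit: charset "0123456789"
out: copy []
stack: copy []

; hypothetical helpers: remember the current lexer, then switch to another
push-lexer: func [rule [block!]] [
    insert/only stack lexer
    lexer: rule
]
pop-lexer: does [
    lexer: first stack
    remove stack
]

text-lexer: [
    #"." (push-lexer number-lexer)      ; enter the nested lexer
    | copy x skip (append out x)
]
number-lexer: [
    copy x some digit (append out x pop-lexer)
    | (pop-lexer)                       ; no digits here: return to the previous lexer
]

lexer: text-lexer
parse/all "abcd.123efgh" [some [() lexer]]
probe out
```

The point of the stack is that a nested lexer does not need to know which lexer called it: it only pops, so the same CSS or JS lexer could be entered from different contexts of an HTML lexer.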
Maarten 2-Feb-2009 [3468]	This weekend I got an interesting idea: algebraic (and recursive) data types are well known for their ability to implement parsers. And they are a great data modeling tool. E.g.:
	        data Bill = Name BankAccount | Company CreditCard
	        data CreditCard = CVC2 CCNumber CCExpiryDate
	    However, the opposite also holds, i.e. you can model a data domain using named parse rules without actions just as easily. Now, what if you combined two dialects: one to define data structures and a separate one to attach actions? E.g.:
	        Post: [ message [string!] author [string!] timestamp [date!] ]
	        Comments: [ some posts]
	        blog [ 1 post comments]
	        action 'JSON 'Post [ .... the action to convert the Post to JSON here ...]
	        action 'XHTML 'POST [ ..... the action to convert Post to XHTML here...]
	        process some-data 'JSON
	    -> this gives back the data processed as for the JSON actions. It is a bit SAX-like, with the difference that this models classes of actions and separates them from the data instead of scattering some loose actions. And the data modeling still holds.