World: r3wp

[Parse] Discussion of PARSE dialect

Henrik
25-Nov-2006
[1533]
>> parse/case "AAABBBaaaAAA" "A"
== ["" "" "" "BBBaaa" "" ""]
>> parse/case "BAAABBBaaaAAA" "A"
== ["B" "" "" "BBBaaa" "" ""]
>> parse/case "BA" "A"
== ["B"]

hmmm...
Ladislav
25-Nov-2006
[1534]
it's OK, because every A means one closing #"^"". The first A was 
used to close the "...a" string
Anton
25-Nov-2006
[1535]
Yep, makes sense to me.
Ingo
26-Nov-2006
[1536]
This may make it easier for some, just exchange the "A"s for "," 
and mentally read it like you would read a csv file:

>> parse/case ",,,BBBaaaBBB,,,aaa" ","
== ["" "" "" "BBBaaaBBB" "" "" "aaa"]
Anton
26-Nov-2006
[1537]
It's like cutting a piece of wood. You only cut twice but you end 
up with three pieces.
Maxim
26-Nov-2006
[1538]
but parse does have an inconsistency:
>> parse/all "/1/2/3/" "/"
== ["" "1" "2" "3"]

>> parse/all "/1/2/3" "/"
== ["" "1" "2" "3"]


two different strings on entry, the same output. IMHO the first 
example should have an extra trailing "" in the block.
Anton
26-Nov-2006
[1539]
Is that an inconsistency or are we just not sure what the definition 
of the separator string is ?
Maxim
26-Nov-2006
[1540]
huh? not sure I get what you mean... how can the above be desired? 
it breaks the symmetry between the data and its tokens; for example, it 
strips the trailing / of a dir...
Anton
26-Nov-2006
[1541]
I'm with you, but what is the documented definition of the parse 
separator ?
Maxim
27-Nov-2006
[1542]
the function's doc string doesn't even mention it! it's a special 
mode... in the dict it says:


There is also a simple parse mode that does not require rules, but 
takes a string of characters to use for splitting up the input string.

so not very explicit.
Anton
27-Nov-2006
[1543x2]
That's pretty much how I remember it.
So the problem might be that we don't know how it's supposed to work. 
Maybe the implementor wasn't too clear how it should work either. 
From memory there was an "inconsistent case" which actually had a 
use - for something like splitting command-line args. But anyway, 
a clearer definition would be good.
Maxim
27-Nov-2006
[1545]
at least the above oddity should be documented, because one can get 
bitten by it before ever learning about it... in my case, it renders 
this mode almost useless, as I cannot trust the output.
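
One possible workaround for the trailing-separator case discussed above is to split with explicit parse rules instead of the simple split mode. This is only a sketch: split-all is a hypothetical helper name, not part of REBOL, and it assumes a single-character delimiter. The any [field copy ""] guard is there because a zero-length copy may yield none rather than an empty string.

```rebol
REBOL []

; Hypothetical helper (not a REBOL built-in): split STR at every
; occurrence of DELIM and keep empty fields, including a trailing one.
split-all: func [
    str [string!]
    delim [char!]
    /local out field
][
    out: copy []
    parse/all str [
        any [
            ; copy everything up to the next delimiter, then consume it
            copy field to delim skip
            (append out any [field copy ""])
        ]
        ; whatever is left after the last delimiter is the final field
        copy field to end
        (append out any [field copy ""])
    ]
    out
]

probe split-all "/1/2/3/" #"/"
probe split-all "/1/2/3" #"/"
```

With this approach the two inputs are meant to produce different blocks, the first one ending in an extra "" for the field after the last separator.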
Gabriele
27-Nov-2006
[1546]
that parse mode was intended to make parsing CSV easier. may not 
work with all the CSV variants though.
Maxim
27-Nov-2006
[1547]
do you agree that the docs are misleading in their current form?
Gabriele
27-Nov-2006
[1548]
they are at least incomplete.
Anton
27-Nov-2006
[1549]
Better to have a simple and consistent core and enable particular 
modes for specific uses with refinements.
Pekr
5-Dec-2006
[1550x2]
I would like to ask: could anything be done to produce parsers 
for XML-related markup languages? Or do you guys find the existing 
parse facilities strong enough, and it is simply that XML is too 
complex, so we lack a full-spec XML parser?
Just asking, because today I read a bit about ODF and OpenXML (two 
document formats for office apps). There is probably room for 
small apps that parse some info from inside the documents etc. (meta-data 
programming) ... just curious ... or would it be better to wait for 
full-spec XML libraries that do the job, and link to those?
BrianH
5-Dec-2006
[1552]
Such a thing has been on my todo list for a while, but I've been 
a little busy lately with non-REBOL projects :(
Gregg
5-Dec-2006
[1553]
I don't want to deal with XML beyond simple well-formed XML, too 
complex. I don't, personally, have any interest in doing generic 
XML toolkit stuff at this point. I can see value in it for some people, 
but I'd rather write REBOL dialects. :-)
Maxim
8-Dec-2006
[1554x2]
geomol's xml2rebxml handles XML pretty well.  one might want to change 
the parse rules a little to adapt the output, but it actually loads 
all the xml tags, empty tags and attributes.  it even handles utf-8, 
CDATA chunks, and converts some of the & chars.
I am using an adapted form of it commercially so far. I have implemented 
full schema validation and loading (in rebol) but it's proprietary 
code I can't release. So guys, it can be done!
Allen
10-Dec-2006
[1556]
I'm starting to see some abandonment of XML in favour of JSON, 
mainly in web 2.0, but it will not replace XML where validation 
is required.
BrianH
11-Dec-2006
[1557]
You really have to trust your source when using JSON to a browser 
though. Standard usage is to load with eval - only safe to use on 
https sites because of script injection.
[unknown: 9]
11-Dec-2006
[1558]
XML and JSON suck...
Maxim
11-Dec-2006
[1559]
is there a way to make block parsing case sensitive?

this doesn't seem to work:
parse/case [A a] [some ['A (print "upper") | 'a (print "lower")]]
Gabriele
11-Dec-2006
[1560x2]
words are not case sensitive.
>> strict-equal? 'A 'a
== true
Maxim
11-Dec-2006
[1562x3]
I was just hoping case could have been an exception... it would be 
very useful, especially when parsing code from other languages...
(I meant using /case within parse)
well, seems like I'll be doing string parsing then  :-)
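
For reference, a minimal sketch of the string-parsing route mentioned above. It assumes the input is available as a string rather than a block of words; the seen block is just an illustrative name used to record which alternative matched.

```rebol
REBOL []

; With string input, parse/case compares characters case-sensitively,
; so "A" and "a" can be told apart (unlike the words 'A and 'a).
seen: copy []
parse/case "Aa" [
    some [
        "A" (append seen 'upper)
        | "a" (append seen 'lower)
    ]
]
probe seen
```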
Gabriele
11-Dec-2006
[1565x3]
you could take advantage of this bug:
>> alias 'a "aa"
== aa
>> strict-equal? 'A 'a
== false
but it will be fixed eventually :P
Maxim
11-Dec-2006
[1568x2]
hehe... I would not want the bug to get too comfortable, lest it 
becomes a feature  ;-)
you know what they say...  "features are bugs with experience"
Josh
11-Dec-2006
[1570x2]
I don't know
Whoops
Joe
24-Dec-2006
[1572x4]
s: "str"
s2: "str 1^/ str 2 ^/ str 3"

rules: [
    any [
        end break
        | copy value [to "^/" | to end] (print value)
    ]
]

parse s rules
print "---"
parse s2 rules
I ran the above on core 2.6 and it loops forever. This was a bug 
fixed in 2.3 but it looks like the bug still exists
sorry, not a bug. I was inspired by the example in the changes page 
and it is missing the thru "^/" after the to "^/"
parse item [
    any [
        "word" (print "got word")
        | copy value [to "abc" | to end]
            (print value) break
    ]
]
Gabriele
25-Dec-2006
[1576x2]
not a bug - you are not skipping the newline, so to "^/" will always 
match. you are not getting to the end.
>> rules: [
[    any [
[        end break
[        |
[        copy value [to newline | to end] (print value) opt skip
[        ]
[    ]
== [
    any [
        end break
        |
        copy value [to newline | to end] (print value) opt skip
    ]
]
>> parse s2 rules
str 1
str 2
str 3
== true
Joe
25-Dec-2006
[1578x2]
yes, thanks gabriele - happy holidays! I find the opt skip not very 
intuitive!
wouldn't to newline thru newline be easier to understand than opt 
skip?
Volker
25-Dec-2006
[1580]
could be opt newline
Gabriele
26-Dec-2006
[1581x2]
joe, if you don't care about parse returning true you can just use 
skip (without opt, which is there for the end case)
also, if you don't care about your value having the newline in it, 
you can just replace to newline with thru newline.
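
The last two suggestions in this exchange can be sketched side by side. This is an illustration only: it uses parse/all so that whitespace handling is explicit, and lines-a / lines-b are made-up names for collecting the results.

```rebol
REBOL []

s2: "str 1^/ str 2 ^/ str 3"

; Variant A: stop before each newline, then consume it explicitly
; with "opt newline" (the last line has no newline, hence "opt").
lines-a: copy []
parse/all s2 [
    any [
        end break
        | copy value [to newline | to end] (append lines-a value) opt newline
    ]
]

; Variant B: use "thru newline" so the match itself moves past the
; newline; each copied value then keeps its trailing newline.
lines-b: copy []
parse/all s2 [
    any [
        end break
        | copy value [thru newline | to end] (append lines-b value)
    ]
]

probe lines-a
probe lines-b
```

Both rules advance the input on every pass through the loop, which is what the original looping rule failed to do.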