r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Oldes
25-Sep-2006
[1420]
I would like to know whether string-based parsing that handles all 
current REBOL datatypes can be as fast as, or faster than, block parsing.
Geomol
25-Sep-2006
[1421]
Gabriele, yes, it works with strings. But I have words! The thing is, 
I parse the string input from the user and produce words in an internal 
format. Then I parse those words for the final output, which can 
be in different formats. I would expect parse/case to be case-sensitive 
when parsing words, but parse/case only works for strings, hence 
my suggestion.
Gabriele
25-Sep-2006
[1422]
What I'd suggest is: if case is important, don't make them into 
words :)
Geomol
25-Sep-2006
[1423]
:D But it makes so much sense to work with words.
Gabriele
26-Sep-2006
[1424]
Sure, but you can only have 8k of them (unless you make sure they 
never end up in system/words), so if you also counted case...
Maxim
26-Sep-2006
[1425]
Another way to get around the word limit is to use the issue! datatype (e.g. #house).
Oldes
26-Sep-2006
[1426x2]
Is there a parse example showing how to deal with recursion while 
parsing strings? When you parse a block, it's easy to detect what is a string! 
and what is another type, but when you parse a string, it's not 
so easy to detect, for example, nested strings like {some text {other "text"}}.
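Here is a minimal sketch of recursion in string parsing. It assumes REBOL 2 `parse/all` on a string, and `braced` and `non-brace` are hypothetical names. The rule refers to itself, so every `{` inside a brace-delimited string starts a nested match, and that match must close before the outer `}` can match. The sketch ignores quoted "..." strings and `^` escapes, which a full REBOL-syntax parser would also have to handle.

```rebol
; Hypothetical sketch: BRACED calls itself for every nested "{",
; so braces must be balanced for the whole rule to succeed.
; Quoted "..." strings and ^ escapes are NOT handled here.
non-brace: complement charset "{}"
braced: [#"{" any [non-brace | braced] #"}"]

probe parse/all {{some text {other "text"}}} braced    ; == true
probe parse/all "{unbalanced {x}" braced                ; == false
```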
Rebolek
26-Sep-2006
[1428x2]
Words should be case-insensitive, but is that always the case? I 
found this today by accident:

>> a: [small Small]
== [small Small]
>> find/case a to word! "small"
== [small Small]
>> find/case a to word! "Small"
== [Small]
So /case does work with words, at least in FIND.
Oldes
26-Sep-2006
[1430]
If it works in FIND, it should work in PARSE as well.
Gabriele
26-Sep-2006
[1431]
Well... case insensitivity for words is done via automatic aliasing 
of words that differ only in case. (I know this because we found 
a bug related to it :)
Rebolek
26-Sep-2006
[1432]
so internally, words are case-sensitive?
Ladislav
26-Sep-2006
[1433]
yes
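PARSE/CASE does not apply to words, so one possible workaround in block parsing is to run the case-sensitive test with FIND/CASE inside an action, then force the rule to succeed or fail. This is a sketch that assumes FIND/CASE distinguishes words by case, as it did in Rebolek's console session. `wanted`, `check` and `accept-word` are hypothetical names. The empty block `[]` always succeeds, and `[end skip]` can never match.

```rebol
; Hypothetical workaround sketch: case-sensitive word matching in block PARSE.
; Relies on FIND/CASE comparing words case-sensitively (as observed above).
wanted: [Small]              ; words to accept, exact case
check: []                    ; reset by the action below
accept-word: [
    set w word!
    ; [] always succeeds; [end skip] always fails
    (check: either find/case wanted w [[]] [[end skip]])
    check
]

probe parse [Small] accept-word
probe parse [small] accept-word
```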
Anton
27-Sep-2006
[1434]
Here's an idea to toss into the mix:

I am thinking of a new notation for strings using underscores (e.g. 
 _"hello"_ ) in a parse block, which would let you specify whether they 
are delimited by whitespace or not. This would allow you to enable or disable 
the need for delimiters per string, e.g.:

parse input [
	_"house"_   ; a complete word, surrounded on both sides by whitespace
	_"hous"     ; would match "house", "housing", "housed" or even "housopoly" etc., but the left side must be whitespace
	"ad"_       ; would match "ad", "fad", "glad", and the right side must be whitespace
]

But this would require a change to the string! datatype.

On the other hand, I could just set underscore _ to a charset of 
whitespace, then use that with parse/all, e.g.:

_: charset " ^-^/"

parse/all input [
	[_ "house" _]
]

Though that wouldn't be as comfortable. Maybe I can create parse 
rules from a simpler dialect that understands the underscore _.
Just an idea...
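Anton's "simpler dialect" idea might look roughly like the minimal sketch below. It assumes the underscore goes inside the pattern string (e.g. "_hous"), since `_"hous"` would need lexer support. `compile-pattern` and `pattern-in?` are hypothetical helpers. The search tries the compiled rule at every position, and it pads the text with spaces so that matches at the very start or end still see whitespace.

```rebol
; Hypothetical sketch: a leading or trailing "_" in a pattern string means
; "whitespace required on that side". Not an existing REBOL feature.
ws: charset " ^-^/"

compile-pattern: func [pattern [string!] /local rule body trailing?] [
    rule: copy []
    body: copy pattern
    trailing?: false
    if #"_" = first body [append rule 'ws  remove body]
    if #"_" = last body [trailing?: true  remove back tail body]
    append rule body                  ; the literal text to match
    if trailing? [append rule 'ws]
    rule
]

pattern-in?: func [pattern [string!] text [string!] /local rule hit] [
    rule: compile-pattern pattern
    hit: false
    ; try RULE at each position; on success, record it and jump to the end
    parse/all join " " join text " " [any [rule (hit: true) to end | skip]]
    hit
]

probe compile-pattern "_house_"            ; == [ws "house" ws]
probe pattern-in? "_hous" "housing"        ; == true
probe pattern-in? "_hous" "lighthouse"     ; == false
probe pattern-in? "ad_" "so glad"          ; == true
```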
MikeL
27-Sep-2006
[1435]
Anton, Andrew defined whitespace patterns in his patterns.r 
script that seem usable here: you can write [ws* "house" ws*] or 
other combinations as needed, without the underscore. Andrew's solutions 
for this and many other things have given me good mileage 
over the past few years. WS*: [some WS] and WS?: [any WS]. 
It makes for clean, clear parse scripts once you adopt it.
Gregg
27-Sep-2006
[1436]
I think either approach above can work well. I like the "look" of 
the underscore, and have done similar things with standard function 
names. For SOME, ANY, and OPT, the tag chars I prefer are +, *, and 
? respectively, which are standard EBNF.
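With Gregg's EBNF-style suffixes, whitespace rules in the spirit of patterns.r might look like the sketch below. It follows Gregg's mapping (+ for SOME, * for ANY, ? for OPT). That mapping differs from the WS*/WS? definitions MikeL quotes. The `ws` charset is an assumption.

```rebol
; Hypothetical whitespace rules using EBNF-style suffixes.
ws:  charset " ^-^/"
ws+: [some ws]     ; one or more
ws*: [any ws]      ; zero or more
ws?: [opt ws]      ; zero or one

probe parse/all "  house " [ws* "house" ws*]    ; == true
probe parse/all "house" [ws+ "house"]           ; == false
probe parse/all " house" [ws? "house"]          ; == true
```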
Anton
27-Sep-2006
[1437x2]
Oh yes, I've seen Andrew's patterns.r. I was just musing how to make 
it more concise, without even using a short word like WS. Actually, 
the use case that sparked this idea was more of a "regex-level" 
pattern matcher: a simple matcher where the user writes 
patterns to match filenames and strings appearing in 
file contents.
Gregg, + * ? could be a good idea. I'll throw that into my mix-bowl.
Gregg
28-Sep-2006
[1439]
I also have a naming convention I've been playing with for a while, 
where parse rule words have an "=" at the end (e.g. date=) and parse 
variables--values set during the parse process--have it at the beginning 
(e.g. =date). The idea is that it's sort of a cross between BNF syntax 
for production rules and set-word/get-word syntax; the goal being 
to easily distinguish parse-related words. By using the same word 
for a rule and an associated variable, with the equal sign at the 
head or tail, respectively, it also makes it easier to keep track 
of what gets set where, when you have a lot of rules.
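Here is a small illustration of the naming convention. It assumes a simple YYYY-MM-DD date format, chosen only for the example. The rule words end in `=`, and the parse variables they set begin with `=`. All names are hypothetical.

```rebol
; Hypothetical example: rule words end in "=", parse variables start with "=".
digit=: charset "0123456789"
year=:  [copy =year 4 digit=]
month=: [copy =month 2 digit=]
day=:   [copy =day 2 digit=]
date=:  [year= #"-" month= #"-" day=]

parse/all "2006-09-28" date=
print [=year =month =day]    ; prints: 2006 09 28
```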
Maxim
28-Sep-2006
[1440x3]
simple and clean, good idea!
I'm just starting to be able to actually USE parse for dialecting. 
 So far I've been almost solely using it to replace regexp functionality.
So many years of REBOLing (since Core 1.2), and parse is still 
largely untamed by me.
Graham
29-Sep-2006
[1443x9]
This was, I thought, a simple task: parsing a CSV file...
COHEN

,"WILLIAM   ",""," 305782","123 "C" AVENUE","CORONADO ","CA","92118","560456788","(619)555-2730","(   )   -   0","08/22/1927","M","SHARP CORONADO/MISSI","","","","","POLLICK","JAMES     ","","MOUNTAIN","RODERICK  ","",
This seems to be a difficult line, as there is an embedded quote, viz. 
"123 "C" AVENUE"
This is Gabriele's published parser:


CSV-parser: make object! [
	line-rule: [field any [separator field]]
	field: [[quoted-string | string] (insert tail fields any [f-val copy ""])]
	string: [copy f-val any str-char]
	quoted-string: [{"} copy f-val any qstr-char {"} (replace/all f-val {""} {"})]
	str-char: none
	qstr-char: [{""} | separator | str-char]
	fields: []
	f-val: none
	separator: #";"
	set 'parse-csv-line func [
		"Parses a CSV line (returns a block of strings)"
		line [string!]
		/with sep [char!] "The separator between fields"
	] [
		clear fields
		separator: any [sep #";"]
		str-char: complement charset join {"} separator
		parse/all line line-rule
		copy fields
	]
]
It was written to cope with embedded quotes, but it fails where there 
is an empty quoted field, e.g. ,"",
This is Joel Neely's version from the same day:

readcsv: make object! [

	all-records: copy []
	one-record:  copy []
	one-segment: copy ""
	one-field:   copy ""

	noncomma:    complement charset ","
	nonquote:    complement charset {"}

	segment: [
		copy one-segment any nonquote
		(if found? one-segment [append one-field one-segment])
	]

	quoted: [
		{"} (one-field: copy "")
		segment
		any [{""} (append one-field {"}) segment]
		{"}
	]

	unquoted: [copy one-field any noncomma]
	field:    [[quoted | unquoted] (append one-record one-field)]
	record:   [field any ["," field]]

	run: func [f [file!] /local line] [
		all-records: copy []
		foreach line read/lines f [
			one-record: copy []
			either parse/all line record [
				append/only all-records one-record
			][
				print ["parse failed:" line]
			]
		]
		all-records
	]
]
It reports a parse failure on this line.
This might fix Gabriele's parser:

CSV-parser: make object! [
	line-rule: [field any [separator field]]
	field: [[quoted-string | string] (insert tail fields any [f-val copy ""])]
	string: [copy f-val any str-char]
	; only unescape doubled quotes when something was actually copied
	quoted-string: [{"} copy f-val any qstr-char {"} (if found? f-val [replace/all f-val {""} {"}])]
	str-char: none
	qstr-char: [{""} | separator | str-char]
	fields: []
	f-val: none
	separator: #";"
	set 'parse-csv-line func [
		"Parses a CSV line (returns a block of strings)"
		line [string!]
		/with sep [char!] "The separator between fields"
	] [
		clear fields
		separator: any [sep #";"]
		str-char: complement charset join {"} separator
		parse/all line line-rule
		copy fields
	]
]
perhaps not.
sqlab
29-Sep-2006
[1452]
Why don't you use split?
Gabriele
29-Sep-2006
[1453x2]
Graham, IIRC my version is meant to handle embedded quotes when they are properly 
escaped, i.e. you should have "123 ""C"" AVENUE" there for it to 
work.
I actually wonder why quotes are used in that line at all. They are only 
needed if the field contains the separator.
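To show the escaping convention Gabriele describes, here is a standalone sketch for comma-separated lines. It is not his published parser. Inside a quoted field, `""` stands for one quote character, and an unescaped quote like the one in Graham's line makes the whole line fail. `csv-line` is a hypothetical name. Empty fields are normalised to "" because REBOL 2's COPY can leave its word set to none after an empty match, which is what caused the empty-field failure above.

```rebol
; Hypothetical standalone sketch, not Gabriele's published parser.
; Returns a block of field strings, or NONE if the line doesn't parse.
csv-line: func [line [string!] /local fields val chunk plain qchar field] [
    fields: copy []
    plain: complement charset {",}     ; chars allowed in an unquoted field
    qchar: complement charset {"}      ; chars allowed inside quotes
    field: [
        {"} (val: copy "")
        any [{""} (append val {"}) | copy chunk some qchar (append val chunk)]
        {"}
        | copy val any plain
    ]
    ; ANY [val copy ""] turns a NONE from an empty match into ""
    either parse/all line [
        field (append fields any [val copy ""])
        any [#"," field (append fields any [val copy ""])]
    ] [fields] [none]
]

probe csv-line {a,"123 ""C"" AVENUE",,""}    ; properly escaped: 4 fields
probe csv-line {"123 "C" AVENUE",x}          ; unescaped quote: none
```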
Graham
29-Sep-2006
[1455]
Split will work if there are no embedded commas, I guess.
Anton
3-Oct-2006
[1456]
What's the parse rule for going backwards?
	-1 skip  ?
Oldes
3-Oct-2006
[1457x2]
maybe this will help:

x: [1 2 3 4 5]
parse x [any [x: set d number! (probe x probe d x: next x) :x]]
You can set x to another position if you need to.
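Building on Oldes' example, here is a minimal sketch of actually going backwards. It avoids a negative skip count. Instead it saves the position with a set-word, moves it with BACK inside an action, and jumps to it with a get-word. The names `data`, `seen`, `n` and `pos` are illustrative. The effect is that the first number is matched twice.

```rebol
; Hypothetical sketch: rewind one element by moving a saved position.
data: [1 2 3]
seen: copy []
parse data [
    set n integer! (append seen n)
    pos: (pos: back pos) :pos          ; step back one element
    set n integer! (append seen n)
    to end
]
probe seen    ; == [1 1]
```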
Anton
3-Oct-2006
[1459]
Ah yes - very good :)
Maxim
3-Oct-2006
[1460x3]
My god, I think I finally -get- Parse... call me the village idiot. 
 I used to use parse; now I also understand subconsciously it ;-)
that should read "... I also understand it subconsciously"
(parse rule inversion ;-)
Izkata
3-Oct-2006
[1463]
That's a ~very~ good example, Oldes... it should be put in the docs 
somewhere (if it isn't already). I didn't understand how get-words 
and set-words worked in parse before, either.
Volker
3-Oct-2006
[1464]
Nice demo of the main parse-position features :)
Rebolek
4-Oct-2006
[1465]
I've got the following PARSE problem:

I've got the string "<good tag><bad tag><other tag><good tag>" and 
I want to keep "<good tag>" as is, but change the "<" and ">" of the other tags to, 
let's say, "X" (I need to change them to HTML entities, but that doesn't matter 
now). So the result should look like: "<good tag>Xbad tagXXother tagX<good tag>"

I've been working on it for the last few hours but still haven't found a solution. 
Is there one?
Anton
4-Oct-2006
[1466]
string: "<good tag><bad tag><other tag><good tag>"
entity: "<ENTITY>"
parse/all string [
	any [
		to "<" start: skip
		to ">" end: skip 
		(if not find copy/part start end "good tag" [
			change/part start entity 1

			; fix up END (for when your entity is not a 1-character string)
			end: skip end (length? entity) - 1
			change/part end entity 1
			; fix up END again
			end: skip end (length? entity) - 1
		])
		:end skip
	]
	to end
]
string

;== {<good tag><ENTITY>bad tag<ENTITY><ENTITY>other tag<ENTITY><good tag>}
Rebolek
4-Oct-2006
[1467x3]
Anton, nice, thanks. But I also need it to work on this: string: "<good tag><bad tag> 3 > 5 <other tag><good tag with something inside>". 
I almost got it, but that unbalanced "3 > 5" is still a problem 
for me.
I'll probably replace everything and then just revert the "good tag" 
back. It's not very elegant, but...
(Hm, 3 > 5. My examples are not very 'real-life' :-))
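An alternative to replacing everything and then reverting is to walk the string once, skipping over good tags as a whole and treating every other `<` and `>` on its own. The unbalanced "3 > 5" then needs no special case. This sketch assumes good tags are recognised by the prefix "<good tag", so "<good tagline>" would also be kept. It uses HTML entities as replacements, and `escape-tags` is a hypothetical name. CHANGE/PART returns the position just past the replacement, and the get-word `:pos` resumes parsing there, so longer replacement strings don't throw off the position.

```rebol
; Hypothetical sketch: escape every < and > except inside "good" tags.
; Modifies TEXT in place and returns it.
escape-tags: func [text [string!] /local pos good-tag] [
    good-tag: ["<good tag" thru ">"]       ; kept as-is
    parse/all text [
        any [
            good-tag
            | pos: #"<" (pos: change/part pos "&lt;" 1) :pos
            | pos: #">" (pos: change/part pos "&gt;" 1) :pos
            | skip
        ]
    ]
    text
]

print escape-tags copy "<good tag><bad tag> 3 > 5 <other tag><good tag with something inside>"
; prints: <good tag>&lt;bad tag&gt; 3 &gt; 5 &lt;other tag&gt;<good tag with something inside>
```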