r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Gabriele
29-Sep-2006
[1453x2]
Graham, IIRC my version is meant to handle embedded quotes when they are properly escaped, i.e. you should have "123 ""C"" AVENUE" there for it to work.
I actually wonder why quotes are used in that line. They are only needed if the field contains the separator.
Graham
29-Sep-2006
[1455]
split will work if there are no embedded commas I guess
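Gabriele's point (quote a field only when it contains the separator, and double any quote inside it) can be sketched outside REBOL as well. The following Python function is a hypothetical illustration, not Gabriele's PARSE version, which isn't shown here: outside quotes the separator ends a field, inside quotes it is literal, and a doubled quote stands for one literal quote.

```python
def split_csv_line(line, sep=","):
    """Split one CSV line; inside quotes, a doubled quote "" means one literal quote."""
    fields = []
    field = []            # characters of the field being built
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    field.append('"')     # escaped quote: keep one, skip both
                    i += 2
                    continue
                in_quotes = False         # closing quote
            else:
                field.append(ch)          # separators are literal inside quotes
        elif ch == '"':
            in_quotes = True              # opening quote
        elif ch == sep:
            fields.append("".join(field))
            field = []
        else:
            field.append(ch)
        i += 1
    fields.append("".join(field))
    return fields


print(split_csv_line('1,"123 ""C"" AVENUE",x'))  # ['1', '123 "C" AVENUE', 'x']
print(split_csv_line('a,"b,c",d'))               # ['a', 'b,c', 'd']
```

An unescaped value such as "123 "C" AVENUE" is not rescued: the inner quotes just toggle quoting and disappear, giving 123 C AVENUE. Python's standard csv module follows the same doubled-quote convention by default.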
Anton
3-Oct-2006
[1456]
What's the parse rule to go backwards?
	-1 skip?
Oldes
3-Oct-2006
[1457x2]
maybe this will help:

x: [1 2 3 4 5] parse x [any [x: set d number! (probe x probe d x: next x) :x]]
You can set x to another position if you need to.
Anton
3-Oct-2006
[1459]
Ah yes - very good :)
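Oldes' rule works because a set-word (x:) stores the current parse position and a get-word (:x) makes PARSE continue from whatever position the word holds, so going backwards only means changing the word inside the paren (for example with back). As a rough Python analogue, with an integer index standing in for the series position, this hypothetical function consumes two values per step and then moves back one, so the collected pairs overlap:

```python
def overlapping_pairs(items):
    """Collect each adjacent pair by consuming two items, then stepping back one."""
    pairs = []
    pos = 0                      # plays the role of the saved series position
    while pos + 1 < len(items):
        a, b = items[pos], items[pos + 1]
        pos += 2                 # the "rule" consumed two values
        pairs.append((a, b))
        pos -= 1                 # go backwards one item before the next match
    return pairs


print(overlapping_pairs([1, 2, 3, 4, 5]))  # [(1, 2), (2, 3), (3, 4), (4, 5)]
```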
Maxim
3-Oct-2006
[1460x3]
My god, I think I finally -get- Parse... call me the village idiot. I used to use parse, now I also understand subconsciously it ;-)
That should read "... I also understand it subconsciously"
(parse rule inversion ;-)
Izkata
3-Oct-2006
[1463]
That's a ~very~ good example, Oldes... it should be put in the docs somewhere (if it isn't already). I didn't understand how get-words and set-words worked in parse before, either.
Volker
3-Oct-2006
[1464]
Nice demo of the main features of parse positions :)
Rebolek
4-Oct-2006
[1465]
I've got the following PARSE problem:

I have a string, "<good tag><bad tag><other tag><good tag>", and I want to keep "good tag" as it is, but change the "<" and ">" of the other tags to, let's say, "X" (I need to change them to HTML entities, but that doesn't matter now). So the result will look like: "<good tag>Xbad tagXXother tagX<good tag>"

I've been working on it for the last few hours but still haven't found a solution. Is there one?
Anton
4-Oct-2006
[1466]
string: "<good tag><bad tag><other tag><good tag>"
entity: "<ENTITY>"
parse/all string [
	any [
		to "<" start: skip
		to ">" end: skip 
		(if not find copy/part start end "good tag" [
			change/part start entity 1

			; fix up END (for when your entity is other than a 1-character long string)
			end: skip end (length? entity) - 1
			change/part end entity 1
			; fix up END again
			end: skip end (length? entity) - 1
		])
		:end skip
	]
	to end
]
string

;== {<good tag><ENTITY>bad tag<ENTITY><ENTITY>other tag<ENTITY><good tag>}
Rebolek
4-Oct-2006
[1467x3]
Anton: nice, thanks. But I also need it to work on this: string: "<good tag><bad tag> 3 > 5 <other tag><good tag with something inside>". I almost got it, but that non-symmetric "3 > 5" is still a problem for me.
I'll probably replace everything and then just revert the "good tag" back. It's not very elegant, but...
(hm, 3 > 5. my examples are not very 'real-life' :-))
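Rebolek's fallback plan (escape everything, then revert the good tags) is easy to sketch. A minimal Python version, assuming a good tag is any tag whose body starts with "good tag" and that the input contains no literal &lt; or &gt; text of its own; the function name is hypothetical:

```python
import re


def escape_all_then_restore(text):
    """Escape every < and >, then turn escaped "good tag" tags back into real tags."""
    escaped = text.replace("<", "&lt;").replace(">", "&gt;")
    # Assumption: a good tag is any tag whose body starts with "good tag".
    # The non-greedy .*? stops at the first &gt; after the tag name.
    return re.sub(r"&lt;(good tag.*?)&gt;", r"<\1>", escaped)


s = "<good tag><bad tag> 3 > 5 <other tag><good tag with something inside>"
print(escape_all_then_restore(s))
# <good tag>&lt;bad tag&gt; 3 &gt; 5 &lt;other tag&gt;<good tag with something inside>
```

Stray brackets such as the one in "3 > 5" simply stay escaped, because only text of the form &lt;good tag...&gt; is reverted.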
Anton
4-Oct-2006
[1470]
Such unmatched tags cause a headache for any parser.
Rebolek
4-Oct-2006
[1471]
YES
Anton
4-Oct-2006
[1472x2]
What are the HTML entities, by the way?
&lt; and &gt;?
BrianH
4-Oct-2006
[1474]
Yes.
Rebolek
4-Oct-2006
[1475]
Anton: yes. I have to check a lot of XML files full of errors (actually it's Vista documentation, so it's understandable...)
Anton
4-Oct-2006
[1476x3]
Ok, give this a burl.
string: "<good tag><bad tag> 3 > 5 <other tag><good tag with something inside>"

string: " > >> < <<good tag><bad tag> 3 > 5 <other tag><good tag etc> >> > "

; (1) search for end tags >, they are erroneous so replace them

; (2) search for start tags <, if there is more than one, replace all except the last one

; (3) search for end tag >, check tag body and replace if necessary

entity: "&entity;"
ntag: complement charset "<>" ; non tag
parse/all result: copy string [
	any [
		; (1)
		any [
			any ntag start: ">" end: (

				change/part start entity 1 end: skip start length? entity  ;print [1 index? start]
			) 
			:end
		]
	
		; (2)
		(start: none stop?: none)
		any [
			any ntag start: "<" end:   ;(print [2 mold start])
			any ntag "<" (  ;print "found a second start tag"

				change/part start entity 1 end: skip start length? entity  ;(print [2.1 mold copy/part start end])
				start: none
			) :end
		]
		(if none? start [stop?: 'break]) stop?
		
		; ok, we found at least one start tag
		;(print ["OK we found at least one start tag" mold start])
		:start skip
		
		; (3)
		any ntag end: ">"   ;(print [3 mold copy/part start end])
		(if not find copy/part start end "good tag" [
			;print ["found a bad tag" mold copy/part start end]
			change/part start entity 1

			; fix up END (for when your entity is other than a 1-character long string)
			end: skip end (length? entity) - 1
			change/part end entity 1
			; fix up END again
			end: skip end (length? entity) - 1
		])
		:end skip
	]
	to end
]
result
All you need to do now is define two separate entity strings for < and > and then use the right one when replacing.
Rebolek
4-Oct-2006
[1479]
great, I'll test it, thanks
Anton
4-Oct-2006
[1480x2]
Holy ----! Where did two and a half hours go?
Oh no... maybe I only spent one and a half hours on it, but still...!
Rebolek
4-Oct-2006
[1482]
Erhm sorry ;)
Anton
4-Oct-2006
[1483]
Ahh don't worry about that.
Ladislav
4-Oct-2006
[1484x2]
this looks like an alternative:
result: ""
parse/all string [
	any [
		; starting good tag
		copy s ["<good tag" thru ">"] (append result s) |
		; ending good tag
		"</good tag>" (append result "</good tag>") |
		; entity replacement
		"<" (append result "&lt;") | ">" (append result "&gt;") |
		copy s skip (append result s)
	]
]
print result
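Ladislav's rule is a single left-to-right pass over ordered alternatives: copy a good tag whole, otherwise replace a lone < or >, otherwise copy one character. A rough Python transliteration of that idea (the names ENTITIES and translate_tags are hypothetical); it never has to pair up brackets, so an unmatched > is just another character to replace:

```python
ENTITIES = {"<": "&lt;", ">": "&gt;"}   # extend with more characters as needed


def translate_tags(text):
    """Single pass with ordered alternatives, in the spirit of Ladislav's rule."""
    out = []
    i = 0
    while i < len(text):
        # 1. opening good tag: copy "<good tag" through the next ">" unchanged
        if text.startswith("<good tag", i):
            close = text.find(">", i + len("<good tag"))
            if close != -1:
                out.append(text[i:close + 1])
                i = close + 1
                continue
        # 2. closing good tag
        if text.startswith("</good tag>", i):
            out.append("</good tag>")
            i += len("</good tag>")
            continue
        # 3. entity replacement, 4. otherwise copy one character as-is
        out.append(ENTITIES.get(text[i], text[i]))
        i += 1
    return "".join(out)


s = "<good tag><bad tag> 3 > 5 <other tag><good tag with something inside>"
print(translate_tags(s))
# <good tag>&lt;bad tag&gt; 3 &gt; 5 &lt;other tag&gt;<good tag with something inside>
```

Because the fallback is a table lookup, covering more entities is a matter of adding entries; mapping "&" to "&amp;", for example, is safe here since replaced text is never scanned again.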
Volker
4-Oct-2006
[1486]
In this case you may also look at load/markup ;)
Tomc
4-Oct-2006
[1487]
what Volker said.


s: "<good tag><bad tag> 3 > 5 <other tag><good tag with something inside>"
b: load/markup s
while [not tail? b][
	either tag? first b
		[ either find/match first b "good tag"
			[print first b]
			[print rejoin["X" to string! first b "X"]]
		]
		[print first b]
	b: next b
]
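The load/markup route splits the input into tags and text first, then decides per tag. A hypothetical Python stand-in for that split is a regular expression that treats "<...>" with no inner brackets as a tag and everything else as text; load/markup's own handling of stray brackets may differ. Following Tomc's loop, the brackets of a bad tag are replaced by X:

```python
import re

# Hypothetical stand-in for LOAD/MARKUP: "<" ... ">" with no brackets inside is a
# tag; any other run of text (including a stray "<" or ">") is plain text.
TOKEN = re.compile(r"<[^<>]*>|[^<]+|<")


def mark_bad_tags(text):
    """Keep tags starting with "good tag"; replace the brackets of other tags with X."""
    result = []
    for token in TOKEN.findall(text):
        is_tag = token.startswith("<") and token.endswith(">")
        if is_tag and not token[1:-1].startswith("good tag"):
            result.append("X" + token[1:-1] + "X")  # bad tag
        else:
            result.append(token)                  # good tag or plain text
    return "".join(result)


s = "<good tag><bad tag> 3 > 5 <other tag><good tag with something inside>"
print(mark_bad_tags(s))
# <good tag>Xbad tagX 3 > 5 Xother tagX<good tag with something inside>
```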
Oldes
5-Oct-2006
[1488x3]
I think there is some limit in load/markup, so I would not use it for large data.
And Rebolek, you can use this code of mine to remove unwanted tags (it was already posted here a few days before, but with a little bug; this version should be OK, as it's the one I'm using):

remove-tags: func [html /except allowed-tags /local new x tag name tagchars][
	if not string? html [return html]
	new: make string! length? html
	tagchars: charset [#"a" - #"z" #"A" - #"Z"]
	parse/all html [
		any [
			copy x to {<} copy tag thru {>}  (
				if not none? x [insert tail new x]
				if all [
					except
					parse/all tag ["<" opt #"/" copy name some tagchars to end]
					find allowed-tags name
				][	insert tail new tag ]
			)
		]
		copy x to end (if not none? x [insert tail new x])
	]
	new
]
I'm thinking about improving it so that it can remove unwanted tag attributes as well.
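A rough Python counterpart of Oldes' remove-tags, for comparison: every <...> span is treated as a tag, and it is kept only if its name (the letters after "<" and an optional "/") is in the allowed list. The Python names remove_tags and allowed_tags are hypothetical, and matching names case-insensitively is an assumption of this sketch:

```python
import re

TAG = re.compile(r"<[^>]*>")              # "<" up to and including the next ">"
TAG_NAME = re.compile(r"</?([A-Za-z]+)")  # "<", optional "/", then the name's letters


def remove_tags(html, allowed_tags=None):
    """Remove every tag except those whose name is in allowed_tags."""
    if not isinstance(html, str):
        return html                       # like: if not string? html [return html]
    # Assumption: tag names are compared case-insensitively.
    allowed = {name.lower() for name in (allowed_tags or [])}

    def keep_or_drop(match):
        tag = match.group(0)
        m = TAG_NAME.match(tag)
        if m and m.group(1).lower() in allowed:
            return tag                    # allowed tag: keep it
        return ""                         # any other tag: drop it

    return TAG.sub(keep_or_drop, html)


print(remove_tags("<p>Hello <b>world</b><br/></p>", ["b"]))  # Hello <b>world</b>
```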
Rebolek
5-Oct-2006
[1491x2]
Thanks, everybody. I used Ladislav's example, as it is easily extensible to support more HTML entities than just "<" and ">".
Oldes: I'm not removing any tags, I'm just 'translating' unwanted tags to HTML entities.
Oldes
5-Oct-2006
[1493x3]
It's up to you how you modify it; do what you need :-)
And if you are converting HTML entities, you may find this useful:
http://oldes.multimedia.cz/rebol/html-entities_latest.rip
Do you think it would be possible to make BNF to PARSE RULES converter?
Rebolek
5-Oct-2006
[1496]
Don't know. Is 'parse Turing-complete? :)
Oldes
5-Oct-2006
[1497x4]
With such a converter we should theoretically be able to easily parse 
any language
http://www.garshol.priv.no/download/text/bnf.html
...There are actually lots of programs that can be given (E)BNF grammars 
as input and automatically produce code for parsers for the given 
grammar. In fact, this is the most common way to produce a compiler: 
by using a so-called compiler-compiler that takes a grammar as input 
and produces parser code in some programming language....
It looks like an interesting project for long winter evenings :-)
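As a rough illustration of the BNF-to-PARSE idea, here is a hypothetical Python sketch: grammar rules are written as nested tuples (an invented encoding, not a standard) and interpreted by a small matcher in the spirit of PARSE, with literals, rule references, sequences, alternatives and ANY-style repetition. One thing a real converter would have to handle is that BNF alternatives are unordered, while this matcher commits to the first alternative that succeeds (PEG-style ordered choice), which is also how PARSE alternatives behave.

```python
# Expression forms:
#   "text"              literal string
#   ("ref", name)       another rule in the grammar
#   ("set", chars)      any one character from chars
#   ("seq", e1, e2...)  all of them, in order
#   ("alt", e1, e2...)  first alternative that matches wins (ordered choice)
#   ("any", e)          zero or more repetitions, like PARSE's ANY
GRAMMAR = {
    "digit": ("set", "0123456789"),
    "number": ("seq", ("ref", "digit"), ("any", ("ref", "digit"))),
    "term": ("alt", ("ref", "number"), ("seq", "(", ("ref", "sum"), ")")),
    "sum": ("seq", ("ref", "term"), ("any", ("seq", "+", ("ref", "term")))),
}


def match(grammar, expr, text, pos):
    """Return the position just after expr matched at pos, or None on failure."""
    if isinstance(expr, str):
        return pos + len(expr) if text.startswith(expr, pos) else None
    kind = expr[0]
    if kind == "ref":
        return match(grammar, grammar[expr[1]], text, pos)
    if kind == "set":
        return pos + 1 if pos < len(text) and text[pos] in expr[1] else None
    if kind == "seq":
        for sub in expr[1:]:
            pos = match(grammar, sub, text, pos)
            if pos is None:
                return None
        return pos
    if kind == "alt":
        for sub in expr[1:]:
            end = match(grammar, sub, text, pos)
            if end is not None:
                return end
        return None
    if kind == "any":
        while True:
            end = match(grammar, expr[1], text, pos)
            if end is None or end == pos:   # stop on failure or no progress
                return pos
            pos = end
    raise ValueError(f"unknown expression kind: {kind!r}")


def parses(grammar, start, text):
    """True if rule `start` matches the whole text."""
    return match(grammar, ("ref", start), text, 0) == len(text)


print(parses(GRAMMAR, "sum", "1+(2+3)"))  # True
print(parses(GRAMMAR, "sum", "12+"))      # False
```

The ordered-choice difference shows up with a rule like s ::= "a" | "ab": encoded as ("alt", "a", "ab") it rejects the input "ab", because "a" wins and the remaining "b" is left over, while ("alt", "ab", "a") accepts it. Left-recursive BNF rules would also need rewriting into repetition, as done for sum here, or the matcher recurses forever.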
Anton
5-Oct-2006
[1501x2]
Well, I just spent two days making a matching algorithm for searching file contents, and I was considering making a "compile-rules" function (possibly similar to Gabriele's or someone else's). Looks like I don't have to make that for now, but my mind is in this place at the moment. I long for the day when I don't have to use filesystems at all (which would obviate the need for file search programs); hopefully we can stick all our info in a database soon. Probably an associative database.
While on this topic: was it Gregg or Sunanda who made a mini-dialect for a file-contents matcher? That's the algorithm I just made, and I'm now interested in reviewing other implementations. While developing, I also came to an apparent crossroads: a choice between a simple, "digital", logical algorithm or a more "fuzzy" algorithm with a ranking system like Google's. This reminded me of a discussion a while back where this point was made.