World: r3wp

[Parse] Discussion of PARSE dialect

Anton
13-Jul-2009
[4019]
The above problem reduces to:

	spacer: charset " "
	parse/all " " [to spacer]
	
	** Script Error: Invalid argument: make bitset! #{
	0000000001000000000000000000000000000000000000000000000000000000
	}
	** Near: parse/all " " [to spacer]

The reason is Rebol2 parse does not allow "to subrule".
(Pointed out by sqlab, thanks.)
	
Here's a way to do it using COMPLEMENT (suggested by Graham):
	
	spacer: charset " ^-^/"  ; Space, tab, newline.
	non-spacer: complement spacer  ; All chars except the three above.
	whatever: [some non-spacer]
	spaces: [some spacer]
	rule: ["a" spaces copy varb whatever spaces "c"]
	parse/all "a b c" rule ;== true
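
A minimal sketch of the same COMPLEMENT idea standing in for the rejected "to spacer", assuming REBOL 2 string parsing. The sample string and the words first-word and rest are made up for illustration:

```rebol
spacer: charset " ^-^/"          ; space, tab, newline
non-spacer: complement spacer

; "to spacer" is rejected by REBOL 2, so consume non-spacers instead;
; the parse position then sits at the first spacer (or at the end).
parse/all "alpha beta" [copy first-word any non-spacer rest: to end]

print mold first-word
print mold rest
```

Here first-word should hold "alpha" and rest should start at the space before "beta".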
BrianH
13-Jul-2009
[4020]
Anton, this sounds like that question asked on stackoverflow.com, 
linked earlier here in this group.
Anton
13-Jul-2009
[4021x2]
You're right. Some guy has been cross-posting the same question.
Oh, and I see you gave a good reply in the stackoverflow.com site.
PatrickP61
17-Jul-2009
[4023]
Hi All,  I'm new to PARSE, so I've come here to learn a little more. 
 I'm working on and off on a little testing project of my own for 
R3.

My goal is to navigate through some website(s) and capture Rebol code 
and the expected responses, such as on this page: 
http://rebol.com/r3/docs/functions/try.html

I'd like to capture the text inside a block like this:
[ "cmd" {if error? try [1 + "x"] [print "Did not work."]}
rsp
   {Did not work.} 
cmd
  {if error? try [load "$10,20,30"] [print "No good"]}
rsp
  {No good}]


Can anyone point me to some parse example code which can "tear apart" 
an HTTP page based on text and the type of text?

I realize I may be biting off a bit more than I can chew, but I'd 
still like to give it a try.
Thanks in advance.
Paul
17-Jul-2009
[4024]
You can just set your block that you want to parse to a word.  Such 
as:

blk: [ "cmd" {if error? try [1 + "x"] [print "Did not work."]}
rsp
   {Did not work.} 
cmd
  {if error? try [load "$10,20,30"] [print "No good"]}
rsp
  {No good}]

; and then do this:

>> parse blk [some [set s string! (print s)]]
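
Note that the rule above only accepts strings, so it stops at the first label written as a bare word. A possible variation, assuming the labels are consistently written as the words cmd and rsp (the words c and r are made up for this sketch), matches each label as a lit-word and picks up the string that follows it:

```rebol
blk: [
    cmd {if error? try [1 + "x"] [print "Did not work."]}
    rsp {Did not work.}
    cmd {if error? try [load "$10,20,30"] [print "No good"]}
    rsp {No good}
]

parse blk [
    some [
        'cmd set c string! (print ["cmd:" mold c])   ; label word, then its command
      | 'rsp set r string! (print ["rsp:" mold r])   ; label word, then its response
    ]
]
```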
PatrickP61
17-Jul-2009
[4025x4]
Hi Paul, I may have misstated what I'm after. You see, the site 
http://rebol.com/r3/docs/functions/try.html has displayable REBOL 
code and responses within the HTML. If you captured the HTML code, 
you would find something like this:
<html>
<head>
...(additional html code and text)...

<title>REBOL 3   Functions: try</title>TRY returns an error value 
if an error happened,
otherwise it returns the normal result of the block.</p>

<pre>if error? try [1 + "x"] [print "Did not work."]    <-- here the tag <pre> precedes the REBOL command, which runs until the next tag

<span class="eval">Did not work.</span></pre>    <-- the tag <span class="eval"> precedes the response

<pre>if error? try [load "$10,20,30"] [print "No good"]    <-- this is the next REBOL command

<span class="eval">No good</span></pre>    <-- this is the next response
<h2 id="section-3">Related</h2>


I want to be able to interrogate the html code, parse it and capture 
the rebol commands and responses (if any), then put that into your 
above block example.
I have this code which does this:

cmd-txt: "unasg"  cmd-term: "<"

pre-txt: "unasg"  pre-bgn:  "<pre>"               pre-end: "</pre>"

rsp-txt: "unasg"  rsp-bgn:  {<span class="eval">} rsp-end: {</span>}
site-url:	http://rebol.com/r3/docs/functions/try.html

page-txt: to-string read site-url
probe parse page-txt [thru pre-bgn copy pre-txt to pre-end]
probe parse pre-txt  [copy cmd-txt to cmd-term]
probe parse pre-txt  [thru rsp-bgn copy rsp-txt to rsp-end]

print [{"cmd"} "{" cmd-txt "}"]
print [{"rsp"} "{" rsp-txt "}"]

will yield this:
cmd
 { if error? try [1 + "x"] [print "Did not work."]          <-- this is close to what I want to do
}
rsp
 { Did not work. }


This is close to what I want, but it is not foolproof.  For example, 
I would like to capture all displayable text that is separated from 
any HTML tags.  In my code example, if a displayable less-than 
symbol < appeared in the text, the parse would stop prematurely.


I am guessing someone has already created some code to "pull apart" 
an HTML web page, separating displayable text from invisible markup 
code.
p.s.  I'm doing this in R3!
I think I may have found an example on REBOL.ORG called WebSplit.r 
 that may be helpful.  I welcome any other suggestions.
Graham
18-Jul-2009
[4029]
load/markup
Brock
18-Jul-2009
[4030x9]
To add to what Graham is saying, try...
>> load/markup http://rebol.com/r3/docs/functions/try.html


You will get back a block of strings and tags; you can use the 
tag? function to test whether each element is a tag, and so separate 
the HTML from the regular strings.
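
A rough sketch of that tag?/string split, assuming REBOL 2's load/markup (whether the R3 alpha offers the same refinement is not confirmed here). The html string is a made-up stand-in for the downloaded page:

```rebol
; Sample markup standing in for the real page (an assumption for illustration).
html: {<pre>if error? try [1 + "x"] [print "Did not work."]<span class="eval">Did not work.</span></pre>}

items: load/markup html    ; block of tag! and string! values

foreach item items [
    either tag? item [
        print ["tag: " mold item]
    ][
        print ["text:" mold item]
    ]
]
```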
This should help with the parse itself...
parse page-txt [
	thru pre-bgn 
	copy cmd-txt 
	to rsp-bgn
	thru rsp-bgn
	copy rsp-term 
	to rsp-end 
	(print ["Cmd: " cmd-txt "RSP: " rsp-term])
	to end
]
which would return...
Cmd:  if error? try [1 + "x"] [print "Did not work."]
RSP:  Did not work.
== true
sorry, missed the quotes around the returned set...
parse page-txt [
	thru pre-bgn 
	copy cmd-txt 
	to rsp-bgn
	thru rsp-bgn
	copy rsp-term 
	to rsp-end 
	(print ["Cmd: " mold cmd-txt "RSP: " mold rsp-term])
	to end
]
which would return...
Cmd:  {if error? try [1 + "x"] [print "Did not work."]
} RSP:  "Did not work."
If you want the RSP line on a separate line from the '}', put 
the word   newline   before the string "RSP: ", or use  "^/RSP: 
", where   "^/"   is the newline character.
with the result...
Cmd:  {if error? try [1 + "x"] [print "Did not work."]
}
RSP:  "Did not work."
If you want to capture multiple command and response blocks, wrap 
the parse rules in...
any [parse statements] 

.... excluding the to end statement, which should come only after 
the   'any'   block has matched all the instances.
parse page-txt [
	any[
		thru pre-bgn 
		copy cmd-txt 
		to rsp-bgn
		thru rsp-bgn
		copy rsp-term 
		to rsp-end 
		(print ["Cmd: " mold cmd-txt "^/RSP: " mold rsp-term])
	]
	to end
]
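
A small variation on the loop above, collecting the pairs into a block instead of printing them. This is only a sketch: the page-txt sample and the word result are assumptions for illustration, and the delimiters are repeated so the snippet runs on its own:

```rebol
pre-bgn: "<pre>"
rsp-bgn: {<span class="eval">}
rsp-end: {</span>}

; Made-up sample text standing in for the downloaded page.
page-txt: {<pre>print 1 + 2<span class="eval">3</span></pre><pre>print 3 * 4<span class="eval">12</span></pre>}

result: copy []
parse/all page-txt [
    any [
        thru pre-bgn copy cmd-txt to rsp-bgn
        thru rsp-bgn copy rsp-txt to rsp-end
        (repend result [cmd-txt rsp-txt])    ; one command/response pair per pass
    ]
    to end
]
probe result
```

With this sample, result should end up holding the four strings "print 1 + 2", "3", "print 3 * 4" and "12".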
PatrickP61
18-Jul-2009
[4039]
Excellent suggestions Brock and Graham -- That gives me a lot to 
play with!!  Thank you.
Normand
24-Jul-2009
[4040]
Does anyone know of scripts that parse documents written in 
LaTeX?  I need examples of applying PARSE to the LaTeX language.
Reichart
24-Jul-2009
[4041x2]
What about parsing another similar language?

http://www.rebol.org/view-script.r?script=qml-base.r
This is well written too: http://www.codeconscious.com/rebol/parse-tutorial.html
Normand
26-Jul-2009
[4043]
Thanks for those references.
Sunanda
14-Sep-2009
[4044]
Parse question on StackOverflow -- not yet answered:

http://stackoverflow.com/questions/1415340/rebol-parsing-rule-how-to-correct-the-rule-to-separate-paragraphs
PeterWood
14-Sep-2009
[4045]
Three answers now.
Carl
28-Sep-2009
[4046x2]
Steeve - move parse discussion here.
So... does such a function take an argument, such as the current 
index for the series being parsed?
BrianH
28-Sep-2009
[4048]
That's how REPLACE works...
Carl
28-Sep-2009
[4049]
REPLACE?
Steeve
28-Sep-2009
[4050x2]
i would say, no functions with parameters allowed
but if you can perform a do/next then parameters are allowed
BrianH
28-Sep-2009
[4052x2]
Yeah. If you pass a function as the replacement value, that function 
will be called with the series at the replacement position as a parameter, 
and its return value is used. ARRAY does something similar too. My 
changes.
Say that the series position parameter is passed with APPLY rules. 
If the function takes a parameter it sees it; if not it doesn't.
Steeve
28-Sep-2009
[4054]
Or we can provide 2 indexes by default, maintained by the parse engine:
&  = the head of the last matched rule
&& = the tail of the last matched rule
BrianH
28-Sep-2009
[4055x3]
Not a bad idea, but... this is how it starts. This is what led to 
the rule! type suggestion :(
See, your method would allow REPLACE/part to be called directly.
Sorry, REMOVE/part.
Steeve
28-Sep-2009
[4058x2]
or even INSERT, REMOVE, CHANGE, without the need to develop a specific 
inlined method for those functions
we just need 2 pointers auto-handled by parse
BrianH
28-Sep-2009
[4060x3]
Except your replacement code for those functions was wrong. And would 
be wrong in this case. Those inline operations were added to reduce 
common errors, not to provide missing functionality.
I am also concerned about the security implications of having PARSE 
call functions outside of parens. In parens you know what you're 
getting. This is why QUOTE and IF require parens for the REBOL code 
they execute.
All of the added operations could have been done before with code 
in parens and/or explicit position setting. It's easier this way.
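
For illustration, a sketch of the older style mentioned here: removing a matched span with code in parens plus explicit position setting. The sample text and the words start and stop are made up; this is not the proposed inline REMOVE:

```rebol
text: "keep <!--drop--> this"

parse/all text [
    any [
        start: "<!--" thru "-->" stop:     ; mark both ends of the match
        (remove/part start stop) :start    ; cut the span, then reset the position
      | skip
    ]
]

print text
```

Forgetting the :start reset after modifying the series is one of the easy mistakes in this style.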
RobertS
28-Sep-2009
[4063]
I put a note up because of my silly misunderstanding of the intent 
of adding AND to PARSE.  But I get odd results with the likes of 
   parse "abeabd" [and [thru "e"] [thru "d"]]  which behaves like 
ANY
BrianH
28-Sep-2009
[4064x2]
Not a silly misunderstanding, a bug, bug#1238 in particular.
One of 4 parse bugs in a83.
RobertS
28-Sep-2009
[4066]
Of course in STSC APL  "laod" was as good as "load" and in Smalltalk 
I still long for "slef" and "sefl" but I draw the line at "elfs" 
which is clearly unfit in the age of "octopuses"
BrianH
28-Sep-2009
[4067]
And that doesn't even count the stuff not implemented yet.
RobertS
28-Sep-2009
[4068]
I thought of ONE (but without moving the position), on the model of SOME 
and ANY, when I was misunderstanding AND as "all", as in [ONE [rule1 rule2 rule3]]