World: r3wp
[Parse] Discussion of PARSE dialect
Anton 13-Jul-2009 [4019] | The above problem reduces to:
    spacer: charset " "
    parse/all " " [to spacer]
    ** Script Error: Invalid argument: make bitset! #{
    0000000001000000000000000000000000000000000000000000000000000000
    }
    ** Near: parse/all " " [to spacer]
The reason is Rebol2 parse does not allow "to subrule". (Pointed out by sqlab, thanks.) Here's a way to do it using COMPLEMENT (suggested by Graham):
    spacer: charset " ^-^/" ; Space, tab, newline.
    non-spacer: complement spacer ; All chars except the three above.
    whatever: [some non-spacer]
    spaces: [some spacer]
    rule: ["a" spaces copy varb whatever spaces "c"]
    parse/all "a b c" rule ;== true |
BrianH 13-Jul-2009 [4020] | Anton, this sounds like that question asked on stackoverflow.com, linked earlier here in this group. |
Anton 13-Jul-2009 [4021x2] | You're right. Some guy has been cross-posting the same question. |
Oh, and I see you gave a good reply on the stackoverflow.com site. | |
PatrickP61 17-Jul-2009 [4023] | Hi All, I'm new to PARSE, so I've come here to learn a little more. I'm working on and off on a little testing project of my own for R3. My goal is to navigate through some website(s) and capture Rebol code and the expected responses, such as on this page: http://rebol.com/r3/docs/functions/try.html I'd like to capture the text inside a block like this:
    [
        "cmd" {if error? try [1 + "x"] [print "Did not work."]}
        rsp {Did not work.}
        cmd {if error? try [load "$10,20,30"] [print "No good"]}
        rsp {No good}
    ]
Can anyone point me to some parse example code which can "tear apart" an HTTP page based on text and the type of text? I realize I may be biting off a bit more than I can chew, but I'd still like to give it a try. Thanks in advance. |
Paul 17-Jul-2009 [4024] | You can just set your block that you want to parse to a word. Such as:
    blk: [
        "cmd" {if error? try [1 + "x"] [print "Did not work."]}
        rsp {Did not work.}
        cmd {if error? try [load "$10,20,30"] [print "No good"]}
        rsp {No good}
    ]
    ; and then do this:
    >> parse blk [some [set s string! (print s)]] |
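A small sketch building on the rule above, under the assumption that the block mixes string! values with bare words such as rsp, as the sample does. A rule made only of `set s string!` stops at the first word, so this variant adds a `skip` alternative to step over anything that is not a string. The names `strings` and `s` are illustrative only.

```rebol
; Sample data modelled on the block in the message above (shortened).
blk: [
    "cmd" {if error? try [1 + "x"] [print "Did not work."]}
    rsp {Did not work.}
]

strings: copy []

parse blk [
    some [
        set s string! (append strings s)   ; collect every string value
        | skip                             ; step over words and other values
    ]
]

probe length? strings   ; three strings are collected from this sample
```

The `| skip` alternative is what lets the loop reach the end of the block; without it the parse ends early at the word rsp.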
PatrickP61 17-Jul-2009 [4025x4] | Hi Paul, I may have mis-stated what I'm after. You see, the site http://rebol.com/r3/docs/functions/try.html has displayable rebol code and responses within the html. If you captured the html code you would find something like this:
    <html>
    <head>
    ...(additional html code and text)...
    <title>REBOL 3 Functions: try</title>
    TRY returns an error value if an error happened, otherwise it returns the normal result of the block.</p>
    <pre>if error? try [1 + "x"] [print "Did not work."]    <-- in this e.g. the tag <pre> will precede the rebol command until the next tag
    <span class="eval">Did not work.</span></pre>    <-- the tag <span class="eval"> will precede the response
    <pre>if error? try [load "$10,20,30"] [print "No good"]    <-- this is the next rebol command
    <span class="eval">No good</span></pre>    <-- this is the next response
    <h2 id="section-3">Related</h2>
I want to be able to interrogate the html code, parse it and capture the rebol commands and responses (if any), then put that into your above block example. |
I have this code which does this:
    cmd-txt: "unasg"
    cmd-term: "<"
    pre-txt: "unasg"
    pre-bgn: "<pre>"
    pre-end: "</pre>"
    rsp-txt: "unasg"
    rsp-bgn: {<span class="eval">}
    rsp-end: {</span>}
    site-url: http://rebol.com/r3/docs/functions/try.html
    page-txt: to-string read site-url
    probe parse page-txt [thru pre-bgn copy pre-txt to pre-end]
    probe parse pre-txt [copy cmd-txt to cmd-term]
    probe parse pre-txt [thru rsp-bgn copy rsp-txt to rsp-end]
    print [{"cmd"} "{" cmd-txt "}"]
    print [{"rsp"} "{" rsp-txt "}"]
will yield this:
    cmd { if error? try [1 + "x"] [print "Did not work."] <-- this is close to what I want to do }
    rsp { Did not work. }
This is close to what I want, but it is not foolproof. For example, I would like to capture all displayable text that is separated from any html tags. In my code example, if a displayable less-than symbol < appeared in the command text, then the parse would stop prematurely. I am guessing someone has already created some code to "pull apart" an html web page, separating displayable text from invisible markup code. | |
p.s. I'm doing this in R3! | |
I think I may have found an example on REBOL.ORG called WebSplit.r that may be helpful. I welcome any other suggestions. | |
Graham 18-Jul-2009 [4029] | load/markup |
Brock 18-Jul-2009 [4030x9] | To add to what Graham is saying, try... >> load/markup http://rebol.com/r3/docs/functions/try.html You will get back a block of strings and tags; you can use tag? to test whether each element is a tag, and so separate the HTML from the regular strings. |
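A minimal sketch of the load/markup approach, assuming REBOL 2 (where LOAD has the /markup refinement; whether the R3 alpha offers the same refinement is not confirmed in this thread). It uses an inline HTML fragment instead of fetching the page, and the names `html` and `items` are illustrative only.

```rebol
; A fragment shaped like the page source quoted earlier in the thread.
html: {<pre>if error? try [1 + "x"] [print "Did not work."]
<span class="eval">Did not work.</span></pre>}

items: load/markup html    ; block of tag! and string! values

foreach item items [
    either tag? item [
        print ["TAG: " mold item]     ; markup such as <pre> or </span>
    ][
        print ["TEXT:" mold item]     ; displayable text between tags
    ]
]
```

Because the tags arrive as tag! values, a literal < inside displayable text is not what drives the split here, which addresses the premature-stop concern raised about searching for "<" by hand.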
This should help with the parse itself...
    parse page-txt [
        thru pre-bgn copy cmd-txt to rsp-bgn
        thru rsp-bgn copy rsp-term to rsp-end
        (print ["Cmd: " cmd-txt "RSP: " rsp-term])
        to end
    ] | |
which would return...
    Cmd: if error? try [1 + "x"] [print "Did not work."]
    RSP: Did not work.
    == true | |
sorry, missed the quotes around the returned set...
    parse page-txt [
        thru pre-bgn copy cmd-txt to rsp-bgn
        thru rsp-bgn copy rsp-term to rsp-end
        (print ["Cmd: " mold cmd-txt "RSP: " mold rsp-term])
        to end
    ] | |
which would return...
    Cmd: {if error? try [1 + "x"] [print "Did not work."]
    } RSP: "Did not work." | |
if you want the RSP line on a separate line from the '}', then put the word newline before the string "RSP: ", or use "^/RSP: ", where "^/" is the escape for the newline character. | |
with the result...
    Cmd: {if error? try [1 + "x"] [print "Did not work."]
    }
    RSP: "Did not work." | |
if you want to capture multiple command and response blocks, wrap the parse rules in any [ ... ], leaving out the to end statement, which should come only after all the 'any' iterations have occurred. | |
    parse page-txt [
        any [
            thru pre-bgn copy cmd-txt to rsp-bgn
            thru rsp-bgn copy rsp-term to rsp-end
            (print ["Cmd: " mold cmd-txt "^/RSP: " mold rsp-term])
        ]
        to end
    ] | |
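The any loop above prints each pair. A sketch of the same loop collecting the pairs into a block of the shape Patrick asked for might look like the following. It assumes REBOL 2 and an inline sample in place of the fetched page. The names `results` and the sample text are illustrative, and the delimiters are the ones defined earlier in the thread.

```rebol
; Inline stand-in for the page source (assumed shape, not the real page).
page-txt: {<p>intro</p>
<pre>if error? try [1 + "x"] [print "Did not work."]
<span class="eval">Did not work.</span></pre>
<pre>if error? try [load "$10,20,30"] [print "No good"]
<span class="eval">No good</span></pre>
<h2>Related</h2>}

pre-bgn: "<pre>"
rsp-bgn: {<span class="eval">}
rsp-end: "</span>"

results: copy []

parse/all page-txt [
    any [
        thru pre-bgn copy cmd-txt to rsp-bgn    ; command text after <pre>
        thru rsp-bgn copy rsp-term to rsp-end   ; response text inside the span
        (repend results ['cmd cmd-txt 'rsp rsp-term])
    ]
    to end
]

probe results    ; cmd/rsp pairs, one pair per <pre> block
```

Note that cmd-txt keeps the newline that precedes the span, as in the printed output above. A pre block with no response span would also run on into the next block's span, so a page with such blocks would need a stricter rule.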
PatrickP61 18-Jul-2009 [4039] | Excellent suggestions Brock and Graham -- That gives me a lot to play with!! Thank you. |
Normand 24-Jul-2009 [4040] | Does someone know of some scripts that parse documents written in LaTeX? I would need examples applying parse to the LaTeX language. |
Reichart 24-Jul-2009 [4041x2] | What about parsing another similar language? http://www.rebol.org/view-script.r?script=qml-base.r |
This is well written too: http://www.codeconscious.com/rebol/parse-tutorial.html | |
Normand 26-Jul-2009 [4043] | Thanks for those references. |
Sunanda 14-Sep-2009 [4044] | Parse question on StackOverflow -- not yet answered: http://stackoverflow.com/questions/1415340/rebol-parsing-rule-how-to-correct-the-rule-to-separate-paragraphs |
PeterWood 14-Sep-2009 [4045] | Three answers now. |
Carl 28-Sep-2009 [4046x2] | Steeve - move parse discussion here. |
So... does such a function take an argument, such as the current index for the series being parsed? | |
BrianH 28-Sep-2009 [4048] | That's how REPLACE works... |
Carl 28-Sep-2009 [4049] | REPLACE? |
Steeve 28-Sep-2009 [4050x2] | I would say no functions with parameters allowed |
but if you can perform a do/next then parameters are allowed | |
BrianH 28-Sep-2009 [4052x2] | Yeah. If you pass a function as the replacement value, that function will be called with the series at the replacement position as a parameter, and its return value is used. ARRAY does something similar too. My changes. |
Say that the series position parameter is passed with APPLY rules. If the function takes a parameter it sees it; if not it doesn't. | |
Steeve 28-Sep-2009 [4054] | Or we could provide two indexes by default, maintained by the parse engine: & = the head of the last matched rule, && = the tail of the last matched rule. |
BrianH 28-Sep-2009 [4055x3] | Not a bad idea, but... this is how it starts. This is what led to the rule! type suggestion :( |
See, your method would allow REPLACE/part to be called directly. | |
Sorry, REMOVE/part. | |
Steeve 28-Sep-2009 [4058x2] | or even INSERT, REMOVE, CHANGE, without the need to develop a specific inlined method for those functions |
we just need 2 pointers auto-handled by parse | |
BrianH 28-Sep-2009 [4060x3] | Except your replacement code for those functions was wrong. And would be wrong in this case. Those inline operations were added to reduce common errors, not to provide missing functionality. |
I am also concerned about the security implications of having PARSE call functions outside of parens. In parens you know what you're getting. This is why QUOTE and IF require parens for the REBOL code they execute. | |
All of the added operations could have been done before with code in parens and/or explicit position setting. It's easier this way. | |
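The point about code in parens and explicit position setting can be illustrated with a small sketch in REBOL 2 style. It is only an example of the general idiom, not the code under discussion; the sample string and the names `start` and `finish` are made up for illustration.

```rebol
text: copy "keep <b>drop</b> keep"

parse/all text [
    any [
        start: "<b>" thru "</b>" finish:   ; mark both ends of the match
        (remove/part start finish)         ; ordinary REBOL code, inside parens
        :start                             ; resume parsing at the marked position
        | skip                             ; otherwise advance one character
    ]
]

probe text    ; the <b>...</b> run has been removed
```

The set-words play the role of the two positions Steeve describes, except that they are named and placed by hand. Forgetting the `:start` reset after modifying the series is the kind of mistake the inline operations were meant to prevent.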
RobertS 28-Sep-2009 [4063] | I put a note up because of my silly misunderstanding of the intent of adding AND to PARSE. But I get odd results with the likes of parse "abeabd" [and [thru "e"] [thru "d'"]] which behaves like ANY |
BrianH 28-Sep-2009 [4064x2] | Not a silly misunderstanding, a bug, bug#1238 in particular. |
One of 4 parse bugs in a83. | |
RobertS 28-Sep-2009 [4066] | Of course in STSC APL "laod" was as good as "load" and in Smalltalk I still long for "slef" and "sefl" but I draw the line at "elfs" which is clearly unfit in the age of "octopuses" |
BrianH 28-Sep-2009 [4067] | And that doesn't even count the stuff not implemented yet. |
RobertS 28-Sep-2009 [4068] | I thought ONE (but no move) on the model of SOME and ANY when I was misunderstanding AND as "all" as [ ONE [rule1 rule2 rule3 ] ] |