World: r3wp
[Parse] Discussion of PARSE dialect
Brock 10-Jul-2009 [4011] | Can anyone explain the error indicated in the last comment? |
Graham 10-Jul-2009 [4012] | You should define a complement to spaces and then change the parse rule to copy the complementary characters. |
Anton 11-Jul-2009 [4013] | The changed rule has:
    copy varb to spaces
Note: that's *to* spaces, not *thru* spaces. That is, the spaces are not consumed like they were in the previous rule. If you have a rule
    spaces
then those spaces will be consumed. If you have a rule
    to spaces
then the parse index will be moved to the head of those spaces, so the spaces themselves will not be consumed. So if you want the spaces also to be consumed (the parse index to be advanced through them), then you need a rule:
    to spaces spaces
That's right, you have to repeat yourself a little bit. So the fixed version of the broken rule from the article should be:
    rule: ["a" spaces copy varb to spaces spaces "c"]
(Feel free to post this answer to the article.) |
sqlab 13-Jul-2009 [4014] | No, I see it differently. The problem is that spaces is [some spacer], and R2 does not allow "to subrule". |
Anton 13-Jul-2009 [4015x5] | You are right. (I was so confident I didn't test my code. Good thing nobody posted it.) |
Ok, this is tested: | |
    spacer: charset " ^-^/"        ; Space, tab, newline.
    non-spacer: complement spacer  ; All chars except the three above.
    whatever: [some non-spacer]
    spaces: [some spacer]
    rule: ["a" spaces copy varb whatever spaces "c"]
    parse/all "a b c" rule
    ;== true | |
Maybe someone who subscribed can post this code, with additional comments: | |
The above problem reduces to:
    spacer: charset " "
    parse/all " " [to spacer]
    ** Script Error: Invalid argument: make bitset! #{
    0000000001000000000000000000000000000000000000000000000000000000
    }
    ** Near: parse/all " " [to spacer]
The reason is Rebol2 parse does not allow "to subrule". (Pointed out by sqlab, thanks.) Here's a way to do it using COMPLEMENT (suggested by Graham):
    spacer: charset " ^-^/"        ; Space, tab, newline.
    non-spacer: complement spacer  ; All chars except the three above.
    whatever: [some non-spacer]
    spaces: [some spacer]
    rule: ["a" spaces copy varb whatever spaces "c"]
    parse/all "a b c" rule
    ;== true | |
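The TO versus THRU distinction discussed above can still be seen in REBOL 2 when the target is a literal string rather than a subrule or bitset. The following is only a minimal sketch: the input string and the words k and rest are made up for illustration and do not come from the article in question.

```rebol
REBOL [Title: "TO vs THRU sketch (REBOL 2, hypothetical example)"]

; TO moves the parse index to the head of the match, so "=" is not consumed.
parse/all "key=value" [copy k to "=" rest: to end]
probe k     ; the text before "="
probe rest  ; the input from "=" onward

; THRU moves the parse index past the match, so "=" is consumed and copied.
parse/all "key=value" [copy k thru "=" rest: to end]
probe k     ; the text up to and including "="
probe rest  ; the input after "="
```

With a bitset or a block rule as the target, REBOL 2 raises the error shown above, which is why the COMPLEMENT version is needed there.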
BrianH 13-Jul-2009 [4020] | Anton, this sounds like that question asked on stackoverflow.com, linked earlier here in this group. |
Anton 13-Jul-2009 [4021x2] | You're right. Some guy has been cross-posting the same question. |
Oh, and I see you gave a good reply in the stackoverflow.com site. | |
PatrickP61 17-Jul-2009 [4023] | Hi All, I'm new to PARSE, so I've come here to learn a little more. I'm working on and off on a little testing project of my own for R3. My goal is to navigate through some website(s), capture Rebol code, and the expected responses such as this page: http://rebol.com/r3/docs/functions/try.html I'd like to capture the text inside a block like this:
    [ "cmd" {if error? try [1 + "x"] [print "Did not work."]}
      rsp {Did not work.}
      cmd {if error? try [load "$10,20,30"] [print "No good"]}
      rsp {No good}]
Can anyone point me to some parse example code which can "tear apart" an HTTP page based on text and the type of text? I realize I may be biting off a bit more than I can chew, but I'd still like to give it a try. Thanks in advance. |
Paul 17-Jul-2009 [4024] | You can just set your block that you want to parse to a word. Such as:
    blk: [ "cmd" {if error? try [1 + "x"] [print "Did not work."]}
      rsp {Did not work.}
      cmd {if error? try [load "$10,20,30"] [print "No good"]}
      rsp {No good}]
    ; and then do this:
    >> parse blk [some [set s string! (print s)]] |
PatrickP61 17-Jul-2009 [4025x4] | Hi Paul, I may have mis-stated what I'm after. You see, the site http://rebol.com/r3/docs/functions/try.html has displayable rebol code and responses within the html. If you captured the html code you would find something like this:
    <html>
    <head>
    ...(additional html code and text)...
    <title>REBOL 3 Functions: try</title>TRY returns an error value if an error happened, otherwise it returns the normal result of the block.</p>
    <pre>if error? try [1 + "x"] [print "Did not work."]      <-- in this e.g. the tag <pre> will precede the rebol command until the next tag
    <span class="eval">Did not work.</span></pre>           <-- the tag <span class="eval"> will precede the response
    <pre>if error? try [load "$10,20,30"] [print "No good"]   <-- this is the next rebol command
    <span class="eval">No good</span></pre>                 <-- this is the next response
    <h2 id="section-3">Related</h2>
I want to be able to interrogate the html code, parse it and capture the rebol commands and responses (if any), then put that into your above block example. |
I have this code which does this:
    cmd-txt: "unasg"
    cmd-term: "<"
    pre-txt: "unasg"
    pre-bgn: "<pre>"
    pre-end: "</pre>"
    rsp-txt: "unasg"
    rsp-bgn: {<span class="eval">}
    rsp-end: {</span>}
    site-url: http://rebol.com/r3/docs/functions/try.html
    page-txt: to-string read site-url
    probe parse page-txt [thru pre-bgn copy pre-txt to pre-end]
    probe parse pre-txt [copy cmd-txt to cmd-term]
    probe parse pre-txt [thru rsp-bgn copy rsp-txt to rsp-end]
    print [{"cmd"} "{" cmd-txt "}"]
    print [{"rsp"} "{" rsp-txt "}"]
will yield this:
    cmd { if error? try [1 + "x"] [print "Did not work."]    <-- this is close to what I want to do
    }
    rsp { Did not work. }
This is close to what I want, but it is not foolproof. For example, I would like to capture all displayable text that is separated from any html tags. In my code example, if a displayable less-than symbol < appeared in the command text, then the parse would stop prematurely. I am guessing someone has already created some code to "pull apart" an html web page, separating displayable text from invisible markup code. | |
p.s. I'm doing this in R3! | |
I think I may have found an example on REBOL.ORG called WebSplit.r that may be helpful. I welcome any other suggestions. | |
Graham 18-Jul-2009 [4029] | load/markup |
Brock 18-Jul-2009 [4030x9] | To expand on what Graham is saying, try...
    >> load/markup http://rebol.com/r3/docs/functions/try.html
You will get back a block of strings and tags. You can use the tag? function to test whether each element is a tag, and so separate the HTML from the regular strings. |
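As a rough illustration of the load/markup suggestion, the sketch below runs it on an inline HTML fragment instead of the live URL. It assumes REBOL 2 behaviour (whether the same refinement is available in R3 is not confirmed in this thread), and the words html, items and texts are hypothetical names chosen for the example.

```rebol
REBOL [Title: "load/markup sketch (REBOL 2, hypothetical example)"]

; A small inline fragment modelled on the page structure quoted earlier.
html: {<pre>if error? try [1 + "x"] [print "Did not work."]<span class="eval">Did not work.</span></pre>}

; load/markup splits the input into tag! values and plain strings.
items: load/markup html

; Keep only the displayable text by skipping every tag.
texts: copy []
foreach item items [
    if not tag? item [append texts item]
]

probe texts  ; the command text followed by the response text
```

Because the split is on markup rather than on a chosen terminator character, a displayable less-than sign that the page encodes as an entity stays inside its text string instead of ending the capture early.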
This should help with the parse itself...
    parse page-txt [
        thru pre-bgn copy cmd-txt to rsp-bgn
        thru rsp-bgn copy rsp-term to rsp-end
        (print ["Cmd: " cmd-txt "RSP: " rsp-term])
        to end
    ] | |
which would return... Cmd: if error? try [1 + "x"] [print "Did not work."] RSP: Did not work. == true | |
sorry, missed the quotes around the returned set...
    parse page-txt [
        thru pre-bgn copy cmd-txt to rsp-bgn
        thru rsp-bgn copy rsp-term to rsp-end
        (print ["Cmd: " mold cmd-txt "RSP: " mold rsp-term])
        to end
    ] | |
which would return... Cmd: {if error? try [1 + "x"] [print "Did not work."] } RSP: "Did not work." | |
if you want the RSP line on a separate line from the '}', then put the word newline before the string "RSP: ", or use "^/RSP: ", where "^/" is the newline character. | |
with the result...
    Cmd: {if error? try [1 + "x"] [print "Did not work."]
    }
    RSP: "Did not work." | |
if you want to capture multiple command and response blocks, wrap the parse statements in... any [parse statements] ...excluding the to end statement, which should appear only once, after the 'any' block. | |
    parse page-txt [
        any [
            thru pre-bgn copy cmd-txt to rsp-bgn
            thru rsp-bgn copy rsp-term to rsp-end
            (print ["Cmd: " mold cmd-txt "^/RSP: " mold rsp-term])
        ]
        to end
    ] | |
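To end up with the cmd/rsp block PatrickP61 originally asked for, the same rule can append to a block instead of printing. This is a sketch only, assuming REBOL 2; the inline page-txt stands in for the downloaded page, and result and rsp-txt are names introduced here for illustration.

```rebol
REBOL [Title: "Collect cmd/rsp pairs sketch (REBOL 2, hypothetical example)"]

; Inline stand-in for the page text, modelled on the HTML quoted earlier.
page-txt: {<pre>if error? try [1 + "x"] [print "Did not work."]
<span class="eval">Did not work.</span></pre>
<pre>if error? try [load "$10,20,30"] [print "No good"]
<span class="eval">No good</span></pre>}

pre-bgn: "<pre>"
rsp-bgn: {<span class="eval">}
rsp-end: "</span>"

result: copy []
parse/all page-txt [
    any [
        thru pre-bgn copy cmd-txt to rsp-bgn
        thru rsp-bgn copy rsp-txt to rsp-end
        ; REPEND reduces the block, so the lit-words become the words cmd and rsp.
        (repend result ['cmd cmd-txt 'rsp rsp-txt])
    ]
    to end
]

probe result
```

One limitation of this approach: a <pre> section that has no response span would be paired with the response of a later section, because "to rsp-bgn" searches forward without regard to the closing </pre>.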
PatrickP61 18-Jul-2009 [4039] | Excellent suggestions Brock and Graham -- That gives me a lot to play with!! Thank you. |
Normand 24-Jul-2009 [4040] | Does someone know of some scripts that parse documents written in LaTeX? I would need examples applying parse to the LaTeX language. |
Reichart 24-Jul-2009 [4041x2] | What about parsing another similar language? http://www.rebol.org/view-script.r?script=qml-base.r |
This is well written too: http://www.codeconscious.com/rebol/parse-tutorial.html | |
Normand 26-Jul-2009 [4043] | Thanks for those references. |
Sunanda 14-Sep-2009 [4044] | Parse question on StackOverflow -- not yet answered: http://stackoverflow.com/questions/1415340/rebol-parsing-rule-how-to-correct-the-rule-to-separate-paragraphs |
PeterWood 14-Sep-2009 [4045] | Three answers now. |
Carl 28-Sep-2009 [4046x2] | Steeve - move parse discussion here. |
So... does such a function take an argument, such as the current index for the series being parsed? | |
BrianH 28-Sep-2009 [4048] | That's how REPLACE works... |
Carl 28-Sep-2009 [4049] | REPLACE? |
Steeve 28-Sep-2009 [4050x2] | i would say, no functions with parameters allowed |
but if you can perform a do/next then parameters are allowed | |
BrianH 28-Sep-2009 [4052x2] | Yeah. If you pass a function as the replacement value, that function will be called with the series at the replacement position as a parameter, and its return value is used. ARRAY does something similar too. My changes. |
Say that the series position parameter is passed with APPLY rules. If the function takes a parameter it sees it; if not it doesn't. | |
Steeve 28-Sep-2009 [4054] | Or we can provide 2 indexes by default, maintained by the parse engine: & = the head of the last matched rule, && = the tail of the last matched rule. |
BrianH 28-Sep-2009 [4055x3] | Not a bad idea, but... this is how it starts. This is what led to the rule! type suggestion :( |
See, your method would allow REPLACE/part to be called directly. | |
Sorry, REMOVE/part. | |
Steeve 28-Sep-2009 [4058x2] | or even INSERT, REMOVE, CHANGE, without the need to develop a specific inlined method for those functions |
we just need 2 pointers auto-handled by parse | |
BrianH 28-Sep-2009 [4060] | Except your replacement code for those functions was wrong. And would be wrong in this case. Those inline operations were added to reduce common errors, not to provide missing functionality. |