World: r3wp
[Parse] Discussion of PARSE dialect
Gregg 10-Sep-2007 [2264] | First, you may need to spend some time with PARSE, so you're *really* comfortable with it. Taking on something like RTF--even just a subset--is going to be a sizable task. I would start by identifying the escapes (backslash words) and figuring out how you're going to maintain state as attributes are applied and removed. |
PatrickP61 10-Sep-2007 [2265] | Hey Gregg -- That is just what I've been doing. I have identified the following: 1. All printable \ { and } characters show up in RTF preceded by a backslash, as \\ \{ or \}; any remaining \, {, or } is an RTF command. 2. { } and ; identify groupings: an open brace starts a group and a close brace terminates it. The semicolon terminates sub-parameters for a particular command. 3. \xxx always identifies a particular command, with an optional number appended to it. Example: \b means bold while \b0 means bold off. What I am toying with is defining simple rules to break a string of RTF commands and embedded text into two parts, the command part and a parameter part (some parameters may be a block of multiple values). I'm studying the PARSE command to see what I can do simply and progress from there. |
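The three observations above can be sketched as a small PARSE tokenizer. This is only an illustration under assumptions, not anyone's actual script: the names letter, digit, special, plain, tokens and rtf-rule are made up here, it targets REBOL 2, it treats a single space after a control word as a delimiter to be consumed, and it ignores negative parameters, semicolons and every other detail of real RTF.

```rebol
REBOL []

; Character classes (hypothetical names, chosen for this sketch).
letter:  charset [#"a" - #"z" #"A" - #"Z"]
digit:   charset "0123456789"
special: charset "\{}"
plain:   complement special

tokens: copy []
name: arg: text: none

rtf-rule: [
    any [
        ; 1. An escaped \ { or } is literal text.
        "\" copy text special (append tokens text)
        ; 3. A control word: letters plus an optional number.
        ;    In REBOL 2, arg is none when no digits follow.
        | "\" copy name some letter copy arg any digit opt " "
          (repend tokens [to-word name arg])
        ; 2. Unescaped braces open and close a group.
        | "{" (append tokens 'group-start)
        | "}" (append tokens 'group-end)
        ; Anything else is embedded text.
        | copy text some plain (append tokens text)
    ]
]

parse/all "{\b bold\b0 \{x\}}" rtf-rule
probe tokens
```

The escape alternative comes first so that \{ is never mistaken for a group start. Tracking which attributes are currently on, as Gregg suggests, would be a separate step that walks the tokens block.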
Steeve 16-Oct-2007 [2266x2] | I know your script, Gabriele, and other similar scripts; I just think we could write a grammar more concisely using reflexive rules |
I am aware that it makes the parser harder to understand, but it is just an intellectual exercise for the moment | |
Graham 16-Nov-2007 [2268x4] | How to reliably break a block of text up by whitespace? |
I tried parse/all text "^/^- " but I still get large blocks of text as one | |
I guess I have to use charsets of whitespace and non-whitespace | |
just seems that it should be easier to split up a block of text by the whitespace | |
Sunanda 16-Nov-2007 [2272] | Have you tried parse/all trim/lines "..." " " |
Graham 16-Nov-2007 [2273x2] | it's getting fooled by "{" chars I think |
parse doesn't like " and { | |
Sunanda 16-Nov-2007 [2275] | That rings a bell --- I vaguely remember having to do stuff like replacing " or } with to-char 0 before doing some parses, and then changing back afterwards. That works if you have no to-char 0 in your strings |
Graham 16-Nov-2007 [2276] | I'll have to go back over my old scripts where I solved this before :( |
Oldes 16-Nov-2007 [2277] | If I remember well, this behaviour is because of CSV parsing - parse with delimiters (rules as a string) was designed mainly for that case. |
Graham 16-Nov-2007 [2278x2] | I'll try Gregg's split function |
Nice to have code snippets on line when the brain is too tired to create one's own | |
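For reference, the charset approach Graham mentions might look like the sketch below. It is not Gregg's split function; the names whitespace, non-space and split-on-whitespace are invented for this example, and it assumes REBOL 2. Using a block rule with parse/all avoids the string-delimiter mode, which gives quote characters the special CSV-style handling that Oldes describes.

```rebol
REBOL []

whitespace: charset " ^-^/"          ; space, tab, newline
non-space:  complement whitespace

split-on-whitespace: func [
    "Return a block of the whitespace-separated pieces of TEXT."
    text [string!]
    /local words word
][
    words: copy []
    parse/all text [
        any [
            some whitespace                                   ; skip a run of separators
            | copy word some non-space (append words word)    ; keep one piece
        ]
    ]
    words
]

; Quotes and braces are ordinary characters under this rule.
probe split-on-whitespace {say "hi" {there}^/now}
```

Because " and { are simply members of non-space here, they stay attached to the piece they appear in rather than changing how the text is divided.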
Brock 22-Nov-2007 [2280x3] | What's wrong with this? I'm trying to retrieve the "area" query string parameter out of this web log record... test: {10.200.55.63 - - [22/Oct/2007:10:32:57 -0500] "GET /irj/servlet/prt/portal/prtroot/com.cpc.km.Redirect?userid=KALEFBM&area=chm&Rurl=http://bjzprd /sellserve/displaysalesupdate.aspx?id=3815" 302 182} |
with the following parse statement... parse test [ thru "area=" copy new-area [to " " | to "?" | to "&"] to end (if debug? [print new-area]) ] | |
I expect the return to be just the characters chm; however, the remainder of the query string text is also being transferred. So the to "&" is not being considered within the rule. | |
Chris 22-Nov-2007 [2283] | I don't think you can use copy in that way. |
Brock 22-Nov-2007 [2284] | meaning I would need to have 3 thru... copy... to... rules? |
Steeve 22-Nov-2007 [2285] | parse/all test [thru "&area=" copy val to "&"] print val |
Chris 22-Nov-2007 [2286x3] | Hmm, no - I'm wrong. Try parse/all first though (for to " ") |
Or, instead of parse, do -- select decode-cgi find/tail string "?" to-set-word 'area | |
string = test | |
Steeve 22-Nov-2007 [2289] | the problem comes from [to " " | to "?" | to "&"] |
Brock 22-Nov-2007 [2290] | @ Steeve, yes, but i'm not certain there will be a ? or & or space character, so I want to test for all three |
BrianH 22-Nov-2007 [2291] | Use charset "?& ". |
Steeve 22-Nov-2007 [2292] | use a charset instead. valid: complement charset "^-^/ ?&" parse/all test [thru "&area=" copy val some valid to end] |
Chris 22-Nov-2007 [2293] | Yep, that'd be the surest... |
Steeve 22-Nov-2007 [2294] | oops, too late |
BrianH 22-Nov-2007 [2295x2] | Searching for tabs and newlines would not be necessary in this case, but yes. |
Be concise Steeve :) | |
Chris 22-Nov-2007 [2297] | Wouldn't work for the Rurl value though... |
Steeve 22-Nov-2007 [2298] | huhu |
Brock 22-Nov-2007 [2299] | seems this works... parse/all test [thru "area=" copy new-area some terminator to end (if debug? [print new-area])] where terminator: complement charset ["?" "&" " "]. In my earlier tests I didn't use the complement!! |
BrianH 22-Nov-2007 [2300x2] | Go thru the GET, thru the first ?, then process every variable separately, especially if you allow unencoded strings for some variables. |
The value of the Rurl parameter is an unencoded string by the way. | |
Brock 22-Nov-2007 [2302] | yes, that was another issue I was going to need to tackle... I did some searching and couldn't find how to encode it easily. |
Chris 22-Nov-2007 [2303] | If it's consistently the last value, that makes it easier... |
BrianH 22-Nov-2007 [2304x3] | If you require that the argument value that is not url-encoded be the last, you can just do a to end or whatever the string terminator is. |
In this case that would be " | |
Be sure to parse the whole get line - otherwise you might miss (or catch) maliciously crafted calls to your site. | |
Brock 22-Nov-2007 [2307x2] | @ Chris: trying to accommodate variable placement within the string, but I can see that this can be a problem with the Rurl parameter. |
thanks for the input guys. | |
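To summarise the fix discussed in this exchange: alternatives in a PARSE rule are tried in order and the first one that succeeds wins, so [to " " | to "?" | to "&"] takes whichever alternative is listed first and can match anywhere ahead, not the nearest of the three terminators. Matching the value itself with a complemented charset avoids that. The sketch below assumes REBOL 2 and uses a shortened, made-up log record (the host and path are hypothetical) in place of the original; value-char is a name chosen for this example.

```rebol
REBOL []

; Hypothetical, shortened log record modelled on the one in the thread.
test: {10.0.0.1 - - [22/Oct/2007:10:32:57 -0500] "GET /app/Redirect?userid=ABC&area=chm&Rurl=http://example.com/page" 302 182}

; A value character is anything except the three possible terminators.
value-char: complement charset "?& "

new-area: none
parse/all test [
    thru "area="                       ; skip to just past the parameter name
    copy new-area some value-char      ; take characters up to ? & or space
    to end
]
print new-area
```

With this sample record new-area ends up as chm. As noted in the thread, this does not cope with an unencoded value such as Rurl, which can itself contain ? and spaces.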
btiffin 24-Jan-2008 [2309] | I'm pondering attempting a PARSE lecture here on Altme. It'd be run twice, 9am EST and 9pm EST (or some such). Topic would be dialecting. I want to see if it would work, but I'm nowhere near professor level with REBOL. So, think of it as a kindergarten lecture, as a trial. Plan: Post this message - see if there is feedback. Allow for some Q&A time for specific topics of interest. A week or two later, run an hour (probably less) of monologue (interruptions allowed for stuff that is just plain wrong ... but other than that participants would be asked to hold off on questions). Followed immediately with a Q&A, complaint, correction session. Then a DocBase page created with a merged transcript of the two timezoned lectures, things learned and hopefully something along the lines of a simple file management (or some such) dialect source code file. R2 related - for me the R3 DELECT still hasn't sunk in. If it works, then perhaps it could become a semi-regular activity...there is going to be a lot to discuss come "link to the rebol.dll" time. |
amacleod 24-Jan-2008 [2310] | sounds good |
Pekr 24-Jan-2008 [2311] | If it is not supposed to be interactive, you could as well prepare it in a form of DocBase article, and then run the session ... |
btiffin 24-Jan-2008 [2312] | Petr; true. It is meant to be interactive, but after a monologue phase. I worry a little bit as I have a sad tendency to be "almost right" with REBOL so I'd want the material vetted over before unleashing it on the innocent. |
SteveT 25-Jan-2008 [2313] | As 'the' newbie !! I'd 'Pay' for that! ;-) |