World: r3wp
[Parse] Discussion of PARSE dialect
Brock 30-Jan-2005 [3x8] | "ID"	"TerritoryName"	"Province"
3	"South Western Ontario Generalist"	"ON"
45	"Wallaceburg"	"ON"
47	"Surrey, Langley, White Rock, Port Moody"	"BC" |
I read the file in using... | |
terrs: read %territory.txt ;terrid, territory, prov | |
;(clearer) | |
terrs: read %territory.txt | |
When I include the header row, the first line is not parsed properly. Notice that "ID" is not included in the first element of terr2...
>> terr2: parse/all terrs "^/"
== ["ID" {^-"TerritoryName"^-"Province"} {3^-"South Western Ontario Generalist"^-"ON"} {45^-"Wallaceburg"^-"ON"} {47^-"Surrey, Lang... | |
However, without the header row the parse returns what I expect...
>> terr2: parse/all terrs "^/"
== [{3^-"South Western Ontario Generalist"^-"ON"} {45^-"Wallaceburg"^-"ON"} {47^-"Surrey, Langley, White Rock, Port Moody"^-"BC"} {... | |
What is it about the first row of data, in particular its starting with a string, that stops the split on newline ("^/") from working? | |
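For comparison, here is a minimal Python sketch of the result Brock was expecting: a purely literal split on newline, then on tab, keeps the header row intact as the first element, quotes and all. It assumes the fields are tab-separated, as the ^- escapes in the console output suggest; split_rows is a hypothetical name.

```python
def split_rows(text: str) -> list[list[str]]:
    """Split text on LF, then each row on TAB; '"' is treated as an ordinary character."""
    rows = []
    for line in text.split("\n"):
        if line:  # skip empty lines, including the piece after a trailing newline
            rows.append(line.split("\t"))
    return rows


sample = (
    '"ID"\t"TerritoryName"\t"Province"\n'
    '3\t"South Western Ontario Generalist"\t"ON"\n'
    '45\t"Wallaceburg"\t"ON"\n'
)

for row in split_rows(sample):
    print(row)
```

The first row comes back as the three quoted header fields, so the header is treated exactly like every other line.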
Chris 30-Jan-2005 [11] | It may be a recurrence of a problem parsing a double quote (") at the beginning of a line; it doesn't seem to be a problem on the beta that I have open (1.2.54). As an alternative, you could always use read/lines, or simply load the file... |
Geomol 30-Jan-2005 [12] | It's a bug, as I see it. |
eFishAnt 30-Jan-2005 [13] | what version is Brock using? |
Tomc 30-Jan-2005 [14x2] | Note: block: parse/all read file newline could just as well be written block: read/lines file to sidestep the issue |
That said, parse/all could do better at truly treating all characters literally, as it is being asked to. | |
Brock 30-Jan-2005 [16x3] | eFish... 1.2.40. I also tried the latest beta, 1.2.57, and went back to 1.2.8 to see if it handled it any differently. |
Tom: Yes, I was aware of read/lines and how it is similar to (apparently not the same as) parse/all series "^/". read/lines worked just fine. I don't know why last night I wasn't happy with read/lines; I must have been tired! | |
Chris: I tried 1.2.56.3.1, so if it was fixed in .54, it missed the cut for the following releases. | |
Graham 30-Jan-2005 [19] | And what do you do if the text is not a file? Write your own parse rule. |
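Graham's suggestion can be sketched as an explicit cursor loop that works on any string, not just a file. This is a Python analogue of a rule along the lines of any [copy line to newline skip], under the assumption that only the separator is significant and every other character, including ", is copied literally; scan_lines is a hypothetical name.

```python
def scan_lines(text: str, sep: str = "\n") -> list[str]:
    """Cut text at each occurrence of sep; all other characters are kept literally."""
    if not sep:
        raise ValueError("separator must be non-empty")
    lines = []
    pos = 0
    while True:
        hit = text.find(sep, pos)  # like `to newline`: find the next separator
        if hit == -1:
            break
        lines.append(text[pos:hit])  # like `copy line`: keep what was passed over
        pos = hit + len(sep)  # like `skip`: step past the separator itself
    if pos < len(text):
        lines.append(text[pos:])  # trailing text with no final separator
    return lines


print(scan_lines('"ID"\t"Province"\n3\t"ON"\n'))
```

Applied to Romano's test string below, scan_lines('"a""b""c"de', "e") returns the single literal piece '"a""b""c"d', with no special handling of the quotes.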
Chris 30-Jan-2005 [20] | I think I messed up -- not fixed on my 1.2.54... |
Romano 30-Jan-2005 [21] | 1.2.57
>> parse/all {"a""b""c"de} "e"
== ["a" "b" "c" "d"]
Please add the bug to RAMBO. |
Graham 8-Mar-2005 [22] | How do you break a parse rule and still have it be true? |
Tomc 8-Mar-2005 [23] | to end |
Graham 8-Mar-2005 [24] | So, what is break used for then? |
Tomc 8-Mar-2005 [25] | break out of sub rules |
Graham 8-Mar-2005 [26] | oh .. ok. |
Graham 10-Mar-2005 [27] | Actually, what I was wondering was how to break out of an action in a rule and still let it return true |
sqlab 10-Mar-2005 [28] | In the new alphas you can do:
>> catch [parse/all " aa" [" " (throw true)]]
== true |
Graham 10-Mar-2005 [29] | ahh.. that's what I need. |
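sqlab's catch/throw trick lets an action end the whole parse with a chosen result, even though the rule by itself would fail on " aa" (the remaining "aa" is never matched). A rough Python analogue uses an exception as the non-local exit. Throw, throw, catch and match_space_rule are hypothetical names, and the matcher only models the single rule [" " (action)] followed by the implicit end-of-input check.

```python
class Throw(Exception):
    """Carries a value out of nested code, playing the role of REBOL's THROW."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def throw(value):
    raise Throw(value)


def catch(thunk):
    """Run thunk(); if it throws, return the thrown value instead (like CATCH)."""
    try:
        return thunk()
    except Throw as t:
        return t.value


def match_space_rule(text, action):
    """Model of parse/all text [" " (action)]: match one space, run the action,
    then succeed only if the whole input has been consumed."""
    if not text.startswith(" "):
        return False
    action()
    return len(text) == 1


print(match_space_rule(" aa", lambda: None))  # False: "aa" is left over
print(catch(lambda: match_space_rule(" aa", lambda: throw(True))))  # True
```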
Anton 10-Mar-2005 [30x2] | In a paren, you can change a 'break-rule from an empty block to [to end], then include that break-rule in your main parse-rule somewhere. |
(for a more backwards compatible solution) | |
JaimeVargas 10-Mar-2005 [32] | parse/all "first second" [["first" break] to end] |
Romano 10-Mar-2005 [33] | Graham: break does not invalidate the rule by itself |
Graham 10-Mar-2005 [34] | Anton, can you give an example of what you mean? |
Anton 10-Mar-2005 [35x3] | Oh dear, looks like my memory was not so great. I was using 'break in the early-exit rule here:
extract-link-by-name: func [
    "Returns the first A link with the specified text in the source string."
    name [any-string!] "link text"
    str [any-string!] "html source string"
    /local start end terms non-terms link text early-exit
][
    early-exit: []
    if parse/all str [
        some [
            thru "<a " any " "
            ["name" | [
                "href" any " " "=" any " "
                [{"} (terms: {"}) | (terms: " >")] ; set terms depending on how url starts, with double-quote?
                (non-terms: complement charset terms)
                start: some non-terms end: (link: copy/part start end)
                opt {"} any " " thru ">"
                copy text to "</a" ; <- was before just "<"
                (if text = name [early-exit: [to end break]])
            ]]
            early-exit
        ]
    ][reduce ['link link 'name text]]
] |
Ah, here is an example of what I was thinking:
>> rule: [] parse "aaabbbb" [any [rule ["a" (print 'a) | "b" (print 'b rule: [to end " "])]] to end]
a
a
a
b
== true | |
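When the first b is encountered, rule is set to [to end " "], which can never succeed: once the end is reached, there is nothing left to match " ". The next pass of the outer any therefore fails, breaking out of the loop, and the final to end lets parse return true. | |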
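Anton's technique works because the word rule is looked up on every pass of the any loop, so an action can swap in a rule that always fails. The Python simulation below replays his console example step by step; the loop structure and the names run_example, empty_rule and never_rule are assumptions made for illustration, not a description of how PARSE is implemented.

```python
def run_example(text: str) -> tuple[list[str], bool]:
    """Simulate: rule: [] parse text [any [rule ["a" (print 'a) | "b" (print 'b rule: [to end " "])]] to end]
    Returns (words that PRINT would have written, parse result)."""
    printed = []

    def empty_rule(p):
        return p  # rule: [] matches without consuming input

    def never_rule(p):
        return None  # [to end " "]: at the end of input no " " can follow

    rule = empty_rule
    pos = 0
    while True:  # ANY: repeat the block until it fails
        p = rule(pos)
        if p is None:
            break  # the swapped-in rule fails, so ANY stops here
        if text.startswith("a", p):
            printed.append("a")
            pos = p + 1
        elif text.startswith("b", p):
            printed.append("b")
            rule = never_rule  # the action replaces rule for the next pass
            pos = p + 1
        else:
            break  # neither "a" nor "b": ANY stops
    # TO END then moves to the end of the input, which always succeeds
    return printed, True


print(run_example("aaabbbb"))
```

As in the REBOL session, three a's and a single b are printed before the loop is abandoned.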
Graham 10-Mar-2005 [38] | So rule ends up being redefined? |
Graham 12-Mar-2005 [39x5] | I'm using these rules in my server-side implementation of the TOP command:
one-line-rule: [copy line thru {^/} (
    if line = ".^/" [line: join "." line]
    write-client line
)]
header-rule: [copy header thru {^/^/} (write-client header write-client )]
msg is the email message, including header and body, and lines is the number of lines requested by the TOP command:
parse msg compose [header-rule (lines) one-line-rule]
Now, I can't check the parse syntax as rebol.com is down, but I always seem to get the whole email with my header-rule and not just the header. |
correction: header-rule: [copy header thru {^/^/} (write-client header)] | |
the rules work when tested at the console. So, I'm thinking something else is wrong. | |
The email is obtained by reading from a port in binary mode, so could this be due to the lack of line-ending conversion? | |
From a Unix system to Win32? | |
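For reference, RFC 1939 (POP3) defines the TOP reply as the message headers, the blank line that separates them from the body, and then the requested number of body lines. It is sent as a multi-line response with CRLF line endings, any line beginning with "." is byte-stuffed with an extra ".", and a line holding only "." ends it. The Python sketch below builds such a reply; top_response is a hypothetical helper, and it assumes the stored message already uses CRLF line endings.

```python
def top_response(message: bytes, n: int) -> bytes:
    """Sketch of a POP3 TOP reply: +OK, headers, blank line, first n body lines."""
    lines = message.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece left by the final CRLF
    try:
        blank = lines.index(b"")  # the first empty line ends the header
    except ValueError:
        blank = len(lines)  # no blank line: treat the whole message as header
    selected = lines[: blank + 1] + lines[blank + 1 : blank + 1 + n]
    out = [b"+OK"]
    for line in selected:
        if line.startswith(b"."):
            line = b"." + line  # byte-stuff any line that begins with "."
        out.append(line)
    out.append(b".")  # a line holding only "." terminates the reply
    return b"\r\n".join(out) + b"\r\n"


msg = b"Subject: hi\r\nFrom: a@b\r\n\r\nline1\r\n.dot\r\nline3\r\n"
print(top_response(msg, 2))
```

Note that RFC 1939 byte-stuffs every line that begins with ".", not only a line consisting of "." alone.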
Brett 12-Mar-2005 [44x2] | Hi Graham. Line endings in Internet protocols = apples; line/paragraph representation in text files = oranges. :-) Best not to compare them for this task. As far as I've seen, Internet protocols have an on-the-wire line ending of CRLF. So yes, this is very likely a problem with your parse rules. |
With the port in binary mode, you must use CRLF to identify/transmit line endings. | |
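Brett's point can be checked directly: on the wire, the blank line between header and body is CR LF CR LF, so a search for two consecutive LF characters (what thru {^/^/} looks for when no line-ending conversion has happened) finds nothing. A small Python check with a made-up two-line header:

```python
wire = b"Subject: hi\r\nFrom: a@b\r\n\r\nBody line\r\n"

# Every LF here is followed by CR or by text, never by another LF.
print(wire.find(b"\n\n"))  # -1
# The header/body boundary is the four-byte sequence CR LF CR LF.
print(wire.find(b"\r\n\r\n"))
```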
Graham 12-Mar-2005 [46] | Hi Brett. Do you then see a problem with my parse rules? |
Tomc 12-Mar-2005 [47] | Seems like, to get all possible raw line endings, something like copy line thru [1 2 ["^J" | "^M"]] might work |
Graham 12-Mar-2005 [48x2] | my one-line-rule seems to work already .. |
copy header thru [ 2 [ "^J" | "^M"] ] ... ? | |
Tomc 12-Mar-2005 [50] | unix would be a single "^J" |
Graham 12-Mar-2005 [51x2] | header is separated from body by one blank line. |
So, I need to look for two consecutive line endings to find the end of the header | |
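Graham's conclusion can be sketched as follows: the header ends at the first pair of consecutive line endings, where on the wire each ending is the two bytes CR LF (so the pair is four bytes), while after conversion it is a single LF. A rule that accepts any two of CR or LF, such as 2 ["^J" | "^M"], is also satisfied by the single CR LF that ends one header line, so this Python analogue treats an optional CR plus LF as one unit. split_header and BLANK_LINE are hypothetical names, and mixed endings such as LF followed by CR LF are also accepted.

```python
import re

# A blank line: two line endings in a row, each either CR LF or a bare LF.
BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def split_header(message: bytes) -> tuple[bytes, bytes]:
    """Return (header, body); the header keeps the line ending of its last line."""
    m = BLANK_LINE.search(message)
    if m is None:
        return message, b""  # no blank line: treat everything as header
    # The first line ending in the match belongs to the last header line.
    first_end = message.index(b"\n", m.start()) + 1
    return message[:first_end], message[m.end():]


print(split_header(b"A: 1\r\nB: 2\r\n\r\nbody\r\n"))
print(split_header(b"A: 1\nB: 2\n\nbody\n"))
```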