r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

[unknown: 5]
5-Aug-2007
[2190x4]
no that parses every letter
I'm beginning to wonder if it is a bug in parse
Does anyone know how to parse a string such that the newlines and tabs 
are parsed?  I'm not getting the results I'm expecting.
Hoping to see someone else's command in case I'm just brain farting 
on something.
PeterWood
5-Aug-2007
[2194]
>> a: join "line1" [newline "line2"]
== "line1^/line2"
>> parse a [any [newline (print "newline found") | skip]]
newline found
== true
Steeve
5-Aug-2007
[2195]
yeah, never had problems when parsing tabs and newlines
Geomol
6-Aug-2007
[2196]
When parsing strings without the /all refinement, words are separated 
by spaces. An example that works:

>> parse "word1 word2^-word3^/word4" ["word1" "word2" "word3" newline "word4"]
== true
You can also explicitly specify the tab:

>> parse "word1 word2^-word3^/word4" ["word1" "word2" #(tab) "word3" newline "word4"]
== true

Actually the #(tab) seems to be ignored, because you can specify 
it anywhere:

>> parse "word1 word2^-word3^/word4" ["word1" "wo" #(tab) "rd2" "word3" newline "word4"]
== true

But you get false, if specifying the space (which may be a strange 
thing):

>> parse "word1 word2^-word3^/word4" ["word1" #" " "word2" "word3" newline "word4"]
== false
Also you need to specify newlines, they are not seen as space:

>> parse "word1 word2^-word3^/word4" ["word1" "word2" "word3" "word4"]
== false

If you need to parse for tabs at certain places, use: parse/all
I hope, it helps!
Gabriele
6-Aug-2007
[2197x2]
>> b: [#(tab)]
== [# (tab)]
>> length? b
== 2
>> first b
== #
>> type? first b
== issue!
>> second b
== (tab)
>> type? second b
== paren!
so your "tab" is ignored because it's not a tab at all. it's the 
same as doing "" (tab) in the rule, ie empty string (always matches) 
followed by code that basically does nothing.
Geomol
6-Aug-2007
[2199x4]
Aha! :-)
How did I get #(tab) in my head!?

So to explicitly specify the tab, it must be:

>> parse "word1 word2^-word3^/word4" ["word1" "word2" "^(tab)" "word3" newline "word4"]
== false

So that doesn't parse, meaning tabs are seen as spaces.
Sorry about any confusion! :-)
And I guess tab can be specified in the rule block more simply, as 
in:
["word1" "word2" tab "word3" newline "word4"]
Gabriele
6-Aug-2007
[2203]
you need parse/all to be able to parse spaces and tabs.
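A minimal sketch of that point, assuming REBOL 2 console behaviour (the rule block is illustrative and not taken from the thread): with /all, whitespace is no longer skipped, so the space, the tab and the newline each have to be matched explicitly.

```rebol
>> parse/all "word1 word2^-word3^/word4" ["word1" " " "word2" tab "word3" newline "word4"]
== true
```

Here tab and newline are the predefined words holding the corresponding characters.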
[unknown: 5]
7-Aug-2007
[2204]
Thanks everyone for your posts on this
Geomol
7-Aug-2007
[2205]
Paul, it would be good to know if you got it to work!?
Chris
7-Aug-2007
[2206]
G: #"^(tab)"
Geomol
7-Aug-2007
[2207]
Probably.
PatrickP61
20-Aug-2007
[2208x2]
Hi all,
Are there any good references to learn PARSE?
Henrik
20-Aug-2007
[2210]
There is a parse page on the Wikibook.
PatrickP61
20-Aug-2007
[2211]
Found it -- Thanks
Geomol
20-Aug-2007
[2212]
Also this about parsing: http://www.rebol.com/docs/core23/rebolcore-15.html
[unknown: 5]
24-Aug-2007
[2213]
I used parse/all, with "^-" for the tab.
[unknown: 5]
31-Aug-2007
[2214x2]
Ok, I ran into an issue.  Is there an easy way to parse a string that 
has double quotes next to each other in it, such as 
{some chars "" some more chars"" and more}?
I need the quotes to be single, just one set and not two together, 
and the parse should keep the string section intact because often it 
is part of an html tag.
Robert
1-Sep-2007
[2216x2]
Paul, do a search & replace upfront. Much simpler than creating 
complex parse rules.
I often use this pattern. Do some basic action on the parse input, 
parse the first round, then do some other processing, then use 
parse again. Much simpler and faster to get where you want to go.
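A rough sketch of the "search & replace upfront" idea, assuming REBOL 2 and using the sample string from the question (the word names data and cleaned are only for illustration):

```rebol
data: {some chars "" some more chars"" and more}

; Collapse each doubled quote into a single quote before parsing.
; COPY leaves the original string untouched.
cleaned: replace/all copy data {""} {"}

probe cleaned   ; {some chars " some more chars" and more}
```

The cleaned string can then be handed to parse as a second step.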
Tomc
1-Sep-2007
[2218]
paul what rule are you using for your parse
[unknown: 5]
1-Sep-2007
[2219x3]
Thanks Robert, I'll look into that further. I did play with replace, 
but because they were quotes it seemed that parse/all still wanted 
to break apart at a quote even though I told it only tabs.
Tom, for parse I only want to parse/all data tab.  The problem is that 
parse will break apart html tags and more.  I don't want to parse 
out tags because they need to be left intact to some extent.
It just seems to me that parse/all data tab doesn't ONLY split at 
the tabs but also breaks at these double quotes next to each other.
Tomc
2-Sep-2007
[2222x3]
Paul, how are you defining tab?  It seems to work for me.
>> str
== {some chars "" some more chars"" and more}
>> parse/all str "^-"
== [{some chars "" some more chars"" and more}]
There are no breaks at the double quotes.
[unknown: 5]
2-Sep-2007
[2225x11]
It looks like it breaks on html tags that might be broken.  For example, 
I was testing parse on a tab-delimited file and performing the 
following parse:
parse data "^-"
The problem is that some of these broken html tags cause parse to 
not work correctly.  The tags contain double quotes (the result 
of an export from osCommerce).
Sorry, I was using parse/all data "^-"
It doesn't even have to be broken tags, it appears.
Just when a quote precedes the tag.
data: {my string^-"<span style="font: 12px arial;>some text</span>"}
>> parse/all data "^-"

== ["my string" "<span style=" {font: 12px arial;>some text</span>"}]
Notice you get it breaking the string even where there is NOT a tab.
Is this a bug?
I've looked at this some more and it only seems to be a problem if 
the quote precedes the <span> tag.  If you move the quote around, 
you get the expected parsing.
btiffin
2-Sep-2007
[2236x2]
Your example still doesn't seem to jibe with the documentation.  
Reading the docs, I would expect two strings in the output block: 
"my string" and the rest, in braces.  It has something to do with 
a double quote starting a parse sequence.   {"abc"def} parses as 
["abc" "def"], while { "abc"def} parses as a single string, as expected: 
[{ "abc"def}]
The space after the brace seems to trigger different behaviour than 
{" with no space after the brace.  Any character does, actually; the bad 
behaviour occurs only when the brace is immediately followed by a double quote.
[unknown: 5]
3-Sep-2007
[2238]
btiffin, use this example:
data: {my string^-"<span style="font: 12px arial;>some text</span>"}
btiffin
3-Sep-2007
[2239]
Yeah, I think the weird parsing behavior is due to the fact that 
the tab separator is followed immediately by a token that begins 
with a double quote.  If you change the data to ... ^- "<span...  (note 
the space after the tab), the behaviour changes, giving
>> parse/all data2 to string! tab

== ["my string" { "<span style="font: 12px arial;>some text</span>"}]

As I would expect.  You've uncovered something here.  parse seems 
dependent on quote as the first symbol in a token.
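One possible workaround, sketched under the assumption of REBOL 2 semantics (the rule and the word names fields and field are hypothetical, not something posted in the thread): split on tabs with an explicit rule block, so a leading double quote in a field gets no special treatment.

```rebol
data: {my string^-"<span style="font: 12px arial;>some text</span>"}
fields: copy []

parse/all data [
    ; take everything up to each tab as one field, then step over the tab
    any [copy field to tab (append fields field) skip]
    ; whatever remains after the last tab is the final field
    copy field to end (append fields field)
]

probe fields
```

With the sample data this should leave two fields, "my string" and the rest including its quotes. An empty field may be copied as none rather than as an empty string, so real data would need a check for that.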