r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Janko
2-Dec-2009
[4611]
from Advocacy --> Graham [ to "A" | to "B" ] won't work as I want 
.. I will try to find a concrete example
Graham
2-Dec-2009
[4612]
this is a current parse limitation.
Janko
2-Dec-2009
[4613]
parse "start 111 end start 222 finish" [ some [ thru "start" copy 
NUMS [ to "finish" | to "end" ] ] ]   this won't work
Graham
2-Dec-2009
[4614x2]
change it
[ to "end" | to "finish" ]
Janko
2-Dec-2009
[4616]
ok .. but I meant that you have "start 111 end start 222 finish start 
333 end "  then it won't work :)
Graham
2-Dec-2009
[4617]
change the rule again
Janko
2-Dec-2009
[4618]
I was trying to show an example where you have two possible endings 
and you want to process both (and you can process them differently 
with parens), but you don't know in what order they will come or anything
Graham
2-Dec-2009
[4619x3]
In this case I would use block parsing ... then I'm no expert in 
parsing
parse string [ some [ "start" digits "end" | "start" digits "finish" ] ]
your problem is because you are using 'thru which breaks the other 
rule
Janko
2-Dec-2009
[4622x2]
yes, then you have to do charset parsing (but I don't know that 
yet :) ) .. I was just trying to say that if there were a way to 
say something like  to any [ "A" | "B" ]  and it would go to the closest 
one, A LOT of problems with parse would be easily solvable
you can use to but it still won't work
Graham
2-Dec-2009
[4624x3]
[ some [ "start" digits [ "end" | "finish" ] ] ]
should work
to go to the closest one .. means it has to try all the rules??
and see which has the best fit ?
Janko
2-Dec-2009
[4627x2]
no wgih is  the closest .. look at this example (I hope this will 
be better)
whigh = which
Graham
2-Dec-2009
[4629x2]
I know what you mean .. so you have to order your rules knowing what 
the data looks like
If you don't know what pattern the data is .. you can't parse it 
with anything.
Janko
2-Dec-2009
[4631x4]
parse "This is Apple . This is Windows ! This is Linux . This is 
Amiga ." [ some [ "This is" copy IT (print IT) to [ "." | "!" ] ]
The pattern is known ... the sentence starts with "This is" and can 
end with . or ! but they can come in any order .. if you try to parse 
with "." first you will get 
---- oops, some errors up there .. just a sec
>> parse "This is Apple . This is Windows ! This is Linux . This is Amiga ." [
 some [ thru "This is" copy IT [to "." | to "!" ] (print IT) ]]
 Apple
 Windows ! This is Linux
 Amiga
this is common to all the problems that I am describing .. 
if I had  to [ "." | "!" ]  and parse would find both and go to 
the one that is closer, it would be solved.
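
The difference between the ordered choice in the console output above and the "go to the closer one" behaviour being asked for can be sketched outside REBOL. The Python below is only an illustration under assumptions: `to_first_alternative` is a hypothetical model of `[to "." | to "!"]`, `to_closest` is a hypothetical model of the proposed `to [ "." | "!" ]`, and `items` is a helper invented for the example. None of these names belong to PARSE.

```python
def to_first_alternative(text, start, targets):
    """Model of [to A | to B]: try each target in order and take the
    first one found anywhere ahead, even if another target is closer."""
    for target in targets:
        pos = text.find(target, start)
        if pos != -1:
            return pos
    return -1


def to_closest(text, start, targets):
    """Model of the proposed to [A | B]: stop at whichever target
    occurs first in the input."""
    hits = [p for p in (text.find(t, start) for t in targets) if p != -1]
    return min(hits) if hits else -1


def items(text, prefix, targets, locate):
    """Collect the text between each prefix and the located ending."""
    found, pos = [], 0
    while True:
        begin = text.find(prefix, pos)
        if begin == -1:
            return found
        begin += len(prefix)
        end = locate(text, begin, targets)
        if end == -1:
            return found
        found.append(text[begin:end].strip())
        pos = end


data = "This is Apple . This is Windows ! This is Linux . This is Amiga ."
print(items(data, "This is", [".", "!"], to_first_alternative))
print(items(data, "This is", [".", "!"], to_closest))
```

With the ordered model the second item swallows "Windows ! This is Linux", as in the console session; with the closest-match model each sentence is split separately.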
Graham
2-Dec-2009
[4635]
charset [ #"!" #"." ]
Janko
2-Dec-2009
[4636x2]
ok , you again found a solution to my specific problem :))
BUT .. what if I want to have control there .. or if for the sake 
of example it's a more complex multicharacter difference like "<DOT>" 
"<EXCLAMATION>"
Graham
2-Dec-2009
[4638]
Janko, best thing to do is show us a  string you can't parse ... 
and someone will show you how to do it.
Janko
2-Dec-2009
[4639x4]
>> parse "I like Apple . I like Windows ! I like Linux . I like Amiga ." [
[     some [ thru "I like" copy IT [to "." ( prin "so so: ") | to "!" (prin "very much: ") ] (print IT) ]]
so so:  Apple
so so:  Windows ! I like Linux
so so:  Amiga
I don't have a real example right now :) I had them a few times before 
and I also asked here about them and I solved them with your help somehow
I just started talking about this as a general limitation of parse 
that I have met a lot of times, and I suppose Paul could have met it 
when trying to parse CSV
janko,"some\"thing92!","graham"  I am not sure but I think here you have 
the same problem
Gregg
2-Dec-2009
[4643x3]
It's not necessarily a PARSE limitation, but there are things we'd 
like PARSE to do that aren't always reasonable. :-)


TO and THRU can work very well, but that doesn't mean they'll work 
for every situation. You may have to use rules where you check for 
your target value or just SKIP, marking locations in the input as 
you go.
CSV parsing is an issue, because REBOL handles some inputs well, 
but fails for what may be a common way things are formatted. "CSV" 
isn't always as simple as it sounds.
That said, if you know the format (e.g. WRT quotes and escapes), 
it can be done with PARSE. It just may not be a one-liner.
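
As a rough illustration of why the quoting and escaping conventions have to be known up front, here is a character-by-character splitter written in Python rather than PARSE. It is a sketch under assumptions: fields are comma-separated, may be wrapped in double quotes, and a backslash inside quotes escapes the next character. The function name `split_csv_line` and its parameters are invented for this example, and other CSV dialects (for instance doubled quotes) would need different rules.

```python
def split_csv_line(line, delimiter=",", quote='"', escape="\\"):
    """Split one line into fields, honouring quoted fields and
    backslash-escaped characters inside quotes (an assumed format)."""
    fields, current, in_quotes = [], [], False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == escape and i + 1 < len(line):
                current.append(line[i + 1])   # keep the escaped character
                i += 1
            elif ch == quote:
                in_quotes = False             # closing quote
            else:
                current.append(ch)
        elif ch == quote:
            in_quotes = True                  # opening quote
        elif ch == delimiter:
            fields.append("".join(current))   # field boundary
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))           # last field has no delimiter
    return fields


print(split_csv_line(r'janko,"some\"thing92!","graham"'))
print(split_csv_line('a,"b,c",d'))
```

The same state tracking (inside or outside quotes, escaped or not) is what a multi-rule PARSE grammar would have to express, which is why it does not fit in a one-liner.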
Janko
2-Dec-2009
[4646x2]
I know parsing csv can be messy ... at least at this high level I 
don't know how to do it with escapes, embedded commas, etc.
and I know everything has limitations ... this functionality, an OR 
that takes the first alternative that appears, would in practice solve 
many cases for me
Graham
2-Dec-2009
[4648]
you have to turn off parse's default delimiters and use bitsets
Janko
2-Dec-2009
[4649]
(aha bitsets.. I was calling them charsets up there)
Graham
2-Dec-2009
[4650x2]
BTW, Bolek wrote a regex engine in Rebol ...
http://www.mail-archive.com/[rebol-bounce-:-rebol-:-com]/msg01983.html
Ladislav
2-Dec-2009
[4652]
Janko: the only problem is, that you cannot use:

C: [to [A | B]]

, where A and B are "general rules", but you can always write:

C: [here: [A | B] :here | skip C]

, which would do what you want
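
The idiom above can be read as: remember the current position, test whether A or B matches there, and if so restore the position; otherwise skip one character and try again. A loose Python rendering of that reading follows. It is a sketch, not REBOL: `to_any` is a hypothetical name, compiled regular expressions stand in for the "general rules" A and B, and the loop replaces the recursive reference to C.

```python
import re


def to_any(text, pos, rules):
    """Advance to the first position where any rule matches, without
    consuming the match (the role of `here: ... :here` in the idiom)."""
    while pos <= len(text):
        here = pos                      # here:
        for rule in rules:              # [A | B]
            if rule.match(text, here):
                return here             # :here (position restored)
        pos += 1                        # skip, then try C again
    return -1                           # ran off the end: the rule fails


A = re.compile(r"<DOT>")
B = re.compile(r"<EXCLAMATION>")
text = "I like Apple <DOT> I like Windows <EXCLAMATION>"

stop = to_any(text, 0, [A, B])
print(repr(text[:stop]))
```

Because the alternatives are tested at every position before moving on, whichever ending occurs first in the input wins, regardless of the order in which A and B are listed.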
Oldes
2-Dec-2009
[4653x2]
I would just like to remind you that there is something like R3, where:

>> parse "I like Apple . I like Windows ! I like Linux . I like Amiga 
." [any ["I like " copy x to [" ." | " !"] (probe x) to "I like "]]
Apple
Windows
Linux
Amiga
And Janko... if you don't use charsets at all, I think you should 
give it a try. It's not so difficult. I think that if I can write 
a parser to colorize PHP code, then you can parse anything.
Janko
3-Dec-2009
[4655x4]
Ladislav, thanks.. I didn't know you could set the position back 
with :here , that is interesting and probably expands what you can 
do with parse a lot.
Oldes if that is in R3 >> copy x to [" ." | " !"]  << this is exactly 
as I was proposing above :) , very nice!


I know I have to .. I haven't really needed them yet I guess, I solved 
some things less elegantly in other ways without them. I intend to 
take the plunge next time I need them.
yes, you are right .. if you can write a parser for php then you can 
make anything with it. I always supposed parse with charsets is like 
low level stepping by one char in a loop, calling "events" and changing 
states, with which you can parse anything from xml to languages 
.. well, but parse with charsets is still much more elegant
but it is a level less simple and nice to use than the simple parse modes, 
that's why the simple ones should be powerful *if possible* too 
- you can't get a newbie impressed with charset parsing because he 
probably won't understand it.
Ladislav
3-Dec-2009
[4659x2]
Just to complete the list of possible equivalents to the

    C: [to [A | B]]

rule, here is a way to do it in Rebol3 parse:

    C: [while [and [A | B] break | skip | reject]]


you can find other equivalent idioms at http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse#Parse_idioms
"I didn't know you could set the position back with :here"

 - you can set the position back even without :here, the choice operator 
 is sufficient for you to be able to do that, see the above idioms 
 as an example