World: r3wp
[Parse] Discussion of PARSE dialect
Graham 2-Dec-2009 [4630] | If you don't know what pattern the data is .. you can't parse it with anything. |
Janko 2-Dec-2009 [4631x4] | parse "This is Apple . This is Windows ! This is Linux . This is Amiga ." [ some [ "This is" copy IT (print IT) to [ "." | "!" ] ] |
The pattern is known ... the sentence starts with "This is" and can end with . or ! but they can come in any order .. if you try to parse with "." first you will get ---- oops, some errors up there .. just a sec | |
>> parse "This is Apple . This is Windows ! This is Linux . This is Amiga ." [ some [ thru "This is" copy IT [to "." | to "!" ] (print IT) ]] Apple Windows ! This is Linux Amiga | |
this is common to all the problems that I am describing .. if I had > to [ "." | "!" ] and parse would find both and go to the one that is closer it would be solved. | |
Graham 2-Dec-2009 [4635] | charset [ #"!" #"." ] |
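A minimal sketch of how the charset suggestion above could be applied to Janko's sentence, assuming REBOL 2 and PARSE/ALL. The names `enders`, `non-enders` and `it` are made up for illustration. Instead of TO, the rule matches the run of characters that are not terminators and then the terminator itself:

```rebol
enders: charset ".!"              ; the characters that may end a sentence
non-enders: complement enders     ; every other character

parse/all "This is Apple . This is Windows ! This is Linux . This is Amiga ." [
    some [
        thru "This is "
        copy it some non-enders   ; everything up to the nearest terminator
        enders                    ; consume "." or "!", whichever came first
        (print trim it)           ; trim drops the space before the terminator
    ]
]
; prints:
; Apple
; Windows
; Linux
; Amiga
```

This only covers single-character terminators; multi-character ones need a different rule.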
Janko 2-Dec-2009 [4636x2] | ok , you again found a solution to my specific problem :)) |
BUT .. what if I want to have control there .. or if for the sake of example it's a more complex multicharacter difference like "<DOT>" "<EXCLAMATION>" | |
Graham 2-Dec-2009 [4638] | Janko, best thing to do is show us a string you can't parse ... and someone will show you how to do it. |
Janko 2-Dec-2009 [4639x4] | >> parse "I like Apple . I like Windows ! I like Linux . I like Amiga ." [ some [ thru "I like" copy IT [to "." ( prin "so so: ") | to "!" (prin "very much: ") ] (print IT) ]] so so: Apple so so: Windows ! I like Linux so so: Amiga |
I don't have real example right now :) I had them few times before and I also asked here about them and I solved with your help somehow | |
I just started talking about this as a general limitation of parse that I meet a lot of times and I suppose Paul could have met it when trying to parse CSV | |
janko ,"some\"thing92!","graham" I am not sure but I think here you have the same problem | |
Gregg 2-Dec-2009 [4643x3] | It's not necessarily a PARSE limitation, but there are things we'd like PARSE to do that aren't always reasonable. :-) TO and THRU can work very well, but that doesn't mean they'll work for every situation. You may have to use rules where you check for your target value or just SKIP, marking locations in the input as you go. |
CSV parsing is an issue, because REBOL handles some inputs well, but fails for what may be a common way things are formatted. "CSV" isn't always as simple as it sounds. | |
That said, if you know the format (e.g. WRT quotes and escapes), it can be done with PARSE. It just may not be a one-liner. | |
Janko 2-Dec-2009 [4646x2] | I know parsing csv can be messy ... at least at this high level I don't know how to do it with escapes and commas in etc |
and I know everything has limitations ... this functionality, OR taking the first alternative that appears, would in practice solve many cases for me | |
Graham 2-Dec-2009 [4648] | you have to turn off parse's default delimiters and use bitsets |
Janko 2-Dec-2009 [4649] | (aha bitsets.. I was calling them charsets upthere) |
Graham 2-Dec-2009 [4650x2] | BTW, Bolek wrote a regex engine in Rebol ... |
http://www.mail-archive.com/[rebol-bounce-:-rebol-:-com]/msg01983.html | |
Ladislav 2-Dec-2009 [4652] | Janko: the only problem is, that you cannot use: C: [to [A | B]] , where A and B are "general rules", but you can always write: C: [here: [A | B] :here | skip C] , which would do what you want |
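A sketch of Ladislav's idiom applied to the multicharacter terminators Janko mentioned, assuming REBOL 2 and PARSE/ALL. The words `dot`, `bang`, `to-ender`, `here` and `it` are illustrative names, not part of PARSE. The rule tries the alternatives at the current position, rewinds with `:here` on a match, and otherwise skips one character and retries:

```rebol
dot:  "<DOT>"
bang: "<EXCLAMATION>"

; Advance to the nearest position where DOT or BANG matches, without consuming it.
to-ender: [here: [dot | bang] :here | skip to-ender]

parse/all "I like Apple<DOT>I like Windows<EXCLAMATION>I like Linux<DOT>" [
    some [
        "I like " copy it to-ender
        [dot (prin "so so: ") | bang (prin "very much: ")]
        (print it)
    ]
]
; prints:
; so so: Apple
; very much: Windows
; so so: Linux
```

Note that `to-ender` recurses once per skipped character, so on very long gaps between terminators a non-recursive variant may be preferable.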
Oldes 2-Dec-2009 [4653x2] | Just would like to remind you that there is something like R3 where: >> parse "I like Apple . I like Windows ! I like Linux . I like Amiga ." [any ["I like " copy x to [" ." | " !"] (probe x) to "I like "]] Apple Windows Linux Amiga |
And Janko... if you don't use charsets at all, I think you should give it a try. It's not so difficult. I think that if I can write a parser to colorize PHP code, then you can parse everything. | |
Janko 3-Dec-2009 [4655x4] | Ladislav, thanks.. I didn't know you could set the position back with :here , that is interesting and probably expands what you can do with parse a lot. |
Oldes if that is in R3 >> copy x to [" ." | " !"] << this is exactly as I was proposing above :) , very nice! I know I have to .. I haven't really needed them yet I guess, I solved some things less elegantly in other ways without them. I intend to take the plunge next time I need them. | |
yes, you are right .. if you can write a parser for php then you can make anything with it. I always supposed parse with charsets is like low level: step by one char in a loop and call "events" and change states, with which you can parse anything from xml to languages .. well but parse with charsets is still much more elegant | |
but it is a level less simple and nice to use than the simple parse modes, that's why the simple ones should be powerful *if possible* too - you can't get a newbie impressed with charset parsing because he probably won't understand it. | |
Ladislav 3-Dec-2009 [4659x3] | Just to complete the list of possible equivalents to the C: [to [A | B]] rule, here is a way how to do it in Rebol3 parse: C: [while [and [A | B] break | skip | reject]] you can find other equivalent idioms at http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse#Parse_idioms |
I didn't know you could set the position back with :here - you can set the position back even without :here, the choice operator is sufficient for you to be able to do that, see the above idioms as an example | |
It looks like I could have used: C: [while [and [A | B] accept | skip | reject]] | |
Graham 3-Dec-2009 [4662x2] | Janko, charset is short for make bitset! so you can call them bitsets or charsets :) |
Ladislav, what 'choice operator? | |
BrianH 4-Dec-2009 [4664] | | |
jack-ort 11-Dec-2009 [4665] | Help! Still struggling to understand parse. How could I replace any and all SINGLE occurrences of the single-quote character anywhere in a string (beginning, middle or end) with TWO single-quotes? But if there are already TWO single-quotes together, I want to leave them alone. TIA for any and all help for a newbie! |
Maxim 11-Dec-2009 [4666x2] | easy, actually. you match double quotes first then fall back to single quotes, adding a new one and skipping one char... give me a minute I should get something working... |
R2? | |
jack-ort 11-Dec-2009 [4668] | yes, View 2.7.6 under Windows XP |
Steeve 11-Dec-2009 [4669x2] | >> parse/all str [ any [thru {"} [{"} | p: (insert p {"} skip) ]]] something like this (not tested) |
i think i misunderstood something, replace {"} by {'} maybe | |
Maxim 11-Dec-2009 [4671x2] | >> str: {1 ''2 '3 4 ' '5 ''6 '7 8 9 '0'} >> parse/all str [some [{''} | [{'} here: (insert here {'}) skip] | skip]] >> print str == {1 ''2 ''3 4 '' ''5 ''6 ''7 8 9 ''0''} |
note all ticks... ( ' ) are single quote chars in the above. | |
Steeve 11-Dec-2009 [4673] | same as mine, except i use THRU to speed up the process |
jack-ort 11-Dec-2009 [4674] | Thanks! I'm going to have to look @ this for awhile to understand why you even need to worry about the double-quote character. Much to learn.... Thanks Maxim and Steeve for the prompt replies! |
Maxim 11-Dec-2009 [4675] | print it out in the rebol console... you will see that my example doesn't have any double quote characters.. they just look like so in altme's font ;-) |
Steeve 11-Dec-2009 [4676] | corrected version with thru: >> parse/all str [ any [thru {'} [{'} | p: (insert p {'} ) skip ]]] |
jack-ort 11-Dec-2009 [4677] | Ah! when you said "...you match double quotes first then fallback to single quotes, ..." I was thinking double-quote character, not double single-quotes. Need more coffee... Thanks very much! |
Maxim 11-Dec-2009 [4678] | ( I can see that being misleading when read hehehe :-) |
Rebolek 11-Dec-2009 [4679] | Just curious, I tested both versions and Steeve's version is about 2 times faster than Maxim's :) |
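For reference, Steeve's corrected THRU rule could be wrapped in a small helper, sketched here for REBOL 2. The function name `double-single-quotes` is hypothetical. THRU jumps straight to the next single quote; a following quote means the pair is already doubled, otherwise one is inserted and skipped:

```rebol
double-single-quotes: func [
    "Doubles every lone single quote in STR (modifies STR in place)."
    str [string!]
    /local p
][
    parse/all str [
        any [
            thru "'" [
                "'"                        ; already a pair: leave it alone
                | p: (insert p "'") skip   ; lone quote: add one, step past it
            ]
        ]
    ]
    str                                    ; return the string, not PARSE's result
]

print double-single-quotes "it's ''ok'' isn't it"
; prints: it''s ''ok'' isn''t it
```

Because the string is changed in place, pass `copy str` when the original must be kept.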