World: r3wp
[Parse] Discussion of PARSE dialect
Maxim 27-Oct-2006 [1520x6] | ex: I am parsing : ABCZXYXYCBA |
I have rules to parse ABC explicitly and a fall-back which can parse anything. | |
but I'd like to detect in my fall-back that it should stop, cause I know I'm at the end. | |
the length of the string is not known, until I hit the trailing CBA | |
note... the example is simple and consider each character a different matching condition. | |
also, in reality, each letter in the above over-simplification is a word... not just one char (and there is overlap) so I can't just match charsets. | |
Gabriele 27-Oct-2006 [1526] | is BREAK what you are looking for? |
Coccinelle 27-Oct-2006 [1527x2] | I'm not sure if I understand, but perhaps this : |
rule: ["CBA" | skip rule]
parse "ABCZXYXYCBA" ["ABC" rule] | |
Ladislav 28-Oct-2006 [1529x2] | Maxim: I suppose that the trouble is, that your fall-back rule accepts empty string? If that is the case, then the rule like: [any [fall-back]] is an "endless loop". Therefore you may need something like: [any [end break | fall-back]] to be able to stop |
on the other hand, this may not be enough in some cases (e.g. if the fall-back rule isn't able to get to the end) | |
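The idea in the messages above (test the terminating condition before the fall-back on every step, and make sure the fall-back always consumes input) can be sketched outside PARSE. This is a rough Python illustration rather than REBOL; the function name `parse_with_trailer` and its parameters are hypothetical, and single characters stand in for the word-level rules Maxim describes.

```python
def parse_with_trailer(text, header="ABC", trailer="CBA"):
    """Return the text between header and trailer, or None if no match.

    Hypothetical helper: it mirrors the shape of a rule like
    [header any [trailer-at-end break | fall-back]].
    """
    if not text.startswith(header):
        return None
    start = pos = len(header)
    while pos < len(text):
        # Terminating rule first: the trailer must sit exactly at the end.
        if text.startswith(trailer, pos) and pos + len(trailer) == len(text):
            return text[start:pos]
        # Fall-back: consume exactly one character. Because it never matches
        # an empty string, the loop cannot spin forever at one position.
        pos += 1
    # The fall-back ran out of input without ever seeing the trailer.
    return None
```

With the example from the thread, `parse_with_trailer("ABCZXYXYCBA")` yields the middle section `"ZXYXY"`. A fall-back that could succeed without advancing `pos` would reproduce the endless loop Ladislav warns about.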
Maxim 28-Oct-2006 [1531] | the break seems to be what I am looking for, I'll test something out and if it's not conclusive I will come back with a better example :-) thanks guys. |
Graham 25-Nov-2006 [1532] | Posted on reboltalk ...
>> parse/case "AAABBBaaaBBBAAAaaa" "A"
== ["" "" "" "BBBaaaBBB" "" "" "aaa"]
how come there are only two "" after the BBBaaaBBB ? |
Henrik 25-Nov-2006 [1533] |
>> parse/case "AAABBBaaaAAA" "A"
== ["" "" "" "BBBaaa" "" ""]
>> parse/case "BAAABBBaaaAAA" "A"
== ["B" "" "" "BBBaaa" "" ""]
>> parse/case "BA" "A"
== ["B"]
hmmm... |
Ladislav 25-Nov-2006 [1534] | it's OK, because every A means one closing #"^"". The first A was used to close the "...a" string |
Anton 25-Nov-2006 [1535] | Yep, makes sense to me. |
Ingo 26-Nov-2006 [1536] | This may make it easier for some, just exchange the "A"s for "," and mentally read it like you would read a csv file:
>> parse/case ",,,BBBaaaBBB,,,aaa" ","
== ["" "" "" "BBBaaaBBB" "" "" "aaa"] |
Anton 26-Nov-2006 [1537] | It's like cutting a piece of wood. You only cut twice but you end up with three pieces. |
Maxim 26-Nov-2006 [1538] | but parse does have an inconsistency:
>> parse/all "/1/2/3/" "/"
== ["" "1" "2" "3"]
>> parse/all "/1/2/3" "/"
== ["" "1" "2" "3"]
two different strings on entry, the same output. IMHO the first example should have an extra trailing "" in the block. |
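One way to read the results quoted so far is that each delimiter terminates a field instead of separating two fields, so a delimiter at the very end closes the last field without opening a new one. The Python sketch below is only a model that happens to reproduce the outputs quoted in this thread; it is not REBOL's implementation, the name `split_terminated` is hypothetical, and it ignores whitespace and quote handling.

```python
def split_terminated(text, delimiter):
    """Split text, treating each delimiter as the END of a field.

    Hypothetical model; assumes a single non-empty delimiter string.
    """
    fields = text.split(delimiter)
    # str.split treats the delimiter as a separator, so a trailing delimiter
    # leaves an empty last piece. Under the terminator reading that piece
    # does not exist, so it is dropped.
    if fields[-1] == "":
        fields.pop()
    return fields


# The two inputs Maxim compares collapse to the same result.
print(split_terminated("/1/2/3/", "/"))
print(split_terminated("/1/2/3", "/"))
```

Under this reading, "/1/2/3/" and "/1/2/3" are indistinguishable after splitting, which is exactly the loss of information Maxim objects to.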
Anton 26-Nov-2006 [1539] | Is that an inconsistency or are we just not sure what the definition of the separator string is ? |
Maxim 26-Nov-2006 [1540] | huh? not sure I get what you mean... how can the above be desired? it mangles the symmetry of the data and the tokenizing. for example it strips the trailing / of a dir... |
Anton 26-Nov-2006 [1541] | I'm with you, but what is the documented definition of the parse separator ? |
Maxim 27-Nov-2006 [1542] | the function's doc string doesn't even mention it ! it's a special mode ... in the dict it says: "There is also a simple parse mode that does not require rules, but takes a string of characters to use for splitting up the input string." so not very explicit. |
Anton 27-Nov-2006 [1543x2] | That's pretty much how I remember it. |
So the problem might be that we don't know how it's supposed to work. Maybe the implementor wasn't too clear how it should work either. From memory there was an "inconsistent case" which actually had a use - for something like splitting command-line args. But anyway, a clearer definition would be good. | |
Maxim 27-Nov-2006 [1545] | at least the above oddity should be documented, cause one can get bitten until encountering the above... in my case, it renders the above almost useless, as I cannot trust the output. |
Gabriele 27-Nov-2006 [1546] | that parse mode was intended to make parsing CSV easier. may not work with all the CSV variants though. |
Maxim 27-Nov-2006 [1547] | do you agree that the docs are misleading in their current form? |
Gabriele 27-Nov-2006 [1548] | they are at least incomplete. |
Anton 27-Nov-2006 [1549] | Better to have a simple and consistent core and enable particular modes for specific uses with refinements. |
Pekr 5-Dec-2006 [1550x2] | I would like to ask - could anything be done to produce parsers for XML-related MLs? Or do you guys find the existing parse facilities strong enough, and XML is simply so complex that we lack a full-spec XML parser? |
Just asking, because today I read a bit about ODF and OpenXML (two document formats for office apps). There is probably open space for small apps, parsing some info from inside the documents etc. (meta-data programming) ... just curious ... or will it be better to wait for full-spec XML MLs libs, doing the job given, and link to those libraries? | |
BrianH 5-Dec-2006 [1552] | Such a thing has been on my todo list for a while, but I've been a little busy lately with non-REBOL projects :( |
Gregg 5-Dec-2006 [1553] | I don't want to deal with XML beyond simple well-formed XML, too complex. I don't, personally, have any interest in doing generic XML toolkit stuff at this point. I can see value in it for some people, but I'd rather write REBOL dialects. :-) |
Maxim 8-Dec-2006 [1554x2] | geomol's xml2rebxml handles XML pretty well. one might want to change the parse rules a little to adapt the output, but it actually loads all the xml tags, empty tags and attributes. it even handles utf-8, CDATA chunks, and converts some of the & chars. |
I am using an adapted form of it commercially so far. I have implemented full schema validation and loading (in rebol) but it's proprietary code I can't release. So guys, it can be done ! | |
Allen 10-Dec-2006 [1556] | I'm starting to see some abandonment of XML in favour of JSON .. mainly in web 2.0 . but it will not replace xml where validation is required. |
BrianH 11-Dec-2006 [1557] | You really have to trust your source when using JSON to a browser though. Standard usage is to load with eval - only safe to use on https sites because of script injection. |
[unknown: 9] 11-Dec-2006 [1558] | XML and JSON sucks... |
Maxim 11-Dec-2006 [1559] | is there a way to make block parsing case sensitive? this doesn't seem to work:
parse/case [A a] [some ['A (print "upper") | 'a (print "lower")]] |
Gabriele 11-Dec-2006 [1560x2] | words are not case sensitive. |
>> strict-equal? 'A 'a == true | |
Maxim 11-Dec-2006 [1562x3] | I was just hoping case could have been an exception... it would be very useful especially when parsing code from other languages... |
(I meant using /case within parse) | |
well, seems like I'll be doing string parsing then :-) | |
Gabriele 11-Dec-2006 [1565x3] | you could take advantage of this bug: |
>> alias 'a "aa"
== aa
>> strict-equal? 'A 'a
== false | |
but it will be fixed eventually :P | |
Maxim 11-Dec-2006 [1568x2] | hehe... I would not want the bug to get too comfortable, lest it become a feature ;-) |
you know what they say... "features are bugs with experience" | |