World: r3wp
[Parse] Discussion of PARSE dialect
BrianH 6-Nov-2008 [2883] | Like this:
    parse port rule1 ; cache gone
    parse port rule2 ; picks up where the previous left off |
Anton 6-Nov-2008 [2884x3] | Yes, I was just thinking what would happen if you var:, DISPENSE, then :var afterwards. Should DISPENSE update the index of var (and any other vars) when the internal parse index shortens ? |
Does your incremental parse example assume that there is enough data in the cache to complete each rule ? | |
Maybe we should study Doc's postgresql driver :) | |
BrianH 6-Nov-2008 [2887x4] | Oh, set-words and get-words would not work the same with R3 ports. You wouldn't be able to use them the same way in the code blocks for instance. This is because for R3 ports the position is a property of the port rather than the port reference, so those set-words would be setting the word to the numeric index of the current position of that port, and the get-word would correspond to seeking to that index. In the code blocks the words would only contain integer! values - you would need a separate reference to the port for them to make sense. |
The new port model would make PARSE on ports completely different. You would only be able to parse seekable ports if you want to use set-words and get-words, and you might just be able to rely on the internal port caching. This might be easier than we thought. | |
In theory you could even do something like block parsing on event ports, like SAX pull. Same seekable restrictions apply - no backtracking or position setting or getting unless the port supports seeking. | |
That would shunt the cache management into the port scheme :) | |
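For contrast with the port behaviour BrianH describes, here is a small sketch of how positions work for ordinary series in REBOL 2, where the position belongs to each reference and a set-word inside PARSE captures such a reference. The word names (s, t, here) are only illustrative; nothing here touches R3 ports.

```rebol
s: "abcdef"
t: next s               ; second reference to the same string, one step further on
print index? s          ; 1 - the position is stored in the reference s
print index? t          ; 2 - t carries its own position

print parse/all s [
    "ab"
    here:               ; set-word: HERE now refers to the string at the parse position
    (print index? here) ; 3 - the numeric index is derived from the reference
    "cd"
    :here               ; get-word: move the parse position back to HERE
    "cdef"              ; match again from index 3 to the end
]                       ; PARSE returns true
```

Under the port model described above, `here` would hold only the integer 3, and `:here` would amount to a seek on the port, which is why a non-seekable port could not support either word.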
Anton 6-Nov-2008 [2891x2] | Ah, that makes sense. My model of how parse would handle ports was wrong. I was assuming it would work just like string parse, except working on a limited buffer, supplied by the port. |
Block parsing ? How are you going to do that when you can't even see the final ']' in the buffer yet ? | |
BrianH 6-Nov-2008 [2893x2] | With seekable ports the buffering is handled by the ports, rather than provided by them. I wonder if there will be cache control APIs :) |
By "something like block parsing", I mean ports that return other REBOL values than bytes or characters can be parsed as if the values were contained in a block and being parsed there. Any buffering of these values would be handled by the port scheme code. Only whole REBOL values would be returned by such ports, so any inner blocks returned would be parsed by INTO as actual blocks. | |
Anton 6-Nov-2008 [2895] | Hmm.. that could work. I suppose the outermost block that usually encompasses loaded rebol data would have to be "ignored". |
BrianH 6-Nov-2008 [2896x2] | No, it would be virtual :) |
Actually, there are no [ and ] in REBOL blocks once they are loaded. Block parse works on data structures. | |
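A minimal REBOL 2 sketch of the point about loaded data: the source text below has no enclosing brackets, yet LOAD hands back a block, and block PARSE then walks values rather than characters. The word name `data` is just for illustration.

```rebol
; No outer [ ] in the text, but the loaded result is still a block:
data: load "a [b c] d"
print block? data           ; true
print length? data          ; 3 - the word a, an inner block, the word d
print block? second data    ; true - the inner block is a real block value

; Block parsing matches whole values; INTO descends into the inner block:
print parse data ['a into ['b 'c] 'd]    ; true
```

A value-returning port parsed this way would presumably behave like `data` here: the enclosing block never exists as text, only as the sequence of values the port delivers.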
Anton 6-Nov-2008 [2898] | 'Virtual' is the right word. |
Pekr 6-Nov-2008 [2899] | I was thinking along the same lines as Anton - that it would work like parsing a string, using some limited buffer ... |
BrianH 6-Nov-2008 [2900x2] | Ports don't work like series in R3. If anything, port PARSE would simplify port handling by making seekable ports act more like series. |
I gotta suggest this to Carl :) | |
Anton 6-Nov-2008 [2902] | At least if you could add "3.12 port parsing" to the Parse_Project page... :) |
Pekr 6-Nov-2008 [2903] | OTOH - I never did any binary format parsing. Oldes has some experience here IIRC. Dunno how encoders/decoders will be built, maybe those will be in native C code anyway ... |
Tomc 6-Nov-2008 [2904] | the potential for backtracking is initiated by setting a placeholder, i.e. :here; caching only as far back as the earliest current placeholder may be sufficient |
BrianH 6-Nov-2008 [2905x5] | There are three operations that can cause you to change your position from the standard forward-on-recognition: get-words (:a), alternation ( | ) and REVERSE. You can check for alternation because it will always be within the current rule block. Get-words and REVERSE may be in inner blocks that may change. |
Here's an example of what you could do with the PARSE proposals:
    use [r d f] [ ; External words from standard USE statement
        parse f: read d: %./ r: [
            use [d1 f p] [ ; These words override the outer words
                any [
                    ; Check for directory filename
                    (d1: d) ; This maintains a recursive directory stack
                    p: ; Save the position
                    change [
                        ; This rule must be matched before the change happens
                        ; Set f to the filename if it is a directory else fail
                        set f into file! [to end reverse "/" to end]
                        ; f is a directory filename, so process it
                        (
                            d: join d f ; Add the directory name to the current path
                            f: read d ; Read the directory into a block
                        )
                        ; f is now a block of filenames.
                    ] f ; The file is now the block read above
                    :p ; Go back to the saved position
                    into block! r ; Now recurse into the new block
                    (d: d1) ; Pop the directory stack
                    ; Otherwise backtrack and skip
                    | skip
                ] ; end any
            ] ; end use
        ] ; end parse
        f ; This is the expanded directory block
    ] | |
I could probably save that p position word using FAIL and backtracking :) | |
Here's a revised version with more of the PARSE proposals:
    use [r d res] [ ; External words from standard USE statement
        parse res: read d: %./ r: [
            use [ds f] [ ; These words override the outer words
                any [
                    ; Check for directory filename
                    (ds: d) ; This maintains a recursive directory stack
                    [ ; Save the position through alternation
                        change [
                            ; This rule must be matched before the change happens
                            ; Set f to the filename if it is a directory else fail
                            set f into file! [to end reverse "/" to end]
                            ; f is a directory filename, so process it
                            (
                                d: join d f ; Add the directory name to the current path
                                f: read d ; Read the directory into a block
                            )
                            ; f is now a block of filenames.
                        ] f ; The file is now the block read above
                        fail ; Backtrack to the saved position
                        | into block! r ; Now recurse into the new block
                    ]
                    (d: ds) ; Pop the directory stack
                    ; Otherwise backtrack and skip
                    | skip
                ] ; end any
            ] ; end use
        ] ; end parse
        res ; This is the expanded directory block
    ] | |
Sorry, somehow those became tabs :( | |
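For comparison, the same result (a directory listing in which each subdirectory name is replaced by a block of its contents) can be sketched as a plain recursive function in REBOL 2, without any of the proposed PARSE extensions. This is not BrianH's approach; `expand-dir` is a hypothetical name, and the sketch relies on READ returning directory names with a trailing "/", as the examples above also assume.

```rebol
expand-dir: func [
    "Return the contents of DIR, with each subdirectory replaced by a block of its contents."
    dir [file!]
    /local out
] [
    out: copy []
    foreach f read dir [
        ; A trailing slash marks a directory name in the result of READ
        append/only out either #"/" = last f [
            expand-dir join dir f   ; recurse with the extended path
        ] [
            f                       ; ordinary file: keep the name as is
        ]
    ]
    out
]

probe expand-dir %./
```

The recursion here plays the role that `into block! r` plays in the PARSE versions, and the function's own call stack replaces the explicit directory stack kept in d1 or ds.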
Pekr 6-Nov-2008 [2910] | Don't know why, but most of the time when parsing CSV structure I have to do something like: parse/all append item ";" ";" Simply put, to get all columns, I need to add the last semicolon to the input string ... |
BrianH 6-Nov-2008 [2911] | Show an example string that requires that hack and maybe we can help. |
Pekr 6-Nov-2008 [2912] | http://www.rebol.net/cgi-bin/rambo.r?id=3813& |
BrianH 6-Nov-2008 [2913] | I remember that. It shouldn't be as much of a problem when the ordinal functions return none rather than out-of-bounds errors.... Still, I'll bring it up. |
Tomc 6-Nov-2008 [2914] | comes from data using separators instead of terminators ... I use '| and have a command line "tailpipe" script to fix data |
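One way to avoid appending a trailing ";" is to split with an explicit rule that treats ";" as a separator and always emits the final field, even when it is empty. The following is a sketch for REBOL 2 only; `split-fields` is a hypothetical helper name, and the `any [f copy ""]` guard assumes that COPY sets the word to none when the matched span is empty.

```rebol
split-fields: func [
    "Split LINE on semicolons, keeping empty fields (including a trailing one)."
    line [string!]
    /local fields f
] [
    fields: copy []
    parse/all line [
        ; Every field that is followed by a separator
        any [copy f to ";" skip (append fields any [f copy ""])]
        ; The last field has no separator after it and may be empty
        copy f to end (append fields any [f copy ""])
    ]
    fields
]

probe split-fields "a;;c;"   ; intended: four fields, the second and fourth empty
```

Because the last field is handled by its own rule, input that ends in a separator and input that does not go through the same code path, which is the separator-versus-terminator distinction Tomc points out.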
Steeve 7-Nov-2008 [2915] | is that all folks ? |
BrianH 7-Nov-2008 [2916x5] | Aside from a bugfix in the last example I gave (forgot the only) I would say yes for now. There will be more changes when Carl gets back to this so that we can discuss his proposals. Everyone else's proposals seem to have been covered except THROW (which also needs Carl's feedback). Incorporating COLLECT and KEEP into PARSE is both unnecessary and doesn't help at all for building hierarchical structures. PARSE doesn't have anything to do with parsing REBOL's syntax, so Graham's problems are out-of-scope. If you have more ideas, this group or the same group in the alpha world is the place to bring them up. |
Changes to simple parsing (not rule-based) are out of scope, but have been brought up nonetheless. Parsing or ports is also out of scope for the proposals document, but will also be brought up. Everything in Gabriele's PARSE REP page has been covered or rejected (except THROW). | |
Here is the page with the PARSE syntax requests - see for yourself: http://www.rebol.net/wiki/Parse_Project | |
Parsing or ports -> Parsing of ports | |
That page is it unless we get more suggestions. We haven't decided what makes the cut yet even for those. | |
Steeve 7-Nov-2008 [2921x2] | hum (i have to be a little bit rude), i just read your response on rebol.net about whether or not to turn return into a more generalized EMIT function (as i proposed). I will not discuss the difficulty of implementing that idea (i don't have the sources). But what i can say is that a COLLECT behaviour would be more useful than all the return and break/return stuff u posted. Have you inspected the scripts on Rebol.org recently ? If u had, you would see that many coders use parsing to collect data. The problem Graham, is that when i read your arguments, i have the unpleasant impression that you alone decide if an idea is bad or good. The narrow-minded sentence "Incorporating COLLECT and KEEP into PARSE is both unnecessary and doesn't help at all for building hierarchical structures" suggests that you have not widely used parse in your code. I don't think you are the best person here to make these choices. Many script contributors on Rebol.org have made some masterful pieces using parse (not you). So when you reject an idea you should be more sensitive to this simple fact: many people here have equal or better experience with parsing than you. |
by the way, many people have proposed the ideas you posted in the wiki (just read some scripts on Rebol.org); you should be a little less quick to credit yourself with ideas that have been around for several years. | |
Anton 7-Nov-2008 [2923] | (Steeve, I think you are addressing BrianH, not Graham.) |
Steeve 7-Nov-2008 [2924x3] | really ? |
oh my... | |
yes i talk to BrianH, what do u mean ? | |
Anton 7-Nov-2008 [2927] | You wrote above, "The problem Graham, is that when i read your arguments..." |
Steeve 7-Nov-2008 [2928x2] | oh i see, my Apologies to Graham |
I was a little upset when I wrote it ;-) | |
Pekr 8-Nov-2008 [2930x2] | uhmm, well, Steeve, as for me, if my proposal is going to be implemented, I don't care if I am credited or not. Because - parser REPs have been floating around here for some 8 years maybe :-) As for BrianH and his judgements - he might not be better at parse than others, but I would not try to upset him - BrianH is our guru here. Along with Gabriele, Cyphre, and after the loss of Ladislav, he is one of the most skilled rebollers. I think that his intention is to help make REBOL better. He might also be the one who will bring a JIT or compiler in the future, and he understands the consequences of what he suggests ... |
I have to ask - which ppl are you referring to, regarding rebol.org? Why are they not here, or posting to the blog? BrianH might be quick in his decisions, because Carl selected him to collect the ideas, so let's forgive him a little bit of guru behaviour :-) And in the end, it is Carl who decides if a REP is going to be implemented or not. If you have another pov on some REP, why not talk about it here, where more ppl can judge? | |
BrianH 8-Nov-2008 [2932] | I'm not angry, promise :) |