r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Anton
6-Nov-2008
[2870]
Yes, that's another mode, suitable for files (but not internet radio).
Pekr
6-Nov-2008
[2871]
I was thinking about Amiga-like datatypes, done in REBOL. Such decoders 
could be slow though ....
BrianH
6-Nov-2008
[2872]
Interesting, but you wouldn't need DISPENSE if your rules don't have 
alternates to backtrack to (statically determinable).
Anton
6-Nov-2008
[2873x2]
Yeah, I hadn't thought of that.
Perhaps there are cases where alternates are not a good method of 
determining when to dispense buffer data ?
BrianH
6-Nov-2008
[2875x2]
Of course "statically determinable" means that you wouldn't be able 
to modify the rule block that PARSE is currently working on (which 
would likely crash PARSE anyways).
Well, if you have no alternate, you have no backtracking, so you 
can dispose on the way.
Anton
6-Nov-2008
[2877]
What about REVERSE ?
BrianH
6-Nov-2008
[2878]
Ah, that would require buffering. Darn.
Anton
6-Nov-2008
[2879]
And also set-words..
BrianH
6-Nov-2008
[2880]
We're back to Robert's "cache the whole thing" :(
Anton
6-Nov-2008
[2881]
and Anton's DISPENSE when you know you ain't goin' backwards from 
here.
BrianH
6-Nov-2008
[2882x2]
And crashing PARSE if you DISPENSE something that you need to go 
back to. That might be better done with incremental parsing.
Like this:
    parse port rule1
    ; cache gone
    parse port rule2 ; picks up where the previous left off
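The incremental idea can be tried today on an ordinary string. This is only a REBOL 2 sketch that stands in for a port: the data and the word `rest` are made up for illustration, and nothing here shows how PARSE on R3 ports would actually behave.

```rebol
data: "HEAD:body"
rest: none

; First pass: match the header and remember where it ended
print parse/all data ["HEAD:" rest: to end]   ; true

; Second pass: start from the saved position, so the header is never revisited
print parse/all rest ["body"]                 ; true
```

Because the second call only sees the series from `rest` onwards, everything before that position could be discarded between the two calls.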
Anton
6-Nov-2008
[2884x3]
Yes, I was just thinking what would happen if you var:, DISPENSE, 
then :var afterwards.

Should DISPENSE update the index of var (and any other vars) when 
the internal parse index shortens ?
Does your incremental parse example assume that there is enough data 
in the cache to complete each rule ?
Maybe we should study Doc's postgresql driver :)
BrianH
6-Nov-2008
[2887x4]
Oh, set-words and get-words would not work the same with R3 ports. 
You wouldn't be able to use them the same way in the code blocks 
for instance. This is because for R3 ports the position is a property 
of the port rather than the port reference, so those set-words would 
be setting the word to the numeric index of the current position 
of that port, and the get-word would correspond to seeking to that 
index. In the code blocks the words would only contain integer! values 
- you would need a separate reference to the port for them to make 
sense.
The new port model would make PARSE on ports completely different. 
You would only be able to parse seekable ports if you want to use 
set-words and get-words, and you might just be able to rely on the 
internal port caching. This might be easier than we thought.
In theory you could even do something like block parsing on event 
ports, like SAX pull. Same seekable restrictions apply - no backtracking 
or position setting or getting unless the port supports seeking.
That would shunt the cache management into the port scheme :)
Anton
6-Nov-2008
[2891x2]
Ah, that makes sense. My model of how parse would handle ports was 
wrong. I was assuming it would work just like string parse, except 
working on a limited buffer, supplied by the port.
Block parsing ? How are you going to do that when you can't even 
see the final ']'  in the buffer yet ?
BrianH
6-Nov-2008
[2893x2]
With seekable ports the buffering is handled by the ports, rather 
than provided by them. I wonder if there will be cache control APIs 
:)
By "something like block parsing", I mean ports that return other 
REBOL values than bytes or characters can be parsed as if the values 
were contained in a block and being parsed there. Any buffering of 
these values would be handled by the port scheme code. Only whole 
REBOL values would be returned by such ports, so any inner blocks 
returned would be parsed by INTO as actual blocks.
Anton
6-Nov-2008
[2895]
Hmm.. that could work. I suppose the outermost block that usually 
encompasses loaded rebol data would have to be "ignored".
BrianH
6-Nov-2008
[2896x2]
No, it would be virtual :)
Actually, there are no [ and ] in REBOL blocks once they are loaded. 
Block parse works on data structures.
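A minimal sketch of that point, using made-up data rather than anything from a port: after LOAD the brackets are gone, the inner block is a single value, and INTO descends into it.

```rebol
data: load "name [1 2 3] done"

; LOAD returns a block of three values: a word, a block and a word
print length? data                                    ; 3
print block? second data                              ; true

; INTO matches the inner block value and parses its contents
print parse data [word! into [some integer!] word!]   ; true
```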
Anton
6-Nov-2008
[2898]
'Virtual' is the right word.
Pekr
6-Nov-2008
[2899]
I was thinking along the same lines as Anton - that it would work like 
parsing a string, using some limited buffer ...
BrianH
6-Nov-2008
[2900x2]
Ports don't work like series in R3. If anything, port PARSE would 
simplify port handling by making seekable ports act more like series.
I gotta suggest this to Carl :)
Anton
6-Nov-2008
[2902]
At least if you could add "3.12 port parsing" to the Parse_Project 
page... :)
Pekr
6-Nov-2008
[2903]
OTOH - I have never done any binary format parsing. Oldes has some experience 
here IIRC. Dunno how encoders/decoders will be built, maybe those 
will be in native C code anyway ...
Tomc
6-Nov-2008
[2904]
the potential for backtracking is initiated by setting a placeholder 
  i.e. :here  

caching only as far back as the earliest current placeholder may 
be sufficient
BrianH
6-Nov-2008
[2905x5]
There are three operations that can cause you to change your position 
from the standard forward-on-recognition: get-words (:a), alternation 
( | ) and REVERSE. You can check for alternation because it will 
always be within the current rule block. Get-words and REVERSE may 
be in inner blocks that may change.
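Two of those three operations already exist in REBOL 2 and can be seen on a plain string. This is just an illustrative sketch (the word `here` and the input are arbitrary); REVERSE is one of the proposals, so it is not shown.

```rebol
; Alternation: "abd" fails part-way, so PARSE goes back to where the
; block started and tries the second alternative from there.
print parse/all "abc" ["abd" | "abc"]            ; true

; Get-word: HERE: saves the position and :HERE jumps back to it,
; so the same input gets matched a second time.
print parse/all "abc" [here: "ab" :here "abc"]   ; true
```

In both cases the input before the current position has to still be available, which is why these are the cases that matter for buffering.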
Here's an example of what you could do with the PARSE proposals:

use [r d f] [ ; External words from standard USE statement
    parse f: read d: %./ r: [
        use [d1 f p] [ ; These words override the outer words
            any [
            ; Check for directory filename
                (d1: d) ; This maintains a recursive directory stack
                p: ; Save the position
                change [ ; This rule must be matched before the change happens
                    ; Set f to the filename if it is a directory else fail
                    set f into file! [to end reverse "/" to end]
                    ; f is a directory filename, so process it
                    (
                        d: join d f ; Add the directory name to the current path
                        f: read d   ; Read the directory into a block
                    )
                    ; f is now a block of filenames.
                ] f ; The file is now the block read above
                :p  ; Go back to the saved position
                into block! r ; Now recurse into the new block
                (d: d1) ; Pop the directory stack
            ; Otherwise backtrack and skip
                | skip
            ] ; end any
        ] ; end use
    ] ; end parse
    f ; This is the expanded directory block
]
I could probably save that p position word using FAIL and backtracking 
:)
Here's a revised version with more of the PARSE proposals:

use [r d res] [ ; External words from standard USE statement
    parse res: read d: %./ r: [
        use [ds f] [ ; These words override the outer words
            any [
            ; Check for directory filename
                (ds: d) ; This maintains a recursive directory stack
                [ ; Save the position through alternation
                    change [ ; This rule must be matched before the change happens
                        ; Set f to the filename if it is a directory else fail
                        set f into file! [to end reverse "/" to end]
                        ; f is a directory filename, so process it
                        (
                            d: join d f ; Add the directory name to the current path
                            f: read d   ; Read the directory into a block
                        )
                        ; f is now a block of filenames.
                    ] f ; The file is now the block read above
					fail ; Backtrack to the saved position
					|
					into block! r ; Now recurse into the new block
				]
                (d: ds) ; Pop the directory stack
            ; Otherwise backtrack and skip
                | skip
            ] ; end any
        ] ; end use
    ] ; end parse
    res ; This is the expanded directory block
]
Sorry, somehow those became tabs :(
Pekr
6-Nov-2008
[2910]
Don't know why, but most of the time when parsing CSV structure I 
have to do something like:

parse/all append item ";" ";" 


Simply put, to get all columns, I need to add the last semicolon 
to the input string ...
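One way to avoid appending the extra semicolon is a rule-based split. This is only a sketch: it assumes the problem is the trailing empty field being dropped, and `split-fields` is a hypothetical helper, not something from the RAMBO ticket.

```rebol
split-fields: func [
    "Split LINE on semicolons, keeping empty fields (hypothetical helper)"
    line [string!]
    /local fields field
][
    fields: copy []
    parse/all line [
        any [
            ; One field followed by its separator
            copy field to ";" skip
            (append fields any [field copy ""])
        ]
        ; Whatever is left after the last separator, possibly nothing
        copy field to end
        (append fields any [field copy ""])
    ]
    fields
]

probe split-fields "a;b;"   ; ["a" "b" ""]
```

The `any [field copy ""]` is there because COPY in REBOL 2 sets the word to none when nothing was matched.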
BrianH
6-Nov-2008
[2911]
Show an example string that requires that hack and maybe we can help.
Pekr
6-Nov-2008
[2912]
http://www.rebol.net/cgi-bin/rambo.r?id=3813&
BrianH
6-Nov-2008
[2913]
I remember that. It shouldn't be as much of a problem when the ordinal 
functions return none rather than out-of-bounds errors....
Still, I'll bring it up.
Tomc
6-Nov-2008
[2914]
comes from data using separators instead of terminators ... I use 
'|  and have  a command line "tailpipe" script to fix data
Steeve
7-Nov-2008
[2915]
is that all folks ?
BrianH
7-Nov-2008
[2916x4]
Aside from a bugfix in the last example I gave (forgot the only) 
I would say yes for now. There will be more changes when Carl gets 
back to this so that we can discuss his proposals. Everyone else's 
proposals seem to have been covered except THROW (which also needs 
Carl's feedback). Incorporating COLLECT and KEEP into PARSE is both 
unnecessary and doesn't help at all for building hierarchical structures. 
PARSE doesn't have anything to do with parsing REBOL's syntax, so 
Graham's problems are out-of-scope. If you have more ideas, this group or 
the same group in the alpha world is the place to bring them up.
Changes to simple parsing (not rule-based) are out of scope, but 
have been brought up nonetheless. Parsing or ports is also out of 
scope for the proposals document, but will also be brought up. Everything 
in Gabriele's PARSE REP page has been covered or rejected (except 
THROW).
Here is the page with the PARSE syntax requests - see for yourself: 
http://www.rebol.net/wiki/Parse_Project
Parsing or ports -> Parsing of ports