r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Henrik
5-Mar-2008
[2472]
well, he doesn't like the serialization syntax and he won't reduce 
which is a security problem (always wise though)
btiffin
5-Mar-2008
[2473]
reduce/only is safe for that no?
Henrik
5-Mar-2008
[2474]
evaluating words can still be unsafe
btiffin
5-Mar-2008
[2475]
Gee, I guess to be secure you need   reduce/only exclude query system/words 
[integer! string! ...]
Henrik
5-Mar-2008
[2476]
or just act on words in your dialect :-)
btiffin
5-Mar-2008
[2477]
Yeah, but ...  :)
Ingo
5-Mar-2008
[2478]
I know it's already been beaten to death, but I guess you don't want 
to support all of REBOL's datatypes, so what is wrong with listing 
them explicitly?

>> types: ['string! | 'integer! ]                         
== ['string! | 'integer!]
>> data: ["age" integer! "name" string!]                        
== ["age" integer! "name" string!]
>> data2: ["age" integer! "name" string! "gobbledygook" object!]
== ["age" integer! "name" string! "gobbledygook" object!]
>> parse data [some [string! types]]                            
== true
>> parse data2 [some [string! types]]                           
== false
Gregg
5-Mar-2008
[2479]
I'm with Ingo on this. And as far as "being simple", this isn't really. 
:-) When I've needed to parse for datatypes, I either reduce/compose 
or set up rules for the types.
[unknown: 5]
5-Mar-2008
[2480x4]
Henrik, pretty much all of them.
Hi Ingo, I'm planning on supporting most of the REBOL datatypes, which 
makes for a very long list when you consider that REBOL has 54 of them.
So setting types to all of those is not very efficient.  At this 
point using parse to do this is, as Gregg said, not "simple".
So my next question is if we were to wish for something to be added 
to REBOL to make this task easier and submit it to RAMBO what would 
be the best way to describe what is desired?
BrianH
5-Mar-2008
[2484x3]
We have already put together a set of requests to enhance PARSE. 
This problem could be solved by at least 3 of them.
You should probably exclude function types from your acceptable types 
to store in your database, as well as library! and a few others.
Right now, the only thing that is protecting REBOL from serialized 
functions and objects is the fact that their bindings are not deserialized 
properly. Small blessings, I guess. In the meantime, screen your 
data.
[unknown: 5]
5-Mar-2008
[2487]
Right now I have a solution in place for the database and have decided 
to continue to allow the types to be inputted.  The pros outweigh 
the cons, in my opinion, for my application.
Gregg
6-Mar-2008
[2488]
"So setting types to all of those is not very efficient."

 -- Do you mean in the parsing, or in the time it takes to set up 
 the rule(s)?
BrianH
6-Mar-2008
[2489]
You could write a script to generate the rules. It could be faster 
than writing them directly.
[unknown: 5]
6-Mar-2008
[2490]
I'm not worried about the coding, I'm concerned about the performance. 
 If I have to parse a million records or something then anything 
that cuts down on the amount of evaluation is necessary.
BrianH
6-Mar-2008
[2491x4]
I'm a little curious as to why you need to have the datatype of a 
field referenced in the record at all, if you are just using the 
REBOL data model. Wouldn't the data itself have a type? It seems 
to me that specified datatypes of fields would only need to be specified 
once per table.
This assumes that you aren't taking advantage of REBOL's type system 
to do SQLite-style manifest typing.
If you are doing type specifications to validate records, the fastest 
way to do it is to generate static validation rules based on the 
specification, then just apply the generated per row. Static validation 
rules would be faster than dynamic.
generated per row -> generated rule per row
[unknown: 5]
6-Mar-2008
[2495]
Brian, in my TRETBASE for example when a new table is created then 
one must set the fields and their datatypes such as:

["fname" string! "lname" string! "age" integer!]


but it will always be a format of [string! datatype! string! datatype! ...]
BrianH
6-Mar-2008
[2496]
That is the table spec, right? Not the row data?
[unknown: 5]
6-Mar-2008
[2497]
I have already got a solution for TRETBASE.
JohanAR
6-Mar-2008
[2498x2]
is it possible to write a parse rule that accepts something like 
[ "test" | 123 ] ?
damnit, found out already.. | was apparently a word! :D
BrianH
6-Mar-2008
[2500]
["test" '| 1 1 123]
JohanAR
16-Mar-2008
[2501]
I think my parse rules use lots of temporary variables.. How do you 
prefer to hide these?
BrianH
16-Mar-2008
[2502]
1: Hide them from whom, and why?

In general, if you want to hide something about your parse rules, 
you need to hide the parse rules altogether. That is not to say that 
it is a good idea; I've found that in most cases where someone wants 
to hide some code or variables in REBOL, they really want to do something 
else and the something else depends on the circumstances. What do 
you hope to accomplish?

2: You have to be careful with temporary variables.

REBOL parse rules are often recursive, and the temporary variables 
used with them are not. You have to be extra careful to not recurse 
to another trip through the same parse rule before you are done with 
the temporary variables in the first round, or put off setting the 
temps until just before they are used. It's not as hard as it sounds.
JohanAR
16-Mar-2008
[2503x2]
1. Hide them from myself :) I don't mind having lots of global variables 
in a small script, but I really don't like it in larger programs. 
To keep things well organized I prefer if variables aren't valid 
in a larger context than necessary, to avoid overwriting, accidental 
use etc. Does context [ ... ] add a lot of overhead btw? Maybe I should 
try to use that more often
2. I don't use a lot of recursion so far. some [...] usually works 
equally well in my applications. But it's definitely a valid point, 
and I'll try to keep it in mind
BrianH
16-Mar-2008
[2505]
The only execution overhead of context is when it is built - nothing 
extra at runtime. The memory overhead is minimal. Every word is defined 
in a context, even the global ones. Overall, using an object to wrap 
the temporary variables that your rules use is not a bad idea. As 
long as you are doing this to better manage your program and reduce 
the scope of errors, it is great.
btiffin
16-Mar-2008
[2506]
context [ ] is just a shortcut for make object! [ ] and it's 
great. The more we hide in objects, the easier it will be to share, 
or at the least, easier to use code from a variety of developer sources. 
Programming in the Many is important in our context as there are 
relatively few of us in the "many" - so far. So when even our small 
stuff is shareable we all win.
Gregg
17-Mar-2008
[2507]
I often use contexts with parsers, to contain the rules.
Oldes
17-Mar-2008
[2508]
what about 'use

tmp: 1
use [tmp] [parse "test" [copy tmp to end (probe tmp)]]
probe tmp
BrianH
17-Mar-2008
[2509x3]
USE does a bind/copy on its code block every time it is used.
That kind of overhead is usually only worth it when you can't get 
rid of concurrent use any other way.
Wait, USE may not copy in R2 - that could be even worse.
Oldes
17-Mar-2008
[2512]
I should probably not use code evaluation so much directly 
in the parse rule block, and rather call a function if I need a lot 
of temp variables to process the action.
Henrik
28-Apr-2008
[2513x2]
>> parse [>] [>]
== false
>> parse [>] ['>]
** Syntax Error: Invalid word -- '>

How do you parse that block?
(note this block can only be made without a space at the end in rebol 
2.7)
Oldes
28-Apr-2008
[2515]
I'm using helper words like:
	slash: to-lit-word first [/]
	dslash: to-lit-word "//"
	rShift: to-lit-word ">>"
	UrShift: to-lit-word ">>>"
	_greater: to-lit-word ">"
	_less: to-lit-word "<"
	_noteql: to-lit-word "<>"
	_lesseql: to-lit-word "<="
	_greatereql: to-lit-word ">="
Henrik
28-Apr-2008
[2516]
nice, thanks
Oldes
28-Apr-2008
[2517]
>> parse to-block load "<" [_less]
== true
Henrik
28-Apr-2008
[2518]
yep, works
Henrik
11-May-2008
[2519x3]
if I have a rule-block that does not exist in the same context as 
the main parse block, is there a simple way to rebind it without 
composing it into the main parse block? my current solution is to 
bind it to a temp block and use the temp block as a rule in the main 
parse block, which is less than optimal, I think.
set 'html-gen func [
    "Low level HTML dialect"

    data [none! string! tag! url! number! time! date! get-word! word! 
    block!]
    /local cmd blk header row-blk start-tag dr tr pr wr
  ] [
    if get-word? data [data: get data]

    if any [url? data string? data number? data word? data time? data 
    date? data] [out data return true]
    if none? data [return true]

    ; this is the easiest way? can we not bind directly in the parse block?
    dr: bind data-rules 'data
    tr: bind tag-rules 'data
    pr: bind page-rules 'data
    wr: bind word-rules 'data
    parse data [any [cmd: [dr | tr | pr | wr]]]
  ]
the five last lines in the function are the important ones.