World: r3wp

[Parse] Discussion of PARSE dialect

BrianH 6-Mar-2008 [2489]	You could write a script to generate the rules. It could be faster than writing them directly.
[unknown: 5] 6-Mar-2008 [2490]	I'm not worried about the coding, I'm concerned about the performance. If I have to parse a million records or something then anything that cuts down on the amount of evaluation is necessary.
BrianH 6-Mar-2008 [2491x4]	I'm a little curious as to why you need to have the datatype of a field referenced in the record at all, if you are just using the REBOL data model. Wouldn't the data itself have a type? It seems to me that specified datatypes of fields would only need to be specified once per table.
	This assumes that you aren't taking advantage of REBOL's type system to do SQLite-style manifest typing.
	If you are doing type specifications to validate records, the fastest way to do it is to generate static validation rules based on the specification, then just apply the generated per row. Static validation rules would be faster than dynamic.
	generated per row -> generated rule per row
[unknown: 5] 6-Mar-2008 [2495]	Brian, in my TRETBASE, for example, when a new table is created one must set the fields and their datatypes, such as: ["fname" string! "lname" string! "age" integer!], but it will always be in the format [string! datatype! string! datatype! ...]
BrianH 6-Mar-2008 [2496]	That is the table spec, right? Not the row data?
[unknown: 5] 6-Mar-2008 [2497]	I have already got a solution for TRETBASE.
JohanAR 6-Mar-2008 [2498x2]	is it possible to write a parse rule that accepts something like [ "test" | 123 ] ?
JohanAR 6-Mar-2008 [2498x2]	damnit, found out already.. | was apparently a word! :D
BrianH 6-Mar-2008 [2500]	["test" '| 1 1 123]
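A minimal sketch of BrianH's rule in use, assuming REBOL 2 block parsing. The word `data` is a hypothetical name. In a parse rule a bare integer is a repeat count, so `1 1 123` reads as "match the value 123 exactly once", and `'|` matches the literal word `|` instead of acting as the alternation operator.

```rebol
; the input block holds three values: a string, the word |, and an integer
data: ["test" | 123]

; "test"   matches the string
; '|       matches the literal word |
; 1 1 123  matches the integer 123 exactly once
probe parse data ["test" '| 1 1 123]
```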
JohanAR 16-Mar-2008 [2501]	I think my parse rules use lots of temporary variables.. How do you prefer to hide these?
BrianH 16-Mar-2008 [2502]	1: Hide them from whom, and why? In general, if you want to hide something about your parse rules, you need to hide the parse rules altogether. That is not to say that it is a good idea; I've found that in most cases where someone wants to hide some code or variables in REBOL, they really want to do something else, and that something else depends on the circumstances. What do you hope to accomplish? 2: You have to be careful with temporary variables. REBOL parse rules are often recursive, and the temporary variables used with them are not. You have to be extra careful not to recurse into another trip through the same parse rule before you are done with the temporary variables from the first round, or else put off setting the temps until just before they are used. It's not as hard as it sounds.
JohanAR 16-Mar-2008 [2503x2]	1. Hide them from myself :) I don't mind having lots of global variables in a small script, but I really don't like it in larger programs. To keep things well organized I prefer that variables aren't valid in a larger context than necessary, to avoid overwriting, accidental use etc. Does context [ ... ] add a lot of overhead btw? Maybe I should try to use that more often
JohanAR 16-Mar-2008 [2503x2]	2. I don't use a lot of recursion so far. some [...] usually works equally well in my applications. But it's definitely a valid point, and I'll try to keep it in mind
BrianH 16-Mar-2008 [2505]	The only execution overhead of context is when it is built - nothing extra at runtime. The memory overhead is minimal. Every word is defined in a context, even the global ones. Overall, using an object to wrap the temporary variables that your rules use is not a bad idea. As long as you are doing this to better manage your program and reduce the scope of errors, it is great.
btiffin 16-Mar-2008 [2506]	context [ ] is just a shortcut for make object! [ ] and it's great. The more we hide in objects, the easier it will be to share, or at the least, the easier it will be to use code from a variety of developer sources. Programming in the Many is important in our context as there are relatively few of us in the "many" - so far. So when even our small stuff is shareable we all win.
Gregg 17-Mar-2008 [2507]	I often use contexts with parsers, to contain the rules.
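A minimal sketch of the approach described above, wrapping a rule and its temporary variables in a context, assuming REBOL 2. All names here (`line-parser`, `num`, `rest`, `rule`, `run`) are hypothetical and not from the discussion.

```rebol
line-parser: context [
    ; temporaries used by the rule live in this object, not in the global context
    num: rest: none
    digit: charset "0123456789"

    ; the words in this block are bound to the object when it is built
    rule: [copy num some digit copy rest to end]

    run: func [text [string!]] [
        parse/all text rule
        reduce [num rest]
    ]
]

probe line-parser/run "42 apples"   ; ["42" " apples"]
```

The binding cost is paid once, when the object is created. Because `num` and `rest` are shared by every call, the recursion caveat raised earlier still applies to rules that re-enter themselves.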
Oldes 17-Mar-2008 [2508]	what about 'use?
	    tmp: 1
	    use [tmp][ parse "test" [copy tmp to end (probe tmp) ]]
	    probe tmp
BrianH 17-Mar-2008 [2509x3]	Does a bind/copy on its code block every time it is used.
	That kind of overhead is usually only worth it when you can't get rid of concurrent use any other way.
	Wait, USE may not copy in R2 - that could be even worse.
Oldes 17-Mar-2008 [2512]	I should probably not use code evaluation so much directly in the parse rule block, and rather call a function if I need a lot of temp variables to process the action.
Henrik 28-Apr-2008 [2513x2]	>> parse [>] [>]
	== false
	>> parse [>] ['>]
	** Syntax Error: Invalid word -- '>
	How do you parse that block?
Henrik 28-Apr-2008 [2513x2]	(note this block can only be made without a space at the end in rebol 2.7)
Oldes 28-Apr-2008 [2515]	I'm using helper words like:
	    slash: to-lit-word first [/]
	    dslash: to-lit-word "//"
	    rShift: to-lit-word ">>"
	    UrShift: to-lit-word ">>>"
	    _greater: to-lit-word ">"
	    _less: to-lit-word "<"
	    _noteql: to-lit-word "<>"
	    _lesseql: to-lit-word "<="
	    _greatereql: to-lit-word ">="
Henrik 28-Apr-2008 [2516]	nice, thanks
Oldes 28-Apr-2008 [2517]	>> parse to-block load "<" [_less]
	== true
Henrik 28-Apr-2008 [2518]	yep, works
Henrik 11-May-2008 [2519x3]	if I have a rule-block that does not exist in the same context as the main parse block, is there a simple way to rebind it without composing it into the main parse block? my current solution is to bind it to a temp block and use the temp block as a rule in the main parse block, which is less than optimal, I think.
	set 'html-gen func [
	    "Low level HTML dialect"
	    data [none! string! tag! url! number! time! date! get-word! word! block!]
	    /local cmd blk header row-blk start-tag dr tr pr wr
	] [
	    if get-word? data [data: get data]
	    if any [url? data string? data number? data word? data time? data date? data] [out data return true]
	    if none? data [return true]
	    dr: bind data-rules 'data ; this is the easiest way? can we not bind directly in the parse block?
	    tr: bind tag-rules 'data
	    pr: bind page-rules 'data
	    wr: bind word-rules 'data
	    parse data [any [cmd: [dr | tr | pr | wr]]]
	]
	the five last lines in the function are the important ones.
Chris 11-May-2008 [2522x2]	Assuming you want to assign values to function locals from the external parse rules, you can a) bind as you are doing, b) create a larger context for the function encompassing your rules or c) compile the parse rule, either on creation of the function or for each instance.
	a)
	    rule: [set tag tag!]
	    test: func [data /local tag][bind rule 'data parse data rule tag]
	b)
	    test: use [tag][
	        rule: [set tag tag!]
	        func [data][parse data rule tag]
	    ]
	c)
	    rule: [set tag tag!]
	    test: func [data /local tag] compose/only [parse data (rule) tag]
	Also, note that when you bind, it alters the original block -- no need to reassign to a new word.
Chris 11-May-2008 [2522x2]	When it comes to complex rules, I opt for b). At that, I'd go for context [] where there are a lot of associated words...
Henrik 12-May-2008 [2524]	the function is recursive, so that may put a twist on b). I forgot that detail with BIND on a) so thanks for that. c) seems to work best.
amacleod 15-May-2008 [2525x4]	I'm just not getting the hang of parsing. I've read tutorials and looked at scripts, but when I try to adapt them to my work it fails.
	I'm trying to parse a text document that I've formatted into lines of text with blank lines between, similar to the MakeDoc format
	Most lines begin with a section number (2.), or a sub-section (2.3) or a sub-sub-section (2.3.5).
	I've got rules to find each: (some digit "." some space) etc. and it works. I've been able to copy the text following with (copy text thru end) but how do I copy the section number?
Oldes 15-May-2008 [2529x2]	ch_section: charset "0123456789."
	    parse/all "2.1.3 line" [copy section some ch_section copy rest to end]
	    probe reduce [section rest] ;== ["2.1.3" " line"]
Oldes 15-May-2008 [2529x2]	or something like that:
	    ch_digits: charset "0123456789"
	    r_section: [pos1: some [some ch_digits opt #"."] pos2: (section: copy/part pos1 pos2)]
	    parse/all "2.3.4 line" [r_section copy rest to end]
	    probe reduce [section rest] ;== ["2.3.4" " line"]
BrianH 16-May-2008 [2531x3]	If the section numbers always end with a period, you can do this: some [some digits "."] If the section numbers don't end with a period, you can do this: some digits any ["." some digits]
	Look up recursive descent parsing, and take a not of the difference between left recursion and right recursion.
	not -> note
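A minimal sketch of the two section-number patterns described above, assuming REBOL 2 string parsing. It uses a charset named `digit` where the original wrote `digits`, and the rule names `dotted` and `undotted` are hypothetical.

```rebol
digit: charset "0123456789"

; numbers that always end with a period: "2.", "2.3.", "2.3.5."
dotted: [some [some digit "."]]

; numbers without a trailing period: "2", "2.3", "2.3.5"
undotted: [some digit any ["." some digit]]

probe parse/all "2.3.5." dotted     ; true
probe parse/all "2.3.5" undotted    ; true
probe parse/all "2.3.5" dotted      ; false, the last group has no period
```

Both rules repeat a small group in a loop rather than calling themselves, which sidesteps the left-recursion problem mentioned above.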
Chris 16-May-2008 [2534]	Don't want to add too much, but with parse you can really build up a vocabulary based on the patterns you know:
	    section: [integer! ["." | 1 4 ["." integer!]]] ; -- or whatever rule covers all permutations
	    chars-sp: charset " "
	    space: [some chars-sp]
	    parse/all [copy sn section space [to newline | to end]]
	Vocabularies are easy to wrap in their own context too. Note also that [integer!] is a shorthand for [some digit] -- very useful : )
amacleod 16-May-2008 [2535x4]	Oldes, thanks for your suggestion. It works when I do a simple one line rule as you suggested but when I try to use multiple rules it fails. Example of what I'm trying to do: Example of the text document:
	3. CONSTRUCTION OF PORTABLE ALUMINUM LADDERS
	3.1 Aluminum ladders are divided into two basic types of construction, viz:, solid beam and truss.
	3.1.1 Solid Beam Aluminum Construction- This type of ladder has a solid side rail construction with aluminum rungs connecting with the side rails at fourteen inch intervals. The connection is generally either by a welded joint between rung and side rails, or by an expansion plug pinching the rung tightly to the side rails and internal backup plates. (Figure 2 A)
	3.1.2 Aluminum Truss Construction- In the aluminum truss design, the top and bottom rails are connected to rung assemblies or rung blocks by rivets. The rungs are either welded or expansion plugged to the rung plate assemblies, which are supported by the top and bottom rails. (Figure 2B)
	3.2 The base of the portable aluminum ladder is provided with either steel spikes or swiveling rubber safety shoes and aluminum spikes. For ladders equipped with the swiveling device, the rubber pads should be utilized when the ladder is to be raised and used on hard surfaces. (Figure 2A, 2B)
	3. CONSTRUCTION OF PORTABLE ALUMINUM LADDERS
	space: charset " ^-"
	spaces: [some space]
	chars: complement charset " ^-^/"
	digit: charset "0123456789"
	digits: [some digit]
	section: [digits "." some space]
	sub-sec: [digits "." digits spaces]
	sub-sub-sec: [digits "." digits "." digits spaces]
	rules: [heading some parts done] (where heading is the first line of the text file)
	parts: [newline | section format_section | sub-section | sub-sub-section]
	format_section: copy sec section copy rest to newline (print reduce [sec rest])
	If I use the format_section code directly with parse it works, but I get nothing when I redirect it to another line. The above code is similar to what Carl used in his text to html script.
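One possible way to get the rules above working together, offered as a sketch under assumptions rather than a confirmed fix. In the code as posted, `parts` refers to `sub-section` and `sub-sub-section` while the rules are defined as `sub-sec` and `sub-sub-sec`, and the body of `format_section` is not wrapped in a block, so only the `copy` call is assigned to it. The names `numbered-line` and `doc` below are hypothetical, and the undefined `heading` and `done` rules are left out.

```rebol
space: charset " ^-"
spaces: [some space]
digit: charset "0123456789"
digits: [some digit]

section: [digits "." spaces]
sub-sec: [digits "." digits spaces]
sub-sub-sec: [digits "." digits "." digits spaces]

; one numbered line: copy the section number, then the rest of the line
numbered-line: [
    copy sec [sub-sub-sec | sub-sec | section]
    copy rest [to newline | to end]
    (probe reduce [sec rest])
]

doc: "3. CONSTRUCTION^/3.1 Aluminum ladders^/3.1.1 Solid Beam"
parse/all doc [some [numbered-line | newline]]
```

The copied `sec` includes the trailing spaces matched by each rule. A line that does not start with a section number stops the loop, so real input would need an extra alternative for plain text lines.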