Parse, recursion, and local variables?
[1/23] from: petr::krenzelok::trz::cz at: 17-Mar-2007 12:15
Hi,
not being much skilled in parsing, I tried to write a little parser for my
own rsp-tag system (I know there are a few robust systems out there, but I
want to learn by implementing my own). The basic idea is to use a kind
of comment tag, which still allows a web designer to display the html
content. The most attractive system for me was Gabriele's Temple, but it
is not finished or supported, so I want my own, much simpler one :-)
My tags look like:
<!--[section-x]-->
subsection html code
<!--/[section-x]-->
I want to detect particular sections and invoke particular modules/code
for them, passing them the section content. Basically it works, but I also
wanted my very primitive parser to do some recursion. And I think I
tracked down why it does not work - inside nested sections, when the
recursive rule is applied, rsp-tag-name is not kept local to the
particular recursion level. Can I make it local by putting the parse into
a function body, defining a word (rsp-tag-name) as a local variable? Or is
the issue more complex, and are my rules simply built insufficiently?
Also - since I can't use to [a | b | c], as we don't have it ;-), I have
to skip by one char. That makes stripping out html subsections a bit
difficult (where to put the correct markers), as my rsp-html and html rules
are simply "1 skip". But maybe that could be solved by defining what the
proper html charset is?
Sorry if my questions are rather primitive for our parse gurus here :-)
-pekr-
---------------------
REBOL []
template: {
this is the beginning of html:-)
<b><!--[mark-x]-->Hello x!<!--/[mark-x]--></b>
<b><!--[mark-y]-->Hello y!<!--/[mark-y]--></b>
<b><!--[mark-z]-->Hello z!<!--/[mark-z]--></b>
<b><!--[mark-w]-->Hello w!
another html code
<!--[mark-u]-->
subsection html code
<!--/[mark-u]-->
finishing mark-w html code
<!--/[mark-w]--></b>
this is the end :-)
}
out: copy ""
;--- uncomment the following to get the incorrect recursion behavior
;rsp-begin: ["<!--[" copy rsp-tag-name to "]-->" "]-->"]
;rsp-end: ["<!--/[" rsp-tag-name "]-->"]
;--- comment out the following if enabling the ones above ...
rsp-begin: ["<!--[" s: some [e: "]-->" (print copy/part s e) break | skip]]
rsp-end: ["<!--/[" s: some [e: "]-->" (print copy/part s e) break | skip]]
;just to distinguish for eventual debugging ...
html: [copy char skip (append out char)]
rsp-html: [copy char skip (append out char)]
rsp-section: [rsp-begin any [rsp-end break | rsp-section | rsp-html]
]
rsp-rules: [any [rsp-section | html] end]
parse/all template rsp-rules
probe out
halt
[2/23] from: lmecir:mbox:vol:cz at: 17-Mar-2007 17:54
Hi Pekr,
...
> My tags look like:
> <!--[section-x]-->
<<quoted lines omitted: 9>>
> (rsp-tag-nested) as a local variable? Or the issue is more complex and
> my rules simply build insufficiently?
I tried your code below, uncommenting the marked lines, and couldn't see
what you meant by "incorrect behaviour". Can you tell me what you expected?
> ---------------------
> REBOL []
<<quoted lines omitted: 28>>
> probe out
> halt
-L
[3/23] from: petr:krenzelok:trz:cz at: 17-Mar-2007 18:40
OK Ladislav, here we go. I bet I am doing something incorrectly. Simply
put, once the rsp-section recursion is applied for the mark-u section and
returns back one level up, rsp-tag-name remains set to mark-u,
whereas what I would like to achieve is the parser keeping that variable
local to each iteration, so once it returns from the mark-u section
back to finish the parent mark-w section, it would be set to mark-w again.
I tried to enclose the example in a function and define rsp-tag-name as a
local variable, as you can see, but with no luck ... but maybe my approach
is conceptually incorrect anyway :-)
Thanks,
Petr
-----------------------
REBOL []
template: {
this is the beginning of html:-)
<b><!--[mark-x]-->Hello x!<!--/[mark-x]--></b>
<b><!--[mark-y]-->Hello y!<!--/[mark-y]--></b>
<b><!--[mark-z]-->Hello z!<!--/[mark-z]--></b>
<b><!--[mark-w]-->Hello w!
another html code
<!--[mark-u]-->
subsection html code
<!--/[mark-u]-->
finishing mark-w html code
<!--/[mark-w]--></b>
this is the end :-)
}
;parse-test-recursion: func [/local rsp-tag-name][
out: copy ""
rsp-begin: ["<!--[" copy rsp-tag-name to "]-->" "]-->" (print [rsp-tag-name "start"])]
rsp-end: ["<!--/[" rsp-tag-name "]-->" (print [rsp-tag-name "end"])]
;rsp-begin: ["<!--[" s: some [e: "]-->" (print copy/part s e) break | skip]]
;rsp-end: ["<!--/[" s: some [e: "]-->" (print copy/part s e) break | skip]]
;just to distinguish for eventual debugging ...
html: [copy char skip (append out char)]
rsp-html: [copy char skip (append out char)]
rsp-section: [rsp-begin any [rsp-end break | rsp-section (print ["finished subsection" rsp-tag-name]) | rsp-html]]
rsp-rules: [any [rsp-section | html] end]
parse/all template rsp-rules
;]
;parse-test-recursion
probe out
halt
[4/23] from: lmecir:mbox:vol:cz at: 17-Mar-2007 19:12
Petr Krenzelok napsal(a):
> OK Ladislav, here we go. I bet I am doing something incorrectly. Simply
> put, once rsp-section recursion is applied for mark-u section, once it
<<quoted lines omitted: 7>>
> Thanks,
> Petr
you are not too far away, I guess. Instead of using one PARSE, you could
define your own PARSE-TAG function etc.
But, I take this as an opportunity to promote my way ;-). If you take a
look at http://www.fm.vslib.cz/~ladislav/rebol/parseen.r and try:
rsp-section: [
do-block [
use [rsp-tag-name] [
rsp-begin: [...]
rsp-end: [...]
[rsp-begin any [rsp-end (print ["finished section" rsp-tag-name]) break | rsp-section
| rsp-html]]
]
]
]
you may find out, that it works
-L
[5/23] from: volker:nitsch:gm:ail at: 17-Mar-2007 19:22
My typical way to do that is my own stack:
stack: copy []
parse rule [
(insert/only stack reduce [local1 local2])
rule ;recursion
(set [local1 local2] first stack remove stack)
]
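The idiom above can be fleshed out into a small runnable sketch, applied to the comment tags from the first message. This is a hypothetical illustration (rule and variable names are not from the thread), assuming REBOL/Core 2.x semantics; the push/pop parens are the point: push the tag on entering a section, pop it when the section closes, so the parent level sees its own tag again.

```rebol
REBOL []
stack: copy []
rsp-tag: none
rsp-begin: ["<!--[" copy rsp-tag to "]-->" "]-->"
            (insert stack rsp-tag)]                       ; push this level's tag
rsp-end:   ["<!--/[" rsp-tag "]-->"
            (remove stack                                 ; pop it again ...
             if not empty? stack [rsp-tag: first stack])] ; ... restoring the parent's
section: [rsp-begin any [rsp-end break | section | skip]]
parse/all "<!--[w]-->w text<!--[u]-->u text<!--/[u]-->more w<!--/[w]-->"
    [any [section | skip] end]
; after matching mark-u's end tag, rsp-tag is "w" again,
; so the parent's closing <!--/[w]--> still matches
```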
On 3/17/07, Petr Krenzelok <petr.krenzelok-trz.cz> wrote:
> Hi,
> not being much skilled in parsing, I tried to do a little parser for my
<<quoted lines omitted: 58>>
--
-Volker
Any problem in computer science can be solved with another layer of
indirection. But that usually will create another problem.
-- David Wheeler
[6/23] from: lmecir:mbox:vol:cz at: 17-Mar-2007 19:33
Ladislav Mecir napsal(a):
> Petr Krenzelok napsal(a):
>> OK Ladislav, here we go. I bet I am doing something incorrectly. Simply
<<quoted lines omitted: 15>>
> But, I take this as an opportunity to promote my way ;-). If you take a
> look at http://www.fm.vslib.cz/~ladislav/rebol/parseen.r and try:
sorry, correction, you would most probably need copy/deep [...], i.e.
rsp-section: [
do-block [
use [rsp-tag-name] copy/deep [
rsp-begin: [...]
rsp-end: [...]
[rsp-begin any [rsp-end (print ["finished section" rsp-tag-name]) break | rsp-section | rsp-html]]
]
]
]
-L
[7/23] from: petr:krenzelok:trz:cz at: 17-Mar-2007 19:34
Volker Nitsch wrote:
> My typicall way to do that is my own stack.
> stack: copy[]
<<quoted lines omitted: 3>>
> (set[local local2] first stack remove stack)
> ]
Guys, got to go have a few beers tonight with my friends, but - that
kinda sucks ;-) I can guarantee you that most novices, like me, will
swear like mad when developing and touching recursion for the first time.
I think that REBOL's approach of the implicitly global nature of words,
even those inside functions, is the main hell for novices :-)
One quick question - how is it with function recursion? Can I have a
function value local to one level of a recursive call? And then a shared
one? I know I can build my own stack, but ... :-) And if it is possible
with functions, the parser should explicitly behave like that, even if it
breaks REBOL rules :-) Will try with a recursive function or a stack later ...
-pekr-
[8/23] from: moliad:gmai:l at: 17-Mar-2007 19:09
hi Pekr,
I have not completely followed the code part, as it's complex, but the
recursion issue comes from the very nature of parse... parse rules are not
stacked function calls. They are branches of execution with automatic
series-pointer rollback on error. So as you are traversing a series, you
really only jump and come back ... no stack push... as you have no
variables to push in the parse. If REBOL did an explicit copy of the
parse rules (thus localizing each rule at each instance), I can tell you
that the memory consumption and speed drop would not only be dramatic, but
would render the parser unusable for any large dataset handling.
The current parser is a stream analyser... where you decide what to do
next... the fact that the stream has a graph, tree, or recursive data
organisation is not parse's fault.
With the current implementation, we are able to parse 700MB files using
15000 lines of parse rules (cause I know someone who has such a setup) and
it screams!... if we added any kind of copy... any real usage would just
crawl, crash REBOL, and need GBs of RAM. As we speak I have a 1300-line
parse rule which handles several KB of string in 0.02 seconds (or less).
So, this being said, I know parse is a bitch to use at first... hell, I
gave up trying each year for the last 7 years... but for some reason I
gave it another try (again) in the last months... and well, I finally
"GOT" it. It's a strange process, but suddenly it becomes so clear that
everything seems obvious. The only thing I can say (from my own
experience) ... don't give up... really do go to the end of your
implementation and eventually you might GET it too. ;-)
The other thing I can say about parse is that it's usually MUCH easier
(and faster too) to parse the original string data and construct your data
set AS a loadable string. In this way, you just breeze through the data
linearly (VERY FAST, no stack issues) and append all nesting to the
loadable string as you go.
simple generic html loading example:
hope this helps
-MAx
;---------------------------------------------------
rebol []
;-
;- RULES
html: context [
output: ""
;protect the global space
data: none
attr: none
val: none
attrstr: none
alphabet: "abcdefghijklmnopqrstuvwxyz"
not-quotes: complement charset {"}
alpha: union charset alphabet charset "1234567890abcdefghijklmnopqrstuvwxyz_ABCDEFGHIJKLMNOPQRSTUVWXYZ"
nalpha: complement alpha
path: not-quotes ;union alpha charset "%-+:&=./\"
space: charset [#" " #"^/" #"^-"]
spaces: [any space]
attribute: [
spaces
copy attr some alpha {=}
spaces [ copy val some alpha | {"} copy val any not-quotes {"}]
(append output rejoin [attr " {" val "}"])
]
in-tag: [
"<"
copy data some alpha
( append output join data "[^/" )
spaces any attribute spaces
">"
]
out-tag: [ ["</" thru ">"] ( append output "^/]") ]
content: [copy data to "<" (if all [data not empty? trim copy data] [append output rejoin ["{" data "}"]])]
;href: [ spaces {href="} copy attrs any path {"}]
;link-tag: ["<A" spaces some [href (append parsed-links attrs) | attribute ] ">"] ;[[copy ref-url href (print ref-url)] | attribute ]]
rule: [some [content [ out-tag | in-tag ] ] ]
]
parse/all {<html> <body> <h3 >tada</h3><p><FONT color="#000000" > there you go :-)</FONT></p> </body> </html>} html/rule
probe load html/output
html-blk: load html/output
; XPATH anyone ;-)
probe html-blk/html/body/p/font/color
ask "..."
On 3/17/07, Petr Krenzelok <petr.krenzelok-trz.cz> wrote:
[9/23] from: lmecir:mbox:vol:cz at: 18-Mar-2007 17:37
Petr Krenzelok napsal(a):
> Volker Nitsch wrote:
>> My typicall way to do that is my own stack.
<<quoted lines omitted: 18>>
> breaks rebol rules :-) Will try with recursive function or stack later ...
> -pekr-
here are the results obtained:
mark-x start
mark-x end
mark-y start
mark-y end
mark-z start
mark-z end
mark-w start
mark-u start
mark-u end
finished subsection mark-w
mark-w end
...and here is the code:
include http://www.fm.vslib.cz/~ladislav/rebol/parseen.r
template: {
this is the beginning of html:-)
<b><!--[mark-x]-->Hello x!<!--/[mark-x]--></b>
<b><!--[mark-y]-->Hello y!<!--/[mark-y]--></b>
<b><!--[mark-z]-->Hello z!<!--/[mark-z]--></b>
<b><!--[mark-w]-->Hello w!
another html code
<!--[mark-u]-->
subsection html code
<!--/[mark-u]-->
finishing mark-w html code
<!--/[mark-w]--></b>
this is the end :-)
}
out: copy ""
;just to distinguish for eventual debugging ...
html: [copy char skip (append out char)]
rsp-html: [copy char skip (append out char)]
rsp-section: [
use [rsp-begin rsp-end rsp-tag-name] copy/deep [
rsp-begin: ["<!--[" copy rsp-tag-name to "]-->" "]-->" (print [rsp-tag-name "start"])]
rsp-end: ["<!--/[" rsp-tag-name "]-->" (print [rsp-tag-name "end"])]
[rsp-begin any [rsp-end break | do rsp-section (print ["finished subsection" rsp-tag-name]) | rsp-html]]
]
]
parseen/all template [any [do rsp-section | html] end]
probe out
-L
[10/23] from: volker:nitsch:g:mail at: 18-Mar-2007 22:38
Am Samstag, den 17.03.2007, 19:34 +0100 schrieb Petr Krenzelok:
> Volker Nitsch wrote:
> > My typicall way to do that is my own stack.
<<quoted lines omitted: 16>>
> with functions, parser should explicitly behave like that, even if it
> breaks rebol rules :-) Will try with recursive function or stack later ...
I agree completely. That, the missing thru [a | b], and the clumsy
charsets are the biggest showstoppers for beginners IMHO. There are
workarounds, but they need a lot of explaining/insight.
[11/23] from: moliad:gm:ail at: 18-Mar-2007 17:33
thru [a | b] is not only a problem in learning... it's a valid extension,
because it can simplify many complex rules by not having to explicitly
implement all possible variations as rules.
But I am almost sure that it would lead to regexp-like slowdowns in some
rules :-/
why are charsets clumsy?
-MAx
On 3/18/07, Volker <volker.nitsch-gmail.com> wrote:
[12/23] from: petr:krenzelok:trz:cz at: 19-Mar-2007 10:02
> I agree completely. That, the missing thru[a | b] and the clumsy
> charsets are the biggest showstoppers for beginners IMHO. there are
> workarounds, but they need a lot explaining/insights.
>
I know that to/thru [a | b | c] MIGHT slow down the parser. No one says it
has to do 3x find, evaluate the index of each match, and return the one at
the lowest index. It can internally behave just like any [a | b | c | skip].
The thing is, we said that REBOL is about being easy. And the above
addition would make parse easier to understand/use for novices. I am not
sure it would teach them to incline to bad habits. As for me, it is about
being able to use parse, or not using parse at all! Or we need new docs,
explaining more properly how parse works internally, why you can't easily
make your variables local even if you wish to, etc.
My opinion is that when we find ourselves using workarounds or
shortcuts nearly all the time, we need to rethink the concept once
again, extend it, or introduce some mezzanine shortcut. I hope 'parse is
on the radar for R3, and that we get some helpers in there ...
Petr
[13/23] from: volker:nitsch:gm:ail at: 19-Mar-2007 12:46
Am Sonntag, den 18.03.2007, 17:33 -0500 schrieb Maxim Olivier-Adlhoch:
> thru [a | b] is not only a problem in learning... its a valid extension
> cause it can simply many complex rules, by not having to explicitely
<<quoted lines omitted: 3>>
> why are charsets clumsy?
> -MAx
any [ thru [ a | b ] ]
is similar to
any [ a | b | skip ]
and
thru [ a | b ]
to
some [ a break | b break | skip ]
So it's there, if you know how. As with stacks and parse-recursion ;)
Slowdown, yes :)
Charsets are clumsy because they are defined somewhere else and not in the
rule. And they look ugly, this #"c".
digits: charset [ #"0" - #"9" ]
rule: [ some [ digits ] ]
That's natural for BNF academics. But
rule: [ some [ 0 - 9 ] ]
would be much nicer.
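The equivalences above can be tried directly. Here is a small, hypothetical demonstration of the some [a break | b break | skip] idiom emulating thru ["cat" | "dog"] (note that, unlike a real thru, this sketch also succeeds when neither alternative occurs, because the trailing skip consumes the whole input):

```rebol
REBOL []
found: none
thru-cat-or-dog: [some ["cat" (found: "cat") break
                      | "dog" (found: "dog") break
                      | skip]]
parse/all "xxxdogyyy" [thru-cat-or-dog to end]
print found  ; prints: dog
```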
[14/23] from: petr:krenzelok:trz:cz at: 19-Mar-2007 13:14
Volker Nitsch napsal(a):
> My typicall way to do that is my own stack.
> stack: copy[]
<<quoted lines omitted: 3>>
> (set[local local2] first stack remove stack)
> ]
Volker,
thanks for the idea, with a little bit of thinking I might get to the
final result ...
-------------------
REBOL []
template: {
this is the beginning of html:-)
<b><!--[mark-x]-->Hello x!<!--/[mark-x]--></b>
<b><!--[mark-y]-->Hello y!<!--/[mark-y]--></b>
<b><!--[mark-z]-->Hello z!<!--/[mark-z]--></b>
<b><!--[mark-w]-->Hello w!
another html code
<!--[mark-u]-->
subsection html code
<!--[mark-q]-->
subsubsection html code
<!--/[mark-q]-->
<!--/[mark-u]-->
finishing mark-w html code
<!--/[mark-w]--></b>
this is the end :-)
}
stack: context [
add: func [values][insert/only stack reduce values]
remove: func [values][system/words/remove stack if not empty? stack [set values first stack]]
stack: copy []
probe: does [system/words/probe stack]
]
out: copy ""
rsp-begin: ["<!--[" copy rsp-tag-name to "]-->" "]-->" (print [rsp-tag-name "start"]) (stack/add [rsp-tag-name])]
rsp-end: ["<!--/[" rsp-tag-name "]-->" (print [rsp-tag-name "end"]) (stack/remove [rsp-tag-name])]
;just to distinguish for eventual debugging ...
html: [copy char skip (append out char)]
rsp-html: [copy char skip (append out char)]
rsp-section: [rsp-begin any [rsp-end break | rsp-section (print ["back at section" rsp-tag-name]) | rsp-html]]
rsp-rules: [any [rsp-section | html] end]
parse/all template rsp-rules
probe out
halt
[15/23] from: petr:krenzelok:trz:cz at: 19-Mar-2007 13:29
> charsets are clumsy because they are defined somewhere and not in the
> rule. And they are look ugly, this #"c".
<<quoted lines omitted: 3>>
> rule: [ some [ 0 - 9 ] ]
> would be much nicer.
Hopefully also rule: [some [0 .. 9]] as a new range datatype in R3? :-)
Petr
[16/23] from: anton:wilddsl:au at: 19-Mar-2007 23:34
How about this possible notation:
#"0-9"
which should create a charset just as:
charset [#"0" - #"9"]
Anton.
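Until a literal like that exists, a tiny helper can approximate the proposed notation today. range-charset below is a hypothetical sketch, not an existing function; it assumes a three-character "lo-hi" spec string:

```rebol
REBOL []
; range-charset "0-9" builds the same bitset as charset [#"0" - #"9"]
range-charset: func [spec [string!]][
    charset reduce [spec/1 '- spec/3]
]
digit: range-charset "0-9"
print parse/all "2007" [some digit]  ; prints: true
```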
[17/23] from: christian:ensel:gmx at: 19-Mar-2007 22:22
> I know that to/thru [a | b | c] MIGHT slow down parser. Noone says, it
> has to do 3x find, evaluate index of match and return at lowest index.
> It can internally behave just like any [a | b | c | skip]. The thing is,
> that we said, that REBOL is about being easy. And above addition would
> make parse easier to understand/use for novices.
I think it would be really nice for all-but-guru-level parse-rule
authors to have the TO [RULE-1 | RULE-2 | ... | RULE-N] and THRU
[RULE-1 | RULE-2 | ... | RULE-N] rules working as abbreviations the way
Petr describes them - matching rules in the order they are given instead
of matching at the lowest index. Just point out in prominent places that
the rules are greedy and that the first matching rule gets applied.
It's just more readable than Volker's ANY [RULE-1 | RULE-2 | ... |
RULE-N | SKIP] and SOME [RULE-1 BREAK | RULE-2 BREAK | ... | RULE-N BREAK
| SKIP] idioms. Which, by the way, I've just printed, framed and hung
up over my bed so that hopefully I'll never ever forget them ...
Anyway, dreaming of Petr's TO and THRU, what immediately springs to mind
then are TO/FIRST [RULE-1 | RULE-2 | ... | RULE-N] and THRU/FIRST
[RULE-1 | RULE-2 | ... | RULE-N] working in the reg-ex way. Looks to
me like a natural extension of the parse dialect without breaking
existing PARSE rules.
-- Christian
[18/23] from: petr:krenzelok:trz:cz at: 19-Mar-2007 22:54
> Anyway, dreaming of Petr's TO and THRU, what immediatly springs to mind
> then are TO/FIRST [RULE-1 | RULE-2 | ... | RULE-N] and THRU/FIRST
> [RULE-1 | RULE-2 | ... | RULE-N] working in the reg-ex way. Looks to
> me like a natural extension of the parse dialect without breaking
> existing PARSE rules.
>
Christian, actually what I had in mind was "the index" kind of thing,
applying the FIRST occurrence of rule-1 | rule-2 | etc. IIRC, when I first
proposed the addition of the above, I called the new word inside the parse
dialect 'FIRST. That enhancement proposal was on-line at Robert's or
Nenad's site, I don't remember. Or look here:
http://www.colellachiara.com/soft/Misc/parse-rep.html
http://www.fm.vslib.cz/~ladislav/rebol/rep.html
Petr
[19/23] from: moliad:gma:il at: 19-Mar-2007 17:08
btw,
what people do not immediately see is that to and thru do not match rules,
they only "match" charsets or strings; they really are like find and
find/tail.
Actually, to and thru really only skip content, they don't match it, which
is a big difference from the rest of parse, which must match it. Which is
why parsing is so hard: one must identify all the patterns which can match,
often inside out. I too was tempted into using to and thru when I
started... but I quickly understood that I could not go very far with them,
and in fact that is probably one of the reasons I initially didn't get to
use parse.
Allowing rules here would be extremely powerful, but also very taxing and
possibly even impossible, as it would mean actually trying all the rules at
every byte. The first index at which any of the rules matches completely
would be returned... the compound effect of having many of these rules
could lead to impossible parses or exponentially slow rules, which is
usually not the case with even very large parse dialects.
This is probably very close to internal regexp use in fact, but also why
regexp is so slow and, as such, becomes almost unusable on any long string,
or when a few conditional rule depths are built into any serious regexp
string... I've used regexp and was dumbfounded by how quickly it slows
down... even on current computers.
Not trying to burst the bubble, just trying to explain why some of the
things are like they are. Parse is meant to be screaming fast, for many
reasons... a lot of REBOL is built using it (View, LNS, etc).
We could argue that adding those things adds to the options, true, but
they might also become the de facto use, since they are easier to adopt,
yet in the long run they might give a bad view of parse, which would
become "so slow". And few of us would learn and use the "real" parsing.
Funny I'm sooo opinionated when a few months ago I was still clueless, ;-)
-MAx
On 3/19/07, Christian Ensel <christian.ensel-gmx.de> wrote:
[20/23] from: christian:ensel:gmx at: 20-Mar-2007 0:14
> Christian, actually what I had in mind was "the index" kind of thing
> applying the FIRST occurrence of rule1 | rule 2 | etc. IIRC, when I first
> proposed addition of above, I called new word inside parse dialect
> 'FIRST.
Petr, yes, I imagined FIRST too, but then had difficulties coming up
with a second (no pun intended) word to go for the difference between TO
and THRU. Hence the refinement suggestion.
But, Max is of course right in suggesting to leave TO and THRU untouched
for the sake of parsing speed alone.
> we could argue that adding those things add to the options, true, but they
> might also become the de facto use, since they are easier to adopt, yet in
> the long run, might give a bad view of parse, which becomes "so slow". And
> few of use would learn and use the "real" parsing.
I don't think I can agree with the reasoning on why to shy away from
including some means to use PARSE the reg-ex way, though, Max. There
might be reasons not to do it, be it limited resources at RT or
other. But not to feature them just to prevent inappropriate use looks a
bit overcautious to me. (I don't see hammers getting too much bad press
because of those people trying to drive screws into walls with them.
It's just not the hammer's fault; and most people know this.)
At least I'm confident that technically there's a way to include
something as suggested without negative impact on existing PARSE
usage. Be it a new keyword MATCH [RULE-1 | RULE-2 | ...] with /TO
xor /THRU and an optional /FIRST refinement, or whatever else. I've seen
people asking for something like this for so many years now ...
- Christian
[21/23] from: petr:krenzelok:trz:cz at: 20-Mar-2007 7:21
Christian Ensel wrote:
>> Christian, actually what I had in mind was "the index" kind of thing
>> applying FIRST occurance of rule1 | rule 2 | etc. IIRC, when I first
<<quoted lines omitted: 4>>
> with a second (no pun intended) word to go for the difference between TO
> and THRU. Hence the refinement suggestion.
Christian, the problem is that 'parse can't handle paths, or am I
wrong? So the refinement way is unlikely. But maybe I am wrong? Could
anyone elaborate on whether our keywords could eventually use refinements?
Petr
[22/23] from: petr:krenzelok:trz:cz at: 20-Mar-2007 7:41
Maxim Olivier-Adlhoch wrote:
> btw,
> what people do not immediately see is that to and thru do not match rules,
<<quoted lines omitted: 6>>
> started... but I quickly understood that I could not go very far with parse,
> and in fact, probably one of the reasons I eventually didn't get to use it.
Max, what do you mean by "only skip content"? I am not a C guru, but how
do you think 'find actually works? IMO it HAS TO check each char
of unindexed content to find out if the string you search for is
contained in the searched string, no? And if so, it actually works in
the "match" way anyway?
The obstacle with to/thru is exactly the lack of the lowest-index match
being returned first. Because if you search [[to "some string" | to "other
string"]], you usually want to stop at the lowest index, whereas such a
rule would apply to whatever "some string" occurrence comes first.
Now who says that to/thru [a | b | c] should work the 3x-'find way
internally? My understanding is that you expect it would
do index? find a, index? find b, index? find c, find the minimum of the
indexes, and return that rule as the match. So yes, there it is a slowdown,
because if the string is long and you have many options to search
for, it can get slow.
But why should it necessarily work that way? Internally it could work the
[a | b | c | skip] way, no? Or am I missing something?
PS: I would like some official places (Carl, Gabriele, Ladislav) to tell
us what is planned for parse. In fact, I started this thread to raise
the community voice, because I think this topic was raised in 2001
already? However, I am not sure the ML is not a kind of dead channel to RT;
I do not remember when Carl was last here, so I am not sure if RT tracks it.
But at least this topic is covered here, so users can search it on
rebol.org, which is always good ...
Petr
[23/23] from: chris-ross:gill at: 20-Mar-2007 8:13
Petr Krenzelok wrote:
> Christian, the problem is, that 'parse can't handle paths, or am I
> wrong? So - the refinement way is unlikely. But maybe I am wrong? Could
> anyone elaborate, if our keywords could use refinements eventually?
Parse can now handle paths. Currently refinements are literal (that is
-- parse [/local] [/local] -- holds), so I don't see any reason they
couldn't technically be used in the dialect. Whether they should or
not is another matter...
Also, if the feature was implemented, I'd be in favour of 'first-of --
eg. [first-of [rule-1 | rule-2]] -- seems a little more descriptive
than 'first or 'match.
- Chris
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted