
World: r3wp

[Parse] Discussion of PARSE dialect

Janko
14-Feb-2009
[3543x2]
maybe your solution for A | B would work.. I will try
ha, yes it works .. brilliant!
Anton
14-Feb-2009
[3545]
it does ?
Janko
14-Feb-2009
[3546x4]
yes :) thanks a lot!
>> T: K: D: "" parse doc [ SOME [ thru "<meta" "name=" skip
    [ "description" (V: 'D) | "keywords" (V: 'K) ] skip
    "content=" m: skip (m1: first m) copy T to m1 (set V T)
] to end ] ?? K ?? D

K: {Company Directory, Join Us, Advanced Search, Trade Leads, Forum, 
Trade Shows, Advertising, Translation, fair trade, trade portal, 
business to business, trade leads, trade events, china export, china manufacturer}

D: {New international trade portal and company directory for Asia, 
Europe and North America. Our priority No.1 is to create and maintain 
a safe, well lit business-to-business marketplace, by assisting our 
members in identifying new trustworthy business partners!}

== {New international trade portal and company directory for Asia, 
Europe and North America. Our priority No.1 is to create and mai...
>>
it is also not dependent on the order of things; I still have 
to figure out why that is .. it works no matter which one comes before 
the other
I intended to make a blogpost .. "REBOL parse challenge" and present 
this problem and ask if people can provide solutions in other languages 
that would be more elegant ... (in a similar vein to the "arc challenge") 
... now that it seems an even harder nut to crack I should probably 
really do it .. does anyone think this would be easy to solve using 
a conventional language? (I think not)
Anton
14-Feb-2009
[3550]
I'm sure there are some elegant solutions in other languages too.
Janko
14-Feb-2009
[3551]
hm.. would this be nicely solvable with a regex? .. I think it would 
be quite a pain using regular string functions like strpos, substr 
etc... with the same requirements (one or more spaces/tabs/newlines, 
" or ' quotes, undefined order)
Anton
14-Feb-2009
[3552]
I don't know - I only learn regex when I have to .. then a short 
time later I forget.
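
[Editor's sketch, not part of the original chat] One way to try the regex question is the Python snippet below. It assumes the same tag shape as the parse rule above (name before content, either quote style, any whitespace between the attributes). `META_RE` and `extract_meta` are made-up names for illustration, and real-world HTML would likely need more care than this.

```python
import re

# Hypothetical pattern: <meta name="..." content="..."> with " or ' quotes
# and any run of spaces/tabs/newlines between the attributes.
META_RE = re.compile(
    r"""<meta\s+name=(["'])(description|keywords)\1\s+content=(["'])(.*?)\3""",
    re.IGNORECASE | re.DOTALL,
)

def extract_meta(doc):
    """Collect description/keywords values, whatever order the tags come in."""
    found = {}
    for m in META_RE.finditer(doc):
        # group 2 is the meta name, group 4 the text between the content quotes
        found[m.group(2).lower()] = m.group(4)
    return found

doc = '''<meta name="keywords"
   content='fair trade, trade portal'>
<meta name='description' content="A trade portal">'''
result = extract_meta(doc)
```

As in the parse version, the order of the two tags does not matter, because each match is handled on its own and stored under its own key.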
Janko
14-Feb-2009
[3553]
yes, me also
Anton
14-Feb-2009
[3554]
perl could do it pretty quick, I'm sure.
Janko
14-Feb-2009
[3555x4]
perl pro would certainly use regex (that is the initial home of it) 
:) ... I think parse and regex are best for some different problems, 
I am just not sure if this one is better solved with one or the other
regex I imagine sucks at structured stuff, where you have to make 
some sort of state machine, for example I don't think regex can 
parse xml well ... state machines are excellent at that but they do 
require more code than parse would
I will see with the "parse challenge" .. if I wanted to be really 
*sneaky* I could ask if anyone in the perl community can solve this .. 
and if their solution sucks more than rebol's then make the 
blogpost  :)
but I am not like that ;)
Anton
14-Feb-2009
[3559x2]
Yeah, I'm not really sure what that would prove. :)
What would you build a state machine with, which would generate so 
much code ?
Janko
14-Feb-2009
[3561]
I don't fully understand your question?
Anton
14-Feb-2009
[3562x2]
You say "state machines ... require more code". What code ? Obviously, 
you can build a state machine in any language, but I guess I'm wondering 
what ... ohh... I'm so tired after all those cheese sandwiches....
Anyway, I think I understand what you're saying. A state machine 
is big and clunky, expressing everything you don't want to hear about, 
while parse allows you to express your target more directly, cutting 
through anything you don't want without having to specify it.
Janko
14-Feb-2009
[3564]
I don't know the exact term for this but I built many parsers for 
things like xml, wiki text and some other custom things in various 
lower level languages using a simple state machine (at least that's 
what I called it)... To my understanding you can parse anything with 
something like that, including structured nested data, but it of 
course takes some more coding than this rebol solution... what I 
mean by a state machine is a loop that accepts characters or words 
and has a predefined number of states and code for what to do in 
each state and when to switch to another state etc..
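
[Editor's sketch, not part of the original chat] The kind of loop described above might look like this in Python: a character-by-character scan with two states that pulls out quoted values. The function name `quoted_values` and the state names are invented for the example. It is only a toy, with no escapes or nesting.

```python
def quoted_values(text):
    """Two-state machine: OUTSIDE any quote, or INSIDE one."""
    OUTSIDE, INSIDE = 0, 1
    state = OUTSIDE
    quote = ""      # which quote character opened the current value
    current = []    # characters collected for the current value
    values = []
    for ch in text:
        if state == OUTSIDE:
            if ch in "\"'":
                # an opening quote: remember it and start collecting
                state, quote, current = INSIDE, ch, []
        else:
            if ch == quote:
                # the matching closing quote: emit the value
                values.append("".join(current))
                state = OUTSIDE
            else:
                current.append(ch)
    return values

values = quoted_values("""name="description" content='a "safe" place'""")
```

Even this small case needs explicit state, a buffer and a transition for every character, which is the extra coding the message above refers to.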
Anton
14-Feb-2009
[3565]
Right, yes. We agree.
Janko
14-Feb-2009
[3566]
ok :)
Anton
14-Feb-2009
[3567]
What is the next problem ?
Janko
14-Feb-2009
[3568]
that was the big stopper that you just solved for me.. there are 
no other problems for now .. just the willingness to type in all the 
code :) ..
Anton
14-Feb-2009
[3569x2]
I know what it could be - eg:
<img src=afile.jpg>
<img src="afile.jpg">
<img src='afile.jpg'>
The first one without any quotes causes a little bit of a problem 
(solvable).
Janko
14-Feb-2009
[3571x2]
maybe you can make OPT [ " | ' ] ?
copy to [ " | > | ' ] ?
Anton
14-Feb-2009
[3573]
You have to use a variable to store which one was used, then parse 
until that character is encountered again.
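
[Editor's sketch, not part of the original chat] The "remember which quote was used" idea can be written as a regex backreference in Python. The snippet below also covers the unquoted form by assuming that an unquoted value runs up to the next whitespace or `>`. `SRC_RE` and `src_value` are hypothetical names, and only the `src` attribute is handled.

```python
import re

# Either: an opening quote (group 1), the value (group 2), the same quote again;
# or: an unquoted value (group 3) that stops at whitespace or ">".
SRC_RE = re.compile(r"""src\s*=\s*(?:(["'])(.*?)\1|([^\s>]+))""", re.DOTALL)

def src_value(tag):
    """Return the src attribute value of a tag, or None if there is none."""
    m = SRC_RE.search(tag)
    if m is None:
        return None
    # group 1 is set only when the value was quoted
    return m.group(2) if m.group(1) else m.group(3)

examples = [
    "<img src=afile.jpg>",
    '<img src="afile.jpg">',
    "<img src='afile.jpg'>",
]
results = [src_value(tag) for tag in examples]
```

The backreference `\1` plays the role of the stored variable: whichever quote opened the value is the one that must close it.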
Janko
14-Feb-2009
[3574x2]
yes, that's how I did it
>> "content=" m: skip (m1: first m ) copy T to m1<<
Anton
14-Feb-2009
[3576]
So you did.
Janko
14-Feb-2009
[3577]
in meta tags example
Anton
14-Feb-2009
[3578]
But when no quotes are used, it gets tricky, eg:
<img src= afile.jpg width=10>
Janko
14-Feb-2009
[3579]
what I have the biggest problem with (which I thought was unsolvable - but 
I have to study your example to see why it works) is the order of things
Anton
14-Feb-2009
[3580]
Is this a surprise ?
>> parse "abc" [some ["b" | "c" | "a"]]
== true
Janko
14-Feb-2009
[3581x2]
hm.. I don't know right now.. you confused me.. I thought I tried 
everything and it just didn't do what I needed but I don't have 
an example in my head
I will try to think of one
Anton
14-Feb-2009
[3583]
Yes, it takes a little while to become familiar with parse.
Janko
14-Feb-2009
[3584]
this does surprise me a little, but I am not sure if this was the 
problem or something else, because I thought I tried with some and 
all things
Anton
14-Feb-2009
[3585]
It means, basically:
	SOME: Do this 1 or more times, until fail or end is reached:
	[Try "b", if that fails, try "c". If that fails, try "a"]
	<--- Given "a" "b" "c", this rule always succeeds.
Janko
14-Feb-2009
[3586x2]
aha.. I think / hope I found an example of my problem ( I already 
settled that I have to do things like this in multiple passes )
( the problem is with cases where things repeat and I don't know in 
which order they will appear .. I had this problem with parsing something 
like simplified wiki text )
>> a: "start1 1 end start2 2 end start1 3 end"
== "start1 1 end start2 2 end start1 3 end"

>> parse a [ SOME [ [ thru "start2" | thru "start1" ] copy T to "end" 
(print T) ] to end ]
 2
 3
== true

>> parse a [ SOME [ [ thru "start1" | thru "start2" ] copy T to "end" 
(print T) ] to end ]
 1
 3
== true

( to not give the impression I have only problems with parse, I used 
parse to solve many things that would be a headache any other way... 
these and the problem up there are just cases where I got into trouble)
Anton
14-Feb-2009
[3588x3]
Yes, multiple passes can make the code simpler.
Ah, here it's good to use nested rules to cut down the code.
apiece: [copy T to "end" (?? T)]

parse a [some [thru "start2" apiece | thru "start1" apiece]  to end]
Janko
14-Feb-2009
[3591x2]
This is basically not a problem, as I solve these things with multiple 
passes and it works more than fast enough for me that way also ... 
I think this problem would not exist if in case of [ .. | .. | .. 
] parse would check all options and take the one that is the fewest characters 
away from the current position (the one that comes true first) .. but this 
would most probably slow down the parse and you would lose the feature 
that you define "priority" with [ .. | ..  | .. ] now .. so maybe 
if there would be a different | for this
( I have to go to eat... will be back .. thanks a lot for before)
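
[Editor's sketch, not part of the original chat] The "take the alternative nearest to the current position" idea proposed above can be tried outside parse. This Python version checks every marker at each step and picks the earliest hit, which is also why it does more searching than a prioritised `|`. `nearest_marker` and `collect` are invented names, and the `"end"` terminator is taken from the example string.

```python
def nearest_marker(text, pos, markers):
    """Return (index, marker) for the marker found earliest at or after pos, or None."""
    hits = [(text.find(m, pos), m) for m in markers]
    hits = [h for h in hits if h[0] != -1]
    return min(hits) if hits else None

def collect(text, markers, stop="end"):
    """Copy the text between each nearest marker and the next stop word."""
    values = []
    pos = 0
    while True:
        hit = nearest_marker(text, pos, markers)
        if hit is None:
            break
        start = hit[0] + len(hit[1])
        finish = text.find(stop, start)
        if finish == -1:
            break
        values.append(text[start:finish].strip())
        pos = finish + len(stop)
    return values

a = "start1 1 end start2 2 end start1 3 end"
values = collect(a, ["start2", "start1"])   # listing order no longer matters
```

With the nearest match chosen at every step, all three values are collected whichever marker is listed first.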