World: r3wp

[Parse] Discussion of PARSE dialect

Anton
14-Feb-2009
[3590]
apiece: [copy T to "end" (?? T)]

parse a [some [thru "start2" apiece | thru "start1" apiece]  to end]
Janko
14-Feb-2009
[3591x2]
This is basically not a problem, as I solve these things with multiple 
passes, and it works more than fast enough for me that way too... 
I think this problem would not exist if, in the case of [ .. | .. | .. ], 
parse checked all options and took the one whose match is the fewest 
characters away from the current position (i.e. the one that comes first). 
But this would most probably slow parse down, and you would lose the 
feature that you now define "priority" with [ .. | .. | .. ]... so maybe 
there could be a different | for this.
( I have to go eat... will be back.. thanks a lot for before)
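Janko's "nearest alternative" idea can be sketched outside PARSE. The Python below is only an illustration, not REBOL: `nearest_alternative` and `collect_pieces` are hypothetical names, and the markers mirror Anton's "start1"/"start2"/"end" rule. Instead of trying the branches in priority order, it searches for every alternative and takes the one that starts closest to the current position.

```python
# Hypothetical sketch (Python, not REBOL PARSE): instead of trying the
# alternatives of [a | b] in priority order, search for every alternative
# and take the one that starts closest to the current position.


def nearest_alternative(text, pos, markers):
    """Return (marker, index) of the marker found closest to pos, or None."""
    best = None
    for marker in markers:
        idx = text.find(marker, pos)
        if idx != -1 and (best is None or idx < best[1]):
            best = (marker, idx)
    return best


def collect_pieces(text, markers=("start2", "start1"), end="end"):
    """Collect the text between a start marker and the next `end`,
    always using whichever start marker occurs first."""
    pieces = []
    pos = 0
    while True:
        hit = nearest_alternative(text, pos, markers)
        if hit is None:
            break
        marker, idx = hit
        body_start = idx + len(marker)         # like THRU marker
        body_end = text.find(end, body_start)  # like COPY T TO "end"
        if body_end == -1:
            break
        pieces.append(text[body_start:body_end])
        pos = body_end
    return pieces


print(collect_pieces("start1 a end start2 b end start1 c end"))
# [' a ', ' b ', ' c ']
```

On that input a priority-ordered `thru "start2" apiece | thru "start1" apiece` would jump straight to the first "start2" and lose " a "; the nearest-match version keeps it, at the cost of one search per alternative on every step, which is the slowdown Janko expects.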
Anton
14-Feb-2009
[3593]
no worries - I must sleep. :)
Janko
14-Feb-2009
[3594x2]
hm.. interesting solution.. I never thought of doing it this way!! 
This might solve the problems I had
hm.. really, thanks for this example.. I took it as unsolvable, but 
this is a totally elegant way to solve it.. I will need to think about 
this a little and do some more examples to digest it :) thanks
Anton
14-Feb-2009
[3596]
Not 100% elegant yet! But glad to help, anyway.
Oldes
14-Feb-2009
[3597]
If you need to parse complex structures, like a markup language, 
you should use charsets and not the 'to or 'thru commands... for example, 
you cannot say that a tag starts with < and ends with >, because a tag 
like this is valid as well:
<input value="<>">

The 'to and 'thru commands are useful if you, for example, do data mining 
and don't care to parse the whole page structure just to get a bit of 
information from it.
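Oldes' point, that a tag can contain > inside a quoted attribute value, can be illustrated with a small character-level scanner. This is a hypothetical Python sketch (not REBOL PARSE) playing the role of a charset-based rule: it tracks whether it is inside a quoted value and only treats > as the end of the tag outside quotes.

```python
# Hypothetical sketch (Python, not REBOL PARSE): a character-level tag
# scanner that treats quoted attribute values as opaque, so a ">" inside
# quotes does not end the tag.


def find_tags(html):
    """Return the raw text of each <...> tag found in html."""
    tags = []
    i, n = 0, len(html)
    while i < n:
        if html[i] != "<":
            i += 1
            continue
        start = i
        i += 1
        quote = None                 # quote char we are inside, if any
        while i < n:
            ch = html[i]
            if quote:
                if ch == quote:
                    quote = None     # closing quote
            elif ch in "\"'":
                quote = ch           # opening quote
            elif ch == ">":
                break                # the real end of the tag
            i += 1
        if i >= n:
            break                    # unterminated tag: give up
        tags.append(html[start:i + 1])
        i += 1
    return tags


print(find_tags('<input value="<>"> text <b>'))
# ['<input value="<>">', '<b>']
```

A plain `to ">"` style scan would stop at the first > and return `<input value="<>` instead.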
Janko
14-Feb-2009
[3598]
Oldes, your examples have so far been too hard for me to grasp (but I 
am getting there :) )... I imagine they are more like what I described 
above as state machines, with which you can parse everything, even 
structured/nested data. I will need to study charset parsing at some 
point. I agree with your point otherwise, but in this particular case <> 
& " ' are not allowed in HTML (or at least XHTML) and should always 
be encoded (but are not always), I think
Oldes
14-Feb-2009
[3599]
You are right.. but if you use it in a browser, it works.. the web is 
full of invalid pages :).. But I agree that it was not a good 
example.
amacleod
22-Feb-2009
[3600x2]
Is there a way to force parse to enclose results in {} instead of 
double quotes "" regardless of length?
never mind I see my prob...
MaxV
20-Mar-2009
[3602]
Hello everybody!

I have a problem. I need to extract email addresses from a big text 
like

bla bla me@demo.com bla bla ...  <you@example.org>  etc. he@italy.it

Is it possible to obtain a text with all the addresses, without 
the "<" and ">"?
Pekr
20-Mar-2009
[3603]
I am not sure I understand what you are up to....
Maxim
20-Mar-2009
[3604]
do you want both emails within the <> and those without?
Geomol
20-Mar-2009
[3605]
>> str: "bla bla me@demo.com bla bla ...  <you@example.org>  etc. he@italy.it"

>> foreach w parse str none [if find e: to-email load w "@" [print e]]
me@demo.com
you@example.org
he@italy.it

or something.
Pekr
20-Mar-2009
[3606x3]
eh, nice :-)
Here's an absolutely terrible parser: it does NOT follow the RFC; it allows 
any combination of alphanumeric chars, dots and hyphens, one @ char, and 
the same again up to the next space char...

space: #" "
mailchar: charset [#"0" - #"9" #"A" - #"Z" #"a" - #"z" ".-"]
at-char: #"@"

email: [

   space
   start:
   some mailchar
   at-char
   some mailchar
   end:
   space
   (print copy/part start end)

]


str: "afadfa adfa asdfasdfa fd asdfas@adfadf.adfa-adfadfsda.com adfafaf a af"

parse/all str [any [email | skip]]
That skips email addresses inside < > (the rule needs a space on both 
sides), but maybe that was not the intention?
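For comparison, here is a hypothetical Python version of the same charset idea (not REBOL). `MAILCHAR` mirrors Pekr's `mailchar` (digits, letters, "." and "-"); unlike his rule, it treats any non-mail character as a delimiter instead of requiring spaces, so an address inside < > comes back without the brackets, which is what MaxV asked for. Like Pekr's parser, it does not follow the RFC. The sample is MaxV's text written with plain addresses.

```python
import string

# Hypothetical sketch (Python, not REBOL PARSE). MAILCHAR mirrors Pekr's
# `mailchar` charset: digits, ASCII letters, "." and "-".
MAILCHAR = set(string.ascii_letters + string.digits + ".-")


def extract_emails(text):
    """Split text into runs of MAILCHAR/"@" characters and keep the runs
    with exactly one "@" and mail characters on both sides of it.
    Like Pekr's rule, this does NOT follow the RFC."""
    found = []
    run = []
    for ch in text + " ":            # trailing space flushes the last run
        if ch in MAILCHAR or ch == "@":
            run.append(ch)
            continue
        word = "".join(run)
        run = []
        local, sep, domain = word.partition("@")
        if sep and local and domain and "@" not in domain:
            found.append(word)
    return found


sample = "bla bla me@demo.com bla bla ...  <you@example.org>  etc. he@italy.it"
print(extract_emails(sample))
# ['me@demo.com', 'you@example.org', 'he@italy.it']
```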
btiffin
20-Mar-2009
[3609]
It would be nice if REBOL could LOAD foreign! data.  :)  Hint hint 
wink wink.


And being here in a public REBOL forum I might get in trouble for 
suggesting this one.

$ grep -o -E '\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b' files...
Pekr
20-Mar-2009
[3610]
Brian... your post is broken... it contains some strange binary 
fragments :-)
Geomol
20-Mar-2009
[3611]
Brian, you can probably do that grep with a few CHARSET and PARSE 
in REBOL.
btiffin
20-Mar-2009
[3612]
And actually I think it's wrong anyway ... as it should be.  Posting 
regex in a REBOL forum ... shame on me.   ;)
MaxV
23-Mar-2009
[3613]
Thank you, I'll try Pekr's solution. I don't need the "<" and ">" characters.
However, where can I find some good parse documentation?
Brock
23-Mar-2009
[3614]
Rebol Parse documentation: 
http://www.rebol.com/docs/core23/rebolcore-15.html
Chris
23-Mar-2009
[3615]
http://www.codeconscious.com/rebol/parse-tutorial.html
swall
27-Mar-2009
[3616]
I'm having trouble parsing the "none" datatype from within blocks.
The following example illustrates my problem (hopefully):
junk: [none [1 2 [3 4]]]

parse/all junk [none (print ["nothing"]) text: (print ["text:" mold text]) set b block! (print ["block:" mold b])]

This produces the following output:
nothing
text: [none [1 2 [3 4]]]
== false


Notice that the block doesn't get parsed. It seems that parse ignores 
"none" tokens rather than extracting them from the input stream. 
If I put a number in place of none and parse for "number!", then 
the block does indeed get parsed.

Is this a bug or an oversight? Or am I just confused?
Izkata
27-Mar-2009
[3617]
'none isn't a datatype - none! is:

>> parse/all junk [none! (print ["nothing"]) text: (print ["text:" mold text]) set b block! (print ["block:" mold b])]
nothing
text: [[1 2 [3 4]]]
block: [1 2 [3 4]]
== true
swall
27-Mar-2009
[3618x2]
I tried that  but it doesn't seem to work.
I'm getting nothing but 'false being returned.
Correction, I tried it in my actual program, rather than the test 
stub, and it seems to work fine.
Thanks.
Steeve
27-Mar-2009
[3620]
The difference with your program is that [none] does not contain 
the none value but the none word.
If you reduce your example, it may work:
junk: reduce [none [1 2 [3 4]]]
Izkata
27-Mar-2009
[3621]
Ah, forgot to copy that part - I'd done "junk/1: none" to make sure 
it was a none value
swall
27-Mar-2009
[3622]
Steeve: that seems to have done it. thanks for clarifying.
Gabriele
28-Mar-2009
[3623]
or use #[none] instead
Pavel
29-Mar-2009
[3624]
Gabriele, what does #[none] really do/mean? I've seen it a few times 
without having a clue about its functionality.
Henrik
29-Mar-2009
[3625x2]
Pavel, try:

mold/all none
it's just a serialized version of none!, so you can load it as a 
real none value instead of a word.
[unknown: 5]
29-Mar-2009
[3627]
Pavel, this also works with datatypes.  For example:

>> mold/all string!
== "#[datatype! string!]"


This is useful if you're loading values from a file. This way you're 
sure to set a value to the string! datatype when desired.
Gabriele
31-Mar-2009
[3628]
#[none] is the value of the word 'none. It is the literal representation 
of the value of type none!.
Pavel
31-Mar-2009
[3629]
Thanks to all for the descriptions
Janko
15-Apr-2009
[3630]
Hi, I have one question.. can you somehow break out of a loop 
from REBOL code? For example:

parse [ aa zzz cc ]  [ some [ set W word! ( ?? W if equal? W 'zzz [ break ] ) ] ]

That break doesn't work that way, but is there some way to do this? 
I need to compare W with a runtime value
Graham
15-Apr-2009
[3631]
throw an error?
Janko
15-Apr-2009
[3632]
I solved it by simply returning out of the whole function (with return) 
at that point, so it's OK. At first I had thought it out in a way that 
I would need to exit the some [ ] loop but continue parsing.. an error 
probably wouldn't work that way either? This is my code now:

match: func [ data rules ] [
	parse rules [
		some [
			set L lit-word! ( either equal? L reduce first data [ data: next data ] [ return false ] )
			| set W word! ( set :W first data  data: next data )
		]
	]
]
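Janko's early-return approach can be mirrored in Python. This is a hypothetical sketch, not his REBOL code: rule items that start with a quote (e.g. "'add") stand in for lit-word! literals that must equal the next data item, and other items stand in for word! variables bound to the next data item. Returning None plays the role of `return false`; the length check is an addition his version does not have.

```python
# Hypothetical sketch (Python, not Janko's REBOL). Rule items starting
# with "'" act like lit-word! literals that must equal the next data item;
# other items act like word! variables bound to the next data item.


def match(data, rules):
    """Return a dict of bindings, or None at the first mismatch
    (None plays the role of Janko's `return false`)."""
    bindings = {}
    for pos, rule in enumerate(rules):
        if pos >= len(data):
            return None                  # ran out of data (an addition)
        item = data[pos]
        if rule.startswith("'"):
            if item != rule[1:]:
                return None              # literal mismatch: bail out early
        else:
            bindings[rule] = item        # like `set :W first data`
    return bindings


print(match(["add", 1, 2], ["'add", "x", "y"]))   # {'x': 1, 'y': 2}
print(match(["sub", 1, 2], ["'add", "x", "y"]))   # None
```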
Ammon
16-Apr-2009
[3633]
; Here's one way to do it...

>> digit: charset "1234567890"
== make bitset! #{
000000000000FF03000000000000000000000000000000000000000000000000
}

>> rule: [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip]
== [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip]
>> parse "12b34c56a78" [any rule]
12
34
56
== true
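Ammon's trick works because `:h` resets the parse position to whatever `h` holds, and `(h: tail h)` has just moved `h` to the end of the input. A hypothetical Python sketch of the same position-driven loop (`digit_runs_until` is an illustrative name, not part of REBOL):

```python
# Hypothetical sketch (Python, not REBOL PARSE) of Ammon's rule:
#   [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip]
DIGITS = set("1234567890")           # like `digit: charset "1234567890"`


def digit_runs_until(text, stop="a"):
    """Collect runs of digits; on `stop`, jump the position to the end."""
    runs = []
    pos = 0
    while pos < len(text):
        if text[pos] in DIGITS:                  # s: some digit e:
            start = pos
            while pos < len(text) and text[pos] in DIGITS:
                pos += 1
            runs.append(text[start:pos])         # copy/part s e
        elif text[pos] == stop:                  # h: #"a" (h: tail h) :h
            pos = len(text)
        else:                                    # | skip
            pos += 1
    return runs


print(digit_runs_until("12b34c56a78"))   # ['12', '34', '56']
```

It returns the same three runs Ammon's session prints; 78 is never reached because the position has already jumped to the end.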
Dockimbel
16-Apr-2009
[3634]
Another possible way is to set a [break] rule at runtime:

branch-rule: [ ]

parse [ aa zzz cc ]  [ 
	some [ 
		set W word! ( 
			?? W
			if equal? W 'zzz [ branch-rule: [ break ] ]
		)
		branch-rule
	]
]
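Dockimbel's version relies on PARSE looking up `branch-rule` each time it is reached, so swapping its block for [break] at runtime changes what the loop does. A loose Python analogue with hypothetical names, where the rule is a function held in a variable and replaced once the runtime comparison succeeds:

```python
# Hypothetical sketch (Python, not REBOL PARSE): the loop consults a rule
# stored in a variable on every pass, and that variable is swapped for a
# "break" rule at runtime, like `branch-rule: [break]`.


def words_until(tokens, stop_word):
    """Return the tokens seen up to and including stop_word."""
    def empty_rule():
        return False                 # like `branch-rule: [ ]`: keep going

    def break_rule():
        return True                  # like `[break]`: leave the loop

    branch_rule = empty_rule
    seen = []
    for w in tokens:                 # some [set W word! ...]
        seen.append(w)               # ?? W
        if w == stop_word:           # comparison with a runtime value
            branch_rule = break_rule
        if branch_rule():            # branch-rule
            break
    return seen


print(words_until(["aa", "zzz", "cc"], "zzz"))   # ['aa', 'zzz']
```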
Janko
16-Apr-2009
[3635]
Ah, thanks Ammon and Dockimbel! I hadn't thought of these two ways 
(well, I don't fully understand Ammon's yet)
shadwolf
16-Apr-2009
[3636x4]
charset creates a "mask" in bitset form, which is compared with the current 
item read from the string
some digit: since digit is a bitset containing the binary image of 
what you're looking for (number chars from 1 to
that means each character of the string is compared with the mask, 
and if it matches you proceed to the calculation
the lame equivalent would be something like: e: copy "" foreach a string 
[ either find "1234567890" a [ append e a ] [ probe e clear e ] ]
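The bitset idea can be shown directly. This hypothetical Python sketch (not REBOL) builds a 256-bit mask with one bit per character code, the same layout as the `make bitset!` value in Ammon's session (the digits, codes 48-57, give the bytes FF 03 at offsets 6 and 7), and tests each character with a shift and an AND instead of searching a string.

```python
# Hypothetical sketch (Python, not REBOL): a charset as a 256-bit mask,
# one bit per character code.


def make_charset(chars):
    """Set bit ord(ch) for each character, like `charset "1234567890"`."""
    mask = 0
    for ch in chars:
        if ord(ch) < 256:            # only 8-bit codes fit in the mask
            mask |= 1 << ord(ch)
    return mask


def in_charset(mask, ch):
    """Test one character with a shift and an AND (no string search)."""
    code = ord(ch)
    return code < 256 and ((mask >> code) & 1) == 1


def mask_hex(mask):
    """Hex dump of the 32 mask bytes; byte k holds codes 8k..8k+7."""
    return mask.to_bytes(32, "little").hex().upper()


def digit_runs(text, mask):
    """Split text into runs of characters that match the mask."""
    runs, current = [], []
    for ch in text:
        if in_charset(mask, ch):
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


digit = make_charset("1234567890")
print(mask_hex(digit)[:16])                # 000000000000FF03
print(digit_runs("12b34c56a78", digit))    # ['12', '34', '56', '78']
```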