World: r3wp

[Parse] Discussion of PARSE dialect

amacleod
22-Feb-2009
[3601]
never mind I see my prob...
MaxV
20-Mar-2009
[3602]
Hello everybody!

I have a problem. I need to extract email addresses from a big text like

bla bla me@demo.com bla bla ...  <you@example.org>  etc. he@italy.it

Is it possible to obtain a text with all the addresses without the "<" and ">"?
Pekr
20-Mar-2009
[3603]
I am not sure I understand what you are up to ....
Maxim
20-Mar-2009
[3604]
do you want both emails within the <> and those without?
Geomol
20-Mar-2009
[3605]
>> str: "bla bla me@demo.com bla bla ...  <you@example.org>  etc. he@italy.it"

>> foreach w parse str none [if find e: to-email load w "@" [print e]]
me@demo.com
you@example.org
he@italy.it

or something.
Pekr
20-Mar-2009
[3606x3]
eh, nice :-)
Here's an absolutely terrible parser - it does NOT follow the RFC; it allows any combination of alphanumeric chars, dots and hyphens, then one @ char, then the same again up to the next space char ...

space: #" "
mailchar: charset [#"0" - #"9" #"A" - #"Z" #"a" - #"z" ".-"]
at-char: #"@"

email: [

   space
   start:
   some mailchar
   at-char
   some mailchar
   end:
   space
   (print copy/part start end)

]


str: "afadfa adfa asdfasdfa fd asdfas@adfadf.adfa-adfadfsda.com adfafaf a af"

parse/all str [any [email | skip]]
That eliminates email addresses inside of < >, but maybe that was not the intention?
btiffin
20-Mar-2009
[3609]
It would be nice if REBOL could LOAD foreign! data.  :)  Hint hint 
wink wink.


And being here in a public REBOL forum I might get in trouble for 
suggesting this one.

$ grep -o -E '\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b' files...
Pekr
20-Mar-2009
[3610]
Brian ... your post is broken ... it contains some strange binary fragments :-)
Geomol
20-Mar-2009
[3611]
Brian, you can probably do that grep with a few CHARSET and PARSE 
in REBOL.
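
A minimal sketch of what that might look like, assuming two character classes modelled on the grep pattern above. The names user-char, host-char, emails and addr are made up for illustration, and like the regex this does not follow the RFC: an address followed directly by a period would keep the trailing dot.

```rebol
REBOL []

; Hypothetical character classes, loosely mirroring the grep pattern
user-char: charset [#"0" - #"9" #"A" - #"Z" #"a" - #"z" "._%+-"]
host-char: charset [#"0" - #"9" #"A" - #"Z" #"a" - #"z" ".-"]

emails: copy []
str: "bla bla me@demo.com bla <you@example.org> etc. he@italy.it"

parse/all str [
    any [
        ; try to match an address at the current position ...
        copy addr [some user-char #"@" some host-char] (append emails addr)
        ; ... otherwise move one character forward and try again
        | skip
    ]
]

probe emails   ; ["me@demo.com" "you@example.org" "he@italy.it"]
```

Because "<" and ">" are in neither charset, addresses inside angle brackets come out without them.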
btiffin
20-Mar-2009
[3612]
And actually I think it's wrong anyway ... as it should be.  Posting 
regex in a REBOL forum ... shame on me.   ;)
MaxV
23-Mar-2009
[3613]
Thank you, I'll try Pekr's solution. I don't need the "<" and ">" characters.
However, where can I find some good parse documentation?
Brock
23-Mar-2009
[3614]
Rebol Parse documentation: http://www.rebol.com/docs/core23/rebolcore-15.html
Chris
23-Mar-2009
[3615]
http://www.codeconscious.com/rebol/parse-tutorial.html
swall
27-Mar-2009
[3616]
I'm having trouble parsing the "none" datatype from within blocks.
The following example illustrates my problem (hopefully):
junk: [none [1 2 [3 4]]]

parse/all junk [none (print ["nothing"]) text: (print ["text:" mold text]) set b block! (print ["block:" mold b])]

This produces the following output:
nothing
text: [none [1 2 [3 4]]]
== false


Notice that the block doesn't get parsed. It seems that parse ignores 
"none" tokens rather than extracting them from the input stream. 
If I put a number in place of none and parse for "number!", then 
the block does indeed get parsed.

Is this a bug or an oversight? Or am I just confused?
Izkata
27-Mar-2009
[3617]
'none isn't a datatype - none! is:

>> parse/all junk [none! (print ["nothing"]) text: (print ["text:" mold text]) set b block! (print ["block:" mold b])]
nothing
text: [[1 2 [3 4]]]
block: [1 2 [3 4]]
== true
swall
27-Mar-2009
[3618x2]
I tried that but it doesn't seem to work.
I'm getting nothing but 'false being returned.
Correction, I tried it in my actual program, rather than the test 
stub, and it seems to work fine.
Thanks.
Steeve
27-Mar-2009
[3620]
the difference with your program is that [none] does not contain the none value but the word none.
if you reduce your example, it may work:
junk: reduce [none [1 2 [3 4]]]
Izkata
27-Mar-2009
[3621]
Ah, forgot to copy that part - I'd done "junk/1: none" to make sure 
it was a none value
swall
27-Mar-2009
[3622]
Steeve: that seems to have done it. thanks for clarifying.
Gabriele
28-Mar-2009
[3623]
or use #[none] instead
Pavel
29-Mar-2009
[3624]
Gabriele, what does #[none] really do/mean? I've seen it a few times without having a clue about what it does.
Henrik
29-Mar-2009
[3625x2]
Pavel, try:

mold/all none
it's just a serialized version of none!, so you can load it as a 
real none value instead of a word.
[unknown: 5]
29-Mar-2009
[3627]
Pavel, this also works with datatypes.  For example:

>> mold/all string!
== "#[datatype! string!]"


This is useful if you're loading values from a file.  This way you're sure to set a value to the string! datatype when desired.
Gabriele
31-Mar-2009
[3628]
#[none] is the value of the word 'none. It is the literal representation 
of the value of type none!.
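
A small sketch of the three cases discussed above, with made-up variable names (as-word, as-reduced, as-literal). It shows why the none! rule only matches once the block really holds a none value rather than the word:

```rebol
REBOL []

; Hypothetical names, for illustration only
as-word:    [none]           ; the block holds the WORD none
as-reduced: reduce [none]    ; REDUCE evaluates the word, giving the none value
as-literal: [#[none]]        ; the serialized form loads as the none value

probe type? first as-word       ; word!
probe type? first as-reduced    ; none!
probe type? first as-literal    ; none!

probe parse as-word    [none!]  ; false - a word is not a none! value
probe parse as-word    ['none]  ; true  - a lit-word rule matches the word
probe parse as-literal [none!]  ; true
```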
Pavel
31-Mar-2009
[3629]
THX for description to all
Janko
15-Apr-2009
[3630]
Hi, I have one question .. can you somehow break out of some loop by rebol code .. for example

parse [ aa zzz cc ]  [ some [ set W word! ( ?? W if equal? W 'zzz [ break ] ) ] ]

...  that break doesn't work that way, but is there some way to do this? I need to compare W with a runtime value
Graham
15-Apr-2009
[3631]
throw an error?
Janko
15-Apr-2009
[3632]
I solved it in a way that I can just return out of the whole function (with return) at that point, so it's ok .. at first I had thought it out in a way that I would need to exit the some [ ] loop but continue parsing .. an error probably wouldn't work that way either? This is now my code..

match: func [ data rules ] [
	parse rules [
		SOME [
			set L lit-word! ( either equal? L reduce first data [ data: next data ] [ return false ] ) |
			set W word! ( set :W first data  data: next data )
		]
	]
]
Ammon
16-Apr-2009
[3633]
; Here's one way to do it...

>> digit: charset "1234567890"
== make bitset! #{
000000000000FF03000000000000000000000000000000000000000000000000
}

>> rule: [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip ]
== [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip]
>> parse "12b34c56a78" [any rule]
12
34
56
== true
Dockimbel
16-Apr-2009
[3634]
Another possible way is by setting at runtime a [break] rule :

branch-rule: [ ]

parse [ aa zzz cc ]  [ 
	some [ 
		set W word! ( 
			?? W
			if equal? W 'zzz [ branch-rule: [ break ] ]
		)
		branch-rule
	]
]
Janko
16-Apr-2009
[3635]
Ah, thanks Ammon and Dockimbel! I hadn't thought of these two ways (well, I don't yet fully understand Ammon's)
shadwolf
16-Apr-2009
[3636x5]
charset creates a "mask" in bitset form to be compared to the current item read from the string
some digit: digit is a bitset containing the binary image of what you are looking for (number chars from 1 to
that means each character of the string is compared to the mask, and if it matches you proceed to the calculation
the lame equivalent would be something like foreach a string [ either find "1234567890" a [ append e a ][probe e clear e ] ]
so Ammon's solution using charset / bitset and parse is the totally rebolish way
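
To make the "mask" idea concrete, here is a short sketch (the test strings are invented for illustration). FIND on a bitset answers whether one character belongs to the set, and PARSE uses the same bitset to match one character at a time:

```rebol
REBOL []

digit: charset "0123456789"

; membership test for a single character
probe find digit #"5"             ; true
probe find digit #"x"             ; none

; the same bitset used as a parse rule
probe parse "2009" [some digit]   ; true  - every character is in the set
probe parse "20a9" [some digit]   ; false - matching stops at #"a"
```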
[unknown: 5]
16-Apr-2009
[3641]
parse [aa zzz cc][some [set w word! (?? w cont: if w = 'zzz [[end skip]]) cont]]
Ammon
17-Apr-2009
[3642x2]
Essentially what I'm doing with the above code is simply skipping 
to the end of the parse input when a given rule is matched. This 
works because a get-word in the parse rules sets the current parse 
input.  The get-word can be any value of the same type as the original 
parse input.  You can't set the parse input to a string! if a block! 
was provided to parse to start with.
Using your code to do the same thing...

match: func [ data rules ] [
	parse rules [
		SOME [
			set L lit-word! blk: ( either equal? L reduce first data [ data: next data ] [ blk: tail blk ] ) :blk |
			set W word! ( set :W first data  data: next data )
		]
	]
]
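
The get-word trick on its own, as a stripped-down sketch: the input string and the names prefix and mark are invented for illustration. A set-word saves the current position, the paren moves that saved position to the tail, and the get-word makes parse continue from there, so the rest of the input is never examined:

```rebol
REBOL []

result: parse/all "abc;ignored" [
    copy prefix to ";"         ; prefix becomes "abc"
    mark: (mark: tail mark)    ; move the saved position to the tail
    :mark                      ; continue parsing from there, i.e. at the end
]

probe prefix   ; "abc"
probe result   ; true - parse finished at the end of the input
```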
Graham
23-Apr-2009
[3644]
I'd like to take an english sentence and tidy it up.  I want to automatically 
apply english grammar to it ... so capitalize the first letter after 
a period, and remove extraneous spaces eg. a comma after a space. 
 Anyone done anything like this with 'parse?
Ammon
24-Apr-2009
[3645]
Not yet but I've been thinking about it for quite a while now... 
 I think I have a pretty good idea what the parse rules should look 
like but I haven't written any code for it yet.
Steeve
24-Apr-2009
[3646]
Good start...

letter: charset [#"a" - #"z" #"A" - #"Z"]
dirt: complement letter
word: [some letter]
clean: [here: dirt :here (remove here)]
space: [here: (insert here #" ") skip]
capital: [here: letter (uppercase/part here 1)]
sentence: [
	some [
		  capital opt word break
		| clean
	]
	any [
		  [#";" | #","] any clean space word
		| #"." any clean space capital opt word
		| #" " word
		| clean
	]
]

parse/all text: {test  test . test;; test ..test } sentence
probe text
>>"Test test. Test; test. Test"
Janko
24-Apr-2009
[3647x2]
I have made auto capitalising first words for some bot once .. it 
wasn't anything special , I can find the code and send it to you
ah, Steeve's already works
Steeve
24-Apr-2009
[3649]
Has to be enhanced indeed
Graham
24-Apr-2009
[3650]
Hey, nice start ...