World: r3wp
[Parse] Discussion of PARSE dialect
Anton 14-Feb-2009 [3593] | no worries - I must sleep. :) |
Janko 14-Feb-2009 [3594x2] | hm.. interesting solution .. never thought of doing it this way!! this would maybe solve these problems I had |
hm.. really thanks for this example.. I took it as unsolvable, but this is a totally elegant way to solve it .. I will need to think on this a little and do some more examples to digest it :) thanks | |
Anton 14-Feb-2009 [3596] | Not 100% elegant yet ! But glad to help, anyway. |
Oldes 14-Feb-2009 [3597] | If you need to parse complex structures, like a markup language, you should use charsets and not the 'to or 'thru commands... for example, you cannot say that a tag starts with < and ends with > because a tag like this is valid as well: <input value="<>"> The 'to and 'thru commands are useful if, for example, you are doing data mining and don't care to parse the whole page structure just to get a bit of information from it. |
Janko 14-Feb-2009 [3598] | Oldes, your examples were so far too hard for me to grasp (but I am getting there :) ) ... I imagine they are more like what I described above as state machines, with which you can parse everything, even structured/nested data. I will need to study charset parsing at some point. I agree with your point otherwise, but just in this case <> & " ' are not allowed in HTML (or at least XHTML) and should always be encoded (but are not always), I think |
Oldes 14-Feb-2009 [3599] | You are right.. but if you use it with a browser, it works.. the web is full of pages that don't validate :).. But I agree that it was not a good example. |
amacleod 22-Feb-2009 [3600x2] | Is there a way to force parse to enclose results in {} instead of double quotes "" regardless of length? |
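A minimal sketch of the charset approach Oldes describes, assuming REBOL 2 and a deliberately simplified tag grammar (double-quoted attribute values only). The names not-quote, tag-char and tag-rule are made up for illustration. It contrasts a rule that steps over quoted values with a plain thru ">":

```rebol
REBOL []

not-quote: complement charset {"}    ; any char except a double quote
tag-char:  complement charset {">}   ; any char except a double quote or ">"

; Hypothetical, simplified tag rule: a quoted value is consumed as one unit,
; so a ">" inside the quotes cannot end the tag early.
tag-rule: ["<" any [{"} any not-quote {"} | some tag-char] ">"]

probe parse/all {<input value="<>">} tag-rule         ; true
probe parse/all {<input value="<>">} ["<" thru ">"]   ; false: thru stops at the ">" inside the value
```

The second call fails because thru ">" lands just after the first ">" it finds, which sits inside the attribute value, leaving unparsed input behind.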
never mind I see my prob... | |
MaxV 20-Mar-2009 [3602] | Hello everybody! I have a problem. I need to extract email addresses from a big text like bla bla [me-:-demo-:-com] bla bla ... <[you-:-example-:-org]> etc. [he-:-italy-:-it] Is it possible to obtain a text with all the addresses without the "<" and ">"? |
Pekr 20-Mar-2009 [3603] | I am not sure I understand what you are up to .... |
Maxim 20-Mar-2009 [3604] | do you want both emails within the <> and those without? |
Geomol 20-Mar-2009 [3605] |
    >> str: "bla bla [me-:-demo-:-com] bla bla ... <[you-:-example-:-org]> etc. [he-:-italy-:-it]"
    >> foreach w parse str none [if find e: to-email load w "@" [print e]]
    [me-:-demo-:-com]
    [you-:-example-:-org]
    [he-:-italy-:-it]
or something. |
Pekr 20-Mar-2009 [3606x3] | eh, nice :-) |
Here's an absolutely terrible parser - it does NOT follow the RFC; it allows any combination of alpha chars, dots, one @ char, and the same once again up to the next space char ...
    space: #" "
    mailchar: charset [#"0" - #"9" #"A" - #"Z" #"a" - #"z" ".-"]
    at-char: #"@"
    email: [
        space start: some mailchar at-char some mailchar end: space
        (print copy/part start end)
    ]
    str: "afadfa adfa asdfasdfa fd [asdfas-:-adfadf-:-adfa-adfadfsda-:-com] adfafaf a af"
    parse/all str [any [email | skip]] | |
That eliminates email addresses inside of < >, but maybe that was not the intention? | |
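A rough sketch of a variant that also picks up addresses inside < >, assuming REBOL 2 and assuming the addresses appear as plain name@host text (the addresses in this archive are obfuscated). The sample string and the names email-rule, addr and found are illustrative only, and the character set is as loose as Pekr's, not RFC-compliant:

```rebol
REBOL []

; Loose approximation of the characters allowed in an address (assumption, not RFC).
mailchar: charset [#"0" - #"9" #"A" - #"Z" #"a" - #"z" ".-_"]

found: copy []

; No surrounding spaces are required, so "<" and ">" are simply skipped over.
email-rule: [copy addr [some mailchar "@" some mailchar] (append found addr)]

str: "bla bla me@demo.com bla bla ... <you@example.org> etc. he@italy.it"
parse/all str [any [email-rule | skip]]

probe found   ; ["me@demo.com" "you@example.org" "he@italy.it"]
```

Because the rule collects into a block instead of printing, the result can be joined into a single text afterwards if that is what is needed.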
btiffin 20-Mar-2009 [3609] | It would be nice if REBOL could LOAD foreign! data. :) Hint hint wink wink. And being here in a public REBOL forum I might get in trouble for suggesting this one. $ grep -o -E '\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b' files... |
Pekr 20-Mar-2009 [3610] | Brian ... your post is broken ... it contains some strange binary fragments :-) |
Geomol 20-Mar-2009 [3611] | Brian, you can probably do that grep with a few CHARSET and PARSE in REBOL. |
btiffin 20-Mar-2009 [3612] | And actually I think it's wrong anyway ... as it should be. Posting regex in a REBOL forum ... shame on me. ;) |
MaxV 23-Mar-2009 [3613] | Thank you, I'll try Pekr's solution. I don't need the "<" and ">" characters. However, where can I find some good parse documentation? |
Brock 23-Mar-2009 [3614] | Rebol Parse documentation: http://www.rebol.com/docs/core23/rebolcore-15.html |
Chris 23-Mar-2009 [3615] | http://www.codeconscious.com/rebol/parse-tutorial.html |
swall 27-Mar-2009 [3616] | I'm having trouble parsing the "none" datatype from within blocks. The following example illustrates my problem (hopefully):
    junk: [none [1 2 [3 4]]]
    parse/all junk [none (print ["nothing"]) text: (print ["text:" mold text]) set b block! (print ["block:" mold b])]
This produces the following output:
    nothing
    text: [none [1 2 [3 4]]]
    == false
Notice that the block doesn't get parsed. It seems that parse ignores "none" tokens rather than extracting them from the input stream. If I put a number in place of none and parse for "number!", then the block does indeed get parsed. Is this a bug or an oversight? Or am I just confused? |
Izkata 27-Mar-2009 [3617] | 'none isn't a datatype - none! is:
    >> parse/all junk [none! (print ["nothing"]) text: (print ["text:" mold text]) set b block! (print ["block:" mold b])]
    nothing
    text: [[1 2 [3 4]]]
    block: [1 2 [3 4]]
    == true |
swall 27-Mar-2009 [3618x2] | I tried that but it doesn't seem to work. I'm getting nothing but 'false being returned. |
Correction, I tried it in my actual program, rather than the test stub, and it seems to work fine. Thanks. | |
Steeve 27-Mar-2009 [3620] | The difference from your program is that [none] does not contain the none value but the none word. If you reduce your example, it may work: junk: reduce [none [1 2 [3 4]]] |
Izkata 27-Mar-2009 [3621] | Ah, forgot to copy that part - I'd done "junk/1: none" to make sure it was a none value |
swall 27-Mar-2009 [3622] | Steeve: that seems to have done it. thanks for clarifying. |
Gabriele 28-Mar-2009 [3623] | or use #[none] instead |
Pavel 29-Mar-2009 [3624] | Gabriele, what does #[none] really do/mean? I've seen it a few times without having a clue about what it does. |
Henrik 29-Mar-2009 [3625x2] | Pavel, try: mold/all none |
it's just a serialized version of none!, so you can load it as a real none value instead of a word. | |
[unknown: 5] 29-Mar-2009 [3627] | Pavel, this also works with datatypes. For example: >> mold/all string! == "#[datatype! string!]" This is useful if you're loading values from a file. This way you're sure to set a value to the string! datatype when desired. |
Gabriele 31-Mar-2009 [3628] | #[none] is the value of the word 'none. It is the literal representation of the value of type none!. |
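To pull the points above together, here is a small sketch, assuming REBOL 2, of the difference between the word none and a none value inside a block. It also suggests why swall's original rule printed "nothing" without consuming anything: as far as I know, a bare none in a parse rule is a keyword that always succeeds without advancing.

```rebol
REBOL []

junk: [none [1 2 [3 4]]]
probe type? first junk               ; word!  (the block holds the word none)
probe parse junk [none! block!]      ; false  (none! does not match a word)
probe parse junk ['none block!]      ; true   (a lit-word matches the word itself)

junk: reduce [none [1 2 [3 4]]]      ; reduce evaluates the word to a none value
probe type? first junk               ; none!
probe parse junk [none! block!]      ; true

junk: [#[none] [1 2 [3 4]]]          ; serialized form loads as a real none value
probe parse junk [none! block!]      ; true
probe mold/all none                  ; "#[none]"
```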
Pavel 31-Mar-2009 [3629] | THX to all for the description |
Janko 15-Apr-2009 [3630] | Hi, I have one question .. can you somehow break out of some loop by rebol code .. for example parse [ aa zzz cc ] [ some [ set W word! ( ?? W if equal? W 'zzz [ break ] ) ] ] ... that break doesn't work that way, but is there some way to do this? I need to compare W with a runtime value |
Graham 15-Apr-2009 [3631] | throw an error? |
Janko 15-Apr-2009 [3632] | I solved it in a way that I can just return out of the whole function (with return) at that point, so it's ok .. at first I had thought it out in a way that I would need to exit the some [ ] loop but continue parsing .. an error probably wouldn't work that way either? This is now my code..
    match: func [ data rules ] [
        parse rules [
            SOME [
                set L lit-word! ( either equal? L reduce first data [ data: next data ] [ return false ] )
                | set W word! ( set :W first data data: next data )
            ]
        ]
    ] |
Ammon 16-Apr-2009 [3633] | ; Here's one way to do it...
    >> digit: charset "1234567890"
    == make bitset! #{
    000000000000FF03000000000000000000000000000000000000000000000000
    }
    >> rule: [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip ]
    == [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip]
    >> parse "12b34c56a78" [any rule]
    12
    34
    56
    == true |
Dockimbel 16-Apr-2009 [3634] | Another possible way is to set a [break] rule at runtime:
    branch-rule: [ ]
    parse [ aa zzz cc ] [
        some [
            set W word! ( ?? W if equal? W 'zzz [ branch-rule: [ break ] ] )
            branch-rule
        ]
    ] |
Janko 16-Apr-2009 [3635] | Ah, thanks Ammon and Dockimbel! I hadn't thought of these two ways (well, I don't yet fully understand Ammon's) |
shadwolf 16-Apr-2009 [3636x5] | charset creates a "mask" in bitset form to be compared to the current item read from the string |
some digit: since digit is a bitset containing the binary image of what you are looking for (number chars from 1 to
that means each item of the string will be compared to the mask, and if it matches, you proceed to the calculation | |
the lame equivalent would be something like foreach a string [ either find "1234567890" a [ append e a ][probe e clear e ] ] | |
so Ammon's solution using charset / bitset and parse is the totally rebolish way | |
[unknown: 5] 16-Apr-2009 [3641] | parse [aa zzz cc][some [set w word! (?? w cont: if w = 'zzz [[end skip]]) cont]] |
Ammon 17-Apr-2009 [3642] | Essentially what I'm doing with the above code is simply skipping to the end of the parse input when a given rule is matched. This works because a get-word in the parse rules sets the current parse input. The get-word can be any value of the same type as the original parse input. You can't set the parse input to a string! if a block! was provided to parse to start with. |