World: r3wp
[Parse] Discussion of PARSE dialect
Anton 14-Feb-2009 [3589x2] | Ah, here it's good to use nested rules to cut down the code. |
apiece: [copy T to "end" (?? T)]
parse a [some [thru "start2" apiece | thru "start1" apiece] to end] | |
Janko 14-Feb-2009 [3591x2] | This is basically not a problem, as I solve these things with multiple passes, and that works more than fast enough for me ... I think this problem would not exist if, in the case of [ .. | .. | .. ], parse checked all the options and took the one that is the fewest characters away from the current position (the one that matches first) .. but this would most probably slow parse down, and you would lose the feature of defining "priority" with [ .. | .. | .. ] that you have now .. so maybe there could be a different | for this |
( I have to go to eat... will be back .. thanks a lot for before) | |
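The "priority" behaviour of [ .. | .. | .. ] that Janko describes can be sketched as follows. This is a minimal example assuming REBOL 2 parse semantics; the sample string and the words `rest-a` and `rest-b` are made up for illustration.

```rebol
REBOL []

; Invented sample: "b" occurs earlier in the string than "a".
text: "b then a"

; "a" is listed first, so it wins even though "b" is nearer.
parse/all text [[thru "a" | thru "b"] rest-a: to end]

; With the alternatives swapped, the nearer "b" is taken.
parse/all text [[thru "b" | thru "a"] rest-b: to end]

probe rest-a    ; ""
probe rest-b    ; " then a"
```

The first alternative that succeeds is used, regardless of how far it has to scan, which is why the order of the alternatives matters.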
Anton 14-Feb-2009 [3593] | no worries - I must sleep. :) |
Janko 14-Feb-2009 [3594x2] | hm.. interesting solution .. never thought of doing it this way!! this would maybe solve these problems I had |
hm.. really, thanks for this example.. I took it as unsolvable, but this is a totally elegant way to solve it .. I will need to think on this a little and do some more examples to digest it :) thanks | |
Anton 14-Feb-2009 [3596] | Not 100% elegant yet ! But glad to help, anyway. |
Oldes 14-Feb-2009 [3597] | If you need to parse complex structures, like a markup language, you should use charsets and not the 'to or 'thru commands... for example, you cannot say that a tag starts with < and ends with >, because a tag like this is valid as well: <input value="<>"> The 'to and 'thru commands are useful if you, for example, do data mining and don't care to parse the whole page structure just to get a bit of information from it. |
Janko 14-Feb-2009 [3598] | Oldes, your examples have so far been too hard for me to grasp (but I am getting there :) ) ... I imagine they are more like what I described above as state machines, with which you can parse everything, even structured/nested data. I will need to study charset parsing at some point. I agree with your point otherwise, but just in this case < > & " ' are not allowed in HTML (or at least XHTML) and should always be encoded (but are not always), I think |
Oldes 14-Feb-2009 [3599] | You are right.. but if you use it with a browser, it works.. the web is full of pages that don't validate :).. But I agree that it was not a good example. |
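The difference between scanning with 'thru and matching with charsets can be sketched like this. It is a minimal example assuming REBOL 2; the rule and word names (`tag-rule`, `tag-char`, `not-quote`, `naive`, `whole`) are invented here, and the rule only handles double-quoted attribute values, so it is nowhere near a full HTML parser.

```rebol
REBOL []

not-quote: complement charset {"}     ; anything except a double quote
tag-char:  complement charset {">}    ; anything except a quote or ">"

; A tag is "<", then any mix of quoted values and plain characters, then ">".
tag-rule: [
    "<" any [
        {"} any not-quote {"}         ; quoted value, may contain < or >
        | some tag-char
    ] ">"
]

html: {<input value="<>">}

parse/all html [copy naive thru ">" to end]   ; stops at the first ">"
parse/all html [copy whole tag-rule end]      ; consumes the quoted value first

probe naive    ; {<input value="<>}
probe whole    ; {<input value="<>">}
```

The 'thru version cuts the tag short inside the attribute value, while the charset version treats the quoted value as a unit.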
amacleod 22-Feb-2009 [3600x2] | Is there a way to force parse to enclose results in {} instead of double quotes "" regardless of length? |
never mind I see my prob... | |
MaxV 20-Mar-2009 [3602] | Hello everybody! I have a problem. I need to extract email addresses from a big text like bla bla [me-:-demo-:-com] bla bla ... <[you-:-example-:-org]> etc. [he-:-italy-:-it] Is it possible to obtain a text with all the addresses without the "<" and ">"? |
Pekr 20-Mar-2009 [3603] | I am not sure I understand what you are up to .... |
Maxim 20-Mar-2009 [3604] | do you want both emails within the <> and those without? |
Geomol 20-Mar-2009 [3605] |
>> str: "bla bla [me-:-demo-:-com] bla bla ... <[you-:-example-:-org]> etc. [he-:-italy-:-it]"
>> foreach w parse str none [if find e: to-email load w "@" [print e]]
[me-:-demo-:-com]
[you-:-example-:-org]
[he-:-italy-:-it]
or something. |
Pekr 20-Mar-2009 [3606x3] | eh, nice :-) |
Here's an absolutely terrible parser - it does NOT follow the RFC; it allows any combination of alphanumeric chars, dots and hyphens, then one @ char, then the same once again up to the next space char ...
space: #" "
mailchar: charset [#"0" - #"9" #"A" - #"Z" #"a" - #"z" ".-"]
at-char: #"@"
email: [
    space start: some mailchar at-char some mailchar end: space
    (print copy/part start end)
]
str: "afadfa adfa asdfasdfa fd [asdfas-:-adfadf-:-adfa-adfadfsda-:-com] adfafaf a af"
parse/all str [any [email | skip]] | |
That eliminates email addresses inside of < >, but maybe that was not the intention? | |
btiffin 20-Mar-2009 [3609] | It would be nice if REBOL could LOAD foreign! data. :) Hint hint wink wink. And being here in a public REBOL forum I might get in trouble for suggesting this one. $ grep -o -E '\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b' files... |
Pekr 20-Mar-2009 [3610] | Brian ... your post is broken ... it contains some strange binary fragments :-) |
Geomol 20-Mar-2009 [3611] | Brian, you can probably do that grep with a few CHARSET and PARSE in REBOL. |
btiffin 20-Mar-2009 [3612] | And actually I think it's wrong anyway ... as it should be. Posting regex in a REBOL forum ... shame on me. ;) |
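Geomol's suggestion of replacing the grep with a few CHARSET and PARSE rules might look roughly like the sketch below. It assumes REBOL 2; the sample text, the addresses in it, and the words `alnum`, `user-char`, `host-char`, `found` and `addr` are invented for illustration. The rule is deliberately looser than the regex: it does not check for a top-level domain or word boundaries, and it would keep a trailing dot if an address ended a sentence.

```rebol
REBOL []

alnum:     charset [#"0" - #"9" #"A" - #"Z" #"a" - #"z"]
user-char: union alnum charset "._%+-"   ; characters allowed before the @
host-char: union alnum charset ".-"      ; characters allowed after the @

found: copy []
text: "bla me@demo.com bla <you@example.org> etc."

parse/all text [
    any [
        ; Try to match an address here; otherwise move on one character.
        copy addr [some user-char "@" some host-char] (append found addr)
        | skip
    ]
]

probe found    ; ["me@demo.com" "you@example.org"]
```

Because "<" and ">" are not in either charset, addresses wrapped in angle brackets are collected without the brackets.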
MaxV 23-Mar-2009 [3613] | Thank you, I'll try Pekr's solution. I don't need the "<" and ">" characters. However, where can I find some good parse documentation? |
Brock 23-Mar-2009 [3614] | Rebol Parse documentation: http://www.rebol.com/docs/core23/rebolcore-15.html |
Chris 23-Mar-2009 [3615] | http://www.codeconscious.com/rebol/parse-tutorial.html |
swall 27-Mar-2009 [3616] | I'm having trouble parsing the "none" datatype from within blocks. The following example illustrates my problem (hopefully):
junk: [none [1 2 [3 4]]]
parse/all junk [none (print ["nothing"]) text: (print ["text:" mold text]) set b block! (print ["block:" mold b])]
This produces the following output:
nothing
text: [none [1 2 [3 4]]]
== false
Notice that the block doesn't get parsed. It seems that parse ignores "none" tokens rather than extracting them from the input stream. If I put a number in place of none and parse for "number!", then the block does indeed get parsed. Is this a bug or an oversight? Or am I just confused? |
Izkata 27-Mar-2009 [3617] | 'none isn't a datatype - none! is:
>> parse/all junk [none! (print ["nothing"]) text: (print ["text:" mold text]) set b block! (print ["block:" mold b])]
nothing
text: [[1 2 [3 4]]]
block: [1 2 [3 4]]
== true |
swall 27-Mar-2009 [3618x2] | I tried that but it doesn't seem to work. I'm getting nothing but 'false being returned. |
Correction, I tried it in my actual program, rather than the test stub, and it seems to work fine. Thanks. | |
Steeve 27-Mar-2009 [3620] | The difference from your program is that [none] does not contain the none value but the none word. If you reduce your example, it may work: junk: reduce [none [1 2 [3 4]]] |
Izkata 27-Mar-2009 [3621] | Ah, forgot to copy that part - I'd done "junk/1: none" to make sure it was a none value |
swall 27-Mar-2009 [3622] | Steeve: that seems to have done it. thanks for clarifying. |
Gabriele 28-Mar-2009 [3623] | or use #[none] instead |
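The word-versus-value distinction discussed above can be checked with a short sketch, assuming REBOL 2 block parsing; the words `as-word`, `as-value` and `r1` to `r4` are made up for this example.

```rebol
REBOL []

as-word:  [none [1 2 [3 4]]]          ; first item is the word none
as-value: reduce [none [1 2 [3 4]]]   ; first item is a none! value

r1: parse as-word  [word! block!]     ; the unreduced block holds a word
r2: parse as-word  [none! block!]     ; so none! does not match it
r3: parse as-value [none! block!]     ; after reduce, none! matches

; In the parse dialect the bare keyword none matches without consuming
; anything, which is why the original rule never moved past the first item.
r4: parse as-value [none none! block!]

print [r1 r2 r3 r4]    ; true false true true
```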
Pavel 29-Mar-2009 [3624] | Gabriele, what does #[none] really do/mean? I've seen it a few times without having a clue about what it does. |
Henrik 29-Mar-2009 [3625x2] | Pavel, try: mold/all none |
it's just a serialized version of none!, so you can load it as a real none value instead of a word. | |
[unknown: 5] 29-Mar-2009 [3627] | Pavel, this also works with datatypes. For example: >> mold/all string! == "#[datatype! string!]" This is useful if you're loading values from a file. This way you're sure to set a value to the string! datatype when desired. |
Gabriele 31-Mar-2009 [3628] | #[none] is the value of the word 'none. It is the literal representation of the value of type none!. |
Pavel 31-Mar-2009 [3629] | THX to all for the description |
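The round trip that Henrik and Gabriele describe can be sketched as follows, assuming REBOL 2; the words `plain`, `serialized` and `c1` to `c4` are invented for the example.

```rebol
REBOL []

plain:      mold none        ; "none"
serialized: mold/all none    ; "#[none]"

c1: word? load plain                     ; plain text loads back as a word
c2: none? load serialized                ; serialized form loads as the value
c3: word? load mold string!              ; same story for datatypes
c4: datatype? load mold/all string!

print [plain serialized]    ; none #[none]
print [c1 c2 c3 c4]         ; true true true true
```

Without mold/all, a value such as none or string! comes back from LOAD as a plain word that still has to be evaluated; the #[...] form comes back as the value itself.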
Janko 15-Apr-2009 [3630] | Hi, I have one question .. can you somehow break out of some loop by rebol code .. for example parse [ aa zzz cc ] [ some [ set W word! ( ?? W if equal? W 'zzz [ break ] ) ] ] ... that break doesn't work that way, but is there some way to do this? I need to compare W with a runtime value |
Graham 15-Apr-2009 [3631] | throw an error? |
Janko 15-Apr-2009 [3632] | I solved it in a way where I can just return out of the whole function (with return) at that point, so it's OK .. at first I had thought I would need to exit the some [ ] loop but continue parsing .. an error probably wouldn't work that way either? This is my code now:
match: func [ data rules ] [
    parse rules [
        SOME [
            set L lit-word! ( either equal? L reduce first data [ data: next data ] [ return false ] )
            | set W word! ( set :W first data data: next data )
        ]
    ]
] |
Ammon 16-Apr-2009 [3633] | ; Here's one way to do it...
>> digit: charset "1234567890"
== make bitset! #{
000000000000FF03000000000000000000000000000000000000000000000000
}
>> rule: [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip ]
== [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip]
>> parse "12b34c56a78" [any rule]
12
34
56
== true |
Dockimbel 16-Apr-2009 [3634] | Another possible way is by setting a [break] rule at runtime:
branch-rule: [ ]
parse [ aa zzz cc ] [
    some [
        set W word! ( ?? W if equal? W 'zzz [ branch-rule: [ break ] ] )
        branch-rule
    ]
] |
Janko 16-Apr-2009 [3635] | Ah, thanks Ammon and Dockimbel! I hadn't thought of these two ways (well, I don't yet fully understand Ammon's) |
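A related idiom, not taken from the thread, is to switch a rule between "always succeeds" and "always fails" at runtime. The sketch below assumes REBOL 2; `stop-word`, `seen`, `continue?` and `rest` are invented names. Unlike [break], a failing rule makes the current iteration backtrack, so the stop word itself is left unconsumed and parsing can carry on from there.

```rebol
REBOL []

stop-word: 'zzz          ; runtime value to compare against
seen: copy []
continue?: []            ; an empty rule always succeeds

parse [aa zzz cc] [
    some [
        set w word!
        ; [end skip] can never match, so it makes this iteration fail.
        (continue?: either equal? w stop-word [[end skip]] [[]])
        continue?
        (append seen w)
    ]
    rest: to end         ; parsing continues from the stop word
]

probe seen    ; [aa]
probe rest    ; [zzz cc]
```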
shadwolf 16-Apr-2009 [3636x3] | charset creates a "mask" in bitset form, to be compared with the current item read from the string |
some digit, since digit is a bitset containing the binary image of what you're looking for (numbers char from 1 to | |
that means each item of the string will be compared to the mask, and if it matches, you proceed to the calculation | |
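The "mask" idea can be tried directly in a small sketch, assuming REBOL 2, where FIND on a bitset returns true or none; the words `in-set`, `not-in-set`, `all-digits` and `mixed` are made up for the example.

```rebol
REBOL []

digit: charset "1234567890"          ; same set as in Ammon's example

in-set:     find digit #"7"          ; membership test against the mask
not-in-set: find digit #"x"

all-digits: parse "2009" [some digit]   ; every character is in the set
mixed:      parse "20x9" [some digit]   ; stops at "x", so the parse fails

probe in-set        ; true
probe not-in-set    ; none
print [all-digits mixed]    ; true false
```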