World: r3wp
[Parse] Discussion of PARSE dialect
Ammon 16-Apr-2009 [3633] | ; Here's one way to do it... >> digit: charset "1234567890" == make bitset! #{ 000000000000FF03000000000000000000000000000000000000000000000000 } >> rule: [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip ] == [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip] >> parse "12b34c56a78" [any rule] 12 34 56 == true |
Dockimbel 16-Apr-2009 [3634] | Another possible way is by setting at runtime a [break] rule : branch-rule: [ ] parse [ aa zzz cc ] [ some [ set W word! ( ?? W if equal? W 'zzz [ branch-rule: [ break ] ] ) branch-rule ] ] |
Janko 16-Apr-2009 [3635] | Ah, thanks Ammon and Dockimbel! I hadn't thought of these two ways (well, I don't yet fully understand Ammon's) |
shadwolf 16-Apr-2009 [3636x5] | charset creates a "mask" in bitset form, to be compared with the current item read from the string |
some digit: since digit is a bitset containing the binary image of what you are looking for (number chars from 1 to | |
that means each character of the string will be compared to the mask, and if it matches then you proceed to the calculation | |
the lame equivalent would be something like foreach a string [ either find "1234567890" a [ append e a ][probe e clear e ] ] | |
so Ammon's solution using charset / bitset and parse is the totally rebolish way | |
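A minimal sketch of the charset idea described above, assuming REBOL 2. The names `runs` and `run` are made up for illustration. Unlike Ammon's rule, this version does not stop at "a"; it only shows the bitset test and the digit-run scan.

```rebol
; A charset is a bitset: one bit per character code.
digit: charset "0123456789"

; Membership test of a single character against the bitset.
print find digit #"7"     ; prints true
print find digit #"x"     ; prints none

; The same scan as the foreach loop, written with parse:
; collect every run of digits found in the string.
runs: copy []
parse/all "12b34c56a78" [
    any [
        copy run some digit (append runs run)   ; a run of digits: keep it
        | skip                                  ; anything else: step over it
    ]
]
probe runs                ; ["12" "34" "56" "78"]
```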
[unknown: 5] 16-Apr-2009 [3641] | parse [aa zzz cc][some [set w word! (?? w cont: if w = 'zzz [[end skip]]) cont]] |
Ammon 17-Apr-2009 [3642x2] | Essentially what I'm doing with the above code is simply skipping to the end of the parse input when a given rule is matched. This works because a get-word in the parse rules sets the current parse input. The get-word can be any value of the same type as the original parse input. You can't set the parse input to a string! if a block! was provided to parse to start with. |
Using your code to do the same thing... match: func [ data rules ] [ parse rules [ SOME [ set L lit-word! blk: ( either equal? L reduce first data [ data: next data ] [ blk: tail blk ] ) :blk | set W word! ( set :W first data data: next data ) ] ] ] | |
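The get-word technique Ammon describes can be reduced to a few lines. This is only a sketch, assuming REBOL 2; `here` and `rest` are illustrative names. A set-word records the current input position, ordinary code in a paren moves that marker, and the get-word makes parse resume from wherever the marker now points.

```rebol
parse/all "abcdef" [
    here:                  ; set-word: remember the current input position
    (here: skip here 3)    ; paren: plain REBOL code, moves the marker forward
    :here                  ; get-word: parsing continues from the marker
    copy rest to end       ; capture whatever is left after the jump
]
print rest                 ; prints def
```

Ammon's rule does the same thing with `tail h` instead of `skip here 3`, which sends the input straight to the end so that nothing after the "a" is examined.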
Graham 23-Apr-2009 [3644] | I'd like to take an english sentence and tidy it up. I want to automatically apply english grammar to it ... so capitalize the first letter after a period, and remove extraneous spaces eg. a comma after a space. Anyone done anything like this with 'parse? |
Ammon 24-Apr-2009 [3645] | Not yet but I've been thinking about it for quite a while now... I think I have a pretty good idea what the parse rules should look like but I haven't written any code for it yet. |
Steeve 24-Apr-2009 [3646] | Good start... letter: charset [#"a" - #"z" #"A" - #"Z"] dirt: complement letter word: [some letter] clean: [here: dirt :here (remove here)] space: [here: (insert here #" ") skip] capital: [here: letter (uppercase/part here 1)] sentence: [ some [ capital opt word break | clean ] any [ [#";" | #","] any clean space word | #"." any clean space capital opt word | #" " word | clean ] ] parse/all text: {test test . test;; test ..test } sentence probe text >>"Test test. Test; test. Test" |
Janko 24-Apr-2009 [3647x2] | I have made auto capitalising first words for some bot once .. it wasn't anything special , I can find the code and send it to you |
ah, Steeve's already works | |
Steeve 24-Apr-2009 [3649] | Has to be enhanced indeed |
Graham 24-Apr-2009 [3650] | Hey, nice start ... |
Steeve 24-Apr-2009 [3651] | indeed, i'm nice |
Graham 24-Apr-2009 [3652x2] | :) |
have to add #"'" ie. ' to the letter charset | |
Steeve 24-Apr-2009 [3654x2] | #"-" too, and what about the numbers? |
for #"'" you should add a rule to remove spaces | |
Janko 24-Apr-2009 [3656] | Mine was meant so I could make pretty texts from all upper case in some search engine.. maybe it doesn't work that great in all cases.. smart-uc-after: func [ str sep ] [ parse str [ ANY [ thru sep mark: ( uppercase/part trim mark 1 insert mark " " ) :mark ] ] str ] smart-case: func [ str ] [ calc-with X [ [ lowercase str ] [ uppercase/part X 1 ] [ smart-uc-after X "." ] [ smart-uc-after X "?" ] [ smart-uc-after X "!" ] ]] >> smart-case "HI HOW ARE YOU! we will go. bye!" == "Hi how are you! We will go. Bye! " |
Graham 24-Apr-2009 [3657] | numbers aren't usually part of words. Unless it's trademark like 3M |
Janko 24-Apr-2009 [3658x2] | but mine is also worse because it does 3 parses instead of one like Steeve |
calc-with: func [ 'wrd bs ] [ foreach b bs [ set wrd do b ] ] ; it uses this func also | |
Graham 24-Apr-2009 [3660] | Steeve's looks faster :) |
Janko 24-Apr-2009 [3661] | yes, I agree :) |
Steeve 24-Apr-2009 [3662x4] | this is the rule for #"-" | #"'" any clean word |
with that you suppress unwanted spaces. it' s a good day --> "it's a good day" | |
so don't add #"'" as a valid letter | |
Graham 24-Apr-2009 [3666] | ahh ... |
Steeve 24-Apr-2009 [3667] | do as you want... :-) |
Graham 24-Apr-2009 [3668x2] | trailing "." or "," gets lost |
Also, I think I have to add ' to the letter charset because words ending in s can have a trailing ' for possession ... | |
Steeve 24-Apr-2009 [3670] | but what if they have inserted a space after or before ' |
Graham 24-Apr-2009 [3671] | so, Miles' wallet and not Miles's wallet |
Steeve 24-Apr-2009 [3672x4] | hum ok, but you could handle that specific case with a different rule |
of course, for the trailing #".", just add | #"." end | |
or better change the rule: | #"." any clean [end | space capital opt word] | |
parse is just amazing for such simple grammar. A simple add and it's doing all you want. | |
Pekr 3-May-2009 [3676] | Have I found a parse bug? 1) >> parse/all {zybc} [ some ["b" break | "y" break | skip] copy result thru "c" (print result)] bc == true 2) >> parse/all {zybc} [ some ["b" break| "y" break | skip] copy result thru "c" (print result)] ** Script Error: break| has no value ** Near: parse/all "zybc" [some ["b" break| "y" break | skip] copy result thru "c" (print result)] 3) >> parse/all {zybc} [ some ["b" break | "y" break| skip] copy result thru "c" (print result)] == false Such stupid bugs are really making the testing process difficult. I wondered for at least 5 minutes why the result of case 3 was wrong, then I tried adding a space after the second break, and the code was corrected. How is it that the second break| does not report an error? ;-) |
shadwolf 3-May-2009 [3677x3] | 3) is like 2): you put the | too close to the second break. I noticed strange reactions in rebol 2 with multi-case find too |
in rebol 2, for example, if you do if not find str any ["!" ";" ] that will work if you have a str with "!" in it but not with ";". But if you invert the order of your find arguments like this [";" "!"], then you detect when str has ";" in it but not when it has "!" in it | |
>> str: "tot!;" == "tot!;" >> if not find str any["!" ";"] [ print "found it!!"] == none >> str: "tot;" == "tot;" >> if not find str any["!" ";"] [ print "found it!!"] found it!! | |
Pekr 3-May-2009 [3680x2] | Shadwolf - but that is your bug ;-) Simply put, you try to mix parse-like behaviour with how 'any behaves. 'any and 'all are just functions, so in the case of 'any it returns any true condition match, so any ["!" ";"] always returns "!", because it is evaluated as 'true. |
... so the code above behaves correctly, because in the second case your string does not contain "!" | |
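Pekr's point about `any` can be checked at the console. The following is a hedged sketch, assuming REBOL 2; the two "contains either character" variants are only suggestions, not something proposed in the thread.

```rebol
str: "tot;"

; `any` evaluates its block and returns the first value that is not none or false.
; A string is always "true", so the second string is never even looked at.
probe any ["!" ";"]                            ; "!"

; Hence `find str any ["!" ";"]` is just `find str "!"`.

; Two ways to ask "does str contain ! or ;":
probe found? any [find str "!" find str ";"]   ; true
probe found? find str charset "!;"             ; true
```

In the first variant `any` receives the results of two separate `find` calls, so it returns the first successful search instead of the first literal string.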
Dockimbel 3-May-2009 [3682] | Pekr, try to run your 2) and 3) in trace mode, you'll see that there's no bug, parse rules evaluation looks consistent to me. |
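One way to read Dockimbel's remark, sketched under the assumption of REBOL 2 (the word `inner` is introduced only for illustration): without a space, `break|` loads as a single word, so case 3 has two alternatives instead of three, and the unset word is never reached.

```rebol
; `break|` has no space before the bar, so it is ONE word, not `break` plus `|`.
inner: ["b" break | "y" break| skip]
probe length? inner          ; 6
probe type? pick inner 5     ; word!

; The block therefore holds only two alternatives:
;   1.  "b" break
;   2.  "y" break| skip
; On the first character "z" neither "b" nor "y" matches, so the word
; break| is never evaluated (no error) and `some` fails immediately.
probe parse/all {zybc} [some inner copy result thru "c"]   ; false
```

In case 2 the alternatives are `"b" break| "y" break` and `skip`. There `skip` steps over "z" and "y", "b" then matches, and only at that point does parse reach `break|` and report that it has no value.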