World: r3wp
[Parse] Discussion of PARSE dialect
Steeve 24-Apr-2009 [3651] | indeed, i'm nice |
Graham 24-Apr-2009 [3652x2] | :) |
have to add #"'" ie. ' to the letter charset | |
Steeve 24-Apr-2009 [3654x2] | #"-" too and what with the numbers ? |
for #"'" you should add a rule to remove spaces | |
Janko 24-Apr-2009 [3656] | Mine was meant so I could make pretty texts with all upper case in some search engine.. maybe it doesn't work that great in all cases..
    smart-uc-after: func [ str sep ] [
        parse str [ ANY [ thru sep mark: ( uppercase/part trim mark 1 insert mark " " ) :mark ] ]
        str
    ]
    smart-case: func [ str ] [
        calc-with X [
            [ lowercase str ]
            [ uppercase/part X 1 ]
            [ smart-uc-after X "." ]
            [ smart-uc-after X "?" ]
            [ smart-uc-after X "!" ]
        ]
    ]
    >> smart-case "HI HOW ARE YOU! we will go. bye!"
    == "Hi how are you! We will go. Bye! " |
Graham 24-Apr-2009 [3657] | numbers aren't usually part of words. Unless it's trademark like 3M |
Janko 24-Apr-2009 [3658x2] | but mine is also worse because it does 3 parses instead of one like Steeve's |
calc-with: func [ 'wrd bs ] [ foreach b bs [ set wrd do b ] ] ; it uses this func also | |
Graham 24-Apr-2009 [3660] | Steeve's looks faster :) |
Janko 24-Apr-2009 [3661] | yes, I agree :) |
Steeve 24-Apr-2009 [3662x4] | this is the rule for #"-" | #"'" any clean word |
with that you suppress unwanted spaces. it' s a good day --> "it's a good day" | |
so don't add #"'" as a valid letter | |
Graham 24-Apr-2009 [3666] | ahh ... |
Steeve 24-Apr-2009 [3667] | do as you want... :-) |
Graham 24-Apr-2009 [3668x2] | trailing "." or "," gets lost |
Also, I think I have to add ' to the letter charset because words ending in s can have a trailing ' for possession ... | |
Steeve 24-Apr-2009 [3670] | but what if they have inserted a space after or before ' |
Graham 24-Apr-2009 [3671] | so, Miles' wallet and not Miles's wallet |
Steeve 24-Apr-2009 [3672x4] | hum ok, but you could handle that specific case with a different rule |
of course, for the trailing #".", just add | #"." end | |
or better change the rule: | #"." any clean [end | space capital opt word] | |
parse is just amazing for such simple grammar. A simple add and it's doing all you want. | |
Pekr 3-May-2009 [3676] | Have I found a parse bug?
    1) >> parse/all {zybc} [ some ["b" break | "y" break | skip] copy result thru "c" (print result)]
       bc
       == true
    2) >> parse/all {zybc} [ some ["b" break| "y" break | skip] copy result thru "c" (print result)]
       ** Script Error: break| has no value
       ** Near: parse/all "zybc" [some ["b" break| "y" break | skip] copy result thru "c" (print result)]
    3) >> parse/all {zybc} [ some ["b" break | "y" break| skip] copy result thru "c" (print result)]
       == false
    Such stupid bugs are really making the testing process difficult. I wondered for at least 5 minutes why the result of case 3 was wrong, and then I tried adding a space after the second break, and the code was corrected. How is it that the second break| does not report an error? ;-) |
shadwolf 3-May-2009 [3677x3] | 3) is like 2): you put a | too close to the second break. I noticed strange reactions in rebol 2 with multi-case find too |
in rebol 2, for example, if you do if not find str any ["!" ";" ] that will work if you have str with "!" in it but not with ";". But if you invert the position of your find arguments like this [";" "!"] then you detect when str has ";" in it but not when it has "!" in it | |
    >> str: "tot!;"
    == "tot!;"
    >> if not find str any["!" ";"] [ print "found it!!"]
    == none
    >> str: "tot;"
    == "tot;"
    >> if not find str any["!" ";"] [ print "found it!!"]
    found it!! | |
Pekr 3-May-2009 [3680x2] | Shadwolf - but that is your bug ;-) Simply put, you try to mix parse-like behaviour with how 'any behaves. 'any and 'all are just functions, so in the case of 'any it returns any true condition match, so any ["!" ";"] always returns "!", because it is evaluated as 'true. |
... so the code above behaves correctly, because in the second case your string does not contain "!" | |
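Pekr's point about 'any can be checked at the console. The following is a minimal sketch, not code from the discussion: the word `punct` and the use of a charset are assumptions chosen here as one possible way to search for either character in a single FIND call.

```rebol
; ANY is an ordinary function: it evaluates the expressions in the block
; one by one and returns the first value that is not NONE or FALSE.
; A string is always "true", so ";" is never even looked at.
probe any ["!" ";"]             ; "!"

str: "tot;"

; Therefore this is the same as  find str "!"  and finds nothing here.
probe find str any ["!" ";"]    ; none

; Hypothetical alternative: a bitset holding both characters lets FIND
; stop at whichever of them occurs first in the string.
punct: charset "!;"
probe find str punct            ; ";"
```

With the charset, the order of the characters no longer matters, which is the behaviour shadwolf was expecting from `any`.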
Dockimbel 3-May-2009 [3682x2] | Pekr, try to run your 2) and 3) in trace mode, you'll see that there's no bug, parse rules evaluation looks consistent to me. |
In 3), the second 'break| doesn't report error because it's never evaluated. The rule fails on the first input character when trying to match "y" and 'skip is never reached. In 2), 'skip helps consuming the input until the "y" character which leads to evaluate 'break| and raises the error. | |
Pekr 3-May-2009 [3684] | yes, you might be right doc. But - it is really very difficult for a user to track down. It almost looks like a scanner bug, but it is not. What actually happens in case 3) is that "break|" is being considered a regular word, which just does not have a value. That also means that 'skip is not part of the OR expression. So, the 'some block fails on not matching "y" .... |
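The loading behaviour described above can be made visible by inspecting the rule block as plain data before PARSE ever sees it. This is only an illustrative sketch; the word `rule` is a name introduced here.

```rebol
; The rule block from case 3), held as data (nothing is evaluated yet).
rule: ["b" break | "y" break| skip]

; List every item with its datatype. BREAK| is loaded as one single word,
; so only one | separator is left in the block.
foreach item rule [print [mold :item type? :item]]

; The block therefore holds 6 items and two alternatives:
;   "b" break      and      "y" break| skip
print length? rule
```

Since SKIP belongs to the second alternative, it is only reached after "y" has matched and `break|` has been looked up, which is why case 3) fails quietly on the first character instead of raising an error.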
Graham 16-May-2009 [3685x3] | Here's a parse question for the experts. |
If I have a document with headings eg. a: b: .. z: and text optionally under each heading ... would it be possible to use parse to collect all the text from each heading if the headings are in any order and some headings with no text are optionally missing? | |
Each heading can only occur once in the document. | |
Maxim 16-May-2009 [3688] | sure |
Graham 16-May-2009 [3689] | Ok, let me rephrase that .. sure it's possible, but I can imagine it would be quite complicated |
Maxim 16-May-2009 [3690x2] | now was that a question of the "can you give me the solution" kind? |
actually it can be done quite simply... depends on the headers themselves... | |
Graham 16-May-2009 [3692] | It's a little complicated because the headers can have spaces in them. |
Maxim 16-May-2009 [3693x2] | spaces add no complication to the system, as long as the headers can be identified without doubt. |
so the rule is : headers start on new line, stop at first ":" all the rest is content? | |
Graham 16-May-2009 [3695] | now if you have a rule copy text [ to "a:" | to "b:" .... ] but if b: occurs before a: in the text, then you will include a header in copied text |
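The pitfall Graham describes is easy to reproduce. A minimal sketch, with a made-up sample string `doc`:

```rebol
; "b:" occurs before "a:" in the input.
doc: "intro b: two a: one"

; TO "a:" is tried first and succeeds, so the second alternative is
; never considered and the copied text runs straight over the b: heading.
parse/all doc [copy text [to "a:" | to "b:"] to end]

probe text    ; "intro b: two "
```

The alternation picks the first rule that matches, not the rule whose match is nearest to the current position.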
Maxim 16-May-2009 [3696] | forget to and thru... they are not proper parsing. |
Graham 16-May-2009 [3697] | yes, headers start on a newline and terminate in ":" |
Maxim 16-May-2009 [3698] | and there can be no ":" within the content? |
Graham 16-May-2009 [3699x2] | No, there can be a ":" in the content |
but you know what the headers are ... so that's not a big problem. | |
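One way the approach discussed here could look, avoiding TO and THRU as Maxim suggests: walk the input one character at a time and, at every line start, test for one of the known headings. This is a sketch under stated assumptions, not the solution from the thread. The heading names, the sample `doc`, and the words `headings`, `sections`, `key`, `body` and `close-section` are all hypothetical, and the document is assumed to use plain newline line endings.

```rebol
; Sample input: headings in arbitrary order, one heading with no text,
; and a ":" inside the content.
doc: {b: second text
with a colon: inside
a: first text
z:
}

; The known headings as a parse alternation (assumed names).
headings: ["a:" | "b:" | "z:"]

sections: copy []    ; collects  heading text  pairs
key: none            ; heading of the section currently open
body: none           ; position where that section's text starts

; Store the open section, taking its text up to position POS.
close-section: func [pos] [
    if key [append sections reduce [key trim/head/tail copy/part body pos]]
]

parse/all doc [
    ; a heading may sit at the very start of the document
    opt [copy k headings here: (key: k body: here)]
    any [
        ; a known heading right after a newline closes the previous
        ; section and opens a new one ...
        mark: newline copy k headings here: (close-section mark key: k body: here)
        ; ... anything else is content and is skipped one character at a time
        | skip
    ]
    ; the last section runs to the end of the input
    mark: (close-section mark)
]

probe sections
```

Because only the listed headings are tested, a ":" elsewhere in the content does not split a section, and a heading with nothing under it should simply produce an empty string. Text before the first heading is ignored in this sketch.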