World: r3wp
[Parse] Discussion of PARSE dialect
Anton 5-Mar-2006 [889x2] | Actually, the last number! can probably become opt number! |
for the case when there are no numbers at all. | |
Rebolek 5-Mar-2006 [891] | There is at least one. If there's at least one, don't do any action. If there are two, do one action, and so on. |
Anton 5-Mar-2006 [892x2] | Ok, that should be ok then. |
Man, I wish you could do: parse [1][integer! -1 skip] and arrive back at the head of the input. | |
Oldes 5-Mar-2006 [894x2] | infinitive loop? |
(infinite) | |
Geomol 5-Mar-2006 [896x2] | Another possible way: >> parse [1 2 3 4 5] [any [set val number! pos: (if not tail? pos [print val])]] |
Anton, you can do: >> parse [9] [integer! to 1] and arrive back at the beginning. | |
Anton 6-Mar-2006 [898] | Oh yes! forgot about that. :) Great ! |
sqlab 6-Mar-2006 [899x2] | Can you explain these curious results?
REBOL/View 1.3.2.3.1 5-Dec-2005 Core 2.6.3
>> parse [1 2 3 4][any [number! set val number!] (print val)]
4
== true
>> parse [1 2 3 4 5 ][any [number! set val number!] (print val)]
4
== false
>> parse [1 2 3 4 5 6][any [number! set val number!] (print val)]
6
== true
>> parse [1 2 3 4 5 6 7][any [number! set val number!] (print val)]
6
== false
>> parse [1 2 3 4 5 6 7 8][any [number! set val number!] (print val)]
8
== true
Note the results with odd numbers of items! |
Forget my question. I see that the block tries to consume two items.( | |
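The effect sqlab describes can be reproduced in a short sketch. Each pass of the `any` loop needs two numbers, so with an odd count the last item is left over and `parse` stops before the end of the input. Appending `opt number!` (the idea Anton mentioned earlier) is one possible way to absorb the leftover item; the word `rule` is only an illustrative name and does not come from the thread.

```rebol
; Each iteration of ANY consumes a pair of numbers;
; the trailing OPT NUMBER! picks up a single leftover item.
rule: [any [number! set val number!] opt number!]

print parse [1 2 3 4] rule      ; even count: the pairs consume everything -> true
print parse [1 2 3 4 5] rule    ; odd count: opt number! consumes the final 5 -> true
```

In both cases `val` holds the second number of the last complete pair (4 here), because the failed half-pair is rolled back before `opt number!` runs.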
Anton 6-Mar-2006 [901] | :) |
Geomol 6-Mar-2006 [902] | 'parse' is the path to great explorations and inventions - and also to great confusion and maybe despair. ;-) No really, it can be a bit confusing at times, but I guess, it can't be done otherwise to have such great functionality. There's no short cut with 'parse'. Learning by doing is the way to go. And it's a brilliant tool! |
sqlab 6-Mar-2006 [903x2] | So it is parse [1 2 3 4 b 5][ some [ set val number! v: number! :v (print val)] to end (?? val)] |
too late.( | |
Oldes 7-Mar-2006 [905x4] | Maybe someone will find this useful: |
count-word-frequency: func [
    "Counts word frequency from the given text"
    text [string!] "text to analyse"
    /exclude ex [block!] "words which should not be counted"
    /local counts f word wordchars nonwordchars
][
    counts: make hash! 100000
    wordchars: charset [#"a" - #"z" #"A" - #"Z" "̊؎ύѪ"]
    nonwordchars: complement wordchars
    parse/all text [
        any nonwordchars
        any [
            copy word some wordchars (
                ;probe word
                ; count the word unless it is in the exclusion list
                if any [not exclude none? find ex word][
                    either none? f: find/tail counts word [
                        repend counts [word 1]
                    ][
                        change f (f/1 + 1)
                    ]
                ]
            )
            any nonwordchars
        ]
    ]
    counts: to-block counts
    ; sort the word/count pairs by count, highest first
    sort/skip/compare/reverse counts 2 2
    new-line/skip counts true 2
] | |
If you know some other chars which should be included in the words, please let me know. Now it should be complete for the Czech language, and I hope for Spanish too (as I use it to count Spanish words :). | |
found missing czech chars-> wordchars: charset [#"a" - #"z" #"A" - #"Z" "̊؎ύѪ"] | |
Oldes 13-Mar-2006 [909] | Is this a bug?
parse/all {"some words"} {" } ;== ["some words"]
parse/all {and "some words"} {" } ;== ["and" "some words"]
parse {and "some words"} {" } ;== ["and" "some" "words"]
parse {"some words"} {" } ;== ["some words"] |
Geomol 13-Mar-2006 [910] | Good question! It's in a tough corner of REBOL - parsing. REBOL is in many ways more like a human language than a computer language. Strictly speaking, you can argue that those examples have a bug or two, but can you live with it? The behaviour might make it difficult to parse input strings written by humans, because people write all sorts of things. (If it can go wrong, it will.) Try changing the quotation marks to something else and see the results change, like: >> parse/all {Xsome wordsX}{X } == ["" "some" "words"] |
Gabriele 13-Mar-2006 [911] | parse, without a rule, treats quotes specially. this is to allow parse to be used directly with things like csv data. |
Oldes 14-Mar-2006 [912x2] | I think it's a bug! I was trying to use this to divide a large string into words and found that I got whole sentences inside, instead of just words. It's a problem only if you have the divider on the edge. |
In Geomol's example I would expect the result to be ["some" "words"], so it must be a bug - it's inconsistent | |
Gabriele 14-Mar-2006 [914] | this behavior is the one intended by Carl. so, it's so by design, and not a bug. but, you may try to convince Carl that you don't like it. ;) |
Oldes 14-Mar-2006 [915x5] | I still think it's a bug - I cannot see the difference between parse and parse/all in this example. If Carl doesn't want to fix it, no problem for me, I used a more complicated rule to do the same thing. I just still think it's a bug, and it will confuse more people in the future as well. |
but the truth is that in CSV it is logical to have: parse {,d ,d} {,} == ["" "d" "d"] | |
and parse {,"a b, d" ,d} {,} == ["" "a b, d" "d"] (so Carl is probably right ;-) | |
But it should be in the documentation that quotes are very special characters for this type of parsing! | |
There is also a bug in the doc: http://www.rebol.com/docs/core23/rebolcore-15.html (section 2 - Simple Splitting) -> there is the sentence: "To avoid that action, you can use the /any refinement." where it should be /all, as there is no /any refinement in parse! | |
Graham 14-Mar-2006 [920] | oldes, rambo the documentation problem. |
Oldes 14-Mar-2006 [921] | done |
Thr 4-Apr-2006 [922] | . |
Oldes 28-Apr-2006 [923] | I think it would be good to have some standard place for common parsing rules and charsets used in parse rules, like 'digits, 'spaces and others. What do you think? |
Anton 28-Apr-2006 [924] | I like the idea in theory, but what are standard parse rules ? There's an argument already - look, I'm arguing ! :) I would prefer to call the "digit" rule "digits". Also, for this example, it's faster to define and be clear with it: digit: charset "0123456789" than being abstract: (even though it would become well known): digit: system/parse/rules/digit |
JaimeVargas 28-Apr-2006 [925] | Oldes, a regex context would be a good addition, where the regexes are the basic rules for numbers, white space, *words* and their negations. |
Oldes 28-Apr-2006 [926x5] | Anton: I mean any parse rule that doesn't have to be a global variable, but whose name you can still use in a parse block. But it would probably be a security issue |
regex would be very nice | |
the problem with the idea is, that we are mixing code and parse rules | |
but at least spaces and digits could be used - it means charsets - which could be available during parse without need to define it all the time | |
(but it was just the idea how to improve the 'parse function) | |
Anton 28-Apr-2006 [931] | Hmmm...... |
Gregg 28-Apr-2006 [932x4] | I've thought about that as well. There are some base charsets we could probably standardize on, and that would be good (IMO). Beyond a few basics, though, consensus gets tough. |
The singular/plural argument seems easy, but isn't (IMO); DIGITS could be done as SOME DIGIT, and you could argue that things like 2 DIGITS reads better, though 1 DIGITS does not. You could double-define it, but that gets ugly too. So, what about DIG? That doesn't imply any singularity, though it's a bit terse, and not a full word (or, rather, the wrong full word). | |
I'm all for proposing some basics though. Worst case, you can override them, which is no more work than we do today. | |
space/spc
whitespace/wsp
alpha
digit(s)
alpha-num ; should digit be num?
ctl/control
non-US-ASCII/high-ASCII
quoted-string
escaped-char ; what is the escape though; REBOL ^, C \, etc.?
What other standard sets would we want? | |
Sunanda 28-Apr-2006 [936] | (I was sure I'd posted this just after Oldes' message... But it ain't there now... Maybe it's in the wrong group.) Andrew has a nice starter set: http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=common-parse-values.r And I know he has extended that list extensively to include things like email address and URL |
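A few of the base sets from Gregg's list could be defined today along these lines. This is only a sketch under assumed names and contents (for instance, which characters count as whitespace), not an agreed standard or anything shipped with REBOL.

```rebol
digit:     charset "0123456789"
alpha:     charset [#"a" - #"z" #"A" - #"Z"]
alpha-num: union alpha digit
wsp:       charset " ^-^/"          ; space, tab and newline (an assumed choice)

print parse/all "abc 123" [some alpha some wsp some digit]   ; true
print parse/all "2006" [4 digit]                             ; true
```

Using the singular `digit` keeps counted forms such as `4 digit` and `some digit` readable, which is one side of the singular/plural argument above.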
Gregg 28-Apr-2006 [937x2] | It would be great (again, IMO), if we had parse rules for REBOL datatypes. For those that want the power of block parsing, with the ability to load strings that aren't valid REBOL, it would be very handy. |
Good starter set! I forgot about that. Thanks Sunanda. | |