World: r3wp
[!REBOL3]
Pekr 15-Jul-2011 [9183] | Geomol - yes, you need to write a wrapper for each DLL you are about to utilise ... |
Andreas 15-Jul-2011 [9184] | Geomol, yes, the "!REBOL3 /library" group is about R2/Library-style access to DSOs. I.e. using pre-existing DSOs from within REBOL. The "!REBOL3 Extensions" group is for discussion about native R3 extensions. I.e. writing special-purpose DSOs which can export native-like functions into R3. |
Henrik 16-Jul-2011 [9185x2] | http://curecode.org/rebol3/ticket.rsp?id=1888&cursor=1 This doesn't look like a bug to me. Anyone? |
http://curecode.org/rebol3/ticket.rsp?id=1886&cursor=3 This one looks fixable, as it's a mezzanine. | |
Steeve 16-Jul-2011 [9187] | About parse: Always been like that, nothing new. |
BrianH 16-Jul-2011 [9188] | #1888 is definitely not a bug. #1886 should be looked at by the person who knows what SPLIT is supposed to do. It wasn't one of mine, and there was never really any consensus about its behavior. SPLIT isn't finished yet. |
Gregg 17-Jul-2011 [9189x16] | I don't know where the test suite for SPLIT is, but the rule in effect for that changed from the old source that Gabriele and I originally created. The final rule, for string/char/bitset delimiters, was originally this:
    [any [mk1: some [mk2: dlm break | skip] (emit copy/part mk1 mk2)]]
but is now this:
    [any [mk1: [to dlm mk2: dlm | to end mk2:] (keep copy/part mk1 mk2)]]
It looks like that changed due to http://issue.cc/r3/573, but obviously wasn't run through a test suite. I don't know what caused the issue with the above bug, as that parse rule returns a correct result. |
Found a small test suite. | |
test: func [block] [
    print [mold/only :block newline tab mold do block]
]
test [split "1234567812345678" 4] ;== ["1234" "5678" "1234" "5678"]
test [split "1234567812345678" 3] ;== ["123" "456" "781" "234" "567" "8"]
test [split "1234567812345678" 5] ;== ["12345" "67812" "34567" "8"]
test [split/into [1 2 3 4 5 6] 2] ;== [[1 2 3] [4 5 6]]
test [split/into "1234567812345678" 2] ;== ["12345678" "12345678"]
test [split/into "1234567812345678" 3] ;== ["12345" "67812" "345678"]
test [split/into "1234567812345678" 5] ;== ["123" "456" "781" "234" "5678"]
test [split [1 2 3 4 5 6] [2 1 3]] ;== [[1 2] [3] [4 5 6]]
test [split "1234567812345678" [4 4 2 2 1 1 1 1]] ;== ["1234" "5678" "12" "34" "5" "6" "7" "8"]
test [split first [(1 2 3 4 5 6 7 8 9)] 3] ;== [(1 2 3) (4 5 6) (7 8 9)]
test [split #{0102030405060708090A} [4 3 1 2]] ;== [#{01020304} #{050607} #{08} #{090A}]
test [split [1 2 3 4 5 6] [2 1]] ;== [[1 2] [3]]
test [split [1 2 3 4 5 6] [2 1 3 5]] ;== [[1 2] [3] [4 5 6] []]
test [split [1 2 3 4 5 6] [2 1 6]] ;== [[1 2] [3] [4 5 6]]
test [split [1 2 3 4 5 6] [3 2 2 -1 -4 3 -2]] ;== [[1 2 3] [4 5] [6] [6] [2 3 4 5] [2 3 4] [3 4]]
test [split "abc,de,fghi,jk" #","] ;== ["abc" "de" "fghi" "jk"]
test [split "abc<br>de<br>fghi<br>jk" <br>] ;== ["abc" "de" "fghi" "jk"]
test [split "abc|de/fghi:jk" charset "|/:"] ;== ["abc" "de" "fghi" "jk"]
test [split "abc^M^Jde^Mfghi^Jjk" [crlf | #"^M" | newline]] ;== ["abc" "de" "fghi" "jk"]
test [split "abc de fghi jk" [some #" "]] ;== ["abc" "de" "fghi" "jk"] | |
The original was written before MAP-EACH and the new COLLECT. Here is the source I have, updated to use those as the current version does, but with the last rule reverted to the original. Related cc reports: http://issue.cc/r3/1096 http://issue.cc/r3/690
split: func [
    "Split a series into pieces; fixed or variable size, fixed number, or at delimiters"
    series [series!] "The series to split"
    dlm [block! integer! char! bitset! any-string!] "Split size, delimiter(s), or rule(s)."
    /into "If dlm is an integer, split into n pieces, rather than pieces of length n."
    /local size count mk1 mk2
][
    either all [block? dlm parse dlm [some integer!]] [
        map-each len dlm [
            either positive? len [
                copy/part series series: skip series len
            ] [
                series: skip series negate len
                ; return unset so that nothing is added to output
                ()
            ]
        ]
    ][
        size: dlm ; alias for readability
        collect [
            parse/all series case [
                all [integer? size into] [
                    if size < 1 [cause-error 'Script 'invalid-arg size]
                    count: size - 1
                    size: round/down divide length? series size
                    [
                        count [copy series size skip (keep/only series)]
                        copy series to end (keep/only series)
                    ]
                ]
                integer? dlm [
                    if size < 1 [cause-error 'Script 'invalid-arg size]
                    [any [copy series 1 size skip (keep/only series)]]
                ]
                'else [ ; = any [bitset? dlm any-string? dlm char? dlm]
                    [any [mk1: some [mk2: dlm break | skip] (keep copy/part mk1 mk2)]]
                ]
            ]
        ]
    ]
] | |
>> split "a.b.c" "."
== ["a" "b" "c"]
>> split "c c" " "
== ["c" "c"]
>> split "1," " "
== ["1,"]
>> split "1,2" " "
== ["1,2"]
>> split "c,c" ","
== ["c" "c"]
>> split/into "" 1
== [""]
>> split/into "" 2
== ["" ""]
>> split "This! is a. test? to see " charset "!?."
== ["This" " is a" " test" " to see "] | |
The test case that fails with this is where the delimiter is the last char. You don't get an empty field at the end.
>> split "1,2,3," ","
== ["1" "2" "3"] | |
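One way to detect that trailing-delimiter case, as a minimal sketch: an implied empty field exists when the last match of the delimiter ends exactly at the tail of the series. The words `str`, `dlm`, `pos` and `trailing?` are hypothetical names used only for illustration, and this handles only string delimiters.

```rebol
str: "1,2,3,"
dlm: ","

; FIND/LAST/TAIL gives the position just past the last match of dlm,
; or NONE when dlm does not occur at all.
pos: find/last/tail str dlm

; The delimiter is the last thing in the series when that position is the tail.
trailing?: all [pos empty? pos]

probe trailing?
```

When `trailing?` is true, an empty string would be appended to the result so that "1,2,3," yields four fields instead of three.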
I found some notes that at one point to/thru broke for block and bitset targets. | |
ROUND not returning an integer broke some things too, i.e., it is currently broken. | |
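A small sketch of the usual workaround for the ROUND issue, assuming (as reported above) that ROUND/DOWN hands back a decimal in R3: convert explicitly before using the value as a piece size. The words `len`, `pieces` and `piece-size` are hypothetical and only illustrate the arithmetic used by the /into case.

```rebol
len: 16       ; length of the series to split
pieces: 3     ; number of pieces requested with /into

; DIVIDE yields a decimal here; TO INTEGER! makes the result usable
; as a repeat count in a parse rule, whatever type ROUND returns.
piece-size: to integer! round/down divide len pieces

probe piece-size
probe integer? piece-size
```

With these numbers the first two pieces get 5 items each, and the last piece takes whatever remains.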
Another bug has crept in somewhere along the way:
    series: skip series negate len
The NEGATE messes up the skip of the already negative value, which breaks cases like this:
>> split [1 2 3 4 5 6] [3 2 2 -2 2 -4 3]
== [[1 2 3] [4 5] [6] [5 6] [3 4 5]] | |
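To make the effect of NEGATE concrete, here is a minimal sketch using only SKIP and TAIL; the words `data`, `pos` and `len` are hypothetical. Once the preceding lengths have consumed the whole series, the position is at the tail, and the sign of the skip decides whether it can move at all.

```rebol
data: [1 2 3 4 5 6]
pos: tail data      ; where the series position ends up after 3 + 2 + 2 items
len: -2

probe skip pos len          ; backs up two items: [5 6]
probe skip pos negate len   ; tries to move forward two items, stays at the tail: []
```

Under the old design a negative length backs up, so the next positive length can re-read earlier items; with NEGATE the same value skips forward instead.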
Updated SPLIT. Please test, add test cases, comment, and critique. If you look at the special processing section, and are offended by it, feel free to improve it. Well, feel free to improve any of it. I haven't checked the doc page to see if all the examples work as they are doc'd, which also needs to be done. | |
split: func [
    "Split a series into pieces; fixed or variable size, fixed number, or at delimiters"
    series [series!] "The series to split"
    dlm [block! integer! char! bitset! any-string!] "Split size, delimiter(s), or rule(s)."
    /into "If dlm is an integer, split into n pieces, rather than pieces of length n."
    /local size piece-size count mk1 mk2 res fill-val add-fill-val
][
    either all [block? dlm parse dlm [some integer!]] [
        map-each len dlm [
            either positive? len [
                copy/part series series: skip series len
            ] [
                series: skip series len
                ; return unset so that nothing is added to output
                ()
            ]
        ]
    ][
        size: dlm ; alias for readability
        res: collect [
            parse/all series case [
                all [integer? size into] [
                    if size < 1 [cause-error 'Script 'invalid-arg size]
                    count: size - 1
                    piece-size: to integer! round/down divide length? series size
                    if zero? piece-size [piece-size: 1]
                    [
                        count [copy series piece-size skip (keep/only series)]
                        copy series to end (keep/only series)
                    ]
                ]
                integer? dlm [
                    if size < 1 [cause-error 'Script 'invalid-arg size]
                    [any [copy series 1 size skip (keep/only series)]]
                ]
                'else [ ; = any [bitset? dlm any-string? dlm char? dlm]
                    [any [mk1: some [mk2: dlm break | skip] (keep/only copy/part mk1 mk2)]]
                ]
            ]
        ]
        ;-- Special processing, to handle cases where they spec'd more items in
        ;   /into than the series contains (so we want to append empty items),
        ;   or where the dlm was a char/string/charset and it was the last char
        ;   (so we want to append an empty field that the above rule misses).
        fill-val: does [copy either any-block? series [[]] [""]]
        add-fill-val: does [append/only res fill-val]
        case [
            all [integer? size into] [
                ; If the result is too short, i.e., less items than 'size, add
                ; empty items to fill it to 'size.
                ; We loop here, because insert/dup doesn't copy the value inserted.
                if size > length? res [
                    loop (size - length? res) [add-fill-val]
                ]
            ]
            ; integer? dlm [
            ; ]
            'else [ ; = any [bitset? dlm any-string? dlm char? dlm]
                ; If the last thing in the series is a delimiter, there is an
                ; implied empty field after it, which we add here.
                case [
                    bitset? dlm [
                        ; ATTEMPT is here because LAST will return NONE for an
                        ; empty series, and finding none in a bitset is not allowed.
                        if attempt [find dlm last series] [add-fill-val]
                    ]
                    char? dlm [
                        if dlm = last series [add-fill-val]
                    ]
                    string? dlm [
                        if all [
                            find series dlm
                            empty? find/last/tail series dlm
                        ] [add-fill-val]
                    ]
                ]
            ]
        ]
        res
    ]
] | |
test: func [block expected-result /local res] [
    if error? try [
        print [mold/only :block newline tab mold res: do block]
        if res <> expected-result [print [tab 'FAILED! tab 'expected mold expected-result]]
    ][
        print [mold/only :block newline tab "ERROR!"]
    ]
] | |
test [split "1234567812345678" 4] ["1234" "5678" "1234" "5678"]
test [split "1234567812345678" 3] ["123" "456" "781" "234" "567" "8"]
test [split "1234567812345678" 5] ["12345" "67812" "34567" "8"]
test [split/into [1 2 3 4 5 6] 2] [[1 2 3] [4 5 6]]
test [split/into "1234567812345678" 2] ["12345678" "12345678"]
test [split/into "1234567812345678" 3] ["12345" "67812" "345678"]
test [split/into "1234567812345678" 5] ["123" "456" "781" "234" "5678"]
test [split/into "123" 6] ["1" "2" "3" "" "" ""]
test [split/into [1 2 3] 6] [[1] [2] [3] [] [] []]
test [split [1 2 3 4 5 6] [2 1 3]] [[1 2] [3] [4 5 6]]
test [split "1234567812345678" [4 4 2 2 1 1 1 1]] ["1234" "5678" "12" "34" "5" "6" "7" "8"]
test [split first [(1 2 3 4 5 6 7 8 9)] 3] [(1 2 3) (4 5 6) (7 8 9)]
test [split #{0102030405060708090A} [4 3 1 2]] [#{01020304} #{050607} #{08} #{090A}]
test [split [1 2 3 4 5 6] [2 1]] [[1 2] [3]]
test [split [1 2 3 4 5 6] [2 1 3 5]] [[1 2] [3] [4 5 6] []]
test [split [1 2 3 4 5 6] [2 1 6]] [[1 2] [3] [4 5 6]]
test [split [1 2 3 4 5 6] [3 2 2 -2 2 -4 3]] [[1 2 3] [4 5] [6] [5 6] [3 4 5]]
test [split "abc,de,fghi,jk" #","] ["abc" "de" "fghi" "jk"]
test [split "abc<br>de<br>fghi<br>jk" <br>] ["abc" "de" "fghi" "jk"]
test [split "a.b.c" "."] ["a" "b" "c"]
test [split "c c" " "] ["c" "c"]
test [split "1,2,3" " "] ["1,2,3"]
test [split "1,2,3" ","] ["1" "2" "3"]
test [split "1,2,3," ","] ["1" "2" "3" ""]
test [split "1,2,3," charset ",."] ["1" "2" "3" ""]
test [split "1.2,3." charset ",."] ["1" "2" "3" ""]
test [split "abc|de/fghi:jk" charset "|/:"] ["abc" "de" "fghi" "jk"]
test [split "abc^M^Jde^Mfghi^Jjk" [crlf | #"^M" | newline]] ["abc" "de" "fghi" "jk"]
test [split "abc de fghi jk" [some #" "]] ["abc" "de" "fghi" "jk"] | |
A quick scan of the docs showed that negative skip val usage changed from the original design. I will revert the negate on those to match the doc'd behavior. | |
split: func [
    "Split a series into pieces; fixed or variable size, fixed number, or at delimiters"
    series [series!] "The series to split"
    dlm [block! integer! char! bitset! any-string!] "Split size, delimiter(s), or rule(s)."
    /into "If dlm is an integer, split into n pieces, rather than pieces of length n."
    /local size piece-size count mk1 mk2 res fill-val add-fill-val
][
    either all [block? dlm parse dlm [some integer!]] [
        map-each len dlm [
            either positive? len [
                copy/part series series: skip series len
            ] [
                series: skip series negate len
                ; return unset so that nothing is added to output
                ()
            ]
        ]
    ][
        size: dlm ; alias for readability
        res: collect [
            parse/all series case [
                all [integer? size into] [
                    if size < 1 [cause-error 'Script 'invalid-arg size]
                    count: size - 1
                    piece-size: to integer! round/down divide length? series size
                    if zero? piece-size [piece-size: 1]
                    [
                        count [copy series piece-size skip (keep/only series)]
                        copy series to end (keep/only series)
                    ]
                ]
                integer? dlm [
                    if size < 1 [cause-error 'Script 'invalid-arg size]
                    [any [copy series 1 size skip (keep/only series)]]
                ]
                'else [ ; = any [bitset? dlm any-string? dlm char? dlm]
                    [any [mk1: some [mk2: dlm break | skip] (keep/only copy/part mk1 mk2)]]
                ]
            ]
        ]
        ;-- Special processing, to handle cases where they spec'd more items in
        ;   /into than the series contains (so we want to append empty items),
        ;   or where the dlm was a char/string/charset and it was the last char
        ;   (so we want to append an empty field that the above rule misses).
        fill-val: does [copy either any-block? series [[]] [""]]
        add-fill-val: does [append/only res fill-val]
        case [
            all [integer? size into] [
                ; If the result is too short, i.e., less items than 'size, add
                ; empty items to fill it to 'size.
                ; We loop here, because insert/dup doesn't copy the value inserted.
                if size > length? res [
                    loop (size - length? res) [add-fill-val]
                ]
            ]
            ; integer? dlm [
            ; ]
            'else [ ; = any [bitset? dlm any-string? dlm char? dlm]
                ; If the last thing in the series is a delimiter, there is an
                ; implied empty field after it, which we add here.
                case [
                    bitset? dlm [
                        ; ATTEMPT is here because LAST will return NONE for an
                        ; empty series, and finding none in a bitset is not allowed.
                        if attempt [find dlm last series] [add-fill-val]
                    ]
                    char? dlm [
                        if dlm = last series [add-fill-val]
                    ]
                    string? dlm [
                        if all [
                            find series dlm
                            empty? find/last/tail series dlm
                        ] [add-fill-val]
                    ]
                ]
            ]
        ]
        res
    ]
] | |
; Old design for negative skip vals
;test [split [1 2 3 4 5 6] [3 2 2 -2 2 -4 3]] [[1 2 3] [4 5] [6] [5 6] [3 4 5]]
; New design for negative skip vals
test [split [1 2 3 4 5 6] [2 -2 2]] [[1 2] [5 6]] | |
Steeve 18-Jul-2011 [9205x4] | Seems you wrecked the behavior when a parse rule is fulfilled. [split] should keep the matched parts; you do the contrary (exclusion). Why this change? |
OK, I see now you turned it back to the primary behavior. But it should be discussed first. I vote for the include behavior. | |
It makes sense because whatever new junk sequences are added in the source, the matching process will continue to collect the expected tokens. | |
Gregg 18-Jul-2011 [9209] | Could you provide examples of what you mean? The original design was flexible, but perhaps not as useful. I understand why it was changed, and think it's better for general use. |
Steeve 18-Jul-2011 [9210x2] | Well, I just read the code. You replaced this:
    [any [mk1: some [mk2: dlm break | skip] (emit copy/part mk1 mk2)]]
by this:
    [any [mk1: [to dlm mk2: dlm | to end mk2:] (keep copy/part mk1 mk2)]]
In the first case, the rule is used to extract the matching sequences. In the second case, the rule is used to exclude the matching sequences. |
Sorry, in fact it's the contrary (swap the 2 cases). | |
Gregg 18-Jul-2011 [9212] | OK. I'm not vested in the implementation, just the results. Feel free to improve things and make it more elegant. As long as the tests all pass, or we agree on behavior changes, I don't have a problem. |
Steeve 18-Jul-2011 [9213] | Well...Seems I don't know how to make it clear, as usual :-) It's you who want to change the behavior. But it's Ok I guess, since no one else complained :-) |
Gregg 18-Jul-2011 [9214x2] | Hmmm, I thought I reverted to the current behavior (which was not the original behavior), aside from bug fixes. |
If you could use a test case to explain, I might get it. I'm slow today. | |
Steeve 18-Jul-2011 [9216] | To me the current behavior is the one I have in the current code of R3 |
Gregg 18-Jul-2011 [9217] | And what behavior did I change (as a test case)? |
Steeve 18-Jul-2011 [9218] | OK, wait a little, I will do my best ;-) |
Gregg 18-Jul-2011 [9219] | Maybe you can find one on http://www.rebol.com/r3/docs/functions/split.html that shows it? |
Steeve 18-Jul-2011 [9220x4] | current behavior:
split "-a-a'" ["a"]
>> ["a" "a"]
yours:
split "-a-a" ["a"]
>> ["-" "-"] |
not tested though, I just read the code | |
Hmmm, seems it's me who is totally wrecked. | |
Ok forget my big mouth, if you can | |
Gregg 18-Jul-2011 [9224x2] | So, you're saying that you want to specify a delimiter, and have it keep that? In any case, that's not the current behavior:
>> split "-a-a'" ["a"]
== ["-" "-" "'" ""]
Here's mine:
>> split "-a-a'" ["a"]
== ["-" "-" "'"] |
So, my final version above seems OK then? | |
Steeve 18-Jul-2011 [9226] | Yeah, I've just taken my pills, I'm fine now |
Gregg 18-Jul-2011 [9227x3] | LOL. :-) |
I don't know the current system for submitting patches to R3. Once more people sign off on it, maybe BrianH will show up and see if we can get it in there. | |
Thanks for taking a look at it Steeve. Pills or not. | |
Steeve 18-Jul-2011 [9230x3] | I will probably rewrite it completely before though |
Geez, I wanted to say HE, not I | |
*wanted to say He (Brian) | |