World: r3wp
[!REBOL3]
Steeve 18-Jul-2011 [9207x2] | It makes sense because whatever new junk sequences are added in the source, the matching process will continue to collect the expected tokens. |
Gregg 18-Jul-2011 [9209] | Could you provide examples of what you mean? The original design was flexible, but perhaps not as useful. I understand why it was changed, and think it's better for general use. |
Steeve 18-Jul-2011 [9210x2] | Well, I just read the code. You replaced this: [any [mk1: some [mk2: dlm break | skip] (emit copy/part mk1 mk2)]] with this: [any [mk1: [to dlm mk2: dlm | to end mk2:] (keep copy/part mk1 mk2)]] In the first case, the rule is used to extract the matching sequences. In the second case, the rule is used to exclude the matching sequences. |
Sorry, in fact it's the contrary (swap the 2 cases) | |
Gregg 18-Jul-2011 [9212] | OK. I'm not vested in the implementation, just the results. Feel free to improve things and make it more elegant. As long as the tests all pass, or we agree on behavior changes, I don't have a problem. |
Steeve 18-Jul-2011 [9213] | Well... Seems I don't know how to make it clear, as usual :-) It's you who wants to change the behavior. But it's OK I guess, since no one else complained :-) |
Gregg 18-Jul-2011 [9214x2] | Hmmm, I thought I reverted to the current behavior (which was not the original behavior), aside from bug fixes. |
If you could use a test case to explain, I might get it. I'm slow today. | |
Steeve 18-Jul-2011 [9216] | To me the current behavior is the one I have in the current code of R3 |
Gregg 18-Jul-2011 [9217] | And what behavior did I change (as a test case)? |
Steeve 18-Jul-2011 [9218] | OK, wait a little, I will do my best ;-) |
Gregg 18-Jul-2011 [9219] | Maybe you can find one on http://www.rebol.com/r3/docs/functions/split.html that shows it? |
Steeve 18-Jul-2011 [9220x4] | current behavior: split "-a-a'" ["a"] >> ["a" "a"] yours: split "-a-a" ["a"] >> ["-" "-"] |
not tested though, I just read the code | |
Hmmm, seems it's me who is totally wrecked | |
Ok forget my big mouth, if you can | |
Gregg 18-Jul-2011 [9224x2] | So, you're saying that you want to specify a delimiter, and have it keep that? In any case, that's not the current behavior: >> split "-a-a'" ["a"] == ["-" "-" "'" ""] Here's mine: >> split "-a-a'" ["a"] == ["-" "-" "'"] |
So, my final version above seems OK then? | |
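The two results above differ only in the trailing empty string. A minimal sketch of delimiter splitting that yields the second shape, assuming R3 PARSE on strings; `split-on` is a hypothetical name used for illustration and is not Gregg's patch:

```rebol
; Hypothetical helper, not the SPLIT mezzanine under discussion.
; Collects the text between delimiters, then whatever remains.
split-on: func [series [string!] dlm [string! char!] /local out mk1 mk2] [
    out: copy []
    parse series [
        any [
            ; mark the piece start, find the delimiter, mark the piece end
            mk1: to dlm mk2: dlm (append out copy/part mk1 mk2)
        ]
        ; the tail after the last delimiter is kept exactly once
        mk1: to end (append out copy mk1)
    ]
    out
]

probe split-on "-a-a'" "a"   ; ["-" "-" "'"]
```

In this sketch an input that ends with the delimiter still produces a final empty piece; whether that is desirable is one of the behavior questions being discussed here.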
Steeve 18-Jul-2011 [9226] | Yeah, I just took my pills, I'm fine now |
Gregg 18-Jul-2011 [9227x3] | LOL. :-) |
I don't know the current system for submitting patches to R3. Once more people sign off on it, maybe BrianH will show up and see if we can get it in there. | |
Thanks for taking a look at it Steeve. Pills or not. | |
Steeve 18-Jul-2011 [9230x4] | I will probably rewrite it completely before though |
Geez, I wanted to say HE, not I | |
*wanted to say He (Brian) | |
OK, time to go to bed, it seems | |
sqlab 19-Jul-2011 [9234] | It seems the parse behaviour has changed a few times along the way. Nevertheless, I think split is overloaded and overcomplicated. The parse rules would be better placed in a PARSE Common Patterns library that is included with Rebol, not unlike http://www.rebol.com/article/0508.html |
Gregg 19-Jul-2011 [9235] | How would you redesign it? |
BrianH 19-Jul-2011 [9236x4] | Well, for one thing, complex, flexible mezzanine functions tend to be slowed down by the conditional code that determines the actual behavior desired in a particular case. There are real advantages to separating a complex function into multiple smaller, simpler functions. This makes it so the choices about which set of behavior to use are made by the programmer at development time instead of by the interpreter at runtime. SPLIT is a really large function that does many different things, so it's a good candidate for such a function split. |
I liked Carl's idea of a SPLIT function that takes a series and returns the series from the head to the offset, and from the offset to the end. Like this: split: func [series [series!]] [reduce [copy/part head series series copy series]] At most, add an option to control the copying. Then have a separate function split on a delimiter, another split into a number of parts, etc. | |
There are some mezzanine functions that have to be large and complex for other reasons. For instance, a couple of the LOAD subfunctions need to have functionality bundled together for security purposes. This doesn't seem to be the case with SPLIT. | |
It's one of the ironies of R3 that for a language that touts its ability to create user-designed dialects, inside the R3 mezzanine code, dialected functions are often too slow to be efficient enough for inclusion. This is why most of the builtin dialects are implemented in native code, through natives or commands. A dialect needs to be efficient enough to merit its use as opposed to the procedural equivalent, and easy enough to comprehend that the users of the dialect are likely to use it, rather than a simpler alternative. Developers' minds have overhead too. | |
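For reference, the one-line SPLIT that BrianH attributes to Carl above can be tried at the console. A minimal sketch; the name `split-here` is hypothetical, chosen only to avoid redefining the built-in SPLIT:

```rebol
; Same body as the one-liner quoted above, under a hypothetical name.
; Returns two copies: head-to-position and position-to-tail.
split-here: func [series [series!]] [
    reduce [copy/part head series series  copy series]
]

; SKIP sets the series position; the function splits around it.
probe split-here skip "abcdef" 2      ; ["ab" "cdef"]
probe split-here skip [1 2 3 4 5] 3   ; [[1 2 3] [4 5]]
```

The split point travels with the series argument itself, so the function needs no delimiter or size parameter and no conditional code to choose a mode.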
Gregg 19-Jul-2011 [9240] | If optimization is the goal, we can certainly write specialized funcs. I have a lot of them myself. How much too slow is the current SPLIT and in what contexts? This SPLIT is intended to be general, like ROUND. If you need to round something, HELP ROUND gives you all the options, rather than having CEIL, FLOOR, TRUNC, etc. There was a long discussion about that when it was designed. The goal is to reduce the cognitive overhead. If people think this functionality is not helpful, and all we need is SPLIT = [first rest], then that's all we need. If so, please give it a more precise name. |
BrianH 19-Jul-2011 [9241] | SPLIT has enough cognitive overhead that I've never understood what it was supposed to do, and thus never used it. A sign? |
Gregg 19-Jul-2011 [9242] | How could this be made clearer then? "Split a series into pieces; fixed or variable size, fixed number, or at delimiters" Did you ever look at the docs for it? http://www.rebol.com/r3/docs/functions/split.html |
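The four modes named in that description can be sketched as one call each. This assumes the SPLIT described in the linked docs (a `dlm` argument plus an `/into` refinement); results are not shown because, as discussed earlier in this thread, they can differ between builds:

```rebol
; Assumed usage per the linked SPLIT docs, one call per mode.
probe split "1234567812345678" 4              ; fixed size: pieces of 4 items
probe split "1234567812345678" [4 4 2 2 2 2]  ; variable size: one length per piece
probe split/into [1 2 3 4 5 6] 2              ; fixed number: 2 pieces
probe split "abc,de,fghi,jk" #","             ; at delimiters
```

The mode is chosen by the datatype of the second argument (plus `/into`), which is the runtime dispatch BrianH objects to and the single entry point Gregg is defending.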
Maxim 19-Jul-2011 [9243] | the current SPLIT might be better renamed as Tokenize. |
BrianH 19-Jul-2011 [9244] | But that might be because I've been mostly writing mezzanine code in R3, which doesn't allow the use of functions that complex, even though that's where they're implemented. For my user code, SPLIT was too confusing for me to remember to use, even when it would have been an advantage. As a counterexample, COLLECT also has too much overhead to use in mezzanine code, but I use it in user code all the time. |
Gregg 19-Jul-2011 [9245x2] | Except that it could also rightly be called GROUP, CHUNK, or SEGMENT, Max. |
i.e. you're not looking for token separators in all cases. | |
BrianH 19-Jul-2011 [9247] | SPLIT is a good name to use for something, but using English synonyms for alternate splitting functions won't work because developers won't remember which English synonym means which completely different REBOL function. |
Gregg 19-Jul-2011 [9248x2] | I don't follow. How is SPLIT confusing in that regard? |
It could be called SPLIT-SERIES I suppose, but I don't think it helps. | |
BrianH 19-Jul-2011 [9250] | SPLIT isn't, but calling the alternates CHUNK or SEGMENT might be. |
Gregg 19-Jul-2011 [9251x2] | YES! Which is why they all got rolled into SPLIT. |
They are all special cases of splitting. | |
BrianH 19-Jul-2011 [9253] | Like most developers, I don't read the docs for a function unless it is complex enough to need docs, and powerful enough to make it worth the time to do so. PARSE is an example of a function that deserves docs beyond the doc strings, or maybe the source. SPLIT should be more like FIND, understandable without reading a web page. Requiring otherwise is a design failure. I couldn't even understand SPLIT's rationale from its own source code. |
Gregg 19-Jul-2011 [9254] | So, you won't read the docs, but you'll read the source. ;-) |
BrianH 19-Jul-2011 [9255] | Yup, because you can do that from the console, with no internet access. |
Gregg 19-Jul-2011 [9256] | I would argue that FIND is far more confusing than SPLIT. |