World: r3wp

[!REBOL3]

Gregg
18-Jul-2011
[9215]
If you could use a test case to explain, I might get it. I'm slow 
today.
Steeve
18-Jul-2011
[9216]
To me the current behavior is the one I have in the current code 
of R3
Gregg
18-Jul-2011
[9217]
And what behavior did I change (as a test case)?
Steeve
18-Jul-2011
[9218]
OK, wait a little, I will do my best ;-)
Gregg
18-Jul-2011
[9219]
Maybe you can find one on http://www.rebol.com/r3/docs/functions/split.html
that shows it?
Steeve
18-Jul-2011
[9220x4]
current behavior:
split "-a-a'" ["a"]
>> ["a" "a"]

yours:
split "-a-a" ["a"]
>> ["-" "-"]
not tested though, I just read the code
Hmmm, seems it's me who is totally wrecked
OK, forget my big mouth, if you can
Gregg
18-Jul-2011
[9224x2]
So, you're saying that you want to specify a delimiter, and have 
it keep that? In any case, that's not the current behavior:

>> split "-a-a'" ["a"]
== ["-" "-" "'" ""]

Here's mine:

>> split "-a-a'" ["a"]
== ["-" "-" "'"]
So, my final version above seems OK then?
Steeve
18-Jul-2011
[9226]
Yeah, I just took my pills, I'm fine now
Gregg
18-Jul-2011
[9227x3]
LOL. :-)
I don't know the current system for submitting patches to R3. Once 
more people sign off on it, maybe BrianH will show up and see if 
we can get it in there.
Thanks for taking a look at it, Steeve. Pills or not.
Steeve
18-Jul-2011
[9230x4]
I will probably rewrite it completely before though
Gezz, I wanted tp say HE not I
*wanted to say He (Brian)
OK, time to go to bed, it seems
sqlab
19-Jul-2011
[9234]
It seems the parse behaviour changed somewhere a few times..
Nevertheless I think split is overloaded and overcomplicated.

The parse rules should better go in a PARSE Common Patterns library, 
that is included with Rebol,
not unlike http://www.rebol.com/article/0508.html
Gregg
19-Jul-2011
[9235]
How would you redesign it?
BrianH
19-Jul-2011
[9236x4]
Well, for one thing, complex, flexible mezzanine functions tend to 
be slowed down by the conditional code that determines the actual 
behavior desired in a particular case. There are real advantages 
to separating a complex function into multiple smaller, simpler functions. 
This makes it so the choices about which set of behavior to use are 
made by the programmer at development time instead of by the interpreter 
at runtime. SPLIT is a really large function that does many different 
things, so it's a good candidate for such a function split.
I liked Carl's idea of a SPLIT function that takes a series and returns 
the series from the head to the offset, and from the offset to the 
end. Like this:

    split: func [series [series!]] [
        reduce [copy/part head series series copy series]
    ]

At most, add an option to control the copying. Then have a separate 
function split on a delimiter, another split into a number of parts, 
etc.
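
The decomposition BrianH describes can be sketched as three small functions. This is a Python analogue for illustration, not R3 mezzanine code: a REBOL series carries its own position, so Carl's version takes one argument, while here the offset is passed explicitly. The names `split_at`, `split_on`, and `split_into` are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_at(series: Sequence, offset: int) -> Tuple[Sequence, Sequence]:
    """Return (head up to offset, offset to end); slicing copies both halves."""
    return series[:offset], series[offset:]


def split_on(text: str, delimiter: str) -> List[str]:
    """Split at each occurrence of a non-empty delimiter, dropping the delimiter."""
    return text.split(delimiter)


def split_into(series: Sequence, parts: int) -> List[Sequence]:
    """Split into `parts` pieces of equal size; the last piece takes any remainder."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size = len(series) // parts
    pieces = [series[i * size:(i + 1) * size] for i in range(parts - 1)]
    pieces.append(series[(parts - 1) * size:])
    return pieces
```

With this layout the caller picks the behavior by picking the function, so none of the three needs to inspect its arguments to decide what kind of split was meant.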
There are some mezzanine functions that have to be large and complex 
for other reasons. For instance, a couple of the LOAD subfunctions 
need to have functionality bundled together for security purposes. 
This doesn't seem to be the case with SPLIT.
It's one of the ironies of R3 that for a language that touts its 
ability to create user-designed dialects, inside the R3 mezzanine 
code, dialected functions are often too slow to be efficient enough 
for inclusion. This is why most of the builtin dialects are implemented 
in native code, through natives or commands. A dialect needs to be 
efficient enough to merit its use as opposed to the procedural equivalent, 
and easy enough to comprehend that the users of the dialect are likely 
to use it, rather than a simpler alternative. Developers' minds have 
overhead too.
Gregg
19-Jul-2011
[9240]
If optimization is the goal, we can certainly write specialized funcs. 
I have a lot of them myself.

How much too slow is the current SPLIT and in what contexts?


This SPLIT is intended to be general, like ROUND. If you need to 
round something, HELP ROUND gives you all the options, rather than 
having CEIL, FLOOR, TRUNC, etc. There was a long discussion about 
that when it was designed. The goal is to reduce the cognitive overhead. 


If people think this functionality is not helpful, and all we need 
is SPLIT = [first rest], then that's all we need. If so, please give 
it a more precise name.
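
Gregg's comparison with ROUND is about one entry point with options versus several separately named functions. A minimal Python sketch of the single-entry-point style follows; `round_value` and its mode names are hypothetical and only loosely mirror ROUND's options.

```python
import math


def round_value(value: float, mode: str = "even") -> int:
    """Round `value` according to `mode`; one function instead of CEIL, FLOOR, TRUNC."""
    modes = {
        "even": round,         # Python's built-in: ties go to the even neighbor
        "floor": math.floor,   # toward negative infinity
        "ceiling": math.ceil,  # toward positive infinity
        "down": math.trunc,    # toward zero
    }
    if mode not in modes:
        raise ValueError(f"unknown rounding mode: {mode!r}")
    return modes[mode](value)
```

The lookup on `mode` is the runtime cost BrianH objects to; the single name to remember is the benefit Gregg is arguing for.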
BrianH
19-Jul-2011
[9241]
SPLIT has enough cognitive overhead that I've never understood what 
it was supposed to do, and thus never used it. A sign?
Gregg
19-Jul-2011
[9242]
How could this be made clearer then?


Split a series into pieces; fixed or variable size, fixed number, 
or at delimiters

Did you ever look at the docs for it?

http://www.rebol.com/r3/docs/functions/split.html
Maxim
19-Jul-2011
[9243]
the current SPLIT might be better renamed as Tokenize.
BrianH
19-Jul-2011
[9244]
But that might be because I've been mostly writing mezzanine code 
in R3, which doesn't allow the use of functions that complex, even 
though that's where they're implemented. For my user code, SPLIT 
was too confusing for me to remember to use, even when it would have 
been an advantage. As a counterexample, COLLECT also has too much 
overhead to use in mezzanine code, but I use it in user code all 
the time.
Gregg
19-Jul-2011
[9245x2]
Except that it could also rightly be called GROUP, CHUNK, or SEGMENT, 
Max.
i.e. you're not looking for token separators in all cases.
BrianH
19-Jul-2011
[9247]
SPLIT is a good name to use for something, but using English synonyms 
for alternate splitting functions won't work because developers won't 
remember which English synonym maps to which completely different 
REBOL function.
Gregg
19-Jul-2011
[9248x2]
I don't follow. How is SPLIT confusing in that regard?
It could be called SPLIT-SERIES I suppose, but I don't think it helps.
BrianH
19-Jul-2011
[9250]
SPLIT isn't, but calling the alternates CHUNK or SEGMENT might be.
Gregg
19-Jul-2011
[9251x2]
YES! Which is why they all got rolled into SPLIT.
They are all special cases of splitting.
BrianH
19-Jul-2011
[9253]
Like most developers, I don't read the docs for a function unless 
it is complex enough to need docs, and powerful enough to make it 
worth the time to do so. PARSE is an example of a function that deserves 
docs beyond the doc strings, or maybe the source. SPLIT should be 
more like FIND, understandable without reading a web page. Requiring 
otherwise is a design failure. I couldn't even understand SPLIT's 
rationale from its own source code.
Gregg
19-Jul-2011
[9254]
So, you won't read the docs, but you'll read the source. ;-)
BrianH
19-Jul-2011
[9255]
Yup, because you can do that from the console, with no internet access.
Gregg
19-Jul-2011
[9256x2]
I would argue that FIND is far more confusing than SPLIT.
It's a good topic: how do we learn what a function does?
BrianH
19-Jul-2011
[9258]
FIND is more complex than split, but its options are more understandable 
because it isn't dialected beyond its refinements, so you can read 
its docs with HELP. But note that FIND is so complex that it would 
need to be native for that reason alone, let alone the overhead of 
the actual finding.
Gregg
19-Jul-2011
[9259]
Native or mezz implementation is irrelevant.
Kaj
19-Jul-2011
[9260]
Hm, I usually don't use a function until I've read the docs, because 
otherwise I'd have no idea how to use it
BrianH
19-Jul-2011
[9261]
I usually don't consider whole applications to be finished until 
they can be used without reading the docs, let alone simple functions. 
But that's just me.
Gregg
19-Jul-2011
[9262]
I will have to hire you. I often embed docs right on the main screen, 
even for single-screen apps. Maybe I'm thinking of different kinds of 
apps though. What sort of apps are you talking about?
Kaj
19-Jul-2011
[9263]
Apps usually fail to act like I would expect them to, so I have to 
read the manual to find out the limitations
Gregg
19-Jul-2011
[9264]
"FIND is more complex than split, but its options are more understandable 
because it isn't dialected beyond its refinements"


I disagree. FIND's refinements interact with each other and change 
the behavior of the function, sometimes in unpredictable ways. SPLIT 
uses the power of datatypes to control behavior, with only one refinement 
as an exception.
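
The datatype-driven design Gregg defends, following the doc string quoted earlier (fixed or variable size, fixed number, or at delimiters), can be approximated in Python by dispatching on the type of the second argument. This is a sketch of the idea, not the R3 source: the `into` flag stands in for the single refinement Gregg mentions and its name is an assumption, and delimiters given as a block of parse rules are not modeled.

```python
from typing import List, Sequence, Union


def split(series: Sequence,
          dlm: Union[int, str, Sequence[int]],
          into: bool = False) -> List[Sequence]:
    """Split `series`; the type of `dlm` selects the behavior."""
    if isinstance(dlm, bool):
        raise TypeError("dlm must be an int, a str, or a sequence of ints")
    if isinstance(dlm, int):
        if dlm < 1:
            raise ValueError("an integer dlm must be at least 1")
        if into:
            # Fixed number of pieces; the last piece takes any remainder.
            size = len(series) // dlm
            pieces = [series[i * size:(i + 1) * size] for i in range(dlm - 1)]
            pieces.append(series[(dlm - 1) * size:])
            return pieces
        # Fixed-size pieces; the last piece may be shorter.
        return [series[i:i + dlm] for i in range(0, len(series), dlm)]
    if isinstance(dlm, str):
        # Delimiter: only meaningful for strings in this sketch.
        if not isinstance(series, str):
            raise TypeError("a str delimiter requires a str series")
        return series.split(dlm)
    # Sequence of ints: variable-size pieces, taken in order.
    pieces, pos = [], 0
    for size in dlm:
        pieces.append(series[pos:pos + size])
        pos += size
    return pieces
```

For example, `split("abcdefgh", 3)` yields pieces of length three, while `split("abcdefgh", 3, into=True)` yields three pieces. The chain of type checks at the top is the conditional code BrianH points to, and the single name is the reduced cognitive overhead Gregg points to.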