World: r3wp


[!REBOL3]

Steeve 18-Jul-2011 [9230x4]	I will probably rewrite it completely before though
	Gezz, I wanted tp say HE not I
	*wanted to say He (Brian)
	Ok, time to go to bed, it seems
sqlab 19-Jul-2011 [9234]	It seems the parse behaviour has changed a few times along the way. Nevertheless, I think SPLIT is overloaded and overcomplicated. The parse rules would be better placed in a PARSE Common Patterns library that is included with Rebol, not unlike http://www.rebol.com/article/0508.html
Gregg 19-Jul-2011 [9235]	How would you redesign it?
BrianH 19-Jul-2011 [9236x4]	Well, for one thing, complex, flexible mezzanine functions tend to be slowed down by the conditional code that determines the actual behavior desired in a particular case. There are real advantages to separating a complex function into multiple smaller, simpler functions. This makes it so the choices about which set of behavior to use are made by the programmer at development time instead of by the interpreter at runtime. SPLIT is a really large function that does many different things, so it's a good candidate for such a function split.
	I liked Carl's idea of a SPLIT function that takes a series and returns the series from the head to the offset, and from the offset to the end. Like this: split: func [series [series!]] [reduce [copy/part head series series copy series]] At most, add an option to control the copying. Then have a separate function to split on a delimiter, another to split into a number of parts, etc.
	There are some mezzanine functions that have to be large and complex for other reasons. For instance, a couple of the LOAD subfunctions need to have functionality bundled together for security purposes. This doesn't seem to be the case with SPLIT.
	It's one of the ironies of R3 that for a language that touts its ability to create user-designed dialects, inside the R3 mezzanine code, dialected functions are often too slow to be efficient enough for inclusion. This is why most of the builtin dialects are implemented in native code, through natives or commands. A dialect needs to be efficient enough to merit its use as opposed to the procedural equivalent, and easy enough to comprehend that the users of the dialect are likely to use it, rather than a simpler alternative. Developers' minds have overhead too.
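A minimal sketch of the head/offset SPLIT quoted above, for trying at the console. The name split-at-offset is hypothetical (chosen here so the built-in SPLIT is not redefined); the body is the one-liner from the message, unchanged.

```rebol
REBOL [Title: "Head/offset split sketch"]

; Hypothetical name, to avoid clobbering the built-in SPLIT.
; Returns two copies: head up to the current position,
; and the current position through the tail.
split-at-offset: func [series [series!]] [
    reduce [copy/part head series series copy series]
]

probe split-at-offset skip "abcdef" 2   ; prints ["ab" "cdef"]
probe split-at-offset [a b c]           ; prints [[] [a b c]]
```

The split point is carried by the series position itself, so no second argument or dialect is needed; the caller positions the series with SKIP, FIND, NEXT and so on before calling.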
Gregg 19-Jul-2011 [9240]	If optimization is the goal, we can certainly write specialized funcs. I have a lot of them myself. How much too slow is the current SPLIT and in what contexts? This SPLIT is intended to be general, like ROUND. If you need to round something, HELP ROUND gives you all the options, rather than having CEIL, FLOOR, TRUNC, etc. There was a long discussion about that when it was designed. The goal is to reduce the cognitive overhead. If people think this functionality is not helpful, and all we need is SPLIT = [first rest], then that's all we need. If so, please give it a more precise name.
BrianH 19-Jul-2011 [9241]	SPLIT has enough cognitive overhead that I've never understood what it was supposed to do, and thus never used it. A sign?
Gregg 19-Jul-2011 [9242]	How could this be made clearer then? "Split a series into pieces; fixed or variable size, fixed number, or at delimiters." Did you ever look at the docs for it? http://www.rebol.com/r3/docs/functions/split.html
Maxim 19-Jul-2011 [9243]	the current SPLIT might be better renamed as Tokenize.
BrianH 19-Jul-2011 [9244]	But that might be because I've been mostly writing mezzanine code in R3, which doesn't allow the use of functions that complex, even though that's where they're implemented. For my user code, SPLIT was too confusing for me to remember to use, even when it would have been an advantage. As a counterexample, COLLECT also has too much overhead to use in mezzanine code, but I use it in user code all the time.
Gregg 19-Jul-2011 [9245x2]	Except that it could also rightly be called GROUP, CHUNK, or SEGMENT, Max.
	i.e. you're not looking for token separators in all cases.
BrianH 19-Jul-2011 [9247]	SPLIT is a good name to use for something, but using English synonyms for alternate splitting functions won't work because developers won't remember which English synonym maps to which completely different REBOL function.
Gregg 19-Jul-2011 [9248x2]	I don't follow. How is SPLIT confusing in that regard?
	It could be called SPLIT-SERIES I suppose, but I don't think it helps.
BrianH 19-Jul-2011 [9250]	SPLIT isn't, but calling the alternates CHUNK or SEGMENT might be.
Gregg 19-Jul-2011 [9251x2]	YES! Which is why they all got rolled into SPLIT.
	They are all special cases of splitting.
BrianH 19-Jul-2011 [9253]	Like most developers, I don't read the docs for a function unless it is complex enough to need docs, and powerful enough to make it worth the time to do so. PARSE is an example of a function that deserves docs beyond the doc strings, or maybe the source. SPLIT should be more like FIND, understandable without reading a web page. Requiring otherwise is a design failure. I couldn't even understand SPLIT's rationale from its own source code.
Gregg 19-Jul-2011 [9254]	So, you won't read the docs, but you'll read the source. ;-)
BrianH 19-Jul-2011 [9255]	Yup, because you can do that from the console, with no internet access.
Gregg 19-Jul-2011 [9256x2]	I would argue that FIND is far more confusing than SPLIT.
	It's a good topic: how do we learn what a function does?
BrianH 19-Jul-2011 [9258]	FIND is more complex than split, but its options are more understandable because it isn't dialected beyond its refinements, so you can read its docs with HELP. But note that FIND is so complex that it would need to be native for that reason alone, let alone the overhead of the actual finding.
Gregg 19-Jul-2011 [9259]	Native or mezz implementation is irrelevant.
Kaj 19-Jul-2011 [9260]	Hm, I usually don't use a function until I've read the docs, because otherwise I'd have no idea how to use it
BrianH 19-Jul-2011 [9261]	I usually don't consider whole applications to be finished until they can be used without reading the docs, let alone simple functions. But that's just me.
Gregg 19-Jul-2011 [9262]	I will have to hire you. I often embed docs right on the main screen, even for single-screen apps. Maybe I'm thinking of different kinds of apps though. What sort of apps are you talking about?
Kaj 19-Jul-2011 [9263]	Apps usually fail to act like I would expect them to, so I have to read the manual to find out the limitations
Gregg 19-Jul-2011 [9264]	"FIND is more complex than split, but its options are more understandable because it isn't dialected beyond its refinements." I disagree. FIND's refinements interact with each other and change the behavior of the function, sometimes in unpredictable ways. SPLIT uses the power of datatypes to control behavior, with only one refinement as an exception.
BrianH 19-Jul-2011 [9265x2]	Docs in the main screen are part of the app, though if they're too complex they can slow down the usage of the app. But my work has mostly been business apps. I often will make an app suite rather than a single complex app, just to make them easier to use. Every hour spent making the app easier to use saves you hundreds of hours of training. It's worse for consumer apps with competitors, because being too hard to use will lose customers.
	Native or mezz implementation is irrelevant in R3, agreed, whichever is more efficient. But the rationale of SPLIT's dialect seems confusing, particularly as it relates to negative numbers. Any reason keywords weren't used instead?
Gregg 19-Jul-2011 [9267x6]	I understand the reasoning, and agree completely. I do app suites as well, but the main app is then usually a "control center" with built in docs.
	The original design used negative numbers to skip backwards. I don't think I changed the design, but I understand the reasoning behind how they work now. Keywords were probably not used as it would have complicated the dialect. Well, it would have made it a dialect, which it really isn't today.
	Carl may have done that. He likes that sort of thing.
	This brings up a good point though. Do we consider a function dialected if the behavior is controlled by datatypes? SPLIT does have the case of a block of integers having more fine-grained control, so there is a dialect option there that could easily be expanded.
	I have to run, and will be offline for a week. I would LOVE to see alternate implementations and designs for comparison. We can talk a lot, but if we can compare two options side by side, it often makes it easier to say which one you like better, rather than discussing costs and benefits in the abstract.
	Great chat Brian. Stimulating as always. Thanks!
BrianH 19-Jul-2011 [9273x3]	The non-dialected behaviors seem simple enough (for the purposes of discussion I've read the docs). The problem is in the dialect, especially these: - "Negative values can be used to skip in the series without returning that part:" Why not use a 'skip keyword for that? - "Note that for greater control, you can use simple parse rules:" Which ones? It really is a dialect, but the language is not confusing (first case) and not well defined (second case). Using keywords would make the dialect easier to understand (and thus use), and potentially more efficient to implement using command dispatch.
	is not confusing -> is confusing
	There is a conceptual conflict between the treatment of splitting into parts by length and splitting by delimiter, that has the effect of limiting both sets of behavior. It would be better to put the delimiter splitting into a separate function called DELIMIT. This would allow the dialected variants of SPLIT and DELIMIT to develop separately without conflict, and make the SPLIT dialect easier to understand. Then you would have two relatively simple functions with a clear distinction between them.
Steeve 20-Jul-2011 [9276]	So, you want to split SPLIT
BrianH 20-Jul-2011 [9277]	Yup. I'll try to mock something up later this week.
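For comparison, a rough sketch of what a stand-alone "split into parts by length" function could look like once delimiter handling lives elsewhere. This is not the mock-up mentioned above; the name split-by-size and its argument names are hypothetical, and it covers only the fixed-size case, with no dialect.

```rebol
REBOL [Title: "Fixed-size split sketch"]

; Hypothetical helper: cut a series into consecutive pieces of SIZE items.
; The last piece may be shorter. No delimiter logic, no dialect.
split-by-size: func [series [series!] size [integer!] /local out] [
    size: max 1 size                  ; guard: a size below 1 would never advance
    out: copy []
    while [not tail? series] [
        append/only out copy/part series size
        series: skip series size
    ]
    out
]

probe split-by-size "abcdefgh" 3      ; prints ["abc" "def" "gh"]
probe split-by-size [1 2 3 4] 2       ; prints [[1 2] [3 4]]
```

With the behavior chosen by function name rather than by argument datatype, there is no runtime dispatch on the second argument, which is the trade-off discussed earlier in the thread.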
Cyphre 22-Jul-2011 [9278]	BTW doesn't the current SPLIT have a bug? >> split "1,2,333,4444,5555" #"," == ["1" "2" "333" "4444" "5555" ""] Note the last empty string in the result.
Kaj 22-Jul-2011 [9279]	Yeah, that's annoying. I vaguely remember finding out that it was designed this way
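A small sketch of a delimiter-only splitter that does not emit the trailing empty string shown above. The name split-on-delimiter is hypothetical, and dropping the final empty piece is an assumption about the desired result, not a statement of how the built-in SPLIT is meant to behave.

```rebol
REBOL [Title: "Delimiter split sketch"]

; Hypothetical helper: split TEXT at each occurrence of the char DELIM.
; Unlike the console result quoted above, nothing is appended
; when no text remains after the last delimiter.
split-on-delimiter: func [text [string!] delim [char!] /local out pos] [
    out: copy []
    while [pos: find text delim] [
        append out copy/part text pos   ; piece before the delimiter
        text: next pos                  ; continue just past it
    ]
    unless tail? text [append out copy text]
    out
]

probe split-on-delimiter "1,2,333,4444,5555" #","
; prints ["1" "2" "333" "4444" "5555"]
```

Empty pieces between two adjacent delimiters are still kept (for example "a,,b" yields an empty string in the middle); only the piece after a final delimiter is dropped.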