World: r3wp
[!REBOL3 Proposals] For discussion of feature proposals
BrianH 28-Jan-2011 [890] | And it's order safe. |
Maxim 28-Jan-2011 [891x3] | and if the function does this internally in C it will still be MUCH faster, which is why I'd much prefer having refinements for in-place functioning of all set functions. |
yeah, that implementation is pretty neat. | |
using refinements also means fewer functions to remember... there are quite a few set functions. I don't want to have each one duplicated as a mezz function. | |
BrianH 28-Jan-2011 [894] | It has to allocate another series anyways, as the set functions do hashing. In-place is just for convenience, not to save memory. |
Maxim 28-Jan-2011 [895x3] | I know, but everything that's done on the C side will save on speed and memory, since the C code doesn't have to go through the GC and all that. in tight loops and real-time continual processing, these details make a big difference in overall smoothness of the app. |
which is why it's preferable to do it there anyways... and the fact that we only have one function name to remember for the two versions is also a big deal for simplicity's sake. | |
I just realized that if Carl is using a hash table for some of the set functions... does this mean that it's subject to the 24-bit hash problem we discovered in maps? | |
BrianH 28-Jan-2011 [898x3] | Yes, until that's fixed. |
See also http://issue.cc/r3/1574 | |
You could add /no-copy refinements to the set functions though. Of course this would switch which half of the set functions are misnamed :) | |
Maxim 28-Jan-2011 [901x2] | /no-copy would be really nice... it's been a recurring discussion for years by many of us, which proves that it's a required feature IMHO. |
copy or not, neither version is more "correct"; they both are. | |
BrianH 28-Jan-2011 [903x2] | No, I mean that modifying functions should have verb names and non-modifying, not-for-effect functions shouldn't. So for the current set functions UNIQUE, DIFFERENCE and UNION have good names, but EXCLUDE should be called EXCLUDING and INTERSECT should be called INTERSECTING; this gets reversed for modifying versions :) |
But that's just being silly, unless we're adding new functions that need names. | |
Maxim 28-Jan-2011 [905] | yeah. |
BrianH 28-Jan-2011 [906] | Doesn't matter. The non-modifying version of APPEND is called JOIN, and both of those are verbs. |
Maxim 28-Jan-2011 [907] | does JOIN still reduce like it did in R2? |
BrianH 28-Jan-2011 [908] | Yup, so it's not a direct correspondence. |
Maxim 28-Jan-2011 [909] | I always wondered why it reduced... I find that very annoying... many times I'd use it and it ends up mucking up my data, so I just almost never use it. |
BrianH 28-Jan-2011 [910] | I mostly use REJOIN and AJOIN instead of JOIN, or maybe APPEND COPY. |
Maxim 28-Jan-2011 [911] | me too. |
Ladislav 28-Jan-2011 [912] | I am pretty sure that: 1) the set operations in Rebol are in fact GC-safe in the standard meaning of the term; 2) auxiliary data is always necessary if the set operation is to be done efficiently; 3) nobody pretending to need a modifying version really needs an inefficient variant that does not use any auxiliary data |
Maxim 28-Jan-2011 [913] | why do you say we "pretend"? |
BrianH 28-Jan-2011 [914] | I think that people who need a modifying version really need it, but the rest of us need the non-modifying default :) |
Ladislav 28-Jan-2011 [915x2] | "really need it" - does that mean they need what I specified - an inefficient variant not using any auxiliary data? |
I suppose that is nonsense | |
BrianH 28-Jan-2011 [917] | Nope, it just means they need the first argument modified to contain the result instead of what it originally contained. |
Maxim 28-Jan-2011 [918] | no one is saying to use anything less efficient than the copy returning version. |
BrianH 28-Jan-2011 [919] | The only difference between the DEDUPLICATE code in the ticket and a native version is that the auxiliary data could be deleted immediately after use instead of at the next GC run. |
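The point about auxiliary data can be sketched in Python. This is only an analogue of the idea, not the REBOL code from the ticket: the function name `deduplicate` is borrowed from the discussion, everything else is an assumption made for illustration. It shows that a "modifying" version still builds temporary structures and only differs in where the result ends up.

```python
def deduplicate(series):
    """Remove duplicates from `series` in place, keeping first occurrences in order.

    Illustrative analogue only; items are assumed to be hashable.
    """
    seen = set()        # auxiliary hash data, needed whether or not we modify in place
    unique = []         # auxiliary result buffer, comparable to the copy a non-modifying call returns
    for item in series:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    series[:] = unique  # overwrite the caller's list object instead of handing back a new one
    return series       # `seen` and `unique` become garbage once the function returns


data = [3, 1, 3, 2, 1]
alias = data            # a second reference to the same list
result = deduplicate(data)
```

After the call, `alias` sees the deduplicated contents because the original object was modified, yet the function allocated just as much temporary data as a copying version would have.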
Maxim 28-Jan-2011 [920x4] | and the data is managed directly by C, not by the interpreter, which is faster for sure. |
which also means that some of that can possibly be optimised by the compiler... something that cannot happen within rebol. | |
it's also going to use less RAM, even if it does use some auxiliary data... since that auxiliary data is not wrapped within REBOL interpreter wrappers. | |
or at least, parts of it won't. | |
BrianH 28-Jan-2011 [924] | Not much less RAM. The "interpreter wrapper" is pretty much constant, no matter the size of the data. Remember, the data you are doing set operations on is REBOL data already. |
Maxim 28-Jan-2011 [925x2] | yes, but the extra data used to build it as a mezz, including the stack frames and stuff, is avoided. I know I'm being picky here, but we're doing a detailed analysis... :-) |
but in the end, it's the usability which everyone wants, even if it's only slightly more effective. | |
Ladislav 28-Jan-2011 [927] | "The only difference between the DEDUPLICATE code in the ticket and a native version is that the auxiliary data could be deleted immediately after use instead of at the next GC run." - that would be inefficient as well |
BrianH 28-Jan-2011 [928] | INSERT, CLEAR and UNIQUE are already native, so the actual time-consuming portions are already optimized. The only overhead you would be reducing by making DEDUPLICATE native is constant per function call, and freeing the memory immediately just takes a little pressure off the GC at collection time. You don't get as much benefit as adding /into to REDUCE and COMPOSE gave, but it might be worth adding as a /no-copy option, or just as useful to add as a library function. |
Maxim 28-Jan-2011 [929x2] | right now the GC is very cumbersome. it waits for it to have 3-5MB before working. and it can take a noticeable amount of time to do when there is a lot of ram. I've had it freeze for a second in some apps. everything we can do to prevent memory being scanned by the GC is a good thing. |
by 3-5MB, I mean that it will usually accumulate ~ 3-5 MB of new data before running. | |
BrianH 28-Jan-2011 [931] | Mark and sweep only scans the referenced data, not the unreferenced data, but adding a lot of unreferenced data makes the GC run more often. |
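Both halves of that statement can be illustrated with a toy mark-and-sweep heap in Python. This is a hypothetical model, not REBOL's collector: the `Heap` class, its allocation-count `threshold` (standing in for the byte threshold mentioned above), and the counters are all assumptions made for the sketch.

```python
class Heap:
    """Toy mark-and-sweep heap; a collection runs after `threshold` allocations."""

    def __init__(self, threshold):
        self.objects = {}       # object id -> list of ids it references
        self.roots = set()      # ids that are always considered reachable
        self.threshold = threshold
        self.since_gc = 0       # allocations since the last collection
        self.collections = 0    # how many times the collector ran
        self.marked_total = 0   # objects visited by the mark phase, summed over all runs
        self.next_id = 0

    def alloc(self, refs=(), root=False):
        if self.since_gc >= self.threshold:
            self.collect()
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = list(refs)
        if root:
            self.roots.add(oid)
        self.since_gc += 1
        return oid

    def collect(self):
        # Mark: walk only what is reachable from the roots.
        marked = set()
        stack = list(self.roots)
        while stack:
            oid = stack.pop()
            if oid in marked:
                continue
            marked.add(oid)
            stack.extend(self.objects[oid])
        self.marked_total += len(marked)
        # Sweep: drop everything that was not marked (this toy sweep does walk the whole table).
        self.objects = {oid: refs for oid, refs in self.objects.items() if oid in marked}
        self.since_gc = 0
        self.collections += 1


busy = Heap(threshold=100)
busy.alloc(root=True)
for _ in range(1000):
    busy.alloc()            # garbage: nothing references these

quiet = Heap(threshold=100)
quiet.alloc(root=True)
for _ in range(100):
    quiet.alloc()
```

In this model the mark phase only ever visits the single live object, however much garbage exists, but `busy` triggers many more collections than `quiet` simply because it allocates more unreferenced data.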
Maxim 28-Jan-2011 [932] | yep. |
Ladislav 28-Jan-2011 [933x2] | "right now the GC is very cumbersome. it waits for it to have 3-5MB before working. and it can take a noticeable amount of time to do when there is a lot of ram. I've had it freeze for a second in some apps." - what exactly does the GC have in common with the "Deduplicate issue"? |
I demonstrated above that, in fact, nothing. | |
BrianH 28-Jan-2011 [935] | But that doesn't mean that deallocating immediately will be any more efficient; likely it won't. |
Ladislav 28-Jan-2011 [936] | This is all just pretending. If what is needed is some kind of incremental, generational, or other GC variant, then no "Deduplicate" can help with that |
BrianH 28-Jan-2011 [937x2] | We don't need DEDUPLICATE to help with the GC. He was suggesting that having it be native would help reduce the pressure on the GC when used for other reasons instead of a mezzanine version. I don't think it will by much. |
He needs DEDUPLICATE for his own code. The GC also needs work, but that is another issue :) | |
Maxim 28-Jan-2011 [939] | if we implement deduplicate as a mezz, we are juggling data which invariably burdens the GC. doing this natively helps to prevent the GC from working too hard. the problem is not how long/fast the allocation/deallocation is... it's the fact that cramming in data for the GC to manage will make the GC trigger longer/more often. |