World: r3wp

[!REBOL3 Proposals] For discussion of feature proposals

BrianH
28-Jan-2011
[890]
And it's order safe.
Maxim
28-Jan-2011
[891x3]
and if the function does this internally in C it will still be MUCH 
faster, which is why I'd much prefer having refinements for in-place 
operation on all of the set functions.
yeah, that implementation is pretty neat.
using refinements also means fewer functions to remember... there 
are quite a few set functions.  I don't want to have each one duplicated 
as a mezz function.
BrianH
28-Jan-2011
[894]
It has to allocate another series anyways, as the set functions do 
hashing. In-place is just for convenience, not to save memory.
Maxim
28-Jan-2011
[895x3]
I know, but everything that's done in the C side will save on speed 
and memory, since the C doesn't have to go through the GC and all 
that.  in tight loops and real-time continual processing, these details 
make a big difference in overall smoothness of the app.
which is why it's preferable to do it there anyways... and the fact 
that we only have one function name to remember for the two versions 
is also a big deal for simplicity's sake.
I just realized that if Carl is using a hash table for some of the 
set functions... does this mean that it's subject to the 24 bit hash 
problem we discovered in maps?
BrianH
28-Jan-2011
[898x3]
Yes, until that's fixed.
See also http://issue.cc/r3/1574
You could add /no-copy refinements to the set functions though. Of 
course this would switch which half of the set functions are misnamed 
:)
Maxim
28-Jan-2011
[901x2]
/no-copy would be really nice... it's been a recurring discussion 
for years by many of us, which proves that it's a required feature 
IMHO.
copy or not, there is no more "correct" version, they both are.
BrianH
28-Jan-2011
[903x2]
No, I mean that modifying functions should have verb names and non-modifying, 
not-for-effect functions shouldn't. So for the current set functions 
UNIQUE, DIFFERENCE and UNION have good names, but EXCLUDE should 
be called EXCLUDING and INTERSECT should be called INTERSECTING; 
this gets reversed for modifying versions :)
But that's just being silly, unless we're adding new functions that 
need names.
Maxim
28-Jan-2011
[905]
yeah.
BrianH
28-Jan-2011
[906]
Doesn't matter. The non-modifying version of APPEND is called JOIN, 
and both of those are verbs.
Maxim
28-Jan-2011
[907]
does JOIN still reduce like it did in R2?
BrianH
28-Jan-2011
[908]
Yup, so it's not a direct correspondence.
Maxim
28-Jan-2011
[909]
I always wondered why it reduced... I find that very annoying... 
many times I'd use it and it ends up mucking up my data, so I 
almost never use it.
BrianH
28-Jan-2011
[910]
I mostly use REJOIN and AJOIN instead of JOIN, or maybe APPEND COPY.
Maxim
28-Jan-2011
[911]
me too.
Ladislav
28-Jan-2011
[912]
I am pretty sure that:

1) the set operations in Rebol are in fact GC-safe, in the standard 
meaning of the term

2) it is always necessary to use auxiliary data, if the wish is to 
do the set operation efficiently

3) nobody pretending to need a modifying version really needs an 
inefficient variant, which does not use any auxiliary data
Maxim
28-Jan-2011
[913]
why do you say we "pretend"?
BrianH
28-Jan-2011
[914]
I think that people who need a modifying version really need it, 
but the rest of us need the non-modifying default :)
Ladislav
28-Jan-2011
[915x2]
"really need it" - does that mean they need what I specified - an 
inefficient variant not using any auxiliary data?
I suppose that is nonsense
BrianH
28-Jan-2011
[917]
Nope, it just means they need the first argument modified to contain 
the result instead of what it originally contained.
Maxim
28-Jan-2011
[918]
no one is saying to use anything less efficient than the copy returning 
version.
BrianH
28-Jan-2011
[919]
The only difference between the DEDUPLICATE code in the ticket and 
a native version is that the auxiliary data could be deleted immediately 
after use instead of at the next GC run.
Maxim
28-Jan-2011
[920x4]
and the data is managed directly by C, not by the interpreter, which 
is faster for sure.
which also means that some of that can possibly be optimised by the 
compiler... something that cannot happen within rebol.
it's also going to use less ram, even if it does use some auxiliary 
data... since that auxiliary data is not wrapped within REBOL interpreter 
wrappers.
or at least, parts of it won't.
BrianH
28-Jan-2011
[924]
Not much less RAM. The "interpreter wrapper" is pretty much constant, 
no matter the size of the data. Remember, the data you are doing 
set operations on is REBOL data already.
Maxim
28-Jan-2011
[925x2]
yes, but the extra data used to build it as a mezz, including the 
stack frames and stuff, is avoided.


I know I'm being picky here.  but we're doing a detailed analysis.. 
 :-)
but in the end, it's the usability which everyone wants, even if it's 
only slightly more efficient.
Ladislav
28-Jan-2011
[927]
"The only difference between the DEDUPLICATE code in the ticket and 
a native version is that the auxiliary data could be deleted immediately 
after use instead of at the next GC run."
 - that would be inefficient as well
BrianH
28-Jan-2011
[928]
INSERT, CLEAR and UNIQUE are already native, so the actual time-consuming 
portions are already optimized. The only overhead you would be reducing 
by making DEDUPLICATE native is constant per function call, and freeing 
the memory immediately just takes a little pressure off the GC at 
collection time. You don't get as much benefit as adding /into to 
REDUCE and COMPOSE gave, but it might be worth adding as a /no-copy 
option, or just as useful to add as a library function.
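
For readers unfamiliar with the pattern under discussion, here is a minimal Python sketch of a modifying deduplicate. It is not the REBOL code from the ticket. The name `deduplicate` is used here only for illustration, and the sketch assumes the items are hashable. It builds auxiliary data (a hash set plus a temporary list), then clears and refills the original list, which is roughly the same shape as combining UNIQUE, CLEAR and INSERT.

```python
def deduplicate(series):
    """Remove duplicates from `series` in place, keeping first occurrences.

    Illustrative only: assumes every item is hashable.
    """
    seen = set()        # auxiliary hash data used for fast membership tests
    unique_items = []   # auxiliary copy holding the deduplicated result
    for item in series:
        if item not in seen:
            seen.add(item)
            unique_items.append(item)
    # Replace the contents of the same list object, so every existing
    # reference to `series` sees the deduplicated data.
    series[:] = unique_items
    return series


data = [3, 1, 3, 2, 1]
alias = data            # a second reference to the same list
deduplicate(data)
print(alias)            # [3, 1, 2]
```

The caller gets the first argument modified, but the auxiliary data is still allocated. That matches the point made in the thread that in-place operation is a convenience rather than a memory saving.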
Maxim
28-Jan-2011
[929x2]
right now the GC is very cumbersome. it waits for it to have 3-5MB 
before working. and it can take a noticeable amount of time to do 
when there is a lot of ram.  I've had it freeze for a second in some 
apps.

everything we can do to prevent memory being scanned by the GC is 
a good thing.
by 3-5MB, I mean that it will usually accumulate ~ 3-5 MB of new 
data before running.
BrianH
28-Jan-2011
[931]
Mark and sweep only scans the referenced data, not the unreferenced 
data, but adding a lot of unreferenced data makes the GC run more 
often.
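
A toy Python illustration of the mark-and-sweep point, assuming a heap modeled as a dictionary that maps each object name to the names it references. This is not REBOL's collector. The functions `mark` and `sweep` and all object names are made up for the sketch. The mark phase only walks what is reachable from the roots, so unreferenced temporaries add no marking work.

```python
def mark(roots, heap):
    """Return the set of object names reachable from `roots`."""
    marked = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in marked:
            continue
        marked.add(name)
        stack.extend(heap.get(name, ()))  # follow outgoing references
    return marked


def sweep(heap, marked):
    """Return a new heap containing only the marked (live) objects."""
    return {name: refs for name, refs in heap.items() if name in marked}


heap = {
    "app": ["series"],   # root object referencing the live series
    "series": [],
    "tmp1": ["tmp2"],    # unreferenced temporaries left behind by a call
    "tmp2": [],
}
live = mark(["app"], heap)
heap = sweep(heap, live)
print(sorted(heap))      # ['app', 'series']
```

In a real collector, allocating many such temporaries would still cause collections to be triggered more often, which is the cost being debated here.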
Maxim
28-Jan-2011
[932]
yep.
Ladislav
28-Jan-2011
[933x2]
"right now the GC is very cumbersome. it waits for it to have 3-5MB 
before working. and it can take a noticeable amount of time to do 
when there is a lot of ram.  I've had it freeze for a second in some 
apps."
 - what exactly does the GC have in common with the "Deduplicate issue"?
I demonstrated above that, in fact, nothing.
BrianH
28-Jan-2011
[935]
But that doesn't mean that deallocating immediately will be any 
more efficient; likely it won't.
Ladislav
28-Jan-2011
[936]
This is all just pretending. If what is needed is some kind of incremental, 
generational, or other GC variant, then no "Deduplicate" can help with that.
BrianH
28-Jan-2011
[937x2]
We don't need DEDUPLICATE to help with the GC. He was suggesting 
that having it be native would help reduce the pressure on the GC 
when used for other reasons instead of a mezzanine version. I don't 
think it will by much.
He needs DEDUPLICATE for his own code. The GC also needs work, but 
that is another issue :)
Maxim
28-Jan-2011
[939]
if we implement deduplicate as a mezz, we are juggling data, which 
invariably hampers the GC.  doing this natively helps to prevent the 
GC from working too hard.

the problem is not how long/fast the allocation/deallocation is... 
it's the fact that cramming in data for the GC to manage will make the 
GC run longer and trigger more often.