r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[Core] Discuss core issues

BrianH
23-Feb-2010
[15885]
Graham, we have set functions that work with the block! type, we 
don't need a set! type. There is a budget for built-in types - we 
can only have a limited number of them. There has to be a major need 
that can't be handled easily enough otherwise for a new built-in 
datatype to be added. A set! type might merit a user-defined datatype, 
or a spoofed datatype like the ones I added to 2.7.7+, but it's not 
an extreme enough difference from the behavior of block! with set 
functions to merit a full datatype.
Graham
23-Feb-2010
[15886]
Seems a bit of overhead if you're trying to maintain a set if you 
have to check each time you add something if it already exists or 
not.
BrianH
23-Feb-2010
[15887]
Which is why I want an INCLUDE function, the opposite of EXCLUDE.
Steeve
23-Feb-2010
[15888]
bitsets are exactly what you need graham (only working with sets 
of integers or chars, though)
BrianH
23-Feb-2010
[15889]
Maps work well for that too, using #[true] as the value or #[none] 
for not there. Works great, works with FOREACH.
Steeve
23-Feb-2010
[15890]
but heavy :)
BrianH
23-Feb-2010
[15891x2]
For other types than integers or chars, it's really not that heavy. 
The space to store one extra value slot each is nothing incomparison 
to the overhead of bitsets for sparse data with poor locality.
Remember, bitsets include the zeros, maps don't necessarily include 
the nones.
Steeve
23-Feb-2010
[15893x2]
true, but bitsets have  lightning speed in comparison.
there is nothing faster
BrianH
23-Feb-2010
[15895x2]
Only for integers and chars. And can be hundreds of megabytes in 
RAM for some datasets.
It's best to pick one based on your data.
Steeve
23-Feb-2010
[15897]
sure
BrianH
23-Feb-2010
[15898]
On the other hand, if your data has good locality you can do a bitset 
minus a base value and it can still be small. Say if you are doing 
the numbers from 100000 to 100100 just subtract 100000 first.
Steeve
23-Feb-2010
[15899x2]
i think we can combine the 2 technics to gether map! + bitset!
map! as a primary index , and bitsets for linked indexes when data 
have good localities
BrianH
23-Feb-2010
[15901x2]
Yup. Not a bad project, and it should even be fast in mezzanine code. 
Maybe as a user-defined index type if you like, though a set of related 
functions would do.
Though sparse-bitset! would be a more accurate name due to the integer/char 
restriction for contents.
Steeve
23-Feb-2010
[15903x5]
i remember having an algo done with R2 to simulate hash! using bitsets 
and blocks
i should search for...
Found ...

f: fast-dic: context [
	size: 100000

 hash: 128 - 1	;** hash size speed up the search, must be a power 
 of 2 - 1 (ie. 15, 31, 63, 127, 257 ...)
	master: copy/deep head insert/dup/only [] [] hash + 1
	index: make bitset! size
	flag: func [idx [integer!]][
		unless find index idx [
			insert index idx

   insert/only insert tail pick master idx and hash + 1 idx copy []
		]
	]
	flag?: func [idx [integer!]][find index idx]
	deflag: func [idx [integer!]][
		remove/part index idx
		remove/part find pick master idx and hash + 1 idx 2
	]
]
.Hum...  it was not finished...
Obviously it was done to speed up the search
BrianH
23-Feb-2010
[15908x2]
A sparse index lookup with map!/bitset! in R3 would just be: find 
select index to-integer x / period x
You could wrap the structure in accessors that would do the work. 
It's doable.
Pavel
24-Feb-2010
[15910x3]
I recomend to use compressed bitmap EWAH scheme, in worst case tradeof 
is one 32bit word, in sparse bitmap it will save huge amount of space, 
AND,OR,XOR algorithms described for those bitmaps, usually used (but 
not restricted to) as DB bitmap index
Bitmap = bitset in this case
Nothing to do with images
Graham
24-Feb-2010
[15913]
IIS question http://synapse-ehr.com/forums/showthread.php?28-REBOL-on-Microsoft-Web-Server-2008
Maxim
26-Feb-2010
[15914x2]
by just making your own INCLUDE function, while one is added to the 
default REBOL language, I find its pretty easy to manage sets.


Use all the  Data Set functions and remove-each.   combining sets, 
excluding sets, etc.. its all there.
I also often use a FILTER function which is just a wrapper around 
remove-each which does a copy on the source input and adds a NOT 
to the comparison block you provide.


this means you keep data instead of removing it and don't break the 
original series.
Steeve
26-Feb-2010
[15916x3]
a keep-each
a keep-each
Actually, it's so easy to add a copy where you need, that i think 
all the  serie's constructing native functions should update the 
input by default, rather than construncting new ones.
Think About UNION, UNIQUE, INTERSECT etc...

They are handy functions but involve too much overhead on big series.
Maxim
26-Feb-2010
[15919x2]
I would love the /INTO refinement be added to all series manipulators.

This way we can make a huge buffer and reuse it all the time.
much less work for the GC and big speed gains on large series.
Steeve
26-Feb-2010
[15921]
UNIQUE is not used because of this.

I can't remember how many times i wanted an handy way to add unique 
values in an existing serie.
like 
>> unique index [new-values ...]
instead of having such, we do dirty tricks like.
>> unless find index new-value [append index new-value]  
pretty common...
Graham
26-Feb-2010
[15922x2]
didn't I suggest we have a set! datatype?
and yes, it's common to want to maintain a set ...
Maxim
26-Feb-2010
[15924x2]
steeve, that is what BrianH and I are talking about with the 'INCLUDE 
function.
basically, ALTER should disapear and be replaced with 'INCLUDE
Steeve
26-Feb-2010
[15926]
ALTER has no use except for Carl
Maxim
26-Feb-2010
[15927]
It should be renamed 'TOGGLE, and then I might use it, cause I'd 
understand what it means and think about it.
Gregg
27-Feb-2010
[15928]
INCLUDE isn't a good name for it, though, because of conflicting 
with the much more common INCLUDE for dependencies.
Izkata
27-Feb-2010
[15929]
If the set functions are made in-place, that would just be 'union, 
wouldn't it?
Steeve
27-Feb-2010
[15930]
Right, i was talking about UNION, not UNIQUE.
BrianH
27-Feb-2010
[15931]
Steeve, you are presuming that Carl has a use for ALTER - I haven't 
seen him use it yet.
Steeve
27-Feb-2010
[15932]
Is that not used somewhere in VID ?
BrianH
27-Feb-2010
[15933]
Not that I've noticed, but I seem to recall that VID flags is what 
ALTER was originally for.
Steeve
27-Feb-2010
[15934]
If that's so, Who's  asked for that clumsy function :)