World: r3wp

Join the discussions in the REBOL3 world...

[Core] Discuss core issues

older newer	first last
Sunanda 20-May-2007 [8088]	For compact representation of large integer sets, I often uses the format: [10x4 50 76x1000] ir REBOL pairs! (each taking 1 value slot) meaning [10 11 12 13 50 76 77 78 ..... 1076] This works for me for large blocks of semi-sparse integers. There is a script for handling this format at REBOL.org -- look for rse-ids.r
Terry 20-May-2007 [8089]	Im guessing my largest number is probably around 2 million... approx
Anton 20-May-2007 [8090]	You have to know what your largest number is. Otherwise your software will break with overflow errors.
Terry 20-May-2007 [8091]	well.. say 21 bits
Anton 20-May-2007 [8092x3]	Well 24bits (3 bytes) --> 2^24 = 16777216 unique values.
	Ok, so you could pack each 21 (or 24) bits into a binary.
	No wastage.
Terry 20-May-2007 [8095]	so is that a block of binaries then?
Anton 20-May-2007 [8096]	No, a single binary.
Terry 20-May-2007 [8097]	yeah for each... but i need to group them
Anton 20-May-2007 [8098x3]	You have to write the access methods to index into your binary though.
	(So for that I recommend just using 3 bytes.)
	Ok, well you can have several binaries, why not ?
Sunanda 20-May-2007 [8101x2]	Anton, I think, is suggesting you do something like this: my-list: make binary! 25000 Then use insert or append to store 3-byte values to the binary string you have created. That way you need only 1 value slot plus the length of the binary string.
Sunanda 20-May-2007 [8101x2]	oops -- anton typed faster than me :-)
Terry 20-May-2007 [8103]	ahh ok.. ie: [a4b 4�� Ag] etc ?
Sunanda 20-May-2007 [8104]	That would be a block of binary....each binar would take a slot Anton is suggesting this sort of approach: x: make binary! 25000 >> loop 18 [insert x to-char random 255] == #{5E218289FC8B65B86A1C597C232F9E8C79} Just one binary string to hold the data.
Terry 20-May-2007 [8105x2]	is there a function to convert an integer to 3 bytes?
Terry 20-May-2007 [8105x2]	ahh.. nice
Anton 20-May-2007 [8107x4]	No, you must make the functions to convert 3-byte-int <--> integer!
	>> bin: #{000005000006000007} >> repeat index 3 [bin-int: copy/part skip bin (index - 1 * 3) 3 print [index mold bin-int to-integer bin-int]] 1 #{000005} 5 2 #{000006} 6 3 #{000007} 7
	binary! is indexed per byte, so if we use 3 bytes per value, we must multiply the index by 3 to find the correct value.
	We must do this in our functions which should allow retrieving, changing and inserting the value.
Terry 20-May-2007 [8111]	would there be a performance trade off as opposed to just a block of integers? (finding/ appending etc.)
Anton 20-May-2007 [8112x3]	Note that converting integer <-> binary is a little tricky. But we can use a struct! (re-use a shared struct for optimal performance).
	Yes, there is lost performance. Obviously, you must go via your access functions each time you store and retrieve.
	(This is addressed by Rebol3's vector! datatype, which stores different values this way.)
Terry 20-May-2007 [8115x2]	well, performance is the primary concern
Terry 20-May-2007 [8115x2]	i thought crawling a block of binary (hash?) would be faster than a block of integers
Anton 20-May-2007 [8117]	How many numbers are there likely to be ? What do they represent ?
Terry 20-May-2007 [8118]	Here's what Im trying to do.. I have a dictionary... dict: ["one" "two" "three" ... ] with a couple of million values I want to build an index block .. dex: [ 2 3 2 1] that represents the index? of each word in my dictionary
Anton 20-May-2007 [8119]	Is this to get around rebol's current word limit ?
Terry 20-May-2007 [8120x5]	Only partly
	but there's a compression value as well..
	a value in the dictionary could be a page of rebol code (as a string) .. and it could be represented with a single bit (or in this case.. 3 bytes)
	the values never change.. they may be deleted... and their index re-valued.. or the dictionary may be appended..
	so then my code becomes a block of 3 byte 'symbols' that I use to 'pick' the actual values out of the dictionary with.
Anton 20-May-2007 [8125x2]	Is the aim to create a fixed-width number representation for the index ?
Anton 20-May-2007 [8125x2]	(I mean, a fixed-width of 3 characters.)
Terry 20-May-2007 [8127x3]	i would rather have it not fixed, and grow as needed
	2 bytes would be plenty to start.. but would quickly grow to 3.. and very slowly (if ever.. but possible) grow to 4 bytes..
	so 3 bytes is fine
Anton 20-May-2007 [8130]	When the dictionary grows more than the current index limit (3 bytes -> 4 bytes), then the entire index block would need to be reprocessed to add the extra byte to each index value.
Terry 20-May-2007 [8131x5]	so.. for compaction.. the words most commonly used should be 1 byte.. least common.. 3 bytes
	that's fine.. the index can be rebuilt rather easily
	dict: ["Rebol" "Carl" "isa"] are common.. should be represented as [1 2 3 ] .. and dict:["This is a rare string.. probably only used once"] shoud be [ ZZZ ]
	Even though that last dict value is long, it only takes 3 bytes to take advantage of it.. see where Im going?
	Im going to store the entire dict in memory as a hash!, and my code as a separate block of 3 byte "symbols"
Anton 20-May-2007 [8136]	Your 2 million entry index, with each index value of 3 bytes, should theoretically take 5.7 MB. Do you really need to reduce that with compaction ?
Terry 20-May-2007 [8137]	not really...again.. performance is the main thing
older newer	first last