World: r3wp
[Core] Discuss core issues
older newer | first last |
Sunanda 20-May-2007 [8088] | For compact representation of large integer sets, I often uses the format: [10x4 50 76x1000] ir REBOL pairs! (each taking 1 value slot) meaning [10 11 12 13 50 76 77 78 ..... 1076] This works for me for large blocks of semi-sparse integers. There is a script for handling this format at REBOL.org -- look for rse-ids.r |
Terry 20-May-2007 [8089] | Im guessing my largest number is probably around 2 million... approx |
Anton 20-May-2007 [8090] | You have to know what your largest number is. Otherwise your software will break with overflow errors. |
Terry 20-May-2007 [8091] | well.. say 21 bits |
Anton 20-May-2007 [8092x3] | Well 24bits (3 bytes) --> 2^24 = 16777216 unique values. |
Ok, so you could pack each 21 (or 24) bits into a binary. | |
No wastage. | |
Terry 20-May-2007 [8095] | so is that a block of binaries then? |
Anton 20-May-2007 [8096] | No, a single binary. |
Terry 20-May-2007 [8097] | yeah for each... but i need to group them |
Anton 20-May-2007 [8098x3] | You have to write the access methods to index into your binary though. |
(So for that I recommend just using 3 bytes.) | |
Ok, well you can have several binaries, why not ? | |
Sunanda 20-May-2007 [8101x2] | Anton, I think, is suggesting you do something like this: my-list: make binary! 25000 Then use insert or append to store 3-byte values to the binary string you have created. That way you need only 1 value slot plus the length of the binary string. |
oops -- anton typed faster than me :-) | |
Terry 20-May-2007 [8103] | ahh ok.. ie: [a4b 4°¢ ÑAg] etc ? |
Sunanda 20-May-2007 [8104] | That would be a block of binary....each binar would take a slot Anton is suggesting this sort of approach: x: make binary! 25000 >> loop 18 [insert x to-char random 255] == #{5E218289FC8B65B86A1C597C232F9E8C79} Just one binary string to hold the data. |
Terry 20-May-2007 [8105x2] | is there a function to convert an integer to 3 bytes? |
ahh.. nice | |
Anton 20-May-2007 [8107x4] | No, you must make the functions to convert 3-byte-int <--> integer! |
>> bin: #{000005000006000007} >> repeat index 3 [bin-int: copy/part skip bin (index - 1 * 3) 3 print [index mold bin-int to-integer bin-int]] 1 #{000005} 5 2 #{000006} 6 3 #{000007} 7 | |
binary! is indexed per byte, so if we use 3 bytes per value, we must multiply the index by 3 to find the correct value. | |
We must do this in our functions which should allow retrieving, changing and inserting the value. | |
Terry 20-May-2007 [8111] | would there be a performance trade off as opposed to just a block of integers? (finding/ appending etc.) |
Anton 20-May-2007 [8112x3] | Note that converting integer <-> binary is a little tricky. But we can use a struct! (re-use a shared struct for optimal performance). |
Yes, there is lost performance. Obviously, you must go via your access functions each time you store and retrieve. | |
(This is addressed by Rebol3's vector! datatype, which stores different values this way.) | |
Terry 20-May-2007 [8115x2] | well, performance is the primary concern |
i thought crawling a block of binary (hash?) would be faster than a block of integers | |
Anton 20-May-2007 [8117] | How many numbers are there likely to be ? What do they represent ? |
Terry 20-May-2007 [8118] | Here's what Im trying to do.. I have a dictionary... dict: ["one" "two" "three" ... ] with a couple of million values I want to build an index block .. dex: [ 2 3 2 1] that represents the index? of each word in my dictionary |
Anton 20-May-2007 [8119] | Is this to get around rebol's current word limit ? |
Terry 20-May-2007 [8120x5] | Only partly |
but there's a compression value as well.. | |
a value in the dictionary could be a page of rebol code (as a string) .. and it could be represented with a single bit (or in this case.. 3 bytes) | |
the values never change.. they may be deleted... and their index re-valued.. or the dictionary may be appended.. | |
so then my code becomes a block of 3 byte 'symbols' that I use to 'pick' the actual values out of the dictionary with. | |
Anton 20-May-2007 [8125x2] | Is the aim to create a fixed-width number representation for the index ? |
(I mean, a fixed-width of 3 characters.) | |
Terry 20-May-2007 [8127x3] | i would rather have it not fixed, and grow as needed |
2 bytes would be plenty to start.. but would quickly grow to 3.. and very slowly (if ever.. but possible) grow to 4 bytes.. | |
so 3 bytes is fine | |
Anton 20-May-2007 [8130] | When the dictionary grows more than the current index limit (3 bytes -> 4 bytes), then the entire index block would need to be reprocessed to add the extra byte to each index value. |
Terry 20-May-2007 [8131x5] | so.. for compaction.. the words most commonly used should be 1 byte.. least common.. 3 bytes |
that's fine.. the index can be rebuilt rather easily | |
dict: ["Rebol" "Carl" "isa"] are common.. should be represented as [1 2 3 ] .. and dict:["This is a rare string.. probably only used once"] shoud be [ ZZZ ] | |
Even though that last dict value is long, it only takes 3 bytes to take advantage of it.. see where Im going? | |
Im going to store the entire dict in memory as a hash!, and my code as a separate block of 3 byte "symbols" | |
Anton 20-May-2007 [8136] | Your 2 million entry index, with each index value of 3 bytes, should theoretically take 5.7 MB. Do you really need to reduce that with compaction ? |
Terry 20-May-2007 [8137] | not really...again.. performance is the main thing |
older newer | first last |