World: r3wp
[Core] Discuss core issues
older newer | first last |
Terry 20-May-2007 [8105x2] | is there a function to convert an integer to 3 bytes? |
ahh.. nice | |
Anton 20-May-2007 [8107x4] | No, you must make the functions to convert 3-byte-int <--> integer! |
>> bin: #{000005000006000007} >> repeat index 3 [bin-int: copy/part skip bin (index - 1 * 3) 3 print [index mold bin-int to-integer bin-int]] 1 #{000005} 5 2 #{000006} 6 3 #{000007} 7 | |
binary! is indexed per byte, so if we use 3 bytes per value, we must multiply the index by 3 to find the correct value. | |
We must do this in our functions which should allow retrieving, changing and inserting the value. | |
Terry 20-May-2007 [8111] | would there be a performance trade off as opposed to just a block of integers? (finding/ appending etc.) |
Anton 20-May-2007 [8112x3] | Note that converting integer <-> binary is a little tricky. But we can use a struct! (re-use a shared struct for optimal performance). |
Yes, there is lost performance. Obviously, you must go via your access functions each time you store and retrieve. | |
(This is addressed by Rebol3's vector! datatype, which stores different values this way.) | |
Terry 20-May-2007 [8115x2] | well, performance is the primary concern |
i thought crawling a block of binary (hash?) would be faster than a block of integers | |
Anton 20-May-2007 [8117] | How many numbers are there likely to be ? What do they represent ? |
Terry 20-May-2007 [8118] | Here's what Im trying to do.. I have a dictionary... dict: ["one" "two" "three" ... ] with a couple of million values I want to build an index block .. dex: [ 2 3 2 1] that represents the index? of each word in my dictionary |
Anton 20-May-2007 [8119] | Is this to get around rebol's current word limit ? |
Terry 20-May-2007 [8120x5] | Only partly |
but there's a compression value as well.. | |
a value in the dictionary could be a page of rebol code (as a string) .. and it could be represented with a single bit (or in this case.. 3 bytes) | |
the values never change.. they may be deleted... and their index re-valued.. or the dictionary may be appended.. | |
so then my code becomes a block of 3 byte 'symbols' that I use to 'pick' the actual values out of the dictionary with. | |
Anton 20-May-2007 [8125x2] | Is the aim to create a fixed-width number representation for the index ? |
(I mean, a fixed-width of 3 characters.) | |
Terry 20-May-2007 [8127x3] | i would rather have it not fixed, and grow as needed |
2 bytes would be plenty to start.. but would quickly grow to 3.. and very slowly (if ever.. but possible) grow to 4 bytes.. | |
so 3 bytes is fine | |
Anton 20-May-2007 [8130] | When the dictionary grows more than the current index limit (3 bytes -> 4 bytes), then the entire index block would need to be reprocessed to add the extra byte to each index value. |
Terry 20-May-2007 [8131x5] | so.. for compaction.. the words most commonly used should be 1 byte.. least common.. 3 bytes |
that's fine.. the index can be rebuilt rather easily | |
dict: ["Rebol" "Carl" "isa"] are common.. should be represented as [1 2 3 ] .. and dict:["This is a rare string.. probably only used once"] shoud be [ ZZZ ] | |
Even though that last dict value is long, it only takes 3 bytes to take advantage of it.. see where Im going? | |
Im going to store the entire dict in memory as a hash!, and my code as a separate block of 3 byte "symbols" | |
Anton 20-May-2007 [8136] | Your 2 million entry index, with each index value of 3 bytes, should theoretically take 5.7 MB. Do you really need to reduce that with compaction ? |
Terry 20-May-2007 [8137] | not really...again.. performance is the main thing |
Anton 20-May-2007 [8138] | (They aren't symbols by the way, they're 3-byte integers.) |
Terry 20-May-2007 [8139] | an integer is a symbol |
Anton 20-May-2007 [8140] | Compaction makes everything more complex. You are writing more and more code to implement that. |
Terry 20-May-2007 [8141] | the sound you utter when you say "symbol" is a symbol |
Anton 20-May-2007 [8142] | a symbol is not (only) an integer |
Terry 20-May-2007 [8143x7] | now we're talking semantics |
my main concern is crawling the block of integers... looking for needles in haystacks.. whatever is the FASTEST method for this is best.. compaction is a by-product | |
what's faster, looking for "¥%R" or 23472937 ? | |
or 16581375 rather | |
The dictionary will have 2 million strings... the index would be much smaller | |
index being the DB using triples as schema | |
Here's a question.. when a block of integers is converted to a HASH!.. what actually happens to it? | |
Anton 20-May-2007 [8150x2] | Hashes, lists and blocks can contain any type of value. The values are not affected by the container type. |
A hash just has a different way of linking the items it contains together, so it performs faster in some operations at the expense of others. | |
Oldes 20-May-2007 [8152x2] | If you just need to save large arrays of integers, you can use format used in AS3: The AS3 Integer can be encoded into between 1 and 5 bytes. * if the integer is between 0×00 and 0x7F then only one byte (representing the integer) * if between 0×80 and 0x3FFF then 2 bytes : o (i & 0x7F) | 0×80 o (i » 7) * if between 0×4000 and 0x1FFFFF then 3 bytes : o (i & 0x7F) | 0×80 o (i » 7) | 0×80 o (i » 14) * if between 0×200000 and 0xFFFFFFF then 4 bytes : o (i & 0x7F) | 0×80 o (i » 7) | 0×80 o (i » 14) | 0×80 o (i » 21) * if more or equal than 0×10000000 : o (i & 0x7F) | 0×80 o (i » 7) | 0×80 o (i » 14) | 0×80 o (i » 21) | 0×80 o (i » 28) |
but if jyou need to search such an array, it's not the best for your... especially if there is no rebcode yet:( | |
Anton 20-May-2007 [8154] | That's interesting, using a bit in each byte to "escape" and open up the next, more significant byte. So it's a variable-length encoding scheme. (I guess, kind of like UTF8) |
older newer | first last |