World: r3wp
[Core] Discuss core issues
older newer | first last |
Anton 20-May-2007 [8096] | No, a single binary. |
Terry 20-May-2007 [8097] | yeah for each... but i need to group them |
Anton 20-May-2007 [8098x3] | You have to write the access methods to index into your binary though. |
(So for that I recommend just using 3 bytes.) | |
Ok, well you can have several binaries, why not ? | |
Sunanda 20-May-2007 [8101x2] | Anton, I think, is suggesting you do something like this: my-list: make binary! 25000 Then use insert or append to store 3-byte values to the binary string you have created. That way you need only 1 value slot plus the length of the binary string. |
oops -- anton typed faster than me :-) | |
Terry 20-May-2007 [8103] | ahh ok.. ie: [a4b 4°¢ ÑAg] etc ? |
Sunanda 20-May-2007 [8104] | That would be a block of binary....each binar would take a slot Anton is suggesting this sort of approach: x: make binary! 25000 >> loop 18 [insert x to-char random 255] == #{5E218289FC8B65B86A1C597C232F9E8C79} Just one binary string to hold the data. |
Terry 20-May-2007 [8105x2] | is there a function to convert an integer to 3 bytes? |
ahh.. nice | |
Anton 20-May-2007 [8107x4] | No, you must make the functions to convert 3-byte-int <--> integer! |
>> bin: #{000005000006000007} >> repeat index 3 [bin-int: copy/part skip bin (index - 1 * 3) 3 print [index mold bin-int to-integer bin-int]] 1 #{000005} 5 2 #{000006} 6 3 #{000007} 7 | |
binary! is indexed per byte, so if we use 3 bytes per value, we must multiply the index by 3 to find the correct value. | |
We must do this in our functions which should allow retrieving, changing and inserting the value. | |
Terry 20-May-2007 [8111] | would there be a performance trade off as opposed to just a block of integers? (finding/ appending etc.) |
Anton 20-May-2007 [8112x3] | Note that converting integer <-> binary is a little tricky. But we can use a struct! (re-use a shared struct for optimal performance). |
Yes, there is lost performance. Obviously, you must go via your access functions each time you store and retrieve. | |
(This is addressed by Rebol3's vector! datatype, which stores different values this way.) | |
Terry 20-May-2007 [8115x2] | well, performance is the primary concern |
i thought crawling a block of binary (hash?) would be faster than a block of integers | |
Anton 20-May-2007 [8117] | How many numbers are there likely to be ? What do they represent ? |
Terry 20-May-2007 [8118] | Here's what Im trying to do.. I have a dictionary... dict: ["one" "two" "three" ... ] with a couple of million values I want to build an index block .. dex: [ 2 3 2 1] that represents the index? of each word in my dictionary |
Anton 20-May-2007 [8119] | Is this to get around rebol's current word limit ? |
Terry 20-May-2007 [8120x5] | Only partly |
but there's a compression value as well.. | |
a value in the dictionary could be a page of rebol code (as a string) .. and it could be represented with a single bit (or in this case.. 3 bytes) | |
the values never change.. they may be deleted... and their index re-valued.. or the dictionary may be appended.. | |
so then my code becomes a block of 3 byte 'symbols' that I use to 'pick' the actual values out of the dictionary with. | |
Anton 20-May-2007 [8125x2] | Is the aim to create a fixed-width number representation for the index ? |
(I mean, a fixed-width of 3 characters.) | |
Terry 20-May-2007 [8127x3] | i would rather have it not fixed, and grow as needed |
2 bytes would be plenty to start.. but would quickly grow to 3.. and very slowly (if ever.. but possible) grow to 4 bytes.. | |
so 3 bytes is fine | |
Anton 20-May-2007 [8130] | When the dictionary grows more than the current index limit (3 bytes -> 4 bytes), then the entire index block would need to be reprocessed to add the extra byte to each index value. |
Terry 20-May-2007 [8131x5] | so.. for compaction.. the words most commonly used should be 1 byte.. least common.. 3 bytes |
that's fine.. the index can be rebuilt rather easily | |
dict: ["Rebol" "Carl" "isa"] are common.. should be represented as [1 2 3 ] .. and dict:["This is a rare string.. probably only used once"] shoud be [ ZZZ ] | |
Even though that last dict value is long, it only takes 3 bytes to take advantage of it.. see where Im going? | |
Im going to store the entire dict in memory as a hash!, and my code as a separate block of 3 byte "symbols" | |
Anton 20-May-2007 [8136] | Your 2 million entry index, with each index value of 3 bytes, should theoretically take 5.7 MB. Do you really need to reduce that with compaction ? |
Terry 20-May-2007 [8137] | not really...again.. performance is the main thing |
Anton 20-May-2007 [8138] | (They aren't symbols by the way, they're 3-byte integers.) |
Terry 20-May-2007 [8139] | an integer is a symbol |
Anton 20-May-2007 [8140] | Compaction makes everything more complex. You are writing more and more code to implement that. |
Terry 20-May-2007 [8141] | the sound you utter when you say "symbol" is a symbol |
Anton 20-May-2007 [8142] | a symbol is not (only) an integer |
Terry 20-May-2007 [8143x3] | now we're talking semantics |
my main concern is crawling the block of integers... looking for needles in haystacks.. whatever is the FASTEST method for this is best.. compaction is a by-product | |
what's faster, looking for "¥%R" or 23472937 ? | |
older newer | first last |