World: r3wp

Join the discussions in the REBOL3 world...

[Core] Discuss core issues

older newer	first last
Terry 20-May-2007 [8133x3]	dict: ["Rebol" "Carl" "isa"] are common.. should be represented as [1 2 3 ] .. and dict:["This is a rare string.. probably only used once"] shoud be [ ZZZ ]
	Even though that last dict value is long, it only takes 3 bytes to take advantage of it.. see where Im going?
	Im going to store the entire dict in memory as a hash!, and my code as a separate block of 3 byte "symbols"
Anton 20-May-2007 [8136]	Your 2 million entry index, with each index value of 3 bytes, should theoretically take 5.7 MB. Do you really need to reduce that with compaction ?
Terry 20-May-2007 [8137]	not really...again.. performance is the main thing
Anton 20-May-2007 [8138]	(They aren't symbols by the way, they're 3-byte integers.)
Terry 20-May-2007 [8139]	an integer is a symbol
Anton 20-May-2007 [8140]	Compaction makes everything more complex. You are writing more and more code to implement that.
Terry 20-May-2007 [8141]	the sound you utter when you say "symbol" is a symbol
Anton 20-May-2007 [8142]	a symbol is not (only) an integer
Terry 20-May-2007 [8143x7]	now we're talking semantics
	my main concern is crawling the block of integers... looking for needles in haystacks.. whatever is the FASTEST method for this is best.. compaction is a by-product
	what's faster, looking for "�%R" or 23472937 ?
	or 16581375 rather
	The dictionary will have 2 million strings... the index would be much smaller
	index being the DB using triples as schema
	Here's a question.. when a block of integers is converted to a HASH!.. what actually happens to it?
Anton 20-May-2007 [8150x2]	Hashes, lists and blocks can contain any type of value. The values are not affected by the container type.
Anton 20-May-2007 [8150x2]	A hash just has a different way of linking the items it contains together, so it performs faster in some operations at the expense of others.
Oldes 20-May-2007 [8152x2]	If you just need to save large arrays of integers, you can use format used in AS3: The AS3 Integer can be encoded into between 1 and 5 bytes. * if the integer is between 0�00 and 0x7F then only one byte (representing the integer) * if between 0�80 and 0x3FFF then 2 bytes : o (i & 0x7F) \| 0�80 o (i � 7) * if between 0�4000 and 0x1FFFFF then 3 bytes : o (i & 0x7F) \| 0�80 o (i � 7) \| 0�80 o (i � 14) * if between 0�200000 and 0xFFFFFFF then 4 bytes : o (i & 0x7F) \| 0�80 o (i � 7) \| 0�80 o (i � 14) \| 0�80 o (i � 21) * if more or equal than 0�10000000 : o (i & 0x7F) \| 0�80 o (i � 7) \| 0�80 o (i � 14) \| 0�80 o (i � 21) \| 0�80 o (i � 28)
Oldes 20-May-2007 [8152x2]	but if jyou need to search such an array, it's not the best for your... especially if there is no rebcode yet:(
Anton 20-May-2007 [8154x2]	That's interesting, using a bit in each byte to "escape" and open up the next, more significant byte. So it's a variable-length encoding scheme. (I guess, kind of like UTF8)
Anton 20-May-2007 [8154x2]	But yes, I think this adds complexity and would slow down the access routines.
Terry 20-May-2007 [8156]	yeah... I think the conclusion is ... don't worry about the number of bytes (and thus mem) when using plain integers with my index block, as it's much smaller than the dictionary anyways.. and unless Im shown otherwise, the crawling of it (find/ foreach, append) should be about as fast as any other method, right?
Anton 20-May-2007 [8157]	It should be faster and simpler to just use integers. When you want to cut down the size of your 2 million integers (=30MB), you can then look at implementing 3-byte integers packed in a binary.
Terry 20-May-2007 [8158]	exactly
Anton 20-May-2007 [8159x2]	Here's a couple of conversion routines.
Anton 20-May-2007 [8159x2]	struct: make struct! [int [int]] none ; convert integer -> 3-byte binary integer-to-3-byte-binary: func [integer [integer!]][ struct/int: integer copy/part third struct 3 ] ; convert 3-byte binary -> integer binary-3-byte-to-integer: func [int3 [binary!]][ struct/int: 0 ; just make sure all bytes are zero change third struct int3 struct/int ] ; test my-bin: integer-to-3-byte-binary 2000000 ;== #{80841E} my-int: binary-3-byte-to-integer my-bin ;== 2000000
Terry 20-May-2007 [8161x3]	ok.. that's cool.. thanks
	Even though each integer uses 16 bytes, there's some compaction by using the smallest integers with the most commonly used dictionary strings
	I'll add these functions to the dictionary.. dict: [{integer-to-3-byte-binary: func [integer [integer!]][ struct/int: integer copy/part third struct 3 ]}] now whenever i want to use that function.. i can represent it with a single integer.. ie: do pick dict 1
Oldes 20-May-2007 [8164x2]	Terry, I would not make own binary storage as there should be new datatype dictionary! in R3 which is exactly what you need... if I understand it well
Oldes 20-May-2007 [8164x2]	I cannot see hash! in this list http://www.rebol.net/r3blogs/0076.html so it will be probably replaced
Terry 20-May-2007 [8166]	and that could be reduced further.. dict: [ {fetch: func [index][do pick dict index]} {integer-to-3-byte-binary: func [integer [integer!]][ struct/int: integer copy/part third struct 3 ]} ] do pick dict 1 fetch 2 my-bin: integer-to-3-byte-binary 2000000
Oldes 20-May-2007 [8167]	The new vector! datatype will be usefull as well to store large block of same data type
Terry 20-May-2007 [8168]	Im happy with R2 ;)
Oldes 20-May-2007 [8169]	yes... you can make it in R2 using index: [1 2 3] and for R3 you just replace: index: make vector! [integer! 24 [1 2 3]]
Terry 20-May-2007 [8170x3]	Now as for compressing the dictionary.. it seems that smaller strings grow with compression?
	>> a: compress "a" == #{789C4B04000062006201000000}
	>> a: compress "aaaa" == #{789C4B4C4C4C040003CE018504000000} >> a: compress "aaaaa" == #{789C4B4C04020005B401E605000000}
Oldes 20-May-2007 [8173x2]	compress is uzing zlib compression... just insetad of checksum at the tail (last 4 bytes) is used size of the source string
Oldes 20-May-2007 [8173x2]	the size is used to make a result buffer for decompress: >> decompress rejoin [#{789C4B4C} #{04020005B401E6} #{00000000}] ** Script Error: Not enough memory >> decompress rejoin [#{789C4B4C} #{04020005B401E6} #{FF000000}] == "aaaaa"
Dockimbel 20-May-2007 [8175]	Terry, your 'integer-to-3-byte-binary conversion function is dependent on platform endianness (third struct! gives you an endian-dependent result), watch out for that.
Anton 20-May-2007 [8176]	You can blame me for that, I wrote it quickly.
btiffin 21-May-2007 [8177]	Terminology question; I know I could probably RTFM, but sometimes Ask A Friendly Human is more fun. What is the correct terminology for the global REBOL context. I'm describing (or trying to at least) the parse dialect "copy" versus the REBOL "copy". Is there a one word term for the "no context" context? Or is the REBOL global namespace good enough (and not too confusing to new rebols)?
Anton 22-May-2007 [8178]	Yeah, we all call it the global context.
btiffin 22-May-2007 [8179]	Thanks Anton.
Jerry 22-May-2007 [8180]	Is there a function which can flat a block. For example, >> FLAT [ 1 2 [ 3 4 ] 5 6 ] == [ 1 2 3 4 5 6]
Graham 22-May-2007 [8181]	probably not what you want ... >> to-block form [ 1 2 [ 3 4 ] 5 6 ] == [1 2 3 4 5 6]
Chris 22-May-2007 [8182]	flatten: func [block [any-block!]][ parse block [ any [block: any-block! (change/part block first block 1) :block \| skip] ] head block ]
older newer	first last