r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[Core] Discuss core issues

eFishAnt
25-Apr-2006
[4063]
...but how much does the size increase when you populate the empty 
blocks with data?
Maxim
25-Apr-2006
[4064x3]
the size of that data linearly.
thus integers add 2bytes per ints, etc.
maybe a little bit more for linkeage data.
eFishAnt
25-Apr-2006
[4067]
(also, you might find ways to structure your data in REBOL which 
reduce the number of block...like make them into structured objects 
... just a wild guess, maybe that would compact it more?)
Maxim
25-Apr-2006
[4068x3]
objects take way more... hehe have you ever even tried to allocate 
1 million objects which have 40 methods.  REBOL won't even be able 
to allocate 20 thousand of them.
I had to use shared methods (like face/feel)
and can still allocate 20000 nodes a second on my trusty 1.5GHz laptop.
eFishAnt
25-Apr-2006
[4071]
does an array help?
Maxim
25-Apr-2006
[4072x2]
do you mean the array function?
cause it allocates blocks ;-)
eFishAnt
25-Apr-2006
[4074]
seems like some improvements could be made by storing more things 
in each block.
Maxim
25-Apr-2006
[4075]
each node is an atom.
eFishAnt
25-Apr-2006
[4076]
but it seems it would be a speed - size tradeoff.
Maxim
25-Apr-2006
[4077x3]
and needs its own block to determine its dependencies.
note that I am allocating 1 million objects... we are talking enterprise 
solutions here.
with more ram I can go to 10 million... that's without data...
eFishAnt
25-Apr-2006
[4080]
the fallacy of db-centric programming ... is having data centrialized, 
rather than distributed.  If you split it apart into separate messaging 
agents, then you might be able to reduce the bottlenecks created 
by having the data cetralized.
Maxim
25-Apr-2006
[4081x5]
with bedrock, each cell, is its own individual atom. there are no 
tables.
each cell is a store of associations this cell has with other things 
in the db.  being bi-directional nodes, I can create sets or groups 
of nodes and nodes can backtrack them to query related informations.
so in theory, when I add offloading of nodes (to disk or remote computers) 
within liquid, we'll be able to scale the db infinitely.
well, limited by 64 bits of addressing at a time.
(thats if R3 has a 64bit mode)
eFishAnt
25-Apr-2006
[4086x2]
bits or bytes?
oh, is that what you are worried about?  I thought you were saying 
an empty block takes 64 bytes.
Maxim
25-Apr-2006
[4088x6]
two different issues.
64 bit wide addressing/block lengths, etc, would allow liquid to 
scale more easily to petabyte sized management
where you have n number  of distributed/remote machines storing infinitely 
large amounts of tidbits of data, de centralised.
is anyone aware that inserting at the head of a block is EXTREMELY 
slow ?
just paste the following (inserting seems to be exponentially slow!) 
do [
	;----- appending -----
	print "^/^/^/==============="
	print "START APPENDING: one million times"
	blk: [] 
	s: now/precise
	loop 1000000 [
		insert tail blk none ; append
	]
	print ["completed: " difference now/precise s]
	
	;----- inserting -----
	print "^/==============="
	print "START INSERTING: ten thousand times"
	blk: [] 
	s: now/precise
	loop 10000 [
		insert blk none
	]
	print ["completed: " difference now/precise s]
	;----- inserting -----
	print "^/==============="
	print "START INSERTING: twenty thousand times"
	blk: [] 
	s: now/precise
	loop 20000 [
		insert blk none
	]
	print ["completed: " difference now/precise s]
]
shows: 
===============
START APPENDING: one million times
completed:  0:00:00.942
 
===============
START INSERTING: ten thousand times
completed:  0:00:00.47

===============
START INSERTING: twenty thousand times
completed:  0:00:01.863
Gabriele
25-Apr-2006
[4094x2]
insert at the head should be O(n), so inserting n times should be 
O(n^2)
the GC and reallocation may complicate things though. but that happens 
on appending too.
Maxim
25-Apr-2006
[4096]
why is inserting slower than appending?  arent blocks internally 
represented as linked-lists?
BrianH
25-Apr-2006
[4097]
No, that's the list! type. Blocks are arrays internally.
Maxim
25-Apr-2006
[4098x2]
but why are they faster at the tail?
I guess it because the alloc engine uses pools, and pre-allocates 
data larger than the empty block?
BrianH
25-Apr-2006
[4100]
Blocks are allocated in larger chunks that a single cell. That means 
that appends are usually just writes to a preallocated memory chunk.
Maxim
25-Apr-2006
[4101x2]
hehe
ok makes sense now.  still, I was surprise that preallocating a block 
of 1 million items was not that much quicker than starting with an 
empty block.
BrianH
25-Apr-2006
[4103]
The way you speed up appends is to force the block to preallocate 
at least as much memory as you think you will need right away by 
specifying that number as the second argument to MAKE.
Maxim
25-Apr-2006
[4104x2]
haha seems I'm reading your mind  ;-)
thanks for confirming this though  :-)
BrianH
25-Apr-2006
[4106x2]
If you really need to insert at the beginning, you can either use 
list! or think of your block in reverse and consider appends to by 
inserts at the beginning - that is the way stacks are typically implemented.
to by -> to be
Maxim
25-Apr-2006
[4108]
yess.
BrianH
25-Apr-2006
[4109]
No point to preallocating a list! though, unless it is to make sure 
you will have enough ram :)
Maxim
25-Apr-2006
[4110]
just for the record, I tried using lists, but in this stress test, 
they were quite memory intensive (compared to blocks) and eventually, 
the GC got slower at a rate quicker  than the speed improvement I 
did notice in the begining.  sooo as always, speed vs flexibility 
vs RAM still applies as always.
BrianH
25-Apr-2006
[4111]
Well, the links in the list! type are overhead, and the chunks of 
memory taken up by list nodes are smaller than those taken up by 
blocks, leading to greater memory fragmentation. REBOL doesn't have 
a compacting collector.
Maxim
25-Apr-2006
[4112]
I guess it at least impletment bottom and top allocation optimisation, 
based on chunk size?