r3wp [groups: 83 posts: 189283]

World: r3wp

[!REBOL3]

Maxim
13-Dec-2010
[6518]
yeah, I did a few tests using async, but it's a bit tricky; nothing 
as elegant as what Awi proposes/asks
Jerry
13-Dec-2010
[6519]
is there a way I can pre-allocate a huge file? A pre-allocated 
file would occupy contiguous disk space, which makes seeking fast.
Maxim
13-Dec-2010
[6520x2]
something like? :

write %file head insert "" 1000000
oops... 

write %file head insert/dup "" " " 1000000
Jerry
13-Dec-2010
[6522]
Maxim, I am not sure that doing that will give contiguous disk space. 
Why don't we have something like this: make %file 100000000
Maxim
13-Dec-2010
[6523]
usually it will depend on disk fragmentation. I'm not sure all OSes 
allow you to force a contiguous disk area. 

on any OS that allows it, it could be a good suggestion for R3.
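For comparison outside REBOL, here is a minimal Python sketch of the kind of pre-allocation Jerry asks for. It assumes a POSIX system where `os.posix_fallocate` exists (it does not on Windows or macOS, so the sketch falls back to `truncate`). Neither call guarantees a contiguous on-disk area; placement stays up to the filesystem, which matches Maxim's caveat. The helper name `preallocate` is hypothetical.

```python
import os


def preallocate(path: str, size: int) -> None:
    """Create (or overwrite) `path` and reserve `size` bytes for it.

    Hypothetical helper. Neither branch guarantees contiguity;
    the filesystem decides where the blocks actually go.
    """
    with open(path, "wb") as f:
        try:
            # POSIX only: ask the filesystem to allocate real blocks for
            # the byte range [0, size). This also extends the file size.
            os.posix_fallocate(f.fileno(), 0, size)
        except (AttributeError, OSError):
            # Fallback (no posix_fallocate, or the call failed): extend
            # the file to `size` bytes. The new bytes read back as zeros,
            # but many filesystems store them sparsely (no blocks reserved).
            f.truncate(size)
```

Usage would be along the lines of `preallocate("column.dat", 100_000_000)`, mirroring Jerry's `make %file 100000000` wish.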
Jerry
13-Dec-2010
[6524]
Actually, I am making a NoSQL database in R3 for Chinese social network analysis. 
I have more than 20,000,000 records, and every record might need 
a few KB.
Maxim
13-Dec-2010
[6525x2]
you should post a CC wish ticket for it.
(and it's a good idea to explain in the CC ticket why you need this, 
to give perspective on the request)
Jerry
13-Dec-2010
[6527]
When there was not much data, my NoSQL was very fast; I called it Lightning 
DB. Now, with so much data, it's very slow, and I call it Snail DB. 
:-(
Maxim
13-Dec-2010
[6528]
20,000,000 * 1 KB * n = a massive file! :-)
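Putting numbers on Maxim's estimate, a small Python sketch. Jerry only says "a few KB" per record, so the per-record size stays a parameter; indexes and per-record overhead are ignored.

```python
def dataset_gib(records: int, kb_per_record: float) -> float:
    """Raw payload size in GiB for `records` records of `kb_per_record` KB each."""
    return records * kb_per_record * 1024 / 1024 ** 3


for n in (1, 2, 4):
    print(f"{n} KB/record -> {dataset_gib(20_000_000, n):.1f} GiB")
```

Even at 1 KB per record this is about 19 GiB of raw data.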
Jerry
13-Dec-2010
[6529]
OK, I will post this wish in CC.
Maxim
13-Dec-2010
[6530]
are you sure it's related to fragmentation?
Jerry
13-Dec-2010
[6531]
Well... I am 95% sure. By the way, every column of every table has 
its own one or two huge files. I didn't put all the data in a single 
file.
Steeve
13-Dec-2010
[6532]
As far as I remember, it works with R2: if you write the %file past 
the end, it's filled with #{00}.

I remember having asked for the same feature in R3 at least one year ago 
(in the R3 chat)
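The R2 behaviour Steeve describes, where writing past the end leaves a zero-filled gap, has a direct counterpart in ordinary POSIX file semantics. A Python sketch with a hypothetical helper name; note that the skipped range is often stored as a "hole", so this sets the file length without necessarily reserving (let alone contiguously placing) any blocks.

```python
def extend_by_seeking(path: str, size: int) -> None:
    """Grow `path` to `size` bytes by writing a single byte at the last offset.

    Hypothetical helper. The skipped bytes read back as zeros
    (#{00} in REBOL terms), but may not be allocated on disk.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    with open(path, "wb") as f:
        f.seek(size - 1)  # move past the end of the (empty) file
        f.write(b"\0")    # this write fixes the final file length
```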
Oldes
13-Dec-2010
[6533]
Are you sure it must be in one file?
BrianH
13-Dec-2010
[6534]
Thanks again, Jerry, for pushing the resource usage limits of R3. 
http://issue.cc/r3/1799 is about reallocating a 256+ MB map! to an 
even larger map!, and then getting a slowdown, probably because of 
virtual memory use. We really need tasks so this can go on in the 
background :)
Andreas
13-Dec-2010
[6535]
More likely a 512+MB map.
BrianH
13-Dec-2010
[6536]
I think the problem was a realloc from 256+MB to 512+MB, which would 
temporarily have 768+MB in memory, plus a hash table recalculation.
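BrianH's 768+MB figure comes from the old and new buffers coexisting during the copy. A tiny Python sketch of that arithmetic, assuming (the thread does not confirm it) that the map grows by doubling and frees the old buffer only after copying; the hash-table recalculation he mentions is not modelled. The helper name is hypothetical.

```python
def peak_during_growth(old_mb: float, growth_factor: float = 2.0) -> float:
    """Peak memory while both the old buffer and the resized buffer are alive."""
    new_mb = old_mb * growth_factor
    return old_mb + new_mb


print(peak_during_growth(256))  # 256 + 512 -> 768.0 (BrianH's figures)
print(peak_during_growth(512))  # same rule applied to a 512 MB starting map
```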
Andreas
13-Dec-2010
[6537]
If a single R3 value slot is 128 bits and a map needs two value slots 
for each (key, value) pair:
(128 / 8) * 2 * (2 ** 24) / (1024 ** 2) == 512.0
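Andreas's expression written out step by step, under his stated assumptions (16-byte value slots, two slots per pair, 2^24 pairs):

```latex
\frac{128\ \text{bit}}{8\ \text{bit/byte}} \times 2 \times 2^{24}
  = 16 \times 2 \times 2^{24}\ \text{B}
  = 2^{4} \cdot 2^{1} \cdot 2^{24}\ \text{B}
  = 2^{29}\ \text{B}
  = \frac{2^{29}}{2^{20}}\ \text{MiB}
  = 512\ \text{MiB}
```

Using 16 bytes per pair instead of 32 drops one factor of two and gives 2^{28} B = 256 MiB.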
BrianH
13-Dec-2010
[6538x2]
Weird, I did a similar calculation and got 256. I should revise my 
comment.
Oh right, I put a 16 in there when it should have been a 32.
Maxim
13-Dec-2010
[6540]
so should maps allow pre-allocation, like series?
Andreas
13-Dec-2010
[6541x3]
They are already.
m: make map! n
Well, at least that allocates _something_ :)
Maxim
13-Dec-2010
[6544]
hehe
Pekr
13-Dec-2010
[6545x2]
:-)
So is it about the programmer predicting and allocating enough memory 
up front, or is there some artificial limit on the map size?
Maxim
13-Dec-2010
[6547x2]
it's about the fact that some things have to be arrays in RAM, and 
if you don't make them big enough to begin with, eventually you have 
to live with this sort of "cleanup"
REBOL didn't crash, so I'd assume it did its job correctly. 
but a few tests might show that something is not optimal.
Pekr
13-Dec-2010
[6549]
Why do you guys need such a large map? Don't you remember Bill Gates 
once said that 640KB is enough for everyone? :-)
Maxim
13-Dec-2010
[6550x3]
this seems to prove that pre-allocating a map does indeed reserve all 
the space required:

>> stats
== 921576
>> m: make map! 20000000
== make map! [
]
>> stats
== 909353688
Jerry, you might want to try the above and see if the wait occurs 
on the 21 millionth item.
(20 millionth item + 1)
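What Maxim's `stats` output implies, as a Python sketch using only the two numbers from his console session. The per-entry figure assumes `make map! 20000000` sized the map for 20,000,000 pairs, which the thread implies but does not confirm; why it comes out above the 32 bytes per pair of Andreas's slot estimate is not discussed in the thread.

```python
before = 921_576       # stats before the make map! call
after = 909_353_688    # stats after m: make map! 20000000
entries = 20_000_000   # size argument given to make map!

delta = after - before
print(delta)            # bytes added by the pre-allocation
print(delta / entries)  # bytes per requested entry
print(delta / 1024 ** 2)  # the same delta in MiB
```

That is roughly 866 MiB in total, or about 45 bytes per requested entry.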
Andreas
13-Dec-2010
[6553]
I'm with Brian in that most likely the only thing "not optimal" is 
the amount of RAM in Jerry's system.
Maxim
13-Dec-2010
[6554x2]
yep!
if the pre-allocation fixes the setup, then there's no bug in REBOL.
Andreas
13-Dec-2010
[6556]
Preallocation won't help with insufficient memory :)
Pekr
13-Dec-2010
[6557]
moving to SSD disks might help a bit :-)
BrianH
13-Dec-2010
[6558]
Well, it would, actually. It would still be slow to use, but not 
as slow. Reallocation takes even more memory.
Andreas
13-Dec-2010
[6559]
Yes, of course.
BrianH
13-Dec-2010
[6560]
This is assuming virtual memory. I wouldn't even be able to test 
on this system; it only has 1GB of RAM. My main system would work 
though.
Andreas
13-Dec-2010
[6561x3]
I was hinting at the fact that this probably won't matter much if 
we are talking about a 512M system :)
>> dt [m: make map! [] repeat i to-integer 2 ** 24 [poke m i i]]
== 0:01:22.254896
;; 2393MB resident
>> dt [m: make map! n: to-integer 2 ** 24 repeat i n [poke m i i]]
== 0:00:48.329933
;; 1026MB resident
BrianH
13-Dec-2010
[6564]
I suggested preallocation in a comment. You might want to chime in 
with that code :)
Andreas
13-Dec-2010
[6565]
I would probably make it significantly larger than the number of expected 
pairs.
BrianH
13-Dec-2010
[6566]
Round up :)
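One way to turn "significantly larger" and "round up" into a concrete rule: a hypothetical Python helper that multiplies the expected pair count by a headroom factor and rounds up to the next power of two. The default factor of 2 is an assumption, chosen to match Andreas's follow-up test (2^25 preallocated for 2^24 pairs); nothing in the thread says R3 itself rounds map sizes this way.

```python
def capacity_for(expected_pairs: int, headroom: int = 2) -> int:
    """Smallest power of two >= expected_pairs * headroom (hypothetical rule)."""
    target = max(1, expected_pairs * headroom)
    return 1 << (target - 1).bit_length()


print(capacity_for(2 ** 24))     # 2 ** 25 = 33554432
print(capacity_for(20_000_000))  # 2 ** 26 = 67108864
```

The result is what would go into `make map! n`.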
Andreas
13-Dec-2010
[6567]
;; Storing 2^24 in a map with 2^25 preallocated

>> dt [m: make map! to-integer 2 ** 25 repeat i to-integer 2 ** 24 [poke m i i]]
== 0:00:33.695578
;; 1538MB resident
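Andreas's three runs side by side, as a small Python sketch over the timings and resident-memory figures quoted above (no new measurements). It makes the time/memory trade-off of over-allocating explicit.

```python
# (label, elapsed seconds, resident MB) as reported by Andreas's dt runs
runs = [
    ("grow from empty", 82.254896, 2393),
    ("prealloc 2**24", 48.329933, 1026),
    ("prealloc 2**25", 33.695578, 1538),
]

_, base_secs, base_mb = runs[0]
for label, secs, mem_mb in runs:
    speedup = base_secs / secs    # > 1 means faster than growing the map
    mem_ratio = mem_mb / base_mb  # share of the growing run's resident memory
    print(f"{label:16} {secs:6.1f} s  x{speedup:.2f}  {mem_mb:5d} MB  {mem_ratio:.0%}")
```

Pre-allocating exactly 2^24 is about 1.7 times faster at about 43% of the resident memory; over-allocating to 2^25 is about 2.4 times faster but needs about 64%, more than the exact pre-allocation.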