r3wp [groups: 83 posts: 189283]

World: r3wp

[!REBOL3]

Pekr
13-Dec-2010
[6545x2]
:-)
So is it about the programmer predicting and allocating enough memory 
up front, or is there some artificial limit on the map size?
Maxim
13-Dec-2010
[6547x2]
It's about the fact that some things have to be arrays in RAM, and 
if you don't make them big enough to begin with, eventually you have 
to live with this sort of "cleanup".
REBOL didn't crash, so I'd assume it did its job correctly, 
but a few tests might prove that something is not optimal.
Pekr
13-Dec-2010
[6549]
Why do you guys need such a large map array? Don't you remember Bill Gates 
once said that 640KB is enough for everyone? :-)
Maxim
13-Dec-2010
[6550x3]
This seems to prove that pre-allocating a map does indeed reserve all 
the space required:

>> stats
== 921576
>> m: make map! 20000000
== make map! [
]
>> stats
== 909353688
Jerry, you might want to try the above and see if the wait occurs 
on the 21 millionth item.
(20 millionth item + 1)
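
A rough reading of the two `stats` values above, as a sketch only: it assumes `stats` reports bytes in use by the REBOL process and that nearly all of the difference is the map's preallocated storage. The variable names are made up for illustration.

```python
# Values copied from the console session above.
stats_before = 921_576        # `stats` before `make map! 20000000`
stats_after = 909_353_688     # `stats` right after it
slots = 20_000_000            # size passed to `make map!`

# Assumption: the whole difference is the map's preallocated storage.
bytes_per_slot = (stats_after - stats_before) / slots
print(f"{bytes_per_slot:.1f} bytes per preallocated slot")
```

Under those assumptions the preallocation costs a few dozen bytes per slot before a single pair has been stored.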
Andreas
13-Dec-2010
[6553]
I'm with Brian in that most likely the only thing "not optimal" is 
the amount of RAM in Jerry's system.
Maxim
13-Dec-2010
[6554x2]
yep!
If the pre-allocation fixes the setup, then there's no bug in REBOL.
Andreas
13-Dec-2010
[6556]
Preallocation won't help with insufficient memory :)
Pekr
13-Dec-2010
[6557]
moving to SSD disks might help a bit :-)
BrianH
13-Dec-2010
[6558]
Well, it would, actually. It would still be slow to use, but not 
as slow. Reallocation takes even more memory.
Andreas
13-Dec-2010
[6559]
Yes, of course.
BrianH
13-Dec-2010
[6560]
This is assuming virtual memory. I wouldn't even be able to test 
on this system; it only has 1GB of RAM. My main system would work 
though.
Andreas
13-Dec-2010
[6561x3]
I was hinting at the fact that this probably won't matter much if 
we are talking about a 512M system :)
>> dt [m: make map! [] repeat i to-integer 2 ** 24 [poke m i i]]
== 0:01:22.254896
;; 2393MB resident
>> dt [m: make map! n: to-integer 2 ** 24 repeat i n [poke m i i]]
== 0:00:48.329933
;; 1026MB resident
BrianH
13-Dec-2010
[6564]
I suggested preallocation in a comment. You might want to chime in 
with that code :)
Andreas
13-Dec-2010
[6565]
I would probably make it significantly larger than the number of expected 
pairs.
BrianH
13-Dec-2010
[6566]
Round up :)
Andreas
13-Dec-2010
[6567x3]
;; Storing 2^24 in a map with 2^25 preallocated

>> dt [m: make map! to-integer 2 ** 25 repeat i to-integer 2 ** 24 [poke m i i]]
== 0:00:33.695578
;; 1538MB resident
~30% faster.
Still incredibly slow.
BrianH
13-Dec-2010
[6570]
Are you doing those measurements in a fresh console each time? Remember, 
process allocation from the OS matters too.
Andreas
13-Dec-2010
[6571]
Yes :)
BrianH
13-Dec-2010
[6572]
Figured :)
Andreas
13-Dec-2010
[6573x6]
With speedstepping switched off and the REBOL process pinned to a 
single core.
And ...
What antagonist language of the day do we want to counter-benchmark?
Let's go with Python.
~10 times faster (without pre-allocation). Well ...
Guess there's some room for improvement :)
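
The Python code behind that comparison is not shown in the thread. A minimal sketch of what such a counter-benchmark could look like, assuming a plain `dict` filled with the same integer keys and values as the `repeat` loops above (the function name `fill_dict` is hypothetical):

```python
import time


def fill_dict(n):
    """Store the integers 1..n as both key and value; return (dict, seconds)."""
    start = time.perf_counter()
    d = {}
    for i in range(1, n + 1):
        d[i] = i
    return d, time.perf_counter() - start


if __name__ == "__main__":
    # Use 2 ** 24 to match the REBOL runs above; that needs a lot of RAM,
    # so this demo uses a smaller size by default.
    d, seconds = fill_dict(2 ** 20)
    print(len(d), "entries in", round(seconds, 3), "s")
```

As with the REBOL timings, each run would need a fresh process for the numbers to be comparable.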
BrianH
13-Dec-2010
[6579]
How fast without the repeat, but with preallocation, comparing Python 
and R3? Remember, Python is compiled.
Andreas
13-Dec-2010
[6580]
Bytecode compiled, with a really slow VM.
BrianH
13-Dec-2010
[6581]
Bytecode compiled is still compiled, and it is likely that bytecodes 
specific to loops are being used. I am interested in the comparison 
of the preallocation of the map! type versus the Python equivalent, 
whatever that is.
Andreas
13-Dec-2010
[6582]
I don't know of a way to preallocate Python dicts.
BrianH
13-Dec-2010
[6583]
Python dicts are probably not allocated in a large chunk.
Andreas
13-Dec-2010
[6584x3]
Well, afair they use open addressing in their implementation, so 
I guess they will.
But then, no idea about the impl details.
As an aside, path notation (`m/(i): i`) is slightly (6%) faster than 
POKE in this case.
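
To illustrate the open addressing mentioned above, and why growing a table costs extra memory, here is a toy sketch. It is not CPython's dict and not R3's map! implementation; the class `OpenAddressingMap`, its linear probing, its 2/3 load limit and its doubling policy are all assumptions chosen for brevity.

```python
class OpenAddressingMap:
    """Toy hash map: one flat slot array, linear probing, doubling on growth."""

    def __init__(self, capacity=8):
        # One contiguous array; each slot is None or a (key, value) tuple.
        self._slots = [None] * max(capacity, 1)
        self._count = 0
        self.resizes = 0  # how often the slot array had to be reallocated

    def _find(self, slots, key):
        # Walk from the hashed position until we hit the key or an empty slot.
        i = hash(key) % len(slots)
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) % len(slots)
        return i

    def put(self, key, value):
        # Grow before the table would become more than 2/3 full.
        if (self._count + 1) * 3 > len(self._slots) * 2:
            self._grow()
        i = self._find(self._slots, key)
        if self._slots[i] is None:
            self._count += 1
        self._slots[i] = (key, value)

    def get(self, key, default=None):
        entry = self._slots[self._find(self._slots, key)]
        return default if entry is None else entry[1]

    def _grow(self):
        # The old and the new array are both alive during the rehash,
        # so growing temporarily needs more memory than the final table.
        old = self._slots
        new = [None] * (len(old) * 2)
        for entry in old:
            if entry is not None:
                new[self._find(new, entry[0])] = entry
        self._slots = new
        self.resizes += 1


grown = OpenAddressingMap()               # starts small, grows on demand
sized = OpenAddressingMap(capacity=4096)  # preallocated for the workload
for i in range(1000):
    grown.put(i, i)
    sized.put(i, i)
print("resizes without preallocation:", grown.resizes)
print("resizes with preallocation:", sized.resizes)
```

In this toy version the preallocated table never rehashes, while the table that starts small copies every entry again on each doubling.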
BrianH
13-Dec-2010
[6587x2]
POKE has to look up the function twice: once from the word, the next 
time in the action's datatype. Path evaluation at least knows what 
it's doing, so it doesn't have to figure it out.
the next time in the datatype's action list.
Jerry
13-Dec-2010
[6589]
There must be some algorithmic issue in R3's map!. When I have 21,000,000 
key-value pairs in a map, accessing it becomes very slow. Using 
"mymap/:key" to get a value takes 0.2 sec.
Andreas
13-Dec-2010
[6590]
Did you read the notes suggesting that you might be running out of 
physical memory (RAM)?
Jerry
13-Dec-2010
[6591]
I hope the lack of physical memory is the reason. I have 2GB of 
RAM in my PC. I will get my MacBook Pro this evening. It will have 
8GB of RAM. I will test this on my Mac.
Andreas
13-Dec-2010
[6592x2]
Well, 2GB will be a close call for 21M entries.
Have a look at your memory/swap consumption, that'll probably help 
you identify if that's a problem.
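
A back-of-the-envelope check of the "close call" remark, scaling the resident sizes measured above for 2^24 entries up to 21 million. This is only a sketch: it assumes memory use grows linearly with the number of entries and ignores everything else running on the machine.

```python
# Resident sizes (MB) measured above for 2 ** 24 entries.
entries_measured = 2 ** 24
resident_mb = {"no preallocation": 2393, "exact preallocation": 1026}

entries_target = 21_000_000   # size of the map under discussion
ram_mb = 2 * 1024             # the 2GB PC mentioned above

# Assumption: memory use scales linearly with the number of entries.
scale = entries_target / entries_measured
estimates = {label: mb * scale for label, mb in resident_mb.items()}

for label, mb in estimates.items():
    side = "below" if mb < ram_mb else "above"
    print(f"{label}: ~{mb:.0f} MB, {side} {ram_mb} MB of physical RAM")
```

On these assumptions only the preallocated case stays under 2GB, and that is before counting the operating system and the data being loaded.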
Jerry
13-Dec-2010
[6594]
Yeah, I hope we will have 64-bit REBOL for Mac soon. Analyzing social 
networking data without enough physical memory is a pain.