r3wp [groups: 83 posts: 189283]
World: r3wp

[Core] Discuss core issues

Maxim
12-Dec-2009
[15199]
the best developer community in the world   :-D
Graham
12-Dec-2009
[15200]
My paypal account is ...
Maxim
12-Dec-2009
[15201]
(hahaha was gonna say...  "bills in the mail"  ;-)
Graham
12-Dec-2009
[15202]
too slow ...
Von
12-Dec-2009
[15203]
Hey, I'd be happy to send some $, if it means speeding up my learning 
curve!  I'll PayPal some money over, seriously!
Graham
12-Dec-2009
[15204x2]
great .. can pay for my new USB LCD monitor :)  [sales-:-compkarori-:-co-:-nz] 
:)
.. help pay .. not pay for the whole thing!
Von
12-Dec-2009
[15206]
Done, I've submitted via PayPal :-)  Thanks for your help, I can 
get some rest now :-)  Thanks Maxim for your help also!
Graham
12-Dec-2009
[15207]
Thanks Von .. will certainly encourage others to help you out :)
Maxim
12-Dec-2009
[15208]
and who said we couldn't make a profit by reboling  ;-)
Von
13-Dec-2009
[15209]
Graham, you mentioned that I should encode the password.   Is this 
in case someone hacks into my host?  If I use the encloak function, 
couldn't someone also find my readable key in the script and then 
decloak my password using Rebol?
Henrik
13-Dec-2009
[15210x2]
would there be instances where write/lines/append would write a quarter 
or half a line? I'm logging tests of several script instances into 
the same file and write/lines/append sometimes produces only half 
a line in the log.
sometimes empty lines occur as well
sqlab
13-Dec-2009
[15212]
If you write with different rebol instances into the same file at 
the same time, you are out of luck. I
Janko
13-Dec-2009
[15213x2]
could you create something like a trie in rebol or would you have 
to go lower level for it to be reasonably efficient?
(let's say I want to use key-value (string-int) pairs for 5M words 
.. hash tables are probably more memory consuming for such a big 
set of data?)
Maxim
13-Dec-2009
[15215x4]
hash tables for such a big set are the only way to go... they will 
be orders of magnitude faster on access.
I've had REBOL use up over 700MB of RAM without issues or dramatic 
speed drops... but with millions of items, make sure you pre-allocate 
your hash table, because if you keep appending to the same table for 
each entry, it will get progressively slower.
but when things are in the millions, sometimes using a disk-based 
on-demand caching algorithm is fastest... it really depends on the 
application.


because think of it this way: with a million elements, every byte used 
by each element becomes a MB, so it adds up quickly.


5 million pairs of (10 byte) strings and pairs... is just about 350MB 
!

>> b: make block! 5000010
>> m: stats
== 84172417

>> loop 5000000 [append b copy random "1234567890" append b random 10000000]
== ["5862713409" 4765171 "2546013987" 2726704 "9528013746" 3565380 "4591302786" ...
>> stats - m
== 348435008
(oops  'strings and pairs'  >  'string and *integers*' )
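The pre-allocation advice above can be sketched as follows. This is a minimal illustration, not Maxim's actual test: the word `table` and the entry count `n` are hypothetical and kept small so it runs quickly, and the keys are random so duplicates are possible.

```rebol
REBOL [Title: "Pre-allocated hash! sketch"]

n: 100000                     ; hypothetical entry count (assumption, not from the chat)
table: make hash! 2 * n       ; reserve room for n keys plus n values up front

loop n [
    ; copy is needed because random shuffles the literal string in place
    append table copy random "1234567890"
    append table random 10000000
]

print length? table           ; number of items stored: key, value, key, value, ...
```

Sizing the series once with `make hash! 2 * n` avoids growing it repeatedly while the loop appends.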
Janko
13-Dec-2009
[15219x3]
Maxim .. thanks a lot for your answers.. very interesting .. I know 
from distance how hashtables work internally but I don't know details.. 
should a block take roughly the same space as hashtable of the same 
block (in rebol) or factor(s) different?
hm.. does stats return ram used??
(it does, cool :) I was looking at processes to see how much it will 
eat)
Maxim
13-Dec-2009
[15222]
hum... let's see:  ;-)

a: stats
b: make block! 5000010
print stats - a
== 80001039

a: stats
b: make hash! 5000010
print stats - a
== 80005071
Janko
13-Dec-2009
[15223x2]
>> a: stats b: make block! 1000 repeat i 1000 [ append b random "abcdef" random 100000 ] print stats - a
48671

>> a: stats b: make hash! 1000 repeat i 1000 [ append b random "abcdef" random 100000 ] print stats - a
81454
:)
Maxim
13-Dec-2009
[15225]
but... filled up....

b: make hash! 5000010
m: stats

loop 5000000 [append b copy random "1234567890" append b random 10000000]
print stats - m
== 188430448

here it's half the space.


a ha!  depending on the string input... hash tables can actually 
be smaller...   :-)
Janko
13-Dec-2009
[15226]
stats is a cool command , with many refinements also .. I didn't 
know about it
Maxim
13-Dec-2009
[15227]
in REBOL, we're a newbie a few minutes... every day.... even after 
a decade of using it   ;-)
Janko
13-Dec-2009
[15228]
I am a newbie a little longer each day :)
Maxim
13-Dec-2009
[15229]
hehe
Janko
13-Dec-2009
[15230x2]
aha, I see that it depends .. I increased the length of string and 
block increased in size while hash stayed the same
hm.. I have a very newbie question .. do you most effectively add 
new pairs to hashtable by appending to it as a block ? can't figure 
out how to change a value .. set doesn't work that way
Maxim
13-Dec-2009
[15232x4]
yep.
append works on hash tables.  in fact they are exactly the same as 
if you were using blocks, except that the internal representation 
is different than what you look at through code.
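a: make hash! ["33" 33 "44" 44 "55" 55]
select a "33"                   ; look up the value stored after key "33"
change find a "44" ["88" 88]    ; overwrite the key and its value in place
a                               ; change returns the position past the edit, so inspect a
== make hash! ["33" 33 "88" 88 "55" 55]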
but janko... if you test it, you will see that hash tables are far 
faster at retrieving data...  the larger the set the bigger the difference. 
 

on millions of records indexed with strings, it could be hundreds 
or thousands of times faster  :-)
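A rough way to see the lookup difference described above is to time `select` on a block! and on a hash! holding the same data. This is only a sketch: `time-lookups`, the key format, and the sizes are hypothetical, and the timings will vary by machine and REBOL version.

```rebol
REBOL [Title: "block! vs hash! lookup sketch"]

n: 50000                          ; hypothetical number of key/value pairs
blk: make block! 2 * n
repeat i n [
    append blk join "key-" i      ; unique string key
    append blk i                  ; integer value
]
hsh: make hash! blk               ; same contents, hashed for lookup

; time-lookups is a hypothetical helper: it repeats one lookup and
; returns the elapsed time as a time! value
time-lookups: func [data [series!] /local start] [
    start: now/precise
    loop 200 [select data "key-50000"]   ; last key: worst case for a linear scan
    difference now/precise start
]

print ["block!:" time-lookups blk]
print ["hash! :" time-lookups hsh]
```

The block! has to be scanned from the head for every lookup, while the hash! can jump to the key directly, which is why the gap widens as the set grows.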
Graham
13-Dec-2009
[15236]
Von, I think I just mean that your password for esmtp will have to 
be in the script ( if it is needed .. )
Janko
13-Dec-2009
[15237]
Maxim: yes, I am aware that retrieving data from hashtables is really 
fast... I wasn't aware it would be just as fast even with 1M records 
so I was quite amazed before when I tried it
Pavel
14-Dec-2009
[15238x2]
Transferring the memory-based hash! (map! in R3) datatype into a disk-based 
schema, automatically keeping the hash table computation and lookup 
hidden from the user, gives you RIF. The holy grail of all rebollers :) 
promised a long, long time ago, still waiting to be done. Anyway, hash tables 
are usually unsorted; when searching within them is necessary, some type of 
additional index is usually used (a B-tree, for example). For the simple 
question of whether a key is in the set, bitmap vectors are used to 
advantage. When the set is really big (and the bitmap vector doesn't fit 
into memory), a compressed bitmap may be used, and bitwise operations 
on compressed vectors are usually much quicker than on uncompressed ones. 

This is why it should be used for the bitset! datatype anyway. A number 
of byte-aligned (BBC, PackBits, RLE) or word-aligned (WAH) schemes exist. 
 They are used in very large datasets when the index also resides in a disk 
file. Once again, bitwise operations may be much quicker even in memory 
with those schemes.
For those interested, the FastBit webpage is a good source of docs.
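To make the compressed-bitmap idea concrete, here is a minimal run-length encoding sketch over a block of 0/1 values. It is only an illustration of the basic principle: `rle-encode` is a hypothetical helper, and the real BBC and WAH schemes mentioned above pack runs into bytes or machine words and are considerably more elaborate.

```rebol
REBOL [Title: "Run-length encoded bitmap sketch"]

; rle-encode is a hypothetical helper: it turns a block of 0/1 values
; into a flat block of [bit run-length bit run-length ...]
rle-encode: func [bits [block!] /local out current count] [
    out: copy []
    if empty? bits [return out]
    current: first bits
    count: 0
    foreach bit bits [
        either bit = current [
            count: count + 1
        ][
            repend out [current count]   ; close the finished run
            current: bit
            count: 1
        ]
    ]
    repend out [current count]           ; close the final run
    out
]

probe rle-encode [0 0 0 0 1 1 0 0 0]
; prints [0 4 1 2 0 3]
```

Sparse bitmaps consist mostly of long runs of zeros, which is why run-based encodings shrink them so much.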
Maxim
14-Dec-2009
[15240x2]
when map! is added to extensions, you might be able to implement an 
example for us and Carl might consider adding your code directly 
in the host or r3lib if you agree to it.   :-)
you seem to already be knowledgeable about this, so you'd be the best 
one to implement it IMHO (pavel).
Pavel
15-Dec-2009
[15242]
I'd be glad to try, but the internals are quite well hidden now. Anyway, 
have you found any hint about handles or cross-referencing from an 
extension, Maxim?
Maxim
15-Dec-2009
[15243]
you mean calling code from the host within extensions?
Pavel
15-Dec-2009
[15244]
yes, that's how I understood your announcement
Maxim
15-Dec-2009
[15245x4]
I will be rebuilding the callback example with a much better/simpler 
design. but they work very well, basically I have mapped the Reb_Do_String() 
and Reb_Print() functions so that they can be called from within 
any extension.
I am also building little helper funcs like a REBOL datatype centric 
version of sprintf  which acts a bit like a C-side rejoin for REBOL.
this way we can create rebol code directly from strings and native 
data very easily.


there is currently a size limit on executed strings, it's a simple 
question of optimisation.  this means we can't use the wiredf function 
for creating large datasets via strings (for now).

but I'm already doing stuff like:


wiredf("rogl-event-handler make wr-event [new-size: %p]", win_w, win_h);


calls rebol's do with %p replaced by a pair, using 2 ints.  this 
is a varargs function.
(the callback framework is currently called wired)