r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[Core] Discuss core issues

Graham
14-May-2009
[13749]
If you know the length of text "butt" you don't need to find the 
end .. just calculate it, as it should be faster.
Anton
14-May-2009
[13750]
amacleod, I wrote some efficient string search functions. The file 
has simple, first implementations alongside some efficient implementations 
which use PARSE.

http://anton.wildit.net.au/rebol/library/demo-string-search-functions.r
amacleod
14-May-2009
[13751x2]
I'll check it out.  thanks, Anton.
I got it working...

Sort of inserted my hi-lite mapping code into Graham's function...
Thanks again, Graham!
amacleod
15-May-2009
[13753]
It allowed me to step through the face one find after another. Exactly 
what I wanted to do.
Maxim
17-May-2009
[13754x2]
I'm having a problem with call.


when I use the /output/error and supply  strings , I get a strange 
error in the stdout holding string:

Unable to read from standard input: The handle is invalid.


the really strange one is that if I use /console, the expeted out 
is effectively printed out to the rebol console.... so the cmd does 
return data and rebol is able to handle it, but not within the /output/ 
refinement!


also, another call, using another command, works as expected... so 
I don't think I have a error in my code.

not that I have tried using /shell and it didn't help

has anyone seen /output and /console react this way?
not=note
Gabriele
17-May-2009
[13756]
maybe you also need to provide /input ? (it's hard to guess what 
the problem is)
Graham
17-May-2009
[13757]
oddly this group was private but everyone was in it .. made it public.
Maxim
17-May-2009
[13758x3]
I will try with/input.  providing a few "^/"  but It definitely doesn't 
need input on the command-line.
but sometimes with progamming... logic doesn't prevail, hehehe
gabriele, the /input  works !!!  damn why didn't I think of that.. 
I lost several hours trying to find other command-line ssh tools 
which would work... none really do... the putty tools really are 
the best ones out there.
TomBon
18-May-2009
[13761]
what is the fastest way to count the occurence of each numbers
within a block containing 5000+ numbers?  

e.g [1 3 6 2 4 9 33 6 67 2 12 23 34 12 2 4 56 2 ...]

calculating the max and min with a simple loop and counter is 
very time consuming for thousends of these blocks. 
inserting into mysql and doing a 'group by' also. 
any ideas for a faster solution?
Henrik
18-May-2009
[13762x2]
what if you sort the block first?
then FIND on the first occurrence of 1, of 2, of 3, of 4, etc. and 
store the indexes.
TomBon
18-May-2009
[13764x3]
yes henrik, nice idea in this moment I was thinking similar but with 
copy/part but storing the index is much faster.
the numbers for the  FIND can be supplied from a copy of the block 
with UNIQE
yep, will give it a try, thx henrik...
Sunanda
18-May-2009
[13767]
I usually use Joel's tally function -- it saves me the time needed 
to write a faster function:
http://www.rebol.org/ml-display-message.r?m=rmlKDDS
Izkata
18-May-2009
[13768x2]
I think the fastest would be a variation of radix sort, if you know 
the max/min..

Setup for example:
>> Data: array/initial 5000 does [random 200]

== [36 119 148 112 87 59 176 21 95 138 183 28 1 119 14 74 39 78 141 
99 56 71 47 174 92 81 157 182 122 123 98 73 35 170 150 11 183 8...

>> MaxVal: fold Data 0 :max ;Just use a simple loop, I defined fold 
long ago for simplicity
== 200

What I mean:
>> Vals: array/initial MaxVal 0

== [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
>> foreach Val Data [Vals/:Val: Vals/:Val + 1]
== 27

>> repeat I MaxVal [if 0 <> Vals/:I [print rejoin [I {, } Vals/:I 
{ times}]]]
1, 24 times
2, 20 times
..........
199, 20 times
200, 16 times
Tally looks nice, though, and not as memory-intensive
Dockimbel
18-May-2009
[13770x2]
Here's one, maybe not fully optimized, but should be fast enough 
:

count-occurences: func [data [block!] /local keys count idx][
	keys: make hash! unique sort data
	count: array/initial length? keys 0
	parse data [
		some [
			set v skip (
				idx: index? find keys v
				poke count idx 1 + count/:idx
			)
		]
	]
	reduce [keys count]
]
>> data: array/initial 5000 does [random 200]

== [36 119 148 112 87 59 176 21 95 138 183 28 1 119 14 74 39 78 141 
99 56 71 47 174 92 81 157 182 122
123 98 73 35 170 150 11 183 8...
>> t: now/time/precise
== 22:01:04.156
>> res: count-occurences data

== [make hash! [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
21 22 23 24 25 26 27 28 29 30 31 32
 33 34 35 36 37 38 39 40 41 4...
>> now/time/precise - t
0:00:00.031
>> res/1

== make hash! [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42...
>> res/2

== [24 20 19 28 30 18 45 28 19 24 25 23 36 13 17 29 28 26 17 28 23 
24 28 22 33 31 25 27 25 28 29 31 14
 24 20 21 30 20 29 26 19 26 2...
Steeve
18-May-2009
[13772x2]
Hmm... i think tally func cuold be speed up with the Tombon's trick 
(using UNIQUE)
like, this:

fast-tally: func [b [block!] /local c t u i v] [
	c: sort copy b
	u: unique c
	t: make block! length? u
	i: 1
	v: u/1
	foreach val next u [
		c: find c val
		t: insert t as-pair v negate i - i: index? c
		v: val
	]
	c: u: none
	head t
]

probe fast-tally [1 4 5 6 3 4  6 2 7 4 3 4]
== [1x1 2x1 3x2 4x4 5x1 6x2]
Maxim
18-May-2009
[13774]
steve... why not use block parsing ?  ;-D
Steeve
18-May-2009
[13775]
i'll kill you ;-)
Maxim
18-May-2009
[13776]
LOL
Steeve
18-May-2009
[13777]
hum, there is a bug, the last value (7) is not precessed, but u got 
the idea...
Maxim
18-May-2009
[13778]
steeve you've got to stop giving examples that don't work!   ;-P
Steeve
18-May-2009
[13779]
but they work... in my mind
Izkata
18-May-2009
[13780]
arraycount: func [Data /local MaxVal Vals][
    MaxVal: Data/1
    foreach Val Data [if Val > MaxVal [MaxVal: Val]]
    Vals: array/initial MaxVal 0
    foreach Val Data [Vals/:Val: Vals/:Val + 1]
    Vals
]

So, so far:
>> Data: array/initial 5000 does [random 200]

>> time [tally copy Data] 50
== 0:00:00.561255
>> time [fast-tally copy Data] 50
== 0:00:00.34082
>> time [count-occurences copy Data] 50 
== 0:00:00.610159
>> time [arraycount copy Data] 50      
== 0:00:00.250419
Steeve
18-May-2009
[13781]
Don't know if it's faster, but it's a bug free and consumes less 
memory.

fast-tally: func [b [block!] /local c u i] [
	c: sort copy b
	u: unique c
	i: 1
	forall u [
		c: any [find c u/2 tail c]
		u/1: as-pair u/1 negate i - i: index? c
	]
	c: none
	head u
]
Izkata
18-May-2009
[13782]
Looks mostly the same in speed:
>> time [fast-tally1 copy Data] 50
== 0:00:00.379225
>> time [fast-tally2 copy Data] 50
== 0:00:00.392215
Steeve
18-May-2009
[13783x3]
hum, i forgot, forall is slow within R2...
Should be a little better...

fast-tally: func [b [block!] /local c u i] [
	c: sort copy b
	u: unique c
	i: 1
	until [
		c: any [find c u/2 tail c]
		u/1: as-pair u/1 negate i - i: index? c
		tail? u: next u
	]
	c: none
	head u
]
izkata, in R2 maximum-of is a native function.
with that, your solution should be defintivly the fastest
Izkata
18-May-2009
[13786x2]
Sorting seems to be the big bottleneck:


arraycount with sorted data (using maximum-of, same results as sorted):
>> time [arraycount copy SDat] 50
== 0:00:00.172956
fast-tally with unsorted data:
>> time [fast-tally copy Data] 50
== 0:00:00.357852
fast-tally with the sort step removed, with pre-sorted data:
>> time [fast-tally copy SDat] 50 
== 0:00:00.103735
So which is best depends on what Tom is using it for, I guess
TomBon
19-May-2009
[13788]
this is great, very fast code. thanks a lot for this.
the code is used for generating dynamic parsing rules 
for financial market data. soon I will release a free
software based on this data.
Steeve
19-May-2009
[13789x2]
Seems that Tally is faster with R3.
2 reasons:
SORT is faster
MAXIMUM-OF is not anymore native 

(btw some of us claimed that it was a bad idea to have maximum-of 
 not anymore native in R3, But will claimed in the desert, so now 
we can see the drawback)
(Tally faster than radix, i meant)
BrianH
19-May-2009
[13791]
MAXIMUM-OF was rarely used, and is now mostly native rather than 
fully native. The majority of its processing is the loop, which is 
now native. MAXIMUM-OF and MINIMUM-OF are so rarely used that they 
might not make the mezzanine cut; they might be moved to R3/Plus.
Izkata
19-May-2009
[13792x2]
I would definitely prefer a native fold-type function over maximum-of 
and minimum-of
which I think was mentioned under the name accumulate?
BrianH
19-May-2009
[13794]
We tried, but that has definitely been declared an R3/Plus function, 
not native. Making ACCUMLATE? native wouldn't help much - the mezzanine 
version is really fast already. You won't be able to get functional 
language speed even from a native because that requires compiler 
support, and REBOL doesn't have a compiler at all.
Izkata
19-May-2009
[13795]
Okay, good to know.  Glad it's in, at least
BrianH
19-May-2009
[13796x2]
R3/Plus has a low barrier to entry, because it will be modular. People 
can just not include the modules they don't need.


Conversely, there will be a high standard for inclusion as mezzanine 
or built-in native code because you won't as easily be able to remove 
it if you don't need it. Many R2 mezzanines, and even some natives 
won't make the cut.
The plugin architecture will make it less necessary to include native 
code you don't need. R3/Plus lets us say yes without bloating the 
core. R3 is going to be much tighter than R2.
Graham
19-May-2009
[13798]
Such as ?  I'd had to have an cgi script running and find that functions 
I need aren't included.