World: r3wp

[Core] Discuss core issues

TomBon
18-May-2009
[13761]
what is the fastest way to count the occurrence of each number
within a block containing 5000+ numbers?

e.g. [1 3 6 2 4 9 33 6 67 2 12 23 34 12 2 4 56 2 ...]

calculating the max and min with a simple loop and counter is
very time consuming for thousands of these blocks.
inserting into mysql and doing a 'group by' also.
any ideas for a faster solution?
Henrik
18-May-2009
[13762x2]
what if you sort the block first?
then FIND on the first occurrence of 1, of 2, of 3, of 4, etc. and 
store the indexes.
TomBon
18-May-2009
[13764x3]
yes henrik, nice idea. I was just thinking of something similar with
copy/part, but storing the index is much faster.
the numbers for the FIND can be supplied from a copy of the block
with UNIQUE
yep, will give it a try, thx henrik...
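A minimal sketch of the sort + UNIQUE + FIND idea discussed above (a
hypothetical helper, not code posted in the thread): on a sorted copy,
the count of each value is the distance between the first occurrences
of consecutive unique values.

count-by-index: func [data [block!] /local sorted uniq result prev-val prev-idx idx][
	sorted: sort copy data
	uniq: unique sorted		; already in ascending order, since sorted is
	result: copy []
	prev-val: first uniq
	prev-idx: 1
	foreach value next uniq [
		idx: index? find sorted value
		repend result [prev-val idx - prev-idx]
		prev-val: value
		prev-idx: idx
	]
	; the last unique value runs to the tail of the sorted block
	repend result [prev-val 1 + (length? sorted) - prev-idx]
	result
]

probe count-by-index [1 3 6 2 4 9 33 6 67 2 12 23 34 12 2 4 56 2]
== [1 1 2 4 3 1 4 2 6 2 9 1 12 2 23 1 33 1 34 1 56 1 67 1]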
Sunanda
18-May-2009
[13767]
I usually use Joel's tally function -- it saves me the time needed 
to write a faster function:
http://www.rebol.org/ml-display-message.r?m=rmlKDDS
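For reference, a tally-style counter along the same lines (only an
illustration of the general shape, not Joel's actual code from that
link) would return [value count] pairs:

tally-sketch: func [data [block!] /local result pos][
	result: copy []
	foreach value data [
		; /skip 2 treats result as [value count] records, so counts
		; are never mistaken for values during the search
		either pos: find/skip result value 2 [
			pos/2: pos/2 + 1
		][
			repend result [value 1]
		]
	]
	result
]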
Izkata
18-May-2009
[13768x2]
I think the fastest would be a variation of radix sort, if you know 
the max/min..

Setup for example:
>> Data: array/initial 5000 does [random 200]

== [36 119 148 112 87 59 176 21 95 138 183 28 1 119 14 74 39 78 141 
99 56 71 47 174 92 81 157 182 122 123 98 73 35 170 150 11 183 8...

>> MaxVal: fold Data 0 :max ;Just use a simple loop, I defined fold 
long ago for simplicity
== 200

What I mean:
>> Vals: array/initial MaxVal 0

== [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
>> foreach Val Data [Vals/:Val: Vals/:Val + 1]
== 27

>> repeat I MaxVal [if 0 <> Vals/:I [print rejoin [I {, } Vals/:I 
{ times}]]]
1, 24 times
2, 20 times
..........
199, 20 times
200, 16 times
Tally looks nice, though, and not as memory-intensive
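The FOLD helper used above isn't a built-in; a plausible definition
matching that call (series, start value, function), though not
necessarily Izkata's own, is:

fold: func [series [series!] acc fn [any-function!]][
	foreach item series [acc: fn acc item]	; apply fn to the running value and each item
	acc
]

>> fold [3 7 2] 0 :max
== 7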
Dockimbel
18-May-2009
[13770x2]
Here's one, maybe not fully optimized, but it should be fast enough:

count-occurences: func [data [block!] /local keys count idx v][
	keys: make hash! unique sort data	; hash! makes the FIND lookups fast
	count: array/initial length? keys 0
	parse data [
		some [
			set v skip (
				idx: index? find keys v
				poke count idx 1 + count/:idx
			)
		]
	]
	reduce [keys count]	; returns the unique values and their counts
]
>> data: array/initial 5000 does [random 200]

== [36 119 148 112 87 59 176 21 95 138 183 28 1 119 14 74 39 78 141 
99 56 71 47 174 92 81 157 182 122
123 98 73 35 170 150 11 183 8...
>> t: now/time/precise
== 22:01:04.156
>> res: count-occurences data

== [make hash! [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
21 22 23 24 25 26 27 28 29 30 31 32
 33 34 35 36 37 38 39 40 41 4...
>> now/time/precise - t
== 0:00:00.031
>> res/1

== make hash! [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42...
>> res/2

== [24 20 19 28 30 18 45 28 19 24 25 23 36 13 17 29 28 26 17 28 23 
24 28 22 33 31 25 27 25 28 29 31 14
 24 20 21 30 20 29 26 19 26 2...
Steeve
18-May-2009
[13772x2]
Hmm... i think the tally func could be sped up with TomBon's trick
(using UNIQUE)
like this:

fast-tally: func [b [block!] /local c t u i v] [
	c: sort copy b
	u: unique c
	t: make block! length? u
	i: 1
	v: u/1
	foreach val next u [
		c: find c val
		t: insert t as-pair v negate i - i: index? c
		v: val
	]
	c: u: none
	head t
]

probe fast-tally [1 4 5 6 3 4  6 2 7 4 3 4]
== [1x1 2x1 3x2 4x4 5x1 6x2]
Maxim
18-May-2009
[13774]
steeve... why not use block parsing ?  ;-D
Steeve
18-May-2009
[13775]
i'll kill you ;-)
Maxim
18-May-2009
[13776]
LOL
Steeve
18-May-2009
[13777]
hum, there is a bug, the last value (7) is not processed, but u got
the idea...
Maxim
18-May-2009
[13778]
steeve you've got to stop giving examples that don't work!   ;-P
Steeve
18-May-2009
[13779]
but they work... in my mind
Izkata
18-May-2009
[13780]
arraycount: func [Data /local MaxVal Vals][
    MaxVal: Data/1
    foreach Val Data [if Val > MaxVal [MaxVal: Val]]
    Vals: array/initial MaxVal 0
    foreach Val Data [Vals/:Val: Vals/:Val + 1]
    Vals
]

So, so far:
>> Data: array/initial 5000 does [random 200]

>> time [tally copy Data] 50
== 0:00:00.561255
>> time [fast-tally copy Data] 50
== 0:00:00.34082
>> time [count-occurences copy Data] 50 
== 0:00:00.610159
>> time [arraycount copy Data] 50      
== 0:00:00.250419
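The TIME helper used for these measurements isn't standard REBOL; a
rough equivalent (a guess at its definition, not Izkata's actual code)
would be:

time: func [code [block!] count [integer!] /local start][
	start: now/time/precise
	loop count [do code]		; run the block COUNT times
	now/time/precise - start	; elapsed time as a time! value
]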
Steeve
18-May-2009
[13781]
Don't know if it's faster, but it's bug free and consumes less
memory.

fast-tally: func [b [block!] /local c u i] [
	c: sort copy b
	u: unique c
	i: 1
	forall u [
		c: any [find c u/2 tail c]
		u/1: as-pair u/1 negate i - i: index? c
	]
	c: none
	head u
]
Izkata
18-May-2009
[13782]
Looks mostly the same in speed:
>> time [fast-tally1 copy Data] 50
== 0:00:00.379225
>> time [fast-tally2 copy Data] 50
== 0:00:00.392215
Steeve
18-May-2009
[13783x3]
hum, i forgot, forall is slow within R2...
Should be a little better...

fast-tally: func [b [block!] /local c u i] [
	c: sort copy b
	u: unique c
	i: 1
	until [
		c: any [find c u/2 tail c]
		u/1: as-pair u/1 negate i - i: index? c
		tail? u: next u
	]
	c: none
	head u
]
izkata, in R2 maximum-of is a native function.
with that, your solution should definitely be the fastest
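What Steeve is suggesting, roughly: MAXIMUM-OF returns the series
positioned at the largest value, so FIRST extracts it. An illustrative
variant (not necessarily the exact code Izkata timed below):

arraycount2: func [Data [block!] /local MaxVal Vals][
	MaxVal: first maximum-of Data	; native max scan instead of a foreach loop
	Vals: array/initial MaxVal 0
	foreach Val Data [Vals/:Val: Vals/:Val + 1]
	Vals
]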
Izkata
18-May-2009
[13786x2]
Sorting seems to be the big bottleneck:


arraycount with sorted data (using maximum-of, same results as sorted):
>> time [arraycount copy SDat] 50
== 0:00:00.172956
fast-tally with unsorted data:
>> time [fast-tally copy Data] 50
== 0:00:00.357852
fast-tally with the sort step removed, with pre-sorted data:
>> time [fast-tally copy SDat] 50 
== 0:00:00.103735
So which is best depends on what Tom is using it for, I guess
TomBon
19-May-2009
[13788]
this is great, very fast code. thanks a lot for this.
the code is used for generating dynamic parsing rules
for financial market data. soon I will release free
software based on this data.
Steeve
19-May-2009
[13789x2]
Seems that Tally is faster with R3.
2 reasons:
SORT is faster
MAXIMUM-OF is no longer native

(btw some of us argued that it was a bad idea to make maximum-of
no longer native in R3, but we were preaching in the desert, so now
we can see the drawback)
(Tally faster than radix, I meant)
BrianH
19-May-2009
[13791]
MAXIMUM-OF was rarely used, and is now mostly native rather than 
fully native. The majority of its processing is the loop, which is 
now native. MAXIMUM-OF and MINIMUM-OF are so rarely used that they 
might not make the mezzanine cut; they might be moved to R3/Plus.
Izkata
19-May-2009
[13792x2]
I would definitely prefer a native fold-type function over maximum-of 
and minimum-of
which I think was mentioned under the name accumulate?
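With a fold-style helper like the one sketched earlier, MAXIMUM-OF's
value can be written in one line (the real ACCUMULATE's argument order
may well differ; this is only illustrative):

fold next Data first Data :max	; seed with the first element, fold MAX over the rest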
BrianH
19-May-2009
[13794]
We tried, but that has definitely been declared an R3/Plus function,
not native. Making ACCUMULATE native wouldn't help much - the mezzanine
version is really fast already. You won't be able to get functional
language speed even from a native because that requires compiler
support, and REBOL doesn't have a compiler at all.
Izkata
19-May-2009
[13795]
Okay, good to know.  Glad it's in, at least
BrianH
19-May-2009
[13796x2]
R3/Plus has a low barrier to entry, because it will be modular. People 
can just not include the modules they don't need.


Conversely, there will be a high standard for inclusion as mezzanine
or built-in native code, because you won't as easily be able to remove
it if you don't need it. Many R2 mezzanines, and even some natives,
won't make the cut.
The plugin architecture will make it less necessary to include native 
code you don't need. R3/Plus lets us say yes without bloating the 
core. R3 is going to be much tighter than R2.
Graham
19-May-2009
[13798]
Such as?  I'd hate to have a cgi script running and find that functions
I need aren't included.
BrianH
19-May-2009
[13799]
Use the needs header to load the modules you need and you can be
sure they'll be included. As for which functions will be mezzanine,
that is still in question for many functions. If you want to make
sure that your favorites are there, participate in the discussion.
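A sketch of what that looks like in a script header (the exact Needs
syntax in R3 was still settling at the time, and the module name here
is hypothetical):

REBOL [
	Title: "My CGI script"
	Needs: [%my-module.r]	; modules this script depends on
]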
Graham
19-May-2009
[13800x2]
page: read/custom [
	scheme: 'http host: "twitter.com" target: "direct_messages/new.xml"
	user: "my-twitter-id" pass: "mypassword"
] [POST "text=This was also sent from a Rebol&user=synapse_emr"]


This sends a private tweet to a user ... not clear from the API docs 
what the call is to just tweet ... anyone know?
ooops ... wrong group.
Maxim
19-May-2009
[13802]
this is easy to figure out using firebug  ;-)


redirect the html page to a server where you have cheyenne running and
save out the whole http request; you will have the url and post data
:-)
Graham
19-May-2009
[13803x2]
just reinstall wireshark
But there is a  twitter restful api ... just can't find it there. 
  Must be blind.
Janko
19-May-2009
[13805]
too bad accumulate is not in the core, I totally ditched python for 
any further work because guido v.r. removed the basic (quite poor) 
functional constructs it had
Steeve
19-May-2009
[13806]
If that's so, most existing mezzanines should be moved into external
modules.
Because i use mezzanines only to test some ideas.

But if I have to do the "real work", then i use only natives. Don't
ask why, it's obvious.
Graham
19-May-2009
[13807]
Don't you end up writing your own mezzanines then?
Janko
19-May-2009
[13808]
to me one of the great beauties of rebol is that everything is an
expression, like "either", and thereby composable ..  and that you don't
need explicit return statements.. two of the big features that I think
aid a more functional (not imperative) design of programs...
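A small example of that composability (hypothetical function, just to
illustrate the point): EITHER returns the value of whichever block it
evaluates, and a function's last expression is its return value.

sign-of: func [n [number!]][
	either n < 0 ["negative"]["non-negative"]	; no RETURN needed
]

>> sign-of -3
== "negative"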
Steeve
19-May-2009
[13809x2]
but they have to be strictly adapted for their purpose, so
i remove code related to needless use cases.
speed is a priority, especially with Rebol, which can be really slow
if you don't take care