r3wp [groups: 83 posts: 189283]

World: r3wp

[Core] Discuss core issues

Maxim
7-Feb-2012
[2783x4]
I've been working a lot lately, and haven't had a lot of spare time. 
 I'm actually working with REBOL full time at a company which is 
using it to get a significant competitive advantage over the competition.
(eek that was redundant, sorry ;-)
I think people don't realize just how much power lies in parse.  
 Even I'm impressed with it right now.  I've been doing tests with 
really crazy stuff like two-cursor parse rules and run-time auto-recompilation 
of 400MB parse rules. 


 I've been doing things like parsing 100MB word documents and pushing 
 the interpreter to the limit ... reaching the 32-bit 1.6 GB RAM limit, 
 6 hour loop tests, etc. :-)
in my last test I was doing natural language extraction of concepts 
at a rate of 25000 words a second within multi-megabyte text files. 
 :-)
GrahamC
8-Feb-2012
[2787]
Who is this guy?
Maxim
8-Feb-2012
[2788]
hum... either that was sarcasm or you mean, what is the company I 
now work for?
GrahamC
8-Feb-2012
[2789]
Not sarcasm .. just humor :)
Maxim
8-Feb-2012
[2790]
hehe, I wasn't sure  ;-)
james_nak
8-Feb-2012
[2791]
That's incredible, Maxim. Good work. With what you do with parse, 
is the knowledge available online in the form of the present parse 
documentation, or did you have to discover new techniques? I have 
to admit I just barely use it when I need to. Anyway, thanks for 
sharing your experience.
Maxim
8-Feb-2012
[2792]
learning parse requires baby steps and at some point, the decision 
to solve a real problem with it and force yourself to learn it.  
I didn't use parse for almost a decade until I started using it more 
and more, to the point that currently I do more parse than any other 
coding in REBOL (but that's just because it's ideally suited for this).


some little tricks accumulate with experience and eventually, we 
discover pretty wacky things, which allow us to use parse almost 
like a VM.
Oldes
8-Feb-2012
[2793]
Parse is REBOL's heart... I cannot imagine living without it.
Pekr
9-Feb-2012
[2794]
REBOL parse is a gem, a treasure to follow. Me, the coding lamer, 
did a few things using it. The guys coding C++ first reacted: meh, well, 
an interpreter. Then: how is it possible that it is faster than a C++ 
app? Later on, they came with new requests, asking: well, you know, you 
have that parser, we need to do the following stuff ...
ddharing
9-Feb-2012
[2795]
Well said, Oldes.
james_nak
9-Feb-2012
[2796]
Guys, with all this said (and I agree), perhaps this is the one thing 
that needs to be the focal point for Rebol and eventually the #Not 
Rebol languages.  I know there are some tutorials out there but do 
any of them do justice to parse? I keep going back to the Codeconscious 
one: http://www.codeconscious.com/rebol/parse-tutorial.html and 
the ones at reboltutorial, but there doesn't seem to be a lot considering 
how much one can do with it.
Maxim
9-Feb-2012
[2797]
I learnt parse using the 2.3 REBOL core guide...  I thought it did 
a pretty good job of launching one in the right direction.   parse 
HAS evolved since then, but for the basic semantics and principles 
of parsing I think it's pretty good.

you can also look at this tutorial by Nick Antonaccio:
http://musiclessonz.com/rebol_tutorial.html#section-9.3


IIRC Nick has a good sense of tutoring, so it may be a good first 
step... he also gives links to other parse resources at the end of 
that part of his (short) tutorial
Pekr
9-Feb-2012
[2798]
Max - are you using R2 parse, or R3 enhanced one?
Maxim
9-Feb-2012
[2799]
R2.   


since we compile just about all the rules from other datasets and 
simplified user-data, the R3 advantage is much less significant (because 
we can simulate all the R3 improvements by using R2 idioms, though 
it's sometimes tricky).


Using R3, it probably would be a few percent faster since some of 
the rules we have would be simpler and those tricks would be managed 
natively by parse rather than by *more* parse rules.
james_nak
9-Feb-2012
[2800]
Thanks Maxim. I appreciate the info.
Maxim
9-Feb-2012
[2801x2]
The problem with R3 right now is that it isn't yet compiled in 64-bit, 
so we still have the 1.6GB RAM limit for a process, which is the biggest 
issue right now.   I have blown that limit a few times already, so 
it makes things a bit more complex and it doesn't allow me to fully 
optimize speed by using more pre-generated tables and unfolded state 
rules.
Our datasets are huge and we optimise for performance by unfolding 
and indexing a lot of stuff into rules... for example instead of 
parsing by a list of words, I parse by a hierarchical tree of characters. 
 it's much faster since the speed is linear to the length of the word 
instead of to the number of items in the table. i.e.  the typical 
 O*n   vs.   O*O*n  type of scenario.  just switching to parse already 
was 10 times faster than using  hash! tables and using find on them.... 


In the end, we had a 100-times speed improvement from before parse 
to compiled parse datasets.  this means going from 30 minutes to 
less than 20 seconds.... but this comes at a huge cost in RAM... a 
400MB overhead to be precise.
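
The list-versus-character-tree idea described above can be sketched roughly as follows. This is a minimal illustration in Python rather than REBOL parse rules, and every name in it (build_tree, match_list, match_tree, the END marker, the sample words) is hypothetical and not taken from Maxim's system. It assumes the goal is to find the longest known word starting at a given position in the text.

```python
# Hypothetical sketch: match a known word at a position in the text,
# first by scanning a flat word list, then by walking a character tree.

END = object()  # marker key meaning "a word ends at this node"


def build_tree(words):
    """Unfold a word list into nested dicts keyed by character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def match_list(words, text, pos=0):
    """Try every word in turn: the work grows with the size of the table."""
    best = None
    for word in words:
        if text.startswith(word, pos) and (best is None or len(word) > len(best)):
            best = word
    return best


def match_tree(tree, text, pos=0):
    """Follow one branch per character: the work grows with the word length."""
    node = tree
    best = None
    i = pos
    while i < len(text) and text[i] in node:
        node = node[text[i]]
        i += 1
        if END in node:
            best = text[pos:i]  # remember the longest complete word so far
    return best


words = ["parse", "parser", "path", "paren", "block"]
tree = build_tree(words)
print(match_list(words, "parser rules"))  # parser
print(match_tree(tree, "parser rules"))   # parser
```

Both functions return the same answer; the difference is that the tree walk touches at most one node per character of the matched word, while the list scan touches every entry in the table. The nested dictionaries also hint at the RAM cost mentioned above: the unfolded structure is considerably larger than the flat word list it was built from.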
ddharing
9-Feb-2012
[2803x2]
Memory is cheap. It's the 32-bit limit that is the real problem -- 
as you stated.
I'm confused. Why is REBOL limited to 1.6GB? I've seen that myself 
too, but that is nowhere near the 4GB limit.
Maxim
9-Feb-2012
[2805x3]
yeah...  I've got a server that has 64GB of RAM  I want to use it 
 !!!   :-)
it's the MS Windows limit.   it can only address 1.6GB of memory in 
32-bit mode.
it may be higher on linux, I've never tested it.
ddharing
9-Feb-2012
[2808]
I see. What about Linux?
Maxim
9-Feb-2012
[2809x2]
(btw that 1.6GB limit used to be a real problem when I was doing 
3D stuff...  3D animation apps are memory hogs, and in some cases, 
we could only work 15 minutes before high-end apps would crash.

which is a problem when a 3D scene takes 30 minutes to save to disk 
over the network  ;-) )
can anyone explain a single use for this R2 path conversion?

>> to-string first [path/item]
== "pathitem"


I know I can use mold... it's just that I wonder why to-string doesn't 
use the molded string equivalent as well?
Oldes
9-Feb-2012
[2811]
funny.. I was thinking about it today as well.. but I don't know
Sunanda
9-Feb-2012
[2812]
Something inconsistent in the way paths are handled:
    to-string load "path/item"
    == "pathitem"
    to-string to-path "path/item"
    == "path/item"
Steeve
9-Feb-2012
[2813]
You can use FORM as well.

And having alternatives should not be something to complain about. 
:)
Maxim
9-Feb-2012
[2814]
I'm not complaining, I just find absolutely no use-case for the default 
  :-D


my question was can anyone give me a reason for the current use of 
to-string?  

can you?  ;-P
Ladislav
9-Feb-2012
[2815x2]
What is "O*O*n"?
Use case for default:

to-string [1 "." 2] ; == "1.2"
Maxim
9-Feb-2012
[2817x2]
O*O*n
  == a typo  :-)

I guess I really meant  something like O(n*n) 


It's the kind of dramatic  linear vs logarithmic scaling difference 
when we unfold our datasets into parse.


but it's not exactly that kind of scaling, since the average topology 
of the sort tree will have a lot of impact on the end-result.  for 
example in my system, when I try to index more than the first 5 characters, 
the speed gain is so insignificant that the ratio is quickly skewed, 
when compared to the difference which the first 3 letters give.  


It's 100% related to the actual dataset I use.  in some, going past 
2 is already almost useless, in others I will have to go beyond 5 
for sure.  in some other datasets we unfold them using hand-picked 
algorithms per branch of data, and in others it's a pure, brute force 
huge RAM gobbler.
I always love when I realize that I write things like this in Rebol:

-*&*&*&*-:  "a pretty impossible to guess variable name :-)"
Steeve
9-Feb-2012
[2819x2]
Max, although I think you're comparing O(1) vs O(n) parsing algorithms 
(random access vs linear)

(The indexing part is probably meant to be O(n log n) because it 
involves sorting data, but should be considered separately from the 
parsing cost)

just wandering around, uhuh
Anyway O(n*n) is by far too dramatic ;-)
Pekr
10-Feb-2012
[2821x2]
where should I put DLLs, in order for REBOL to find them? I mean 
- I have one DLL, which is dependent on some other. Even if I put 
that DLL into the same directory, it complains it can't find it. 
Win Vista here ...
or should I register them somehow using regsvr or something like 
that?
Oldes
10-Feb-2012
[2823]
I don't know how it's on Vista, but on W7 or XP you can place it 
anywhere... I today updated my old zlib script to do late initialisation, 
you can find it here: https://github.com/Oldes/rs/tree/88291b8c720e9026978a080ca40100c3f2fb780f/projects-dll/zlib/latest
Endo
10-Feb-2012
[2824]
Pekr: Registration (regsvr) is required only if they are ActiveX 
DLLs, but I think they are not because you cannot use ActiveX DLLs 
in REBOL.

Normally they should be somewhere in your PATH. Try to see what's 
happening with the FileMon tool from Sysinternals.com.
Maxim
10-Feb-2012
[2825]
it also looks in the current-dir... but that path will depend on 
how you launched rebol.


use WHAT-DIR just before you try to load your dll  to know where 
the current-dir is at that time and put your dll there.


you can also add a path in the user or system path environment and 
place the dll there.
Pekr
11-Feb-2012
[2826x3]
I'll continue here for now, as /library is now a free part of Core, 
and DLL.SO is not web-public.
My observation is that if there are one or more dependent DLLs, 
REBOL will load the first one, but then the path is somehow not taking 
the present directory into account. Here are a few points:

- you can't do: do %my-dir/my-dll-script.r
- nor can you do so after: change-dir 


But it works, when you launch REBOL from the directory where those 
DLLs are present.
There are several different paths in the R2 structure, dunno if it is just 
a weird R2 implementation, or natural OS-level functionality ...
PeterWood
11-Feb-2012
[2829]
/library is not a free part of Core, only of View.
Geomol
17-Feb-2012
[2830x3]
If datatypes equal words, like word! = 'word!, then maybe the refinement 
in type?/word isn't needed? But what are the consequences? The next 
two examples would return the same:

>> find [integer! 42] integer!
== [42]
>> find [integer! 42] 'integer!
== [integer! 42]


I came to think of this, because I find myself writing things like 
the following all the time now:

	either find [block! paren!] type?/word value [ ...
and
	switch type?/word value [ ...


If datatypes equal words, only type? without the refinement would 
be needed.
I know that today I can write things like


	either find [#[datatype! block!] #[datatype! paren!]] type? value [ ...


but I don't do that, because it has too much syntax for my taste, 
and therefore isn't very readable.
Maybe the question should be put the other way around: Are there 
cases (in real scripts) where it would be a problem if datatypes 
equal words?