r3wp [groups: 83 posts: 189283]

World: r3wp

[Core] Discuss core issues

Andreas
9-Nov-2011
[2590x2]
A bitset is just a collection of N bits which are either set or cleared.
A charset is a 256-element bitset, as R2 recognises 256 distinct 
characters.
Geomol
9-Nov-2011
[2592]
I know the benefit of bitsets in parsing. For what use are other 
kinds of bitsets?
Andreas
9-Nov-2011
[2593]
"Why isn't the length of a char one?"


Because characters are not a series datatype, and therefore it does not 
make much sense to speak of their length.


That's about the same as asking "Why isn't the length of an integer 
one?" or "Why isn't the length of a logic! one?"
Geomol
9-Nov-2011
[2594]
Nah, I don't fully agree on that. Some code (as my string parse example) 
would benefit, if we could ask the length of a char.
Andreas
9-Nov-2011
[2595]
And some code might benefit if we could ask the length of an integer.
Geomol
9-Nov-2011
[2596]
:)
Andreas
9-Nov-2011
[2597x2]
http://www.rebol.com/r3/docs/datatypes/bitset.html (for R3)
http://www.rebol.net/wiki/Bitsets (also for R3)
(Bitsets are vastly more useful in R3.)
Sunanda
9-Nov-2011
[2599]
<Other uses of bitsets>
One use is conformity validation. eg:
    allowed-letters: charset [#"a" - #"e"]
    find allowed-letters "aaadddeeebbb"
    == true
    find allowed-letters "aaadddeeebbbx"
    == none
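
The same check can be sketched outside REBOL. This is a minimal Python illustration, assuming a plain integer is used as a 256-bit set; `charset` and `all_allowed` are hypothetical helper names, and the code only mirrors the results shown above, not how R2 implements FIND on a bitset.

```python
def charset(first: str, last: str) -> int:
    """Build a bitset (stored in an int) with the bits first..last set."""
    bits = 0
    for code in range(ord(first), ord(last) + 1):
        bits |= 1 << code
    return bits


def all_allowed(bits: int, text: str) -> bool:
    """True only if every character of text has its bit set in bits."""
    return all(ord(ch) < 256 and (bits >> ord(ch)) & 1 for ch in text)


allowed_letters = charset("a", "e")
print(all_allowed(allowed_letters, "aaadddeeebbb"))   # True
print(all_allowed(allowed_letters, "aaadddeeebbbx"))  # False
```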
Geomol
9-Nov-2011
[2600]
Yes, but that's the charset kind of bitset. I guess, other kinds 
of bitsets (other lengths than 256) can be used just to hold a bunch 
of flags or something. I haven't seen them used like that.
BrianH
9-Nov-2011
[2601x2]
Bitsets are useful for intexes and certain kinds of parsing (LL first 
and follow sets, not just character sets). Other stuff too.
intexes -> indexes
Gabriele
10-Nov-2011
[2603x2]
Geomol, because of Unicode, "charset kind of bitsets" need more than 
256 bits.
But, there are so many other ways to use them. For example, to implement 
a bloom filter: http://en.wikipedia.org/wiki/Bloom_filter
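
A Bloom filter on top of a bitset could look roughly like this. It is a Python sketch rather than REBOL, with assumptions throughout: the 1024-bit size, the three hash positions, and the salted SHA-256 hashing are illustrative choices, and `BloomFilter` is a hypothetical class name.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits      # assumed size, tune for real data
        self.num_hashes = num_hashes    # assumed number of hash positions
        self.bits = 0                   # a Python int used as the bitset

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


seen = BloomFilter()
seen.add("rebol")
print(seen.might_contain("rebol"))  # True
```

An item that was added is always reported as possibly present; an item that was never added is usually, but not always, reported as absent.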
Oldes
10-Nov-2011
[2605]
I can imagine length? on a char! value in a Unicode context - it could 
return the number of bytes needed to store the char in the UTF-8 encoding :) 
But I'm sure I can live without it. It would just add overhead to 
the length? action.
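
What such a byte count would return can be sketched in Python (not REBOL). `utf8_length` is a hypothetical helper name; it only illustrates the idea and says nothing about how length? is or would be implemented.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes needed to encode a single code point in UTF-8."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    return len(ch.encode("utf-8"))


for ch in ("A", "\u00fc", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
# U+0041 -> 1 byte(s)
# U+00FC -> 2 byte(s)
# U+20AC -> 3 byte(s)
# U+1F600 -> 4 byte(s)
```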
Geomol
10-Nov-2011
[2606]
It probably won't add overhead to length?, because that function 
already works with different datatypes, and char! is just another 
one in the switch already there.
BrianH
10-Nov-2011
[2607]
Oldes, the char! type in R3 refers to a Unicode codepoint, not a 
character. So, length still doesn't apply.
Ladislav
11-Nov-2011
[2608]
I want to share with you an "interoperability problem" I encountered. 
In Windows (at least in not too old versions) there are two versions 
of the string-handling functions:

- ANSI (in fact using a codepage for the Latin charset)
- widechar (in fact Unicode, restricted to 16 bits, I think)


It looks like Apple OS X "prefers" to use decomposed Unicode, also 
known as UTF-8-MAC, I guess. That means that, e.g., for one of Robert's 
files it generates a filename looking (in a transcription) as follows:

%"Mu^(combining-umlaut)nch.r"

As far as Unicode goes, this is canonically equivalent to

%"M^(u-with-umlaut)nch.r"

but:


- Windows doesn't consider these file names equivalent, i.e. you can 
have both in one directory

- When using the former, the ANSI versions of the Windows system functions 
"translate" the name to: %"Mu^(umlaut)nch.r"

-- the %"Mu^(umlaut)nch.r" is a third file name, distinct from both 
of the above, so if R2 reads it in a directory, it is unable 
to open it
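
The three names can be compared with Python's standard `unicodedata` module (a sketch, not REBOL). One assumption: the "translated" name is taken to use the spacing diaeresis U+00A8, which is a guess at what the ANSI functions produce. `same_name` is a hypothetical helper.

```python
import unicodedata

decomposed = "Mu\u0308nch.r"    # u followed by COMBINING DIAERESIS
precomposed = "M\u00fcnch.r"    # single code point, u with diaeresis
translated = "Mu\u00a8nch.r"    # u followed by spacing DIAERESIS (assumed)


def same_name(a: str, b: str) -> bool:
    """Compare two names after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(decomposed == precomposed)            # False: different code points
print(same_name(decomposed, precomposed))   # True: canonically equivalent
print(same_name(translated, precomposed))   # False: a genuinely third name
```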
Gabriele
12-Nov-2011
[2609]
Linux also would not consider them equivalent. I understand why Mac 
OS is always canonicalizing file names, but, they chose the most stupid 
way to do it, and it's a pain in the ass most of the time. In the 
end, I prefer Linux where you can end up with two files named in 
a way that looks exactly the same, but that has a file system that 
behaves in a predictable way.
amacleod
15-Nov-2011
[2610]
why does this not work:


unless any [exists? file1 exists? file2 exists? file3][print "missing file"]

but this does: 



if any [not exists? file1 not exists? file2 not exists? file3][print "missing file"]
Henrik
15-Nov-2011
[2611]
you would need an ALL instead of ANY in the first one, if it is to 
behave like the bottom one.
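
The difference is De Morgan's law: "not any" means none of the files exist, while "any not" equals "not all". A small Python sketch with plain booleans standing in for the `exists?` results (the function names are hypothetical):

```python
from itertools import product


def unless_any(flags) -> bool:
    """Fires only when no file exists at all."""
    return not any(flags)


def if_any_not(flags) -> bool:
    """Fires when at least one file is missing."""
    return any(not f for f in flags)


def unless_all(flags) -> bool:
    """Same as if_any_not, by De Morgan's law."""
    return not all(flags)


one_missing = [True, False, True]
print(unless_any(one_missing))  # False: the message is not printed
print(if_any_not(one_missing))  # True
print(unless_all(one_missing))  # True

# The last two agree for every combination of three files.
print(all(if_any_not(c) == unless_all(c)
          for c in product([True, False], repeat=3)))  # True
```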
amacleod
15-Nov-2011
[2612]
right.....
amacleod
30-Nov-2011
[2613]
I'm trying to reach a time server but having trouble.

I can get a time from my rebol based time server on my server with 
"read daytime://myserver.com"

but if I use it for any of the well known online servers I get :
>> read daytime://time-b.nist.gov
== ""
>> read daytime://nist1-ny.ustiming.org
== ""


sometimes it seems to work but more often than not I get an empty 
string
BrianH
30-Nov-2011
[2614]
There are a few different time protocols, and the standard time servers 
don't tend to run the daytime protocol. They usually run NTP.
Pavel
30-Nov-2011
[2615x2]
2 amacleod: the time protocol is not very accurate; you can get the same 
level of accuracy by reading any HTML size and distilling the time 
from the HTML header. OTOH the NTP protocol is able to get millisecond 
accuracy, but through a quite difficult handshake, and as far as I know 
it has not yet been written for REBOL
size = site
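
Getting the time from a web server's response header could be sketched like this in Python (not REBOL). It assumes the value comes from the HTTP Date header; the header string below is a made-up sample, and `clock_offset_seconds` is a hypothetical helper. Fetching the header over the network is left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_offset_seconds(date_header: str, local_now: datetime) -> float:
    """Seconds the server clock is ahead of the local clock."""
    server_time = parsedate_to_datetime(date_header)
    return (server_time - local_now).total_seconds()


# Made-up sample values; a Date header only has one-second resolution.
header = "Wed, 30 Nov 2011 12:00:05 GMT"
local = datetime(2011, 11, 30, 12, 0, 0, tzinfo=timezone.utc)
print(clock_offset_seconds(header, local))  # 5.0
```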
amacleod
30-Nov-2011
[2617]
daytime://nist1.aol-va.symmetricom.com seems to work reasonably well

I don't need to be too accurate, just +- 5 seconds....


my server's time seems to be drifting. I've seen this before on 
another computer...does a bad battery affect the time even if the 
power remains on?
BrianH
30-Nov-2011
[2618x2]
It might be better to enable your server's native time sync services. 
Windows, Linux and OSX all have such services, as do many more OSes.
The hardware clocks of many computers are rather inaccurate, though 
they're getting better. They expect you to enable the time sync services.
amacleod
30-Nov-2011
[2620]
It's enabled but does not sync as fast as the drift and I'm having 
trouble doing it manually...keep getting an error and I tried several 
time servers.
Henrik
7-Dec-2011
[2621]
if I have a file with url-encoded chars in it, what's the fastest 
way to decode them?
Dockimbel
7-Dec-2011
[2622]
dehex
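
For comparison only, the same kind of decoding in Python's standard library. This is not REBOL and not a description of how dehex works internally: `unquote` decodes the escaped bytes as UTF-8 by default, which may differ from dehex. The sample string is made up.

```python
from urllib.parse import unquote

encoded = "my%20file%2Fname%3F.txt"
print(unquote(encoded))  # my file/name?.txt
```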
Henrik
7-Dec-2011
[2623]
Thank you very much. I forgot that.
Henrik
10-Dec-2011
[2624]
what is the fastest way to find the number of digits in a number, 
if you want to use it to calculate the pixel width of the number 
for a table column? simply using:

length? form number

?
Geomol
10-Dec-2011
[2625x2]
My guess would be:

1 + to integer! log-10 number

, but that's slightly slower than yours, it seems.
Can your number both be integer and decimal?
Henrik
10-Dec-2011
[2627]
it's most likely to be a decimal
Geomol
10-Dec-2011
[2628]
Then mine won't work.
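
Why the logarithm version only suits positive integers can be shown with a Python sketch of both approaches (not REBOL; `digits_by_log` and `digits_by_form` are hypothetical names for the two expressions above):

```python
import math


def digits_by_log(number) -> int:
    """1 + integer part of log10; only valid for numbers >= 1."""
    return 1 + int(math.log10(number))


def digits_by_form(number) -> int:
    """Length of the printed form, counting every character."""
    return len(str(number))


print(digits_by_log(1234), digits_by_form(1234))  # 4 4

# For a decimal the two disagree: the log version counts only the
# integer digits, the printed form also counts "." and the fraction.
print(digits_by_log(12.5), digits_by_form(12.5))  # 2 4
```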
Henrik
10-Dec-2011
[2629]
I may have found another way without needing to work on the length. 
Thanks, anyway.
Geomol
10-Dec-2011
[2630]
:)
Ladislav
12-Dec-2011
[2631]
"but that's slightly slower than yours, it seems"
 - strange, here it looks much faster than length? form
Geomol
12-Dec-2011
[2632]
>> number: 1234                                       
== 1234
>> time [loop 1000000 [1 + to integer! log-10 number]]
== 0:00:00.293239
>> time [loop 1000000 [length? form number]]          
== 0:00:00.28022

On R2 version 2.7.7.2.5
Sunanda
12-Dec-2011
[2633]
I see the log-10 loop as around 18% faster under core 2.7.8.

So this optimisation depends, it seems, on specific machines and 
versions.
Kaj
12-Dec-2011
[2634]
It would depend on the relative speeds of the CPU and the FPU
Ashley
14-Dec-2011
[2635]
Ran code like the following against a 600MB file of 500,000+ lines:

	file: read/lines %log.csv
	foreach line file [
		remove/part line 1000
	]


and was surprised that the foreach loop was near instantaneous (I'd 
assumed 500,000+ removes would at least take a second or two). I'm 
not complaining, just curious. ;)
Geomol
14-Dec-2011
[2636]
Does it work? Or do you hit a 4GB boundary with some internal structures, 
so there is an error that doesn't get reported?
Ashley
14-Dec-2011
[2637]
It certainly worked (file size was 400MB not 600MB).
BrianH
14-Dec-2011
[2638]
R2 or R3? R3 can do removes from the head of the string as fast as 
it can from the tail - near instantaneously. I don't think that this 
is true for R2.
Ashley
15-Dec-2011
[2639]
R2.