World: r3wp

[Script Library] REBOL.org: Script library and Mailing list archive

Sunanda
14-Mar-2009
[777]
No actual stats. Just from feel:
* Scripts -- very few
* Posts on the ML -- a few dozen
* AltME archive -- no idea
Gabriele
15-Mar-2009
[778]
Sunanda, I can tell you where those chars come from. If your page 
is set as utf-8, then the script has been uploaded by the browser 
as utf-8. When you view it in the browser, it shows correctly as utf-8. 
When you download it, it is still utf-8, but if you view it with 
something that believes it's latin1 (e.g. the rebol 2 console on windows 
set as latin1), it won't show up correctly.
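A minimal sketch of the mismatch described above, written in Python rather than REBOL purely for illustration. The function name `misread_as_latin1` is hypothetical. It encodes text as UTF-8 and then decodes the same bytes as Latin-1, which is what a Latin-1 viewer effectively does with an unconverted download:

```python
def misread_as_latin1(text: str) -> str:
    """Return what a Latin-1 viewer would display for UTF-8 encoded text."""
    utf8_bytes = text.encode("utf-8")    # "½" becomes the two bytes C2 BD
    return utf8_bytes.decode("latin-1")  # each byte is shown as its own character
```

Calling `misread_as_latin1("½")` gives "Â½": the stray "Â" is the UTF-8 lead byte C2 shown as a character in its own right.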
Anton
15-Mar-2009
[779x6]
Sunanda, you're right about that ascii-math.r file. When I clicked 
the [Download script] link, the browser (konqueror) downloaded and 
directly opened it with the editor (SciTE). SciTE thought it was 
8-bit ascii, and showed the characters incorrectly. All I had to 
do was change the file encoding from 8-bit to utf-8 and the characters 
appeared correctly. I guess the editor had no way of determining 
the encoding, and incorrectly guessed 8-bit ascii.
The view-script.r html source for the page correctly advertises the 
encoding as utf-8, so the browser shows it correctly.
So I'm pretty happy with the way that script was handled by the software 
here.
Except for R2 console, of course.
R3 console seems to handle it better.
Any other scripts you can find showing problems ?
Sunanda
16-Mar-2009
[785x2]
Thanks Gabriele -- that's a clear explanation, and has helped me 
work out what is going on.


Anton and Gabriele -- I have tried changing the charset we emit on 
the download to say UTF-8. But that makes little difference. As both 
of you note, once the file has been saved then (without a MAC-type 
resource fork) there is no obvious indication of the encoding. And 
several editors I have tried get it wrong -- thus "revealing" the 
extra ASCII chars.


Not sure what the solution is other than to de-UTF-8 files on download.
Anton -- not yet run a crawl to check for other scripts with high 
ascii chars.
Anton
16-Mar-2009
[787x3]
Which editors?

I think most editors these days allow manually changing the encoding, 
so developers who notice strange characters can just change it themselves.

Maybe it would be helpful to add a rebol.org library script header 
advertising the encoding (when it is known, and when not).

I don't recommend 'de-UTF-8'ing files on download - that's just going 
to confuse things more, especially when the file is view-script.r'd 
as utf-8 just beforehand.
It seems the responsibility lies with the clients to interpret encodings 
properly. As we move to a unicode world, software that assumes every 
file uses some 8-bit ASCII encoding should drop off. But until the 
transition is complete, there's not much we can do about client software 
guessing wrong like that, except stating the encoding in the script 
header, in the web page that provides the download link, and by helping 
confused newbies.
Are rebol.org uploaders asked to declare the encoding used?
swall
16-Mar-2009
[790]
If the offending downloaded script is executed in Rebol/Core, the 
extra ASCII chars are also present in the executed code.  The script 
defines ½ to be 0.5. If "help ½" is typed into the console, the result 
is "Found these words:   ½              decimal!  0.5". However, 
if the script is executed in Rebol/View, the result is "½ is a decimal 
of value: 0.5". It seems that View handles it correctly, while Core 
doesn't.
Sunanda
16-Mar-2009
[791]
Thanks guys.
Other scripts with the same problem.....there are a couple. 

About 10% of all scripts have at least one extended ASCII char....But 
most of them are acceptable in LATIN-1 code page / charset (eg copyright 
symbol, some accented letters). It's just a very few scripts that 
use 1/4 and similar symbols that cause the problem.


What other editors? Windows NOTEPAD is one example of a common one 
that gets this wrong.
swall
16-Mar-2009
[792]
Vim and Editor² display the chars incorrectly.
Notepad++ shows the chars correctly.
Sunanda
16-Mar-2009
[793]
Of the various editors / word processors I have immediately to hand:

-- credit.exe -- [my usual editor] shows incorrect chars, and has 
no option to switch to UTF-8

-- open office writer -- works fine if you take the UTF-8 option 
when asked
-- ms word -- claims file is corrupt
-- word perfect -- makes a complete mess

-- R2/View's built in editor ( editor %/c/path to my local copy//ascii-math.r) 
-- shows incorrect chars
Anton
17-Mar-2009
[794x2]
Vim supports unicode and on my system shows the characters correctly.
Ok, so there are some editors which don't support unicode, don't 
guess encoding correctly, or can change encoding only with difficulty.

How about this suggestion; if a rebol.org script is known to be UTF-8, 
then an additional link should appear:

[Download as ASCII] download-a-script?script-name=ascii-math.r&encoded-as=8-bit-ascii
which transcodes a UTF-8 file to ASCII.
Just have to get a conversion function in place for this to work.
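A rough sketch of such a conversion function, in Python for illustration only. It assumes that "8-bit ASCII" here means Latin-1 (ISO 8859-1), since 7-bit ASCII has no ½ at all. The name `utf8_to_latin1` is hypothetical, and replacing unmappable characters with "?" is just one possible policy:

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Transcode UTF-8 bytes to Latin-1, substituting '?' where no mapping exists."""
    text = data.decode("utf-8")                      # raises UnicodeDecodeError on invalid UTF-8
    return text.encode("latin-1", errors="replace")  # unmappable characters become b"?"
```

With this, the two UTF-8 bytes C2 BD for ½ become the single Latin-1 byte BD, while a character outside Latin-1 (the euro sign, for example) is lost and comes out as "?".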
Gabriele
17-Mar-2009
[796x2]
Sunanda: given that R2 uses the host current code page, I think the 
best way would be for the user to convert the script after downloading 
it. On Linux or Mac for eg, UTF-8 is perfect for Core scripts as 
the terminal is UTF-8. On Windows or for View scripts, you'll get 
the host code page displayed anyway, so the user has to do the conversion. 
A tool to do that automatically would be nice (I have the code, it 
will be released soon, but you may need to wait a couple weeks more).
All these troubles go away with R3... but I think it would be nice 
if R2 recognized UTF-8 and converted it on the fly; we could add 
a BOM at the beginning to make that easier.
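For reference, the UTF-8 BOM is the three bytes EF BB BF at the very start of a file. A small Python sketch of how a loader might check for it (the helper name `split_utf8_bom` is hypothetical, and this is not how R2 or R3 actually behave):

```python
import codecs
from typing import Tuple


def split_utf8_bom(data: bytes) -> Tuple[bool, bytes]:
    """Report whether data starts with the UTF-8 BOM and return the data without it."""
    if data.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return True, data[len(codecs.BOM_UTF8):]
    return False, data
```

The marker is optional in UTF-8, so its absence proves nothing about the encoding, and software that does not expect it will see three stray bytes before the script header.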
Chris
17-Mar-2009
[798]
http://en.wikipedia.org/wiki/Byte_Order_Mark#cite_note-0
swall
17-Mar-2009
[799x2]
Anton: you're right Vim does display the file correctly, although 
not by default. I guess it helps when you read the manual. :-)
Gabriele: Where is the host code page set?
On Windows, is it set differently for View and Core?

Is that why the downloaded script works as expected in View but not 
in Core?
Anton
17-Mar-2009
[801x2]
Yes, use of BOM has its own troubles. I don't think it's a good idea.
swall, yes, strange, I can't remember configuring vim for utf-8 (I 
don't use it regularly), but it displayed correctly straight away 
for me. Must be some dark config option or something...
Sunanda
17-Mar-2009
[803]
Thanks everyone.

I think our first step is to add a warning to any download for scripts 
that contain UTF-8 chars.

So, for that I need a function:

     utf-8?: func [data [string!]] [ ... ]   ; returns true or false [and 
                                             ; perhaps "not sure" in ambiguous cases]

I've done the easy part :-)
Can anyone help with the difficult "..." part ?


It is not as simple as just looking for chars above 127 .... some high 
ASCII is acceptable as part of, say, ISO 8859-1
PeterWood
17-Mar-2009
[804x2]
I have a function which finds utf-8 multi-byte character sequences 
in a string. Given the code ranges for multi-byte characters, it 
would be rare to find such a sequence accidentally.
It's about 65 lines so rather than post it here I will email you 
a copy.
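The 65-line REBOL function is not shown in the thread. As a stand-in, here is a much shorter Python sketch of the same idea, under the assumption that "contains UTF-8" means "has at least one byte above 127 and is well-formed UTF-8 throughout". The name `looks_like_utf8` is hypothetical:

```python
def looks_like_utf8(data: bytes) -> bool:
    """True if data holds at least one multi-byte sequence and is valid UTF-8 throughout."""
    if all(byte < 0x80 for byte in data):
        return False  # pure 7-bit ASCII: nothing to warn about
    try:
        data.decode("utf-8")  # strict decoding rejects stray lead or continuation bytes
    except UnicodeDecodeError:
        return False  # high bytes that are not well-formed UTF-8, e.g. Latin-1 text
    return True
```

A multi-byte UTF-8 character is a lead byte followed by one to three continuation bytes in the range 80 to BF. A lone Latin-1 byte such as A9 (the copyright sign) breaks that pattern, so typical Latin-1 scripts are rejected. The ambiguous "not sure" case remains: Latin-1 text that really does contain "Â½" is byte-for-byte identical to UTF-8 "½", and this sketch reports it as UTF-8.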
Sunanda
18-Mar-2009
[806]
Thanks.
Anton
18-Mar-2009
[807]
Ok, so things seem to be proceeding well. The rebol.org Library's 
support for utf-8 was actually stronger than thought, and what're 
being added are functions to help deal with legacy client apps which 
misidentify the file encoding.
PeterWood
18-Mar-2009
[808]
It's not just legacy client apps unless you consider all Rebol/View 
scripts as legacy apps.
Anton
18-Mar-2009
[809x2]
Yes, I do.
I understand what you mean, and obviously the definition of "legacy" 
is a bit fuzzy.
Sunanda
18-Mar-2009
[811]
Using Peter's code (thanks again!), I've made two changes to the 
download-a-script link:


1. if we find UTF-8 chars in a script, we download it with the HTTP 
content type charset=utf-8


But that probably makes no practical difference.  A downloaded script 
will be saved by the browser, and then opened by a text editor. The 
text editor is unlikely to be passed the charset setting. So:


2. Scripts with UTF-8 encoding are downloaded with a few lines of 
comment at their top. The comment explains the possible problem.
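The two changes could be sketched roughly as follows, in Python for illustration only; this is not the rebol.org code. The function name `prepare_download`, the plain `text/plain` fallback, and the wording of the warning are all assumptions. The warning lines start with `;`, which REBOL treats as a comment:

```python
from typing import Dict, Tuple

# Hypothetical wording; the real comment text is not given in the thread.
WARNING = (
    b"; NOTE: this script is encoded as UTF-8.\n"
    b"; If you see stray characters, switch your editor's encoding to UTF-8.\n"
)


def prepare_download(script: bytes, is_utf8: bool) -> Tuple[Dict[str, str], bytes]:
    """Build the response headers and body for a script download."""
    if not is_utf8:
        return {"Content-Type": "text/plain"}, script
    headers = {"Content-Type": "text/plain; charset=utf-8"}  # change 1: declare the charset
    return headers, WARNING + script                         # change 2: prepend the comment
```

The header only helps clients that look at it; the prepended comment is what survives once the file is saved to disk.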


Thanks to all for the comments and help with getting things this 
far.
Anton
19-Mar-2009
[812x2]
Sunanda, good job, I was hoping you'd do that, and you did.
I'm interested in the utf-8 detection function. Can it be published?
Maxim
20-Mar-2009
[814x2]
sunanda:  I have a feature proposal for you  :-)


it would be nice to be able to supply a single picture to link with 
the scripts. this image (jpg, png, gif) would have hefty size limitation 
and I think only one image per script should be enough, but having 
this alongside the various listings of the application and within 
searches, new scripts, etc would be really cool.


sometimes, if you see a thumbnail (ui grab, console example, logo, 
output gfx, whatever), it will help raise people's curiosity.  this 
could probably benefit quite a few scripts, which are possibly overlooked.


having a simple search filter of scripts with pics, could also help 
people to quickly find useful things at a glance.


what do you think?  it could start out really simple, and slowly 
thumbnails could creep into various listings of scripts.
maybe, we could eventually have more than one picture, like pics 
which are specifically tagged as gui screenshots, for example.
Sunanda
20-Mar-2009
[816]
Nice idea, thanks ..... Let me think about it.
Maxim
20-Mar-2009
[817]
cool, let me know if you want to test it, I'll be happy to supply 
imgs for my scripts.
Alan
22-Mar-2009
[818]
re: pictures. Some but not all scripts also have docs, so that might 
be a good place to add them. For those that don't, a clickable small 
thumbnail?
Maxim
26-Mar-2009
[819]
you mean add the pictures at the header of docs?  or allow us to 
use the pics within the docs?
Sunanda
26-Mar-2009
[820x3]
Max, I assumed you meant have the pic on the main page for the script, 
eg for liquid.r you'd see a thumbnail here:
   http://www.rebol.org/view-script.r?script=liquid.r
That's a nice idea, though there are some technical CSS issues......For 
example, the actual script is displayed in a <pre> block. That means 
images may not float where you'd expect them. It'll take some experimentation 
to find the best way to do it.
....Maybe a better slot for a thumbnail would be in the LHS menu, 
just under the <Script Library Home> link. That would keep it out 
of the flow of the page.
Please suggest better ideas :-)
mhinson
14-Apr-2009
[823]
Hi, I am very new to Rebol so apologies if my questions are very 
simple.


I have been trying to use functions & examples from the library by 
pasting them into the REBOL/View console. When I do this I find most 
of them produce errors or lock up the console so I have to restart 
it.  What am I doing wrong please? Is there some trick to this that 
is so obvious that no one has mentioned it?

Thanks,
Pekr
14-Apr-2009
[824]
give me an example of such script. The thing might be, that some 
scripts are already dated, but most of them should work ...
sqlab
14-Apr-2009
[825x2]
There are two file versions in the library, one for viewing, one 
for downloading. Did you use the one
from http://www.rebol.org/download-a-script.r?script-name=....r
Maybe the other ones have problems.
Mike

I checked your library example from the "I'm new" group that was producing errors.

There is probably a weakness, as the script does not allow for comment 
lines.
A short enhancement would be
   
parse-ini-file: func [
    file-name [file!]
    /local ini-block
    current-section
    parsed-line
    section-name
][
    ini-block: copy []
    current-section: copy []
    foreach ini-line read/lines file-name [
        ; do not process blank lines or comment lines (starting with ";")
        if all [not empty? ini-line  #";" <> first ini-line] [
            section-name: ini-line
            ; a "[section]" line loads as a block; other lines stay as they are
            error? try [section-name: first load/all ini-line]
            either any [
                error? try [block? section-name]
                not block? section-name
            ][
                ; key=value line: add key and value to the current section
                parsed-line: parse/all ini-line "="
                append last current-section parsed-line/1
                append last current-section parsed-line/2
            ][
                ; new section: store the previous one and start afresh
                append ini-block current-section
                current-section: copy []
                append current-section form section-name
                append/only current-section copy []
            ] ;; either
        ]
    ] ;; foreach
    append ini-block current-section
    return to-hash ini-block
]