
World: r3wp

[!REBOL3-OLD1]

sqlab
30-Oct-2009
[19283]
R3 should not try to do that automatically, if I do not want that
Maxim
30-Oct-2009
[19284x2]
the 256-char encodings are often mixed up since they used to be loosely 
referred to as ANSI
http://en.wikipedia.org/wiki/Windows-1252
Pekr
30-Oct-2009
[19286]
so what is the option? load/as? What if my script contains strings 
in various charsets?
sqlab
30-Oct-2009
[19287]
The systems I knew before had a default codepage and used Unicode 
only as an option.

I think it would have been enough if R3 had just added a Unicode datatype.
But now it's probably too late.
Maxim
30-Oct-2009
[19288x2]
R3 is unicode from A-Z. the code is unicode.
but for data, I would like to have default encoding of my choice.
Pekr
30-Oct-2009
[19290]
It is really not good that I can't load my own local codepage. How 
should I make my source-file UTF-8? My Notepad will probably not 
add any BOM header for me automatically ...
Maxim
30-Oct-2009
[19291x3]
utf-8 needs no BOM... it's only used as a signature.
if you only use ascii (lower 127 chars) you will see no difference.
since rebol will load files as UTF-8 by default, code doesn't need 
it.
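
A minimal Python sketch of the two points above (Python is used purely for illustration here; none of this is R3 code, and the variable names are made up): pure ASCII text has the same bytes in UTF-8 and in the Windows codepages, and the UTF-8 BOM is only an optional three-byte signature.

```python
import codecs

ascii_text = "print hello"

# Pure ASCII encodes to identical bytes in UTF-8 and in the Windows codepages.
utf8_bytes = ascii_text.encode("utf-8")
same_in_all = utf8_bytes == ascii_text.encode("cp1252") == ascii_text.encode("cp1250")

# The UTF-8 BOM is just the three bytes EF BB BF placed in front of the text.
with_bom = codecs.BOM_UTF8 + utf8_bytes

# The "utf-8-sig" codec strips the signature when present
# and changes nothing when it is absent.
decoded_with_bom = with_bom.decode("utf-8-sig")
decoded_without_bom = utf8_bytes.decode("utf-8-sig")
```

Both decoded strings come out identical to the original text, which is why a loader that already assumes UTF-8 does not need the BOM.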
Gabriele
30-Oct-2009
[19294]
guys, please, we are in 2009... how is it possible you're still using 
latin1...
Maxim
30-Oct-2009
[19295x2]
hum... cause everything I use is ascii or latin-1 ?
well, windows ANSI..
sqlab
30-Oct-2009
[19297]
Even if you live in 2009, most people were born in the last century.
And many recipes are older than the people, 
at least the good ones. (recipes for cooking food, I mean :)
PeterWood
30-Oct-2009
[19298]
..and sticking to the old ways means living with the old problems 
... like not knowing how to interpret characters properly ... like 
AltME for example ... it makes the assumption that all text 
in messages is encoded as though it was entered on your own machine. 
So messages from Mac users are incorrectly displayed on Windows machines 
and vice-versa.


For me, moving to utf-8 is a much easier problem to live with than 
not being able to properly share text across different platforms. 
It may be different for you.
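
A small Python illustration of the cross-platform display problem described above. It assumes the sender's machine uses MacRoman and the reader's uses Windows-1252; the sample name and the variable names are only for the example, and nothing here claims to show how AltME works internally.

```python
name = "Trüter"

# Bytes as a Mac client using MacRoman would store the name.
mac_bytes = name.encode("mac_roman")

# A Windows client that assumes its own codepage reads the same bytes differently.
seen_on_windows = mac_bytes.decode("cp1252")

# If both ends agree on UTF-8, the text survives the trip unchanged.
round_trip = name.encode("utf-8").decode("utf-8")
```

Here `seen_on_windows` is "TrŸter", because byte 0x9F is ü in MacRoman but Ÿ in Windows-1252, while `round_trip` is still "Trüter".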
Henrik
30-Oct-2009
[19299]
REBOL3's philosophy should be simple: UTF-8 is default. Anything 
else is possible, but must be optionally selected.
sqlab
30-Oct-2009
[19300x3]
Living with the old problems means knowing the old solutions.

Displaying text in windows is a different problem to loading programs.
By the way, Ticket #0000589 still leads to a crash, even though it was 
set to build again.
build = built
Maxim
30-Oct-2009
[19303]
re-open it.
PeterWood
30-Oct-2009
[19304]
Loading programs is not totally immune from encoding problems. An 
unlikely but possible example:

if name = "Ashley Trüter" [print "Hello Ashley"]
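
A hedged Python sketch of why such a comparison can fail, assuming the string literal was saved in Windows-1252 but is loaded as UTF-8 (the variable names are hypothetical):

```python
expected = "Ashley Trüter"

# The same name as raw bytes from a source saved in Windows-1252.
raw = expected.encode("cp1252")

# Decoding under the wrong assumption (UTF-8) cannot recover the ü;
# errors="replace" substitutes a replacement character instead of raising.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were actually written in does recover it.
right = raw.decode("cp1252")

matches_wrong = (wrong == expected)
matches_right = (right == expected)
```

Only the correctly decoded string compares equal, so the branch would silently not run for the mis-decoded one.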
sqlab
30-Oct-2009
[19305x2]
Then I would prefer that name and the string to compare have a 
unicode datatype, 
as in
>> type? name
== UTF-8.
if name = U8{Ashley ...
Maxim
30-Oct-2009
[19307x2]
but utf-8 editors aren't rare nowadays, and using utf-8 sequences 
isn't hard... really, if you tuely want to keep using an ascii editor
tuely = truely
sqlab
30-Oct-2009
[19309]
But they do not convert automatically ..
Maxim
30-Oct-2009
[19310x6]
handling encoding is complex in any environment... I had a lot of 
"fun" handling encodings in php, which uses such a unicode datatype... 
it's not really easier... cause you can't know by the text if it's 
unicode or ascii or binary values unless you tell it to load a sequence 
of bytes AS one or the other.
with mashup software its even worse... when done improperly, you 
end up with data with multiple encodings in the same document .... 
and then all hell breaks loose  :-)
at least converging to utf-8, all scripts by all authors will work 
the same on all systems.
cause there is just ONE encoding.
but having some kind of default for read/write could be useful, 
instead of having to add a refinement all the time, and force a script 
to expect a specific encoding.
then it would be easier to change it in one place, do all I/O without 
the refinement.  and less work for another to change encoding for 
the whole app and having to put conditionals every time we use read/write.
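
The "one default for all I/O" idea above could be sketched like this in Python. `DEFAULT_ENCODING`, `read_text` and `write_text` are hypothetical names for illustration, not an existing R3 or library API.

```python
import os
import tempfile

# Hypothetical application-wide default, changed in exactly one place.
DEFAULT_ENCODING = "cp1250"

def read_text(path, encoding=None):
    """Read a file, falling back to the application default encoding."""
    with open(path, "r", encoding=encoding or DEFAULT_ENCODING, newline="") as f:
        return f.read()

def write_text(path, text, encoding=None):
    """Write a file, falling back to the application default encoding."""
    with open(path, "w", encoding=encoding or DEFAULT_ENCODING, newline="") as f:
        f.write(text)

# Usage: callers never mention the encoding unless they need an exception.
sample = "Příliš žluťoučký kůň"
path = os.path.join(tempfile.mkdtemp(), "note.txt")
write_text(path, sample)
with open(path, "rb") as f:
    on_disk = f.read()
text_back = read_text(path)
```

Switching the whole application to UTF-8 then means editing the single `DEFAULT_ENCODING` line rather than every read and write call.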
Pekr
30-Oct-2009
[19316]
Max - so what is an easy solution for me, to load my local scripts 
on my local system, which contain czech alphabet signs > 127?
Maxim
30-Oct-2009
[19317x4]
IIRC there was intended to be a header attribute specifying the encoding 
for the script body...
don't know if it's implemented or not.
I put a suggestion on the blog about allowing user-created encoding 
maps... otherwise, you can load it as binary in R3 and just convert 
the czech chars to utf-8 multi-byte sequences and convert the binary 
to string using decode.
is the czech encoding the standard windows ansi encoding?
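
The conversion suggested above (read the bytes, map the codepage characters to UTF-8 sequences) can be illustrated in Python as follows. It assumes the source bytes are Windows-1250; `cp1250_to_utf8` is a made-up helper name, and this does not show the R3 calls themselves.

```python
def cp1250_to_utf8(raw: bytes) -> bytes:
    """Re-encode Windows-1250 bytes as UTF-8 bytes."""
    text = raw.decode("cp1250")   # single bytes -> Unicode characters
    return text.encode("utf-8")   # characters -> UTF-8 (multi-byte above 127)

czech = "žluťoučký kůň"
legacy_bytes = czech.encode("cp1250")      # what a local editor would save
utf8_bytes = cp1250_to_utf8(legacy_bytes)  # what a UTF-8 loader expects
```

Each accented Czech letter occupies one byte in the codepage and two bytes in UTF-8, so `utf8_bytes` is longer than `legacy_bytes` while decoding to the same text.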
PeterWood
30-Oct-2009
[19321]
Yes on Czech machines ..... I think it's Windows codepage 1250. I 
believe the default codepage on most US machines is 1252 (MS's extended 
version of ISO-8859-1).
Maxim
30-Oct-2009
[19322]
ok yeah a few different diacritics between those two encodings
Pekr
30-Oct-2009
[19323]
how do you approach the situation, if your script would contain two 
strings, in different encodings? Can it practically happen?
Maxim
30-Oct-2009
[19324]
R3 will interpret literal strings and decode them using utf-8 (or 
the header encoding, if it's supported) so in this case no.


but if the data is stored within binaries (equivalent to R2 which 
doesn't handle encoding) then, yes, since the binary represents the 
sequence of bytes not chars.


if you use a utf-8 editor, and type characters above 127 and look 
at them in  notepad, you will then see the UTF-8 byte sequences (which 
will look like garbled text, obviously).
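
What "garbled" looks like can be shown with a short Python sketch, assuming the non-Unicode editor interprets the file as Windows-1252 (variable names are illustrative only):

```python
# A character above 127 becomes a two-byte sequence in UTF-8: C3 A9.
utf8_bytes = "é".encode("utf-8")

# An editor that treats each byte as one Windows-1252 character shows two glyphs.
as_seen_in_ansi_editor = utf8_bytes.decode("cp1252")

# Plain ASCII is unaffected, since its bytes are the same in both encodings.
ascii_seen = "cafe".encode("utf-8").decode("cp1252")
```

Here `as_seen_in_ansi_editor` is "Ã©", while `ascii_seen` is still "cafe".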
Pekr
30-Oct-2009
[19325]
Is there a utf-8 version of notepad? :-)
Maxim
30-Oct-2009
[19326]
I don't know if R3 has a way of specifying the encoding literally... 
like  UTF8{}  UTF16{}  or WIN1252{} ... this would be nice.
PeterWood
30-Oct-2009
[19327]
A script could have two different encodings if differently encoded 
files were included. For example, you could use a script from Rebol.org 
in one of your scripts. You probably use Windows Code Page 1250 but 
most scripts in the library use other encodings.


This doesn't give big problems as most of the code in the Library 
is "pure" ASCII
Maxim
30-Oct-2009
[19328]
I use uedit which handles unicode natively when you want to... a 
lot of preferences for it ...
PeterWood
30-Oct-2009
[19329]
Notepad can apparently handle both UTF-8 and UTF-16 http://en.wikipedia.org/wiki/Notepad_(Windows)
Maxim
30-Oct-2009
[19330]
it tries to detect UTF based on text content... broken up until vista.
http://en.wikipedia.org/wiki/Notepad_%28Windows%29
Carl
30-Oct-2009
[19331]
Ok, so... no one reads the wiki.  That's ok... we're all developers. 
We don't read things other than code.

So, here's a summary of R3 and Unicode:

http://www.rebol.net/r3blogs/0286.html
Gabriele
31-Oct-2009
[19332]
Max: maybe you should start using a real operating system. But, that 
aside, if you have any software that does not handle utf-8, simply 
trash it. guys, really, this is crazy, we are in 2009, let's put 
an end to this codepage crap!