World: r3wp
[!REBOL3-OLD1]
Maxim 11-Sep-2009 [17472] | 0 bytes meaning bytes with the value 0, which act as null terminators in C land. |
Dockimbel 11-Sep-2009 [17473] | Issue reproduced here, it seems related to unicode strings output by your script. |
Pekr 11-Sep-2009 [17474] | hmm, strange. What can I do about it? IE displays chars correctly, the output in FF is weird, and I can't correct it by changing charset to any other setting ... |
Dockimbel 11-Sep-2009 [17475] | 11/9-16:01:10.375-[uniserve] Output => {C^@o^@n^@t^@e^@n^@t^@-^@t^@y^@p^@e^@ G^@E n^@o^@ 1^@2^@7^@.^@0 } |
Pekr 11-Sep-2009 [17476] | C?o?n?t?e?n?t?-?t?y?p?e? m?a?k?e? ?o?b?j?e?c?t?!? ?[? ? ? ? ? ?r?e?q?u?e?s?t?-?m?e?t?h?o?d?:? ?"?G?E?T?"? ? ? ? ? ?q? |
Maxim 11-Sep-2009 [17477] | the header MUST be printed out in ASCII. |
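
The `C^@o^@n^@...` pattern in the UniServe log line above is what ASCII text looks like when each character is written as two bytes. A minimal Python sketch, assuming the output was UTF-16 little-endian (the thread suspects this but does not confirm it); `caret_dump` is a hypothetical helper that renders NUL bytes the way the log does:

```python
def caret_dump(data: bytes) -> str:
    """Render NUL bytes as ^@, the notation used in the log line."""
    return "".join("^@" if b == 0 else chr(b) for b in data)

header = "Content-type"

# Assumption: the CGI output was encoded as UTF-16LE (two bytes per character).
print(caret_dump(header.encode("utf-16-le")))  # C^@o^@n^@t^@e^@n^@t^@-^@t^@y^@p^@e^@

# The same header as plain ASCII contains no NUL bytes.
print(caret_dump(header.encode("ascii")))      # Content-type
```

A server or browser parsing the first form as a single-byte header sees a NUL after every character, which would explain the `?` characters in the browser output.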
Pekr 11-Sep-2009 [17478x2] | Do we have any string encodings in R3 already? |
Max - following blog does not imply that. Why should I do it on my localhost? It properly knows the codepage, etc.? http://www.rebol.net/r3blogs/0182.html | |
Maxim 11-Sep-2009 [17480x3] | it being so old, it's possible the default encoding was still askin at that point. |
askin = ascii | |
AFAIK unicode -> ascii is possible in R3, but I don't know how, not having done it myself. IIRC it's on the R3 wiki or docs pages somewhere... googling it should give you some clues. | |
Pekr 11-Sep-2009 [17483x2] | REBOL 3.0 accepts UTF-8 encoded scripts, and because UTF-8 is a superset of ASCII, that standard is also accepted. If you are not familiar with the UTF-8 Unicode standard, it is an 8 bit encoding that accepts ASCII directly (no special encoding is needed), but allows the full Unicode character set by encoding them with characters that have values 128 or greater. |
It should accept Ascii directly .... | |
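
The quoted claim that UTF-8 is a superset of ASCII can be checked at the byte level. A short Python sketch (the sample strings are illustrative, not from the thread):

```python
ascii_text = "Content-type: text/html"

# Pure ASCII text produces identical bytes under both encodings.
print(ascii_text.encode("utf-8") == ascii_text.encode("ascii"))  # True

# A non-ASCII character (Czech "č", U+010D) becomes a multi-byte sequence
# in which every byte has a value of 128 or greater.
encoded = "č".encode("utf-8")
print(encoded.hex())                    # c48d
print(all(b >= 128 for b in encoded))   # True
```

This only describes the encoding itself. It says nothing about which encoding a given R3 build uses when it writes to stdout, which is the open question in the thread.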
Maxim 11-Sep-2009 [17485x4] | that's on input. |
print spits out unicode. | |
AFAIK | |
string! printing, to be more precise. UTF and ASCII are converted to two-byte strings IIRC, which is why you must re-encode them before spitting them out via print. | |
Pekr 11-Sep-2009 [17489] | see the system/catalog/codecs for a list of loaded codecs - hmm, docs need an update. Dunno why the section was moved to system/codecs ... will ask on R3 chat ... |
PeterWood 11-Sep-2009 [17490] | Max - I believe that Carl has written some tricky string code and strings can be either single or double byte depending on their content. |
Maxim 11-Sep-2009 [17491] | possible, but I've always seen them output as double byte... this topic has come around a few times in the last months |
PeterWood 11-Sep-2009 [17492] | Running R3 from the Mac terminal the output from the print function is definitely utf-8 encoded. |
Pekr 11-Sep-2009 [17493] | I tried to look-up some codecs, but there are none for text encodings as of yet: SYSTEM/CODECS is an object of value: bmp object! [entry title name type suffixes] gif object! [entry title name type suffixes] png object! [entry title name type suffixes] jpeg object! [entry title name type suffixes] |
PeterWood 11-Sep-2009 [17494] | I think that to binary! will encode a Rebol string! as utf-8 : >> to binary! "^(20ac)" ;; Unicode code point for Euro sign == #{E282AC} ;; utf-8 character sequence for Euro sign |
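
The byte sequence in Peter's example matches the standard UTF-8 encoding of U+20AC. A Python cross-check of the same code point, with the UTF-16LE form shown for comparison:

```python
euro = "\u20ac"  # Euro sign, the same code point as "^(20ac)" in the REBOL example

print(euro.encode("utf-8").hex().upper())      # E282AC
print(euro.encode("utf-16-le").hex().upper())  # AC20

# Decoding the UTF-8 bytes round-trips to the original character.
print(bytes.fromhex("E282AC").decode("utf-8") == euro)  # True
```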
Maxim 11-Sep-2009 [17495x3] | maybe peter's excellent encoding script on rebol.org could be used as a basis for converting between ascii -> utf8 when using R3 binary as an input. while R3 has them built-in |
while = until | |
sort of like: print to-ascii to-binary "some text" | |
Pekr 11-Sep-2009 [17498] | I don't want to encode anything for simple CGI purposes, gee ;-) |
Maxim 11-Sep-2009 [17499x2] | but R3 is now fully encoded, which is REALLY nice. you don't have a choice. Resistance is futile ;-) |
and the fact that binary gives us the real byte array without any automatic conversion is also VERY nice, for building tcp handlers... it would have made my life much simpler in the past in fact. | |
Pekr 11-Sep-2009 [17501x2] | But this is some low level issue I should not care about. It displays Czech codepage correctly. Also the script is said to be UTF-8 by default, which is a superset of ASCII. IIRC it was said that as long as we don't use special chars, it will work transparently. If it works on input, it should work also on output, no? |
OK, so we have http headers, which are supposed to be in ASCII, and then html content, which can be encoded. Whose responsibility is it to provide correct encoding? A coder, or an http server? Hmm, maybe coder, as I am issuing http content headers in my scripts? | |
PeterWood 11-Sep-2009 [17503] | Pekr: Just try a quick test with: print to binary! "Content-type: text/html^/" print to binary! get-env "REQUEST_METHOD" print to binary! get-env "QUERY_STRING" print to binary! get-env "REMOTE_ADDR" to see if it is an encoding problem. |
Pekr 11-Sep-2009 [17504x2] | I think I tried, but it printed binaries ... |
#{436F6E74656E742D74797065 #{474 #{ #{3132372E3 #{0 | |
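
The first hex run in that output can be decoded to see what bytes `to binary!` actually produced. A Python sketch; `parse_rebol_binary` is a hypothetical helper, and the closing brace on the literal is an assumption, since the pasted output is truncated:

```python
def parse_rebol_binary(literal: str) -> bytes:
    """Turn a molded REBOL binary literal such as #{4142} into raw bytes."""
    if not (literal.startswith("#{") and literal.endswith("}")):
        raise ValueError("expected a literal of the form #{...}")
    return bytes.fromhex(literal[2:-1])

# First hex run from the output above, with the closing brace assumed.
raw = parse_rebol_binary("#{436F6E74656E742D74797065}")
print(raw.decode("ascii"))  # Content-type
print(b"\x00" in raw)       # False
```

The bytes are single-byte ASCII with no NULs. What reached the browser, though, was the molded text `#{...}` rather than the raw bytes, so this test could not serve as a working header.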
Maxim 11-Sep-2009 [17506] | but the loading actually does a re-encoding. utf-8 is compact, but it's slow because you cannot skip unless you traverse the string char by char. which is why they are internally converted to 8 or 16 bit unicode chars... it seems strings become 16 bits a bit too often (maybe a change in later releases, where they are always converted to 16 bits for some reason). |
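
The indexing trade-off Maxim describes can be sketched in Python. Both function names are hypothetical, and the fixed-width version assumes every character fits in one 16-bit unit (no surrogate pairs):

```python
def nth_char_utf8(data: bytes, n: int) -> str:
    """Find character n by scanning from the start and counting lead bytes."""
    index = -1
    start = None
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:          # not a 10xxxxxx continuation byte
            index += 1
            if index == n:
                start = pos
            elif index == n + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return data[start:].decode("utf-8")

def nth_char_utf16(data: bytes, n: int) -> str:
    """Jump straight to character n; valid only when each char is 2 bytes."""
    return data[2 * n:2 * n + 2].decode("utf-16-le")

text = "a€b"
utf8, utf16 = text.encode("utf-8"), text.encode("utf-16-le")
print(len(utf8), len(utf16))     # 5 6
print(nth_char_utf8(utf8, 1))    # €
print(nth_char_utf16(utf16, 1))  # €
```

UTF-8 is smaller here (5 bytes against 6), but reaching character `n` requires a scan, while the fixed-width form computes the offset directly.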
PeterWood 11-Sep-2009 [17507x2] | The content of the binaries is fine but their format is a problem. Sorry, I forgot about that when I suggested to try them. |
I tested your show.cgi with Apache on OS X. It runs fine and displays the expected output GET 10.0.1.198 | |
Pekr 11-Sep-2009 [17509] | Should I test with Apache too? I don't think Cheyenne is the problem though. But I already downloaded WAMP, so I will unpack it and check over the weekend ... |
Maxim 11-Sep-2009 [17510x5] | possibly the windows version defaults to 16 bits more quickly than linux and OSX versions... :-/ |
cause IIRC linux shell doesn't expect unicode as much as the Windows console. | |
(as per a past reading on R3 blogs and previous discussions about this) | |
probably why people say that cgi isn't working on windows. | |
or maybe the windows console (or some versions of the OS) doesn't understand utf-8 at all, just 8 or 16 bit unicode... so that could explain why the windows version is dumping to stdout in 16 bits all the time. :-( | |
PeterWood 11-Sep-2009 [17515] | As I understand it the Windows console only handles single-byte encoding (ie Windows CodePages). |
BrianH 11-Sep-2009 [17516] | Windows Unicode works in UTF-16. Linux and OSX work in UTF-8. |
PeterWood 11-Sep-2009 [17517] | Pekr: One difference when I ran the cgi was that I used the -c option not the -q option. Perhaps you could try with the -c option in case Carl has done something under the surface about character encoding. |
Pekr 11-Sep-2009 [17518] | Peter - it is the same for both options -c, and -q ... |
BrianH 11-Sep-2009 [17519] | When last I heard, CGI wasn't working on Windows yet. Thanks for the info - now I know why. |
Maxim 11-Sep-2009 [17520x2] | yep its pretty clear now :-) |
maybe a cgi-specific version of print could be added as a mezz which handles the proper encoding issues to make sure that console and cgi printing are both functional on all distros without needing to change the source. | |
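
The idea of a CGI-specific print can be sketched outside REBOL. The Python below is not an R3 mezzanine and does not reflect any actual R3 API. `cgi_print` and its parameters are hypothetical names. It encodes the headers as ASCII and the body in a declared charset, then writes raw bytes so the platform's console encoding never gets involved:

```python
import io
import sys

def cgi_print(headers, body, stream=None, charset="utf-8"):
    """Write a CGI response as raw bytes: ASCII headers, blank line, encoded body."""
    out = stream if stream is not None else sys.stdout.buffer
    for name, value in headers:
        # Headers must be plain ASCII; this raises if they are not.
        out.write(f"{name}: {value}\r\n".encode("ascii"))
    out.write(b"\r\n")
    out.write(body.encode(charset))

# Usage: capture the response in memory to inspect the exact bytes.
buffer = io.BytesIO()
cgi_print([("Content-type", "text/html; charset=utf-8")], "<p>Příliš</p>", stream=buffer)
print(buffer.getvalue())
```

Writing bytes rather than text is the design choice that matters: the same call then behaves identically whether the host console prefers UTF-16 or UTF-8.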