World: r3wp

[Web] Everything web development related

Oldes
13-Feb-2006
[1077]
I administer some pages where a lot of text articles are published. 
Because 50% of the traffic comes from robots such as the Google crawler, 
I'm thinking that I could serve the content of the page in Rebol 
format (a block). The robot would get the text for indexing and I would 
reduce the amount of data transferred with each robot request, because 
I wouldn't need to generate the design and some page parts that are not 
important for a robot. What do you think, should I include a Rebol 
header?
Sunanda
13-Feb-2006
[1078]
That's a form of cloaking. Google does not like cloaking, even "white 
hat" cloaking of the sort you are suggesting:
http://www.google.com/support/bin/answer.py?answer=745


Better to respond to Google's if-modified-since header -- it may 
reduce total bandwidth by a great deal:
http://www.google.com/webmasters/guidelines.html


Also consider supplying a Google Sitemap -- and that can have modification 
dates embedded in it too. It may reduce googlebot's visits to older 
pages
http://www.google.com/webmasters/sitemaps/login
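
A minimal sketch of the if-modified-since idea, in Python rather than REBOL. The function name `conditional_response` and the header dictionary are hypothetical; the sketch assumes the server knows each article's last modification time as a UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(request_headers, last_modified, body):
    """Return (status, headers, body), honouring If-Modified-Since.

    last_modified must be an aware datetime in UTC (an assumption of this sketch).
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    raw = request_headers.get("If-Modified-Since")
    if raw:
        try:
            since = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            since = None  # unparseable date: fall through to a full response
        if since is not None and since.tzinfo is not None and last_modified <= since:
            # The crawler's copy is still current: send headers only, no body.
            return 304, headers, b""

    return 200, headers, body
```

A crawler that sends the date of its previous visit gets an empty 304 reply for every unchanged article, which is where the bandwidth saving comes from.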
Oldes
13-Feb-2006
[1079]
But it's not just Google that is crawling; at the moment I recognize 
11 crawlers that check my sites regularly.
Sunanda
13-Feb-2006
[1080]
Some of them are just bad -- ban them with a robots.txt


Some (like MSNbot) will respond to the (non-standard) crawl-delay 
in robots.txt: that at least keeps them coming at a reasonable speed.


Some are just evil and you need to ban their IP address by other 
means, like flood control or .htaccess

REBOLorg has a fairly useful robots.txt
http://www.rebol.org/robots.txt
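
As a rough illustration of the two robots.txt techniques above (banning a bot outright and asking another to slow down), here is a sketch that checks a sample file with Python's standard `urllib.robotparser`. The bot name `BadBot`, the delay of 10 seconds and the paths are made up for the example; they are not taken from REBOL.org's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one banned bot, one paced bot, a default rule.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: msnbot
Crawl-delay: 10
Disallow: /cgi-bin/

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def policy(agent, path="/articles/1.html"):
    """Return (may_fetch, crawl_delay_in_seconds_or_None) for a user agent."""
    return parser.can_fetch(agent, path), parser.crawl_delay(agent)
```

This only shows what a well-behaved bot would conclude from the file; a bot that ignores robots.txt still has to be blocked by other means.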
Oldes
13-Feb-2006
[1081]
So you think I should not serve a different (less rich) version of 
the page to robots?
Sunanda
13-Feb-2006
[1082]
You could try that as a first step:

-- Create a robots.txt to ban the *unwelcome* bots who visit you 
regularly.

-- Many bots have a URL for help, and that'll tell you if they honour 
crawl-delay....If so, you can get some of the bots you like to pace 
their visits better.

If that doesn't work: you have to play tough with them.
Oldes
13-Feb-2006
[1083]
I don't need to ban them:) I would prefer to play with them:) Never 
mind, I will probably make the Rebol formatted output anyway. If I 
have RSS output, why not have REBOL output as well? Maybe it could 
be used in the future, when Rebol is able to display rich text.
Sunanda
13-Feb-2006
[1084]
Check if you can turn HTTP compression on with your webserver. It 
saves bandwidth with visitors who are served the compressed version.
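
To give a feel for what HTTP compression does, here is a small sketch using Python's standard `gzip` module. The helper `maybe_compress` is hypothetical and simplified: it ignores quality values such as `gzip;q=0`, and in practice the webserver itself would normally handle this.

```python
import gzip


def maybe_compress(request_headers, body):
    """Return (extra_headers, body), gzip-compressed if the client accepts it."""
    accept = request_headers.get("Accept-Encoding", "")
    # Split "gzip, deflate;q=0.5" into bare encoding names.
    encodings = [token.split(";")[0].strip().lower() for token in accept.split(",")]
    if "gzip" in encodings:
        extra = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return extra, gzip.compress(body)
    # Client did not advertise gzip support: send the body unchanged.
    return {}, body
```

Text articles tend to compress well, so visitors that send `Accept-Encoding: gzip` receive noticeably fewer bytes, while the others still get the plain version.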
Oldes
13-Feb-2006
[1085]
The bandwidth is not such a problem now:) I was just thinking if 
it could be used somehow to make Rebol more visible.
Sunanda
13-Feb-2006
[1086]
Having REBOL formatted output is / can be a good idea: REBOL.org 
will supply its RSS that way if you ask it nicely:

http://www.rebol.org/cgi-bin/cgiwrap/rebol/rss-get-feed.r?format=rebol

But *automatically* supplying a different version to a bot than that 
you would show to a human is called cloaking and the search engines 
don't like it at all.

If they spot what you are doing, they may ban you from their indexes 
completely.
Oldes
13-Feb-2006
[1087x3]
Do you specify content-type if you produce the output? It doesn't 
look goot if you open it in browser, I should look better than XML 
for newbies.
(... good ... IT should :)
I hope I'm looking better than XML :)))
Sunanda
13-Feb-2006
[1090]
Yes.

If you clicked the link I gave above, then you saw a page served 
as text/html  [probably should be text/plain -- so I've changed it]
If you try format=rss then you get a page served as text/xml


In both cases, the output is not meant for humans: one format is 
for REBOL and one for RSS readers.
Oldes
13-Feb-2006
[1091]
yes, now it's ok:)
Sunanda
13-Feb-2006
[1092]
Good to know, thanks.
Sometimes changes like that break in other browsers.
Oldes
13-Feb-2006
[1093x2]
I know it's now for human readed, but for example this chat is public 
and if someone clicks on the link, it now looks much better. 
Don't forget that Rebol should be human friendly:)
(I should not write in such a dark:) now = not  readed = readers 
:)
Sunanda
13-Feb-2006
[1095]
As I said, the RSS feed is explicitly intended to feed data to other 
programs for formatting, so it doesn't (perhaps can't) look nice.

All the info is available in human friendly ways elsewhere on the 
site, eg:

script library changes: http://www.rebol.org/cgi-bin/cgiwrap/rebol/script-index.r
Oldes
13-Feb-2006
[1096]
yes, no problem. And on the issue with the bots: if the bot doesn't support 
cookies (none of them does), I can give him whatever I want. I'm not 
cheating; I can just assume that it's something like LYNX and serve 
him pure text pages:) And if he doesn't like it, it's his problem 
(or its?) And as for the robots.txt file: ugly bots will not respect 
robots.txt anyway :)
Sunanda
13-Feb-2006
[1097]
Some bots (the more evil ones) have an even more evil human at their 
side. Those bots can handle cookies, and will also use the human 
to step them through any logon procedures.
So, technically, yes: bots can use cookies.


But on the cloaking issue: if you show the *same* content to any 
visitor that does not use cookies then that is *not* cloaking, even 
if you serve different content to those that do. So no problem there.
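
The distinction can be sketched as follows: the choice of page variant depends only on whether the request carries cookies, never on who the visitor claims to be. The function name `choose_version` and the variant labels are hypothetical.

```python
def choose_version(request_headers):
    """Pick a page variant from the presence of cookies alone.

    The User-Agent header is deliberately not consulted, so a crawler and
    a cookie-less human browser (e.g. Lynx) receive exactly the same page.
    """
    has_cookies = bool(request_headers.get("Cookie", "").strip())
    return "rich" if has_cookies else "plain"
```

A search engine bot and a human without cookies both land in the "plain" branch, which is what keeps this on the right side of the cloaking rule described above.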
Anton
14-Feb-2006
[1098x2]
How does one find out where the latest "official" syntax for URIs 
is? For example, I'm looking at
http://www.w3.org/Addressing/rfc1808.txt
This seems more up to date:
http://www.gbiv.com/protocols/uri/rfc/rfc3986.html
JaimeVargas
14-Feb-2006
[1100]
Kudos to Yahoo!, who today released two pieces of goodness into the 
commons. The first is their UI library, and the second is their Design 
Patterns Library. The UI Library is a collection of DHTML/Ajax/Javascript 
(pick your favourite term) controls and widgets. The Design Patterns 
Library is "intended to provide Web designers prescriptive guidance 
to help solve common design problems on the Web". 

- http://developer.yahoo.net/yui/
- http://developer.yahoo.net/ypatterns/
Anton
15-Feb-2006
[1101]
read/custom - can it send more than one cookie at a time ?
Oldes
15-Feb-2006
[1102x3]
of course it can
I use this script: do http://box.lebeda.ws/~hmm/rebol/cookies-daemon_latest.r
to handle cookies
just run it and then I can read pages and cookies are processed 
automatically
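
At the HTTP level, several cookies travel together in a single Cookie request header, separated by "; ". The sketch below shows that in Python; it does not use REBOL's read/custom or the cookies-daemon script, and the cookie names are invented for the example.

```python
from http.cookies import SimpleCookie


def cookie_header(cookies):
    """Join several name/value pairs into one Cookie request header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def parse_cookie_header(header):
    """Split a Cookie header value back into a plain dict."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


# Two hypothetical cookies sent in one request header.
header = cookie_header({"session": "abc123", "lang": "en"})
```

Whatever mechanism sets the header, the server sees one Cookie line and splits it back into individual cookies.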
Anton
15-Feb-2006
[1105]
Ok, thank you, I will try that.
Anton
23-Feb-2006
[1106]
Thank you Oldes, it seems to be working.
Thør
2-Apr-2006
[1107]
.
Pekr
4-Apr-2006
[1108]
Hi ... I have the following task to accomplish ..... my friend, who is 
studying archaeology at college, is working on her thesis. Part of the 
thesis consists of images of various ancient goods, so we've got photos 
from our digital camera. I will produce a small View script which will 
allow her to enter comments for each image. Now I want to do a template 
(table?) with various layouts, mainly two images per A4 page plus 
comments under each of those images. Can I influence table cell size? 
My past experience was that the cell got resized according to the image. 
Are there also various methods to "stretch" the image in a cell? 
thanks a lot for pointers ...
Sunanda
4-Apr-2006
[1109]
You mean HTML tables?

The cell has a height and width, and the image has a height and width.

You probably need to set both height and width on both the cell and 
the image.
Probably easiest with CSS 
Remember to set the padding and margin to zero.

And remember that IE6 and lower handles this differently to other 
browsers, so it's not easy to get pixel-perfect borders and so on.
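
One way to sketch this is to generate the pages with a script and fix the sizes in CSS. The Python below is an illustration only: the sizes (17cm by 10cm), class names and file names are assumptions, and it uses max-width/max-height on the image so the photo keeps its proportions instead of being stretched.

```python
from html import escape

# Fixed cell sizes in real-world units, with padding and margin set to zero.
CSS = """\
table.page { border-collapse: collapse; page-break-after: always; }
table.page td { width: 17cm; padding: 0; margin: 0; text-align: center; }
table.page td.photo { height: 10cm; }
table.page img { max-width: 17cm; max-height: 10cm; }
"""


def build_pages(photos, per_page=2):
    """Turn a list of (filename, comment) pairs into one HTML document."""
    pages = []
    for start in range(0, len(photos), per_page):
        rows = []
        for filename, comment in photos[start:start + per_page]:
            src = escape(filename, quote=True)
            rows.append(
                f'<tr><td class="photo"><img src="{src}" alt=""></td></tr>\n'
                f"<tr><td>{escape(comment)}</td></tr>"
            )
        pages.append('<table class="page">\n' + "\n".join(rows) + "\n</table>")
    return (
        "<html><head><style>\n" + CSS + "</style></head><body>\n"
        + "\n".join(pages) + "\n</body></html>"
    )
```

Each table holds two photos with their comments and asks the browser for a page break afterwards; how faithfully that prints still depends on the browser.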
Graham
4-Apr-2006
[1110]
why is she doing her thesis in html??
Pekr
4-Apr-2006
[1111x4]
she is not ....
but appendix to thesis is some 90 photos, sorted, commented ..
I have those photos, numbered jpegs. I will provide her a View script 
to put comments in there, and then I want to automatically generate 
the content ....
uh, the photos - there are many more - some 230 photos ... mostly 
two per page ... you don't want her to do it manually in Word, right? 
:-) That's why the automation - imo a good job for rebol :-)
Graham
4-Apr-2006
[1115]
yeah .. write the script in rebol to make pdfs
Pekr
4-Apr-2006
[1116]
pdfs? hmm, that did not come to my mind - good idea .... is that 
difficult to do with pdf-maker?
Graham
4-Apr-2006
[1117]
Yes, easy enough from memory.
Pekr
4-Apr-2006
[1118x6]
where is the latest pdf-maker?
what a fight for a novice like me to get two damned stupid images with 
two text descriptions under them to print on one A4 ....
each browser is displaying it differently. Mozilla print preview 
stinks, it even generates some strange chars which are not part of 
original document ...
on Friday, I am sitting with IBM guys, they want to show me XForms, 
as they bought one of the companies involved in XForms doc products, 
but unless someone fixes browsers to simply display one single doc 
in one single way anywhere, it still sucks
So far PDF wins all over here with me ....
last week we finished upgrade of SAP after 5 years .... I saw some 
initial doc done in XML, XSLT etc. .... Firefox was not able to display. 
Imo the thing is, that SAP supports IE only ... what a world ....
Sunanda
4-Apr-2006
[1124]
Browsers aren't meant to display things pixel perfect.

They are designed to pour the content into the shape the user wants.

If I have a 180x360 monochrome phone, I should still be able to see 
HTML-mediated content in a reasonable way.

PDF is a way of replicating what a sheet of paper does. It does it 
well, but it is an outdated concept.

Of course, browsers are also full of bugs which doesn't help.
ScottT
5-Apr-2006
[1125x2]
browsers actually do a good job of pixel-perfect, but printers don't 
do pixels.  using real-world css dimensions, like cm or pt etc. will 
translate between device contexts.  Anyway.  I don't envy the task. 
 


Anyway, I have been messing with embedding REBOL in client-side code, 
which is working pretty well: http://eisic.ws/ext/r/Document2.plugin.r.html


I need to figure out how to keep REBOL from bailing out on me, though. 
 generally, if the console pops up, I have to refresh the page.  
For instance,  any print will pop up the console, and I would really 
rather not pop up the console from the page, because closing it destroys 
the REBOL instance.
actually, I guess this should be in the plugin group.  moving there.