r3wp [groups: 83 posts: 189283]

World: r3wp

[Web] Everything web development related

Graham
11-May-2010
[1728]
ie. it remains in the revision history
Andreas
11-May-2010
[1729x2]
the history pages are not supposed to be indexed
and they have `<meta name="robots" content="noindex,nofollow" />` 
in their head to that effect, so simply reverting the change should 
be fine
Graham
11-May-2010
[1731]
Except they are being read by my crawler ..
Andreas
11-May-2010
[1732]
the history pages being _read_ is fine, their content being _indexed_ 
is not :)
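[Editor's sketch] The read-versus-index distinction Andreas describes can be illustrated as follows. This is a minimal sketch, not how Graham's crawler or the wiki is implemented: the class and function names are hypothetical, and only the meta tag itself is taken from the log. A crawler has to fetch (read) a page to see the tag at all, and only then decides whether to index it or follow its links.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_policy(html):
    """Return (may_index, may_follow) for a page that has already been fetched."""
    parser = RobotsMetaParser()
    parser.feed(html)
    blocked_all = "none" in parser.directives
    may_index = not blocked_all and "noindex" not in parser.directives
    may_follow = not blocked_all and "nofollow" not in parser.directives
    return may_index, may_follow


# The meta tag below is the one quoted in the log; the rest of the page is made up.
history_page = (
    '<html><head><meta name="robots" content="noindex,nofollow" />'
    "</head><body>old revision</body></html>"
)
print(robots_policy(history_page))  # (False, False)
```

A well-behaved crawler would still download such a history page, but would neither add it to the index nor queue its links.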
Graham
11-May-2010
[1733x4]
Perhaps it's just indexing the user names which are also drug names!

http://129.33.196.33/search/?index=Default&query=albendazole&queryTimeout=3000&ref=http%3A%2F%2F129.33.196.33%3A80%2Fsearch%2F%3Fquery%3Dalbendazole%26queryTimeout%3D3000%26index%3DDefault
Should I exclude these http://www.rebol.net/w/index.php*??
excluding http://www.rebol.net/w/*
Finished now at 27k pages ...
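[Editor's sketch] A wildcard exclusion such as the one above can be approximated with shell-style pattern matching. This is only an illustration under assumptions: the log does not say how the search engine evaluates its crawl-space rules, and `in_crawl_space` and the sample URLs are hypothetical.

```python
from fnmatch import fnmatchcase

# The pattern is the one quoted in the log; everything else is illustrative.
EXCLUDE_PATTERNS = ["http://www.rebol.net/w/*"]


def in_crawl_space(url, exclude_patterns=EXCLUDE_PATTERNS):
    """Return True when no exclusion pattern matches the URL."""
    return not any(fnmatchcase(url, pattern) for pattern in exclude_patterns)


# Hypothetical URLs, used only to exercise the rule.
print(in_crawl_space("http://www.rebol.net/w/index.php?title=Foo&action=history"))  # False
print(in_crawl_space("http://www.rebol.net/wiki/Main_Page"))  # True
```

Note that in this matching style `?` and `[` are wildcards too, so a pattern like `index.php*??` would not treat the question marks literally.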
Maxim
11-May-2010
[1737]
you should also index other sites, like www.codeconscious.com/rebol, 
which has the best complementary View information out there.

it has helped me on sooo many occasions.
Graham
11-May-2010
[1738x2]
Guess I could do ... but this is just to see if the engine is good 
enough.
I've started to crawl Brett's site .. hope he doesn't mind!
Maxim
11-May-2010
[1740]
that was an example, but there are others... reboltutorial, Nick's 
learn programming site, Olde's flash site, rebol.org, rebol weekly 
news links, etc etc... I think that having a unique source for all 
of that rich rebol content is very useful for everyone.
Graham
11-May-2010
[1741x5]
Well as I said someone has to decide if the quality of the search 
engine is good enough or not.
And if so, we need a permanent host for it
Anyone want to do some comparison searches between google and this?
Probably has to be Carl as he is the one with the issues!
Looks like it might be using Oracle as the DB ...
Andreas
12-May-2010
[1746x3]
well, one of carl's original issues looks just as bad with this search 
engine: http://129.33.196.33/search/?query=construct
I can't find Carl's desired http://www.rebol.com/r3/docs/functions/construct.html
at all in above results :)
Seems parts of the R3 docs are not (yet?) indexed: http://129.33.196.33/search/?query=url%3Aconstruct
Maxim
12-May-2010
[1749x2]
I looked at the html source and it should clearly float to the top. 
 strange... 

it's got everything needed to be scored high (title, H1, and many 
counts of construct in the page).
Andreas, you're right... same with using a title search with construct. 
 it returns nothing.
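[Editor's sketch] The on-page signals Maxim lists (title, H1, and number of occurrences) can be checked with a small script. This is a toy inspection under assumptions, not the engine's ranking function, which the log does not describe. The sample page is made up and is not the real construct.html.

```python
from html.parser import HTMLParser


class SignalCounter(HTMLParser):
    """Counts occurrences of a term overall and inside <title> / <h1> (hypothetical helper)."""

    TRACKED = ("title", "h1")

    def __init__(self, term):
        super().__init__()
        self.term = term.lower()
        self.open_tags = {tag: 0 for tag in self.TRACKED}
        self.counts = {"title": 0, "h1": 0, "total": 0}

    def handle_starttag(self, tag, attrs):
        if tag in self.open_tags:
            self.open_tags[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.open_tags and self.open_tags[tag] > 0:
            self.open_tags[tag] -= 1

    def handle_data(self, data):
        hits = data.lower().count(self.term)
        if not hits:
            return
        self.counts["total"] += hits
        for tag, depth in self.open_tags.items():
            if depth > 0:
                self.counts[tag] += hits


def page_signals(html, term):
    counter = SignalCounter(term)
    counter.feed(html)
    counter.close()
    return counter.counts


# Made-up stand-in for a documentation page.
page = (
    "<html><head><title>REBOL 3 Functions: construct</title></head>"
    "<body><h1>construct</h1><p>construct creates an object. "
    "Use construct with a block.</p></body></html>"
)
signals = page_signals(page, "construct")
print(signals)  # {'title': 1, 'h1': 1, 'total': 4}
```

A page with strong signals that still returns nothing, as here, points to a crawling or indexing problem rather than a ranking one.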
Graham
12-May-2010
[1751x9]
Last crawled: 31 December 1969 16:00:00.000 PST
Crawler status: 760 - Excluded by crawl space definition
Parser and index status: 0 - The document has not been added to the index.
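[Editor's sketch] The "Last crawled" value in that status output is the Unix epoch (timestamp 0) shown in PST, which is UTC-8. Reading that as "never crawled" is an assumption about the engine, though it is consistent with the "Excluded by crawl space definition" status. The conversion itself can be checked:

```python
from datetime import datetime, timedelta, timezone

# PST is UTC-8; the name "PST" is attached only for display.
PST = timezone(timedelta(hours=-8), "PST")

# Timestamp 0 is 1 January 1970 00:00:00 UTC.
unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
epoch_in_pst = unix_epoch.astimezone(PST)

print(epoch_in_pst.isoformat())  # 1969-12-31T16:00:00-08:00
```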
Looks like my rules were too tight
Hmm.. I had no exclusion rules for rebol.com ... you sure that there 
isn't a no robots directive higher in this path?
Adding http://www.rebol.com/r3/docs/....
if this construct page can't be found by any of the search engines 
... is there a no robots directive ?
Looking for construct now brings up http://www.rebol.com/r3/docs/functions/construct.html
as the top item
33.36k pages indexed
Collection is now 834 MB (5,138 documents)
5,183
Maxim
12-May-2010
[1760]
cool, we get both functions near the top (R2 & R3) so looks like 
the search engine is stepping up its results  :-)
Graham
12-May-2010
[1761x2]
Try searching for beer on this engine as opposed to google!
There's a lot to be said for a custom site specific search engine.
Graham
13-May-2010
[1763x3]
Try searching for "beer" alone ... I found out how to do suggested 
links
featured links
Looks like you can have a great deal of fun setting up the search 
engine parameters
Sunanda
14-May-2010
[1766x2]
If you can get to them from the root, then they are fair game, unless


.....they have a rel=nofollow......We have that on a few simply because 
they duplicate content (eg viewing a script, viewing a script in 
color, downloading a script)


....Mailing list -- best to index either the individual posts (http://www.rebol.org/ml-display-message.r) 
or the complete threads (http://www.rebol.org/ml-display-thread.r) 
but not both.


....you may get a __lot__ of duplication when spidering the AltME 
archive as every post has a URL, but we display in batches of 50.....So 
perhaps only spider URLs like
http://www.rebol.org/aga-display-posts.r?post=r3wp291xNNNN
when NNNN is 1, 51, 101, 151, etc....


....I think you have already indexed the ML on REBOL.net and 
Carl's latest 300 AltME messages, eg
   http://mail.rebol.net/cgi-bin/mail-list.r?msg=45305
   http://host4.altme.com/altweb/rebol3/chat771.html

It would be better _not_ to have indexed those; it just creates duplicates 
once you have indexed the equivalents on REBOL.org (especially as 
the AltME last 300 goes out of date so quickly).

Tell me what is unclear there!
[oops -- that was meant for Graham, privately]
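[Editor's sketch] Sunanda's rule for the AltME archive (only spider posts 1, 51, 101, 151, etc., since pages show batches of 50) can be written as a URL filter. This is a minimal sketch assuming the URL shape quoted in the message. The regular expression and function names are hypothetical, and `r3wp291` is just the group id from the example.

```python
import re

BATCH_SIZE = 50  # the archive displays posts in batches of 50, per the message above
POST_RE = re.compile(r"[?&]post=(r3wp\d+)x(\d+)$")


def is_batch_start(url, batch_size=BATCH_SIZE):
    """True for post numbers 1, 51, 101, ... which each open a new batch."""
    match = POST_RE.search(url)
    if not match:
        return False
    return (int(match.group(2)) - 1) % batch_size == 0


def batch_start_urls(group, last_post, batch_size=BATCH_SIZE):
    """List one URL per batch for a group, up to last_post."""
    base = "http://www.rebol.org/aga-display-posts.r?post="
    return [f"{base}{group}x{n}" for n in range(1, last_post + 1, batch_size)]


print(is_batch_start("http://www.rebol.org/aga-display-posts.r?post=r3wp291x51"))  # True
print(is_batch_start("http://www.rebol.org/aga-display-posts.r?post=r3wp291x50"))  # False
print(batch_start_urls("r3wp291", 120))
```

Either function could feed a crawler's seed list or its include rules, so that each batch page is fetched once instead of fifty times.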
Graham
14-May-2010
[1768x2]
I wondered if that were the case
I've already indexed the mailing list on rebol.net so I guess I should 
avoid ml-display-thread.r and ml-display-message.r
Sunanda
14-May-2010
[1770]
I think the REBOL.org archive is a better place to send people than 
the REBOL.net one for the ML (it has threads, not just messages in 
id sequence).
So perhaps index REBOL.org's then drop the REBOL.net one?
Graham
14-May-2010
[1771]
not sure how to drop something ...
Sunanda
14-May-2010
[1772]
Must be possible --- when pages or sites are removed, or if they later 
post robots.txt exclusions.
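[Editor's sketch] One way to find already-indexed pages that a later robots.txt excludes is to re-check the indexed URLs against the current file. This is a sketch under assumptions: the robots.txt content and the example.org URLs are invented for illustration, `urls_to_drop` is a hypothetical helper, and the log does not say how the engine itself handles removals.

```python
from urllib.robotparser import RobotFileParser


def urls_to_drop(indexed_urls, robots_txt, user_agent="*"):
    """Return the indexed URLs that the given robots.txt no longer allows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in indexed_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt posted after the site was first crawled.
robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
"""

# Hypothetical URLs already in the index.
indexed = [
    "http://example.org/cgi-bin/mail-list.r?msg=1",
    "http://example.org/index.html",
]

print(urls_to_drop(indexed, robots_txt))
# ['http://example.org/cgi-bin/mail-list.r?msg=1']
```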
Graham
14-May-2010
[1773]
interesting, I don't see it documented but it does use site:

site:rebol.org sunanda
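[Editor's sketch] A site-restricted query like this one can be turned into a search URL in the same form as the links earlier in the log. The endpoint and the `query` parameter are taken from those links. The `search_url` helper is hypothetical, and it is an assumption that the engine accepts standard form encoding (spaces as `+`).

```python
from urllib.parse import urlencode

# Endpoint as it appears in the search links quoted earlier in the log.
SEARCH_ENDPOINT = "http://129.33.196.33/search/"


def search_url(terms, site=None):
    """Build a search URL, optionally restricted with the site: operator."""
    query = f"site:{site} {terms}" if site else terms
    return SEARCH_ENDPOINT + "?" + urlencode({"query": query})


print(search_url("sunanda", site="rebol.org"))
# http://129.33.196.33/search/?query=site%3Arebol.org+sunanda
```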
Graham
15-May-2010
[1774]
Interestingly this search engine has a REST interface so you can 
wrap your own custom search around it.  95k pages and still going 
...
Janko
6-Jun-2010
[1775]
has anyone done anything about OpenID with rebol?
Pekr
19-Jun-2010
[1776]
Any pointers on how to incorporate the PayPal payment method into one's 
website?
Robert
19-Jun-2010
[1777]
paypal has an example. It's pretty easy.