World: r3wp
[Web] Everything web development related
Graham 11-May-2010 [1728] | ie. it remains in the revision history |
Andreas 11-May-2010 [1729x2] | the history pages are not supposed to be indexed |
and they have `<meta name="robots" content="noindex,nofollow" />` in their head to that effect, so simply reverting the change should be fine | |
Graham 11-May-2010 [1731] | Except they are being read by my crawler .. |
Andreas 11-May-2010 [1732] | the history pages being _read_ is fine, their content being _indexed_ is not :) |
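The read-versus-index distinction Andreas describes can be sketched in code. This is only an illustration, not how Graham's crawler works: a crawler fetches the page, then checks the robots meta tag before deciding whether to index it or follow its links. The class and function names are hypothetical; only the Python standard library is used.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_index(html):
    """The page may be read, but must not be indexed if it says noindex."""
    return "noindex" not in robots_directives(html)


def may_follow_links(html):
    """Links on the page must not be followed if it says nofollow."""
    return "nofollow" not in robots_directives(html)
```

A page carrying the tag quoted above would be fetched as usual, but `may_index` and `may_follow_links` would both return `False` for it.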
Graham 11-May-2010 [1733x4] | Perhaps it's just indexing the user names which are also drug names! http://129.33.196.33/search/?index=Default&query=albendazole&queryTimeout=3000&ref=http%3A%2F%2F129.33.196.33%3A80%2Fsearch%2F%3Fquery%3Dalbendazole%26queryTimeout%3D3000%26index%3DDefault |
Should I exclude these http://www.rebol.net/w/index.php*?? | |
excluding http://www.rebol.net/w/* | |
Finished now at 27k pages ... | |
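An exclusion rule like the `http://www.rebol.net/w/*` one above can be approximated with shell-style wildcard matching. This is a rough sketch only: the rule syntax of the search engine Graham is testing is not shown in this log, so the matching semantics here are an assumption and `is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Pattern taken from the message above; "*" matches any run of characters,
# including "/" and "?".
EXCLUDE_PATTERNS = ["http://www.rebol.net/w/*"]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL matches any exclusion pattern."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)
```

With this rule, every wiki URL under `/w/` (history pages included) is skipped, while pages elsewhere on the same host are still crawled.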
Maxim 11-May-2010 [1737] | you should also index other sites, like www.codeconscious.com/rebol which has the best complementary View information out there. it has helped me on sooo many occasions. |
Graham 11-May-2010 [1738x2] | Guess I could do ... but this is just to see if the engine is good enough. |
I've started to crawl Brett's site .. hope he doesn't mind! | |
Maxim 11-May-2010 [1740] | that was an example, but there are others... reboltutorial, Nick's learn programming site, Olde's flash site, rebol.org, rebol weekly news links, etc etc... I think that having a unique source for all of that rich rebol content is very useful for everyone. |
Graham 11-May-2010 [1741x5] | Well as I said someone has to decide if the quality of the search engine is good enough or not. |
And if so, we need a permanent host for it | |
Anyone want to do some comparison searches between google and this? | |
Probably has to be Carl as he is the one with the issues! | |
Looks like it might be using Oracle as the DB ... | |
Andreas 12-May-2010 [1746x3] | well, one of carl's original issues looks just as bad with this search engine: http://129.33.196.33/search/?query=construct |
I can't find Carl's desired http://www.rebol.com/r3/docs/functions/construct.html at all in above results :) | |
Seems parts of the R3 docs are not (yet?) indexed: http://129.33.196.33/search/?query=url%3Aconstruct | |
Maxim 12-May-2010 [1749x2] | I looked at the html source and it should clearly float to the top. strange... it's got everything needed to be scored high (title, H1, and many counts of construct in the page). |
Andreas, you're right... same with using a title search with construct. it returns nothing. | |
Graham 12-May-2010 [1751x9] | Last crawled: 31 December 1969 16:00:00.000 PST; Crawler status: 760 - Excluded by crawl space definition; Parser and index status: 0 - The document has not been added to the index. |
Looks like my rules were too tight | |
Hmm.. I had no exclusion rules for rebol.com ... you sure that there isn't a no robots directive higher in this path? | |
Adding http://www.rebol.com/r3/docs/.... | |
if this construct page can't be found by any of the search engines ... is there a no robots directive ? | |
Looking for construct now brings up http://www.rebol.com/r3/docs/functions/construct.html as the top item | |
33.36k pages indexed | |
Collection is now 834 MB (5,138 documents) | |
5,183 | |
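On the question raised above of whether a no-robots directive higher in the path could be hiding a page: one way to check is to run the site's robots.txt through a parser. A minimal sketch using Python's standard `urllib.robotparser` follows. The rules and the example.org URLs in it are made up for illustration and are not rebol.com's actual robots.txt; `allowed_by_robots` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: a Disallow on a directory blocks everything below it.
SAMPLE_ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""
```

If a call such as `allowed_by_robots(SAMPLE_ROBOTS_TXT, "http://example.org/private/page.html")` returns `False`, a well-behaved crawler will never fetch that page, however relevant its content is.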
Maxim 12-May-2010 [1760] | cool, we get both functions near the top (R2 & R3) so looks like the search engine is stepping up its results :-) |
Graham 12-May-2010 [1761x2] | Try searching for beer on this engine as opposed to google! |
There's a lot to be said for a custom site specific search engine. | |
Graham 13-May-2010 [1763x3] | Try searching for "beer" alone ... I found out how to do suggested links |
featured links | |
Looks like you can have a great deal of fun setting up the search engine parameters | |
Sunanda 14-May-2010 [1766x2] | If you can get to them from the root, then they are fair game, unless they have a rel=nofollow. We have that on a few simply because they duplicate content (eg viewing a script, viewing a script in color, downloading a script). Mailing list -- best to index either the individual posts (http://www.rebol.org/ml-display-message.r) or the complete threads (http://www.rebol.org/ml-display-thread.r) but not both. You may get a __lot__ of duplication when spidering the AltME archive as every post has a URL, but we display in batches of 50. So perhaps only spider URLs like http://www.rebol.org/aga-display-posts.r?post=r3wp291xNNNN when NNNN is 1, 51, 101, 151, etc. I think you have already indexed the ML on REBOL.net and Carl's latest 300 AltME messages, eg http://mail.rebol.net/cgi-bin/mail-list.r?msg=45305 http://host4.altme.com/altweb/rebol3/chat771.html It would be better _not_ to have indexed those; it just creates duplicates once you have indexed the equivalents on REBOL.org (especially as the AltME last 300 goes out of date so quickly). Tell me what is unclear there! |
[oops -- that was meant for Graham, privately] | |
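Sunanda's suggestion of spidering only the first post of each batch of 50 can be sketched as below. The URL prefix comes from the message above. `last_post` (the highest post number to cover) and the function names are assumptions made for the example.

```python
# Prefix from the message above; the post number NNNN is appended to it.
BASE = "http://www.rebol.org/aga-display-posts.r?post=r3wp291x"
BATCH = 50  # posts are displayed in batches of 50


def is_batch_start(n, batch=BATCH):
    """True for post numbers 1, 51, 101, 151, ..."""
    return n >= 1 and (n - 1) % batch == 0


def batch_start_urls(last_post, base=BASE, batch=BATCH):
    """One URL per batch, so each post is spidered only once."""
    return [f"{base}{n}" for n in range(1, last_post + 1, batch)]
```

For example, `batch_start_urls(151)` yields four URLs, ending in 1, 51, 101 and 151, instead of 151 overlapping pages.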
Graham 14-May-2010 [1768x2] | I wondered if that were the case |
I've already indexed the mailing list on rebol.net so I guess I should avoid ml-display-thread.r and ml-display-message.r | |
Sunanda 14-May-2010 [1770] | I think the REBOL.org archive is a better place to send people than the REBOL.net one for the ML (it has threads, not just messages in id sequence). So perhaps index REBOL.org's then drop the REBOL.net one? |
Graham 14-May-2010 [1771] | not sure how to drop something ... |
Sunanda 14-May-2010 [1772] | Must be possible --- when pages or sites are removed, or if they later post robots.txt exclusions. |
Graham 14-May-2010 [1773] | interesting, I don't see it documented but it does support site: eg site:rebol.org sunanda |
Graham 15-May-2010 [1774] | Interestingly this search engine has a REST interface so you can wrap your own custom search around it. 95k pages and still going ... |
Janko 6-Jun-2010 [1775] | has anyone done anything about OpenID with rebol? |
Pekr 19-Jun-2010 [1776] | Any pointers on how to incorporate the PayPal payment method into one's website? |
Robert 19-Jun-2010 [1777] | paypal has an example. It's pretty easy. |