World: r4wp
[Rebol School] REBOL School
Sujoy 10-Oct-2012 [1222x2] | trying this right now... |
damn! no luck.
    >> ls
    BSD-License.txt  change-log.txt  clients/  docs/  handlers/  libs/  protocols/  services/  uni-engine.r
    >> uniserve-path: %./
    == %./
    >> do %uni-engine.r
    Script: "UniServe kernel" (17-Jan-2010)
    Script: "Encap virtual filesystem" (21-Sep-2009)
    == true
    >> uniserve/boot
    booya
    .
    http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/business/rss.xml
    ** Script Error: Cannot use path on none! value
    ** Where: process-task
    ** Near: if any [
        zero? shared/pool-max
        shared/pool-max > shared/pool-count
    ] [fork]
    either
DocKimbel 10-Oct-2012 [1224] | Ah, I stopped at "booya". :-) |
Sujoy 10-Oct-2012 [1225x2] | :) is this because pool-list is empty? i put in a debug "print" cmd in the on-new-client function of task-master.r, which is the only place i could see pool-list being appended to...but it seems the function is not called |
    on-new-client: has [job][
        ;added this line
        print client/remote-ip
        if client/remote-ip <> 127.0.0.1 [close-client exit]
        set-modes client [keep-alive: on]
        client/timeout: 15
        client/user-data: make task []
        ;only place where pool-list is appended to...
        append pool-list :client
DocKimbel 10-Oct-2012 [1227] | No, the issue is with 'shared being reset to 'none in %task-master... looks like a regression in Uniserve when running standalone... I'm looking into it. |
Sujoy 10-Oct-2012 [1228] | thanks doc! |
DocKimbel 10-Oct-2012 [1229] | In %reminder.r, you shouldn't use: scheduler/wait. Uniserve is already providing an event loop. You need to remove that line. |
Sujoy 10-Oct-2012 [1230x2] | ok... |
removed the scheduler/wait line... now:
    >> uniserve-path: %./
    == %./
    >> do %uni-engine.r
    Script: "UniServe kernel" (17-Jan-2010)
    Script: "Encap virtual filesystem" (21-Sep-2009)
    == true
    >> uniserve/boot
    booya
    ** Script Error: Invalid path value: server-ports
    ** Where: reform
    ** Near: mold any [uniserve/shared/server-ports port-id]
    >>
DocKimbel 10-Oct-2012 [1232] | I've just pushed a fix for that to Cheyenne SVN repo on Google code. |
Sujoy 10-Oct-2012 [1233] | thanks doc...downloading now... |
DocKimbel 10-Oct-2012 [1234] | From that, it seems to work until the job event is raised, then the server crashes (not sure if it's your code, scheduler or Uniserve that causes that). |
Sujoy 10-Oct-2012 [1235x3] | :( i'm actually trying to do something really simple. i have a bunch of feeds i want to download. i can do that sequentially (foreach feed feeds [...]), but thought it best to use background worker processes via task-master to download instead. is there an alternative? |
or a better way of writing this using uniserve? | |
this is what i get with the latest from googlecode:
    >> uniserve-path: %./
    == %./
    >> do %uni-engine.r
    Script: "UniServe kernel" (17-Jan-2010)
    Script: "Encap virtual filesystem" (21-Sep-2009)
    == true
    >> uniserve/boot
    booya
    10/10-18:37:48.883-## Error in [uniserve] : Cannot open server reminder on port 9000 !
    10/10-18:37:48.884-## Error in [uniserve] : Cannot open server task-master on port 9799 !
    == none
    >>
DocKimbel 10-Oct-2012 [1238x2] | Uniserve task-master is mainly meant for server-side parallel request processing. For your need, you should use an async HTTP client instead, which would be a much simpler solution. |
Cannot open... you need to close any previous Uniserve session. | |
Sujoy 10-Oct-2012 [1240] | sorry - just killed all previous Uniserve sessions. now get:
    >> uniserve-path: %./
    == %./
    >> do %uni-engine.r
    Script: "UniServe kernel" (17-Jan-2010)
    Script: "Encap virtual filesystem" (21-Sep-2009)
    == true
    >> uniserve/boot
    booya
    ** Script Error: Invalid path value: conf-file
    ** Where: on-started
    ** Near: if all [
        uniserve/shared
        file: uniserve/shared/conf-file
    ] [
        append worker-args reform [" -cf" mold file]
    ]
    >>
DocKimbel 10-Oct-2012 [1241] | Are you running from SVN repo, or a copy of Uniserve folder? |
Sujoy 10-Oct-2012 [1242] | a copy of the Uniserve folder... |
DocKimbel 10-Oct-2012 [1243x2] | This looks like Cheyenne-dependent code... |
But you should *really* use an async HTTP client, that's the best solution for your need (multiple HTTP downloads at the same time). | |
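To make the "multiple HTTP downloads at the same time" idea concrete, here is a minimal sketch in Python (the language of the BeautifulSoup snippet later in this thread), not in Rebol. It is not the async Rebol client Doc refers to: it swaps in a plain thread pool, and `fetch`, `download_all`, `fetch_one` and `max_workers` are names made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=15):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch_one=fetch, max_workers=4):
    """Fetch every URL concurrently; return {url: body or the exception raised}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all downloads first so they run in parallel.
        futures = {url: pool.submit(fetch_one, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as error:  # one bad feed should not stop the rest
                results[url] = error
    return results
```

Calling `download_all(feeds)` with a list of feed URLs would return a dictionary keyed by URL. A failed download shows up as the exception object instead of the body, so the caller can decide whether to retry.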
Sujoy 10-Oct-2012 [1245x2] | hmmm. ok...will work on this and get back to you thanks for the time Doc |
(cant wait to see Cheyenne on Red ;) | |
DocKimbel 10-Oct-2012 [1247] | Well, you might see some micro-Cheyenne before Christmas. ;-) |
Sujoy 10-Oct-2012 [1248x4] | best christmas ever! |
just to persist with using uniserve... i think i may be getting there:
    >> uniserve-path: %./
    == %./
    >> do %uni-engine.r
    Script: "UniServe kernel" (17-Jan-2010)
    Script: "Encap virtual filesystem" (21-Sep-2009)
    == true
    >> uniserve/boot
    booya
    127.0.0.1
    127.0.0.1
    == none
    >>
i commented out the lines from on-started:
    on-started: has [file][
        worker-args: reform [
            "-worker"
            mold any [in uniserve/shared 'server-ports port-id]  ;TBD: fix shared object issues
        ]
        if not encap? [
            append worker-args reform [" -up" mold uniserve-path]
            if value? 'modules-path [
                append worker-args reform [" -mp" mold modules-path]
            ]
            if all [
                uniserve/shared
                ;file: uniserve/shared/conf-file
            ][
                ;append worker-args reform [" -cf" mold file]
            ]
        ]
        if integer? shared/pool-start [loop shared/pool-start [fork]]
    ]
...since conf-file is cheyenne specific.
i think maybe the scheduler is killing UniServe - it exits while returning none...
nope - the scheduler is just fine... i'm now thinking it may have to do with using the shared/do-task in the on-load function... | |
nope will take doc's advice and do something simpler | |
Kaj 10-Oct-2012 [1252] | If you're using R3 or Red/System, you could use the cURL binding in multi-mode |
DocKimbel 10-Oct-2012 [1253] | Sujoy: have a look at this description of one of async HTTP clients available: http://stackoverflow.com/questions/1653969/rebol-multitasking-with-async-why-do-i-get-invalid-port-spec |
Endo 10-Oct-2012 [1254x2] | Doc, I reported that problem before, remember? We agreed on the fix:
in task-master.r line 135:
    if all [
        uniserve/shared
        in uniserve/shared 'conf-file
        file: uniserve/shared/conf-file
    ][
        append worker-args reform [" -cf" mold file]
    ]
and on line 123:
    all [
        in uniserve/shared 'server-ports
        uniserve/shared/server-ports
    ]
Endo: "without these patches latest UniServe cannot be used alone. because it fails to start task-master. Ofcourse I need to remove logger, MTA etc. services." - 19-Dec-2011 2:50:29
Dockimbel: "I agree about your changes." - 19-Dec-2011 2:50:56
I think it is the same problem for Sujoy. (better to move to the Cheyenne group) | |
Sujoy 10-Oct-2012 [1256x5] | Thanks Endo...I am still keen on using uniserve - will get there eventually! |
i have another issue - and need help from a parse guru | |
i'm trying to extract article text from an awfully written series of html pages - one sample: http://www.business-standard.com/india/news/vadra-/a-little-helpmy-friends//489109/ | |
there are 160 </table> tags!! worse, article contents are scattered throughout the html mess | |
using beautifulsoup in python however, i can do the following:
    from bs4 import BeautifulSoup as bs
    import urllib2
    uri = "http://www.business-standard.com/india/news/vadra-/a-little-helpmy-friends//489109/"
    soup = bs(urllib2.urlopen(uri).read())
    p = soup.find_all('p')
    [s.extract() for s in soup.p.find_all('table')]
    [s.extract() for s in soup.p.find_all('script')]
    [s.extract() for s in soup.p.find_all('style')]
    text = bs(''.join(str(p))).get_text()
...and this gives me exactly what is required... just want to do this in Rebol! ;)
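The snippet above depends on bs4 and on Python 2's urllib2. As a rough, dependency-free illustration of the same idea (keep the text inside `<p>` elements, drop anything nested in `table`, `script` or `style`), here is a sketch using only Python 3's standard `html.parser`. The names `ParagraphText`, `paragraph_text` and `SKIP_TAGS` are invented for this example, and it assumes `<p>` tags are properly closed, which a page as messy as the one described may not guarantee.

```python
from html.parser import HTMLParser

# Elements whose content is dropped even when they sit inside a <p>.
SKIP_TAGS = {"table", "script", "style"}


class ParagraphText(HTMLParser):
    """Collect text found inside <p> elements, ignoring skipped elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.p_depth = 0      # number of <p> elements currently open
        self.skip_depth = 0   # number of skipped elements currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p":
            self.p_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
            self.chunks.append(" ")   # keep words on either side apart
        elif tag == "p" and self.p_depth:
            self.p_depth -= 1
            self.chunks.append(" ")   # separate consecutive paragraphs

    def handle_data(self, data):
        if self.p_depth and not self.skip_depth:
            self.chunks.append(data)


def paragraph_text(html):
    """Return the paragraph text of an HTML string with whitespace collapsed."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

Feeding it the downloaded page as a string (for example from `urllib.request.urlopen(uri).read().decode(...)`, with the encoding being whatever the site actually uses) should give a result similar in spirit to the BeautifulSoup version, though not necessarily identical on badly nested markup.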
Endo 10-Oct-2012 [1261x2] | just a quick answer, to give you an idea, I've used following to extract something from a web page:
    b: []
    parse/all mypage [
        any [
            thru {<span class="dblClickSpan"} thru ">"
            copy t to </span> (append b trim/lines t)
            7 skip
        ]
    ]
7 skip is to skip </span> tag. | |
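For readers more used to regular expressions, Endo's rule reads roughly as: find the opening `dblClickSpan` tag, skip to the end of that tag, capture everything up to the next `</span>`, and repeat. A hedged Python equivalent is sketched below. `SPAN` and `extract_spans` are names chosen for this example, and collapsing whitespace with `split`/`join` is only an approximation of `trim/lines`.

```python
import re

# Opening tag with the class from Endo's rule, any further attributes,
# then a lazy capture up to the first closing </span>.
SPAN = re.compile(r'<span class="dblClickSpan"[^>]*>(.*?)</span>', re.DOTALL)


def extract_spans(page):
    """Return the text of every dblClickSpan with runs of whitespace collapsed."""
    return [" ".join(text.split()) for text in SPAN.findall(page)]
```

Like the parse rule, this only works when the markup is regular enough that the first `</span>` after the opening tag really is its closing tag.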
Sujoy 10-Oct-2012 [1263] | yeah - thanks Endo that works great for well formed html docs - but this site is an absolute nightmare! |
Kaj 10-Oct-2012 [1264] | I've used the HTML parser from PowerMezz to parse complex web pages like that |
Sujoy 10-Oct-2012 [1265x2] | note from the python code that there are styles and javascript specified inside the <p> element! i was wondering about Gabriele's HTML niwashi tree |
never used the niwashi - Kaj, do you have a quick example for me to use? i've got the docs open, but am maybe being obtuse - it is 230am here! | |
Kaj 10-Oct-2012 [1267] | It's a bit confusing to set up. I'll have a look |
Sujoy 10-Oct-2012 [1268] | thanks Kaj actually - thanks everyone for all your help on Rebol School |
Kaj 10-Oct-2012 [1269x2] |
    #! /usr/bin/env r2
    REBOL []

    here: what-dir
    program: dirize clean-path here/../../../cms/files/program/PowerMezz

    do program/mezz/module.r

    load-module/from program

    module [
        imports: [
            %mezz/trees.r
            %mezz/load-html.r
            %mezz/html-to-text.r
        ]
    ][
    ;   print mold-tree load-html read http://osslo.nl/leveranciers

        make-dir %data

        for id 1 169 1 [
            print id

            page: load-html read join http://osslo.nl/leveranciers?mod=organization&id= id
            content: get-node page/childs/html/childs/body/childs/div/childs/3/childs/2
            body: get-node content/childs/table/childs/tbody

    ;       print form-html/with body [pretty?: yes]
    ;       print mold-tree body

    ;       item: get-node body/childs/10/childs/2
    ;       print form-html/with item [pretty?: yes]
    ;       print mold-tree item
    ;       print mold item

            record: copy ""
            short-name: name: none

            unless get-node body/childs/tr/childs/th [  ; Missing record
                foreach item get-node body/childs [
                    switch/default type: trim get-node item/childs/td/childs/text/prop/value [
                        "Logo:" [
    ;                       if all [get-node item/childs/2/childs/1 get-node item/childs/2/childs/1/childs/1] [
    ;                           repend record
    ;                               ['icon tab tab tab tab get-node item/childs/2/childs/a/childs/img/prop/src newline]
    ;                       ]
                        ]
                        "Naam:" [
                            if get-node item/childs/2/childs/1 [
                                repend record ['name tab tab tab tab name: trim/lines html-to-text get-node item/childs/2/childs/text/prop/value newline]
                            ]
                        ]
                        ...
                        "Adres:" [
                            unless empty? trim/lines html-to-text form-html/with get-node item/childs/2 [pretty?: yes] [
                                street: get-node item/childs/2/childs/1/prop/value
                                place: get-node item/childs/2/childs/3/prop/value

                                number: next find/last street #" "
                                street: trim/lines html-to-text copy/part street number

                                unless empty? street [
                                    repend record ['street tab tab tab tab street newline]
                                ]
                                unless empty? number [
                                    repend record ['number tab tab tab tab number newline]
                                ]
                                unless place/1 = #" " [
                                    where: find skip place 5 #" "
                                    repend record ['postal-code tab tab tab copy/part place where newline]
                                    place: where
                                ]
                                unless empty? place: trim/lines html-to-text place [
                                    repend record ['place tab tab tab tab place newline]
                                ]
                            ]
                        ]
                        "Telefoon:" [
                            unless #{C2} = to-binary trim/lines html-to-text form-html/with get-node item/childs/2 [pretty?: yes] [
                                repend record ['phones tab tab tab tab trim get-node item/childs/2/childs/text/prop/value newline]
                            ]
                        ]
                        "Website:" [
                            if all [get-node item/childs/2/childs/1 get-node item/childs/2/childs/1/childs/1] [
                                repend record ['websites tab tab tab trim get-node item/childs/2/childs/a/childs/text/prop/value newline]
                            ]
                        ]
                        "E-mail:" [
                            if all [get-node item/childs/2/childs/1 get-node item/childs/2/childs/1/childs/1] [
                                repend record ['mail-addresses tab tab trim/all get-node item/childs/2/childs/a/childs/text/prop/value newline]
                            ]
                        ]
                        "Profiel:" [
                            unless #{C2} = to-binary trim/lines html-to-text form-html/with get-node item/childs/2 [pretty?: yes] [
                                repend record [
                                    'description newline
                                    tab replace/all trim html-to-text form-html/with get-node item/childs/2 [pretty?: yes] "^/" "^/^-" newline
                                ]
                            ]
                        ]
                    ][
                        print ["Onbekend veld: " type]
                    ]
                ]
                write rejoin [%data/
                    replace/all replace/all replace/all any [short-name name] #" " #"-" #"/" #"-" #"." ""
                    %.txt
                ] record
            ]
        ]
    ]
That came out bigger than planned. I was trying to cut out some repetitive fields. It scrapes addresses from a web page and converts them to text format | |
Sujoy 10-Oct-2012 [1271] | whoa! |