World: r4wp

[Rebol School] REBOL School

Kaj 10-Oct-2012 [1252]	If you're using R3 or Red/System, you could use the cURL binding in multi-mode
DocKimbel 10-Oct-2012 [1253]	Sujoy: have a look at this description of one of the async HTTP clients available: http://stackoverflow.com/questions/1653969/rebol-multitasking-with-async-why-do-i-get-invalid-port-spec
Endo 10-Oct-2012 [1254x2]	Doc, I reported that problem before, remember? We agreed on the fix, in task-master.r line 135:
	if all [
	    uniserve/shared
	    in uniserve/shared 'conf-file
	    file: uniserve/shared/conf-file
	][
	    append worker-args reform [" -cf" mold file]
	]
	and on line 123:
	all [
	    in uniserve/shared 'server-ports
	    uniserve/shared/server-ports
	]
	Endo: "Without these patches, the latest UniServe cannot be used alone, because it fails to start task-master. Of course, I need to remove the logger, MTA, etc. services." - 19-Dec-2011 2:50:29
	Dockimbel: "I agree about your changes." - 19-Dec-2011 2:50:56
	I think it is the same problem for Sujoy (better to move this to the Cheyenne group).
Sujoy 10-Oct-2012 [1256x5]	Thanks Endo...I am still keen on using uniserve - will get there eventually!
	i have another issue - and need help from a parse guru
	i'm trying to extract article text from an awfully written series of html pages - one sample: http://www.business-standard.com/india/news/vadra-/a-little-helpmy-friends//489109/
	there are 160 </table> tags!! worse, article contents are scattered throughout the html mess
	using BeautifulSoup in Python, however, i can do the following:
	from bs4 import BeautifulSoup as bs
	import urllib2
	uri = "http://www.business-standard.com/india/news/vadra-/a-little-helpmy-friends//489109/"
	soup = bs(urllib2.urlopen(uri).read())
	p = soup.find_all('p')
	[s.extract() for s in soup.p.find_all('table')]
	[s.extract() for s in soup.p.find_all('script')]
	[s.extract() for s in soup.p.find_all('style')]
	text = bs(''.join(str(p))).get_text()
	...and this gives me exactly what is required... just want to do this in Rebol! ;)
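For comparison, here is a rough standard-library sketch of the same idea: keep the text inside `<p>` elements and drop any nested `<table>`, `<script>` or `<style>` content. It uses Python's `html.parser` instead of BeautifulSoup, and the class and function names are hypothetical. It is an approximation under clear assumptions. It follows tags as a flat event stream rather than building a tree, so a `<p>` that is never closed keeps collecting text. The whitespace collapsing also only loosely mirrors REBOL's `trim/lines`.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect text inside <p> elements, skipping nested table/script/style content."""

    SKIP = {"table", "script", "style"}

    def __init__(self):
        super().__init__()
        self.p_depth = 0     # > 0 while inside an open <p>
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.p_depth:
            self.p_depth -= 1
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside a <p> and not inside a skipped element.
        if self.p_depth and not self.skip_depth:
            self.chunks.append(data)


def paragraph_text(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs, loosely like REBOL's trim/lines.
    return " ".join(" ".join(parser.chunks).split())
```

Fetching the page (for example with `urllib.request`) is left out so the sketch runs offline on any HTML string.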
Endo 10-Oct-2012 [1261x2]	just a quick answer, to give you an idea: I've used the following to extract something from a web page:
	b: []
	parse/all mypage [
	    any [
	        thru {<span class="dblClickSpan"} thru ">"
	        copy t to </span> (append b trim/lines t)
	        7 skip
	    ]
	]
	7 skip is to skip the </span> tag.
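A hedged Python transliteration of Endo's parse loop makes the cursor movement explicit. Each `thru` becomes a `str.find` that moves past its match, and `copy t to </span>` becomes a slice up to the closing tag. `7 skip` becomes an advance by `len("</span>")`, which is 7. The function name and default arguments are illustrative. Like the original, it assumes well-formed, non-nested spans, which is exactly what Sujoy's page lacks.

```python
def extract_spans(page, marker='<span class="dblClickSpan"', close="</span>"):
    """Collect the (whitespace-collapsed) text of every span starting with `marker`."""
    results = []
    pos = 0
    while True:
        start = page.find(marker, pos)            # thru marker
        if start == -1:
            break
        gt = page.find(">", start + len(marker))  # thru ">"
        if gt == -1:
            break
        end = page.find(close, gt + 1)            # copy t to </span>
        if end == -1:
            break
        results.append(" ".join(page[gt + 1:end].split()))  # trim/lines
        pos = end + len(close)                    # 7 skip
    return results
```

When any step fails to match, the loop simply stops, much as the `any` rule gives up while leaving `b` filled with what it already collected.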
Sujoy 10-Oct-2012 [1263]	yeah - thanks Endo, that works great for well-formed html docs - but this site is an absolute nightmare!
Kaj 10-Oct-2012 [1264]	I've used the HTML parser from PowerMezz to parse complex web pages like that
Sujoy 10-Oct-2012 [1265x2]	note from the python code that there are styles and javascript specified inside the <p> element! i was wondering about Gabriele's HTML niwashi tree
	never used the niwashi - Kaj, do you have a quick example for me to use? i've got the docs open, but am maybe being obtuse - it is 2:30am here!
Kaj 10-Oct-2012 [1267]	It's a bit confusing to set up. I'll have a look
Sujoy 10-Oct-2012 [1268]	thanks Kaj actually - thanks everyone for all your help on Rebol School
Kaj 10-Oct-2012 [1269x2]	#! /usr/bin/env r2
	REBOL []
	here: what-dir
	program: dirize clean-path here/../../../cms/files/program/PowerMezz
	do program/mezz/module.r
	load-module/from program
	module [
	    imports: [
	        %mezz/trees.r
	        %mezz/load-html.r
	        %mezz/html-to-text.r
	    ]
	][
	    ; print mold-tree load-html read http://osslo.nl/leveranciers
	    make-dir %data
	    for id 1 169 1 [
	        print id
	        page: load-html read join http://osslo.nl/leveranciers?mod=organization&id= id
	        content: get-node page/childs/html/childs/body/childs/div/childs/3/childs/2
	        body: get-node content/childs/table/childs/tbody
	        ; print form-html/with body [pretty?: yes]
	        ; print mold-tree body
	        ; item: get-node body/childs/10/childs/2
	        ; print form-html/with item [pretty?: yes]
	        ; print mold-tree item
	        ; print mold item
	        record: copy ""
	        short-name: name: none
	        unless get-node body/childs/tr/childs/th [  ; Missing record
	            foreach item get-node body/childs [
	                switch/default type: trim get-node item/childs/td/childs/text/prop/value [
	                    "Logo:" [
	                        ; if all [get-node item/childs/2/childs/1 get-node item/childs/2/childs/1/childs/1] [
	                        ;     repend record
	                        ;         ['icon tab tab tab tab get-node item/childs/2/childs/a/childs/img/prop/src newline]
	                        ; ]
	                    ]
	                    "Naam:" [
	                        if get-node item/childs/2/childs/1 [
	                            repend record ['name tab tab tab tab name: trim/lines html-to-text get-node item/childs/2/childs/text/prop/value newline]
	                        ]
	                    ]
	                    ...
	                    "Adres:" [
	                        unless empty? trim/lines html-to-text form-html/with get-node item/childs/2 [pretty?: yes] [
	                            street: get-node item/childs/2/childs/1/prop/value
	                            place: get-node item/childs/2/childs/3/prop/value
	                            number: next find/last street #" "
	                            street: trim/lines html-to-text copy/part street number
	                            unless empty? street [
	                                repend record ['street tab tab tab tab street newline]
	                            ]
	                            unless empty? number [
	                                repend record ['number tab tab tab tab number newline]
	                            ]
	                            unless place/1 = #" " [
	                                where: find skip place 5 #" "
	                                repend record ['postal-code tab tab tab copy/part place where newline]
	                                place: where
	                            ]
	                            unless empty? place: trim/lines html-to-text place [
	                                repend record ['place tab tab tab tab place newline]
	                            ]
	                        ]
	                    ]
	                    "Telefoon:" [
	                        unless #{C2} = to-binary trim/lines html-to-text form-html/with get-node item/childs/2 [pretty?: yes] [
	                            repend record ['phones tab tab tab tab trim get-node item/childs/2/childs/text/prop/value newline]
	                        ]
	                    ]
	                    "Website:" [
	                        if all [get-node item/childs/2/childs/1 get-node item/childs/2/childs/1/childs/1] [
	                            repend record ['websites tab tab tab trim get-node item/childs/2/childs/a/childs/text/prop/value newline]
	                        ]
	                    ]
	                    "E-mail:" [
	                        if all [get-node item/childs/2/childs/1 get-node item/childs/2/childs/1/childs/1] [
	                            repend record ['mail-addresses tab tab trim/all get-node item/childs/2/childs/a/childs/text/prop/value newline]
	                        ]
	                    ]
	                    "Profiel:" [
	                        unless #{C2} = to-binary trim/lines html-to-text form-html/with get-node item/childs/2 [pretty?: yes] [
	                            repend record [
	                                'description newline
	                                tab replace/all trim html-to-text form-html/with get-node item/childs/2 [pretty?: yes] "^/" "^/^-"
	                                newline
	                            ]
	                        ]
	                    ]
	                ][
	                    print ["Onbekend veld: " type]
	                ]
	            ]
	            write rejoin [
	                %data/
	                replace/all replace/all replace/all any [short-name name] #" " #"-" #"/" #"-" #"." ""
	                %.txt
	            ] record
	        ]
	    ]
	]
	That came out bigger than planned. I was trying to cut out some repetitive fields. It scrapes addresses from a web page and converts them to text format
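The "Adres:" branch of Kaj's script does the interesting text splitting, and a small Python sketch of just that logic may make it easier to follow. Two rules are mirrored. First, the house number is whatever follows the last space (`next find/last street #" "`). Second, the postal code runs up to the first space found at or after offset 5 (`find skip place 5 #" "`). That second rule suggests Dutch-style codes such as "1234 AB", but that format is an inference from the code and the .nl site, not something the script states. The function names are hypothetical. The no-space fallbacks are assumptions where the REBOL version would simply raise an error.

```python
def trim_lines(text):
    # Approximates REBOL's trim/lines: strip both ends, collapse inner whitespace.
    return " ".join(text.split())


def split_street(street):
    """Split 'Street 12' at the last space, like `next find/last street #" "`."""
    idx = street.rfind(" ")
    if idx == -1:
        # Assumption: no space means no house number
        # (the REBOL version would raise an error here instead).
        return trim_lines(street), ""
    return trim_lines(street[:idx]), street[idx + 1:]


def split_place(place):
    """Split '1234 AB Amsterdam' into postal code and place.

    Mirrors `find skip place 5 #" "`: the postal code runs up to the first
    space at or after offset 5, unless the field starts with a space
    (meaning no postal code was given)."""
    if place.startswith(" "):
        return "", trim_lines(place)
    where = place.find(" ", 5)
    if where == -1:
        # Assumption: treat the whole field as a postal code.
        return place, ""
    return place[:where], trim_lines(place[where:])
```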
Sujoy 10-Oct-2012 [1271x2]	whoa!
	but i get the idea
Kaj 10-Oct-2012 [1273]	Yeah, not very competitive with the BS code
Sujoy 10-Oct-2012 [1274x2]	well...the bs lib gzipped is 128kb...
	and python is ~30MB, but yeah - it's a lovely piece of work
Kaj 10-Oct-2012 [1276]	Still looks like it would be nice to have a REBOL implementation :-)
Sujoy 10-Oct-2012 [1277]	yes it certainly would
Sujoy 11-Oct-2012 [1278x3]	Kaj: love your r2 bindings for zeromq! i've been trying to implement the push-pull ventilator example:
	ventilator:
	    REBOL []
	    do %zmq.r
	    pool: zmq/new-pool 1
	    socket: zmq/open pool zmq/push
	    zmq/serve socket tcp://:5555
	    ventilate: func [] [
	        print "sending"
	        u: form time/now/precise
	        zmq/send socket to-binary u 0
	    ]
	    wait 0:00:60 [ ventilate ]
	worker:
	    REBOL []
	    do %zmq.r
	    pool: zmq/new-pool 1
	    socket: zmq/open pool zmq/pull
	    zmq/connect socket tcp://:5555
	    data: copy #{}
	    forever [
	        zmq/receive socket data 0
	        prin ["."]
	        print to-string data
	    ]
	...but the worker crashes
	any idea why?
	the weather update server works just fine...
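The weather update example is PUB/SUB, where every subscriber receives every message. The ventilator uses PUSH/PULL, where each message goes to exactly one connected worker, load-balanced round-robin. The toy model below is plain Python with no ZeroMQ involved, and its function name is hypothetical. It only illustrates that fan-out difference. It says nothing about why the R2 worker crashes. Real PUSH sockets also depend on connection timing and high-water marks.

```python
from itertools import cycle


def push_round_robin(messages, n_workers):
    """Toy model of PUSH -> PULL fan-out: each message goes to exactly one
    worker, rotating through the workers in turn. No ZeroMQ involved."""
    if n_workers < 1:
        # A real PUSH socket would block with no connected peers; here we just refuse.
        raise ValueError("need at least one worker")
    inboxes = [[] for _ in range(n_workers)]
    for worker, message in zip(cycle(range(n_workers)), messages):
        inboxes[worker].append(message)
    return inboxes
```

With PUB/SUB, by contrast, every inbox would receive the full list of messages.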
Nicolas 11-Oct-2012 [1281]	Please excuse my ignorance, but has REBOL been open-sourced yet?
Henrik 11-Oct-2012 [1282]	Nicolas, the license is still being discussed.
Nicolas 11-Oct-2012 [1283x2]	Thanks. That's what I suspected but I wasn't sure.
	Excited?
Henrik 11-Oct-2012 [1285]	Well, it will be interesting to see if we can finally get some movement on it, but Red is getting rather distracting.
Nicolas 11-Oct-2012 [1286x2]	Yeah, I'm playing with it now.
	Still, I'm pretty excited to see the code.
Kaj 11-Oct-2012 [1288]	Sujoy, I only did a request/reply example so far, so I'll have to look into it
Sujoy 11-Oct-2012 [1289x2]	Thanks Kaj
	i need to be able to get rebol working with push-pull, so i can get apps running behind Zed Shaw's Mongrel2 server. there's nothing better than rebol for parsing - and i want to keep using rebol. any help hugely appreciated!
Kaj 11-Oct-2012 [1291]	I've been running Mongrel for several weeks, with Cheyenne and Fossil behind it
Sujoy 11-Oct-2012 [1292]	cool! so you have the mongrel pushing requests to Cheyenne?
Kaj 11-Oct-2012 [1293]	You could also run R2 code on Cheyenne, and use Mongrel as a proxy
Sujoy 11-Oct-2012 [1294x2]	yes - mongrel as a proxy is great, but i was thinking more in terms of Zed's idea of a language-agnostic web server
	so some apps (or parts of apps) could be written in rebol
Kaj 11-Oct-2012 [1296]	Only as a proxy so far, I'm planning towards running Red and R3 0MQ servers as Mongrel apps
Sujoy 11-Oct-2012 [1297x2]	i noticed that the r3 bindings are much more stable than the r2 ones; r2 tends to crash with zmq
	but i'm nervous about using r3...
Kaj 11-Oct-2012 [1299]	I haven't tested them much yet, but Janko is running his business on the R2 binding
Sujoy 11-Oct-2012 [1300]	good to know! i haven't seen janko around in a long time - i've been interested in using his distributed actors library, but can't find it online anywhere
Kaj 11-Oct-2012 [1301]	Janko is around, but he's busy