
World: r4wp

[Rebol School] REBOL School

DocKimbel
10-Oct-2012
[1243x2]
This looks like Cheyenne-dependent code...
But you should *really* use an async HTTP client; that's the best 
solution for your need (multiple HTTP downloads at the same time).
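
A rough illustration of "multiple HTTP downloads at the same time". It is written in Python rather than REBOL, and it swaps in a thread pool for a true event-driven async client, so treat it as a sketch of the goal rather than of the client Doc has in mind. The names fetch and fetch_all are made up for this example.

```python
# Sketch only: a thread pool stands in for an async HTTP client.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Run fetcher over every URL concurrently.

    Returns a dict mapping each URL to its body, or to the exception
    raised while downloading it, so one failure does not stop the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the downloads overlap.
        futures = {url: pool.submit(fetcher, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as error:  # record the failure and carry on
                results[url] = error
    return results
```

Calling fetch_all with a list of URLs returns once every download has either finished or failed; the fetcher parameter exists only so the download step can be replaced, for example in a test.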
Sujoy
10-Oct-2012
[1245x2]
hmmm. ok...will work on this and get back to you
thanks for the time Doc
(can't wait to see Cheyenne on Red ;)
DocKimbel
10-Oct-2012
[1247]
Well, you might see some micro-Cheyenne before Christmas. ;-)
Sujoy
10-Oct-2012
[1248x4]
best christmas ever!
just to persist with using uniserve...i think i may be getting 
there

uniserve-path: %./
== %./
>> do %uni-engine.r
Script: "UniServe kernel" (17-Jan-2010)
Script: "Encap virtual filesystem" (21-Sep-2009)
== true
>> uniserve/boot
booya
127.0.0.1
127.0.0.1
== none
>>

i commented out the lines from on-started:

on-started: has [file][
		worker-args: reform [
			"-worker" mold any [in uniserve/shared 'server-ports port-id]		;TBD: fix shared object issues
		]
		if not encap? [
			append worker-args reform [" -up" mold uniserve-path]
			if value? 'modules-path [
				append worker-args reform [" -mp" mold modules-path]
			]
			if all [
				uniserve/shared
				;file: uniserve/shared/conf-file 
			][		
				;append worker-args reform [" -cf" mold file]
			]
		]
		if integer? shared/pool-start [loop shared/pool-start [fork]]
	]

...since conf-file is cheyenne specific


i think maybe the scheduler is killing UniServe - it exits while 
returning none...
nope - the scheduler is just fine...

i'm now thinking it may have to do with using the shared/do-task 
in the on-load function...
nope
will take doc's advice and do something simpler
Kaj
10-Oct-2012
[1252]
If you're using R3 or Red/System, you could use the cURL binding 
in multi-mode
DocKimbel
10-Oct-2012
[1253]
Sujoy: have a look at this description of one of the async HTTP clients 
available: http://stackoverflow.com/questions/1653969/rebol-multitasking-with-async-why-do-i-get-invalid-port-spec
Endo
10-Oct-2012
[1254x2]
Doc, I reported that problem before, remember? We agreed on the 
fix:

in task-master.r

line 135: if all [uniserve/shared in uniserve/shared 'conf-file file: uniserve/shared/conf-file] [
	append worker-args reform [" -cf" mold file]
]

and on line 123: all [in uniserve/shared 'server-ports uniserve/shared/server-ports]


Endo: "without these patches latest UniServe cannot be used alone, 
because it fails to start task-master. Of course I need to remove 
logger, MTA etc. services." - 19-Dec-2011 2:50:29
Dockimbel: "I agree about your changes." - 19-Dec-2011 2:50:56
I think it is the same problem for Sujoy. (Better to move to the 
Cheyenne group.)
Sujoy
10-Oct-2012
[1256x5]
Thanks Endo...I am still keen on using uniserve - will get there 
eventually!
i have another issue - and need help from a parse guru
i'm trying to extract article text from an awfully written series 
of html pages - one sample:


http://www.business-standard.com/india/news/vadra-/a-little-helpmy-friends//489109/
there are 160 </table> tags!!
worse, article contents are scattered throughout the html mess
using beautifulsoup in python however, i can do the following:

from bs4 import BeautifulSoup as bs
import urllib2


uri = "http://www.business-standard.com/india/news/vadra-/a-little-helpmy-friends//489109/"
soup = bs(urllib2.urlopen(uri).read())

p = soup.find_all('p')
[s.extract() for s in soup.p.find_all('table')]
[s.extract() for s in soup.p.find_all('script')]
[s.extract() for s in soup.p.find_all('style')]

text = bs(''.join(str(p))).get_text()

...and this gives me exactly what is required...

just want to do this in Rebol! ;)
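
For comparison, the same idea can be sketched without BeautifulSoup, using only Python's standard html.parser: keep the text found inside <p> elements and drop any table, script, or style nested within them. The class name ParagraphText and the function article_text are invented for this sketch. It has not been run against the page above, and on badly broken markup (for instance unclosed <p> tags) it may keep more text than intended.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect text inside <p> elements, ignoring nested noise."""

    SKIP = {"table", "script", "style"}  # tags removed by the bs4 snippet

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.p_depth = 0     # > 0 while inside a <p>
        self.skip_depth = 0  # > 0 while inside a skipped tag within a <p>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.skip_depth:
            self.p_depth += 1
        elif tag in self.SKIP and self.p_depth:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "p" and self.p_depth and not self.skip_depth:
            self.p_depth -= 1
            self.chunks.append("\n")  # keep paragraphs apart

    def handle_data(self, data):
        if self.p_depth and not self.skip_depth:
            self.chunks.append(data)


def article_text(html):
    """Return the paragraph text of html with whitespace collapsed."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

Tables that merely wrap a paragraph are left alone; only tables, scripts, and styles that open inside a <p> are skipped, which matches what the soup.p.find_all calls above remove.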
Endo
10-Oct-2012
[1261x2]
just a quick answer, to give you an idea: I've used the following to 
extract something from a web page:
b: [] parse/all mypage [
    any [
        thru {<span class="dblClickSpan"} thru ">" copy t to </span>
        (append b trim/lines t) 7 skip
    ]
]
"7 skip" is there to skip the </span> tag.
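
The scanning steps of that parse rule can be spelled out in Python for readers who do not know the parse dialect. This is a loose translation, not a replacement for it: the function name spans_text and its parameters are made up, and each line is annotated with the parse step it is meant to mirror.

```python
def spans_text(page, marker='<span class="dblClickSpan"', close="</span>"):
    """Return the whitespace-collapsed text of every matching span."""
    found = []
    pos = 0
    while True:
        start = page.find(marker, pos)            # thru {<span class=...}
        if start < 0:
            break
        gt = page.find(">", start + len(marker))  # thru ">"
        if gt < 0:
            break
        end = page.find(close, gt + 1)            # copy t to </span>
        if end < 0:
            break
        text = page[gt + 1:end]
        found.append(" ".join(text.split()))      # trim/lines
        pos = end + len(close)                    # 7 skip
    return found
```

Like the parse rule, this stops at the first closing tag it meets, so a span nested inside a matching span would cut the captured text short.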
Sujoy
10-Oct-2012
[1263]
yeah - thanks Endo

that works great for well formed html docs - but this site is an 
absolute nightmare!
Kaj
10-Oct-2012
[1264]
I've used the HTML parser from PowerMezz to parse complex web pages 
like that
Sujoy
10-Oct-2012
[1265x2]
note from the python code that there are styles and javascript specified 
inside the <p> element!
i was wondering about Gabriele's HTML niwashi tree
never used the niwashi - Kaj, do you have a quick example for me 
to use?

i've got the docs open, but am maybe being obtuse - it is 2:30am here!
Kaj
10-Oct-2012
[1267]
It's a bit confusing to set up. I'll have a look
Sujoy
10-Oct-2012
[1268]
thanks Kaj
actually - thanks everyone for all your help on Rebol School
Kaj
10-Oct-2012
[1269x2]
#! /usr/bin/env r2
REBOL []

here: what-dir

program: dirize clean-path here/../../../cms/files/program/PowerMezz

do program/mezz/module.r

load-module/from program

module [
	imports: [
		%mezz/trees.r
		%mezz/load-html.r
		%mezz/html-to-text.r
	]
][
;	print mold-tree load-html read http://osslo.nl/leveranciers

	make-dir %data

	for id 1 169 1 [
		print id

		page: load-html read join http://osslo.nl/leveranciers?mod=organization&id= id

		content: get-node page/childs/html/childs/body/childs/div/childs/3/childs/2

		body: get-node content/childs/table/childs/tbody
;		print form-html/with body [pretty?: yes]
;		print mold-tree body

;		item: get-node body/childs/10/childs/2
;		print form-html/with item [pretty?: yes]
;		print mold-tree item
;		print mold item

		record: copy ""
		short-name: name: none

		unless get-node body/childs/tr/childs/th [  ; Missing record
			foreach item get-node body/childs [

				switch/default type: trim get-node item/childs/td/childs/text/prop/value [
					"Logo:" [

;						if all [get-node item/childs/2/childs/1  get-node item/childs/2/childs/1/childs/1] [
;							repend record
;								['icon tab tab tab tab		get-node item/childs/2/childs/a/childs/img/prop/src newline]
;						]
					]
					"Naam:" [
						if get-node item/childs/2/childs/1 [
							repend record

								['name tab tab tab tab		name: trim/lines html-to-text get-node item/childs/2/childs/text/prop/value newline]
						]
					]
...					"Adres:" [

						unless empty? trim/lines html-to-text form-html/with get-node item/childs/2 [pretty?: yes] [
							street: get-node item/childs/2/childs/1/prop/value
							place: get-node item/childs/2/childs/3/prop/value

							number: next find/last street #" "
							street: trim/lines html-to-text copy/part street number

							unless empty? street [
								repend record ['street tab tab tab tab	street newline]
							]
							unless empty? number [
								repend record ['number tab tab tab tab	number newline]
							]
							unless place/1 = #" " [
								where: find  skip place 5  #" "

								repend record ['postal-code tab tab tab	copy/part place where  newline]

								place: where
							]
							unless empty? place: trim/lines html-to-text place [
								repend record ['place tab tab tab tab 	place newline]
							]
						]
					]
					"Telefoon:" [

						unless #{C2} = to-binary trim/lines html-to-text form-html/with get-node item/childs/2 [pretty?: yes] [
							repend record

								['phones tab tab tab tab	trim get-node item/childs/2/childs/text/prop/value newline]
						]
					]
					"Website:" [

						if all [get-node item/childs/2/childs/1  get-node item/childs/2/childs/1/childs/1] [
							repend record

								['websites tab tab tab		trim get-node item/childs/2/childs/a/childs/text/prop/value newline]
						]
					]
					"E-mail:" [

						if all [get-node item/childs/2/childs/1  get-node item/childs/2/childs/1/childs/1] [
							repend record

								['mail-addresses tab tab	trim/all get-node item/childs/2/childs/a/childs/text/prop/value newline]
						]
					]
					"Profiel:" [

						unless #{C2} = to-binary trim/lines html-to-text form-html/with get-node item/childs/2 [pretty?: yes] [
							repend record [
								'description newline
									tab replace/all

										trim html-to-text form-html/with get-node item/childs/2 [pretty?: yes]
										"^/" "^/^-"
									newline
							]
						]
					]
				][
					print ["Onbekend veld: " type]
				]
			]
			write rejoin [%data/
				replace/all replace/all replace/all any [short-name name]
					#" " #"-"
					#"/" #"-"
					#"." ""
				%.txt
			] record
		]
	]
]
That came out bigger than planned. I was trying to cut out some repetitive 
fields. It scrapes addresses from a web page and converts them to 
text format
Sujoy
10-Oct-2012
[1271x2]
whoa!
but i get the idea
Kaj
10-Oct-2012
[1273]
Yeah, not very competitive with the BS code
Sujoy
10-Oct-2012
[1274x2]
well...the bs lib gzipped is 128kb...
and python is ~30MB
but yeah - it's a lovely piece of work
Kaj
10-Oct-2012
[1276]
Still looks like it would be nice to have a REBOL implementation 
:-)
Sujoy
10-Oct-2012
[1277]
yes it certainly would
Sujoy
11-Oct-2012
[1278x3]
Kaj: 
love your r2 bindings for zeromq 
i've been trying to implement the push-pull ventilator example

ventilator:
REBOL []

do %zmq.r

pool: zmq/new-pool 1
socket: zmq/open pool zmq/push
zmq/serve socket tcp://*:5555

ventilate: func[][
  print "sending"
  u: form time/now/precise
  zmq/send socket to-binary u 0
]

wait 0:00:60 [
  ventilate
]

worker:
REBOL []

do %zmq.r

pool: zmq/new-pool 1
socket: zmq/open pool zmq/pull
zmq/connect socket tcp://*:5555

data: copy #{}

forever [
  zmq/receive socket data 0
  prin ["."] 
  print to-string data
]

...but the worker crashes
any idea why?
the weather update server works just fine...
Nicolas
11-Oct-2012
[1281]
Please excuse my ignorance but has rebol been open sourced yet?
Henrik
11-Oct-2012
[1282]
Nicolas, the license is still being discussed.
Nicolas
11-Oct-2012
[1283x2]
Thanks. That's what I suspected but I wasn't sure.
Excited?
Henrik
11-Oct-2012
[1285]
Well, it will be interesting to see if we can finally get some movement 
on it, but Red is getting rather distracting.
Nicolas
11-Oct-2012
[1286x2]
Yeah, I'm playing with it now.
Still. I'm pretty excited to see the code.
Kaj
11-Oct-2012
[1288]
Sujoy, I only did a request/reply example so far, so I'll have to 
look into it
Sujoy
11-Oct-2012
[1289x2]
Thanks Kaj
i need to be able to get rebol working with push-pull, so i can get 
apps running behind zed shaw's mongrel2 server

there's nothing better than rebol for parsing - and i want to keep 
using rebol
any help hugely appreciated!
Kaj
11-Oct-2012
[1291]
I've been running Mongrel for several weeks, with Cheyenne and Fossil 
behind it
Sujoy
11-Oct-2012
[1292]
cool! so you have the mongrel pushing requests to Cheyenne?