World: r3wp

[View] discuss view related issues

Ingo
14-Apr-2008
[7613]
I'm on Ubuntu, and it works for me.
Graham
14-Apr-2008
[7614]
Ubuntu is a Debian-based distro :)
Graham
18-Apr-2008
[7615]
Any view users here?  I'm looking for a routine that will trim the 
whitespace from an image, and will also trim junk from an image. 
I'm taking a rectangular piece of some scanned text, and if I cut 
partly thru the text above, or thru the top of the text below, I 
wish to remove those parts just keeping the word I'm interested in 
....
Henrik
18-Apr-2008
[7616]
I've been looking for something similar.
Graham
18-Apr-2008
[7617x4]
I don't really need to trim the whitespace ... just the junk
Perhaps I can just inch along the top and move down until I reach 
all whitespace across the image to reach my border.
Invert the image and then do it again ....
Henrik, are you doing some image processing too?
Henrik
18-Apr-2008
[7621]
I was thinking about an automatic cropping tool, but I'm not sure 
how to do that.
Anton
18-Apr-2008
[7622x10]
I made a very simple auto-crop function, remember ?
I can probably modify it to perform the above function...
http://anton.wildit.net.au/rebol/gfx/auto-crop.r
http://anton.wildit.net.au/rebol/gfx/demo-auto-crop.r
(that's the function from September 2007)
A possible algorithm for the new junk cropper could be:

1) advance inwards from each edge until there is a full white line 
parallel to the edge.

2) if a white line was found for each edge, then advance inwards 
again, this time searching for a non-full white line, (eg. with a 
few pixels from the desired text in it). Step back a line and you 
have your crop region.
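
A minimal sketch of the two-step scan described above, written in Python rather than REBOL and restricted to the top and bottom edges. The image layout (a list of rows of 8-bit grayscale values, 255 meaning white) and the names `is_white_line`, `scan_edge` and `auto_crop_rows` are assumptions made for illustration; this is not the code in auto-crop-bitmap-text.r.

```python
WHITE = 255  # assumption: 8-bit grayscale, 255 is pure white


def is_white_line(line, white=WHITE):
    """True if every pixel on the line is white."""
    return all(p >= white for p in line)


def scan_edge(lines):
    """Scan inwards from index 0.

    Step 1: skip lines until a fully white line is found.
    Step 2: skip white lines until a line with some content is found,
            then step back one line.
    Returns the index of that last white line, or None if either
    step runs off the end of the image.
    """
    i, n = 0, len(lines)
    while i < n and not is_white_line(lines[i]):
        i += 1
    if i == n:
        return None  # no full white line seen from this edge
    while i < n and is_white_line(lines[i]):
        i += 1
    if i == n:
        return None  # only white after the gap, no content
    return i - 1  # step back onto the last white line


def auto_crop_rows(image):
    """Return (top, bottom) as a half-open row range, or None."""
    top = scan_edge(image)
    from_bottom = scan_edge(image[::-1])
    if top is None or from_bottom is None:
        return None
    bottom = len(image) - from_bottom
    if top >= bottom:
        return None
    return top, bottom
```

The left and right edges could be handled the same way by running `scan_edge` over the columns. Note that this sketch treats whatever lies before the first white line as junk, even when it is the wanted text touching the edge.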
What should I call the function that does this? auto-trim-bitmap-text ?
Hmm, I like "crop" better than "trim": auto-crop-impure-border ?
I think "auto-crop-bitmap-text" is the winner, for now.
Ooh. The algorithm above is too simple. The edge cases are harder 
to manage.
Graham, should the function be fully automatic or can there be user 
selection involved ? eg. we could divide the image into regions and 
let the user click the regions which are junk.
Graham, can the image be assumed to be grayscale ?
Graham
18-Apr-2008
[7632x4]
Anton, I'd like fully automatic, and yes, grayscale.
I think the algorithm can assume that if the scan advances more than 
1/3 of the way across the depth of the image, there is no junk.
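
One possible reading of this one-third rule, sketched in Python rather than REBOL: scan inwards from an edge, and if no fully white row turns up within the first third of the image height, assume that edge carries no junk and crop nothing from it. The function name `junk_depth`, the row-list image layout and the 255-is-white convention are assumptions for illustration only.

```python
def junk_depth(rows, white=255, divisor=3):
    """Return how many leading rows to drop as junk, or 0 for none.

    rows is a list of rows of 8-bit grayscale pixels, ordered from
    the edge being scanned towards the middle of the image.
    """
    limit = len(rows) // divisor  # one third of the depth
    for i, row in enumerate(rows):
        if i > limit:
            return 0  # scanned too far without a white row: no junk
        if all(p >= white for p in row):
            return i  # rows before the first white row are junk
    return 0
```

For the bottom edge the same function can be called on the rows in reverse order.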
I personally don't need the horizontal edges cropped as it's usually 
vertical displacement that's a problem with faxes
I remembered your auto-crop function but didn't recall where it was 
.. you shift websites so often!
Anton
19-Apr-2008
[7636x2]
You should just load-thru useful looking rebol urls when you see 
them here, then you can just scan your public cache.
Are there likely to be horizontal black lines (eg. borders) which 
should be considered junk ?
Graham
19-Apr-2008
[7638]
Not in my case and I think you might then come up against the problem 
of deciding what is a border and what is a character.
Anton
19-Apr-2008
[7639x3]
Yes, I would grade the scan line according to ratio of black : white 
pixels on the line. Text is probably between 20-85% black pixels, 
and borders could perhaps be detected at > 95% black. Anyway, if 
you don't need it, that's much easier :)
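
A small sketch of grading a scan line by its share of black pixels, using the rough figures quoted above (20-85% for text, over 95% for a border). It is in Python rather than REBOL; the function name `classify_scan_line`, the "unknown" label for lines outside both ranges, and the assumption of a bitonal line where 0 is black are all illustrative choices, not part of the original function.

```python
def classify_scan_line(line, black=0):
    """Grade one scan line by the fraction of black pixels on it."""
    if not line:
        return "unknown"
    ratio = sum(1 for p in line if p == black) / len(line)
    if ratio > 0.95:
        return "border"   # almost solid black: likely a ruled line
    if 0.20 <= ratio <= 0.85:
        return "text"     # mixed black and white: likely characters
    return "unknown"      # too sparse or in between: undecided
```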
I have something that's starting to work.

If we can preprocess the greyscale images so that they're bitonal 
(black and white), and denoised, then my algorithm has a chance.
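
The preprocessing step is not spelled out here, so the following is only one simple way it might look, sketched in Python rather than REBOL: a fixed threshold to make the image bitonal, then removal of black pixels that have no black neighbour above, below, left or right. The threshold of 128, the names `to_bitonal` and `remove_isolated_pixels`, and the row-list image layout are assumptions.

```python
def to_bitonal(image, threshold=128):
    """Map each grayscale pixel to 0 (black) or 255 (white)."""
    return [[0 if p < threshold else 255 for p in row] for row in image]


def remove_isolated_pixels(image):
    """Whiten black pixels that have no black 4-neighbour.

    Expects a bitonal image (0 or 255) and returns a new image.
    """
    height = len(image)
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(len(image[y])):
            if image[y][x] != 0:
                continue
            neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            has_black_neighbour = any(
                0 <= ny < height
                and 0 <= nx < len(image[ny])
                and image[ny][nx] == 0
                for ny, nx in neighbours
            )
            if not has_black_neighbour:
                out[y][x] = 255  # lone speck: treat as noise
    return out
```

A real scan would probably need a smarter threshold and a less aggressive noise filter, since this one also erases genuine single-pixel marks such as thin dots.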
http://anton.wildit.net.au/rebol/gfx/auto-crop-bitmap-text.r
http://anton.wildit.net.au/rebol/gfx/demo-auto-crop-bitmap-text.r
Graham
19-Apr-2008
[7642x2]
I'll give it a twirl
if they're placed in an anonymous context, how did you make them 
public?
Anton
19-Apr-2008
[7644x2]
Almost all my function libraries are in anonymous contexts.

Basically, DOing the library file (eg. do %auto-crop-bitmap-text.r) 
returns the context, and you just GET out the words you are interested 
in.
This job is eased a bit by my INCLUDE function.
You should be able to do this instead of using INCLUDE:


 auto-crop-bitmap-text: get in do %auto-crop-bitmap-text.r 'auto-crop-bitmap-text
Graham
19-Apr-2008
[7646]
ok
Anton
19-Apr-2008
[7647x2]
Now you can use the auto-crop-bitmap-text function.
It's really very simple. I've just got this include function which 
kind of hides the simplicity a bit (unfortunately). I wish something 
like that was built in to rebol.
Graham
19-Apr-2008
[7649]
ok, found some images that don't work.
Anton
19-Apr-2008
[7650]
cool, ... why not ?
Graham
19-Apr-2008
[7651]
I'll run them again ...
Anton
19-Apr-2008
[7652]
(note, the cropping is only off the top and bottom edges, ie. vertically 
cropped only.)
Graham
19-Apr-2008
[7653]
what does it do if the image is all white space?
Anton
19-Apr-2008
[7654]
good question. let me check...
Graham
19-Apr-2008
[7655]
what happens if you include two words above and below in the image?
Anton
19-Apr-2008
[7656x2]
All white does not do any cropping. It didn't find any "content" 
(non-white pixels) to crop to. What do you want it to do ?
If there's two words above and below in the image, like this:

	line one
	line two
	middle
	line four
	line five


then the 3 middle lines will be included, unless there is some white 
above the top line or some white below the bottom line (in which 
cases they will be included, respectively).
Graham
19-Apr-2008
[7658]
Hmm.... return none?
Anton
19-Apr-2008
[7659]
Let me add that to the to-do list...
Is this a common case, by the way ?
Graham
19-Apr-2008
[7660x3]
Yes
Let me send you some images ... that it appears to have failed on.
Ok, sent. Some don't have the whitespace cropped at the top.