
World: r3wp

[View] discuss view related issues

Henrik
18-Apr-2008
[7616]
I've been looking for something similar.
Graham
18-Apr-2008
[7617x4]
I don't really need to trim the whitespace ... just the junk
Perhaps I can just inch down from the top until I reach a line that 
is all whitespace across the image; that's my border.
Invert the image and then do it again ....
Henrik, are you doing some image processing too?
Henrik
18-Apr-2008
[7621]
I was thinking about an automatic cropping tool, but I'm not sure 
how to do that.
Anton
18-Apr-2008
[7622x10]
I made a very simple auto-crop function, remember?
I can probably modify it to perform the above function...
http://anton.wildit.net.au/rebol/gfx/auto-crop.r
http://anton.wildit.net.au/rebol/gfx/demo-auto-crop.r
(that's the function from September 2007)
A possible algorithm for the new junk cropper could be:

1) advance inwards from each edge until there is a full white line 
parallel to the edge.

2) if a white line was found for each edge, then advance inwards 
again, this time searching for a line that is not fully white (e.g. 
one containing a few pixels of the desired text). Step back a line 
and you have your crop region.
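A minimal REBOL sketch of steps 1 and 2 for the top edge only (the other edges are symmetric). It is not Anton's auto-crop code: it assumes, hypothetically, that the bitmap is a block of rows of grey values from 0 (black) to 255 (white) rather than an image! value, and that anything below 128 counts as black. The names white-row? and top-crop-index are made up for illustration.

```rebol
REBOL [Title: "Sketch: two-step crop from the top edge (hypothetical)"]

; Assumption: a bitmap is a block of rows, each row a block of grey
; values 0 (black) .. 255 (white); values below 128 count as black.
white-row?: func [row [block!] /local v] [
    foreach v row [if v < 128 [return false]]
    true
]

; Returns the index of the row where the crop region starts (the last
; white row above the content), or none if step 1 or step 2 fails.
top-crop-index: func [rows [block!] /local i n] [
    n: length? rows
    i: 1
    ; Step 1: advance inwards until a fully white row is found
    while [all [i <= n  not white-row? pick rows i]] [i: i + 1]
    if i > n [return none]
    ; Step 2: advance again until a row that is not fully white
    while [all [i <= n  white-row? pick rows i]] [i: i + 1]
    if i > n [return none]
    ; Step back one row onto the last white line
    i - 1
]

bitmap: [
    [0 0 255]        ; junk touching the top edge
    [255 255 255]
    [255 255 255]
    [255 0 255]      ; first row of the wanted text
]
probe top-crop-index bitmap   ; prints 3
```

Note that a text line touching the edge with no white above it is treated as junk by this simple version, which is one of the edge cases mentioned just below.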
What should I call the function that does this?
auto-trim-bitmap-text?
Hmm, I like "crop" better than "trim"
auto-crop-impure-border?
I think "auto-crop-bitmap-text" is the winner, for now.
Ooh. The algorithm above is too simple. The edge cases are harder 
to manage.
Graham, should the function be fully automatic, or can there be user 
selection involved? E.g. we could divide the image into regions and 
let the user click the regions that are junk.
Graham, can the image be assumed to be grayscale?
Graham
18-Apr-2008
[7632x4]
Anton, I'd like fully automatic, and yes, grayscale.
I think the algorithm can assume that if the scan line advances more 
than 1/3 of the way through the image's depth, there is no junk.
I personally don't need the horizontal edges cropped as it's usually 
vertical displacement that's a problem with faxes
I remembered your auto-crop function but didn't recall where it was... 
you shift websites so often!
Anton
19-Apr-2008
[7636x2]
You should just load-thru useful looking rebol urls when you see 
them here, then you can just scan your public cache.
Are there likely to be horizontal black lines (e.g. borders) which 
should be considered junk?
Graham
19-Apr-2008
[7638]
Not in my case and I think you might then come up against the problem 
of deciding what is a border and what is a character.
Anton
19-Apr-2008
[7639x3]
Yes, I would grade each scan line by its ratio of black to white 
pixels. Text lines probably have between 20% and 85% black pixels, 
and borders could perhaps be detected at > 95% black. Anyway, if 
you don't need it, that's much easier :)
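The grading idea can be sketched as a small classifier. This is a hypothetical helper, not part of auto-crop-bitmap-text.r; it assumes a scan line is a block of grey values 0..255 with values below 128 counted as black, and it simply applies the rough bands quoted above (20-85% black for text, above 95% for a border).

```rebol
REBOL [Title: "Sketch: grade a scan line by its share of black pixels"]

; Fraction of black pixels in a row (assumed threshold: below 128 is black)
black-ratio: func [row [block!] /local blacks v] [
    blacks: 0
    foreach v row [if v < 128 [blacks: blacks + 1]]
    blacks / length? row
]

; Classify a row using the rough bands from the discussion
grade-line: func [row [block!] /local r] [
    r: black-ratio row
    case [
        r > 0.95 ['border]
        all [r >= 0.20 r <= 0.85] ['text]
        r = 0 ['white]
        true ['other]
    ]
]

probe grade-line [255 0 255 0 255]   ; 2 of 5 pixels black (40%), prints text
```

The band limits are guesses from the conversation, so in practice they would need tuning against real fax scans.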
I have something that's starting to work.

If we can preprocess the greyscale images so that they're bitonal 
(black and white) and denoised, then my algorithm has a chance.
http://anton.wildit.net.au/rebol/gfx/auto-crop-bitmap-text.r
http://anton.wildit.net.au/rebol/gfx/demo-auto-crop-bitmap-text.r
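The preprocessing isn't specified above. One simple possibility, sketched here under the same hypothetical block-of-rows representation, is a fixed threshold (128, an arbitrary choice) followed by removal of isolated black pixels. The function names are made up, and real fax cleanup would probably need something more robust.

```rebol
REBOL [Title: "Sketch: bitonal conversion and crude despeckling"]

; Map every grey value to 0 (black) or 255 (white); 128 is an assumed threshold
binarize: func [rows [block!] /local out row bits v] [
    out: copy []
    foreach row rows [
        bits: copy []
        foreach v row [append bits either v < 128 [0] [255]]
        append/only out bits
    ]
    out
]

; True if (y, x) is inside the bitmap and black
black-at?: func [rows [block!] y [integer!] x [integer!] /local row] [
    if any [y < 1 y > length? rows] [return false]
    row: pick rows y
    if any [x < 1 x > length? row] [return false]
    0 = pick row x
]

; Turn a black pixel white when none of its 4 neighbours is black.
; Reads from the original rows and writes into copies.
despeckle: func [rows [block!] /local out cleaned y x] [
    out: copy []
    repeat y length? rows [
        cleaned: copy pick rows y
        repeat x length? cleaned [
            if all [
                black-at? rows y x
                not any [
                    black-at? rows y - 1 x
                    black-at? rows y + 1 x
                    black-at? rows y x - 1
                    black-at? rows y x + 1
                ]
            ] [poke cleaned x 255]
        ]
        append/only out cleaned
    ]
    out
]

probe despeckle binarize [[200 10 200] [200 200 200] [200 200 200]]
```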
Graham
19-Apr-2008
[7642x2]
I'll give it a twirl
if they're placed in an anonymous context, how did you make them 
public?
Anton
19-Apr-2008
[7644x2]
Almost all my function libraries are in anonymous contexts.

Basically, DOing the library file (eg. do %auto-crop-bitmap-text.r) 
returns the context, and you just GET out the words you are interested 
in.
This job is eased a bit by my INCLUDE function.
You should be able to do this instead of using INCLUDE:


 auto-crop-bitmap-text: get in do %auto-crop-bitmap-text.r 'auto-crop-bitmap-text
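In case the anonymous-context pattern is unfamiliar, here is a self-contained sketch. It inlines a library body as a block (lib-source, helper and double-all are hypothetical names) so that DOing it returns an anonymous context, the same way DOing a library file whose last value is a context would. GET IN then pulls out the one function you want, while the helper stays private.

```rebol
REBOL [Title: "Sketch: a library living in an anonymous context"]

; Stands in for the contents of a library file
lib-source: [
    context [
        helper: func [x] [x * 2]    ; only visible inside the context
        double-all: func [blk [block!] /local out v] [
            out: copy []
            foreach v blk [append out helper v]
            out
        ]
    ]
]

; Same shape as: get in do %library.r 'double-all
double-all: get in do lib-source 'double-all

probe double-all [1 2 3]   ; prints [2 4 6]
```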
Graham
19-Apr-2008
[7646]
ok
Anton
19-Apr-2008
[7647x2]
Now you can use the auto-crop-bitmap-text function.
It's really very simple. I've just got this INCLUDE function, which 
kind of hides the simplicity a bit (unfortunately). I wish something 
like that were built into REBOL.
Graham
19-Apr-2008
[7649]
ok, found some images that don't work.
Anton
19-Apr-2008
[7650]
Cool... why not?
Graham
19-Apr-2008
[7651]
I'll run them again ...
Anton
19-Apr-2008
[7652]
(Note: the cropping is only off the top and bottom edges, i.e. the 
image is cropped vertically only.)
Graham
19-Apr-2008
[7653]
what does it do if the image is all white space?
Anton
19-Apr-2008
[7654]
good question. let me check...
Graham
19-Apr-2008
[7655]
what happens if you include two words above and below in the image?
Anton
19-Apr-2008
[7656x2]
An all-white image doesn't get cropped at all: the function didn't 
find any "content" (non-white pixels) to crop to. What do you want 
it to do?
If there are two words above and below in the image, like this:

	line one
	line two
	middle
	line four
	line five


then the 3 middle lines will be included, unless there is some white 
above the top line or below the bottom line (in which case the top 
or bottom line, respectively, will be included as well).
Graham
19-Apr-2008
[7658]
Hmm... return none?
Anton
19-Apr-2008
[7659]
Let me add that to the to-do list...
Is this a common case, by the way?
Graham
19-Apr-2008
[7660x3]
Yes
Let me send you some images that it appears to have failed on.
OK, sent. Some don't have the whitespace cropped at the top.
Anton
19-Apr-2008
[7663]
Currently the algorithm scans downwards and upwards simultaneously, 
looking for non-white content. When it doesn't find any, it has nowhere 
to crop to, so no cropping happens. I can change it so that when 
the two scans bump into each other, that point is taken as the 
"content found" position; since the scan lines are then right next 
to each other, this results in a 0-height crop region. I will check 
for that case and return none instead.
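The proposed change can be sketched like this. It is an illustration, not Anton's actual code, and uses the same hypothetical block-of-rows bitmap (grey values 0..255, below 128 is black). For simplicity the two scans run one after the other rather than in lockstep; since the bottom scan is not allowed to pass the top scan's position, the outcome is the same: the content rows come back as [first last], and when the scans meet without finding content the function returns none.

```rebol
REBOL [Title: "Sketch: vertical crop scanning from both ends"]

white-row?: func [row [block!] /local v] [
    foreach v row [if v < 128 [return false]]
    true
]

; Returns [first-row last-row] of the content, or none if all rows are white
vertical-crop: func [rows [block!] /local top bottom] [
    top: 1
    bottom: length? rows
    ; Top scan moves down until it hits content or passes the bottom
    while [all [top <= bottom  white-row? pick rows top]] [top: top + 1]
    ; Bottom scan moves up until it hits content or meets the top scan
    while [all [bottom >= top  white-row? pick rows bottom]] [bottom: bottom - 1]
    ; Scans crossed without content: a 0-height region, so return none
    if bottom < top [return none]
    reduce [top bottom]
]

probe vertical-crop [[255 255] [0 255] [255 0] [255 255]]   ; prints [2 3]
probe vertical-crop [[255 255] [255 255]]                   ; prints none
```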
Graham
19-Apr-2008
[7664]
Some still have rubbish left at the bottom.
Anton
19-Apr-2008
[7665]
The first one shows this result, indeed. Let me analyse...