World: r3wp
[View] discuss view related issues
Henrik 18-Apr-2008 [7616] | I've been looking for something similar. |
Graham 18-Apr-2008 [7617x4] | I don't really need to trim the whitespace ... just the junk |
Perhaps I can just inch along the top and move down until I reach all whitespace across the image to reach my border. | |
Invert the image and then do it again .... | |
Henrik, are you doing some image processing too? | |
Henrik 18-Apr-2008 [7621] | I was thinking about an automatic cropping tool, but I'm not sure how to do that. |
Anton 18-Apr-2008 [7622x10] | I made a very simple auto-crop function, remember ? I can probably modify it to perform the above function... |
http://anton.wildit.net.au/rebol/gfx/auto-crop.r http://anton.wildit.net.au/rebol/gfx/demo-auto-crop.r | |
(that's the function from September 2007) | |
A possible algorithm for the new junk cropper could be: 1) advance inwards from each edge until there is a full white line parallel to the edge. 2) if a white line was found for each edge, then advance inwards again, this time searching for a non-full white line, (eg. with a few pixels from the desired text in it). Step back a line and you have your crop region. | |
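The two passes described above can be sketched roughly as follows. This is an illustrative Python sketch, not Anton's REBOL code: it assumes a bitonal image stored as a list of rows with 0 for black and 255 for white, handles only the top and bottom edges, and the names `is_white_row` and `crop_rows_two_pass` are made up for the example.

```python
WHITE = 255  # assumed pixel value for white in a bitonal image


def is_white_row(row):
    """True when every pixel in the row is white."""
    return all(p == WHITE for p in row)


def crop_rows_two_pass(image):
    """Return (top, bottom) content row indices, inclusive, or None.

    Pass 1: advance inwards from an edge until a fully white row is found.
    Pass 2: keep advancing until a row that is not fully white appears.
    That row is the first content row, i.e. the row just after the last
    white row ("step back a line" gives the crop boundary).
    """
    rows = list(range(len(image)))

    def scan(order):
        i = 0
        # Pass 1: skip junk rows until the first fully white row.
        while i < len(order) and not is_white_row(image[order[i]]):
            i += 1
        if i == len(order):
            return None  # no white row found from this edge
        # Pass 2: skip the white band until something non-white appears.
        while i < len(order) and is_white_row(image[order[i]]):
            i += 1
        if i == len(order):
            return None  # only white beyond the junk
        return order[i]

    top = scan(rows)
    bottom = scan(rows[::-1])
    if top is None or bottom is None or top > bottom:
        return None
    return top, bottom
```

Note that this sketch treats anything touching an edge as junk, so text that runs right up to the edge with no white margin gets cropped away.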
What should I call the function that does this? auto-trim-bitmap-text ? | |
Hmm, I like "crop" better than "trim" auto-crop-impure-border ? | |
I think "auto-crop-bitmap-text" is the winner, for now. | |
Ooh. The algorithm above is too simple. The edge cases are harder to manage. | |
Graham, should the function be fully automatic or can there be user selection involved ? eg. we could divide the image into regions and let the user click the regions which are junk. | |
Graham, can the image be assumed to be grayscale ? | |
Graham 18-Apr-2008 [7632x4] | Anton, I'd like fully automatic, and yes, grayscale. |
I think the algorithm can assume that if the line advances more than 1/3 of the way across the depth, there is no junk. | |
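One possible reading of this 1/3 rule, as a Python sketch: when scanning in from the top for the white line that separates junk from content, give up after a third of the image height and assume there is no junk on that edge. The interpretation, the bitonal list-of-rows layout (0 black, 255 white) and the name `junk_depth_from_top` are all assumptions made for illustration.

```python
def junk_depth_from_top(image, white=255):
    """Return how many rows of junk sit above the first fully white row.

    The search is limited to the first third of the image height. If no
    fully white row turns up within that limit, assume there is no junk
    on this edge and return 0.
    """
    limit = len(image) // 3  # integer arithmetic avoids float rounding
    for y in range(limit):
        if all(p == white for p in image[y]):
            return y  # rows 0 .. y-1 are junk
    return 0
```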
I personally don't need the horizontal edges cropped as it's usually vertical displacement that's a problem with faxes | |
I remembered your auto-crop function but didn't recall where it was .. you shift websites so often! | |
Anton 19-Apr-2008 [7636x2] | You should just load-thru useful looking rebol urls when you see them here, then you can just scan your public cache. |
Are there likely to be horizontal black lines (eg. borders) which should be considered junk ? | |
Graham 19-Apr-2008 [7638] | Not in my case and I think you might then come up against the problem of deciding what is a border and what is a character. |
Anton 19-Apr-2008 [7639x3] | Yes, I would grade the scan line according to ratio of black : white pixels on the line. Text is probably between 20-85% black pixels, and borders could perhaps be detected at > 95% black. Anyway, if you don't need it, that's much easier :) |
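Grading a scan line by its share of black pixels could look like this in Python. The 20-85% and 95% thresholds are the rough figures from the message above; the function names, the extra "white" and "unknown" labels, and the 0 = black pixel encoding are assumptions for the sketch.

```python
def black_ratio(row, black=0):
    """Fraction of pixels in the row that are black (0.0 for an empty row)."""
    if not row:
        return 0.0
    return sum(1 for p in row if p == black) / len(row)


def grade_row(row):
    """Classify a scan line by its share of black pixels."""
    r = black_ratio(row)
    if r == 0:
        return "white"
    if r > 0.95:
        return "border"   # nearly solid black line
    if 0.20 <= r <= 0.85:
        return "text"     # typical density for a line of text
    return "unknown"      # falls between the rough bands
```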
I have something that's starting to work. If we can preprocess the greyscale images so that they're bitonal (black and white), and denoised, then my algorithm has a chance. | |
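A minimal Python sketch of that kind of preprocessing, under stated assumptions: a fixed threshold of 128 for the bitonal conversion, and a very crude denoise that only removes black pixels with no black neighbour. Neither choice comes from the discussion, and real fax noise may need something stronger. The function names are hypothetical.

```python
def to_bitonal(gray, threshold=128):
    """Map greyscale values (0..255) to pure black (0) or white (255)."""
    return [[0 if p < threshold else 255 for p in row] for row in gray]


def remove_isolated_pixels(img):
    """Turn black pixels with no black 8-neighbour into white."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]  # copy so neighbour checks use the input
    for y in range(h):
        for x in range(w):
            if img[y][x] != 0:
                continue
            has_black_neighbour = any(
                img[ny][nx] == 0
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbour:
                out[y][x] = 255
    return out
```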
http://anton.wildit.net.au/rebol/gfx/auto-crop-bitmap-text.r http://anton.wildit.net.au/rebol/gfx/demo-auto-crop-bitmap-text.r | |
Graham 19-Apr-2008 [7642x2] | I'll give it a twirl |
if they're placed in an anonymous context, how did you make them public? | |
Anton 19-Apr-2008 [7644x2] | Almost all my function libraries are in anonymous contexts. Basically, DOing the library file (eg. do %auto-crop-bitmap-text.r) returns the context, and you just GET out the words you are interested in. This job is eased a bit by my INCLUDE function. |
You should be able to do this instead of using INCLUDE: auto-crop-bitmap-text: get in do %auto-crop-bitmap-text.r 'auto-crop-bitmap-text | |
Graham 19-Apr-2008 [7646] | ok |
Anton 19-Apr-2008 [7647x2] | Now you can use the auto-crop-bitmap-text function. |
It's really very simple. I've just got this include function which kind of hides the simplicity a bit (unfortunately). I wish something like that was built in to rebol. | |
Graham 19-Apr-2008 [7649] | ok, found some images that don't work. |
Anton 19-Apr-2008 [7650] | cool, ... why not ? |
Graham 19-Apr-2008 [7651] | I'll run them again ... |
Anton 19-Apr-2008 [7652] | (note, the cropping is only off the top and bottom edges, ie. vertically cropped only.) |
Graham 19-Apr-2008 [7653] | what does it do if the image is all white space? |
Anton 19-Apr-2008 [7654] | good question. let me check... |
Graham 19-Apr-2008 [7655] | what happens if you include two words above and below in the image? |
Anton 19-Apr-2008 [7656x2] | All white does not do any cropping. It didn't find any "content" (non-white pixels) to crop to. What do you want it to do ? |
If there's two words above and below in the image, like this: "line one" / "line two" / "middle" / "line four" / "line five", then the 3 middle lines will be included, unless there is some white above the top line or some white below the bottom line (in which cases they will be included, respectively). | |
Graham 19-Apr-2008 [7658] | Hmm.... return none? |
Anton 19-Apr-2008 [7659] | Let me add that to the to-do list... Is this a common case, by the way ? |
Graham 19-Apr-2008 [7660x3] | Yes |
Let me send you some images ... that it appears to have failed on. | |
Ok, sent. some don't have the whitespace cropped at the top. | |
Anton 19-Apr-2008 [7663] | Currently the algorithm scans downwards and upwards simultaneously, looking for non-white content. When it doesn't find any, it has nowhere to crop to, so no cropping happens. I can change it so that when the scans bump into each other they set that as the "content found" position, and, the scan lines being right next to each other, will result in a 0-height crop region. I will check for that case and return none instead. |
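The planned change (scan from both ends, and return none when the scans meet without finding content) might be sketched like this in Python. It only covers the plain search for non-white rows, not the junk handling. The list-of-rows image layout, the 255 = white encoding and the name `find_content_rows` are assumptions, and this is not the actual REBOL implementation.

```python
def find_content_rows(image, white=255):
    """Scan down from the top and up from the bottom at the same time.

    Returns (top, bottom) indices of the first and last rows containing
    a non-white pixel, or None when the scans pass each other without
    finding any content (e.g. an all-white image).
    """
    top, bottom = 0, len(image) - 1
    top_found = bottom_found = False
    while top <= bottom and not (top_found and bottom_found):
        if not top_found:
            if any(p != white for p in image[top]):
                top_found = True
            else:
                top += 1
        if not bottom_found:
            if any(p != white for p in image[bottom]):
                bottom_found = True
            else:
                bottom -= 1
    if top > bottom:
        return None  # scans met: zero-height crop region
    return top, bottom
```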
Graham 19-Apr-2008 [7664] | some don't lose some rubbish at the bottom. |
Anton 19-Apr-2008 [7665] | The first one shows this result, indeed. Let me analyse... |