r3wp [groups: 83 posts: 189283]

World: r3wp

[View] discuss view related issues

Anton
19-Apr-2008
[7647x2]
Now you can use the auto-crop-bitmap-text function.
It's really very simple. I've just got this include function which
kind of hides the simplicity a bit (unfortunately). I wish something
like that were built into rebol.
Graham
19-Apr-2008
[7649]
ok, found some images that don't work.
Anton
19-Apr-2008
[7650]
cool, ... why not ?
Graham
19-Apr-2008
[7651]
I'll run them again ...
Anton
19-Apr-2008
[7652]
(note, the cropping is only off the top and bottom edges, i.e. vertically
cropped only.)
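Because the crop is vertical only, each pixel row can be reduced to a single number: how many "ink" pixels it contains. The following is a minimal sketch, not Anton's code. The name `row-ink-counts` and the rule that a pixel counts as ink when R + G + B is below 600 are assumptions made for illustration. The sketch uses integer indexing, so pixel (x, y), counted from 1, sits at index (y - 1) * width + x.

```rebol
REBOL [Title: "Row ink counts (hypothetical sketch)"]

; Return a block with one integer per pixel row: the number of
; pixels in that row darker than an assumed threshold.
row-ink-counts: func [
    img [image!]
    /local w h counts n p
][
    w: img/size/x
    h: img/size/y
    counts: make block! h
    repeat y h [
        n: 0
        repeat x w [
            p: pick img (y - 1) * w + x   ; tuple r.g.b.a
            ; assumed rule: a pixel is "ink" if R + G + B < 600
            if (p/1 + p/2 + p/3) < 600 [n: n + 1]
        ]
        append counts n
    ]
    counts
]
```

A count of 0 means the row is entirely white, so vertical cropping comes down to deciding which rows to keep.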
Graham
19-Apr-2008
[7653]
what does it do if the image is all white space?
Anton
19-Apr-2008
[7654]
good question. let me check...
Graham
19-Apr-2008
[7655]
what happens if you include two words above and below in the image?
Anton
19-Apr-2008
[7656x2]
All white does not do any cropping. It didn't find any "content" 
(non-white pixels) to crop to. What do you want it to do ?
If there are two words above and below in the image, like this:

	line one
	line two
	middle
	line four
	line five


then the 3 middle lines will be included, unless there is some white
above the top line or some white below the bottom line (in which
case the top or bottom line, respectively, will be included as well).
Graham
19-Apr-2008
[7658]
Hmm.... return none?
Anton
19-Apr-2008
[7659]
Let me add that to the to-do list...
Is this a common case, by the way ?
Graham
19-Apr-2008
[7660x3]
Yes
Let me send you some images ... that it appears to have failed on.
Ok, sent. Some don't have the whitespace cropped at the top.
Anton
19-Apr-2008
[7663]
Currently the algorithm scans downwards and upwards simultaneously,
looking for non-white content. When it doesn't find any, it has nowhere
to crop to, so no cropping happens. I can change it so that when
the scans bump into each other, they take that point as the "content
found" position; since the two scan lines are then right next to each
other, the result is a 0-height crop region. I will check for that
case and return none instead.
Graham
19-Apr-2008
[7664]
some don't lose some rubbish at the bottom.
Anton
19-Apr-2008
[7665x3]
The first one shows this result, indeed. Let me analyse...
I understand the bug in my code. I did not implement the weighting 
quite correctly.
hmm.. more issues... it's complex when you want to scan from top 
and from bottom simultaneously.
Graham
19-Apr-2008
[7668]
Anton, I found that the OCR engine I am using needs a white space
border, so I am padding the image back again with a little white
space.
Anton
19-Apr-2008
[7669]
That would help my algorithm. Text which is right up against the
edge is likely to be classified as 'junk'. When there is text only at
the top edge and at the bottom edge, we have two possible
'content' texts. But which one is the content and which is the junk ?
The algorithm is forced either to make a choice (which it could
do by choosing the larger one) or not to choose at all (which is what
currently happens), thereby including both as 'content'. If you put
just one line of white outside the text you consider 'content', then
it will be surrounded by white and the algorithm will select it as
'content'.
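The selection rule described here can be sketched over a block of per-row ink counts. Under this rule, text touching an edge is probably junk, and when only a top-edge and a bottom-edge candidate exist, the algorithm must either pick one or keep both. The sketch is a hypothetical reconstruction, not the code in auto-crop-bitmap-text.r, and `find-bands` and `select-content-rows` are illustrative names. It treats bands surrounded by white as content and keeps everything from the first such band to the last. When only edge bands exist, it takes the "choose the larger one" branch.

```rebol
REBOL [Title: "Band selection (hypothetical sketch)"]

; Group consecutive inked rows (count > 0) into bands.
; Returns a block of [start end] blocks, 1-based and inclusive.
find-bands: func [
    counts [block!]
    /local bands start i
][
    bands: copy []
    start: none
    i: 0
    foreach c counts [
        i: i + 1
        either c > 0 [
            if none? start [start: i]
        ][
            if start [append/only bands reduce [start i - 1]  start: none]
        ]
    ]
    if start [append/only bands reduce [start i]]
    bands
]

; Returns [first-row last-row] of the content, or none if all rows are blank.
select-content-rows: func [
    counts [block!]
    /local bands n inner a b
][
    bands: find-bands counts
    if empty? bands [return none]
    n: length? counts
    ; bands touching neither the top nor the bottom edge count as content
    inner: copy []
    foreach band bands [
        if all [band/1 > 1  band/2 < n] [append/only inner band]
    ]
    ; keep everything from the first to the last surrounded band
    if not empty? inner [return reduce [inner/1/1  last last inner]]
    ; otherwise only edge bands remain (at most two): keep the taller one
    a: first bands
    b: last bands
    either (b/2 - b/1) > (a/2 - a/1) [b] [a]
]
```

With the five-line example above, where lines one and five touch the edges, only lines two to four are kept. With text only at the two edges, the taller band wins.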
Graham
19-Apr-2008
[7670]
I would always select the larger ...
Anton
20-Apr-2008
[7671x2]
Rewritten algorithm (selects the larger now).
load-thru/update these two:
http://anton.wildit.net.au/rebol/gfx/auto-crop-bitmap-text.r
http://anton.wildit.net.au/rebol/gfx/demo-auto-crop-bitmap-text.r
And download this new test script:
http://anton.wildit.net.au/rebol/gfx/test-auto-crop-bitmap-text.r
You can fiddle with the last script to make it load your 6 test files
(which all yield correct-looking results).
Graham
20-Apr-2008
[7673x2]
Cool.
if the region is blank, your scan routine returns none, and then 
the crop errors.
Anton
20-Apr-2008
[7675]
Oops, forgot the simplest input.
Anton
21-Apr-2008
[7676x2]
I've fixed that oversight. Update these files:
	auto-crop-bitmap-text.r 
	test-auto-crop-bitmap-text.r
The above update also cleans up loose words in the auto-crop-bitmap-text.r 
file.
Graham
21-Apr-2008
[7678x2]
I added a /pad option to mine so that it returns the text with a 
white space border.
which is needed for some ocr engines
Anton
21-Apr-2008
[7680x5]
/border makes more sense, doesn't it ?
maybe not...
updated 
	auto-crop-bitmap-text.r
removed old code and comments (file is 3.5k smaller)
updated again
	auto-crop-bitmap-text.r
replaced old comments with new ones.
Hmm.. I think the image padding might be outside the responsibility 
of an auto-crop function. Its job is to remove stuff, not add. It's 
probably better to write a small generalised function to do the padding 
(which could be useful elsewhere) and just feed the result of the 
auto-crop to it.
Graham
21-Apr-2008
[7685x2]
you're probably right
though it could be auto-crop to  as it were.
Anton
21-Apr-2008
[7687]
I think if you're going to make an "all-in-one" function, then its
name should reflect that, e.g.:
	crop-and-pad-ready-for-ocr: func [image][
		pad-image auto-crop-bitmap-text image 1x1
	]

(where pad-image is adding a 1x1 white border around the cropped 
image.)
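pad-image is not defined anywhere in this thread. The following is one hypothetical way to write it: it copies pixels by integer index into a larger image that has been painted white. The white value 255.255.255 and the meaning of the border argument (pixels added on each side) are assumptions. Only the RGB channels are set explicitly, and alpha conventions differ between REBOL versions.

```rebol
REBOL [Title: "pad-image (hypothetical sketch)"]

; Return a new image with IMG surrounded by a white margin, BORDER
; pixels wide on each side (x = left/right, y = top/bottom).
pad-image: func [
    img [image!]
    border [pair!]
    /local size out w ow
][
    size: img/size + (border * 2)
    out: make image! size
    ; paint every pixel of the new image white (RGB only)
    repeat i size/x * size/y [poke out i 255.255.255]
    w: img/size/x
    ow: size/x
    ; copy each source pixel (x, y) to (x + border/x, y + border/y)
    repeat y img/size/y [
        repeat x w [
            poke out (y - 1 + border/y) * ow + x + border/x
                pick img (y - 1) * w + x
        ]
    ]
    out
]
```

Keeping padding in its own function, as suggested above, means the crop function only removes pixels and this one only adds them.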
Graham
21-Apr-2008
[7688x2]
auto-crop-bitmap-text: func [
	"Returns a cropped image, or none if the input image was blank"
	image [image!]
	/local region
][
	if region: find-bitmap-text-crop-region image [
		copy/part skip image region/1 region/2  ; return a cropped image
	]
]

Looking at this, it appears to return unset! if region is none!
How about this:

auto-crop-bitmap-text: func [
	"Returns a cropped image, or none if the input image was blank"
	image [image!]
	/local region
][
	all [
		region: find-bitmap-text-crop-region image
		region: copy/part skip image region/1 region/2
	]
	region
]
Anton
21-Apr-2008
[7690x2]
IF returns none when given false.
And your code unnecessarily redefines the meaning of 'region, which is
bad in itself because it can cause confusion later.
I could rewrite it more simply:
	all [
		region: find-bitmap-text-crop-region image
		copy/part skip image region/1 region/2
	]
but that's just equivalent to my IF above.
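Anton's point about IF is easy to check at the REBOL 2 console. This small script uses only core natives. It shows that IF and ALL both return none, not unset!, when they stop early, so a caller can test the crop result directly. `crop-result` is a stand-in for whatever auto-crop-bitmap-text returns on a blank image.

```rebol
REBOL [Title: "IF / ALL return values"]

probe if false [1]         ; IF with a false condition returns none
probe if none [1]          ; so does IF with a none condition
probe all [none 1 + 1]     ; ALL stops at the first none/false and returns none
probe all [10 1 + 1]       ; otherwise ALL returns its last value: 2

; Stand-in for the result of auto-crop-bitmap-text on a blank image
crop-result: none
either crop-result [print "got a cropped image"] [print "blank input, nothing to OCR"]
```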
Graham
21-Apr-2008
[7692x3]
Ah ... ok.
I think it would be nice now to have the crop work on the sides as 
well.
Would it be hard to write a deskewing function? Basically, I guess
one finds a best-fit line along the base of the text, and then
returns the angle needed to deskew it (make that line horizontal).
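Graham's idea can be sketched with an ordinary least-squares fit. `baseline-angle` is a hypothetical helper, not from the thread. It assumes the caller has already collected one baseline point per column as pair! values, for example the lowest inked pixel in each column. It returns the fitted line's angle in degrees, since REBOL's arctangent returns degrees by default. Image y coordinates grow downwards, so a positive angle means the baseline drops towards the right. Rotating by the negated angle would level it.

```rebol
REBOL [Title: "Baseline angle (hypothetical sketch)"]

; Least-squares slope m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx*Sx),
; returned as an angle in degrees, or none if no unique fit exists.
baseline-angle: func [
    pts [block!] "block of pair! points along the text baseline"
    /local n sx sy sxy sxx denom
][
    n: length? pts
    if n < 2 [return none]
    sx: sy: sxy: sxx: 0
    foreach p pts [
        sx: sx + p/x
        sy: sy + p/y
        sxy: sxy + (p/x * p/y)
        sxx: sxx + (p/x * p/x)
    ]
    denom: (n * sxx) - (sx * sx)
    ; all points share one x coordinate: the line would be vertical
    if zero? denom [return none]
    arctangent ((n * sxy) - (sx * sy)) / denom
]
```

For points on a 45-degree line such as `[0x0 10x10 20x20]`, the result is 45 degrees. A level baseline gives 0.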
Anton
22-Apr-2008
[7695]
Is it really skew or do you mean rotate ?
Graham
22-Apr-2008
[7696]
It's normally called skew but it's the same.