Niko's Project Corner

Image distortion estimation and compensation

Description Robustly undistorting receipt images for easier OCR analysis.
Languages Matlab
Tags Computer Vision
Duration Summer 2014
Modified 9th August 2014

This project's goal was to automatically and robustly estimate and compensate distortion in any receipt photo. The user should be able to just snap a photo, after which OCR can accurately identify the bought products and their prices. However, this task is somewhat challenging because receipts typically get crumpled and bent, so they won't lie nicely flat on a surface for easy analysis. This set of algorithms solves that problem and produces distortion-free thresholded images for the next OCR step.

The first step is to convert the image to black & white and to adaptively threshold it to separate black letters from the white-to-gray background. Typically the light comes from above, so the camera, phone or tablet being used tends to cast a shadow on the receipt. Hopefully the shadow's border is at least slightly "smooth", so that it won't get mis-thresholded as actual ink on the receipt. I used this formula to compensate light and shadow effects in the image:

    I'(x,y) = Clamp[0,1]( a + b · log( I(x,y) / GaussBlurσ(I)(x,y) ) )    (1)

Basically each (x, y) location's brightness is compared to its surrounding brightness, which is determined by applying a standard Gaussian blur with a chosen σ. Parameters a and b adjust the "blackness" range, and the result is clamped to values between 0.0 and 1.0.
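The project itself is written in Matlab, but a minimal NumPy sketch of Eq. (1) could look like this (the default values for a, b and σ are illustrative, not the ones used in the project):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def compensate_illumination(img, a=0.5, b=1.0, sigma=15.0, eps=1e-6):
    """Eq. (1): Clamp[0,1](a + b * log(I / GaussBlur_sigma(I))).

    img is a float grayscale array in [0, 1].  eps guards the log
    against zero-valued pixels.
    """
    background = gaussian_filter(img, sigma=sigma)
    ratio = np.log((img + eps) / (background + eps))
    return np.clip(a + b * ratio, 0.0, 1.0)
```

Pixels darker than their blurred surroundings get a negative log-ratio and are pushed toward black, while a smooth shadow gradient changes I and its blurred version almost equally and thus cancels out.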

The resulting black pixels are separated into clusters based on their connectivity with neighbouring pixels. Clusters with 60 to 500 pixels are assumed to be individual characters. Each valid cluster's major and minor axes are determined by running the standard PCA algorithm and checking its major and minor singular values. Clusters with at least 75% of the variance in the major direction and a major singular value of at least 50 are accepted for orientation voting. These thresholds filter out spurious votes from too round or too small clusters. The outcome can be seen in Figure 1.
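As a sketch (in Python rather than the project's Matlab; function and argument names are mine), the PCA-based orientation vote for a single cluster could be implemented like this:

```python
import numpy as np

def cluster_orientation(pixels, min_var_ratio=0.75, min_sv=50.0):
    """Estimate a character cluster's major-axis angle via PCA.

    pixels: (N, 2) array of (x, y) coordinates of one connected
    component.  Returns the angle in radians, or None if the cluster
    is too round (major axis explains less than min_var_ratio of the
    variance) or too small (major singular value below min_sv).  The
    thresholds mirror the ones quoted in the text.
    """
    centered = pixels - pixels.mean(axis=0)
    # SVD of the centered coordinates gives the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    if s[0] < min_sv:
        return None
    if s[0] ** 2 / (s[0] ** 2 + s[1] ** 2) < min_var_ratio:
        return None
    major = vt[0]
    return np.arctan2(major[1], major[0])
```

The median of the accepted angles (modulo the sign ambiguity of the major axis) then gives the initial rotation estimate.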

Figure 1: The initial rotation estimation is based on assumptions about the font's height-to-width ratio. Dark blue major axes are used to find the median rotation.

After the main rotation has been compensated, the next step is to estimate distortions caused by perspective and/or a bent receipt. This is accomplished by generating the Delaunay triangulation of the observed characters and identifying strictly horizontal and vertical segments. An example result can be seen in Figure 2. These lines are used to robustly fit the following linear models for the horizontal direction α, the vertical direction β and the scale S at the location (x, y):

    α(x,y) = a1·x + b1·y + c1    (2)
    β(x,y) = a2·x + b2·y + c2    (3)
    S(x,y) = exp(a3·x + b3·y + c3)    (4)

Since the main rotation has already been compensated, c1 and c2 should be approximately zero. The distortion is compensated by the following dynamic model for horizontal (ph) and vertical (pv) movement:
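The article fits these models robustly; as a minimal stand-in, a plain least-squares fit of one such plane (Eqs. 2–4 all share the form a·x + b·y + c, with the log of the measured scale used for Eq. 4) could look like this in Python:

```python
import numpy as np

def fit_plane(xs, ys, values):
    """Least-squares fit of values ≈ a*x + b*y + c, as in Eqs. (2)-(4).

    xs, ys: segment midpoint coordinates; values: the observed segment
    angle (or log-scale, for Eq. 4).  The article uses a robust fit;
    plain least squares is a simplification here.
    """
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    coef, *_ = np.linalg.lstsq(A, values, rcond=None)
    return coef  # (a, b, c)
```

A robust variant would down-weight outlier segments, for example via iteratively reweighted least squares or RANSAC.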

    ∂x ph(x,y) = cos(α(x,y)) · S(x,y)    (5)
    ∂y ph(x,y) = sin(α(x,y)) · S(x,y)    (6)
    ∂x pv(x,y) = -sin(β(x,y)) · S(x,y)    (7)
    ∂y pv(x,y) = cos(β(x,y)) · S(x,y)    (8)
Figure 2: Identified horizontal (magenta) and vertical (cyan) lines (a partial crop of the whole image).

These can be used to construct an inverse mapping which straightens out the remaining curvature and scale changes. Its outcome is visualized in Figure 3, where it is apparent that the vertical and horizontal lines align very well with the underlying monospace font.
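The article does not spell out the integration scheme behind this mapping; one simple possibility is forward-Euler tracing of a grid line through the flow field of Eqs. (5)–(6), sketched here in Python with unit steps as an assumption:

```python
import numpy as np

def trace_row(x0, y0, n_steps, alpha, scale):
    """Trace one horizontal grid line through the distorted image by
    forward-Euler integration of Eqs. (5)-(6).

    alpha(x, y) and scale(x, y) are the fitted models of Eqs. (2) and
    (4); (x0, y0) is the starting point in source-image coordinates.
    Returns one source coordinate per undistorted pixel column.  The
    unit step size is an assumption, not taken from the article.
    """
    path = np.empty((n_steps, 2))
    x, y = float(x0), float(y0)
    for i in range(n_steps):
        path[i] = (x, y)
        s = scale(x, y)
        x += np.cos(alpha(x, y)) * s   # Eq. (5)
        y += np.sin(alpha(x, y)) * s   # Eq. (6)
    return path
```

Tracing such paths for every grid row and column yields, for each undistorted pixel, the source coordinate to sample, i.e. the inverse mapping.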

Figure 3: The estimated underlying monospace font's grid is shown in green and blue lines.

The final step is to render the undistorted image and to do the final receipt and text line detection. The receipt is horizontally cropped if a significant black border separates it from the left and right parts of the image. Because the distortions have already been corrected, these borders should be almost exactly vertical. Then text lines are detected and separated from each other based on the white horizontal rows between them. This output should be fairly easy for OCR algorithms to analyze.
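A minimal Python sketch of the white-row line segmentation could look like this (the ink/background convention and the min_height threshold are assumptions of mine):

```python
import numpy as np

def split_text_lines(binary, min_height=3):
    """Split a thresholded receipt into text lines by finding runs of
    all-white rows between them (white = 1, ink = 0 assumed here).

    Returns (top, bottom) row-index pairs for each detected line.
    min_height filters out single-pixel specks.
    """
    has_ink = (binary == 0).any(axis=1)          # rows containing ink
    lines, start = [], None
    for r, ink in enumerate(has_ink):
        if ink and start is None:
            start = r                            # a text line begins
        elif not ink and start is not None:
            if r - start >= min_height:
                lines.append((start, r))         # a text line ends
            start = None
    if start is not None and len(has_ink) - start >= min_height:
        lines.append((start, len(has_ink)))      # line touching bottom
    return lines
```

Each (top, bottom) strip can then be cropped and passed to the OCR engine independently.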

Figure 4: The processing starts from the image on the left; it is thresholded and rotated (shown in the middle), and finally the remaining distortions are corrected, the receipt is cropped and individual lines are detected.
