Niko's Project Corner

Image distortion estimation and compensation

Description Robustly undistorting receipt images for easier OCR analysis.
Languages Matlab
Tags Computer Vision
Duration Summer 2014
Modified 9th August 2014

This project's goal was to automatically and robustly estimate and compensate distortion in any receipt photo. The user should be able to just snap a photo, after which OCR can accurately identify the bought products and their prices. However, this task is somewhat challenging because receipts typically get crumpled and bent, so they won't lie nicely flat on a surface for easy analysis. This set of algorithms solves that problem and produces distortion-free thresholded images for the next OCR step.

The first step is to convert the image to black & white and to adaptively threshold it to separate black letters from the white-to-gray background. Typically the light comes from above, so the camera, phone or tablet being used tends to cast a shadow on the receipt. Hopefully the shadow's border is at least slightly "smooth", so that it won't get mis-thresholded as actual ink on the receipt. I used this formula to compensate light and shadow effects in the image:

    I'(x,y) = Clamp[0,1]( a + b · log( I(x,y) / GaussBlurσ(I)(x,y) ) )    (1)

Basically each (x, y) location's brightness is compared to its surrounding brightness, which is determined by applying a standard Gaussian blur with a chosen σ. Parameters a and b adjust the "blackness" range, and the result is clamped to values between 0.0 and 1.0.
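The project itself is written in Matlab, but a minimal NumPy sketch of Eq. (1) could look like this (the default values for a, b and σ are illustrative, not the ones used in the project):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def compensate_illumination(img, a=0.5, b=1.0, sigma=15.0, eps=1e-6):
    """Eq. (1): Clamp[0,1](a + b * log(I / GaussBlur_sigma(I))).

    img is a float grayscale array in [0, 1].  eps guards the log
    against zero-valued pixels.
    """
    background = gaussian_filter(img, sigma=sigma)
    ratio = np.log((img + eps) / (background + eps))
    return np.clip(a + b * ratio, 0.0, 1.0)
```

Pixels darker than their blurred surroundings get a negative log-ratio and are pushed toward black, while a smooth shadow gradient changes I and its blurred version almost equally and thus cancels out.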

The resulting black pixels are separated into clusters based on their connectivity with neighbouring pixels. Clusters with 60 to 500 pixels are assumed to be individual characters. Each valid cluster's major and minor axes are determined by running the standard PCA algorithm and checking its major and minor singular values. Clusters with at least 75% of the variance in the major direction and a major singular value of at least 50 are accepted for orientation voting. These thresholds filter out spurious votes from too round or too small clusters. The outcome can be seen in Figure 1.
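As a sketch (in Python rather than the project's Matlab; function and argument names are mine), the PCA-based orientation vote for a single cluster could be implemented like this:

```python
import numpy as np

def cluster_orientation(pixels, min_var_ratio=0.75, min_sv=50.0):
    """Estimate a character cluster's major-axis angle via PCA.

    pixels: (N, 2) array of (x, y) coordinates of one connected
    component.  Returns the angle in radians, or None if the cluster
    is too round (major axis explains less than min_var_ratio of the
    variance) or too small (major singular value below min_sv).  The
    thresholds mirror the ones quoted in the text.
    """
    centered = pixels - pixels.mean(axis=0)
    # SVD of the centered coordinates gives the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    if s[0] < min_sv:
        return None
    if s[0] ** 2 / (s[0] ** 2 + s[1] ** 2) < min_var_ratio:
        return None
    major = vt[0]
    return np.arctan2(major[1], major[0])
```

The median of the accepted angles (modulo the sign ambiguity of the major axis) then gives the initial rotation estimate.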

Figure 1: The initial rotation estimation is based on assumptions about the font's height-to-width ratio. Dark blue major axes are used to find the median rotation.

After the main rotation has been compensated, the next step is to estimate distortions caused by perspective and/or a bent receipt. This is accomplished by generating the Delaunay triangulation of the observed characters and identifying strictly horizontal and vertical segments. An example result can be seen in Figure 2. These lines are used to robustly fit the following linear models for the horizontal direction α, the vertical direction β and the scale S at the location (x, y):

    α(x,y) = a1·x + b1·y + c1    (2)
    β(x,y) = a2·x + b2·y + c2    (3)
    S(x,y) = exp(a3·x + b3·y + c3)    (4)

Since the main rotation has already been compensated, c1 and c2 should be approximately zero. The distortion is compensated by the following dynamic model for horizontal (ph) and vertical (pv) movement:
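The article fits these models robustly; as a minimal stand-in, a plain least-squares fit of one such plane (Eqs. 2–4 all share the form a·x + b·y + c, with the log of the measured scale used for Eq. 4) could look like this in Python:

```python
import numpy as np

def fit_plane(xs, ys, values):
    """Least-squares fit of values ≈ a*x + b*y + c, as in Eqs. (2)-(4).

    xs, ys: segment midpoint coordinates; values: the observed segment
    angle (or log-scale, for Eq. 4).  The article uses a robust fit;
    plain least squares is a simplification here.
    """
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    coef, *_ = np.linalg.lstsq(A, values, rcond=None)
    return coef  # (a, b, c)
```

A robust variant would down-weight outlier segments, for example via iteratively reweighted least squares or RANSAC.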

    ∂x ph(x,y) = cos(α(x,y)) · S(x,y)    (5)
    ∂y ph(x,y) = sin(α(x,y)) · S(x,y)    (6)
    ∂x pv(x,y) = -sin(β(x,y)) · S(x,y)    (7)
    ∂y pv(x,y) = cos(β(x,y)) · S(x,y)    (8)
Figure 2: Identified horizontal (magenta) and vertical (cyan) lines (a partial crop of the whole image).

These can be used to construct an inverse mapping which straightens out the remaining curvature and scale changes. Its outcome is visualized in Figure 3, where it is apparent that the vertical and horizontal lines align very well with the underlying monospace font.
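The article does not spell out the integration scheme behind this mapping; one simple possibility is forward-Euler tracing of a grid line through the flow field of Eqs. (5)–(6), sketched here in Python with unit steps as an assumption:

```python
import numpy as np

def trace_row(x0, y0, n_steps, alpha, scale):
    """Trace one horizontal grid line through the distorted image by
    forward-Euler integration of Eqs. (5)-(6).

    alpha(x, y) and scale(x, y) are the fitted models of Eqs. (2) and
    (4); (x0, y0) is the starting point in source-image coordinates.
    Returns one source coordinate per undistorted pixel column.  The
    unit step size is an assumption, not taken from the article.
    """
    path = np.empty((n_steps, 2))
    x, y = float(x0), float(y0)
    for i in range(n_steps):
        path[i] = (x, y)
        s = scale(x, y)
        x += np.cos(alpha(x, y)) * s   # Eq. (5)
        y += np.sin(alpha(x, y)) * s   # Eq. (6)
    return path
```

Tracing such paths for every grid row and column yields, for each undistorted pixel, the source coordinate to sample, i.e. the inverse mapping.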

Figure 3: The estimated underlying monospace font's grid is shown in green and blue lines.

The final step is to render the undistorted image and to do the final receipt and text line detection. The receipt is horizontally cropped if a significant black border separates it from the left and right parts of the image. Because the distortions have already been corrected, these borders should be almost exactly vertical. Then text lines are detected and separated from each other based on the white horizontal rows between them. This output should be fairly easy for OCR algorithms to analyze.
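A minimal Python sketch of the white-row line segmentation could look like this (the ink/background convention and the min_height threshold are assumptions of mine):

```python
import numpy as np

def split_text_lines(binary, min_height=3):
    """Split a thresholded receipt into text lines by finding runs of
    all-white rows between them (white = 1, ink = 0 assumed here).

    Returns (top, bottom) row-index pairs for each detected line.
    min_height filters out single-pixel specks.
    """
    has_ink = (binary == 0).any(axis=1)          # rows containing ink
    lines, start = [], None
    for r, ink in enumerate(has_ink):
        if ink and start is None:
            start = r                            # a text line begins
        elif not ink and start is not None:
            if r - start >= min_height:
                lines.append((start, r))         # a text line ends
            start = None
    if start is not None and len(has_ink) - start >= min_height:
        lines.append((start, len(has_ink)))      # line touching bottom
    return lines
```

Each (top, bottom) strip can then be cropped and passed to the OCR engine independently.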

Figure 4: The processing starts from the image on the left; it is thresholded and rotated (shown in the middle), and finally the remaining distortions are corrected, the receipt is cropped and individual lines are detected.
