
Niko's Project Corner



Cheating at Bananagrams with real-time AI, part 1

(2nd April 2023)

Bananagrams is a real-time word game in which participants race to build their own crosswords. It requires a comprehensive vocabulary, fast thinking and good decision making. Sometimes situations arise which require an either-or decision: should I scrap my partial solution and start fresh, or not? Or you may be left with one unusable extra letter, which you can put back into the pile while picking up three random new ones. This first article on the topic describes a two-phase artificial neural network which detects and identifies the tiles from an image (a small sketch of the idea follows below). The second article will describe how to generate a valid crossword from the given letters.

Languages: Python Keras
Tags: Computer Vision
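
The detector works in two phases: first locate the tiles, then classify each cropped tile as a letter. As a rough illustration of the second phase, a tiny Keras letter classifier could look like the sketch below; the input size, layer sizes and the 26-class output are my assumptions, not the article's actual architecture.

```python
# Minimal sketch of a tile-letter classifier (phase two); all shapes and
# layer sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_letter_classifier(tile_size=32, n_letters=26):
    """Classify a cropped, grayscale tile image into one of the letters."""
    model = models.Sequential([
        layers.Input(shape=(tile_size, tile_size, 1)),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_letters, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```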

Introduction to Stable Diffusion's parameters

(10th November 2022)

Stable Diffusion is an image generation network which was released to the public in 2022. It is based on a diffusion process, in which the model gets a noisy image as input and tries to generate a noise-free image as output. The process can be guided by describing the target image in plain English (txt2img), and optionally by also giving it a starting image (img2img). This article doesn't describe how the model works or how to run it yourself; instead it is more of a tutorial on how the various parameters affect the resulting image (a minimal usage sketch is shown below). Non-technical people can use these image-generating AIs via webpages such as Artistic.wtf (my and my friend's project), Craiyon.com, Midjourney.com and others.

Languages: Python PyTorch
Tags: Computer Vision Autoencoder Stable Diffusion
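
One common way to experiment with these parameters is Hugging Face's diffusers library; the snippet below is only an illustrative sketch (the model id, prompt and parameter values are arbitrary, and the article itself does not depend on this particular library).

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a watercolor painting of a lighthouse at sunset",
    num_inference_steps=50,   # more denoising steps: slower but more refined
    guidance_scale=7.5,       # how strongly the prompt steers the denoising
    generator=torch.Generator("cuda").manual_seed(42),  # reproducible seed
).images[0]
image.save("lighthouse.png")
```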

Matching puzzle pieces together

(19th July 2022)

Some people enjoy solving puzzles the old-fashioned way, but engineers would like to automate tedious tasks like that. The task is well suited for supervised learning, but naturally it requires training data. Gathering it from real-life puzzles would be time-consuming as well, so I opted for generating it instead (see the sketch below). This gives a lot of control over the data, but the resulting system might not even work with real-life inputs. There are also several different styles of puzzles, but in this project each "base-piece" is a rectangle of identical size. An example 3 × 3 puzzle is shown in the thumbnail.

Languages: Python Keras
Tags: Computer Vision
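
Data generation can start from simply cutting a source image into a grid of equally sized rectangles; the sketch below shows that first step, although the article's actual data generation (shuffling, labelling matching pairs, and so on) is more involved and the details here are my assumptions.

```python
import numpy as np

def cut_into_pieces(image, n=3):
    """Split an (H, W, C) image into n*n equally sized rectangular pieces,
    listed row by row (the 'base-pieces' mentioned in the article)."""
    h, w = image.shape[0] // n, image.shape[1] // n
    return [image[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(n) for c in range(n)]

# Example: a random 300 x 300 "image" cut into a 3 x 3 puzzle
pieces = cut_into_pieces(np.random.rand(300, 300, 3), n=3)
assert len(pieces) == 9 and pieces[0].shape == (100, 100, 3)
```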

Image and video clustering with an autoencoder

(15th January 2022)

This article describes a neural network which automatically projects a large collection of video frames (or images) into 2D coordinates based on their content and similarity (a minimal sketch is shown below). It can be used to find content such as explosions from Arnold Schwarzenegger's movies, or car scenes from James Bond films. It was originally developed to organize over 6 hours of GoPro footage from a bike trip to Åre in the summer of 2020, and to create a high-res poster which shows the beautiful and varying landscape (Figure 9).

Languages: Python Keras
Tags: Computer Vision Autoencoder
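
A convolutional autoencoder with a two-unit bottleneck is one straightforward way to get an (x, y) coordinate per frame; the Keras sketch below illustrates the idea, with all layer sizes and the exact bottleneck design being my assumptions rather than the article's actual architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_autoencoder(h=72, w=128):
    """Autoencoder whose bottleneck is a 2D coordinate per input frame."""
    inp = layers.Input(shape=(h, w, 3))
    x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    coords = layers.Dense(2, name="coords")(x)          # the 2D projection
    x = layers.Dense((h // 4) * (w // 4) * 32, activation="relu")(coords)
    x = layers.Reshape((h // 4, w // 4, 32))(x)
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2DTranspose(3, 3, strides=2, padding="same", activation="sigmoid")(x)
    autoencoder = models.Model(inp, out)
    encoder = models.Model(inp, coords)
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder
```

After training the autoencoder on the frames themselves (input equals target), `encoder.predict(frames)` yields one 2D coordinate per frame, which can then be used to lay the frames out on a poster.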

Helsinki Deblur Challenge 2021

(15th December 2021)

The Finnish Inverse Problems Society (FIPS) organized the Helsinki Deblur Challenge 2021 during the summer and fall of 2021. The challenge was to "deblur" (deconvolve) images with a known amount of blur and run the resulting image through an OCR algorithm. Deblurring results are scored based on how well the pytesseract OCR algorithm is able to read the text (see the sketch below). The organizers also kindly provided unblurred versions of the pictures, so we can train neural networks using any supervised learning methods at hand. The network described in this article wasn't officially registered in the contest, but since the evaluation dataset is also public, we can run the statistics ourselves. Hyperparameter tuning got a lot more difficult once it started taking 12 to 24 hours to train the model. I might revisit this project later, but here its status is described as of December 2021. Had the current best network been submitted to the challenge, it would have ranked 7th out of the 10 (nine plus this one) participants. There is already a long list of known possible improvements at the end of this article, so stay tuned for future articles.

Languages: Python Keras
Tags: Computer Vision
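
The evaluation idea can be illustrated with a small helper that runs pytesseract on a deblurred image and compares the output against the known ground-truth text; the character-level similarity below is only a crude stand-in for the challenge's official scoring function.

```python
import pytesseract
from PIL import Image

def ocr_score(deblurred_path, true_text):
    """Return a rough similarity between OCR output and the known text."""
    predicted = pytesseract.image_to_string(Image.open(deblurred_path)).strip()
    # character-by-character overlap; the real challenge metric differs
    matches = sum(a == b for a, b in zip(predicted, true_text))
    return matches / max(len(true_text), 1)
```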

Chess video search engine

(13th June 2021)

YouTube has quite good search functionality based on video titles, descriptions and maybe even subtitles, but it doesn't look into the actual video content or provide accurate timestamps for users' searches. The YouTuber "Agadmator" has a very popular channel (1.1 million subscribers, 454 million video views at the time of writing) which showcases major chess games from past and recent tournaments and online games. Here a search engine is introduced which analyzes the videos, recognizes the chess pieces and builds a searchable database of all of the positions on the board (a sketch of the index structure is shown below). It keeps track of the exact timestamps at which the queried position occurs, so it can provide direct links to the relevant videos.

Languages: Python Keras Clojure
Tags: Computer Vision Data Structures Autoencoder
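
At its core the index only needs to map each recognized board position to the videos and timestamps where it appears; the dictionary sketch below shows that idea, although the key format and the project's actual storage are my assumptions.

```python
from collections import defaultdict

# position key (e.g. a FEN-like string) -> [(video_id, seconds), ...]
position_index = defaultdict(list)

def add_frame(position_key, video_id, seconds):
    """Record that 'position_key' was seen in 'video_id' at 'seconds'."""
    position_index[position_key].append((video_id, seconds))

def search(position_key):
    """Return direct links to the moments where the position occurs."""
    return [f"https://youtu.be/{vid}?t={int(t)}"
            for vid, t in position_index.get(position_key, [])]
```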

Automatic map stitching

(10th September 2014)

Nowadays there are many HTML5-based map services, but typically they don't offer any export functionality. To create a full view of the desired region, one can either zoom out (and lose map details) or take many screenshots of different locations and manually stitch them together. This project can automatically load all stored screenshots, detect the map, crop the relevant regions, determine the images' relative offsets (see the sketch below) and generate the high-res output with zero configuration from any map service.

Languages: Matlab
Tags: Computer Vision Rendering FFT
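
The relative offset between two overlapping screenshots can be estimated with FFT-based phase correlation. The project itself is written in Matlab, so the NumPy version below is only an illustration of the principle.

```python
import numpy as np

def estimate_offset(img_a, img_b):
    """Return the (dy, dx) translation that best aligns img_b with img_a,
    using phase correlation on two equally sized grayscale images."""
    fa = np.fft.fft2(img_a)
    fb = np.fft.fft2(img_b)
    cross_power = fa * np.conj(fb)
    cross_power /= np.abs(cross_power) + 1e-9     # keep only the phase
    correlation = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    # wrap shifts larger than half the image size to negative offsets
    if dy > img_a.shape[0] // 2:
        dy -= img_a.shape[0]
    if dx > img_a.shape[1] // 2:
        dx -= img_a.shape[1]
    return dy, dx
```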

Image distortion estimation and compensation

(9th August 2014)

This project's goal was to automatically and robustly estimate and compensate for distortion in arbitrary receipt photos, so that the user could just snap a photo and OCR would accurately identify the bought products and their prices. However, the task is somewhat challenging because receipts tend to get crumpled and bent, so they won't lie nicely flat on a surface for easy analysis. This set of algorithms solves that problem and produces distortion-free, thresholded images for the next OCR step.

Languages: Matlab
Tags: Computer Vision

Real-time car tracking and counting

(7th June 2014)

From my office window I've got an unblocked side view of Ring Road I (Kehä I) in Espoo, Finland. It is one of the busiest roads in Finland, with up to 100,000 cars per day. I wanted to create a program which would receive a video feed from a webcam and process the images in real time on common hardware.

Languages: Matlab
Tags: Computer Vision FFT

Real-time interest point tracking

(15th July 2013)

As mentioned in another article about omnidirectional cameras, my Master's Thesis' main topic was real-time interest point extraction and tracking on an omnidirectional image in a challenging forest environment. I found OpenCV's routines mostly rather slow and single-threaded, so I ended up implementing everything myself to gain more control over the data flow and the dependencies between threads. The implemented code would simultaneously use 4 threads on the CPU and a few hundred on the GPU, executing interest point extraction and matching at 27 fps (37 ms/frame) for a 1800 × 360 pixel (≈0.65 Mpix) panoramic image.

Languages: C++ FFTW CUDA
Tags: Computer Vision FFT

Omnidirectional cameras

(6th July 2013)

My Master of Science Thesis involved the use of a so-called "omnidirectional camera". There are various ways of achieving a 180° or even 360° view, each with its distinct pros and cons. The general benefit of these alternative camera systems is that objects don't need to be tracked, because they generally stay within the extremely broad Field of View (FoV) of the camera. This is also very beneficial in visual odometry tasks, because landmarks can be tracked for longer periods of time.

Languages: Matlab C++
Tags: Computer Vision

Coin recognition algorithm

(26th June 2013)

I developed a coin recognition system which can recognize eight different groups of coins. The set used consists of all five coins of Singapore, but a few categories cannot be distinguished from each other without knowledge of the coin's size in relation to the others.

Languages: Matlab
Tags: Computer Vision Machine Learning FFT

Fingerprint matching algorithm

(25th June 2013)

For my Bachelor of Science degree I developed a novel fingerprint matching algorithm, which ended up beating many alternative methods developed by research groups around the world. The dataset used was the same one used in FVC 2000 (the Fingerprint Verification Competition).

Languages: Matlab C++ SDL
Tags: Computer Vision Data Structures