
Niko's Project Corner

Python at other sites


Bruteforcing Countdown numbers game with CUDA

(19th April 2023)

YouTuber Another Roof posed an interesting question in his video "The Surprising Maths of Britain's Oldest* Game Show". The challenge was stated in the video's description: "I want to see a list of the percentage of solvable games for ALL options of large numbers. Like I did for the 15 options of the form {n, n+25, n+50, n+75}, but for all of them. The options for large numbers should be four distinct numbers in the range from 11 to 100. As I said there are 2555190 such options so this will require a clever bit of code, but I think it’s possible!". His reference Python implementation would have taken 1055 days (about 25000 hours) of CPU time (when all four large numbers are used in the game), but with CUDA and an RTX 4080 card it could be solved in just 0.8 hours, or about 31000x faster!

Languages: Python CUDA Numba
Tags: Applied mathematics
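
The figures quoted in the summary above can be sanity-checked with a few lines of Python. This is only a check of the arithmetic, not part of the original solver, and the variable names are illustrative.

```python
from math import comb

# Four distinct large numbers chosen from the 90 integers 11..100.
n_options = comb(100 - 11 + 1, 4)

cpu_hours = 1055 * 24   # reference Python implementation, in hours
gpu_hours = 0.8         # CUDA on an RTX 4080, as quoted above
speedup = cpu_hours / gpu_hours

print(n_options)        # 2555190
print(round(speedup))   # 31650, i.e. roughly 31000x
```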

Cheating at Bananagrams with real-time AI, part 1

(2nd April 2023)

Bananagrams is a real-time word game in which participants race to build their own crosswords. It requires a comprehensive vocabulary, fast thinking and good decision making. Sometimes situations arise which require an either-or decision: do I scrap my partial solution and start afresh or not? Or you may be left with one unusable extra letter, which you can put back into the pile and pick up three random new ones. This first article on the topic describes a two-phase artificial neural network which detects and identifies the tiles from an image. The second article will describe how to generate a valid crossword from given letters.

Languages: Python Keras
Tags: Computer Vision

Introduction to Stable Diffusion's parameters

(10th November 2022)

Stable Diffusion is an image generation network which was released to the public in 2022. It is based on a diffusion process, in which the model gets a noisy image as an input and tries to generate a noise-free image as an output. This process can be guided by describing the target image in plain English (aka txt2img), and optionally even by giving it a target image (aka img2img). This article doesn't describe how the model works or how to run it yourself; instead it is more of a tutorial on how various parameters affect the resulting image. Non-technical people can use these image-generating AIs via webpages such as Artistic.wtf (a project by me and my friend), Craiyon.com, Midjourney.com and others.

Languages: Python PyTorch
Tags: Computer Vision Autoencoder Stable Diffusion

Matching puzzle pieces together

(19th July 2022)

Some people enjoy solving puzzles the old-fashioned way, but engineers would like to automate tedious tasks like that. This is a well-suited task for supervised learning, but naturally it requires training data. Gathering it from real-life puzzles would be time-consuming as well, so I opted for generating it instead. This gives a lot of control over the data, but the resulting system might not even work with real-life inputs. There are also several different styles of puzzles, but in this project each "base-piece" is a rectangle of identical size. An example 3 × 3 puzzle is shown in the thumbnail.

Languages: Python Keras
Tags: Computer Vision

Single channel speech / music separation

(3rd February 2022)

Humans are naturally capable of separating an object from the background of an image, or speech from music in an audio clip. Photo editing is an easy task, but personally I don't know how to remove music from the background. A first approach would be to use band-pass filters, but it wouldn't give a satisfactory end result since there is so much overlap between the frequencies. This article describes a supervised learning approach to solving this problem.

Languages: Python Keras
Tags: Signal Processing FFT

Image and video clustering with an autoencoder

(15th January 2022)

This article describes a neural network which automatically projects a large collection of video frames (or images) into 2D coordinates, based on their content and similarity. It can be used to find content such as explosions from Arnold's movies, or car scenes from Bond films. It was originally developed to organize over 6 hours of GoPro footage from an Åre bike trip in the summer of 2020, and to create a high-res poster which shows the beautiful and varying landscape (Figure 9).

Languages: Python Keras
Tags: Computer Vision Autoencoder

Helsinki Deblur Challenge 2021

(15th December 2021)

The Finnish Inverse Problems Society (FIPS) organized the Helsinki Deblur Challenge 2021 during the summer and fall of 2021. The challenge is to "deblur" (deconvolve) images with a known amount of blur, and run the resulting image through an OCR algorithm. Deblur results are scored based on how well the pytesseract OCR algorithm is able to read the text. They also kindly provided unblurred versions of the pictures, so we can train neural networks using any supervised learning methods at hand. The network described in this article isn't officially registered to the contest, but since the evaluation dataset is also public we can run the statistics ourselves. Hyperparameter tuning got a lot more difficult once it started taking 12 - 24 hours to train the model. I might revisit this project later, but here its status is described as of December 2021. Had the current best network been submitted to the challenge, it would have ranked 7th out of the 10 (nine plus this one) participants. There is already a long list of known possible improvements at the end of this article, so stay tuned for future articles.

Languages: Python Keras
Tags: Computer Vision

Satellite crash course

(13th June 2021)

Since hearing the news of the falling Chinese rocket booster Long March 5B and being reminded that Earth's surface is about 70% ocean, I got interested in how the orbital parameters affect the odds of crashing into ocean vs. ground. I was nearly finished with the project when I realized that Earth is rotating under the satellite, thus invalidating all the results! This doesn't take the orbits' eccentricity into account either, but I've heard that atmospheric drag has a tendency to reduce it to zero as the orbit falls. Still, I found the various "straight" paths around the globe interesting and decided to publish these results anyway. In conclusion, there are paths which spend only 9% on top of land (including lakes) and 91% on top of ocean, or up to 57% on top of land and 43% on top of ocean.

Languages: Python
Tags: Applied mathematics
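
A minimal sketch of how such a "straight" path could be sampled, assuming a circular orbit over a non-rotating spherical Earth (the same simplifications mentioned in the summary above). The function names and the `is_land` callback are hypothetical; a real land/ocean mask would have to be supplied separately, and this is not necessarily how the article computes its results.

```python
import math


def great_circle_path(inclination_deg, node_lon_deg=0.0, n_points=360):
    """Return (lat, lon) pairs in degrees, equally spaced along a great circle."""
    inc = math.radians(inclination_deg)
    node = math.radians(node_lon_deg)
    path = []
    for k in range(n_points):
        u = 2.0 * math.pi * k / n_points  # angle travelled from the ascending node
        lat = math.asin(math.sin(inc) * math.sin(u))
        lon = node + math.atan2(math.cos(inc) * math.sin(u), math.cos(u))
        lon_deg = (math.degrees(lon) + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        path.append((math.degrees(lat), lon_deg))
    return path


def land_fraction(path, is_land):
    """Fraction of sampled points for which is_land(lat, lon) is true."""
    return sum(1 for lat, lon in path if is_land(lat, lon)) / len(path)


path = great_circle_path(inclination_deg=51.6)
max_lat = max(lat for lat, _ in path)  # the highest latitude equals the inclination
```

Because the points are equally spaced along the path, the fraction of points over land approximates the fraction of the path length over land.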

Chess video search engine

(13th June 2021)

YouTube has quite good search functionality based on video titles, descriptions and maybe even subtitles, but it doesn't look into the actual video contents or provide accurate timestamps for users' searches. The YouTuber "Agadmator" has a very popular channel (1.1 million subscribers, 454 million video views at the time of writing) which showcases major chess games from past and recent tournaments and online games. Here a search engine is introduced which analyzes the videos, recognizes chess pieces and builds a database of all the positions on the board, ready to be searched. It keeps track of the exact timestamps at which the queried position occurs in the videos, so it is able to provide direct links to relevant videos.

Languages: Python Keras Clojure
Tags: Computer Vision Data Structures Autoencoder

An efficient schema for hierarchical data on Elasticsearch

(20th November 2016)

Many businesses generate rich datasets from which valuable insights can be discovered. A basic starting point is to analyze separate events such as item sales, tourist attraction visits or movies seen. From these a time series (total sales / item / day, total visits / tourist spot / week) or basic metrics (histogram of movie ratings) can be aggregated. Things get a lot more interesting when individual data points can be linked together by a common id, such as items being bought in the same basket or by the same household (identified by a loyalty card), the spots visited by a tourist group throughout their journey or movie ratings given by a specific user. This richer data can be used to build recommendation engines, identify substitute products or services and do clustering analysis. This article describes a schema for Elasticsearch which supports efficient filtering and aggregations, and is automatically compatible with new data values.

Languages: Python
Tags: Business Intelligence Databases Elasticsearch

Caching and perf. monitoring with Redis and Python

(10th October 2016)

When implementing real-time APIs, server load can usually be reduced greatly by caching frequently accessed and rarely modified data, or reusable calculation results. Luckily Python has several features which make it easy to add new constructs and wrappers to the language, for example *args and **kwargs function arguments, first-class functions, decorators and so forth. Thus it doesn't take too much effort to implement a @cached decorator with business-specific logic on cache invalidation. Redis is a perfect fit for the job thanks to its high-performance, binary-friendly key-value store with TTL and different data eviction policies, and its support for other data structures, which make it trivial to store additional key metrics there.

Languages: Python
Tags: Databases Redis
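
A minimal sketch of what a @cached decorator could look like, not the article's actual implementation. To keep it runnable without a server, it uses a hypothetical in-memory `DictBackend` whose `get` and `setex` methods mirror the method names of the redis-py client; the key format, the `ttl` parameter and the use of `pickle` are all assumptions.

```python
import functools
import pickle
import time


class DictBackend:
    """In-memory stand-in for Redis with a per-key TTL (hypothetical helper)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired, behave like a cache miss
            return None
        return value

    def setex(self, key, ttl, value):
        self._data[key] = (value, time.monotonic() + ttl)


def cached(backend, ttl=60):
    """Cache a function's results in `backend` for `ttl` seconds."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The key is built from the function name and its arguments.
            key = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            hit = backend.get(key)
            if hit is not None:
                return pickle.loads(hit)
            result = func(*args, **kwargs)
            backend.setex(key, ttl, pickle.dumps(result))
            return result
        return wrapper
    return decorator


backend = DictBackend()
calls = []


@cached(backend, ttl=30)
def slow_square(x):
    calls.append(x)  # records how many times the real function ran
    return x * x


first = slow_square(4)
second = slow_square(4)  # served from the cache, slow_square is not re-run
```

Results are pickled before storing, so a cached `None` is still distinguishable from a cache miss. Business-specific invalidation would need extra logic on top of the plain TTL used here.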

Scalable analytics with Docker, Spark and Python

(23rd December 2015)

Traditionally data scientists installed software packages directly on their machines, wrote code, trained models, saved results to local files and applied models to new data in batch processing style. New data-driven products require rapid development of new models, scalable training and easy integration with other aspects of the business. Here I am proposing one (perhaps already well-known) cloud-ready architecture to meet these requirements.

Languages: Bash Python
Tags: Architecture Docker Spark Nginx GitHub JVM
GitHub: nikonyrh/docker-scripts

Very fuzzy searching with Elasticsearch

(21st October 2015)

I encountered an interesting question on Stack Overflow about fuzzy searching of hashes on Elasticsearch and decided to give it a go. Elasticsearch has native support for fuzzy text searches, but for performance reasons it only supports an edit distance of up to 2. In this context the maximum allowed distance was eight, so an alternative solution was needed. A solution was found in locality-sensitive hashing.

Languages: Python
Tags: Elasticsearch Databases GitHub Stack Overflow
GitHub: nikonyrh/stackoverflow-scripts
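
To illustrate the idea of locality-sensitive hashing, here is a minimal sketch of one common scheme, bit sampling for Hamming distance. It assumes the hashes are 64-bit integers compared by the number of differing bits, which may not match the original question, and it is not necessarily the approach used in the article. The class name and all parameters are hypothetical. In Elasticsearch the generated keys could presumably be indexed as exact-match terms, but here a plain dictionary plays that role.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


class HammingLSH:
    def __init__(self, n_bits=64, n_tables=16, bits_per_key=12, seed=0):
        rng = random.Random(seed)
        # Each table looks at its own random subset of bit positions.
        self.samplers = [rng.sample(range(n_bits), bits_per_key)
                         for _ in range(n_tables)]
        self.tables = defaultdict(set)
        self.values = {}

    def _keys(self, value):
        for table, positions in enumerate(self.samplers):
            bits = "".join(str((value >> p) & 1) for p in positions)
            yield f"{table}:{bits}"

    def add(self, item_id, value):
        self.values[item_id] = value
        for key in self._keys(value):
            self.tables[key].add(item_id)

    def query(self, value, max_distance=8):
        # Candidates share at least one sampled key with the query.
        candidates = set()
        for key in self._keys(value):
            candidates |= self.tables.get(key, set())
        # The exact distance check removes false positives.
        return sorted(i for i in candidates
                      if hamming(self.values[i], value) <= max_distance)


index = HammingLSH()
index.add("a", 0x0123456789ABCDEF)
index.add("b", 0x0123456789ABCDEF ^ ((1 << 64) - 1))  # all 64 bits flipped
print(index.query(0x0123456789ABCDEF))  # ['a']
```

The scheme is probabilistic: a near neighbour is found only if it agrees with the query on every sampled bit of at least one table, so more tables or fewer bits per key raise recall at the cost of more candidates to verify.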