Niko's Project Corner

Clojure at other sites

Chess video search engine

(13th June 2021)

Youtube has a quite good search func­tion­al­ity based on video ti­tles, de­scrip­tions and maybe even sub­ti­tles but it doesn't go into ac­tual video con­tents and provide ac­cu­rate times­tamps for users' searches. An youtu­ber "Agad­ma­tor" has a very pop­ular chan­nel (1.1 mil­lion sub­scribers, 454 mil­lion video views at the time of writ­ing) which show­cases ma­jor chess games from past and re­cent tour­na­ments and on­line games. Here a search en­gine is in­tro­duced which an­alyzes the videos, rec­og­nizes chess pieces and builds a database of all of the po­si­tions on the board ready to be searched. It keeps track of the ex­act times­tamps of the videos in which the queried po­si­tion oc­curs so it is able to provide di­rect links to rel­evant videos.

Languages: Python Keras Clojure
Tags: Computer Vision Data Structures

JGit blame for fun and profit(?)

(2nd April 2018)

Soft­ware pro­jects are typ­ically "tracked" on a ver­sion con­trol sys­tem (VCS). Each "ver­sion" of the code is called a "com­mit", which does not only store file con­tents, but also plenty of meta­data. This cre­ates a very rich set of data, and in the age of open source there are thou­sands of pro­jects to study. A few ex­am­ples are Git of The­seus and Gi­ten­tial, but by fo­cus­ing on "git blame" (see who has com­mit­ted each line on each file) I hope to bring some­thing new to the table. In short I have an­alyzed how source code gets re­placed by newer code, track­ing the top­ics of who, when and why, and how old the code was.

Languages: Clojure
Tags: Git Elasticsearch

Benchmarking Elasticsearch and MS SQL on NYC Taxis

(7th May 2017)

The NYC Taxi dataset has been used on quite many bench­marks (for ex­am­ple by Mark Litwintschik), per­haps be­cause it has a quite rich set of columns but their mean­ing is mostly triv­ial to un­der­stand. I de­vel­oped a Clo­jure pro­ject which gen­er­ates Elas­tic­search and SQL queries with three dif­fer­ent tem­plates for fil­ters and four dif­fer­ent tem­plates of ag­gre­ga­tions. This should give a de­cent in­di­ca­tion of these databases per­for­mance un­der a typ­ical work­load, al­though this test did not run queries con­cur­rently and it does not mix dif­fer­ent query types when the bench­mark is run­ning. How­ever bench­marks are al­ways tricky to de­sign and ex­ecute prop­erly so I'm sure there is room for im­prove­ments. In this pro­ject the tested database en­gi­nes were Elas­tic­search 5.2.2 (with Or­acle JVM 1.8.0_121) and MS SQL Server 2014.

Languages: Clojure
Tags: GitHub Databases Elasticsearch SQL
GitHub: nikonyrh/nyc-taxi-data

Analyzing NYC Taxi dataset with Elasticsearch and Kibana

(19th March 2017)

The NYC taxi­cab dataset has seen lots of love from many data sci­en­tists such as Todd W. Schei­der and Mark Litwintschik. I de­cided to give it a go while learn­ing Clo­jure, as I sus­pected that it might be a good lan­guage for ETL jobs. This ar­ti­cle de­scribes how I loaded the dataset, nor­mal­ized its con­ven­tions and columns, con­verted from CSV to JSON and stored them to Elas­tic­search.

Languages: Clojure
Tags: GitHub JVM Elasticsearch Databases Business Intelligence Kibana
GitHub: nikonyrh/nyc-taxi-data

Mustache templates in Clojure

(25th January 2017)

Mus­tache is a well-known tem­plate sys­tem with im­ple­men­ta­tions in most pop­ular lan­guages. At its core it is log­icless same tem­plates can be di­rectly used on other pro­jects. For ex­am­ple I am plan­ning to port this blgo en­gine from PHP to Clo­jure but I only need to re­place La­TeX pars­ing and HTML gen­er­ation parts, I should be able to use ex­ist­ing Mus­tache tem­plates with­out any mod­ifi­ca­tions. To learn Clo­jure pro­gram­ming I de­cided not to use the rec­om­mended li­brary but in­stead im­ple­ment my own.

Languages: Clojure
Tags: Blog GitHub JVM
GitHub: nikonyrh/mustache-clj

English hyphenation algorithm in Clojure

(17th August 2016)

This is noth­ing that spec­tac­ular (as if any­thing on my blog is), but I still wanted to de­scribe the out­line of the pro­ject of port­ing the hy­phen­ation al­go­rithm from PHP to Clo­jure. The im­ple­men­ta­tion is only about 80 lines of code + com­ments + 20 lines of unit tests. For com­par­ison the orig­inal PHP abom­ina­tion is about is about 160 LoCs, al­though it is a bit bloated by im­ple­ment­ing the pat­terns search via a trie data struc­ture in­stead of us­ing the str­pos func­tion.

Languages: Clojure
Tags: Hyphenation Blog GitHub JVM
GitHub: nikonyrh/hyphenator-clj