Niko's Project Corner

Language Clojure and tag GitHub at other sites

Benchmarking Elasticsearch and MS SQL on NYC Taxis

(7th May 2017)

The NYC Taxi dataset has been used on quite many bench­marks (for ex­am­ple by Mark Litwintschik), per­haps be­cause it has a quite rich set of columns but their mean­ing is mostly triv­ial to un­der­stand. I de­vel­oped a Clo­jure pro­ject which gen­er­ates Elas­tic­search and SQL queries with three dif­fer­ent tem­plates for fil­ters and four dif­fer­ent tem­plates of ag­gre­ga­tions. This should give a de­cent in­di­ca­tion of these databases per­for­mance un­der a typ­ical work­load, al­though this test did not run queries con­cur­rently and it does not mix dif­fer­ent query types when the bench­mark is run­ning. How­ever bench­marks are al­ways tricky to de­sign and ex­ecute prop­erly so I'm sure there is room for im­prove­ments. In this pro­ject the tested database en­gi­nes were Elas­tic­search 5.2.2 (with Or­acle JVM 1.8.0_121) and MS SQL Server 2014.

Languages: Clojure
Tags: GitHub Databases Elasticsearch SQL
GitHub: nikonyrh/nyc-taxi-data

Analyzing NYC Taxi dataset with Elasticsearch and Kibana

(19th March 2017)

The NYC taxi­cab dataset has seen lots of love from many data sci­en­tists such as Todd W. Schei­der and Mark Litwintschik. I de­cided to give it a go while learn­ing Clo­jure, as I sus­pected that it might be a good lan­guage for ETL jobs. This ar­ti­cle de­scribes how I loaded the dataset, nor­mal­ized its con­ven­tions and columns, con­verted from CSV to JSON and stored them to Elas­tic­search.

Languages: Clojure
Tags: GitHub JVM Elasticsearch Databases Business Intelligence Kibana
GitHub: nikonyrh/nyc-taxi-data

Mustache templates in Clojure

(25th January 2017)

Mus­tache is a well-known tem­plate sys­tem with im­ple­men­ta­tions in most pop­ular lan­guages. At its core it is log­icless same tem­plates can be di­rectly used on other pro­jects. For ex­am­ple I am plan­ning to port this blgo en­gine from PHP to Clo­jure but I only need to re­place La­TeX pars­ing and HTML gen­er­ation parts, I should be able to use ex­ist­ing Mus­tache tem­plates with­out any mod­ifi­ca­tions. To learn Clo­jure pro­gram­ming I de­cided not to use the rec­om­mended li­brary but in­stead im­ple­ment my own.

Languages: Clojure
Tags: Blog GitHub JVM
GitHub: nikonyrh/mustache-clj

English hyphenation algorithm in Clojure

(17th August 2016)

This is noth­ing that spec­tac­ular (as if any­thing on my blog is), but I still wanted to de­scribe the out­line of the pro­ject of port­ing the hy­phen­ation al­go­rithm from PHP to Clo­jure. The im­ple­men­ta­tion is only about 80 lines of code + com­ments + 20 lines of unit tests. For com­par­ison the orig­inal PHP abom­ina­tion is about is about 160 LoCs, al­though it is a bit bloated by im­ple­ment­ing the pat­terns search via a trie data struc­ture in­stead of us­ing the str­pos func­tion.

Languages: Clojure
Tags: Hyphenation Blog GitHub JVM
GitHub: nikonyrh/hyphenator-clj