Benchmarking Elasticsearch and MS SQL on NYC Taxis (7th May 2017)
The NYC Taxi dataset has been used in many benchmarks (for example by Mark Litwintschik), perhaps because it has a rich set of columns whose meaning is mostly easy to understand. I developed a Clojure project that generates Elasticsearch and SQL queries from three filter templates and four aggregation templates. This should give a decent indication of how these databases perform under a typical workload, although the test did not run queries concurrently and did not mix different query types within a benchmark run. Benchmarks are always tricky to design and execute properly, so I'm sure there is room for improvement. The tested database engines were Elasticsearch 5.2.2 (with Oracle JVM 1.8.0_121) and MS SQL Server 2014.

Analyzing NYC Taxi dataset with Elasticsearch and Kibana (19th March 2017)
The NYC taxicab dataset has received a lot of attention from data scientists such as Todd W. Schneider and Mark Litwintschik. I decided to give it a go while learning Clojure, as I suspected it might be a good language for ETL jobs. This article describes how I loaded the dataset, normalized its conventions and columns, converted the records from CSV to JSON and stored them in Elasticsearch.

Mustache templates in Clojure (25th January 2017)
Mustache is a well-known template system with implementations in most popular languages. At its core it is logic-less, so the same templates can be used directly in other projects. For example, I am planning to port this blog engine from PHP to Clojure, but since I only need to replace the LaTeX parsing and HTML generation parts, I should be able to use the existing Mustache templates without any modifications. To learn Clojure programming I decided not to use the recommended library but to implement my own.
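To illustrate what "logic-less" means in practice, here is a minimal sketch in Python (not the post's Clojure implementation) that supports only the two variable tags of Mustache: `{{name}}`, whose value is HTML-escaped, and `{{{name}}}`, whose value is inserted as-is. Sections, partials and dotted names are deliberately left out, and the `render` function name is hypothetical.

```python
import html
import re

# Triple-brace tags are listed first so "{{{x}}}" is not read as a double-brace tag.
TAG = re.compile(r"\{\{\{\s*(\w+)\s*\}\}\}|\{\{\s*(\w+)\s*\}\}")


def render(template, context):
    """Render only {{name}} (HTML-escaped) and {{{name}}} (raw) variable tags.

    Missing keys render as an empty string.
    """

    def substitute(match):
        raw_name, escaped_name = match.group(1), match.group(2)
        if raw_name is not None:
            # Triple mustache: insert the value without escaping.
            return str(context.get(raw_name, ""))
        # Double mustache: escape HTML special characters.
        return html.escape(str(context.get(escaped_name, "")))

    return TAG.sub(substitute, template)


print(render("Hello {{name}}! {{{html}}}", {"name": "<b>", "html": "<i>x</i>"}))
# prints: Hello &lt;b&gt;! <i>x</i>
```

The template itself contains no conditionals or loops over code, only named slots; all decisions live in the data passed as `context`, which is why the same template files can be reused by a different host language.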

English hyphenation algorithm in Clojure (17th August 2016)
This is nothing spectacular (as if anything on my blog is), but I still wanted to outline the project of porting the hyphenation algorithm from PHP to Clojure. The implementation is only about 80 lines of code, plus comments and 20 lines of unit tests. For comparison, the original PHP abomination is about 160 LoC, although it is somewhat bloated because it implements the pattern search with a trie data structure instead of using the strpos function.
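The teaser does not name the algorithm. Assuming it is Liang's pattern-based method (the one TeX uses), the sketch below shows the simpler alternative to the trie: each pattern is located with a plain substring search (`str.find`, Python's counterpart of PHP's `strpos`), the digits of all matching patterns are merged with `max`, and odd values mark allowed break points. It is written in Python rather than the post's Clojure; `parse_pattern`, `hyphenate`, the `min_left`/`min_right` defaults and the tiny `PATTERNS` list are hypothetical, chosen only so the example runs, and are not a real hyphenation dictionary.

```python
import re

# Hypothetical toy patterns in Liang's notation: letters with digits between them,
# "." marks a word boundary. A real dictionary has thousands of patterns.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split e.g. 'hy3ph' into ('hyph', [0, 0, 3, 0, 0]).

    weights[i] is the digit for the gap just before letters[i];
    the last entry is the gap after the final letter.
    """
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    position = 0
    for ch in pattern:
        if ch.isdigit():
            weights[position] = int(ch)
        else:
            position += 1
    return letters, weights


def hyphenate(word, patterns, min_left=2, min_right=3):
    """Return the word split at the break points allowed by the patterns."""
    padded = "." + word.lower() + "."
    # points[k] is the highest digit seen for the gap just before padded[k].
    points = [0] * (len(padded) + 1)
    for letters, weights in map(parse_pattern, patterns):
        start = padded.find(letters)  # plain substring search, like PHP's strpos
        while start != -1:
            for i, weight in enumerate(weights):
                points[start + i] = max(points[start + i], weight)
            start = padded.find(letters, start + 1)  # allow overlapping matches
    pieces, last = [], 0
    # A break before word[i] is the gap before padded[i + 1]; odd digits allow it.
    for i in range(min_left, len(word) - min_right + 1):
        if points[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


print(hyphenate("hyphenation", PATTERNS))  # ['hy', 'phen', 'ation']
```

With a full pattern list, the repeated `find` calls scan the word once per pattern, which is the cost a trie avoids by walking all patterns that start at a given position at once; for single words the difference is small, which fits the post's point that the trie mainly adds code.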

Home (Home page)
About (About me)
Platform (About this blog)
Niko Nyrhilä
GitHub (nikonyrh)
Stack Overflow (nikonyrh)

Bruteforcing Countdown numbe... (2023 Apr)
Cheating at Bananagrams with... (2023 Apr)
Introduction to Stable Diffu... (2022 Nov)
Matching puzzle pieces together (2022 Jul)
Single channel speech / musi... (2022 Feb)

Computer Vision (13)
GitHub (12)
Databases (9)
Elasticsearch (6)
FFT (5)
Rendering (5)
Applied mathematics (4)

Python (13)
C++ (11)
Matlab (10)
Keras (6)
Clojure (6)
Bash (6)
PHP (6)

Tag | Matlab | Python | C++ | Clojure | Bash | Keras |
Computer Vision | 6 | 6 | 3 | 1 | 0 | 5 |
GitHub | 0 | 2 | 1 | 4 | 3 | 0 |
Databases | 0 | 3 | 2 | 2 | 1 | 0 |
Rendering | 3 | 0 | 3 | 0 | 0 | 0 |
Nginx | 0 | 1 | 0 | 0 | 4 | 0 |
Autoen | 0 | 3 | 0 | 1 | 0 | 2 |
Elasticsearch | 0 | 2 | 0 | 3 | 0 | 0 |
FFT | 3 | 1 | 1 | 0 | 0 | 1 |
Data S | 2 | 1 | 2 | 1 | 0 | 1 |
JVM | 0 | 1 | 0 | 3 | 1 | 0 |
Docker | 0 | 1 | 0 | 0 | 3 | 0 |
FastCG | 0 | 0 | 3 | 0 | 0 | 0 |
Applied mathematics | 2 | 2 | 0 | 0 | 0 | 0 |
Field | 2 | 0 | 2 | 0 | 0 | 0 |
Omnidi | 2 | 0 | 2 | 0 | 0 | 0 |
Affine | 2 | 0 | 2 | 0 | 0 | 0 |
Master | 1 | 0 | 2 | 0 | 0 | 0 |
Archit | 0 | 1 | 0 | 0 | 2 | 0 |
Visual | 1 | 0 | 2 | 0 | 0 | 0 |
Spark | 0 | 1 | 0 | 0 | 2 | 0 |
Blog | 0 | 0 | 0 | 2 | 0 | 0 |
Hyphen | 0 | 0 | 0 | 2 | 0 | 0 |
Stack | 0 | 1 | 1 | 0 | 0 | 0 |
SQL | 0 | 0 | 1 | 1 | 0 | 0 |
Busine | 0 | 1 | 0 | 1 | 0 | 0 |
Signal | 0 | 1 | 0 | 0 | 0 | 1 |
Encryp | 0 | 0 | 0 | 0 | 1 | 0 |
Git | 0 | 0 | 0 | 1 | 0 | 0 |
Stable | 0 | 1 | 0 | 0 | 0 | 0 |
Redis | 0 | 1 | 0 | 0 | 0 | 0 |
Thrust | 0 | 0 | 1 | 0 | 0 | 0 |
Kibana | 0 | 0 | 0 | 1 | 0 | 0 |
Astron | 1 | 0 | 0 | 0 | 0 | 0 |
Mustac | 0 | 0 | 1 | 0 | 0 | 0 |
NAT | 0 | 0 | 0 | 0 | 1 | 0 |
jQuery | 0 | 0 | 1 | 0 | 0 | 0 |
SSH | 0 | 0 | 0 | 0 | 1 | 0 |
Happyh | 0 | 0 | 1 | 0 | 0 | 0 |
Backup | 0 | 0 | 0 | 0 | 1 | 0 |
Pthrea | 0 | 0 | 1 | 0 | 0 | 0 |
AWS | 0 | 0 | 0 | 0 | 1 | 0 |
SIFT | 0 | 0 | 1 | 0 | 0 | 0 |
SURF | 0 | 0 | 1 | 0 | 0 | 0 |
Conjug | 0 | 0 | 1 | 0 | 0 | 0 |
Kalman | 0 | 0 | 1 | 0 | 0 | 0 |
Partic | 0 | 0 | 1 | 0 | 0 | 0 |
Gradie | 0 | 0 | 1 | 0 | 0 | 0 |
Simult | 0 | 0 | 1 | 0 | 0 | 0 |
Roboti | 0 | 0 | 1 | 0 | 0 | 0 |
Princi | 1 | 0 | 0 | 0 | 0 | 0 |
Receiv | 1 | 0 | 0 | 0 | 0 | 0 |
Linear | 1 | 0 | 0 | 0 | 0 | 0 |
Suppor | 1 | 0 | 0 | 0 | 0 | 0 |
Machin | 1 | 0 | 0 | 0 | 0 | 0 |
Discre | 1 | 0 | 0 | 0 | 0 | 0 |