An efficient schema for hierarchical data on Elasticsearch (20th November 2016)

Many businesses generate rich datasets from which valuable insights can be discovered. A basic starting point is to analyze separate events such as item sales, tourist attraction visits or movies seen. From these, a time series (total sales per item per day, total visits per tourist spot per week) or basic metrics (a histogram of movie ratings) can be aggregated. Things get a lot more interesting when individual data points can be linked together by a common id, such as items bought in the same basket or by the same household (identified by a loyalty card), the spots visited by a tourist group throughout their journey, or movie ratings given by a specific user. This richer data can be used to build recommendation engines, identify substitute products or services, and do clustering analysis. This article describes a schema for Elasticsearch which supports efficient filtering and aggregations, and is automatically compatible with new data values.
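As a rough illustration of linking individual events by a common id, the sketch below folds flat sale rows into one hierarchical document per basket. The field names (`basket_id`, `household_id`, `item`, `price`) and the helper `group_by_basket` are hypothetical and only serve the example; this is not the schema the article itself describes.

```python
# Hypothetical flat sale events; all field names are illustrative assumptions.
events = [
    {"basket_id": "b1", "household_id": "h1", "item": "milk", "price": 1.20},
    {"basket_id": "b1", "household_id": "h1", "item": "bread", "price": 2.50},
    {"basket_id": "b2", "household_id": "h2", "item": "milk", "price": 1.20},
]


def group_by_basket(rows):
    """Fold flat rows into one document per basket, with items as a list."""
    baskets = {}
    for row in rows:
        # Create the parent document the first time a basket id is seen.
        doc = baskets.setdefault(
            row["basket_id"],
            {
                "basket_id": row["basket_id"],
                "household_id": row["household_id"],
                "items": [],
            },
        )
        doc["items"].append({"item": row["item"], "price": row["price"]})
    # Dicts keep insertion order, so baskets come out in first-seen order.
    return list(baskets.values())


docs = group_by_basket(events)
```

Each resulting document could then be indexed as a single unit, so that questions such as "which items were bought together" do not require joining separate events at query time.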

Caching and perf. monitoring with Redis and Python (10th October 2016)

When implementing real-time APIs, server load can usually be reduced greatly by caching frequently accessed and rarely modified data, or reusable calculation results. Luckily Python has several features which make it easy to add new constructs and wrappers to the language, for example *args and **kwargs function arguments, first-class functions, decorators and so forth. Thus it doesn't take too much effort to implement a @cached decorator with business-specific logic on cache invalidation. Redis is the perfect fit for the job: it is a high-performance, binary-friendly key-value store with TTLs and several data eviction policies, and its support for other data structures makes it trivial to store additional key metrics there.
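A minimal sketch of what such a @cached decorator might look like is shown here. It is not the article's implementation: the key format, the `ttl` default and the `InMemoryStore` class are assumptions made for the example. `InMemoryStore` only mimics the `get` and `setex` methods of the redis-py client so that the sketch runs without a server; in practice a `redis.Redis` instance would presumably be passed in its place.

```python
import functools
import hashlib
import pickle
import time


class InMemoryStore:
    """Hypothetical stand-in exposing the get/setex subset of a Redis client."""

    def __init__(self):
        self._data = {}

    def get(self, name):
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Emulate TTL expiry by dropping the stale entry.
            del self._data[name]
            return None
        return value

    def setex(self, name, ttl, value):
        self._data[name] = (value, time.monotonic() + ttl)


def cached(store, ttl=60):
    """Cache pickled results in `store` for `ttl` seconds, keyed by arguments."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build a deterministic key from the function name and arguments.
            raw = pickle.dumps((args, sorted(kwargs.items())))
            digest = hashlib.sha1(raw).hexdigest()
            key = f"cache:{func.__module__}.{func.__qualname__}:{digest}"
            hit = store.get(key)
            if hit is not None:
                return pickle.loads(hit)
            result = func(*args, **kwargs)
            store.setex(key, ttl, pickle.dumps(result))
            return result

        return wrapper

    return decorator


calls = []  # records real invocations, to show when the cache is bypassed
store = InMemoryStore()


@cached(store, ttl=30)
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(4)   # computed and stored
second = slow_square(4)  # served from the store, function body not run
```

Results are pickled because the store only needs to hold opaque bytes. Business-specific invalidation, as mentioned above, would need extra logic on top of the plain TTL used here.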

Very fuzzy searching with Elasticsearch (21st October 2015)

I encountered an interesting question on Stack Overflow about fuzzy searching of hashes on Elasticsearch and decided to give it a go. Elasticsearch has native support for fuzzy text searches, but for performance reasons it only supports an edit distance of up to 2. In this context the maximum allowed distance was eight, so an alternative solution was needed. One was found in locality-sensitive hashing.
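To give a feel for the idea, here is a small sketch of bit-sampling locality-sensitive hashing, one common LSH scheme for bit strings. It assumes 64-bit integer hashes compared by Hamming distance, which may differ from the setup in the original question. The class name `BitSamplingIndex` and the parameters `n_tables` and `bits_per_table` are made up for the example, and it runs in plain Python rather than against Elasticsearch.

```python
import random
from collections import defaultdict

BITS = 64         # assumed hash width
MAX_DISTANCE = 8  # maximum allowed distance mentioned in the text


def hamming(a, b):
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


class BitSamplingIndex:
    """Each table keys hashes by a random subset of their bit positions."""

    def __init__(self, n_tables=16, bits_per_table=12, seed=0):
        rng = random.Random(seed)
        self.masks = [
            rng.sample(range(BITS), bits_per_table) for _ in range(n_tables)
        ]
        self.tables = [defaultdict(set) for _ in range(n_tables)]

    @staticmethod
    def _signature(value, positions):
        return tuple((value >> p) & 1 for p in positions)

    def add(self, value):
        for positions, table in zip(self.masks, self.tables):
            table[self._signature(value, positions)].add(value)

    def query(self, value, max_distance=MAX_DISTANCE):
        # Collect everything sharing a signature in at least one table...
        candidates = set()
        for positions, table in zip(self.masks, self.tables):
            candidates |= table.get(self._signature(value, positions), set())
        # ...then verify the exact distance to drop false positives.
        return sorted(c for c in candidates if hamming(c, value) <= max_distance)


index = BitSamplingIndex()
index.add(0xDEADBEEFCAFEBABE)
index.add(0x0123456789ABCDEF)
results = index.query(0xDEADBEEFCAFEBABE)
```

Near-duplicates are likely, but not guaranteed, to share a signature in at least one table; more tables raise recall at the cost of storage. In a search engine the per-table signatures could plausibly be stored as exact-match terms, leaving only the final distance check to be done on the candidates.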

Niko Nyrhilä (GitHub: nikonyrh, Stack Overflow: nikonyrh)