Scalable analytics with Docker, Spark and Python (23rd December 2015)
Traditionally, data scientists installed software packages directly on their machines, wrote code, trained models, saved results to local files and applied models to new data in a batch-processing style. New data-driven products require rapid development of new models, scalable training and easy integration with other parts of the business. Here I propose one (perhaps already well-known) cloud-ready architecture to meet these requirements.
Very fuzzy searching with Elasticsearch (21st October 2015)
I encountered an interesting question on Stack Overflow about fuzzy searching of hashes in Elasticsearch and decided to give it a go. Elasticsearch has native support for fuzzy text searches, but for performance reasons it only supports an edit distance of up to 2. In this case the maximum allowed distance was eight, so an alternative solution was needed. One was found in locality-sensitive hashing.
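The teaser does not say which locality-sensitive hashing scheme was used, so the following is only a minimal sketch of one common choice, bit sampling, under some loud assumptions: hashes are 64-bit integers, "distance" means Hamming distance, and all names (`make_samplers`, `lsh_keys`, `LshIndex`) are hypothetical. Each hash is turned into several short keys built from randomly chosen bit positions; two hashes that differ in only a few bits are likely to share at least one key, so an exact key lookup yields candidates that are then verified with the true distance.

```python
import random

HASH_BITS = 64      # assumed width of the hashes (not stated in the post)
MAX_DISTANCE = 8    # maximum allowed distance mentioned in the post
NUM_TABLES = 32     # assumed number of sampled keys per hash
BITS_PER_KEY = 8    # assumed number of bit positions per key


def make_samplers(num_tables=NUM_TABLES, bits_per_key=BITS_PER_KEY,
                  hash_bits=HASH_BITS, seed=0):
    """Pick a fixed random subset of bit positions for every table."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(hash_bits), bits_per_key))
            for _ in range(num_tables)]


def lsh_keys(value, samplers):
    """Build one string key per table from the sampled bits of `value`."""
    keys = []
    for table, positions in enumerate(samplers):
        bits = "".join("1" if (value >> p) & 1 else "0" for p in positions)
        keys.append(f"{table}:{bits}")
    return keys


def hamming(a, b):
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


class LshIndex:
    """In-memory stand-in for an index that supports exact key lookups."""

    def __init__(self, samplers):
        self.samplers = samplers
        self.buckets = {}

    def add(self, value):
        for key in lsh_keys(value, self.samplers):
            self.buckets.setdefault(key, set()).add(value)

    def search(self, query, max_distance=MAX_DISTANCE):
        # Collect every stored hash that shares at least one key ...
        candidates = set()
        for key in lsh_keys(query, self.samplers):
            candidates |= self.buckets.get(key, set())
        # ... then keep only those that really are within the distance.
        return sorted(v for v in candidates
                      if hamming(query, v) <= max_distance)
```

In a search engine the keys would presumably be stored as exact-match terms next to each document, with the final distance check done on the returned candidates. Recall is probabilistic: more tables or fewer bits per key find more true neighbours at the cost of more candidates to verify.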
Niko Nyrhilä
GitHub: nikonyrh
Stack Overflow: nikonyrh