background

Niko's Project Corner

Very fuzzy searching with CUDA

Description Visualizing laser scanned point clouds.
Languages C++
CUDA
Tags Thrust
Databases
GitHub
Stack Over­flow
Duration Fall 2015
Modified 2nd November 2015
GitHub nikonyrh/stackoverflow-scripts
thumbnail

This is an al­ter­na­tive an­swer to the ques­tion I en­coun­tered at Stack Over­flow about fuzzy search­ing of hashes on Elas­tic­search. My orig­inal an­swer used lo­cal­ity-sen­si­tive hash­ing. Su­pe­rior speed and sim­ple im­ple­men­ta­tion were gained by us­ing nVidia's CUDA via Thrust li­brary.

Ac­tu­ally this pro­ject didn't need more than a few dozen lines of code to solve the prob­lem, most code is about bench­mark­ing the code with vary­ing prob­lem sizes. Ham­ming dis­tance was cal­cu­lated as the Ham­ming weight of the two 64-bit in­te­gers, the used al­go­rithm is "pop­count_3" of Wikipedia ar­ti­cle. In­ter­est­ingly it was more ef­fi­cient to cal­cu­late the sum of two 32-bit "halves" weights in­stead of cal­cu­lat­ing it for all 64-bits at once, it would take more bench­mark­ing and pro­fil­ing to find out why this was the case.

Any­way, my lap­top's GeForce GT 650M was able to scan through 700 mil­lion 64-bit hashes / sec­ond. This was far su­pe­rior per­for­mance that I got form Elas­tic­search which would need 100 mil­lisec­onds to search through 1 mil­lion doc­uments but for CUDA it takes only 2.5 mil­lisec­onds! The per­for­mance dif­fer­ence gets even greater when more doc­uments are searched, for ex­am­ple for 7 mil­lion im­ages it took 6 - 7 sec­onds for ES but only 10 mil­lisec­onds for CUDA. Also with this so­lu­tion we get 100% ac­cu­racy on re­sults.

Ker­nel launch in­curs some over­head but it di­min­ishes when search­ing on more than 5 mil­lion im­ages.

images_per_second
Figure 1: Search per­for­mance of 700 mil­lion im­ages / sec­onds was reached.

The CUDA so­lu­tion could be made HTTP-ac­ces­si­ble by for ex­am­ple im­ple­ment­ing a FastCGI in­ter­face to the script and hav­ing a Ng­inx web server rout­ing re­quests there. This is very easy to set-up es­pe­cially if im­age hashes don't change of­ten and fit in sin­gle ma­chine's GPU's mem­ory.


Related blog posts:

FuzzySearchEs
BenchmarkTaxiridesEsSql
CljTaxirides
CljMustache
HierarchicalSchemaEs