

Nginx docker image for easy file access via HTTP

Description: An alternative for SSHFS, Samba shares etc.
Languages: Bash
Tags: Docker, Spark, Nginx, GitHub
Duration: Spring 2016
Modified: 16th April 2016
GitHub: nikonyrh/docker-scripts
DockerHub: nikonyrh/nginx_bridge

Often I find myself having an SSH connection to a remote server from which I'd like to retrieve some files to my own machine. Common methods for this include a Windows/Samba share, SSHFS and an upload to the cloud (which isn't trivial to do via plain cURL). Here an easy-to-use alternative is described: a single-line command loads and runs a Docker image which contains a pre-configured Nginx instance. The files can then be accessed via plain HTTP at the user-assigned port (assuming a firewall isn't blocking it).

I found that writing Dockerfiles is way easier than, for example, writing Makefiles, maybe because their operations closely match those you'd execute via Bash anyway when setting up a new box. Additionally there are convenient published images to base your own images on, thus minimizing the number of custom steps you need to think of and implement.

The implemented Docker image is based on "FROM nginx:1.9", and just contains custom nginx.conf and main.sh files. When docker run -p 1234:80 -v "$PWD:/volume" -d nikonyrh/nginx_bridge is executed it starts the container, mounts the current working directory to the /volume path (which could also be mounted read-only) and exposes its contents as an auto-indexed Nginx folder at http://localhost:1234. By default the access log is available at http://localhost:1234/logs/logx.txt, but logging can be disabled with the -no-log flag at start-up. The image is about 190 MB, gzip compresses it down to 72 MB and Docker Hub reports it as 75 MB.
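A minimal usage sketch based on the commands described above (the fetched file name example.jpg is hypothetical, and the -no-log flag is assumed to be passed as an argument after the image name, to be handled by the container's main.sh start-up script):

    # Serve the current working directory (here read-only) at port 1234.
    docker run -p 1234:80 -v "$PWD:/volume:ro" -d nikonyrh/nginx_bridge

    # Fetch a file over plain HTTP; example.jpg is a hypothetical name.
    curl -O http://localhost:1234/example.jpg

    # The access log itself is served over HTTP as well.
    curl http://localhost:1234/logs/logx.txt

    # Start without access logging; the -no-log flag is assumed to be
    # consumed by main.sh when the container starts.
    docker run -p 1234:80 -v "$PWD:/volume" -d nikonyrh/nginx_bridge -no-log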

Some efficiency experiments were run as well. A few gigabytes of JPG images (40 to 400 kB in size) were transferred, and at best 90% of the 1 Gbps bandwidth was achieved. Files were transferred from an Ubuntu server, through a router, to a Windows machine running Ubuntu in VirtualBox, via cURL to /dev/null. The resulting bandwidth is shown in Figure 1. Parallel execution was achieved via xargs; the overhead of the three-way TCP handshake was significant unless each cURL process fetched multiple images.
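As a sketch, the parallel transfer could be scripted along these lines (the file listing files.txt and the server address 192.168.1.10 are hypothetical; each cURL process gets a batch of URLs so it can reuse a single TCP connection):

    # files.txt holds one image name per line (a hypothetical listing).
    # Run 8 cURL processes in parallel, 100 URLs per invocation, so each
    # process amortizes the three-way handshake over many transfers.
    sed 's|^|http://192.168.1.10:1234/|' files.txt \
      | xargs -n 100 -P 8 curl -s > /dev/null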

Figure 1: Achieved bandwidth when transferring medium-size files over 1 Gb Ethernet and HTTP.

In conclusion, this seems to be a viable way of distributing JPG images to other machines within the LAN for further processing. The first task might be calculating color histograms of webcam images on different calendar dates and times of day; the distribution of that computation will be handled by the Spark framework. It would also be interesting to compare this performance to alternatives such as HDFS.


Related blog posts:

AnalyticsPlatform
ServiceDiscovery
WebcamMon
BenchmarkTaxiridesEsSql
InternalNetwork