Niko's Project Corner

Nginx docker image for easy file access via HTTP

Description An alternative for SSHFS, Samba shares etc.
Languages Bash
Tags Docker
Duration Spring 2016
Modified 16th April 2016
GitHub nikonyrh/docker-scripts
DockerHub nikonyrh/nginx_bridge

Often I find myself with an SSH connection to a remote server, wanting to retrieve some files to my own machine. Common methods for this include a Windows/Samba share, SSHFS and uploading to the cloud (which isn't trivial to do via plain cURL). Here an easy-to-use alternative is described: a single-line command that pulls and runs a Docker image containing a pre-configured Nginx instance. The files can then be accessed via plain HTTP at the user-assigned port (assuming a firewall isn't blocking it).

I found that writing Dockerfiles is way easier than writing, for example, Makefiles, maybe because the operations in a Dockerfile closely match those you'd execute via bash anyway when setting up a new box. Additionally, there are convenient published images to base your own images on, which minimizes the number of custom steps you need to think of and implement.
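To illustrate how short such a Dockerfile can be, here is a minimal sketch that writes a two-line Dockerfile and a bare-bones nginx.conf into a build directory. This is not the actual content of the nikonyrh/docker-scripts repository: the function name write_bridge_files and the configuration itself are assumptions, kept only to show the general shape (a published base image plus one copied config file).

```shell
# Hypothetical helper: writes a minimal Dockerfile and nginx.conf into
# the given directory. Both files are illustrative assumptions, not the
# files used to build nikonyrh/nginx_bridge.
write_bridge_files() {
    out="${1:-.}"
    mkdir -p "$out"

    # Base the image on a published Nginx image and replace its config.
    cat > "$out/Dockerfile" <<'EOF'
FROM nginx:1.9
COPY nginx.conf /etc/nginx/nginx.conf
EOF

    # Serve the mounted directory as an auto-indexed listing on port 80.
    cat > "$out/nginx.conf" <<'EOF'
events {}
http {
    include /etc/nginx/mime.types;
    server {
        listen 80;
        root /volume;
        autoindex on;
    }
}
EOF
}
```

Running write_bridge_files build followed by docker build -t my_bridge build would then produce a local image (my_bridge is a placeholder tag).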

The implemented Docker image is based on nginx:1.9 (FROM nginx:1.9) and just contains a custom nginx.conf and some files. When docker run -p 1234:80 -v "$PWD:/volume" -d nikonyrh/nginx_bridge is executed, it starts the container, mounts the current working directory at the /volume path (the mount could be read-only) and exposes its contents as an Nginx auto-indexed folder at http://localhost:1234. By default the access log is available at http://localhost:1234/logs/logx.txt, but it can be disabled with the -no-log flag at start-up. The image is about 190 MB; gzip compresses it down to 72 MB, and Docker Hub reports it as 75 MB.
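The command above can be wrapped in a small shell function so the port and directory become parameters. The sketch below is an assumption-laden convenience wrapper, not part of the published image: the name serve_here and the DRY_RUN switch are made up for illustration, while the :ro suffix is standard Docker syntax for a read-only bind mount.

```shell
# Hypothetical wrapper around the docker run command from the post.
# Usage: serve_here [PORT] [DIRECTORY]
# Set DRY_RUN=1 to print the command instead of executing it.
serve_here() {
    port="${1:-1234}"   # host port, defaults to the one used in the post
    dir="${2:-$PWD}"    # directory to expose, defaults to the current one

    # Build the argument list; ":ro" mounts the directory read-only.
    set -- docker run -d -p "${port}:80" -v "${dir}:/volume:ro" nikonyrh/nginx_bridge

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, serve_here 8080 /srv/images would expose /srv/images at http://localhost:8080, and DRY_RUN=1 serve_here only prints the command it would run.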

Some efficiency experiments were also run. A few gigabytes of JPG images (40 - 400 kB in size) were transferred, and at best 90% of the 1 Gbps bandwidth was achieved. Files were transferred from an Ubuntu server, through a router, to a Windows machine running Ubuntu in VirtualBox, where curl wrote them to /dev/null. The resulting bandwidth is shown in Figure 1. Parallel execution was achieved via xargs, which starts a new cURL process (and thus a new TCP connection) per invocation, so the overhead of the three-way TCP handshake was significant unless each cURL process fetched multiple images.
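The exact benchmark commands are not given here, so the following is only a sketch of how such a parallel fetch could look. It assumes a text file with one URL per line and an xargs that supports -P (GNU and BSD versions do); the function name fetch_batches and the FETCH override are hypothetical. Passing several URLs to one curl process lets it reuse the connection to the same host, which is what amortizes the handshake cost.

```shell
# Hypothetical helper: fetch URLs listed in a file (one per line) using
# several parallel processes, each handling a batch of URLs.
# Usage: fetch_batches URL_FILE [PROCESSES] [URLS_PER_PROCESS]
fetch_batches() {
    urls_file="$1"
    procs="${2:-4}"      # number of parallel curl processes
    per_proc="${3:-16}"  # URLs handed to each curl invocation

    # FETCH is left unquoted on purpose so "curl -s" splits into two words.
    # Without -o, curl writes every response body to stdout.
    xargs -P "$procs" -n "$per_proc" ${FETCH:-curl -s} < "$urls_file"
}
```

A call such as fetch_batches urls.txt 8 32 > /dev/null discards the downloaded data, as in the experiment. For scale: 90% of 1 Gbps is about 112.5 MB/s, which corresponds to roughly 280 files per second at 400 kB each or 2800 per second at 40 kB each, so per-connection overhead adds up quickly.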

Figure 1: Achieved bandwidth when transferring medium-size files over 1 Gb Ethernet and HTTP.

In conclusion, this seems to be a viable way of distributing JPG images to other machines within the LAN for further processing. The first task might be calculating color histograms of webcam images on different calendar dates and times of day. Distribution of the calculation will be handled by the Spark framework. It would also be interesting to compare this performance against alternatives such as HDFS.
