
Niko's Project Corner

Anonymous and secure information storing and sharing

Description: End-to-end encrypted storage service
Languages: PHP
Tags: GitHub, Encryption
Duration: Fall 2014
Modified: 25th April 2015
GitHub: nikonyrh/noknowledgenotes

Nowadays encryption is standard practice on the web when data is in transit, and there are even a few services which offer client-side encryption and thus are truly end-to-end. Nevertheless, for some reason they all require you to create an account by providing your email and password, although this is not strictly necessary for storing and sharing data. In this system the document id, encryption key and HMAC key are generated ad hoc on the client and only the minimal necessary information is revealed to the server. A live demo should be available at noknowledgenotes.nikonyrh.org.

The service is based on the well-understood and well-examined AES and SHA-256 security primitives, but the idea can be implemented with any other encryption and hashing algorithms. A public-key encryption algorithm would be an appealing choice, but here a symmetric algorithm was used for simplicity. The server is implemented in PHP and is only a bit over 200 lines long. The provided client is considerably more complex at 600 lines of code (HTML + JavaScript), excluding 3rd party libraries (jQuery, CryptoJS).

The client's hash function is H(a,b,c) = SHA-256(a || SHA-256(a || b || salt) || c), where "a" is the name of the hash output such as key_enc, "b" is the username and "c" is the password. In total there are two cryptographic keys (AES and HMAC) and a "write token" derived from the username and password, and the document id is derived from just the encryption key. These relationships are shown in Figure 1.

key_derivation
Figure 1: Two steps of key and id derivations from the username and password. A SHA-256 construct is used as a one-way function. Only the minimum necessary subset of keys is revealed to other parties such as the server or in a read-only link.
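The derivation can be sketched in a few lines of Python. This is only an illustration, not the project's JavaScript client: the salt value, the UTF-8 byte encoding, the output names other than key_enc, and the exact way the document id is hashed from the encryption key are all assumptions, because the post does not specify them.

```python
import hashlib

# Assumption: the real salt is not given in the post; this is a placeholder.
SALT = b"example-salt"


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def H(name: str, username: str, password: str) -> str:
    """H(a, b, c) = SHA-256(a || SHA-256(a || b || salt) || c), hex-encoded."""
    a = name.encode("utf-8")
    inner = sha256(a + username.encode("utf-8") + SALT)
    return sha256(a + inner + password.encode("utf-8")).hex()


def doc_id_from_key(key_enc: str) -> str:
    # Assumption: the id is a labelled SHA-256 of the encryption key.
    # The post only says the id is derived from the encryption key alone.
    return sha256(b"doc_id" + bytes.fromhex(key_enc)).hex()


def derive_keys(username: str, password: str) -> dict:
    # First step: three independent outputs from the username and password.
    # "key_hmac" and "write_token" are hypothetical output names.
    key_enc = H("key_enc", username, password)
    key_hmac = H("key_hmac", username, password)
    write_token = H("write_token", username, password)
    # Second step: the document id depends only on the encryption key.
    return {
        "key_enc": key_enc,
        "key_hmac": key_hmac,
        "write_token": write_token,
        "doc_id": doc_id_from_key(key_enc),
    }
```

Because doc_id_from_key needs nothing but the encryption key, a read-only link only has to carry that one value.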

Only the document creator should know the username, password and HMAC key. The server only needs to know the document id and the associated "write token". For someone else to have read-only access, they only need to know the AES key, from which they can derive the document id. The write token lets the server check for write permission when the document is being updated, but it has no cryptographic purpose. The HMAC key prevents the server from serving arbitrary content when a document is requested: only a person who knows this key can verify that the content is valid and calculate a new hash for new content. A malicious server can only either delete the document or provide an older version; all other tampering would be noticed immediately. Even if the server did not check the write token when the document is updated by a 3rd party, the alteration would be noticed because it wouldn't have a correct HMAC stored alongside it.
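A minimal sketch of that tamper check, assuming the tag is HMAC-SHA-256 computed over the ciphertext (the post names the primitives but not the exact construction). The function names are made up for this example.

```python
import hashlib
import hmac


def content_tag(key_hmac: bytes, ciphertext: bytes) -> str:
    """Tag that only a holder of the HMAC key can compute."""
    return hmac.new(key_hmac, ciphertext, hashlib.sha256).hexdigest()


def is_authentic(key_hmac: bytes, ciphertext: bytes, claimed_tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = content_tag(key_hmac, ciphertext)
    return hmac.compare_digest(expected, claimed_tag)
```

A server that swaps in different ciphertext cannot produce a matching tag without the key, so is_authentic returns False for it. Replaying an older ciphertext together with its old tag would still pass, which is the rollback case mentioned above.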

If the AES and HMAC keys were derived from the write token, then read-write access could easily be given without revealing the username and password. This tweak wouldn't require any modifications to the server or the protocol, as all of this happens on the client side.

Storing the data is fairly straightforward and is shown in Figure 2. Plain-text content is first compressed with LZW, then encrypted with 256-bit AES, then the HMAC and other metadata such as the date are attached, and finally everything is converted into a JSON string. Compression produces a UTF-16 string but AES produces base64-encoded ciphertext, so the final JSON has only ASCII characters. The client stores all past HMAC hashes and dates in the document's metadata, but optionally it could store all past versions as well. This wouldn't inflate the final file size that much because of the compression step.

When the document is being stored on the server, the server checks that the request includes a write token and that it matches the token stored with the server's version of the document. This step enables easy read-only sharing of documents, because a reader who knows only the AES key cannot produce the write token.
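The server-side check could look roughly like this. The actual server is written in PHP; this in-memory Python stand-in, including the NoteStore class and its method names, is hypothetical and only shows the token comparison.

```python
import hmac


class NoteStore:
    """In-memory stand-in for the server's document storage."""

    def __init__(self):
        self._docs = {}  # doc_id -> {"write_token": str, "body": str}

    def create(self, doc_id: str, write_token: str, body: str) -> bool:
        if doc_id in self._docs:
            return False  # creating must never overwrite an existing id
        self._docs[doc_id] = {"write_token": write_token, "body": body}
        return True

    def update(self, doc_id: str, write_token: str, body: str) -> bool:
        doc = self._docs.get(doc_id)
        if doc is None:
            return False
        # Constant-time comparison against the token stored at creation.
        stored = doc["write_token"].encode("utf-8")
        if not hmac.compare_digest(stored, write_token.encode("utf-8")):
            return False
        doc["body"] = body
        return True

    def read(self, doc_id: str):
        # Reading needs only the id; the body is opaque ciphertext anyway.
        doc = self._docs.get(doc_id)
        return None if doc is None else doc["body"]
```

Note that the server never interprets the body, so it works the same whatever encryption the client uses.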

dataflow
Figure 2: The data storage pipeline has four steps: compression, encryption, HMAC and JSON generation.

Document loading works by simply applying those steps in the reverse order. First the document is requested from the server by its id and the response's JSON is decoded into a JavaScript object. The document's HMAC is calculated and confirmed to match the hash stored in the response. If they match, the contents are decrypted and decompressed.
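Storing and loading can be sketched together as a round trip. Several things here are assumptions rather than the client's real behaviour: zlib stands in for LZW, the JSON field names are invented, the HMAC is taken over the base64 ciphertext, and the cipher is passed in as a pair of callables because Python's standard library has no AES.

```python
import base64
import hashlib
import hmac
import json
import zlib
from datetime import datetime, timezone
from typing import Callable

Cipher = Callable[[bytes], bytes]  # caller supplies real AES here


def store_document(plaintext: str, encrypt: Cipher, key_hmac: bytes) -> str:
    """Compress, encrypt, attach HMAC and date, then serialise to JSON."""
    compressed = zlib.compress(plaintext.encode("utf-8"))
    ciphertext = base64.b64encode(encrypt(compressed)).decode("ascii")
    tag = hmac.new(key_hmac, ciphertext.encode("ascii"), hashlib.sha256)
    return json.dumps({
        "ciphertext": ciphertext,
        "hmac": tag.hexdigest(),
        "date": datetime.now(timezone.utc).isoformat(),
    })


def load_document(doc_json: str, decrypt: Cipher, key_hmac: bytes) -> str:
    """Reverse order: parse JSON, verify HMAC, decrypt, decompress."""
    doc = json.loads(doc_json)
    expected = hmac.new(
        key_hmac, doc["ciphertext"].encode("ascii"), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, doc["hmac"]):
        raise ValueError("HMAC mismatch: altered document or wrong key")
    compressed = decrypt(base64.b64decode(doc["ciphertext"]))
    return zlib.decompress(compressed).decode("utf-8")
```

The HMAC is checked before anything is decrypted, so altered ciphertext never reaches the decryption or decompression code.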

Read-only sharing is achieved by delivering the symmetric AES encryption key to the desired party, who can derive the document's id from it. They are able to retrieve the ciphertext from the server and decrypt it, but its HMAC cannot be confirmed unless the HMAC key is also revealed or the current hash value is delivered alongside the AES key.

When the server is asked to create a new document, it asks the client to solve a proof-of-work problem. A unique problem instance is generated and the client is asked to provide inputs to a salted hash function such that the output has the server-specified number of leading zeros. By having to solve many easy problems instead of a few harder ones, the variance of the calculation time is greatly reduced. With the current settings it takes the JS client 1 - 3 seconds to solve the problem before the new document can be created.
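Here is a rough sketch of such a proof-of-work scheme. The details are guesses, not the project's protocol: zeros are counted in bits, the hash input is salt || problem index || nonce, and the default of 16 problems with 10 zero bits each is arbitrary.

```python
import hashlib
import os
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits


def pow_hash(salt: bytes, index: int, nonce: int) -> bytes:
    data = salt + index.to_bytes(4, "big") + nonce.to_bytes(8, "big")
    return hashlib.sha256(data).digest()


def new_challenge(n_problems: int = 16, zero_bits: int = 10) -> dict:
    # A fresh random salt makes every challenge instance unique.
    return {"salt": os.urandom(16), "n_problems": n_problems,
            "zero_bits": zero_bits}


def solve(challenge: dict) -> list:
    """Client side: brute-force one nonce per sub-problem."""
    nonces = []
    for index in range(challenge["n_problems"]):
        for nonce in count():
            digest = pow_hash(challenge["salt"], index, nonce)
            if leading_zero_bits(digest) >= challenge["zero_bits"]:
                nonces.append(nonce)
                break
    return nonces


def verify(challenge: dict, nonces: list) -> bool:
    """Server side: one hash per sub-problem, so checking is cheap."""
    if len(nonces) != challenge["n_problems"]:
        return False
    return all(
        leading_zero_bits(pow_hash(challenge["salt"], i, n))
        >= challenge["zero_bits"]
        for i, n in enumerate(nonces)
    )
```

The number of attempts for a single problem is geometrically distributed, so its standard deviation is about as large as its mean. Splitting the same expected work into k independent easy problems shrinks the relative spread by roughly a factor of sqrt(k), which is why the solving time becomes more predictable.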

The client could be updated to be more visually pleasing and to offer, for example, multi-server mirroring. This storage system is ideal for, say, synchronizing browser bookmarks between computers without revealing your actual bookmarks to the server or any other 3rd party. Clearly this is not meant for backing up photos or other multimedia; it is best suited for plain-text, JSON or XML content, which generally compresses well. Perhaps even a chat application could be built on this "protocol", but it would lack some desirable properties such as forward secrecy or deniability. So far no public-private key pairs have been used because it wasn't deemed necessary. Public-key cryptography would at least enable read-only sharing without revealing the encryption key, but it wouldn't make much difference in the current system.

