Niko's Project Corner

Encryption is now standard practice on the web for data in transit, and a few services even offer client-side encryption and are thus truly end-to-end. Nevertheless, they all seem to require you to create an account by providing your email and password, although this is not strictly necessary for storing and sharing data. In this system the document id, encryption key and HMAC key are generated ad hoc on the client, and only the minimum necessary information is revealed to the server. A live demo should be available at noknowledgenotes.nikonyrh.org.

The service is based on the well-understood and thoroughly examined AES and SHA-256 security primitives, but the idea can be implemented with any other encryption and hashing algorithms. A public-key encryption algorithm would be an appealing choice, but here a symmetric algorithm was used for simplicity. The server is implemented in PHP and is only a bit over 200 lines long. The provided client is considerably more complex at 600 lines of code (HTML + JavaScript), excluding third-party libraries (jQuery, CryptoJS).

The client's hash function is H(a,b,c) = SHA-256(a || SHA-256(a || b || salt) || c), where "a" is the name of the output being derived (such as key_enc), "b" is the username and "c" is the password. In total, two cryptographic keys (AES and HMAC) and a "write token" are derived from the username and password, and the document id is derived from just the encryption key. These relationships are shown in Figure 1.
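The derivation can be sketched in a few lines of Python. This is only an illustration of the formula above, not the actual client code (which is JavaScript): the salt value, the UTF-8 encoding of the inputs, the names "key_hmac", "write_token" and "doc_id", and the exact way the document id is hashed from the encryption key are all assumptions made for the example.

```python
import hashlib

# Hypothetical salt; the real value used by the client is not given here.
SALT = b"example-salt"


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def H(a: str, b: str, c: str, salt: bytes = SALT) -> bytes:
    """H(a,b,c) = SHA-256(a || SHA-256(a || b || salt) || c)."""
    name = a.encode("utf-8")
    inner = sha256(name + b.encode("utf-8") + salt)
    return sha256(name + inner + c.encode("utf-8"))


def derive(username: str, password: str) -> dict:
    """Derive both keys, the write token and the document id (hex strings)."""
    key_enc = H("key_enc", username, password)
    key_hmac = H("key_hmac", username, password)        # assumed output name
    write_token = H("write_token", username, password)  # assumed output name
    # Assumption: the id is a SHA-256 over a fixed label and the AES key,
    # so anyone holding the AES key can recompute it.
    doc_id = sha256(b"doc_id" + key_enc)
    return {
        "key_enc": key_enc.hex(),
        "key_hmac": key_hmac.hex(),
        "write_token": write_token.hex(),
        "doc_id": doc_id.hex(),
    }
```

Because the output name "a" is mixed into both hash rounds, the three values derived from the same username and password are unrelated to each other, and knowing one of them does not reveal the others.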

Figure 1: Two steps of key and id derivations from the username and password. A SHA-256 construct is used as a one-way function. Only a minimum necessary subset of keys is revealed to other parties such as the server or in a read-only link.

Only the document creator should know the username, password and HMAC key. The server only needs to know the document id and the associated "write token". Someone with read-only access only needs to know the AES key, from which they can derive the document id. The write token lets the server check for write permission when the document is updated, but it has no cryptographic purpose. The HMAC key prevents the server from serving arbitrary content when a document is requested: only someone who knows this key can verify that the content is valid and calculate a new hash for new content. A malicious server can only delete the document or serve an older version; any other tampering would be noticed immediately. Even if the server did not check the write token when a third party updates the document, the alteration would be noticed because a correct HMAC would not be stored alongside it.

If the AES and HMAC keys were derived from the write token, read-write access could easily be given without revealing the username and password. This tweak wouldn't require any modifications to the server or the protocol, as all of this happens on the client side.

Storing the data is fairly straightforward, as shown in Figure 2. The plain-text content is first compressed with LZW and then encrypted with 256-bit AES; the HMAC and other metadata such as the date are attached, and the result is converted into a JSON string. Compression produces a UTF-16 string, but AES produces base64-encoded ciphertext, so the final JSON contains only ASCII characters. The client stores all past HMAC hashes and dates in the document's metadata, and optionally it could store all past versions as well. This wouldn't inflate the final file size much because of the compression step.
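A rough Python sketch of the storage pipeline is given below. Several things here are assumptions rather than a description of the real client: zlib stands in for LZW, the encryption step is passed in as a callable (Python's standard library has no AES, so a real version would plug in 256-bit AES from a crypto library), the HMAC is assumed to be HMAC-SHA256 over the base64 ciphertext, and the JSON field names are made up.

```python
import base64
import hashlib
import hmac
import json
import zlib
from typing import Callable


def pack_document(plaintext: str,
                  encrypt: Callable[[bytes], bytes],
                  key_hmac: bytes,
                  date: str) -> str:
    """Compress, encrypt, authenticate and serialize one document version."""
    # Step 1: compression (zlib used here as a stand-in for LZW).
    compressed = zlib.compress(plaintext.encode("utf-8"))
    # Step 2: encryption via the caller-supplied function, then base64 so
    # that the result is plain ASCII.
    ciphertext = base64.b64encode(encrypt(compressed)).decode("ascii")
    # Step 3: HMAC over the ciphertext (assumed construction).
    tag = hmac.new(key_hmac, ciphertext.encode("ascii"),
                   hashlib.sha256).hexdigest()
    # Step 4: attach metadata and produce the JSON string.
    return json.dumps({"date": date, "hmac": tag, "ciphertext": ciphertext})
```

Computing the HMAC over the ciphertext rather than the plain text means the tag can be checked before anything is decrypted.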

When a document is stored, the server checks that the request includes a write token and that it matches the token stored with the server's version of the document. Since only writes are gated by the token, this step enables easy read-only sharing of documents.
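The server-side check amounts to very little logic. The sketch below uses an in-memory dictionary in Python purely for illustration (the actual server is written in PHP); the class and method names are hypothetical, the proof-of-work required for creating a document is left out, and it is assumed that the first write for an id fixes the token.

```python
import hmac


class DocumentStore:
    """Minimal in-memory model of the server's write-token check."""

    def __init__(self):
        self._docs = {}  # doc_id -> {"write_token": str, "body": str}

    def put(self, doc_id: str, write_token: str, body: str) -> bool:
        existing = self._docs.get(doc_id)
        if existing is not None:
            # Constant-time comparison of the stored and the supplied token.
            same = hmac.compare_digest(
                existing["write_token"].encode("utf-8"),
                write_token.encode("utf-8"))
            if not same:
                return False  # wrong token: the update is rejected
        # Assumption: the first write for an id fixes its token.
        self._docs[doc_id] = {"write_token": write_token, "body": body}
        return True

    def get(self, doc_id: str):
        # Reading needs only the id; no token is involved.
        doc = self._docs.get(doc_id)
        return None if doc is None else doc["body"]
```

Note that the server never sees anything it could use to decrypt or forge a document; it only compares two opaque strings.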

Figure 2: Data storage pipeline has steps: compression, encryption, HMAC and JSON generation.

Document loading works by simply applying those steps in reverse order. First the document is requested from the server by its id, and the JSON response is decoded into a JavaScript object. The document's HMAC is then calculated and compared with the hash stored in the response. If they match, the contents are decrypted and decompressed.
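The loading side could look roughly like the following Python sketch. As with any illustration of this kind, the details are assumptions: the JSON field names, HMAC-SHA256 over the base64 ciphertext, zlib instead of LZW, and a caller-supplied decryption function in place of AES. The exception name is hypothetical as well.

```python
import base64
import hashlib
import hmac
import json
import zlib
from typing import Callable


class TamperedDocument(Exception):
    """Raised when the stored HMAC does not match the received ciphertext."""


def unpack_document(response_json: str,
                    decrypt: Callable[[bytes], bytes],
                    key_hmac: bytes) -> str:
    doc = json.loads(response_json)
    # Recompute the HMAC over the received ciphertext (assumed construction).
    expected = hmac.new(key_hmac, doc["ciphertext"].encode("ascii"),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("utf-8"),
                               str(doc["hmac"]).encode("utf-8")):
        raise TamperedDocument("HMAC mismatch, refusing to decrypt")
    # Only after the check passes: decrypt, then decompress.
    compressed = decrypt(base64.b64decode(doc["ciphertext"]))
    return zlib.decompress(compressed).decode("utf-8")
```

A reader who holds only the AES key would have to skip the HMAC check, or compare against a hash value received from the document's owner.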

Read-only sharing is achieved by delivering the symmetric AES encryption key to the desired party, who can derive the document's id from it. They are able to retrieve the ciphertext from the server and decrypt it, but its HMAC cannot be verified unless the HMAC key is also revealed or the current hash value is delivered alongside the AES key.

When the server is asked to create a new document, it asks the client to solve a proof-of-work problem. A unique problem instance is generated, and the client has to find inputs to a salted hash function whose outputs have the server-specified number of leading zeros. Having to solve many easy problems instead of a few harder ones greatly reduces the variance of the calculation time. With the current settings it takes the JS client 1 - 3 seconds to solve the problem before the new document can be created.
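The proof-of-work idea can be sketched as follows. This is an assumed construction in Python, not the real protocol: SHA-256 over "salt:nonce", difficulty counted in leading zero bits, and the default number of sub-problems and difficulty are all picked for the example.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _work_hash(salt: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{salt}:{nonce}".encode("ascii")).digest()


def make_challenge(n_problems: int = 16, difficulty_bits: int = 8) -> dict:
    """Server side: one random salt per easy sub-problem."""
    salts = [os.urandom(16).hex() for _ in range(n_problems)]
    return {"salts": salts, "difficulty_bits": difficulty_bits}


def solve(challenge: dict) -> list:
    """Client side: brute-force a nonce for every salt."""
    answers = []
    for salt in challenge["salts"]:
        nonce = 0
        while leading_zero_bits(_work_hash(salt, nonce)) < challenge["difficulty_bits"]:
            nonce += 1
        answers.append(nonce)
    return answers


def verify(challenge: dict, answers: list) -> bool:
    """Server side: checking costs one hash per sub-problem."""
    if len(answers) != len(challenge["salts"]):
        return False
    return all(
        leading_zero_bits(_work_hash(salt, nonce)) >= challenge["difficulty_bits"]
        for salt, nonce in zip(challenge["salts"], answers))
```

The number of attempts needed for a single sub-problem varies a lot from one instance to the next, but the total over many independent sub-problems is far more predictable relative to its average, which is why the work is split up this way.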

The client could be made visually more pleasing and could offer, for example, multi-server mirroring. This storage system is ideal for tasks such as synchronizing browser bookmarks between computers without revealing the actual bookmarks to the server or any other third party. Clearly it is not meant for backing up photos or other multimedia; it is best suited to plain-text, JSON or XML content, which generally compresses well. Perhaps even a chat application could be built on this "protocol", but it would lack some desirable properties such as forward secrecy and deniability. So far no public-private key pairs have been used because they weren't deemed necessary. They would at least enable read-only sharing without revealing the encryption key, but that wouldn't make much difference in the current system.

Anonymous and secure information storing and sharing