Niko's Project Corner

Mustache templates in Clojure

Description Implementing Mustache templates algorithm in Clojure.
Languages Clojure
Tags Blog, GitHub, JVM
Duration Summer 2016
Modified 25th January 2017
GitHub nikonyrh/mustache-clj

Mustache is a well-known template system with implementations in most popular languages. At its core it is logicless, so the same templates can be used directly in other projects. For example, I am planning to port this blog engine from PHP to Clojure, but I only need to replace the LaTeX parsing and HTML generation parts; I should be able to use the existing Mustache templates without any modifications. To learn Clojure programming I decided not to use the recommended library but instead to implement my own.

I started by looking at existing implementations, most notably Clostache's parser.clj. It is about 380 lines of code, whereas my reference implementation is 115 lines of code. Notably, Clostache is heavily type-annotated, uses regular expressions and also implements lambdas. I opted for not having type annotations, not implementing much logic in regexes and not implementing lambdas :)

It was a very exciting moment to finally bring all the pieces together, as the render function is implemented as a single pipeline through the lexer and parser steps: (->> template lexer (resolve-partials partials) parser (merge-ast-and-data data) flatten (map escaper) (apply str)). lexer takes an input string as its argument, uses str-replace regexes to normalize a few alternative syntaxes, splits it into tokens and adds metadata to them. resolve-partials first lexes the partials (as they are normal strings) and then iteratively replaces partial references with the referenced values. As partials can refer to other partials, this process needs to be repeated until everything has been resolved.
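To make the lexing and partial-resolution steps more concrete, here is a minimal sketch of the idea. It is not the code from mustache-clj: the token shape ({:type ... :value ...}), the single regex and the function bodies are assumptions made for illustration, and the syntax normalization step is left out.

```clojure
(require '[clojure.string :as str])

;; Assumed token shape: {:type :text|:ref|:partial, :value "..."}.
(defn lexer
  "Splits a template into plain-text tokens and {{...}} tag tokens."
  [template]
  (for [[_ tag text] (re-seq #"(?s)\{\{(.*?)\}\}|((?:(?!\{\{).)+)" template)]
    (cond
      (nil? tag)                 {:type :text    :value text}
      (str/starts-with? tag ">") {:type :partial :value (str/trim (subs tag 1))}
      :else                      {:type :ref     :value (str/trim tag)})))

(defn resolve-partials
  "Replaces :partial tokens with the lexed partial they name, repeating
   until no :partial tokens remain. Unknown partials expand to nothing.
   Assumes partials do not refer to each other in a cycle."
  [partials tokens]
  (let [lexed (into {} (for [[k v] partials] [(name k) (lexer v)]))
        step  (fn [tokens]
                (mapcat #(if (= :partial (:type %))
                           (get lexed (:value %) [])
                           [%])
                        tokens))]
    (loop [tokens tokens]
      (if (some #(= :partial (:type %)) tokens)
        (recur (step tokens))
        tokens))))

;; Usage: the "footer" partial itself refers to the "sig" partial,
;; so two passes are needed before everything is resolved.
(def example-partials {:footer "Bye {{> sig}}" :sig "-- {{author}}"})

(def example-tokens
  (resolve-partials example-partials (lexer "Hi {{name}}! {{> footer}}")))
```

Because each pass only expands the partial references that are currently visible, a partial nested N levels deep needs N passes, which is why the loop runs until no :partial tokens are left.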

parser takes in a sequence of tokens and first adds "path" information to them, as Mustache supports nested structures called "Sections". A token's path is the list of names of the section-starting nodes between that token and the root. On a second pass the tokens are split into partitions based on their path at the current level, essentially building a local view of the abstract syntax tree (AST). Once the recursion has finished we are left with the complete AST.
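The two passes could look roughly like the sketch below. Again this is only an illustration under assumptions: the :section-start and :section-end token types, the :section and :children keys and the numeric id stored next to each section name (used to keep two adjacent sections with the same name apart) are all hypothetical and not taken from mustache-clj.

```clojure
(defn add-paths
  "First pass: drops section markers and tags every other token with the
   stack of enclosing sections, e.g. [[\"items\" 0]]. Assumes the section
   start and end markers are balanced."
  [tokens]
  (:out
   (reduce (fn [{:keys [path out id] :as state} {:keys [type value] :as token}]
             (case type
               :section-start (assoc state :path (conj path [value id]) :id (inc id))
               :section-end   (assoc state :path (pop path))
               (assoc state :out (conj out (assoc token :path path)))))
           {:path [] :out [] :id 0}
           tokens)))

(defn build-ast
  "Second pass: partitions tokens by their path element at the current
   depth and recurses into each partition that belongs to a section."
  ([tokens] (build-ast tokens 0))
  ([tokens depth]
   (mapcat (fn [group]
             (if-let [[section _id] (get (:path (first group)) depth)]
               [{:section section :children (build-ast group (inc depth))}]
               (map #(dissoc % :path) group)))
           (partition-by #(get (:path %) depth) tokens))))

;; Usage with hand-written tokens for "Items: {{#items}}{{name}}, {{/items}}done"
(def example-ast
  (build-ast
   (add-paths [{:type :text :value "Items: "}
               {:type :section-start :value "items"}
               {:type :ref :value "name"}
               {:type :text :value ", "}
               {:type :section-end :value "items"}
               {:type :text :value "done"}])))
```

One limitation of this particular sketch is that a section with no tokens inside it leaves no trace in the tree, which is harmless for rendering since it would produce no output anyway.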

The final interesting piece of the puzzle is merge-ast-and-data which, as the name suggests, merges the AST with the input data. If it encounters a "reference" type token, the token is replaced by the corresponding value in the input data; in other cases we are dealing with an AST node. If the node has a path defined, the corresponding data sequence is loaded from the input and iterated over, and the function recursively calls itself with the subsequent AST nodes for each element. If the node doesn't have a path, its AST nodes are simply processed recursively. This generates a new AST which is similar to the original, but new nodes have been created based on the elements of the data.
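A simplified sketch of this merging step is shown below. It assumes a hypothetical AST shape where section nodes look like {:section "name" :children [...]} and leaf tokens look like {:type :ref|:text :value "..."}, and it returns plain strings instead of tokens, so it is an illustration of the recursion rather than the actual mustache-clj function. Inverted sections are not handled.

```clojure
(defn merge-ast-and-data
  "Walks the AST and produces a nested sequence of strings. Section nodes
   are expanded once per element of the matching data sequence."
  [data ast]
  (for [node ast]
    (if-let [section (:section node)]
      (let [value (get data (keyword section))
            items (cond
                    (sequential? value) value    ; list: render once per element
                    value               [value]  ; other truthy value: render once
                    :else               [])]     ; nil or false: render nothing
        (for [item items]
          ;; Map elements extend the surrounding context for their children.
          (merge-ast-and-data (if (map? item) (merge data item) data)
                              (:children node))))
      (if (= :ref (:type node))
        (str (get data (keyword (:value node)) ""))
        (:value node)))))

;; Usage: the "people" section is rendered once per person.
(def example-output
  (->> [{:type :text :value "Hi "}
        {:section "people" :children [{:type :ref :value "name"}
                                      {:type :text :value "; "}]}]
       (merge-ast-and-data {:people [{:name "Ann"} {:name "Bob"}]})
       flatten
       (apply str)))
```

Taking data as the first argument and the AST as the last one is what lets the function slot into a ->> pipeline as (merge-ast-and-data data).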

At this point the tree structure is not needed anymore, so it is flattened via flatten. The only remaining task is to walk over the tokens once more and check whether each represents a "raw" textual value or should be escaped. Escaping walks over the string's characters one at a time and checks whether an escaped counterpart is found in a hash-map. The final step is to concatenate these strings with (apply str).
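The hash-map based escaping can be sketched in a few lines. The exact set of escaped characters and the :raw? flag on tokens are assumptions made for this example, not necessarily what mustache-clj uses.

```clojure
;; Assumed escape table: character -> HTML entity.
(def escapes
  {\& "&amp;" \< "&lt;" \> "&gt;" \" "&quot;" \' "&#39;"})

(defn escape-html
  "Looks up each character in the escape table, keeping it as-is when
   there is no entry."
  [s]
  (apply str (map #(get escapes % %) s)))

(defn escaper
  "Returns the token's value, escaped unless the token is marked as raw."
  [{:keys [raw? value]}]
  (if raw? value (escape-html value)))

;; Usage: the last two pipeline steps applied to already flattened tokens.
(def example-html
  (->> [{:value "<b>" :raw? true}
        {:value "Tom & Jerry"}
        {:value "</b>" :raw? true}]
       (map escaper)
       (apply str)))
```

Using the character itself as the default value of get keeps the per-character step a single map lookup without any extra branching.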

This implementation passes all relevant unit tests of Clostache; only lambdas and file-based functions are not implemented at this point. The next step is to make this "importable" as a library (I'm still thinking about whether to put it on Clojars or not). Then I can proceed to implement a LaTeX parser and integrate it with this and my hyphenator-clj projects, and I'm one step closer to ditching my PHP-based blog engine. But it is going to be quite a lot of effort, as the PHP project has accumulated quite a few features and hacks, such as determining a file's representative modified date from git blame's output.