
Niko's Project Corner

Blogging platform — What Would TeX Do?

Description: PHP, LaTeX and Git based blogging engine.
Languages: PHP, LaTeX, Mustache
Tags: Blog, Git, Hyphenation
Duration: Summer 2013
Modified: 6th July 2013

I wrote this blog engine to enable the creation of new articles in LaTeX format and to publish them effortlessly on the Web. As a by-product it enables trivial PDF generation of each article, or even a combined PDF with all the articles concatenated, with an interactive table of contents, automatically numbered figures with references from the text, and many other features that LaTeX users take for granted.

First I looked at standard blogging solutions, but I quickly realized that my special requirements could be difficult to implement as plugins or otherwise. At the very least it would take the same amount of time, and even then be more troublesome to maintain, deploy and back up. I really enjoy using LaTeX for writing documents, so I figured: why not use it for blogging as well?

This platform uses Git for SCM and CMS, LaTeX for writing, PHP to generate navigation links etc., and Mustache templating to generate the final HTML. Most server-side functionality can be avoided by using third-party services such as Google Analytics and Disqus. To my surprise it took only about 1000 lines of PHP to get the basics working, such as tag-based browsing of articles and references to automatically numbered figures. The current architecture can be seen in Figure 1. All three subsystems (LaTeX, PHP and Mustache) are stored in a Git repository, and the logic to bring everything together is implemented in PHP. This actually resembles the MVC pattern quite a lot. The article (model) is stored in .tex files, from which two different views (PDF and HTML) can be produced. In this case the user cannot alter any of the views, so there is no need to deploy the actual model or controller components. Instead, only static HTML, PDF etc. files need to be copied to the web server (or Dropbox's public folder).

Figure 1: Current blog project's architecture. Everything is based on Git, which stores and supports the three subsystems. This resembles the MVC pattern, where the model has none of the logic.

PHP reads the .tex file to produce the blog post's body and metadata, which are passed to a Mustache template that produces the HTML and CSS. PHP also has the logic to generate navigation links, find the most popular tags and languages, and execute other useful tasks.
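
As a rough illustration of the metadata step, here is a minimal PHP sketch that pulls \renewcommand values out of a .tex source with a regular expression. The function name extractMetadata and the returned array layout are hypothetical, not the engine's actual code, and the pattern deliberately skips values that contain nested braces.

```php
<?php
// Hypothetical helper, not the engine's real code: collects every
// \renewcommand{\name}{value} pair found in a LaTeX source string.
// Limitation: values containing nested braces are skipped.
function extractMetadata(string $tex): array
{
    $metadata = [];
    $pattern = '/\\\\renewcommand\{\\\\(\w+)\}\{([^{}]*)\}/';
    preg_match_all($pattern, $tex, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        // $match[1] is the command name, $match[2] is its value.
        $metadata[$match[1]] = trim($match[2]);
    }
    return $metadata;
}

// Usage: the resulting array could serve as the context of a Mustache template.
$tex = implode("\n", [
    '\renewcommand{\projectTitle}{Blogging platform}',
    '\renewcommand{\projectLanguages}{PHP, LaTeX, Mustache}',
]);
$meta = extractMetadata($tex);
$languages = array_map('trim', explode(',', $meta['projectLanguages']));
```

A regular expression keeps this short, but it is also why commands with optional arguments or nested braces would need a real tokenizer.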

Currently the platform supports the following features (in PDF and/or HTML):

  • Au­to­mat­ically num­bered fig­ures with named ref­er­enc­ing from the main text
  • Writ­ing ar­ti­cles us­ing any text ed­itor, but IDEs such as TeX­works have their ben­efits
  • Ar­ti­cle brows­ing by pro­ject lan­guage or tag (in HTML)
  • Links to ex­ter­nal sites (with an in­di­cat­ing icon in HTML)
  • So­phis­ti­cated sym­bol an­no­ta­tion, such as α = 360° and LaTeX
  • Sort­ing the ar­ti­cles by last mod­ifi­ca­tion date (us­ing Git blame to ig­nore mi­nor changes)
  • Iden­ti­fy­ing the mod­ifi­ca­tion date of each line in the ar­ti­cle (us­ing Git blame)
  • PDF down­load of any ar­ti­cle, or all ar­ti­cles in a chrono­log­ical or­der
  • Easy de­ploy­ment (just a sin­gle folder to be copied, which con­tains all static gen­er­ated con­tent)
  • BibTeX style article reference management
  • Im­age mag­ni­fi­ca­tion on click (easy with jQuery)
  • Ar­ti­cles' lan­guage and tag co-oc­cur­rence ma­trix
  • Automatic hyphenation in PHP to avoid sparse lines when using justified text in HTML

At the file system and Git level, each article is stored in its own folder. This also ensures that each article has a unique identifier, which is used as the file name of the HTML file and as a prefix for all images in the article. This avoids any name collisions and lets us put all static content into the same folder for publication. The output generation script is also responsible for calling pdfLaTeX for each article, and for copying the PDFs along with the articles' images to the output folder.

A snap­shot of my Git com­mit mes­sages and times­tamps can be seen in Fig­ure 2. Ba­sic fea­tures such as tag based brows­ing and mus­tache based tem­plat­ing were quite fast to im­ple­ment. My nor­mal de­vel­op­ment hours seem to be 2pm to 2am.

Figure 2: A snap­shot of the de­vel­op­ment Git log.

These are some of the next features to be implemented (struck-out items are done):

  • Improve LaTeX tokenization; the current implementation based on regular expressions is limited
  • Have a more scal­able method of han­dling im­ages, cur­rently they are part of the Git repos­itory
  • Sup­port for equa­tions (pos­si­bly in MathML)
  • Better graphical design with better support for IE9
  • Support for embedded source code (at least LaTeX, Matlab and PHP) with syntax highlighting
  • Support for data in a table format
  • Free text search engine (it is possible to use PHP to generate a static JavaScript file for this)
  • Ability to comment on posts (using Disqus or something similar)

Writ­ing a new ar­ti­cle is a fairly triv­ial pro­cess. There are four .tex files which are shared among all ar­ti­cles:

  • com­mands.tex: De­fines com­mands for de­scrib­ing ar­ti­cle's meta­data, and for in­sert­ing the header, footer, fig­ures, ref­er­ences etc.
  • header.tex: Has all the \usep­ack­age com­mands, spec­ifies page mar­gins and starts the doc­ument.
  • sec­tion­Header.tex: Af­ter the ar­ti­cle's meta­data has been spec­ified, this in­serts the pro­ject's ti­tle, thumb­nail and dis­plays the meta­data.
  • footer.tex: This only has \end{doc­ument}

The be­gin­ning of this blog ar­ti­cle looks like this:

    \input{../_latexCommon/commands.tex}
    \myInput{../_latexCommon/header.tex}
    \renewcommand{\basepath}{BlogPlatform}
    \renewcommand{\projectTitle}{Blogging platform — What Would TeX Do?}
    \renewcommand{\projectStart}{Summer 2013}
    \renewcommand{\projectEnd}{Summer 2013}
    \renewcommand{\projectLanguages}{PHP, LaTeX, Mustache}
    \renewcommand{\projectDescription}{PHP, LaTeX and Git based blogging engine.}
    \renewcommand{\projectTags}{\tagBlog, \tagGit}
    \renewcommand{\projectChallenges}{To develop and maintain a fully functional blog.}
    \input{../_latexCommon/sectionHeader.tex}

In the blogging platform all "non-articles" are in folders prefixed with an underscore, so that PHP can easily detect and ignore them during HTML generation. The rest of the content can be written normally, and the file ends with a \myFooter.
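
The underscore convention keeps the article discovery step short. This is a minimal PHP sketch, assuming the folder names have already been listed (for example with scandir). The function name articleFolders is made up for this example, and skipping dot entries such as .git is an added assumption.

```php
<?php
// Hypothetical helper: keeps only the folders that hold articles.
function articleFolders(array $names): array
{
    $articles = array_filter($names, function (string $name): bool {
        // Shared folders such as "_latexCommon" start with an underscore.
        // Dot entries (".", "..", ".git") are skipped as well (assumption).
        return $name !== '' && $name[0] !== '_' && $name[0] !== '.';
    });
    sort($articles, SORT_STRING); // also reindexes the array from 0
    return $articles;
}

// Usage with a hand-written folder listing.
$folders = articleFolders(['_latexCommon', 'JGitBlame', '.git', 'BlogPlatform']);
```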

Figures are inserted by using a custom command: \insertFigure{git_log.png}{A snapshot of the development Git log.}{git_log}. The parameters are file name, caption and figure name. This has two benefits: standardized image settings across all articles, and easier TeX
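
Having a single command for all figures also makes them easy to find from PHP. The following is only a sketch of how figure numbers could be assigned, assuming the three-argument form shown above and arguments without nested braces. The name numberFigures and the file name architecture.png are hypothetical.

```php
<?php
// Hypothetical sketch: finds \insertFigure{file}{caption}{name} commands and
// numbers them in order of appearance, keyed by figure name.
function numberFigures(string $tex): array
{
    $pattern = '/\\\\insertFigure\{([^{}]+)\}\{([^{}]*)\}\{([^{}]+)\}/';
    preg_match_all($pattern, $tex, $matches, PREG_SET_ORDER);
    $figures = [];
    foreach ($matches as $index => $match) {
        $figures[$match[3]] = [
            'number'  => $index + 1,
            'file'    => $match[1],
            'caption' => $match[2],
        ];
    }
    return $figures;
}

// Usage: two figures in one article; "architecture.png" is a made-up file name.
$tex = '\insertFigure{architecture.png}{Architecture.}{architecture}' . "\n"
     . '\insertFigure{git_log.png}{A snapshot of the development Git log.}{git_log}';
$figures = numberFigures($tex);
$label = sprintf('Figure %d: %s', $figures['git_log']['number'], $figures['git_log']['caption']);
```

Keying the result by figure name means a named reference in the text can be resolved to its number with a single array lookup.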

To enable multi-article PDF generation, the header isn't included with the standard \input but with \myInput instead. In commands.tex this is defined to use the normal \input, but when the multi-article PDF is generated this command is redefined to do nothing. Similarly, the footer does not actually end the document, but instead just inserts a \clearpage command to make the next article start on a clean page.

With all these commands and practices in place, generating the multi-article PDF is very easy. Like normal articles, combined.tex starts by inputting the commands.tex and header.tex files. Then there are two tweaks, which disable \myInput and redefine \myFooter to insert a \clearpage. Then the cover page and table of contents are generated (I'm sure the line positioning could be done in a prettier way).

After the table of contents, an additional _combined.tex is inserted. This is auto-generated by PHP and simply lists all existing articles. Before inputting each article, \projectModified is defined; it holds the file's modification date, read from Git's commit log. Articles are automatically sorted from newest to oldest. When a single article is generated, this date is not available. It would be possible for PHP to write this timestamp into a file inside each article's folder.

    \input{../_latexCommon/commands.tex}

    \renewcommand{\setFigureNumbering}{
      \usepackage{chngcntr}
      \counterwithin{figure}{section}
      \counterwithin{table}{section}
    }

    \renewcommand{\Section}[1]{
      \section{#1}
    }

    \input{../_latexCommon/header.tex}

    % Tweaks for the combined PDF: \myInput does nothing, \myFooter only breaks the page
    \renewcommand{\myInput}[1]{}
    \renewcommand{\myFooter}{\clearpage}

    \vspace{2cm}

    \begin{center}
    \line(1,0){200} \\
    \LARGE Niko Nyrhilä\\
    \Large Combined project portfolio \\
    Generated \today \\
    \line(1,0){200} \\
    \end{center}

    \vspace{1cm}
    \tableofcontents
    \clearpage

    \input{_combined.tex}

    \input{../_latexCommon/footer.tex}

Currently the first five items of _combined.tex look like this:

    \renewcommand{\projectModified}{\textbf{Modified} & 2nd April 2018 \\}
    \input{../JGitBlame/jgitblame.tex}

    \renewcommand{\projectModified}{\textbf{Modified} & 7th May 2017 \\}
    \input{../BenchmarkTaxiridesEsSql/benchmarktaxiridesessql.tex}

    \renewcommand{\projectModified}{\textbf{Modified} & 19th March 2017 \\}
    \input{../CljTaxirides/cljtaxirides.tex}

    \renewcommand{\projectModified}{\textbf{Modified} & 25th January 2017 \\}
    \input{../CljMustache/cljmustache.tex}

    \renewcommand{\projectModified}{\textbf{Modified} & 20th November 2016 \\}
    \input{../HierarchicalSchemaEs/hierarchicalschemaes.tex}
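
A file in this shape is straightforward to generate. The sketch below is a guess at the approach rather than the real implementation. buildCombinedTex is a hypothetical function that takes folder names mapped to Unix timestamps, which could for instance come from `git log -1 --format=%ct -- <path>`. It assumes that each .tex file is named after its folder in lower case, as in the listing above.

```php
<?php
// Hypothetical sketch: $articles maps an article folder name to the Unix
// timestamp of its last relevant commit. Returns the _combined.tex contents.
function buildCombinedTex(array $articles): string
{
    arsort($articles, SORT_NUMERIC); // newest first, keys are preserved
    $blocks = [];
    foreach ($articles as $folder => $timestamp) {
        $date = gmdate('jS F Y', $timestamp); // e.g. "2nd April 2018"
        $file = strtolower($folder) . '.tex'; // assumed naming convention
        $blocks[] = sprintf(
            '\renewcommand{\projectModified}{\textbf{Modified} & %s \\\\}' . "\n"
            . '\input{../%s/%s}' . "\n",
            $date,
            $folder,
            $file
        );
    }
    // A blank line between the articles, as in the listing.
    return implode("\n", $blocks);
}

// Usage with two articles given in the "wrong" order.
$combined = buildCombinedTex([
    'CljMustache' => gmmktime(12, 0, 0, 1, 25, 2017),
    'JGitBlame'   => gmmktime(12, 0, 0, 4, 2, 2018),
]);
```

Using gmdate instead of date keeps the output independent of the server's time zone, which matters when the same repository is built on different machines.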

