Niko's Project Corner

I wrote this blog engine so that I can write new articles in LaTeX and effortlessly publish them on the web. As a by-product, it makes it trivial to generate a PDF of each article, or even a combined PDF of all the articles with an interactive table of contents, automatically numbered figures with references from the text, and many other features that LaTeX users take for granted.

First I looked at standard blogging solutions, but I quickly realized that my special requirements could be difficult to implement as plugins or otherwise. It would take at least the same amount of time, and the result would still be more troublesome to maintain, deploy and back up. I really enjoy using LaTeX for writing documents, so I figured: why not use it for blogging as well?

This platform uses Git for SCM and CMS, LaTeX for writing, PHP to generate navigation links etc. and Mustache templating to generate the final HTML. Most server-side functionality can be avoided by using third-party services such as Google Analytics and Disqus. To my surprise, it took only about 1000 lines of PHP to get the basics working, such as tag-based browsing of articles and references to automatically numbered figures. The current architecture can be seen in Figure 1. All three subsystems (LaTeX, PHP and Mustache) are stored in a Git repository, and the logic that brings everything together is implemented in PHP. This closely resembles the MVC pattern. The article (model) is stored in .tex files, from which two different views (PDF and HTML) can be produced. In this case the user cannot alter any of the views, so there is no need to deploy the actual model or controller components. Instead, only static HTML, PDF etc. files need to be copied to the web server (or Dropbox's public folder).

Figure 1: Current blog project's architecture. Everything is based on Git, which stores and supports the three subsystems. This resembles the MVC pattern, where the model has none of the logic.

PHP reads the .tex file to produce the blog post's body and metadata, which are passed to a Mustache template that produces HTML and CSS. PHP also has the logic to generate navigation links, find the most popular tags and languages, and execute other useful tasks.

Currently the platform supports the following features (in PDF and/or HTML):

Automatically numbered figures with named referencing from the main text
Writing articles using any text editor, but IDEs such as TeXworks have their benefits
Article browsing by project language or tag (in HTML)
Links to external sites (with an indicating icon in HTML)
Sophisticated symbol annotation, such as α = 360° and the LaTeX logo
Sorting the articles by last modification date (using Git blame to ignore minor changes)
Identifying the modification date of each line in the article (using Git blame)
PDF download of any article, or of all articles in chronological order
Easy deployment (just a single folder to be copied, which contains all static generated content)
BibTeX-style article reference management
Image magnification on click (easy with jQuery)
Articles' language and tag co-occurrence matrix
Automatic hyphenation in PHP to avoid sparse lines when using justified text in HTML
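
As an illustration of the co-occurrence matrix mentioned in the list above, here is a minimal Python sketch (the engine itself is written in PHP, and this is not its actual code). It assumes each article's metadata has already been parsed into lists of tags and languages; the `articles` data and the `cooccurrence` helper are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence(articles):
    """Count how often each (tag, language) pair appears in the same article."""
    counts = Counter()
    for article in articles:
        # Every tag of an article co-occurs once with every language of that article.
        for tag, language in product(article["tags"], article["languages"]):
            counts[(tag, language)] += 1
    return counts


# Hypothetical metadata, not taken from the real blog.
articles = [
    {"tags": ["Blog", "Git"], "languages": ["PHP", "LaTeX"]},
    {"tags": ["Blog"], "languages": ["PHP"]},
]
matrix = cooccurrence(articles)
print(matrix[("Blog", "PHP")])
```

A `Counter` returns zero for pairs that never occur, which is convenient when rendering the full matrix as a table.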

At the file system and Git level, each article is stored in its own folder. This also ensures that each article has a unique identifier, which is used as the name of its HTML file and as a prefix for all images in the article. This avoids any name collisions and lets us put all static content into the same folder for publication. The output generation script is also responsible for calling pdfLaTeX for each article and for copying the PDFs, along with the articles' images, to the output folder.
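
The naming scheme could be sketched roughly as follows in Python. The underscore separator between the identifier and the image name, and the `output_names` helper itself, are assumptions made for illustration; the text above only says that the identifier is used as a prefix.

```python
from pathlib import PurePosixPath


def output_names(article_id, image_files):
    """Map one article's files to flat names for a shared output folder."""
    names = {"html": f"{article_id}.html", "pdf": f"{article_id}.pdf"}
    # Prefixing with the article id keeps same-named images of different articles apart.
    # The "_" separator is an assumption.
    names["images"] = [f"{article_id}_{PurePosixPath(f).name}" for f in image_files]
    return names


first = output_names("BlogPlatform", ["git_log.png"])
second = output_names("Countdown", ["git_log.png"])
print(first["images"], second["images"])
```

Both articles use an image called git_log.png here, yet the generated names differ, so everything can be copied into a single folder.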

A snapshot of my Git commit messages and timestamps can be seen in Figure 2. Basic features such as tag-based browsing and Mustache-based templating were quite fast to implement. My normal development hours seem to be from 2pm to 2am.

Figure 2: A snapshot of the development Git log.

These are some of the next features to be implemented (struck-through items are done):

Improve LaTeX tokenization; the current regular-expression-based implementation is limited
Have a more scalable method of handling images, currently they are part of the Git repository
Support for equations (possibly in MathML)
~~Better graphical design with better support for IE9~~
~~Support for embedded source code (at least LaTeX, Matlab and PHP) with syntax highlighting~~
~~Support for data in a table format~~
~~Free text search engine (it is possible to use PHP to generate a static JavaScript file for this)~~
~~Ability to comment on posts (using Disqus or something similar)~~

Writing a new article is a fairly trivial process. There are four .tex files which are shared among all articles:

commands.tex: Defines commands for describing article's metadata, and for inserting the header, footer, figures, references etc.
header.tex: Has all the \usepackage commands, specifies page margins and starts the document.
sectionHeader.tex: After the article's metadata has been specified, this inserts the project's title, thumbnail and displays the metadata.
footer.tex: Contains only \end{document}.

The beginning of this blog article looks like this:

\input{../_latexCommon/commands.tex}
\myInput{../_latexCommon/header.tex}
\renewcommand{\basepath}{BlogPlatform}
\renewcommand{\projectTitle}{Blogging platform — What Would \TeX{} Do?}
\renewcommand{\projectStart}{Summer 2013}
\renewcommand{\projectEnd}{Summer 2013}
\renewcommand{\projectLanguages}{PHP, \LaTeX, Mustache}
\renewcommand{\projectDescription}{PHP, \LaTeX{} and Git based blogging engine.}
\renewcommand{\projectTags}{\tagBlog, \tagGit}
\renewcommand{\projectChallenges}{To develop and maintain a fully functional blog.}
\input{../_latexCommon/sectionHeader.tex}
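
A header like this is easy to read with regular expressions. The following Python sketch shows the idea; it is not the engine's PHP code, and it assumes that metadata values contain no nested braces. The `parse_metadata` name is hypothetical.

```python
import re

# Matches \renewcommand{\name}{value} where the value has no nested braces.
METADATA_RE = re.compile(r"\\renewcommand\{\\(\w+)\}\{([^{}]*)\}")


def parse_metadata(tex_source):
    """Return a dict of command name -> value for simple metadata lines."""
    return {name: value for name, value in METADATA_RE.findall(tex_source)}


sample = r"""
\renewcommand{\basepath}{BlogPlatform}
\renewcommand{\projectStart}{Summer 2013}
\renewcommand{\projectLanguages}{PHP, \LaTeX, Mustache}
"""
meta = parse_metadata(sample)
# Comma-separated values such as languages can be split into lists.
languages = [item.strip() for item in meta["projectLanguages"].split(",")]
print(meta["basepath"], languages)
```

A value such as \LaTeX{} with its own braces would not match this pattern, which is the kind of limitation that a proper tokenizer would remove.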

In the blogging platform, all "non-articles" live in folders prefixed with an underscore, so that PHP can easily detect and ignore them during HTML generation. The rest of the content can be written normally, and the file ends with \myFooter.
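
The underscore convention makes article discovery a one-line filter. A minimal Python sketch, assuming one folder per article under a common root; also skipping hidden folders such as .git is an extra assumption, and `article_folders` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def article_folders(root):
    """Return article folder names under root, skipping '_'-prefixed and hidden ones."""
    return sorted(
        entry.name
        for entry in Path(root).iterdir()
        if entry.is_dir() and not entry.name.startswith(("_", "."))
    )


# Demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as root:
    for name in ["BlogPlatform", "Countdown", "_latexCommon", ".git"]:
        (Path(root) / name).mkdir()
    found = article_folders(root)
print(found)
```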

Figures are inserted by using a custom command:

\insertFigure{git_log.png}{A snapshot of the development Git log.}{git_log}

The parameters are the filename, caption and figure name. This has two benefits: standardized image settings across all articles, and easier TeX
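
Because every figure goes through the same three-argument command, numbering figures for the HTML output can be sketched in a few lines of Python. This is an illustration rather than the engine's PHP code; it assumes that none of the arguments contain braces, and the architecture.png line in the sample is made up.

```python
import re

# \insertFigure{filename}{caption}{figure name}, with brace-free arguments.
FIGURE_RE = re.compile(r"\\insertFigure\{([^{}]*)\}\{([^{}]*)\}\{([^{}]*)\}")


def number_figures(tex_source):
    """Number figures in order of appearance and index them by figure name."""
    figures = {}
    for number, match in enumerate(FIGURE_RE.finditer(tex_source), start=1):
        filename, caption, name = match.groups()
        figures[name] = {
            "number": number,
            "file": filename,
            "caption": f"Figure {number}: {caption}",
        }
    return figures


sample = r"""
\insertFigure{architecture.png}{Current blog project's architecture.}{architecture}
Some text in between.
\insertFigure{git_log.png}{A snapshot of the development Git log.}{git_log}
"""
figures = number_figures(sample)
print(figures["git_log"]["caption"])
```

A reference from the text can then be resolved by looking up the figure name and printing its number.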

To enable multi-article PDF generation, the header isn't included with the standard \input but with \myInput instead. In commands.tex this is defined to call the normal \input, but when the multi-article PDF is generated the command is redefined to do nothing. Similarly, the footer does not actually end the document there, but just inserts a \clearpage so that the next article starts on a clean page.

With all these commands and practices in place, generating the multi-article PDF is very easy. Like normal articles, combined.tex starts by inputting the commands.tex and header.tex files. Then there are two tweaks, which disable \myInput and redefine \myFooter to insert a \clearpage. Then the cover page and table of contents are generated (I'm sure the line positioning could be done in a prettier way).

After the table of contents, an additional file, _combined.tex, is input. It is auto-generated by PHP and simply lists all existing articles. Before each article is input, \projectModified is defined as the article's modification date, which is read from Git's commit log. Articles are automatically sorted from newest to oldest. When a single article is generated, this date is not available. It would be possible for PHP to write this timestamp into a file inside each article's folder.

The combined.tex file looks like this:

\input{../_latexCommon/commands.tex}
\renewcommand{\setFigureNumbering}{
\usepackage{chngcntr}
\counterwithin{figure}{section}
\counterwithin{table}{section}
}
\renewcommand{\Section}[1]{
\section{#1}
}
\input{../_latexCommon/header.tex}
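% Disable \myInput so that the articles do not include the header again.
\renewcommand{\myInput}[1]{}
% Each article ends with a page break instead of \end{document}.
\renewcommand{\myFooter}{\clearpage}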
\vspace{2cm}
\begin{center}
\line(1,0){200}
\\
\LARGE Niko Nyrhilä\\
\Large Combined project portfolio
\\
Generated \today \\
\line(1,0){200} \\
\end{center}
\vspace{1cm}
\tableofcontents
\clearpage
\input{_combined.tex}
\input{../_latexCommon/footer.tex}

Currently the first five entries of _combined.tex look like this:

\renewcommand{\projectModified}{\textbf{Modified} & 19th April 2023 \\}
\input{../Countdown/countdown.tex}
\renewcommand{\projectModified}{\textbf{Modified} & 2nd April 2023 \\}
\input{../Bananagrams1/bananagrams1.tex}
\renewcommand{\projectModified}{\textbf{Modified} & 10th November 2022 \\}
\input{../StableDiffusionBasics/stablediffusionbasics.tex}
\renewcommand{\projectModified}{\textbf{Modified} & 19th July 2022 \\}
\input{../Puzzles/puzzles.tex}
\renewcommand{\projectModified}{\textbf{Modified} & 3rd February 2022 \\}
\input{../SpeechMusicSeparation/speechmusicseparation.tex}
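
A listing in this format could be generated along the following lines. This is a Python sketch, not the engine's PHP code. It assumes the modification dates have already been read from Git, and that each article's .tex file is named after its folder in lower case, as in the entries above. The `ordinal` and `combined_tex` helpers are hypothetical.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def ordinal(day):
    """Return the day with an English ordinal suffix, e.g. 2 -> '2nd'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def combined_tex(articles):
    """Build the listing from (folder, modified date) pairs, newest first."""
    lines = []
    for folder, modified in sorted(articles, key=lambda item: item[1], reverse=True):
        stamp = f"{ordinal(modified.day)} {MONTHS[modified.month - 1]} {modified.year}"
        lines.append(r"\renewcommand{\projectModified}{\textbf{Modified} & " + stamp + r" \\}")
        # Assumption: the .tex file is the lower-cased folder name.
        lines.append(r"\input{../" + folder + "/" + folder.lower() + ".tex}")
    return "\n".join(lines)


articles = [
    ("Bananagrams1", date(2023, 4, 2)),
    ("Countdown", date(2023, 4, 19)),
]
print(combined_tex(articles))
```

Sorting on the date objects rather than on the formatted strings keeps the newest-first order correct across months and years.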
