Christophe Delord




PP - Generic preprocessor (with pandoc in mind)

PP is a text preprocessor designed for Pandoc (and more generally Markdown and reStructuredText).

The PP package used to contain three preprocessors for Pandoc.

I started using Markdown and Pandoc with GPP. Then I wrote DPP to embed diagrams in Markdown documents. Finally, I wrote PP, which merges the functionality of GPP and DPP.

GPP and DPP are no longer included in PP as pp can now be used standalone. dpp and gpp can be found in the legacy DPP repository.

pp now implements:

Open source

PP is open source software. Anybody can contribute on GitHub to:

Installation

Compilation:

  1. Download and extract pp.tgz.
  2. Run make dep to install the required Haskell packages.
  3. Run make.

Installation:

pp requires Graphviz and Java (PlantUML and ditaa are embedded in pp).

Precompiled binaries:

The recommended way to get PP binaries is to compile them from the sources. However, if you have no Haskell compiler, you can try the precompiled binaries.

Usage

pp is a simple preprocessor written in Haskell. It’s mainly designed for Pandoc but may be used as a generic preprocessor. It is not intended to be as powerful as GPP, for instance, but is a simple implementation for my own needs, as well as an opportunity to play with Haskell.

pp takes strings as input and incrementally builds an environment, which is a lookup table containing variables and various other information. Built-in macros are Haskell functions that take arguments (strings) and the current environment, and build a new environment in the IO monad. User-defined macros are simple definitions whose arguments are numbered 1 to N.
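The idea of an environment that maps macro names to definitions with numbered arguments can be illustrated with a small sketch. It is written in Python rather than Haskell and is not pp's actual implementation: the helper names `define` and `call`, and the choice to expand missing arguments to an empty string, are assumptions made for illustration.

```python
import re

def define(env, name, body):
    """Return a new environment in which `name` maps to the macro body."""
    new_env = dict(env)          # the previous environment is left untouched
    new_env[name] = body
    return new_env

def call(env, name, args):
    """Expand a user-defined macro: \\1 .. \\N are replaced by the arguments."""
    body = env[name]

    def substitute(match):
        index = int(match.group(1))
        # Assumption: an argument that was not supplied expands to nothing.
        return args[index - 1] if 1 <= index <= len(args) else ""

    return re.sub(r"\\(\d+)", substitute, body)

env = {}
env = define(env, "greet", r"Hello, \1! Welcome to \2.")
print(call(env, "greet", ["Alice", "pp"]))   # Hello, Alice! Welcome to pp.
```

Each definition produces a new lookup table, which mirrors the "incrementally built environment" described above without claiming anything about how pp stores it internally.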

pp emits the preprocessed document on the standard output. Inputs are listed on the command line and concatenated; the standard input is used when no input is specified.

Command line

pp executes arguments in the same order as the command line. It starts with an initial environment containing:

The dialect is used to format links and images in the output documents. Currently only Markdown and reStructuredText are supported.

If no input file is specified, pp preprocesses the standard input.

The command line arguments are intentionally very basic. The user can define and undefine variables and list input files.

-h
displays some help and exits.
-v
displays the current version and exits.
-DSYMBOL[=VALUE] or -D SYMBOL[=VALUE]
adds the symbol SYMBOL to the current environment and associates it with the optional value VALUE. If no value is provided, the symbol is simply defined with an empty value.
-USYMBOL or -U SYMBOL
removes the symbol SYMBOL from the current environment.
-fr|-it|-en
changes the current language.
-html|-pdf|-odt|-epub|-mobi
changes the current output file format.
-md|-rst
changes the current dialect (-md is the default dialect).
-img=PREFIX or -img PREFIX
changes the prefix of the images output path.
-import=FILE or -import FILE
preprocesses FILE but discards its output. It only keeps macro definitions and other side effects.

Other arguments are filenames.

Files are read and preprocessed using the current state of the environment. The special filename “-” can be used to preprocess the standard input.
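Because arguments are executed in command-line order, a file only sees the definitions that precede it. The following Python sketch illustrates that behaviour for -D, -U and filenames only. The function `process_args` is hypothetical and ignores every other option; it is not how pp parses its arguments.

```python
def process_args(args):
    """Apply -D/-U options and collect input files, strictly left to right."""
    env = {}
    inputs = []   # (filename, snapshot of the environment when the file is read)
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("-D", "-U"):
            # Separated form ("-D SYMBOL"): the symbol is the next argument.
            i += 1
            arg += args[i]
        if arg.startswith("-D"):
            symbol, _, value = arg[2:].partition("=")
            env[symbol] = value            # empty value when "=VALUE" is absent
        elif arg.startswith("-U"):
            env.pop(arg[2:], None)
        else:
            inputs.append((arg, dict(env)))
        i += 1
    return env, inputs

env, inputs = process_args(["-DX=1", "a.md", "-U", "X", "-D", "Y", "b.md"])
for name, snapshot in inputs:
    print(name, snapshot)
# a.md {'X': '1'}
# b.md {'Y': ''}
```

In this example a.md is read while X is defined, whereas b.md is read after X has been removed and Y has been defined with an empty value.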

Macros

Diagram and script examples

Diagrams

Diagrams are written in code blocks as the argument of a diagram macro. Block delimiters are made of three or more tildes or backquotes at the beginning of the line (no space and no tab). The end delimiter must be at least as long as the beginning delimiter. The first line contains the macro:

\raw{\dot(path/imagename)(optional legend)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    graph {
        "source code of the diagram"
    }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}

This extremely meaningful diagram is rendered as path/imagename.png and looks like:

[Image: pp-syntax. Caption: optional legend]
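The delimiter rule for code blocks (three or more tildes or backquotes at the very start of a line, with a closing delimiter at least as long as the opening one) can be sketched in Python as follows. The function `extract_block` is hypothetical. Two details are assumptions not stated in this document: the closing delimiter must use the same character as the opening one, and trailing whitespace after a delimiter is tolerated.

```python
import re

# Three or more tildes or backquotes, starting in column 0 (no space, no tab).
FENCE = re.compile(r"^(~{3,}|`{3,})\s*$")

def extract_block(lines):
    """Return the lines between the first opening fence and its closing fence."""
    start, fence = None, None
    for index, line in enumerate(lines):
        opening = FENCE.match(line)
        if opening:
            start, fence = index, opening.group(1)
            break
    if fence is None:
        return None                      # no opening fence at all
    body = []
    for line in lines[start + 1:]:
        closing = FENCE.match(line)
        if (closing
                and closing.group(1)[0] == fence[0]          # same character (assumption)
                and len(closing.group(1)) >= len(fence)):    # at least as long
            return body
        body.append(line)
    return None                          # no closing fence found

print(extract_block(["~~~~", "graph {", "~~~", "}", "~~~~~"]))
# ['graph {', '~~~', '}']
```

The shorter `~~~` line inside the block is kept as content because it is shorter than the four-tilde opening delimiter.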

The image link in the output Markdown document may have to differ from the actual path in the file system. This happens when the .md or .html files are not generated in the same directory as the source document. Brackets can be used to specify the part of the path that belongs to the generated image but not to the link in the output document. For instance, a diagram declared as:

\raw(\dot([mybuildpath/]img/diag42)...)

will be actually generated in:

mybuildpath/img/diag42.png

and the link in the output document will be:

img/diag42.png
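The bracket convention can be reproduced with two regular-expression substitutions. This Python sketch only mirrors the example above; the function name `split_image_path` and the fixed `.png` extension are assumptions.

```python
import re

def split_image_path(arg, ext=".png"):
    """Return (path of the generated file, link used in the output document)."""
    file_path = re.sub(r"\[(.*?)\]", r"\1", arg) + ext   # keep the bracketed part
    link = re.sub(r"\[.*?\]", "", arg) + ext             # drop the bracketed part
    return file_path, link

print(split_image_path("[mybuildpath/]img/diag42"))
# ('mybuildpath/img/diag42.png', 'img/diag42.png')
```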

For instance, if you use Pandoc to generate HTML documents with diagrams in a different directory, there are two possibilities:

  1. the document is a self contained HTML file (option --self-contained), i.e. the CSS and images are stored inside the document:
  2. the document is not self contained, i.e. the CSS and images are stored apart from the document:

Pandoc also accepts additional attributes on images (link_attributes extension). These attributes can be added between curly brackets at the end of the first argument. For example:

\raw(\dot(image.png { width=50 % })(caption)(...))

will generate the following link in the markdown output:

![caption](image.png){ width=50 % }
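As a rough illustration of how the first argument could be split into a path and an attribute block, here is a Python sketch. The function `markdown_image` is hypothetical and only handles the Markdown dialect; it reproduces the link shown above but is not taken from pp's source.

```python
import re

def markdown_image(first_arg, caption):
    """Build a Markdown image link, moving a trailing {...} after the link."""
    match = re.match(r"^(.*?)\s*(\{.*\})?\s*$", first_arg)
    path, attrs = match.group(1), match.group(2) or ""
    return f"![{caption}]({path}){attrs}"

print(markdown_image("image.png { width=50 % }", "caption"))
# ![caption](image.png){ width=50 % }
```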

The diagram generator can be:

pp will not create any directory; the path where the image is written must already exist.

[Diagram pp-generators: pp dispatches the keywords dot, neato, twopi, circo, fdp, sfdp, patchwork and osage to GraphViz, uml to PlantUML, and ditaa to ditaa.]

Scripts

Scripts are also written in code blocks as arguments of a macro.

\raw{\bash
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
echo Hello World!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}

Unsurprisingly, this script generates:

\bash
~~~~~
echo Hello World!
~~~~~

The script language macro can be:

pp will create a temporary script before calling the associated interpreter.

[Diagram pp-scripts: pp dispatches bash to bash (or bash.exe), sh to sh (or sh.exe), python to python (or python.exe), haskell to runhaskell (or runhaskell.exe), cmd to wine cmd /c (or cmd /c), and powershell to powershell.exe (Windows only).]
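The "temporary script, then interpreter" approach mentioned above can be sketched in Python. The function `run_script` is hypothetical, and the example uses the running Python interpreter only so that the sketch does not depend on bash or another tool being installed. pp's own file naming and error handling are not documented here.

```python
import os
import subprocess
import sys
import tempfile

def run_script(interpreter, source, suffix=""):
    """Write `source` to a temporary file, run it, and return its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False) as script:
        script.write(source)
        path = script.name
    try:
        result = subprocess.run(interpreter + [path],
                                capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        os.remove(path)   # the temporary script is deleted in every case

print(run_script([sys.executable], "print('Hello World!')", suffix=".py"), end="")
```

The same function could be called with `["bash"]` or `["runhaskell"]` as the interpreter, provided the corresponding tool is available on the machine.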

Examples

The source code of this document contains some diagrams.

Here are some simple examples. For further details about diagrams’ syntax, please read the documentation of GraphViz, PlantUML and ditaa.

Graphviz

GraphViz is executed when one of these keywords is used: dot, neato, twopi, circo, fdp, sfdp, patchwork, osage.

\raw{\twopi(doc/img/pp-graphviz-example)(This is just a GraphViz diagram example)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
digraph {
    O -> A
    O -> B
    O -> C
    O -> D
    D -> O
    A -> B
    B -> C
    C -> A
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}

Once generated the graph looks like:

[Image: pp-graphviz-example. Caption: This is just a GraphViz diagram example]

GraphViz must be installed.

PlantUML

PlantUML is executed when the keyword uml is used. The lines @startuml and @enduml required by PlantUML are added by pp.

\raw{\uml(pp-plantuml-example)(This is just a PlantUML diagram example)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alice -> Bob: Authentication Request
Bob --> Alice: Authentication Response
Alice -> Bob: Another authentication Request
Alice <-- Bob: another authentication Response
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}

Once generated the graph looks like:

[Image: pp-plantuml-example. Caption: This is just a PlantUML diagram example]

PlantUML is written in Java and is embedded in pp. Java must be installed.

Ditaa

ditaa is executed when the keyword ditaa is used.

\raw{\ditaa(pp-ditaa-example)(This is just a Ditaa diagram example)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +--------+   +-------+    +-------+
    |        | --+ ditaa +--> |       |
    |  Text  |   +-------+    |diagram|
    |Document|   |!magic!|    |       |
    |     {d}|   |       |    |       |
    +---+----+   +-------+    +-------+
        :                         ^
        |       Lots of work      |
        +-------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}

Once generated the graph looks like:

[Image: pp-ditaa-example. Caption: This is just a Ditaa diagram example]

ditaa is written in Java and is embedded in pp. Java must be installed.
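The keyword-to-tool mapping described in the three sections above, including the @startuml and @enduml lines added for PlantUML, can be summarised in a short Python sketch. The function `prepare_diagram` and its return values are hypothetical. It only decides which tool would receive which text and does not run any of them.

```python
GRAPHVIZ_KEYWORDS = {"dot", "neato", "twopi", "circo",
                     "fdp", "sfdp", "patchwork", "osage"}

def prepare_diagram(keyword, source):
    """Return (backend name, source text to hand to that backend)."""
    if keyword in GRAPHVIZ_KEYWORDS:
        return "GraphViz", source
    if keyword == "uml":
        # PlantUML requires these two lines; the author does not write them.
        return "PlantUML", "@startuml\n" + source.strip("\n") + "\n@enduml\n"
    if keyword == "ditaa":
        return "ditaa", source
    raise ValueError(f"unknown diagram keyword: {keyword}")

backend, text = prepare_diagram("uml", "Alice -> Bob: Authentication Request\n")
print(backend)
print(text, end="")
```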

Bash

Bash is executed when the keyword bash is used.

\raw{\bash
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
echo "Hi, I'm $SHELL $BASH_VERSION"
RANDOM=42 # seed
echo "Here are a few random numbers: $RANDOM, $RANDOM, $RANDOM"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}

This script outputs:

\bash
~~~~~
echo "Hi, I'm $SHELL $BASH_VERSION"
RANDOM=42 # seed
echo "Here are a few random numbers: $RANDOM, $RANDOM, $RANDOM"
~~~~~

Note: the keyword sh executes sh, which is often a link to bash or to another POSIX shell such as dash.

Cmd

Windows’ command-line interpreter is executed when the keyword cmd is used.

\raw{\cmd
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
echo Hi, I'm %COMSPEC%
ver
if "%WINELOADER%%WINELOADERNOEXEC%%WINEDEBUG%" == "" (
    echo This script is run from a real Windows
) else (
    echo This script is run from wine under Linux
)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}

This script outputs:

\cmd
~~~~~
echo Hi, I'm %COMSPEC%
ver
if "%WINELOADER%%WINELOADERNOEXEC%%WINEDEBUG%" == "" (
    echo This script is run from a real Windows
) else (
    echo This script is run from wine under Linux
)
~~~~~

Python

Python is executed when the keyword python is used.

\raw{\python
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
import sys
import random

if __name__ == "__main__":
    print("Hi, I'm Python %s"%sys.version)
    random.seed(42)
    randoms = [random.randint(0, 1000) for i in range(3)]
    print("Here are a few random numbers: %s"%(", ".join(map(str, randoms))))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}

This script outputs:

\python
~~~~~
import sys
import random

if __name__ == "__main__":
    print("Hi, I'm Python %s"%sys.version)
    random.seed(42)
    randoms = [random.randint(0, 1000) for i in range(3)]
    print("Here are a few random numbers: %s"%(", ".join(map(str, randoms))))
~~~~~

Haskell

Haskell is executed when the keyword haskell is used.

\raw{\haskell
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
import System.Info
import Data.Version
import Data.List

primes = filterPrime [2..]
    where filterPrime (p:xs) =
            p : filterPrime [x | x <- xs, x `mod` p /= 0]

version = showVersion compilerVersion

main = do
    putStrLn $ "Hi, I'm Haskell " ++ version
    putStrLn $ "The first 10 prime numbers are: " ++
                intercalate " " (map show (take 10 primes))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}

This script outputs:

\haskell
~~~~~
import System.Info
import Data.Version
import Data.List

primes = filterPrime [2..]
    where filterPrime (p:xs) =
            p : filterPrime [x | x <- xs, x `mod` p /= 0]

version = showVersion compilerVersion
main = do
    putStrLn $ "Hi, I'm Haskell " ++ version
    putStrLn $ "The first 10 prime numbers are: " ++
                intercalate " " (map show (take 10 primes))
~~~~~

OS support

PP is meant to be portable and multi-platform. To be OS agnostic, the use of free scripting languages is strongly recommended. For instance, bash scripts are preferred to proprietary closed languages because they can run on any platform. Bash is standard on Linux and pretty well supported on Windows (Cygwin, MSYS/MinGW, Git Bash, BusyBox, …). Python is also a good choice.

However, if some documents require both portability and specific tools, PP provides macros to detect the OS (\os, \arch). For example:

\raw[\quiet
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
\ifeq(\os)(linux)
`````````````````````
\def(linux)(\1)
\def(win)()
`````````````````````
\ifeq(\os)(windows)
`````````````````````
\def(linux)()
\def(win)(\1)
`````````````````````
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

\win(Sorry, you're running Windows)
\linux(Hello, happy GNU/Linux user)
]

The \exec macro is also OS aware. It runs the default shell according to the OS (sh on Linux and macOS, cmd on Windows).
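The OS-dependent choice of shell can be sketched in Python. The functions `detect_os` and `default_shell` are hypothetical. The values "linux" and "windows" match those tested with \ifeq above, whereas the name pp reports on macOS is not given in this document, so the sketch simply treats every non-Windows system as a sh platform.

```python
import platform

def detect_os():
    """Lower-case OS name as reported by Python, e.g. 'linux' or 'windows'."""
    return platform.system().lower()

def default_shell(os_name):
    """cmd on Windows, sh everywhere else (Linux, macOS, ...)."""
    return "cmd" if os_name == "windows" else "sh"

print(detect_os(), "->", default_shell(detect_os()))
```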

Third-party documentation, tutorials and macros

Licenses

PP

Copyright (C) 2015, 2016, 2017 Christophe Delord
http://www.cdsoft.fr/pp

PP is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

PP is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with PP. If not, see http://www.gnu.org/licenses/.

PlantUML

PlantUML.jar is integrated in PP. PlantUML is distributed under the GPL license. See http://plantuml.sourceforge.net/faq.html.

ditaa

ditaa.jar is no longer integrated in PP. The ditaa version used is the one already integrated in PlantUML. ditaa is distributed under the GNU General Public License version 2.0 (GPLv2). See http://sourceforge.net/projects/ditaa/.

Feedback

Your feedback and contributions are welcome. You can contact me at http://cdsoft.fr

Support

If you find this software useful, you are free to donate something to support its future evolution. Thanks for your support.

You can use Flattr or PayPal, buy some CDSoft products, or simply disable your ad-blocker to support this software.
