hydrox

dive deeper into your code

1    Introduction

1.1    Overview

hydrox assists in the transmission of knowledge around a clojure project, providing in-repl management of documentation, docstrings and metadata through the reuse/repurposing of test code. Facillitating the creation of 'documentation that we can run', the tool allows for a design-orientated workflow for the programming process, blurring the boundaries between design, development, testing and documentation.

fig.1  -  Overview

1.2    Installation

1.2.1    Standard

In your project.clj, add hydrox to the [:profiles :dev :dependencies] entry:

(defproject ...
    ...
    :profiles {:dev {:dependencies [...
                                    [helpshift/hydrox "0.1.15"]
                                    ...]}}
    ...)

All functionality is the hydrox.core namespace:

(use 'hydrox.core) (dive) ;; initialises the tool, and watches project for files changes (import-docstring) ;; imports docstrings into functions from test files (purge-docstring) ;; purges docstrings from functions (generate-docs) ;; generates html (like this one) from test files (surface) ;; stops the tool

1.2.2    Vinyasa

A better experience can be obtained by using vinyasa and editing your ~/.lein/profiles.clj to inject all hydrox vars into the . namespace:

{:user
   {:plugins [...]
    :dependencies [[im.chit/vinyasa.inject "0.3.4"]
                   [helpshift/hydrox "0.1.15"]]
    :injections
    [(require '[vinyasa.inject :as inject])
     (inject/in [hydrox.core dive surface generate-docs
                             import-docstring purge-docstring])]}}

Functions can then be used like this within the repl:

(./dive) (./import-docstring) (./purge-docstring) (./generate-docs) (./surface)

1.3    Motivation

Programming is a very precise art form. Programming mistakes, especially the little ones, can result in dire consequences and much wasted time. We therefore use tools such as debuggers, type checkers and test frameworks to make our coding lives easier and our source code correct.

Documentation is the programmers' means of communicating how to use or build upon a library, usually to a larger audience of peers. This means that any mistakes in the documentation results in wasted time for all involved. Therefore any mistake in documentation can have a greater effect than a mistake in source code because it wastes everybody's time.

There are various tools for documentation like latex, wiki and markdown. However, none address the issue that code examples in the documentation are not checked for correctness. hydrox attempts to bridge the gap between writing tests and writing documentation by introducing the following features:

1.3.1    Features

The features are:

  1. To generate .html documentation from a .clj test file.
  2. Management of function docstrings and metadata through tests
  3. Express documentation elements as clojure datastructures.
  4. Render code, clojure.test and midje test cases as examples.
  5. Latex-like numbering and linking facilities.

1.3.2    Benefits

In this way, the project benefits in multiple ways:

  1. All documentation errors can be eliminated.
  2. Removes the need to cut and copy test examples into a readme file.

1.3.3    Improvements

The precessor of hydrox was midje-doc, and it provided very much the same functionalities as the current hydrox implementation. Whilst midje-doc was built primarily as a leiningen plugin, it was found that the tool was more effective for the development when used within the repl.

There are significant improvements of hydrox over midje-doc including:

  • Declarative source code traversal using jai
  • Extensible micropass pipeline for compilation and linking
  • Customisable templates system for html documentataion
  • Code-diffing mechanism for efficient management of filesystem changes

1.4    Literate Programming

The phrase 'Literate Programming' has been bandied around alot lately. The main idea is that the code is written in a way that allows both a machine and a person to understand what is going on. Most seem to agree that it is a great idea. The reuse factor of not writing seperate documentation alone brings great excitement to many developers. However, we must understand that methods of effective communication to humans and machines are very different:

Communication to Machines are usually linear and based on a specific set of instructions. First Do This, Then Do That.... Machines don't really care what the code does. It just executes whatever code it has been given. The main importance of establishing that a program is correct is to give it a set of verifible input/output responses and see if it behaves a certain way.

Communication to Humans are usually less procedural and more relational. We wish to be engaged, inspired and taught, not given a sequence of instructions that each break down to even smaller sequences. The best documentation are usally seperated into logical sections like an overview, table of contents, list of figures, chapters, sections and subsections. There are text, code, pictures, even sound and video. Documentation structure resemble trees, with links between content that connect related topics and content. They do not resemble program code and therefore should be created independently of the machine code itself.

2    Facillitating Communication

2.1    Code as Communication

The primary reason for building hydrox was to simplify the communication process across a clojure project. In general, the programming process can be seen as a set of different communication types with the developer acting as the facillitator for the following:

  1. with the machine telling it what to do (code)
  2. with the machine, verifying that it has done it's job (unit and integration tests)
  3. with themselves/other developers, telling them how to use the function (unit tests and docstrings)
  4. with the end user, telling them how to do it (design/integration documentation)

Notice that with the 4 points listed, there happens to be some overlap between 2 and 3 (unit tests) as well as 2 and 4 (integration). We can categorise the overlap as follows:

  1. unit/function level documentation (intended for communicating with a machine/developer)
  2. design/integration level documentation (intended for communicating with a consumer)

This overlap is very important because this is the one area where repetitive activities occur. hydrox helps limit this repetition through automation of the documentation workflow.

2.2    Functional Level Documentation

The best description for our functions are not found in source files but in the test files. Tests are potentially the best documentation because they provide information about what a function outputs, what inputs it accepts and what exceptions it throws. The best place to display functional level documentation is in the docstring as the clojure ecosystem has amazing support for the uses of docstings in tools such as cider, cursive, codox, grimoire and many others.

Lets discuss the problem with the following piece of code:

e.2.1  -  split-string source
(defn split-string "The split-string function is used to split a string in two according to the idx that is passed." [s idx] [(.substring s 0 idx) (.substring s idx)])

Technically, everything is fine except that the docstring is pretty much useless. Instead of writing vague phrases, we write our test as follows (usually in another file):

e.2.2  -  split-string tests
^{:refer documentation.hydrox-guide/split-string :added "0.1"} (facts "split-string usage:" (split-string "abcde" 1) => ["a" "bcde"] (split-string "abcde" 3) => ["abc" "de"])

It can be seen that test cases provides a much better explaination for split-string than the source docstring because any developer can immediately tell what inputs the function accepts and what outputs it returns. In this case, hydrox provides the function import-docstring to turn a test into a docstring and then deliver it back into the source code.

2.2.1    Import Docstring

The following hydrox calls:

(hydrox.core/dive) (hydrox.core/import-docstring)

Results in the split-string function looking like this:

e.2.3  -  split-string after import-docstring
(defn split-string "split-string usage: (split-string \"abcde\" 1) => [\"a\" \"bcde\"] (split-string \"abcde\" 3) => [\"abc\" \"de\"]" {:added "0.1"} [s idx] [(.substring s 0 idx) (.substring s idx)])

It is possible to target the docstring of a source file when the test code is in another file is due to the ^{:refer documentation.hydrox-guide/split-string :added "0.1"} metadata that was included as part of the test. The other call to dive was needed to link the source and test files. It only needs to called once and will be explained in a later section.

2.2.2    Purge Docstring

Sometimes, it's better to just concentrate on the function itself for more clarity, in this case, the purge-docstring function is really useful:

(hydrox.core/purge-docstring)

Results in the split-string function looking like this, having no docstring or metadata:

e.2.4  -  split-string after purge-docstring
(defn split-string [s idx] [(.substring s 0 idx) (.substring s idx)])

To bring the docstring and metadata back again, just make another call to import-docstring. With the tests for split-string in place, it is easy to manage metadata and docstrings. Having the information within the test files decreases source code clutter and we can import/purge them as needed, dependending upon the situation.

2.3    Design Level Documentation

Documentation at the design level requires more visual elements than documentation at the function level. hydrox can generate html output based on a .clj file and a template. This requires some configuration and so the following is placed in the project.clj map.

(defproject ... ... :documentation {:site "sample" :output "docs" :template {:path "template" :copy ["assets"] :defaults {:template "article.html" :navbar [:file "partials/navbar.html"] :dependencies [:file "partials/deps-web.html"] :navigation :navigation :article :article}} :paths ["test/documentation"] :files {"sample-document" {:input "test/documentation/sample_document.clj" :title "a sample document" :subtitle "generating a document from code"}}} ...)

The :documentation key in defproject specifies which files to use as entry points to use for html generation. A sample can be seen below:

(ns documentation.sample.document (:require [midje.sweet :refer :all])) [[:chapter {:tag "hello" :title "Hello Midje Doc"}]] "This is an introduction to writing with midje-doc." [[:section {:title "Defining a function"}]] "We define function `add-5` below:" [[{:numbered false}]] (defn add-5 [x] (+ x 5)) [[:section {:title "Testing a function"}]] "`add-5` outputs the following results:" (facts [[{:tag "add-5-1" :title "1 add 5 = 6"}]] (add-5 1) => 6 [[{:tag "add-5-10" :title "10 add 5 = 15"}]] (add-5 10) => 15)

A pretty looking html document can be generated by running generate-docs:

(hydrox.core/generate-docs)

The :output entry specifies the directory that files are rendered to, :template' specifies the files that are needed to generate the file. The current template will generate the same as the current documentation.

3    API Reference

3.1    The Basics

3.1.1    elements

Elements are constructed using a tag and a map contained within double square brackets. Elements tags have been inspired from latex:

Clojure strings are treated as paragraph elements whilst clojure forms are treated as code elements. fact and comment forms are also considered code elements. Elements will be described in detail in their respective sections.

e.3.1  -  Element Notation
[[<tag> {<key1> <value1>, <key2> <value2>}]] for example [[:chapter {:title "Hello World" :tag "hello"}]]

3.1.2    attributes

Attribute add additional metadata to elements. They are written as a single hashmap within double square brackets. Attributes mean nothing by themselves. They change the properties of elements directly after them.

[[{:tag "my-paragraph"}]] [[:paragraph {:content "This is a paragraph"}]] is equivalent to [[:paragraph {:content "This is a paragraph" :tag "my-paragraph"}]]

Multiple attributes can be stacked to modify an element:

[[{:numbered false}]] [[{:lang "shell"}]] (comment > lein repl)

produces the following shell code:

> lein repl

3.2    Sectioning Elements

Sectioning elements are taken from latex and allow the document to be organised into logical sections. From highest to lowest order of priority, they are: :chapter, section, subsection and :subsubsection, giving four levels of organisation.

The numbering for elements are generated in sequencial order: (1, 2, 3 ... etc) and a tag can be generated from the title or specified for creating links within the document. :chapter, section and subsection elements are list in the table of contents using tags.

For example, I wish to write a chapter about animals and have organised content into categories shown below.

Animals - Mammals - Birds *- Can Fly **- Eagle **- Hummingbird *- Flightless **- Penguin

It is very straight forward to turn this into sectioning elements which will then generate the sectioning numbers for different categories

[[:chapter {:title "Animals"}]] [[:section {:title "Mammals"}]] [[:section {:title "Birds"}]] [[:subsection {:title "Can Fly"}]] [[:subsubsection {:title "Eagle"}]] [[:subsubsection {:title "Hummingbird"}]] [[:subsection {:title "Flightless"}]] [[:subsubsection {:title "Penguin"}]]

The sections will be automatically numbered as show below:

*1 Animals ;; animals *1.1 Mammals ;; mammals *1.2 Birds ;; birds *1.2.1 Can Fly ;; can-fly *1.2.1.1 Eagle ;; eagle *1.2.1.2 Hummingbird ;; hummingbird *1.2.2 Flightless ;; flightless *1.2.2.1 Penguin ;; penguin

3.3    Content Elements

Content elements include :paragraph, :image, and file elements.

3.3.1    :paragraph

Paragraph elements should make up the bulk of the documentation. They can be written as an element or in the usual case, as a string. The string is markdown with templating - so that chapter, section, code and image numbers can be referred to by their tags.

e.3.2  -  Paragraph Element
[[:paragraph {:content "Here is some content"}]]
e.3.3  -  Paragraph String
"Here is some content"
e.3.4  -  Markdown String
[[:chapter {:title "Chapter Heading" :tag "ch-heading"}]] " # Heading One Here is some text. Here is a tag reference to Chapter Heading - {{ch-heading}} - Here is a bullet point - Here is another one"

3.3.2    :image

The :image element embeds an image as a figure within the document. It is numbered and can be tagged for easy reference. The code example produces the image seen in Figure 2:

[[:image {:tag "clojure-logo" :title "Clojure Logo (source clojure.org)" :src "http://clojure.org/space/showimage/clojure-icon.gif"}]]

fig.2  -  Clojure Logo (source clojure.org)

3.3.3    :file

The :file element allows inclusion of other files into the document. It is useful for breaking up a document into managable chunks. A file element require that the :src attribute be specified. A high-level view of a document can thus be achieved, making the source more readable. This is similar to the include element in latex.

e.3.5  -  :file tag example
[[:file {:src "test/docs/first_section.clj"}]] [[:file {:src "test/docs/second_section.clj"}]] [[:file {:src "test/docs/third_section.clj"}]]

3.4    Code Elements

Code displayed in documentation are of a few types:

  1. Code that needs to be run (normal clojure code)
  2. Code that needs verification taking input and showing output. (midje fact)
  3. Code that should not be run (namespace declaration examples)
  4. Code that is part of the library's tests or source definition
  5. Code in other languages

The different types of code can be defined so that code examples render properly using a variety of methods

3.4.1    normal s-expressions

Normal s-expressions are rendered as is. Attributes can be added for grouping purposes. The source code shown below

e.3.6  -  seperating code blocks through attributes
[[{:title "add-n definition" :tag "c-add-1"}]] (defn add-n [n] (fn [x] (+ x n))) [[{:title "add-4 and add-5 definitions" :tag "c-add-2"}]] (def add-4 (add-n 4)) (def add-5 (add-n 5))

renders the following outputs:

e.3.7  -  add-n definition
(defn add-n [n] (fn [x] (+ x n)))
e.3.8  -  add-4 and add-5 definitions
(def add-4 (add-n 4)) (def add-5 (add-n 5))

3.4.2    test forms

Documentation examples put in facts forms allows the code to be verified for correctness using lein midje. Document element notation still be rendered except before and after the midje arrows (=>). Consecutive code within a fact form will stacked as one common code block.

The source form:

e.3.9  -  Facts Form Source
[[{:tag "facts-form-output" :title "Facts Form Output"}]] (facts [[{:title "Definining an atom" :tag "c-facts-1"}]] (def a (atom 1)) (deref a) => 1 [[{:title "Updating the atom" :tag "c-facts-2"}]] (swap! a inc) (deref a) => 2)

Renders the following output:

e.3.10  -  Definining an atom
(def a (atom 1)) (deref a) => 1
e.3.11  -  Updating the atom
(swap! a inc) (deref a) => 2

3.4.3    fact form

For an entire block to be embedded in code, use the fact form. The source form:

e.3.12  -  Fact Form Source
[[{:tag "fact-form-output" :title "Fact Form Output"}]] (fact (def a (atom 1)) (deref a) => 1 (swap! a inc 1) (deref a) => 2)

Renders the following output:

e.3.13  -  Fact Form Output
(def a (atom 1)) (deref a) => 1 (swap! a inc) (deref a) => 2

3.4.4    comments

Comments are clojure's built-in method of displaying non-running code and so this mechanisim is used in clojure for displaying code that should not be run, but still requires display. Code can still output without interferring with code or tests.

e.3.14  -  Switching to a new namespace
[[{:title "Switching to a new namespace" :tag "c-com-1"}]] (comment (in-ns 'hello.world) (use 'clojure.string) (split "Hello World" #"\s") ;=> ["Hello" "World"] )
e.3.15  -  Switching to a new namespace
(in-ns 'hello.world) (use 'clojure.string) (split "Hello World" #"\s") ;=> ["Hello" "World"]

3.4.5    reference

Sometimes we wish to refer to source/test code that is already in our library, we can do this using the :reference directive:

[[:reference {:refer hydrox.core/dive}]]
e.3.16  -  source of hydrox.core/dive
(defn dive ([] (dive "project.clj")) ([path] (dive path {})) ([path opts] (patch/patch-read-keyword) (->> (io/file path) (util/read-project) (merge opts) (regulator/regulator) (component/start))))

We can refer to tests as well by changing the :mode to :docs (by default it is :source):

[[:reference {:refer hydrox.meta.util/append-node :mode :docs}]]
e.3.17  -  docs of hydrox.meta.util/append-node
"Adds node as well as whitespace and newline on right" (-> (z/of-string "(+)") (z/down) (append-node 2) (append-node 1) (z/->root-string)) => "(+\n 1\n 2)"

3.4.6    code

The most generic way of displaying code is with the :code tag. It is useful when code in other languages are required to be in the documentation.

3.4.6.1    Python Example

The source and outputs are listed below:

e.3.18  -  Python for Loop Source
[[:code {:lang "python" :title "Python for Loop" :tag "c-py-1"} " myList = [1,2,3,4] for index in range(len(myList)): myList[index] += 1 print myList"]]
e.3.19  -  Python for Loop
myList = [1,2,3,4] for index in range(len(myList)): myList[index] += 1 print myList

3.4.6.2    Ruby Example

The source and outputs are listed below:

e.3.20  -  Ruby for Loop Source
[[:code {:lang "ruby" :title "Ruby for Loop" :tag "c-rb-2"} " array.each_with_index do |element,index| element.do_stuff(index) end"]]
e.3.21  -  Ruby for Loop
array.each_with_index do |element,index| element.do_stuff(index) end

4    A Bug's Life

4.1    Version One

A new clojure project is created.

> lein new fami > cd fami > lein repl

A very useful function add-5 has been defined:

e.4.1  -  src/fami/operations.clj
(ns fami.operations) (defn add-5 "add-5 is a function that takes any number of arguments and adds 5 to the sum" [& ns] (apply + 5 ns))

And the corresponding tests specified:

e.4.2  -  src/fami/test-operations.clj
(ns fami.test-operations (:require [fami.operations :refer :all] [midje.sweet :refer :all])) (fact "add-5 should increment any list of numbers by 5" (add-5 5) => 10 (add-5 1 2 3 4) => 15)

There are additional entries for the operation in the readme as well as also being scattered around in various other documents.

e.4.3  -  readme.md, operations.md
... Here are some of the use cases for add-5 (add-5 5) ;; => 10 (add-5 1 2 3 4) ;; => 15 ...

This version of this library has been released as version 1.0

4.2    Version Two

The library is super successful with many users. The code undergoes refactoring and it is decided that the original add-5 is too powerful and so it must be muted to only accept one argument. An additional function add-5-multi is used to make explicit that the function is taking multiple arguments.

e.4.4  -  src/fami/operations.clj - v2
(ns fami.operations) (defn add-5 [n] ;; The muted version "add-5 is a function that takes a number and adds 5 to it" (+ n 5)) (defn add-5-multi "add-5-multi is a function that takes any number of arguments and adds 5 to the sum" [& ns] (apply + 5 ns))

The tests throw an exception but are quickly fixed:

e.4.5  -  src/fami/test-operations.clj
(ns fami.test-operations (:require [fami.operations :refer :all] [midje.sweet :refer :all])) (fact "add-5 should increment only one input by 5" (add-5 5) => 10 (add-5 1 2) => (throws clojure.lang.ArityException)) (fact "add-5-multi should increment any list of numbers by 5" (add-5-multi 1 2 3 4) => 15)

This version of this library has been released as version 2.0

4.3    The Bug Surfaces

Although the tests are correct, the documentation is not. Anyone using this library can potentially have the clojure.lang.ArityException bug if they carefully followed instructions in the documentation.

This is a trival example of a much greater problem. When a project begins to evolve and codebase begins to change, the documentation then becomes incorrect. Although source and test code can be isolated through testing, fixing documentation is a miserable and futile exercise of cut and paste. With no real tool to check whether code is still valid, the documentation become less and less correct until all the examples have to be rechecked and the documention rewritten.

Then the codebase changes again ...

Once the library has been release to the world and people have already started using it, there is no taking it back. Bugs propagate through miscommunication. Miscommunication with machines can usually be contained and fixed. Miscommunication with people becomes potentially more difficult to contain.

This simple scenario highlights the problem that developers have when updating documentation. When we cannot prove that our documentation is correct, then the cost of fixing out of date documentation becomes bigger and bigger as time progresses. With hydrox this process just became much more cost effective.