2.4. Functional Programming¶

Functional programming is a paradigm that favors the use of functions instead of the direct manipulation of stateful objects. The processing of data happens by using chained operations, where the output of one function serves as the input to other functions.

For example, given two functions:

\[ \begin{align}\begin{aligned}f: \mbox{Wires} \rightarrow \mbox{Hits}\\g: \mbox{Hits} \rightarrow \mbox{Tracks}\end{aligned}\end{align} \]

a composite function \(h: \mbox{Wires} \rightarrow \mbox{Tracks}\) can be constructed such that:

\[ts = h\ ws = g\ (f\ ws) = (g \comp f)\ ws\]

or \(h = g \comp f\), where \(ws \in \mbox{Wires}\) and \(ts \in \mbox{Tracks}\).

In reality, the creation of tracks from wire signals is much more complicated [1]. However, as seen above, functional programming permits a mathematical description of the data-processing to be performed. Expressing the processing needs according to mathematics enables:

the use of mathematical rules to optimize the processing of the data,
naturally reproducible results, assuming the functions are pure (see Section 2.4.1),
parallel invocations of pure functions with no possibility of data races [DUNE 130].

2.4.1. Pure Functions¶

According to Wikipedia [Wiki-Pure], a pure function has the following properties:

the function return values are identical for identical arguments, and
the function has no side effects.

Phlex therefore encourages the use of pure functions for creating of reproducible data products, a principle of the framework philosophy as discussed in Section 1.2.

Favor free functions

Functions can additionally be classified as free functions or member functions (or methods). Whereas a free function has only explicit input parameters, a member function called on an object has access to the internal state of the object as well as the explicit function parameters. Both kinds of functions can be useful, but authors of classes must exercise special care to ensure that a class instance’s member functions can be safely invoked in concurrent contexts. For this reason, framework users should favor free functions over classes and their member functions.

2.4.2. Challenges with Functional Programming¶

One drawback to functional programming is that it differs from what many in the HEP community are accustomed to when writing their own physics algorithms. Commonly used third-party libraries and computing languages can also make functional programming difficult to use in practice. We argue, though, that physicists often think in terms of functional programming when developing the high-level processing steps of a workflow. It is not until those processing steps need to be implemented that the functional steps are often translated into procedural ones.

Phlex aims to restore the functional programming approach as the natural way of expressing the data-processing to be performed. By leveraging commonly used processing patterns (see Section 2.5 on higher-order functions), we can mitigate any awkwardness due to initial unfamiliarity with functional programming paradigms. The framework also does not place constraints on the algorithm implementations—algorithm authors are free to employ imperative programming techniques within the implementations if doing so is convenient. The framework will simply schedule the algorithm as if it were a pure function without regard to its implementation.

2.5. Families of Data and Higher-Order Functions¶

Particle physics results are obtained by performing statistical analysis on families of data. Such analysis typically involves repeated invocations of the same kind of operation. For example, a relatively simple result is calculating the arithmetic mean of \(n\) numbers:

\[\overline{c} = \frac{1}{n}\sum_{i \in \mathcal{I}} c_i\]

where the sum is over the numbers \(\family{c}\), and \(n\) is the cardinality of the index set \(\mathcal{I}\).

The sum is an example of a data reduction or fold, where a family is collapsed into one result. In particular, the arithmetic mean above can be expressed as:

\[\overline{c} = \frac{1}{n}\ \fold{+}{0}\ \family{c}\]

where the fold accepts a binary operator (\(+\) in this case) that is repeatedly applied to an accumulated value (initialized to 0) and each element of the family.

The fold is an example of a higher-order function (HOF) [Wiki-HOF] that accepts a family and an operator applied in some way to elements of that family. However, additional HOFs exist—for example, suppose the family \(\fami{c}\) was created by applying a function \(w: E \rightarrow C\) to each element of a family \(\fami{e}\). Such a HOF is called a map or transform:

\[\fami{c} = \fami{w\ e} = \transform{w}\ \fami{e}\]

In such a scenario, the average \(\overline{c}\) could be expressed as:

\[\overline{c} = \frac{1}{n}\ \fold{+}{0}\ \transform{w}\ \fami{e} = \frac{1}{n}\ \fold{+ \comp w}{0}\ \fami{e}\]

The second equality holds by the fold-map fusion law [Bird], which states that the application of a \(\text{transform}\) followed by a \(\text{fold}\) can be reduced to a single \(\text{fold}\). The operator to this single fold is ‘\(+ \comp w\)’, indicating that the function \(w\) should be applied first before invoking the \(+\) operation. Relying on such mathematical laws permits the replacement of chained calculations with a single calculation, often leading to efficiency improvements without affecting the result.

Table 2.2 Higher-order functions supported by Phlex. Each family is represented by a single variable (e.g. \(a\)). Details of each HOF and its operators are in Section 3.5.¶
Higher-order function		Operator(s)	Output family length
Transform	\(b = \transform{f}\ a\)	\(f: A \rightarrow B\)	\(\|b\| = \|a\|\)
Predicate	\(\tilde{b} = \predicate{f}\ a\)	\(f: A \rightarrow \bool\)	\(\|\tilde{b}\| = \|a\|\)
Filter	\(a' = \filter{\phi}\ a\)	\(\phi: \bool^n \rightarrow \bool\)	\(\|a'\| \le \|a\|\)
Observer	\([\ \ ] = \observe{f}\ a\)	\(f: A \rightarrow \one\)	\(0\)
Fold	\(d = \pfold{f}{init}{part}\ c\)	\(f: D \times C \rightarrow D\)	\(\|d\| \le \|c\|\)
		\(init: \opt{\iset{d}} \rightarrow D\)
		\(part: \{\iset{c}\} \rightarrow \mathbb{P}(\iset{c})\)
Unfold	\(c = \punfold{p}{gen}{label}\ d\)	\(p: N \rightarrow \bool\)	\(\|c\| \ge \|d\|\)
		\(gen: N \rightarrow N \times C\)
		\(label: \one \rightarrow L\)
Window	\(y = \window{f}{adj}{label}\ x\)	\(f: X \times \opt{X} \rightarrow Y\)	\(\|y\| = \|x\|\)
		\(adj: \iset{x} \times \iset{x} \rightarrow \bool\)
		\(label: \one \rightarrow L\)

A calculation using a HOF is then generally expressed in terms of:

The HOF to be used (\(\text{fold}\), \(\text{transform}\), etc.)
The operator(s) to be used by each HOF (\(+\), \(w\), etc.)
The family (or families) of data on which the HOFs are to be applied.

Phlex supports the HOFs listed in Table 2.2. As discussed later, each HOF’s operator is an algorithm registered with the framework. Phlex will likely support other higher order functions as well.

Footnotes

References

[Wiki-Pure]

https://en.wikipedia.org/wiki/Pure_function

[Wiki-HOF]

https://en.wikipedia.org/wiki/Higher-order_function

[Bird]

Bird, Introduction to Functional Programming using Haskell (2nd ed.), Prentice Hall (1988), pp. 131–132

2.4. Functional Programming¶

2.4.1. Pure Functions¶

2.4.2. Challenges with Functional Programming¶

2.5. Families of Data and Higher-Order Functions¶

Table of Contents

Previous topic

Next topic

This Page