Data Flow ========= In :numref:`ch_preliminaries/functional_programming:Functional programming`, the example was given for creating tracks from wires. The simplified expression for doing this was the chained application of two functions :math:`f` and :math:`g` such that: .. math:: ts = g\ (f\ ws) = (g \comp f)\ ws where :math:`ws` and :math:`ts` are the wires and tracks, respectively. An alternative representation to the equational form is a *directed graph* .. graphviz:: functional-programming-data-flow.gv :align: center where the functions :math:`f` and :math:`g` are vertices or *nodes* and the data :math:`ws` and :math:`ts` are passed along arrows or *edges* that connect the nodes [#f1]_. The arrows indicate the flow of the data (or *data flow*) through the graph. Some observations: - The function :math:`f` returns hits that are unlabeled in the equational form. For the graph above, we explicitly label those hits as :math:`hs` to emphasize that data are always passed along edges. If, however, the hits are not needed by any other function in the graph, the two functions :math:`f` and :math:`g` can be replaced by their composition :math:`h=g \circ f` .. graphviz:: functional-programming-data-flow-optimized.gv :align: center resulting in potential performance improvements in computational efficiency and program memory usage. - A directed graph can have no unterminated edges---i.e. an edge must have a source (the tail of the arrow) and a target (the head of the arrow). This means that whereas the equation does not need to specify where the :math:`ws` wires originate (or where the :math:`ts` tracks end up), such locations must be specified in the graph. Any node that has only outgoing edges is called a *source* (denoted by a solid dot), and any node that has only incoming edges is a *sink* (denoted by an encircled solid dot). Data Flow with Sequences ------------------------ As mentioned earlier in :numref:`ch_preliminaries/functional_programming:Sequences of data and higher-order functions`, processing sequences of data is a critical aspect of obtaining physics results. The data-flow discussion in the previous section naturally maps to applying the functions :math:`f` and :math:`g` to elements of sequences. Specifically [#f2]_: - Instead of the individual objects :math:`ws`, :math:`hs`, and :math:`ts` being passed along the edges, the sequences :math:`[ws_i]`, :math:`[hs_i]`, and :math:`[ts_i]` are passed. - The functions :math:`f` and :math:`g` map to :math:`\transform{f}` and :math:`\transform{g}`, respectively. .. graphviz:: functional-programming-data-flow-hof.gv :align: center The above graph does not specify an implementation---assuming :math:`f` and :math:`g` are pure functions (see :numref:`ch_preliminaries/functional_programming:Pure functions`), the same result is obtained if (a) full sequences are passed between the nodes, or (b) one element per sequence is passed at a time. Whether option (a) or (b) is chosen as the processing implementation depends on the data and the overall constraints on the program. .. todo:: Determine whether the choice between (a) and (b) can be decided in user code, by user configuration, or by something else. One benefit to using a graph representation using data sequences and higher-order functions is the ability to easily express folds and unfolds. As an illustration, consider the calculation of a sum of squares for all integers from 1 through :math:`n`: .. math:: S(n) = \sum_{i=1}^n i^2 This calculation requires three separate steps: 1. an unfold called *iota* that, given an integer :math:`n`, generates a sequence of integers from 1 through :math:`n`, 2. a transformation that squares each integer in the sequence using an algorithm called *square* , and 3. a fold called *sum* that sums all squared integers. The data-flow graph of individual objects looks like: .. graphviz:: unfold-transform-fold.gv :align: center where each number is passed along its own edge to the nodes performing the calculation. The summation formula for :math:`S(n)`, however, can be expressed in terms of higher-order functions that reflect the three steps above: .. math:: S(n) = \sum_{i=1}^n i^2 = \underbrace{\fold{+}{0}}_{3.\ \textcolor{blue}{\textit{sum}}}\ \underbrace{\transform{\textcolor{blue}{\textit{square}}}}_{2.}\ \underbrace{\unfold{\ge 0}{\textit{decrement}}}_{1.\ \textcolor{blue}{\textit{iota}}}\ n Note that the unfold :math:`\textit{iota}` takes two operators: the predicate :math:`\ge 0`, which tests whether the next generated number is greater than or equal to zero, and the :math:`\textit{decrement}` operator, which decrements the next generated number by 1. [#f3]_ Once the predicate returns false, the unfold stops generating numbers. By adopting the HOF representation of the sum-of-squares problem, the data-flow graph is simplified: .. graphviz:: unfold-transform-fold-hof.gv :align: center In addition, the topology of the sequence-based graph remains the same regardless of the value of :math:`n`; the topology of the object-based graph becomes intractable as :math:`n` increases. The vast majority of graphs included in this design document adopt the sequence-based data-flow representation. .. rubric:: Footnotes .. [#f1] Treating functions as nodes and data as edges is a function-centric view of the graph. It is also possible to invert the view of the graph such that the data are treated as nodes and the functions as edges; this is the data-centric graph, which is the *line digraph* of the function-centric graph. .. [#f2] In Haskell (and, similarly, in category theory), this transformation is achieved through the `List/fmap` functor. .. [#f3] The same data-flow can equivalently be represented with an unfold that uses the predicate :math:`\le n` and the operator :math:`\textit{increment}`, but starts with an initial value of 0. This alternative representation, however, requires an unfold that depends on the value of :math:`n`.