3.5.4. Partitioned Folds

Partitioned fold

Operators

Output sequence length

\(d = \pfold{f}{\textit{init}}{\textit{part}}\ c\)

\(f: D \times C \rightarrow D\)

\(|d| \le |c|\)

\(\textit{init}: \one \rightarrow D\)

\(\textit{part}: \{\iset{c}\} \rightarrow \mathbb{P}(\iset{c})\)

As mentioned in Section 2.5, a fold can be defined as a transformation of a sequence of data to a single value:

\[d = \fold{f}{\textit{init}}\ \isequence{c}{c}\]

where the user-defined operation \(f\) is applied repeatedly between an accumulated value (initialized by \(init\)) and each element of the input sequence.

In a framework context, however, multiple fold results are often desired in the same program for the same kind of computation. For example, consider a program that processes \(n\) runs, each of which contains spills, identified by the tuple \((R\ i, S\ j)\). The user may wish to create one histogram per run that contains the track multiplicity per spill. Instead of creating a single fold result, we thus use a partitioned fold:

\begin{align*} [h_{(R\ 1)}&,\ \dots,\ h_{(R\ n)}] \\ &= \pfold{\textit{fill}}{\textit{init}}{\textit{into\_runs}}\ [m_{(R\ 1, S\ 1)},\ m_{(R\ 1, S\ 2)},\ \dots,\ m_{(R\ n, S\ 1)},\ m_{(R\ n, S\ 2)},\ \dots] \end{align*}

where \(h_{(R\ i)}\) denotes the histogram for run \(i\), and \(m_{(R\ i,\ S\ j)}\) is the track multiplicity for spill \(j\) in run \(i\).

The above equation can be expressed more succinctly as:

\[[h_j]_{j \in \iset{\text{out}}} = \pfold{\textit{fill}}{\textit{init}}{\textit{into\_runs}}\ [m_i]_{i \in \iset{\text{in}}}\]

where

\begin{align*} \iset{\text{in}} &= \{(R\ 1,\ S\ 1),\ (R\ 1,\ S\ 2),\ \dots,\ (R\ n,\ S\ 1),\ (R\ n,\ S\ 2), \dots\}, \text{and}\\ \iset{\text{out}} &= \{(R\ 1),\ \dots, (R\ n)\}\ . \end{align*}

3.5.4.1. Partitions

Factorizing a set of data into non-overlapping subsets that collectively span the entire set is called creating a set partition. [Wiki-partition] Each subset of the partition is called a cell. In the above example, the role of the \(\textit{into\_runs}\) operation is to partition the input sequence into runs so that there is one fold result per run. In general, however, the partitioning function is of the form \(\textit{part}: \{\iset{c}\} \rightarrow \mathbb{P}(\iset{c})\), where:

  • the domain is the singleton set that contains only the index set \(\iset{c}\) (i.e. \(\textit{part}\) can only be invoked on \(\iset{c}\)), and

  • the codomain is the set of partitions of the index set \(\iset{c}\).

The function \(part\) also establishes an equivalence relationship on the index set \(\iset{c}\), where each element of the index set is mapped to a cell of the partition. The number of elements in the output sequence \(d\) corresponds to the number of partition cells.

3.5.4.2. Initializing the Accumulator

Todo

Change the domain type of \(\textit{init}\).

A crucial ingredient of the fold is the accumulator, which stores the fold result while it is being formed. Each accumulator is initialized by invoking a user-defined operation \(\textit{init}: \one \rightarrow D\), which returns an object that has the same type \(D\) as the fold result. [1] Instead of invoking a function, an accumulator is often initialized with a value. However, in functional programming, a value can be represented by invoking a function that always returns the same result. Expressing an initializer as a function thus supports value-initialization while retaining the flexibility that may occasionally be required through functions.

3.5.4.3. Fold Operation

A cell’s fold result is obtained by repeatedly applying a fold operation to the cell’s accumulator and each element of that cell’s input sequence. The fold operation has the signature \(f: D \times C \rightarrow D\), where \(D\) represents the type of the accumulator/fold result, and \(C\) is the type of each element of the input sequence.

In the above example, the function \(\textit{fill}\) receives a histogram \(h_{(R\ i)}\) as the accumulator for run \(i\) and “combines” it with a track multiplicity object \(m_{(R\ i,\ S\ j)}\) that belongs to spill \(j\) in run \(i\). This “combined” value is then returned by \(\textit{fill}\) as the updated value of the accumulator. The function \(\textit{fill}\) is repeatedly invoked to update the accumulator with each track multiplicity value. Once all track multiplcity values in run \(i\) have been processed by \(\textit{fill}\), the accumulator’s value becomes the fold result for that run.

3.5.4.4. Registration interface

Result type: A fold algorithm may create multiple data products through its result by specifying an std::tuple<T1, ..., Tn> where each of the types T1, ..., Tn models a data-product created type.

Operator

Allowed signature

\(f\)

void function_name(result_type&, P1, Pn..., Rm...) [qualifiers];

\(\textit{init}\)

result_type{...}

\(\textit{part}\)

Name of data-set category

Footnotes

References