3.5.6. Partitioned Unfolds

Partitioned unfold

Operators

Output family length

\(c = \punfold{p}{\textit{gen}}{\textit{label}}\ d\)

\(p: N \rightarrow \bool\)

\(|c| \ge |d|\)

\(\textit{gen}: N \rightarrow N \times C\)

\(\textit{label}: \one \rightarrow L\)

As discussed in Section 2.5, the opposite of a fold is an unfold, where a family of objects is generated from a single object. The example given in Section 2.6.1 is \(\text{iota}\), which generates a sequence of contiguous integers given one input number:

\[c = [1,\ 2,\ 3,\ \dots,\ n] = \text{iota}\ n = \unfold{greater\_than\_zero}{decrement}\ n\]

where \(\text{iota}\) has been expressed in terms of an unfold HOF that receives the predicate \(greater\_than\_zero\) and a generator called \(decrement\).

The unfold operation is repeatedly called until the predicate returns false, whereby it emits an empty list \([\ ]\):

\begin{align*} c &= \unfold{greater\_than\_zero}{decrement}\ n \\ &= \unfold{greater\_than\_zero}{decrement}\ n-1\ \boldsymbol{+}\ [n] \\ &\quad\quad \vdots \\ &= \unfold{greater\_than\_zero}{decrement}\ 0\ \boldsymbol{+}\ [1, 2, \dots, n-1, n] \\ &=[\ ]\ \boldsymbol{+}\ [1, 2, \dots, n-1, n] \end{align*}

where \(\boldsymbol{+}\) in this example denotes an operator that concatenates two lists.

Heuristically, this can be thought of as executing the function:

def unfold(predicate, generator, n):
    result = []
    next_value = n
    while predicate(next_value):
        # generator returns a new value for next_value
        next_value, family_element = generator(next_value)
        result.prepend(family_element)
    return result

where the user supplies the predicate (\(p\)) and generator (\(\textit{gen}\)) algorithms.

Phlex expands the concept of an unfold by allowing it to operate on a family of data products corresponding to a set partition [DUNE 33]. This partitioned unfold is shown in Fig. 3.1, where the \(\textit{unfold(into\_apas)}\) node transforms a flat family of \(\textit{SimDepos}\) data products (each of which belong to a cell within the Spill partition) into a family of families, with each nested family containing the \(\textit{Waveforms}\) data products for all APAs within a given Spill.

Unfolding in this way can be used for parallelizing the processing of a data product in smaller chunks. Breaking up the processing of a data product can also be an important ingredient in controlling the memory use of a Phlex program.

Note

Phlex requires the use of the \(\textit{label}\) operator in unfolds to avoid collisions with already-existing data products and to reflect the more granular data-processing that occurs as a result of the unfold.

3.5.6.1. Next Type

The signatures for the operators \(p\) and \(\textit{gen}\) have the curious type \(N\), which seems unrelated to the input family \(d\), whose elements are of type \(D\), or the output family \(c\), whose elements are of type \(C\). The type \(N\) refers to the type of the next value on which the unfold operates. In the \(\text{iota}\) example above, the type \(N\) is the same as the input argument \(n\), which is an integer, and it is the same as that of the output family elements, which are also integers.

The unfold in Fig. 3.1, however, demonstrates an example where \(N\) is equal to neither \(D\) nor \(C\). Whereas the input type \(D\) corresponds to the \(\textit{SimDepos}\) data product in each Spill, the output type \(C\) represents the \(\textit{Waveforms}\) data products produced for each APA. Assuming \(\textit{SimDepos}\) is represented as a std::vector<SimDepo> object, a reasonable type for \(N\) might be std::vector<SimDepo>::const_iterator, thus permitting the comparison of iterators in the predicate \(p\) and using it in the generator \(\textit{gen}\) for processing portions of the initial data product. The generator would thus return a pair with an advanced iterator and a \(\textit{Waveforms}\) object corresponding to one APA.

The choice of the next type \(N\) thus depends on the use case and is not prescribed by Phlex.

3.5.6.2. Operator Signatures

Operator

Allowed signature

\(p\)

bool function_name(next_type) [quals];

\(\textit{gen}\)

std::pair<next_type, product_type> function_name(next_type, Rm...) [quals];

\(\textit{label}\)

Name of data layer of output data products

The unfold’s product_type must model the created data-product type described in Section 3.3.2. An unfold’s \(\textit{gen}\) algorithm may also create multiple data products by returning an object of type std::tuple<next_type, T1, ..., Tn>, where each of the types T1, ..., Tn models a created data-product type.

3.5.6.3. Registration Interface

As unfolds require coordination between the predicate \(p\) and the generator \(\textit{gen}\), they are supported by implementing classes with member functions that are registered with the framework.

For the \(\textit{unfold(to\_apas)}\) node in Fig. 3.1, the C++ code for the experiment algorithm would be:

class sim_depos { ... };
class waveforms { ... };

class to_apas {
  using next_type = sim_depos::const_iterator;
  next_type advance(next_type) { ... }
  next_type end_;

public:
  explicit to_apas(sim_depos const& sds)  // Constructed with input data-product
    : end_{sds.end()}
  {}

  bool keep_going(next_type next) const { return next != end_; }

  std::pair<next_type, waveforms> make_waveforms(next_type next) const
  {
    // Create waveforms object 'ws' using 'next',
    // ... and then move into result
    return std::make_pair(advance(next), std::move(ws));
  }
};

The definition of advance(...) would advance the next iterator according to some desired chunk size, or it would return an end iterator when all elements of the "SimDepos" data product have been processed. The class is then registered with Phlex via:

PHLEX_REGISTER_ALGORITHMS(config)
{
  products("Waveforms") =
    unfold<to_apas>(
      "to_apas",                 // <= Node name for framework
      &to_apas::keep_going,      // <= Unfold predicate
      &to_apas::make_waveforms,  // <= Unfold generator
      "APA",                     // <= Data layer for output data products
      concurrency::unlimited     // <= Allowed concurrency
    )
    .family("SimDepos"_in("Spill"));
}

Note that the template argument in unfold<to_apas> is an indication that the framework will create an object of type to_apas each time it receives a "SimDepos" data product. The framework ensures that all data products remain in memory for as long as they are required, and once they are no longer needed, they (as well as any unneeded to_apas objects) are evicted from memory as soon as possible [DUNE 142].