3. Conceptual Design

Purpose

The conceptual design is not a reference manual; it is a high-level description of how the framework aims to satisfy the stakeholder requirements (see Appendix B). The audience for the conceptual design is the physicist, algorithm author, or framework program runner. More detailed design aspects in support of the conceptual model are given in the technical design (under preparation).

Phlex adopts the data-flow approach discussed in Section 2.6.1. Instead of expressing scientific workflows as monolithic functions to be executed, workflows are factorized into composable algorithms that operate on data products passed among them [DUNE 1], [DUNE 111], [DUNE 20]. These algorithms then serve as operators to higher-order functions (HOFs) that operate on data-product families.
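As a minimal illustration of this factorization (a sketch, not Phlex's actual API), the Python snippet below models a data-product family as a dictionary keyed by a data-cell index, and a hypothetical `transform` HOF as a function that lifts a user algorithm so that it operates on a whole family. The names `transform` and `find_hits`, the index layout, and the threshold value are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
Index = Tuple[int, ...]  # hypothetical data-cell index, e.g. (run, spill, apa)


def transform(algorithm: Callable[[A], B]) -> Callable[[Dict[Index, A]], Dict[Index, B]]:
    """Lift a user algorithm into an operation on a whole family (illustrative only)."""
    def apply(family: Dict[Index, A]) -> Dict[Index, B]:
        # The algorithm sees one data product at a time; the HOF handles the family.
        return {index: algorithm(product) for index, product in family.items()}
    return apply


def find_hits(waveform: list) -> list:
    """Hypothetical user algorithm: keep samples above an arbitrary threshold."""
    return [sample for sample in waveform if sample > 1.0]


waveforms = {(3, 5, 8): [0.2, 1.5, 2.0], (3, 5, 9): [0.1, 0.3, 4.2]}
good_hits = transform(find_hits)(waveforms)
```

The user algorithm knows nothing about families or indices; only the HOF does, which is what allows algorithms to be composed freely.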

To guide the discussion of Phlex’s conceptual model, we refer to Fig. 3.1, which shows a small fictitious workflow that creates vertices from simulated energy deposits. Various framework aspects are demonstrated by that figure:

data-flow graph

The data-flow graph is formed by ingesting the configuration file and recording the data-product dependencies required of each algorithm (see Section 3.1).
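One way to picture this step is the sketch below, which derives graph edges from a configuration in which each algorithm declares the data products it consumes and produces. The configuration layout and the `build_graph` helper are hypothetical and do not reflect Phlex's configuration format; products with no producing algorithm (here `SimDepos` and `Geometry`) are assumed to come from providers.

```python
from graphlib import TopologicalSorter

# Hypothetical configuration: each algorithm lists what it consumes and produces.
config = {
    "into_apas": {"consumes": ["SimDepos", "Geometry"], "produces": ["Waveforms"]},
    "find_hits": {"consumes": ["Waveforms"], "produces": ["GoodHits"]},
    "make_tracks": {"consumes": ["GoodHits"], "produces": ["GoodTracks"]},
    "make_vertices": {"consumes": ["GoodTracks", "Geometry"], "produces": ["Vertices"]},
}


def build_graph(config):
    """Map each algorithm to the set of upstream algorithms that produce its inputs."""
    producer_of = {
        product: name for name, spec in config.items() for product in spec["produces"]
    }
    return {
        name: {producer_of[p] for p in spec["consumes"] if p in producer_of}
        for name, spec in config.items()
    }


graph = build_graph(config)
# A valid execution order follows directly from the recorded dependencies.
order = list(TopologicalSorter(graph).static_order())
```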

data-product flow

Data products (see Section 3.2) are passed along graph edges. As mentioned in Section 2.6.1, the data passed between HOFs are expressed as families. Fig. 3.1 thus formally passes families (e.g. \([\textit{GoodHits}_{ijk}]\)) between nodes [1].

framework driver

The driver instructs the framework what to process (see Section 3.6).

The driver in Fig. 3.1 is configured so that all Spills in the specified HDF5 input files are processed.

data-product providers

Data-product providers are framework components that provide data products from external entities to downstream user algorithms (see Section 3.7). From a functional programming perspective, they are transforms that map a data cell to one of the data products within that data cell.

In the workflow, one provider reads a \(\textit{SimDepos}\) data product from each Spill in the HDF5 input files, and the other reads a single \(\textit{Geometry}\) corresponding to the Job from a GDML file.
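The view of a provider as a transform from a data cell to one of its data products can be sketched as follows. The `DataCell` class and `provide` function are hypothetical, and the in-memory dictionaries merely stand in for the HDF5 and GDML files that real providers would read.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class DataCell:
    """Hypothetical stand-in for a data cell: an index plus the products stored for it."""
    index: Tuple[int, ...]
    products: Dict[str, Any] = field(default_factory=dict)


def provide(product_name: str) -> Callable[[DataCell], Any]:
    """Return a provider that maps a data cell to one named product within it."""
    def provider(cell: DataCell) -> Any:
        return cell.products[product_name]
    return provider


# One Spill-level cell and one Job-level cell (the Job has an empty index here).
spill = DataCell(index=(3, 5), products={"SimDepos": [0.4, 1.1, 2.7]})
job = DataCell(index=(), products={"Geometry": {"n_apas": 2}})

sim_depos = provide("SimDepos")(spill)
geometry = provide("Geometry")(job)
```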

HOFs and user-provided algorithms

Arguably the most important aspect of the framework is how user-provided algorithms are bound to HOFs and registered with the framework (see Section 3.3, Section 3.4 and Section 3.5).

Six HOFs supported by Phlex (see Table 2.2) are used in Fig. 3.1: unfold, transform, window, fold, filter, and observe. For the main processing chain of creating vertices:

  • An unfold HOF is configured to create a family of \(\textit{Waveforms}\) objects (one \(\textit{Waveforms}\) object per APA) from the single \(\textit{SimDepos}\) data product in each Spill.

  • A configured transform HOF is run on the family of \(\textit{Waveforms}\) objects to create a family of \(\textit{GoodHits}\) objects.

  • To make a \(\textit{GoodTracks}\) data product, a window algorithm is applied to pairs of \(\textit{GoodHits}\) objects that come from adjacent APAs.

  • Lastly, another transform algorithm operates on the \(\textit{GoodTracks}\) data products to produce vertices.
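The main processing chain above can be sketched for a single Spill as shown below. All function bodies, the toy data, and the choice to key window outputs by the pair of adjacent APAs are assumptions made for illustration; how Phlex actually indexes the outputs of a window is not modeled here.

```python
def unfold_into_apas(sim_depos, n_apas):
    """Hypothetical unfold: split one Spill-level product into one product per APA."""
    return {apa: [d for d in sim_depos if d["apa"] == apa] for apa in range(n_apas)}


def find_hits(waveforms):
    """Hypothetical transform algorithm: keep energies above an arbitrary threshold."""
    return [d["energy"] for d in waveforms if d["energy"] > 1.0]


def window(algorithm, family):
    """Apply the algorithm to each pair of products that come from adjacent keys."""
    keys = sorted(family)
    return {(a, b): algorithm(family[a], family[b]) for a, b in zip(keys, keys[1:])}


def make_tracks(hits_a, hits_b):
    """Hypothetical window algorithm: combine the hits of two adjacent APAs."""
    return hits_a + hits_b


def make_vertices(tracks):
    """Hypothetical transform algorithm: here simply a count."""
    return len(tracks)


sim_depos = [
    {"apa": 0, "energy": 2.0},
    {"apa": 1, "energy": 0.5},
    {"apa": 1, "energy": 3.0},
    {"apa": 2, "energy": 1.5},
]
waveforms = unfold_into_apas(sim_depos, n_apas=3)               # unfold
good_hits = {apa: find_hits(w) for apa, w in waveforms.items()}  # transform
good_tracks = window(make_tracks, good_hits)                     # window
vertices = {k: make_vertices(t) for k, t in good_tracks.items()} # transform
```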

There are additional parts of the graph that are not directly related to creating vertices:

  • A fold algorithm is executed over the \(\textit{GoodHits}\) data products to sum the hit energy (i.e. \(\textit{TotalHitEnergy}\)) across all APAs for a given Spill.

  • After a filter has been applied with the predicate \(\textit{high\_energy}\), an observe algorithm is used to fill a histogram with hit-related information from the \(\textit{GoodHits}\) data products.
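A rough sketch of these two side branches follows. The helper names, the energy threshold, and the plain list used in place of the histogramming resource are hypothetical; the point is only that a fold reduces a family to one result per Spill, a filter selects family members by a predicate, and an observer consumes data products without producing new ones.

```python
# Toy GoodHits family keyed by (run, spill, apa).
good_hits = {(3, 5, 8): [2.0, 3.0], (3, 5, 9): [0.5], (3, 6, 0): [4.0]}


def fold_sum_energy(family):
    """Fold: accumulate hit energy across all APAs of each (run, spill)."""
    totals = {}
    for (run, spill, _apa), hits in family.items():
        totals[(run, spill)] = totals.get((run, spill), 0.0) + sum(hits)
    return totals


def high_energy(hits):
    """Hypothetical predicate used by the filter."""
    return sum(hits) > 1.0


def filter_family(predicate, family):
    """Filter: keep only the family members that satisfy the predicate."""
    return {index: product for index, product in family.items() if predicate(product)}


histogram = []  # stands in for the histogramming resource


def histogram_hits(hits):
    """Observer: records information and returns nothing."""
    histogram.append(len(hits))


total_hit_energy = fold_sum_energy(good_hits)
for hits in filter_family(high_energy, good_hits).values():
    histogram_hits(hits)
```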

data-product writers

Data-product writers are plugins that write data products to framework outputs (see Section 3.8).

Each of the five writers in Fig. 3.1 is responsible for writing to one or more output files.

resources

Most workflows require access to some external resource (see Section 3.9).

The histogramming resource in Fig. 3.1 enables the observe algorithm to fill and write histograms to a ROOT analysis file.

Note that in this workflow, the names Spill and APA are not special to the Phlex framework; they are names (hypothetically) chosen by the experiment. Each data product is also indexed, thus associating it with a particular data cell (e.g. \(\textit{GoodHits}_{3,5,9}\) denotes the \(\textit{GoodHits}\) data product belonging to APA \(9\) of Spill \(5\) of Run \(3\)).
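The indexing convention can be pictured with the short sketch below, in which a family is a mapping from an index to a data product. The `CellIndex` type and the field names `run`, `spill`, and `apa` are hypothetical; as noted above, such names would be chosen by the experiment rather than fixed by the framework.

```python
from typing import NamedTuple


class CellIndex(NamedTuple):
    """Hypothetical index identifying one data cell."""
    run: int
    spill: int
    apa: int


# A toy GoodHits family with two members in the same Spill.
good_hits = {
    CellIndex(3, 5, 9): ["hit-a", "hit-b"],
    CellIndex(3, 5, 10): ["hit-c"],
}

# GoodHits_{3,5,9}: the product belonging to APA 9 of Spill 5 of Run 3.
product = good_hits[CellIndex(run=3, spill=5, apa=9)]

# All family members belonging to Spill 5 of Run 3.
same_spill = [index for index in good_hits if (index.run, index.spill) == (3, 5)]
```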

digraph {
  node [shape="box", style="rounded"]
  edge [fontcolor="red"];

  start [shape="point", width=0.1];
  unfold_into_apas [label=<unfold(<font color="blue">into_apas</font>)>];
  transform_find_hits [label=<transform(<font color="blue">find_hits</font>)>];
  filter_high_energy [label=<filter(<font color="blue">high_energy</font>)>];
  window_make_tracks [label=<window(<font color="blue">make_tracks</font>)>];
  out [label="ROOT output file(s)", shape="cylinder", style="filled", fillcolor="lightgray"];

  observe_histogram_hits [label=<observe(<font color="blue">histogram_hits</font>)>];
  transform_make_vertices [label=<transform(<font color="blue">make_vertices</font>)>];
  fold_total_energy [label=<fold(<font color="blue">sum_energy</font>)>];

  // Histogram resource
  resource [label=<Histogram<br/> resource>,
            shape=hexagon,
            style=filled,
            fillcolor=thistle,
            margin=0];
  root [label=<ROOT<br/> analysis file>, style=filled, shape=cylinder];

  gdml [label="GDML file", shape="cylinder", style="filled", fillcolor="lightgray"]
  driver [label="driver(Spill)", style="rounded,filled",fillcolor="palegreen1"];
  input [label="HDF5 input file(s)", shape="cylinder", style="filled", fillcolor="lightgray"];

  // Providers
  geometry_provider [label="provide(Geometry)", style="filled,rounded", fillcolor="lightblue"];
  sim_depos_provider [label="provide(SimDepos)", style="filled,rounded", fillcolor="lightblue"];

  // Writers
  waveforms_writer [label="write(Waveforms)", style="filled,rounded", fillcolor="lightblue"];
  total_energy_writer [label="write(TotalHitEnergy)", style="filled,rounded", fillcolor="lightblue"];
  tracks_writer [label="write(GoodTracks)", style="filled,rounded", fillcolor="lightblue"];
  vertices_writer [label="write(Vertices)", style="filled,rounded", fillcolor="lightblue"];
  hits_writer [label="write(GoodHits)", style="filled,rounded", fillcolor="lightblue"];

  start -> driver [label=" Configuration", fontcolor="forestgreen"];
  driver -> input [style="dotted", arrowhead=none];
  driver -> geometry_provider [label=" [Job]", fontcolor="darkorange"];
  driver -> sim_depos_provider [label=< [Spill<sub><i>i j</i></sub>]>, fontcolor="darkorange"];

  gdml -> geometry_provider [arrowhead=none, style="dotted", color="black:invis:black"];
  resource -> root [arrowhead=none, style="dotted", color="black:invis:black"];

  sim_depos_provider -> input [style="dotted", arrowhead=none];
  sim_depos_provider -> unfold_into_apas [label=< [SimDepos<sub><i>i j</i></sub>]>];

  geometry_provider -> unfold_into_apas [label=<[Geometry]>];
  geometry_provider -> transform_make_vertices [taillabel=<[Geometry]>, labeldistance=4.7, labelangle=20];

  unfold_into_apas -> transform_find_hits [xlabel=<[Waveforms<sub><i>i j k</i></sub>] >];
  unfold_into_apas -> waveforms_writer [label=<[Waveforms<sub><i>i j k</i></sub>]>];

  transform_find_hits -> filter_high_energy [label=<[GoodHits<sub><i>i j k</i></sub>]>];
  transform_find_hits -> hits_writer;
  transform_find_hits -> window_make_tracks [label=<[GoodHits<sub><i>i j k</i></sub>]>];
  transform_find_hits -> fold_total_energy;

  window_make_tracks -> transform_make_vertices [label=< [GoodTracks<sub><i>i j k</i></sub>]>];
  window_make_tracks -> tracks_writer;

  transform_make_vertices -> vertices_writer [label=< [Vertices<sub><i>i j k</i></sub>]>];

  fold_total_energy -> total_energy_writer [label=< [TotalHitEnergy<sub><i>i j</i></sub>]>];

  filter_high_energy -> observe_histogram_hits [label=<[GoodHits<sub><i>i j k</i> '</sub>]>];

  resource -> observe_histogram_hits [style="dashed"];

  { total_energy_writer, waveforms_writer, hits_writer, tracks_writer, vertices_writer } -> out [style="dotted", arrowhead=none]

  // Making the graph layout better
  { rank=same; driver; input; }
  { rank=same; resource; root; }
  { rank=same; gdml; geometry_provider; sim_depos_provider; }
  { rank=same; window_make_tracks; hits_writer; fold_total_energy; filter_high_energy; }
  { rank=same; transform_make_vertices; observe_histogram_hits; tracks_writer; total_energy_writer; }

  // The following edges do not denote any formal relationships; they are intended for influencing the layout.
  edge [style="invis"]
  input -> waveforms_writer;
  filter_high_energy -> resource;
  transform_find_hits -> waveforms_writer [constraint="false"];
}

Fig. 3.1 A fictitious workflow showing how HOFs are used in a Phlex program. Each unshaded node represents a HOF bound to a user-defined algorithm, whose name is shown in blue. Each user-defined algorithm operates on arguments received from the incoming arrows to the node: data products are passed along solid arrows; objects that provide access to resources are passed along dashed arrows. Whereas single-dotted lines indicate communication of data through the framework’s IO system, double-dotted lines denote communication of data with entities not directly related to the framework. See text for workflow details.

Footnotes