3.4. Framework Registration

Attention

The C++ interface below is illustrative and not intended to reflect the final registration interface.

To the extent possible, Phlex preserves data flow among data products and algorithms. This is indicated in the interface for registering algorithms. In some cases, access to a limited resource is required and the algorithm signature will specify dependencies on not only the data of interest but also the shared resource.

Consider the following C++ classes and function:

class hits { ... };
class tracks { ... };

tracks make_tracks(hits const& hs) { ... }

where the implementations of hits, tracks, and make_tracks are unspecified. Suppose a physicist would like to use the function make_tracks to transform “good_hits” to “good_tracks” for each spill with unlimited concurrency. This can be achieved by in terms of the C++ registration stanza:

PHLEX_REGISTER_ALGORITHMS(m)  // <== Registration opener (w/o configuration object)
{
  products("good_tracks") =   // 1. Specification of output data product from make_tracks
    transform(                // 2. Higher-order function
      "track_maker",          // 3. Name assigned to HOF
      make_tracks,            // 4. Algorithm/HOF operation
      concurrency::unlimited  // 5. Allowed CPU concurrency
    )
    .sequence(
      "good_hits"             // 6. Specification of input data product to make_tracks
      _in("spill")            // 7. Data category to search for input data products
    );
}

The registration stanza is included in a C++ file that is compiled into a module, a compiled library that is dynamically loadable by Phlex.

A Python algorithm can be registered with its own companion C++ module or through the Python import helpers that make use of a pre-built, configurable, Phlex module. For the sake of consistency and ease of understaning, the helpers have the same naming and follow the same conventions as the C++ registration.

The stanza is introduced by an opener—e.g. PHLEX_REGISTER_ALGORITHMS()—followed by a registration block, a block of code between two curly braces that contains one or more registration statements. A registration statement is a programming statement that closely follows the equation described in Section 3.5 and is used to register an algorithm with the framework.

\[\isequence{b}{\text{output}} = \text{HOF}(f_1,\ f_2,\ \dots)\ \isequence{a}{\text{input}}\]

Specifically, in the registration stanza above, we have the following:

products(...)
  1. This is the equivalent of the output sequence \(\isequence{b}{\text{output}}\), which is formed from specification(s) of the data product(s) created by the algorithm [DUNE 156].

transform(...)

Fully specifying the mathematical expression \(\text{HOF}(f_1,\ f_2,\ \dots)\) requires several items:

  1. The HOF to be used,

  2. The name to assign to the configured HOF,

  3. The algorithm/HOF operator(s) to be used (i.e. \(f_1,\ f_2,\ \dots\)), and

  4. The maximum number of CPU threads the framework can use when invoking the algorithm [DUNE 152].

sequence(...)

The specification of the input sequence \(\isequence{a}{\text{input}}\) requires:

  1. The specification(s) of data products that serve as input sequence elements [DUNE 65].

  2. The data category where the input data products are found.

The set of information required by the framework for registering an algorithm largely depends on the HOF being used (see the Section 3.5 for specific interface). However, in general, the registration code will specify which data products are required/produced by the algorithm [DUNE 111] and the hardware resources required by the algorithm [DUNE 9]. Note that the input and output data-product specifications are matched with the corresponding types of the registered algorithm’s function signature. In other words:

  • "good_hits" specifies a data product whose C++ type is that of the first (and, in this case, only) input parameter to make_tracks (i.e. hits).

  • "good_tracks" specifies a data product whose C++ type is the tracks return type of make_tracks.

When executed, the above code creates a configured higher-order function, which serves as a node in the function-centric data-flow graph.

The registration block may contain any code supported by C++. The block, however, must contain a registration statement to execute an algorithm.

Important

A module must contain only one registration stanza. Note that multiple registration statements may be made in each stanza.

3.4.1. Accessing Configuration Information

Instead of hard-coding all pieces of registration information, it is desirable to specify a subset of such information through a program’s run-time configuration. To do this, an additional argument (e.g. config) is passed to the registration opener:

PHLEX_REGISTER_ALGORITHMS(m, config)
{
  auto selected_data_scope = config.get<std::string>("data_scope");

  products("good_tracks") =
    transform("track_maker", make_tracks, concurrency::unlimited)
    .sequence("good_hits"_in(selected_data_scope));
}

Note

As discussed in Section 3.10.2, the registration code will have access only to the configuration relevant to the algorithm being registered, and to certain framework-level configuration such as debug level, verbosity, or parallelization options.

Except for the specification of make_tracks as the algorithm to be invoked, and transform as the HOF, all other pieces of information may be provided through the configuration.

3.4.2. Framework Dependence in Registration Code

Usually, classes like hits and tracks and algorithms like make_tracks are framework-independent (see Section 1.4). There may be scenarios, however, where dependence on framework interface is required, especially if framework-specific metadata types are used by the algorithm. In such cases, it is strongly encouraged to keep framework dependence within the module itself and, more specifically, within the registration stanza. This can be often achieved by registering closure objects that are generated by lambda expressions.

For example, suppose a physicist would like to create an algorithm make_tracks_debug that reports a spill number when making tracks. By specifying a lambda expression that takes a phlex::handle<hits> object, the data product can be passed to the make_tracks_debug function, along with the spill number from the metadata accessed from the handle:

tracks make_tracks_debug(hits const& hs, std::size_t spill_number) { ... }

PHLEX_REGISTER_ALGORITHMS(m)
{
  products("good_tracks") =
    transform(
      "track_maker",
      [](phlex::handle<hits> hs) {
        return make_tracks_debug(*hs, hs.id()->number());
      },
      concurrency::unlimited
    )
    .sequence("good_hits"_in("spill"));
}

The lambda expression does depend on framework interface; the make_tracks_debug function, however, retains its framework independence.

3.4.3. Member Functions of Classes

class track_maker {
public:
  track_maker(std::size_t track_seed);
  tracks make(hits const& hs) const;
  ...
};

PHLEX_REGISTER_ALGORITHMS(m, config)
{
  auto track_seed = config.get<std::size_t>("track_seed");
  auto selected_data_scope = config.get<std::string>("data_scope");

  products("good_tracks") =
    m.make<track_maker>(track_seed)
      .transform("track_maker", &track_maker::make, concurrency::unlimited)
      .sequence("good_hits"_in(selected_data_scope));
}

3.4.4. Overloaded Functions

Phlex performs a substantial amount of type deduction through the transform(...) clause. This works well except in cases where the registered algorithms are overloaded functions. For example, suppose one wants to register C++’s overloaded std::sqrt(...) function with the framework. Simply specifying transform(..., std::sqrt) will fail at compile time as the compiler will not be able to determine which overload is desired.

Instead, the code author can use the following [1]:

transform(..., [](double x){ return std::sqrt(x); }, ...);

where the desired overload is selected based on the double argument to the lambda expression.

Footnotes