3.4. Framework Registration
Attention
The C++ interface below is illustrative and not intended to reflect the final registration interface.
To the extent possible, Phlex preserves data flow among data products and algorithms. This is reflected in the interface for registering algorithms. In some cases, access to a limited resource is required, and the algorithm signature will specify dependencies not only on the data of interest but also on the shared resource.
Consider the following C++ classes and function:
class hits { ... };
class tracks { ... };
tracks make_tracks(hits const& hs) { ... }
where the implementations of hits, tracks, and make_tracks are unspecified.
Suppose a physicist would like to use the function make_tracks to transform “good_hits” to “good_tracks” for each spill with unlimited concurrency.
This can be achieved with the following C++ registration stanza:
PHLEX_REGISTER_ALGORITHMS(m)      // <== Registration opener (w/o configuration object)
{
  products("good_tracks") =       // 1. Specification of output data product from make_tracks
    transform(                    // 2. Higher-order function
      "track_maker",              // 3. Name assigned to HOF
      make_tracks,                // 4. Algorithm/HOF operation
      concurrency::unlimited      // 5. Allowed CPU concurrency
    )
    .sequence(
      "good_hits"                 // 6. Specification of input data product to make_tracks
      _in("spill")                // 7. Data category to search for input data products
    );
}
The registration stanza is included in a C++ file that is compiled into a module, a compiled library that is dynamically loadable by Phlex.
A Python algorithm can be registered with its own companion C++ module or through the Python import helpers that make use of a pre-built, configurable Phlex module. For the sake of consistency and ease of understanding, the helpers have the same naming and follow the same conventions as the C++ registration.
The stanza is introduced by an opener (e.g. PHLEX_REGISTER_ALGORITHMS()) followed by a registration block, a block of code between two curly braces that contains one or more registration statements.
A registration statement is a programming statement that closely follows the equation described in Section 3.5 and is used to register an algorithm with the framework.
Specifically, in the registration stanza above, we have the following:
products(...)
This is the equivalent of the output sequence \(\isequence{b}{\text{output}}\), which is formed from specification(s) of the data product(s) created by the algorithm [DUNE 156].
transform(...)
Fully specifying the mathematical expression \(\text{HOF}(f_1,\ f_2,\ \dots)\) requires several items:
The HOF to be used,
The name to assign to the configured HOF,
The algorithm/HOF operator(s) to be used (i.e. \(f_1,\ f_2,\ \dots\)), and
The maximum number of CPU threads the framework can use when invoking the algorithm [DUNE 152].
sequence(...)
The specification of the input sequence \(\isequence{a}{\text{input}}\) requires:
The specification(s) of data products that serve as input sequence elements [DUNE 65].
The data category where the input data products are found.
The set of information required by the framework for registering an algorithm largely depends on the HOF being used (see Section 3.5 for the specific interface). However, in general, the registration code will specify which data products are required/produced by the algorithm [DUNE 111] and the hardware resources required by the algorithm [DUNE 9]. Note that the input and output data-product specifications are matched with the corresponding types of the registered algorithm’s function signature. In other words:
"good_hits" specifies a data product whose C++ type is that of the first (and, in this case, only) input parameter to make_tracks (i.e. hits).
"good_tracks" specifies a data product whose C++ type is the tracks return type of make_tracks.
When executed, the above code creates a configured higher-order function, which serves as a node in the function-centric data-flow graph.
The registration block may contain any code supported by C++. The block, however, must contain a registration statement to execute an algorithm.
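The matching between data-product specifications and the algorithm's function signature, described above, can be illustrated with standard C++ type traits. The following is only a sketch of one way such types could be deduced; the signature trait is a hypothetical helper introduced here for illustration and is not part of the Phlex interface, and the empty hits and tracks definitions are placeholders for the unspecified classes.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Placeholders for the document's unspecified classes and function.
struct hits {};
struct tracks {};
tracks make_tracks(hits const&) { return {}; }

// Hypothetical trait (not Phlex API): decomposes a free-function type
// into its return type and its (decayed) parameter types.
template <typename F>
struct signature;

template <typename R, typename... Args>
struct signature<R(Args...)> {
  using output_type = R;

  // Parameter type I with references and cv-qualifiers removed,
  // e.g. "hits const&" becomes "hits".
  template <std::size_t I>
  using input_type = std::remove_cv_t<
    std::remove_reference_t<std::tuple_element_t<I, std::tuple<Args...>>>>;

  static constexpr std::size_t arity = sizeof...(Args);
};

using make_tracks_signature = signature<decltype(make_tracks)>;

// "good_hits" would have to name a product of type hits ...
static_assert(std::is_same_v<make_tracks_signature::input_type<0>, hits>);
// ... and "good_tracks" a product of type tracks.
static_assert(std::is_same_v<make_tracks_signature::output_type, tracks>);
static_assert(make_tracks_signature::arity == 1);
```

Because the checks are static_assert declarations, a mismatch between the assumed product types and the function signature would be reported at compile time in this sketch.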
Important
A module must contain only one registration stanza. Note that multiple registration statements may be made in each stanza.
3.4.1. Accessing Configuration Information
Instead of hard-coding all pieces of registration information, it is desirable to specify a subset of such information through a program’s run-time configuration.
To do this, an additional argument (e.g. config) is passed to the registration opener:
PHLEX_REGISTER_ALGORITHMS(m, config)
{
  auto selected_data_scope = config.get<std::string>("data_scope");
  products("good_tracks") =
    transform("track_maker", make_tracks, concurrency::unlimited)
    .sequence("good_hits"_in(selected_data_scope));
}
Note
As discussed in Section 3.10.2, the registration code will have access only to the configuration relevant to the algorithm being registered, and to certain framework-level configuration such as debug level, verbosity, or parallelization options.
Except for the specification of make_tracks as the algorithm to be invoked, and transform as the HOF, all other pieces of information may be provided through the configuration.
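To make the typed config.get<T>(key) access pattern concrete, the sketch below uses a hypothetical mock_config class backed by a std::map. It is an assumption made purely for illustration and does not describe how the Phlex configuration object is implemented; only the get<T>("data_scope") call shape is taken from the stanza above, and the "spill" and 42 values are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for the configuration object; NOT the Phlex type.
class mock_config {
public:
  using value = std::variant<std::string, std::size_t>;

  explicit mock_config(std::map<std::string, value> values)
    : values_(std::move(values))
  {
  }

  // Returns the value stored under key. Throws std::out_of_range if the key
  // is absent and std::bad_variant_access if the stored type is not T.
  template <typename T>
  T get(std::string const& key) const
  {
    auto const it = values_.find(key);
    if (it == values_.end()) {
      throw std::out_of_range{"missing configuration key: " + key};
    }
    return std::get<T>(it->second);
  }

private:
  std::map<std::string, value> values_;
};

// Builds an example configuration with arbitrary illustrative values.
mock_config example_config()
{
  std::map<std::string, mock_config::value> values;
  values.emplace("data_scope", std::string{"spill"});
  values.emplace("track_seed", std::size_t{42});
  return mock_config(std::move(values));
}

// Mirrors the first statement of the registration block above.
std::string selected_data_scope(mock_config const& config)
{
  return config.get<std::string>("data_scope");
}
```

With such an object, the data category passed to the input specification becomes a run-time value rather than a string fixed at compile time.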
3.4.2. Framework Dependence in Registration Code
Usually, classes like hits and tracks and algorithms like make_tracks are framework-independent (see Section 1.4).
There may be scenarios, however, where dependence on framework interface is required, especially if framework-specific metadata types are used by the algorithm.
In such cases, it is strongly encouraged to keep framework dependence within the module itself and, more specifically, within the registration stanza.
This can often be achieved by registering closure objects that are generated by lambda expressions.
For example, suppose a physicist would like to create an algorithm make_tracks_debug that reports a spill number when making tracks.
By specifying a lambda expression that takes a phlex::handle<hits> object, the data product can be passed to the make_tracks_debug function, along with the spill number obtained from the metadata accessible through the handle:
tracks make_tracks_debug(hits const& hs, std::size_t spill_number) { ... }
PHLEX_REGISTER_ALGORITHMS(m)
{
  products("good_tracks") =
    transform(
      "track_maker",
      [](phlex::handle<hits> hs) {
        return make_tracks_debug(*hs, hs.id()->number());
      },
      concurrency::unlimited
    )
    .sequence("good_hits"_in("spill"));
}
The lambda expression does depend on framework interface; the make_tracks_debug function, however, retains its framework independence.
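The separation between the framework-dependent lambda and the framework-independent function can be exercised without the framework. In the sketch below, mock::handle and mock::level_id are hypothetical stand-ins that only mimic the operator* and id()->number() calls used in the stanza above; they are assumptions and not the real phlex::handle. The data members of hits and tracks are likewise placeholders.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder data types; the real classes are unspecified in the text.
struct hits {
  std::size_t count;
};
struct tracks {
  std::size_t count;
  std::size_t spill;
};

// Framework-independent algorithm: it knows nothing about handles.
tracks make_tracks_debug(hits const& hs, std::size_t spill_number)
{
  return tracks{hs.count, spill_number};
}

namespace mock {
  // Hypothetical metadata type exposing a number() accessor.
  struct level_id {
    std::size_t n;
    std::size_t number() const { return n; }
  };

  // Hypothetical handle: a non-owning view of a product plus its metadata.
  template <typename T>
  class handle {
  public:
    handle(T const& product, level_id const& id) : product_{&product}, id_{&id} {}
    T const& operator*() const { return *product_; }
    level_id const* id() const { return id_; }

  private:
    T const* product_;
    level_id const* id_;
  };
}

// The only code that depends on the (mock) framework interface.
auto const track_maker_adapter = [](mock::handle<hits> hs) {
  return make_tracks_debug(*hs, hs.id()->number());
};

// Drives the adapter with a product of 3 hits belonging to spill 7.
tracks run_adapter_example()
{
  hits const hs{3};
  mock::level_id const spill{7};
  return track_maker_adapter(mock::handle<hits>{hs, spill});
}
```

Since make_tracks_debug takes only plain C++ types, it can be unit-tested directly, while the adapter remains a thin layer that would live in the registration stanza.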
3.4.3. Member Functions of Classes
class track_maker {
public:
  track_maker(std::size_t track_seed);
  tracks make(hits const& hs) const;
  ...
};
PHLEX_REGISTER_ALGORITHMS(m, config)
{
  auto track_seed = config.get<std::size_t>("track_seed");
  auto selected_data_scope = config.get<std::string>("data_scope");
  products("good_tracks") =
    m.make<track_maker>(track_seed)
    .transform("track_maker", &track_maker::make, concurrency::unlimited)
    .sequence("good_hits"_in(selected_data_scope));
}
3.4.4. Overloaded Functions
Phlex performs a substantial amount of type deduction through the transform(...) clause.
This works well except in cases where the registered algorithms are overloaded functions.
For example, suppose one wants to register C++’s overloaded std::sqrt(...) function with the framework.
Simply specifying transform(..., std::sqrt) will fail at compile time, as the compiler will not be able to determine which overload is desired.
Instead, the code author can use the following [1]:
transform(..., [](double x){ return std::sqrt(x); }, ...);
where the desired overload is selected based on the double argument to the lambda expression.
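The deduction problem can be reproduced in plain C++, independent of Phlex. In the sketch below, apply_once is a hypothetical function template standing in for any interface that deduces the type of a callable, and halve is a user-defined overload set introduced only for illustration. The cast-based alternative is shown on halve rather than std::sqrt because C++20 does not guarantee that the address of most standard library functions can be taken.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for an interface that deduces the callable's type.
template <typename F>
double apply_once(F f, double x)
{
  return f(x);
}

// apply_once(std::sqrt, 9.0);  // Does not compile: F cannot be deduced
//                              // from an overload set.

// Option 1: a lambda whose parameter type selects the double overload.
double sqrt_via_lambda(double x)
{
  return apply_once([](double v) { return std::sqrt(v); }, x);
}

// A user-defined overload set (illustrative only).
int halve(int x) { return x / 2; }
double halve(double x) { return x / 2.0; }

// Option 2: an explicit function-pointer type selects the overload.
double halve_via_cast(double x)
{
  return apply_once(static_cast<double (*)(double)>(halve), x);
}
```

The lambda form tends to be easier to read, and it also works for overload sets whose addresses should not be taken.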
Footnotes