B. Framework Requirements
Requirements norm: Baseline 1 (created March 03, 2025)
Important
All stakeholder requirements approved by DUNE are listed here for convenience. However, the requirements recorded in the DUNE framework Jama Connect project are authoritative and take precedence over any unintentional variations below.
B.1. Conceptual Requirements
The framework shall allow the execution of multiple algorithms. |
See Chapter 3
The framework shall separate the persistent representation of data products from their in-memory representations as seen by algorithms. |
See Section 3.2.1.1
The framework shall run on widely-used scientific computing systems in order to fully utilize DUNE computing resources. |
See Section 1.2.2
The framework shall provide an API that allows users to express hardware requirements of the algorithms. |
See Section 3.4
The framework shall support running algorithms that require a GPU. |
See Section 1.2.2
The framework shall support the invocation of algorithms written in multiple programming languages. |
See Section 1.3
The framework shall provide user-accessible persistence of user-defined metadata. |
See Section 3.8
The framework shall provide the ability to read a framework-produced file as input to a subsequent framework job so that the physics data are equivalent to the physics data obtained from a single execution of the combined job. |
See Section 3.8
The framework shall present data produced by an already executed algorithm to each subsequent, requesting algorithm. |
See Section 3.1, Section 3.2.1
The framework shall support the creation of data sets composed of data products derived from data originating from disparate input sources. |
See Section 3.6.
The framework shall support flexibly defined, context-aware processing units to address the varying granularity necessary for processing different kinds of data. |
See Section 1.1, Section 1.2.1
The framework shall support processing of collections that are too large to fit into memory at one time. |
See Section 3.2.2
The framework shall record metadata to output enabling the reproduction of the processing steps used to produce the data recorded in that output. |
See Section 3.8
The framework shall allow the unfolding of data products into a sequence of finer-grained data products. |
See Section 3.5.6
The framework shall support access to external data sources. |
The framework shall provide a facility to produce random numbers enabling algorithms to create reproducible data in concurrent contexts. |
The framework shall support algorithms that provide data from calibration databases. |
See Section 3.2, Section 3.6, Section 3.9.
The framework shall support the registration of algorithms that are independent of the framework interface. |
The framework shall safely execute user algorithms declared to be non-thread-safe along with those declared to be thread-safe. |
See Section 3.9.1
The framework shall enable the specification of resources required by the program. |
See Section 3.9
The framework shall dynamically schedule algorithms to execute efficiently according to the availability of each algorithm’s required resources. |
See Section 3.9
The framework shall enable the specification of resources required by each algorithm. |
See Section 3.9
The framework shall support composable workflows that use GPU algorithms along with CPU algorithms. |
See Section 3.9.2
The framework shall support the specification of data products required as input by an algorithm. |
See Section 3.1, Section 3.4
The framework shall accept exactly one configuration per program execution. |
See Section 3.10
The framework shall provide the ability to configure the execution of a framework program at runtime using a human-readable language. |
See Section 3.10
The framework shall provide a public API that enables the implementation of a concrete IO backend for a specific persistent storage format. |
See Section 3.7, Section 3.8
The framework IO subsystem shall support backward compatibility across versions, subject to policy decisions on deprecation provided by DUNE. |
See Section 3.7
The framework shall support the invocation of algorithms written in C++. |
See Section 1.3
The framework shall support the invocation of algorithms written in Python. |
See Section 1.3
The framework shall provide the ability for user-level code to define data products. |
The framework shall provide the ability for user-level code to create new data sets. |
See Section 3.2, Section 3.2.2
The framework shall provide the ability for user-level code to define data families. |
See Section 3.2, Section 3.2.2
The framework shall provide the ability for user-level code to define hierarchies of data families. |
See Section 3.2, Section 3.2.2
The framework shall allow a single invocation of an algorithm with data products from multiple data sets. |
See Section 3.4.1
The framework shall support user specification of the data family in which to place the data products created by an algorithm. |
See Section 3.4
The framework shall support the invocation of an algorithm with data products belonging to adjacent data sets. |
See Section 3.4.1.2
The framework shall support user code that defines adjacency of data sets within a data family. |
See Section 3.4.1.2
The data objects exchanged among algorithms shall be separable from those algorithms. |
See Section 3.9.3
The framework shall mediate communication between algorithms via data products. |
See Section 3.2.1
The framework shall allow a single invocation of an algorithm with data products from multiple data families. |
See Section 3.4.1.1
The framework shall enable users to discover the provenance of data products. |
The framework shall support the reproduction of data products from the provenance stored in the output. |
See Section 3.2.4.
The framework shall facilitate the development of thread-safe algorithms. |
See Section 2.4, Section 3.2.3
The framework shall support executing programs configured by composing configurations of separate components. |
See the technical design (under preparation)
The framework shall attempt a graceful shutdown by default. |
See Section 1.2.3
The framework shall optimize the memory management of data products. |
The framework shall permit algorithm authors to specify that the algorithm requires serial access to a thread-unsafe resource. |
See Section 3.9.1
The framework shall enable the specification of user-defined resources required by the program. |
See Section 3.9.4
The framework shall enable the specification of user-defined resources required by the algorithm. |
See Section 3.9.4
The framework shall support the specification of data products created as output by an algorithm. |
See Section 3.1, Section 3.4
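As an illustration of the requirement above that algorithms be able to create reproducible random data in concurrent contexts, one possible approach is to derive each random stream from stable identifiers (a base seed, the algorithm name, and the data set identifier) rather than from execution order. The following is a minimal sketch under that assumption. It is not the framework's API, and `seed_for`, `smear`, `run`, and the `spill-N` identifiers are hypothetical names chosen for the example.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor


def seed_for(base_seed: int, algorithm: str, data_set_id: str) -> int:
    """Derive a deterministic 64-bit seed from stable identifiers (hypothetical scheme)."""
    key = f"{base_seed}:{algorithm}:{data_set_id}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def smear(base_seed: int, data_set_id: str) -> list[float]:
    """Hypothetical algorithm whose draws depend only on the seed inputs, not on scheduling."""
    # Each invocation owns a private generator, so no state is shared across threads.
    rng = random.Random(seed_for(base_seed, "smear", data_set_id))
    return [rng.gauss(0.0, 1.0) for _ in range(3)]


def run(data_set_ids: list[str], workers: int) -> dict[str, list[float]]:
    """Process every data set with the given number of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda ds: smear(1234, ds), data_set_ids)
        return dict(zip(data_set_ids, results))


ids = [f"spill-{i}" for i in range(8)]
serial = run(ids, workers=1)
parallel = run(ids, workers=4)
```

Because the seed is a function of what is being processed and not of when or where it runs, the serial and parallel executions in this sketch produce the same values for every data set.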
B.2. Supporting Requirements
The framework shall shut down if the platform fails to meet any specified hardware requirement. |
The framework documentation shall provide instructions for writing framework-executable algorithms in supported languages. |
The framework shall support reading from disk only the data products required by a given algorithm. |
The framework shall provide an option to persist the configuration of each framework execution to the output of that execution. |
See the technical design (under preparation)
The framework shall record the job’s execution environment. |
The framework shall gracefully shut down if the program attempts to exceed a configured memory limit. |
The framework shall support algorithms that perform calculations using a local GPU. |
The framework shall support algorithms that perform calculations using a remote GPU. |
The framework shall allow algorithms to use the same parallelism mechanisms the framework uses to schedule the execution of algorithms. |
See the technical design (under preparation)
The framework shall support logging the usage of a specified resource for each algorithm using the resource. |
The framework shall have an option to emit a message stating the resources required by each algorithm of a configured program without executing the workflow. |
The framework shall be able to report the global memory use of the framework program at user-specified points in time. |
See the technical design (under preparation)
The framework shall have an option to provide elapsed time information for each algorithm executed in a framework program. |
See the technical design (under preparation)
The framework shall support a logging solution that is usable in an algorithm without that algorithm explicitly relying on the framework. |
See the technical design (under preparation)
The framework shall operate independently of unique characteristics of existing hardware. |
The framework shall validate an algorithm’s configuration against specifications provided at registration time. |
See the technical design (under preparation)
The framework shall have an option to emit an algorithm’s configuration schema in human-readable form. |
See the technical design (under preparation)
The framework shall have an option to emit a description of the data flow of a configured program without executing the workflow. |
The framework shall validate the configuration of each algorithm before that algorithm processes data. |
See the technical design (under preparation)
The framework ecosystem shall support a ROOT IO backend. |
See Section 3.2.1.
The framework IO subsystem shall allow user-configuration of compression settings for each concrete IO implementation. |
The framework shall support user-configurable rollover of output files. |
The framework shall provide the ability to compare two configurations. |
See the technical design (under preparation)
The framework shall provide the list of recordable components of the execution environment. |
The framework shall save each execution-environment description selected by the user from the framework-provided list. |
The framework’s IO subsystem shall support backward compatibility of data products. |
The framework’s IO subsystem shall support backward compatibility of metadata. |
The framework shall have an option to roll over output files according to a configurable limit on the number of data sets in a user-specified data family. |
The framework shall emit a diagnostic message for each hardware requirement the platform fails to meet. |
The framework ecosystem shall support processing ProtoDUNE single-phase raw data. |
The framework ecosystem shall support processing ProtoDUNE dual-phase raw data. |
The framework ecosystem shall support processing ProtoDUNE II horizontal-drift raw data. |
The framework ecosystem shall support processing ProtoDUNE II vertical-drift raw data. |
The framework shall support the writing of collections too large to hold in memory. |
The framework shall record user-selected items from the shell environment. |
The framework shall record labelled execution environment information provided by the user. |
The framework shall provide a command-line interface that allows the setting of configuration parameters. |
See the technical design (under preparation)
The framework shall support the use of local configuration changes with respect to a separate complete configuration to modify the execution of a program. |
See the technical design (under preparation)
The framework configuration system shall have an option to provide diagnostic information for an evaluated configuration, including origins of final parameter values. |
See the technical design (under preparation)
The language used for configuring a framework program shall include features for maintaining hierarchical configurations from a single point of maintenance. |
See the technical design (under preparation)
The framework shall record metadata identifying data sets where the framework took special measures to process data collections of unconstrained size. |
The framework build system shall support options that enable debugging executed code. |
The framework shall allow the per-execution setting of the floating-point environment to control the handling of IEEE floating-point exceptions. |
The framework shall by default attempt a graceful shutdown upon receiving an uncaught exception from user algorithms. |
The framework shall by default attempt a graceful shutdown when receiving a signal. |
The framework shall emit a diagnostic message if the program attempts to exceed the configured maximum memory. |
The framework shall have an option to roll over output files according to a configurable limit on output-file size. |
The framework shall have an option to roll over output files according to a configurable limit on the aggregated value of a user-derived quantity. |
The framework shall have an option to roll over output files according to a configurable limit on the time the file has been open. |
The framework ecosystem shall support an HDF5 IO backend. |
See Section 3.2.1.
The framework shall optimize the availability of external resources. |
The framework shall efficiently execute a graph of algorithms where at least one algorithm requires access to a network resource. |
The framework shall enable the specification of the maximum number of CPU threads permitted by the program. |
The framework shall enable the specification of the maximum CPU memory allowed by the program. |
The framework shall enable the specification of GPU resources required by the program. |
The framework shall efficiently execute a graph of algorithms where at least one algorithm specifies a required amount of CPU memory. |
The framework shall efficiently execute a graph of algorithms where at least one algorithm specifies a required amount of GPU memory. |
The framework shall enable the specification of the maximum number of CPU threads permitted by the algorithm. |
See Section 3.4
The framework shall enable the specification of GPU resources required by the algorithm. |
The framework shall enable the specification of an algorithm’s expected CPU memory usage. |
See Section 3.2.1.1
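Two of the supporting requirements above concern applying local configuration changes on top of a separate complete configuration, and reporting the origins of final parameter values. The sketch below shows one way these could fit together, assuming configurations are represented as nested dictionaries. The `merge` function, the `"base"`/`"override"` labels, and all parameter names are hypothetical and do not reflect the framework's configuration language, which is still under preparation.

```python
_MISSING = object()  # sentinel distinguishing "absent" from a value of None


def merge(base, overrides, labels=("base", "override"), prefix=""):
    """Return (merged, origins) for two nested dict configurations.

    `origins` maps each dotted parameter path to the label of the
    configuration that supplied its final value. A sub-table present in
    only one input is recorded once, at the path of the sub-table itself.
    """
    merged, origins = {}, {}
    # Keep base ordering first, then any keys that only the overrides define.
    for key in list(base) + [k for k in overrides if k not in base]:
        path = prefix + key
        b = base.get(key, _MISSING)
        o = overrides.get(key, _MISSING)
        if isinstance(b, dict) and isinstance(o, dict):
            # Both sides define a sub-table: merge parameter by parameter.
            merged[key], sub_origins = merge(b, o, labels, path + ".")
            origins.update(sub_origins)
        elif o is not _MISSING:
            merged[key] = o
            origins[path] = labels[1]
        else:
            merged[key] = b
            origins[path] = labels[0]
    return merged, origins


# Hypothetical complete configuration and a local change set.
base = {"threads": 4, "output": {"file": "out.root", "compression": 1}}
local = {"threads": 8, "output": {"compression": 5}}

config, origins = merge(base, local)
for path in sorted(origins):
    print(f"{path}: from {origins[path]}")
```

Keeping the origin map separate from the merged values means the evaluated configuration stays free of bookkeeping, while the diagnostic output can still be produced on request.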