Exascale Challenge Problem definition
This page outlines the approach to define exascale Challenge Problems for ExCALIBUR.
This approach is based heavily on that taken by the US Exascale Computing Project (ECP).
The general process is:
- Identify exascale use cases that will become Challenge Problems
- Define a Category for the Challenge Problem based on the maturity of the software for exascale application
- Complete Challenge Problem template (template is below along with examples)
- For Category 1 Challenge Problems, define a Figure of Merit (FOM) that will be used to measure the performance improvement from the baseline on current systems (examples of FOM definitions are given below).
- Category 1 problems: Codebase reasonably well-developed for exascale use cases: For these challenge problems, a key concept is the performance baseline, which is a quantitative measure of an application Figure of Merit (FOM) using the fastest computers available at the inception of ExCALIBUR, against which the final FOM improvement is measured. This includes systems such as ARCHER2 (and possibly DiRAC Tursa for GPU-enabled codes), i.e. systems in the 10–20 PFLOP/s range.
- Category 2 problems: Codebase needs substantial work for exascale use cases: Without a well-defined starting point at the 10–20 PFLOP/s scale, it is unclear what FOM improvement would correspond to a successful outcome. A more appropriate measure of success for these applications is whether the necessary capability to execute their exascale challenge problems is in place at the end of the project, not the relative performance improvement throughout the project.
- Working group name:
- Problem category: 1 or 2 (see description of categories above)
- Physical phenomena and associated models: Brief description of the phenomena the challenge problem computes and the modelling approach(es) used.
- Numerical approach, algorithms: Brief summary of algorithms used and the numerical approach used by the software to address the challenge problem.
- Simulation details: Problem size, complexity, geometry, etc.
- Demonstration calculation requirements: Summary of the requirements, e.g. number of timesteps, iterations, starting conditions.
- Resource requirements to run demonstration calculation: Estimated requirements from an exascale system to be able to run the challenge problem (fraction of system, runtime, etc.)
- Figure of Merit (FOM): For Category 1 problems, definition of the FOM (see below).
The Figure of Merit (FOM) is a quantitative measure of the performance of the proposed Challenge Problem. It is expected that a Challenge Problem should aim to show an increase of at least 40x on an exascale system compared to the baseline performance on current systems in the region of 25 PFLOP/s. This FOM will differ from problem to problem and should be defined to capture the key performance metric for the use case. Some example FOMs from the US ECP are provided below.
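As an illustrative sketch (not part of the ECP or ExCALIBUR definitions), the snippet below treats the FOM as useful work completed per unit wall time and checks the improvement over the baseline measurement; all numbers are hypothetical placeholders.

```python
# Illustrative only: treat the FOM as useful work per unit wall time and
# compare the exascale measurement against the baseline. All values are
# hypothetical placeholders.

def fom(work_flops, wall_time_hours):
    """FOM as problem work (e.g. FLOPs of the problem solved) per hour of wall time."""
    return work_flops / wall_time_hours

baseline_fom = fom(work_flops=1.0e18, wall_time_hours=10.0)  # measured on the baseline system
exascale_fom = fom(work_flops=2.0e19, wall_time_hours=4.0)   # measured on the exascale system

improvement = exascale_fom / baseline_fom
print(f"FOM improvement: {improvement:.0f}x (target: at least 40x)")
```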
Further examples can be found in the ECP Applications Development Milestone Report
Problem category:
1
Physical phenomena and associated models:
Study the decays of K, D and B mesons. Examine both simple single-hadron final states and more complex processes involving multi-hadron final states, decay-induced mixing, long-distance effects and E&M processes.
Numerical approach, algorithms:
Use the methods of lattice QCD and a chiral fermion formulation. E&M effects are treated with infinite-volume methods. Linear and bi-linear combinations of composite operators are renormalized non-perturbatively. Requires Lanczos eigenvectors, deflation, all-to-all propagators, all-mode-averaging, open boundary conditions and Fourier acceleration.
Simulation details:
The target lattice volume is 96³ × 384 with a lattice spacing of a = 0.055 fm. The Wilson gauge action and Möbius DWF would be used.
Demonstration calculation requirements:
Monte Carlo evolution for 5 time units of the physical-mass, 96³ × 384, a = 0.055 fm ensemble. Start with a replicated equilibrated configuration constructed from 162 periodic copies of a 32³ × 64 configuration. Standard suite of measurements on a single configuration.
Resource requirements to run demonstration calculation:
25% of the full exascale machine for 6 hours for evolution and for 10 hours for measurements.
Figure of Merit (FOM):
The FOM is calculated as the geometric mean of the gauge-generation FOM and the analysis FOM. Each of those FOMs is defined to be
FOM(B_a → F_b) = (t_a(B) * f_a(B) * n_b) / (t_b(F) * f_b(F) * n_a)
where B, F represent the baseline and final system; a, b represent baseline and target problems; and t, f, and n represent the wall time, fraction of the system used (assuming benchmark run is part of a large ensemble), and complexity of the problem (FLOPs).
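A minimal sketch of this calculation, assuming the FOM definition above and using purely hypothetical values for the wall times, system fractions, and problem complexities:

```python
import math

def component_fom(t_baseline, f_baseline, n_baseline, t_final, f_final, n_final):
    """FOM(B_a -> F_b) = (t_a(B) * f_a(B) * n_b) / (t_b(F) * f_b(F) * n_a).

    t: wall time, f: fraction of the system used, n: problem complexity (FLOPs);
    the *_baseline arguments describe problem a on baseline system B, the
    *_final arguments describe problem b on final system F.
    """
    return (t_baseline * f_baseline * n_final) / (t_final * f_final * n_baseline)

# Hypothetical illustrative values, not measured figures.
gauge_fom = component_fom(t_baseline=10.0, f_baseline=0.5, n_baseline=1.0e18,
                          t_final=6.0, f_final=0.25, n_final=2.7e19)
analysis_fom = component_fom(t_baseline=12.0, f_baseline=0.5, n_baseline=1.0e18,
                             t_final=10.0, f_final=0.25, n_final=2.7e19)

# Overall FOM: geometric mean of the gauge-generation and analysis FOMs.
overall_fom = math.sqrt(gauge_fom * analysis_fom)
print(f"gauge FOM = {gauge_fom:.1f}, analysis FOM = {analysis_fom:.1f}, "
      f"overall FOM = {overall_fom:.1f}")
```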
Problem category:
1
Physical phenomena and associated models:
Deep learning neural networks for cancer: feed-forward, auto-encoder, recurrent neural networks.
Numerical approach, algorithms:
Gradient descent of model parameters; optimization of loss function; network activation function; regularization; and learning rate scaling methods.
Simulation details:
Large-scale machine learning solutions will be computed for the three cancer pilots.
- Pilot1—leave-one-out cross validation of roughly 1,000 drugs by 1,000 cell lines. This involves roughly one million models. Partition the drugs and cell lines into n sets, train with those for e epochs, then transfer the weights w to the next set of models, expanding the number of models in each iteration. Each of the models at iteration i can safely be used (avoiding information leakage) to seed models for iteration i + 1, provided the drugs and cell lines in the iteration i + 1 validation set were not in the training set of the model at iteration i.
- Pilot2—state identification and classification of one or more RAS proteins binding to a lipid membrane; prediction over time of the clustering behavior of key lipid populations that leads to RAS protein binding. RAS proteins are represented in sufficient resolution to model all pairwise interactions within and between proteins. Lipid membranes are represented as continuous density fields of tens of species of lipid concentration. Predictions are trained on the cross-product of 1,000s of simulations, each of which is thousands of time steps, over multiple protein configurations, and are performed for a large range of different concentrations.
- Pilot3—predicting cancer phenotypes and patient treatment trajectories from millions of cancer registry documents. Thousands of multitask phenotype classification models will be built from defined combinations of descriptive terms extracted from 10K curated text training sets. To accelerate model training, the team will use a transfer learning scheme with sharing of weights during training.
Demonstration calculation requirements:
The computations performed at scale will be standard neural-network operations: matrix multiplies, 2D convolutions, pooling, etc. These will be specifically defined by the models chosen to demonstrate transfer learning. The computations performed at scale will require sharing of weights.
Resource requirements to run demonstration calculation:
It is estimated that each pilot problem will require up to 12 hours on the full system.
Figure of Merit (FOM):
The FOM is the average “rate” of model training to convergence. On a given system, it can be demonstrated that n instances of a pilot model (P_i) can be trained to convergence in a given time t, producing a rate x_i = n/t. Models are defined as part of each of the three pilot applications. Because each of the three pilot applications will focus on different deep-neural-network-based models, and because rates are measured for each pilot model, the total FOM will be the harmonic mean (H) of the rates across the models from the three pilot applications, where x_i is the rate at which the i-th model is trained to convergence.
The rate of model training to convergence is model specific, and the time varies with the number of epochs needed to train the model to convergence. For this reason, the team will need to choose a small number of models (ideally one per pilot project), fix the input data, and fix the number of epochs associated with each model, so that repeated measurements of the FOM can be compared with previous measurements. A weighted harmonic mean (H_w) will be used that accounts for the number of epochs required to train each model to convergence. In cases where computational resources or training to convergence exceed standard queue policies, H_w remains a valid choice provided the time per epoch is constant, which holds true for many of the pilot applications' models.
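A minimal sketch of the weighted harmonic mean described above, assuming one representative model per pilot and hypothetical counts, times, and epoch weights:

```python
# Illustrative only: weighted harmonic mean of training rates, one
# representative model per pilot. All counts, times and epoch weights below
# are hypothetical placeholders.

def training_rate(n_instances, wall_time_hours):
    """Rate x_i = n / t: model instances trained to convergence per hour."""
    return n_instances / wall_time_hours

def weighted_harmonic_mean(rates, weights):
    """H_w = sum(w_i) / sum(w_i / x_i), with weights w_i (epochs to convergence)."""
    return sum(weights) / sum(w / x for w, x in zip(weights, rates))

# (instances trained, wall time in hours, epochs to convergence) per pilot model.
pilot_models = [(1000, 2.0, 50),   # Pilot1-style model
                (200, 4.0, 120),   # Pilot2-style model
                (500, 3.0, 80)]    # Pilot3-style model

rates = [training_rate(n, t) for n, t, _ in pilot_models]
epochs = [e for _, _, e in pilot_models]

fom = weighted_harmonic_mean(rates, epochs)
print(f"FOM (weighted harmonic mean of training rates): {fom:.1f} models/hour")
```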
Problem category:
2
Physical phenomena and associated models:
Multiscale CFD with reacting flows in both low-Mach and compressible formulations with DNS+LES turbulence resolution with multispecies chemistry.
Numerical approach, algorithms:
Time-explicit and deferred correction strategies in a compressible and projection-based low Mach formulation, respectively. Finite volume spatial discretizations on block-structured AMR grids with embedded boundaries. Hybrid DNS/LES to enable fully resolved (DNS) treatment where turbulence-chemistry interaction occurs and modeled (LES) treatment to reduce resolution requirements in low heat release portions of the flow.
Simulation details:
Gas-phase simulation of multiple jets (4) interacting in a 1/4-scale geometry (2.5 cm diameter) derived from a production engine piston bowl with a flat head and centered fuel injector. Multiple pulses, including a low-reactivity and a high-reactivity fuel, capturing cross-mixing and reactions between the fuels using ~30–35 species; 1.5 ms physical simulation time; four levels of hierarchical mesh refinement; finest grid 1.25 μm; realistic environment: >50 bar.
Demonstration calculation requirements:
Restart from a checkpoint that gives a realistic development of the plume into the geometry, obtained by running the case with restricted resolution (two levels of mesh refinement). Additional refinement is added at restart, for a total of four levels, and 10–20 time steps are run to compute a realistic grind time that can be used to estimate the cost of the full time horizon.
Resource requirements to run demonstration calculation:
The estimate to run 20 time steps is 1.5 hours using the anticipated full system.
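A minimal sketch of the grind-time extrapolation described in the demonstration calculation requirements above; the 20-step, 1.5-hour figures are from the text, while the full-horizon step count is a hypothetical placeholder:

```python
# Illustrative only: measure the grind time from the short restarted run and
# scale it to the full time horizon. The 20-step / 1.5-hour figures are taken
# from the text above; the full-horizon step count is a hypothetical placeholder.

demo_steps = 20
demo_wall_hours = 1.5                                  # on the anticipated full system
grind_hours_per_step = demo_wall_hours / demo_steps    # 0.075 h per step

# Hypothetical; the real count depends on the time-step size and AMR subcycling.
full_horizon_steps = 10_000                            # to cover 1.5 ms of physical time

estimated_wall_hours = full_horizon_steps * grind_hours_per_step
print(f"grind time: {grind_hours_per_step:.3f} h/step; "
      f"full-horizon estimate: {estimated_wall_hours:,.0f} full-system hours")
```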
Problem category:
2
Physical phenomena and associated models:
MSN Fragment energetics (reaction rates) and dynamics (diffusion rates) computations with at least 10,000 atoms for the pore + solvent. The go-to level of theory will be EFMO/RI-MP2 with an adequate basis set (e.g., 6-31G(d,p)) for the pore + catalyst + gatekeeper. The solvent will be treated either with the same level of theory or with EFP. Final energies will be captured using multi-level EFMO calculations, with either coupled cluster or QMC calculations for the reaction region and RI-MP2 elsewhere.
Numerical approach, algorithms:
Configurations can be computed concurrently; each configuration will utilize the EFMO fragmentation approach to spatially parallelize the calculation of the underlying quantum-chemistry methods, which are typically characterized by dense linear-algebra-like operations: Hartree-Fock → RI-MP2 → Coupled Cluster / Quantum Monte Carlo (QMC).
Simulation details:
At least 10,000 atoms, comprising the MSN pore, reactants, and solvents. An estimated one million basis functions.
Demonstration calculation requirements:
Demonstrate the ability to complete the science challenge problem by concurrently running a subset (1–10) of atomic configurations with EFMO/RI-MP2 on the full exascale system.
Resource requirements to run demonstration calculation:
Full exascale machine for 2–4 hours for each energy + gradient RI-MP2 calculation.
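A hedged sketch of the concurrency structure described in the numerical approach above; run_rimp2_energy_gradient is a hypothetical placeholder, not an interface of any real chemistry package:

```python
# Sketch only: dispatch independent atomic configurations concurrently.
# run_rimp2_energy_gradient is a hypothetical placeholder for one EFMO/RI-MP2
# energy + gradient calculation running on a partition of the machine.

from concurrent.futures import ProcessPoolExecutor

def run_rimp2_energy_gradient(configuration):
    """Placeholder: launch the fragmented Hartree-Fock -> RI-MP2 workflow for
    one configuration and return its energy and gradient."""
    raise NotImplementedError("stand-in for the real quantum-chemistry driver")

def run_configuration_subset(configurations, max_concurrent=10):
    """Run a subset (1-10) of configurations concurrently, as in the
    demonstration calculation."""
    with ProcessPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_rimp2_energy_gradient, configurations))
```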