Computational and Energy Use of CMIP6

Paper

Design of CPMIP (Computational Performance Model Intercomparison Project) for CMIP6: a set of metrics designed to assess the computational performance of the models and the cost of running the simulations.

Result: ~1600 tonnes of CO2Eq

Total energy = Simulated years x JPSY (joules per simulated year)

Carbon footprint = Total energy x CF x PUE

where:

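  • CF = Greenhouse gas conversion factor (CO2Eq emitted per MWh, so total energy must first be converted from joules to MWh)
  • PUE = Power usage effectiveness (ratio of total data center energy to the energy used by the computing equipment; accounts for overhead such as cooling)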
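
The two formulas above can be chained into a small calculation. The following is a minimal sketch, not the paper's own code: the function names and all input values are hypothetical, and CF is assumed to be given in kg CO2Eq per MWh. The only fixed fact used is the unit conversion 1 MWh = 3.6e9 J.

```python
JOULES_PER_MWH = 3.6e9  # 1 MWh = 1e6 W * 3600 s


def total_energy_joules(simulated_years: float, jpsy: float) -> float:
    """Total energy = simulated years x joules per simulated year."""
    return simulated_years * jpsy


def carbon_footprint_kg(simulated_years: float, jpsy: float,
                        cf_kg_per_mwh: float, pue: float) -> float:
    """Carbon footprint = total energy (in MWh) x CF x PUE."""
    energy_mwh = total_energy_joules(simulated_years, jpsy) / JOULES_PER_MWH
    return energy_mwh * cf_kg_per_mwh * pue


# Hypothetical inputs, chosen only to make the arithmetic easy to follow:
# 1000 simulated years at 7.2e9 J each is 2000 MWh of compute energy.
footprint = carbon_footprint_kg(
    simulated_years=1000,
    jpsy=7.2e9,
    cf_kg_per_mwh=300.0,  # assumed grid intensity, not a value from the paper
    pue=1.25,             # assumed data center overhead of 25%
)
print(f"{footprint / 1000:.1f} t CO2Eq")
```

With these made-up inputs the footprint comes to 750 t CO2Eq: 2000 MWh x 300 kg/MWh x 1.25.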

Useful metrics:

  • SYPD: simulated years per day (of wall-clock time)
  • CHSY: core hours per simulated year
  • Parallelization, complexity, and resolution: cluster- and experiment-dependent metrics (constant values)
  • Data output cost: greatly influenced by I/O configuration
  • Data intensity: data generated per core hour, a measure of data production efficiency (correlated with SYPD)
  • Workflow and infrastructure cost: cost of running the simulation, highly infrastructure dependent (11% - 75% of total cost)
  • Coupling cost: overhead of coupling the ESM components

\( CC = \frac{T_M P_M - \sum_{c} T_c P_c}{T_M P_M} \)

where:

  • \( T_M \) = total runtime of the coupled model
  • \( P_M \) = parallelization (core count) of the coupled model
  • \( T_c \) = total runtime of individual component \( c \)
  • \( P_c \) = parallelization (core count) of individual component \( c \)
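
As an illustration of the coupling cost formula, here is a minimal sketch. The function name and the component timings are hypothetical; runtimes are assumed to be in hours and parallelization in cores, so each product is a cost in core hours.

```python
from typing import Iterable, Tuple


def coupling_cost(t_m: float, p_m: int,
                  components: Iterable[Tuple[float, int]]) -> float:
    """Fraction of the coupled model's cost not accounted for by its components.

    t_m, p_m   : runtime and parallelization of the coupled model
    components : (runtime, parallelization) pairs, one per component
    """
    coupled = t_m * p_m
    standalone = sum(t_c * p_c for t_c, p_c in components)
    return (coupled - standalone) / coupled


# Hypothetical run: the coupled model takes 10 h on 1000 cores, of which
# the atmosphere accounts for 9 h on 800 cores and the ocean 8 h on 200 cores.
cc = coupling_cost(10.0, 1000, [(9.0, 800), (8.0, 200)])
print(f"coupling cost: {cc:.2%}")
```

In this made-up case the components account for 8800 of the 10000 core hours, so the coupling cost is 12%. The remainder covers the coupler itself plus time components spend waiting on each other.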

Other metrics:

  • Speed / cost / parallelization: closely related to model speed and parallelizability
  • Memory bloat = (Memory size - Parallelization x File size) / Ideal memory size, typically ~10 - 100
  • Useful simulated years: years that are actually used for analysis
  • Data produced: data generated by the simulation
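
The speed, cost, and memory metrics in these notes reduce to simple ratios. The sketch below is an assumption-laden illustration: the function names and example numbers are hypothetical, CHSY is derived from SYPD as cores x 24 / SYPD (which follows from the two definitions), and memory bloat uses the expression given above with all sizes in the same unit (GB here).

```python
def sypd(simulated_years: float, wallclock_hours: float) -> float:
    """Simulated years per wall-clock day."""
    return simulated_years / (wallclock_hours / 24.0)


def chsy(cores: int, sypd_value: float) -> float:
    """Core hours per simulated year: cores x hours needed for one simulated year."""
    return cores * 24.0 / sypd_value


def memory_bloat(mem_size: float, parallelization: int,
                 file_size: float, ideal_mem_size: float) -> float:
    """(Memory size - Parallelization x File size) / Ideal memory size."""
    return (mem_size - parallelization * file_size) / ideal_mem_size


# Hypothetical run: 10 simulated years in 48 wall-clock hours on 960 cores.
speed = sypd(simulated_years=10, wallclock_hours=48)  # 5 simulated years per day
cost = chsy(cores=960, sypd_value=speed)              # 4608 core hours per simulated year

# Hypothetical memory figures in GB.
bloat = memory_bloat(mem_size=500.0, parallelization=960,
                     file_size=0.1, ideal_mem_size=20.0)
print(f"SYPD={speed:.1f}, CHSY={cost:.0f}, bloat={bloat:.1f}")
```

The example shows why speed and cost are tied to parallelization: doubling the core count at the same SYPD doubles CHSY, so extra cores only pay off if SYPD rises with them.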

How to estimate missing numbers? Use mean(ESGF data) or mean(Total data).