Validation status

This page is the honest current-state report of what AtmosTransport has been validated against, what hasn't been validated yet, and the floating-point tolerances that hold in each case. The goal is to let an atmospheric-transport practitioner decide quickly whether the level of validation here meets their needs — and where the gaps are.

Verification vs validation

The terms in their canonical sense:

  • Verification ("are we solving the equations correctly?") — the test suite covers this for the three schemes (UpwindScheme, SlopesScheme, PPMScheme) that live in the shared kernel test matrix in test/test_advection_kernels.jl: uniform-invariance, mass-budget, and CPU/GPU-agreement tests all run for those three. LinRoodPPMScheme has a CPU CS runtime smoke test in test_cubed_sphere_runtime.jl:322 but is not covered by the same per-step kernel matrix. The replay gate enforces the discrete-conservation contract on every preprocessor write and every opt-in runtime load, regardless of scheme. See Conservation budgets for the per-test breakdown.

  • Validation ("are we solving the right equations?") — this page. Validation is comparison against external reference data (TM5, GCHP, observations, …) and is fundamentally less complete than verification.

If you only need the first — reproduction of a published algorithm under your own forcing within tolerances documented in the test suite — verification is solid (for the schemes that are in the test matrix) and you can proceed. If you need cross-model comparison or observational match, read the gaps below.

What HAS been validated

Synthetic-fixture suite (verification, comprehensive)

The ~75 core test files in test/runtests.jl run on every push and PR via the CI workflow, with no external data dependency. The count grows steadily; the current authoritative list lives between the core_tests = [ opening and its matching ] in test/runtests.jl. Anchor tables:

| Property | Test files | Status |
|---|---|---|
| Uniform tracer invariance under a synthetic flow (relative err < 1e-6) | test_advection_kernels.jl covers CPU for Upwind / Slopes / PPM (line 153); GPU coverage is Upwind only (line 201). LinRoodPPMScheme is not in this matrix. | green where exercised |
| Global mass conservation (gradient IC, 4 steps) | test_advection_kernels.jl (CPU+GPU; line 166-199), test_cubed_sphere_advection.jl | green |
| Cross-window replay closure | test_replay_consistency.jl (Plan 39 H gate) | green |
| Cross-day continuity (synthetic GEOS C8 fixture) | test_geos_cs_passthrough.jl (3467 cases) | green |
| GEOS native CS preprocessor end-to-end (synthetic fixture) | test_geos_reader.jl (48), test_geos_cs_passthrough.jl, test_geos_convection.jl (26) | green |
| Conservative regrid mass closure | test/regridding/test_conservation.jl, test_ll_to_cs_regrid_script.jl (script-level tol 1e-6) | green |
| CPU / GPU agreement (4 ULP for Upwind 1-step; 16 ULP for Slopes / PPM 4-step, F32 and F64; LinRood smoke-only) | test_advection_kernels.jl (CUDA-gated @testset "CPU-GPU agreement"); LinRood CS smoke in test_cubed_sphere_runtime.jl | green where exercised |
| Operator dispatch (Strang palindrome ordering, NoOp dead branches) | test_transport_model_convection.jl, test_tm5_convection.jl, test_cmfmc_convection.jl | green |

Total core-suite cases: thousands; CI breaks down pass/fail per file.
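
To make the first two rows of the table concrete, here is a minimal sketch of what a uniform-invariance check and a mass-budget check look like. It is written in Python for illustration and is not the project's Julia test code: the 1D periodic grid, the first-order upwind update, and every name in it (`upwind_step`, `run`, `uniform_err`, `mass_drift`) are assumptions made for this example. Only the 4-step count and the 1e-6 and 1e-12 thresholds come from this page.

```python
import math

def upwind_step(m, rm, am):
    """One first-order upwind step on a periodic 1D grid, in flux form.

    m  : air mass per cell
    rm : tracer mass per cell
    am : air mass crossing the left face of each cell during the step
         (positive means flow from cell i-1 into cell i)
    """
    n = len(m)
    flux = []
    for i in range(n):
        # The tracer flux carries the mixing ratio of the donor (upwind) cell.
        donor = (i - 1) % n if am[i] >= 0.0 else i
        flux.append(am[i] * rm[donor] / m[donor])
    # Each face flux is added to one cell and subtracted from its neighbour,
    # so the global sums can only change by rounding error.
    m_new = [m[i] + am[i] - am[(i + 1) % n] for i in range(n)]
    rm_new = [rm[i] + flux[i] - flux[(i + 1) % n] for i in range(n)]
    return m_new, rm_new

def run(m, rm, am, steps=4):
    for _ in range(steps):
        m, rm = upwind_step(m, rm, am)
    return m, rm

n = 32
m0 = [1.0 + 0.1 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
am = [0.02 * math.sin(2.0 * math.pi * i / n) for i in range(n)]  # divergent synthetic flow

# Uniform tracer invariance: a constant mixing ratio must stay constant.
q0 = 4.0e-4
m4, rm4 = run(m0, [q0 * x for x in m0], am)
uniform_err = max(abs(r / x - q0) / q0 for r, x in zip(rm4, m4))

# Mass budget: total tracer mass of a gradient initial condition must not drift.
rm_grad = [(1.0 + i / n) * m0[i] for i in range(n)]
_, rm_grad4 = run(m0, rm_grad, am)
total0 = math.fsum(rm_grad)
mass_drift = abs(math.fsum(rm_grad4) - total0) / total0

print("uniform-invariance relative error:", uniform_err)
print("relative total-mass drift:", mass_drift)
```

The flow is deliberately divergent, so air mass changes cell by cell. A scheme that advected mixing ratio without tracking air mass would fail the uniform check under this forcing.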

Real-data preprocessor smoke tests (verification with real input)

| Path | What was verified | Status |
|---|---|---|
| ERA5 spectral → LL 72×37 F32, Dec 2021 | preprocessor closes write-time replay gate; runtime steps cleanly; conservation tested via uniform IC | green (proven on disk; matches the quickstart bundle) |
| ERA5 spectral → LL 144×73 F32, Dec 2021 | same | green |
| ERA5 spectral → CS C24 F32, Dec 2021 | same; F32-CS path requires the f3b3abf fix to spectral_synthesis.jl | green (post-f3b3abf) |
| ERA5 spectral → CS C90 F32, Dec 2021 | same | green (post-f3b3abf) |
| GEOS-IT C180 → CS C180 F64 native | preprocessor closes write-time replay gate; binary loads cleanly via inspect_transport_binary.jl; per-window snapshot output verified. The 2026-04-25 unified-chain validation flagged a runtime GPU step-1 blocker on the unmerged-vertical C180 binary; the production response targets the merge_above_pressure = 0.25 hPa 64-level product with adaptive substeps. Status against that product is tracked in current Catrine config notes. | preprocessor green; current C180 runtime path uses merged + adaptive-substep binaries |
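
"Closes the write-time replay gate" can be pictured with a small sketch. It assumes, as an illustration only, that the gate replays the start-of-window air mass through the stored face fluxes and compares the result with the stored end-of-window mass. The actual contract is defined in the Julia sources, and the names below (`replay_mass`, `replay_residual`, `passes_replay_gate`) are hypothetical. The two tolerance values are the per-window replay-gate tolerances quoted on this page.

```python
# Python illustration with hypothetical names; not the project's Julia code.
REPLAY_TOL = {"F64": 1e-10, "F32": 1e-4}  # per-window replay-gate tolerances from this page

def replay_mass(m_start, am):
    """Advance air mass over one window on a periodic 1D grid.

    am[i] is the air mass crossing the left face of cell i during the window.
    """
    n = len(m_start)
    return [m_start[i] + am[i] - am[(i + 1) % n] for i in range(n)]

def replay_residual(m_start, m_end_stored, am):
    """Largest replayed-vs-stored mismatch, relative to the largest stored cell mass."""
    replayed = replay_mass(m_start, am)
    scale = max(abs(x) for x in m_end_stored)
    return max(abs(r - s) for r, s in zip(replayed, m_end_stored)) / scale

def passes_replay_gate(m_start, m_end_stored, am, precision="F64"):
    return replay_residual(m_start, m_end_stored, am) <= REPLAY_TOL[precision]

m_start = [1.0, 2.0, 3.0, 4.0]
am = [0.1, -0.2, 0.05, 0.3]
m_end = replay_mass(m_start, am)   # a window that is consistent by construction

drifted = list(m_end)
drifted[0] += 1e-6                 # a stored end state that no longer matches the fluxes

print(passes_replay_gate(m_start, m_end, am, "F64"))
print(passes_replay_gate(m_start, drifted, am, "F64"))
print(passes_replay_gate(m_start, drifted, am, "F32"))
```

Under this assumed definition, a mismatch of about 1e-7 relative is rejected at the F64 tolerance but accepted at the F32 tolerance.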

Model parity (TM5)

| What | Tests | Status |
|---|---|---|
| TM5 four-field convection (entu/detu/entd/detd) parity with the TM5 F90 reference | test_tm5_preprocessing.jl, test_tm5_preprocessing_rates.jl, test_tm5_vs_cmfmc_parity.jl, test_tm5_driven_simulation.jl, test_tm5_process_day.jl, test_tm5_vertical_remap.jl | green |
| Russell-Lerner slopes vs TM5's advectx__slopes / advecty__slopes | line-for-line port; documented in reconstruction.jl:266 and :213-265 derivation | green by construction (port verified via uniform-invariance + mass-budget tests) |

The TM5 parity work is the most thoroughly validated cross-model comparison the runtime currently has.

What HAS NOT been validated end-to-end

The following work is on the roadmap but not yet done:

| Gap | Why it matters | Status |
|---|---|---|
| GCHP parity for full-physics CS runs | The CMFMC convection and ImplicitVerticalDiffusion operators are independently unit-tested but a full multi-day GCHP-vs-AtmosTransport intercomparison on identical met forcing has not been published. | run scripts exist (scripts/diagnostics/compare_* family) but no committed parity report |
| CATRINE D7.1 intercomparison | The European CATRINE protocol is the natural validation target (4 tracers: CO2, fossil CO2, SF6, 222Rn; full-physics; multi-month). The configs (config/runs/catrine_*.toml) exist; the runtime can produce the output. The gated 1-day smoke test test/test_tm5_catrine_1day.jl (in the --all suite) exercises the Catrine TM5-physics setup over a single day, but no full multi-month CATRINE-protocol regression test is committed and no protocol-vs-reference comparison memo has been published. | gated 1-day smoke test in place; output runs successfully (see docs/validation/geosit_c180_unified_chain_2026_04_25.md, an internal memo); full protocol regression not yet wired |
| Observational closure | Comparison of model output (column CO2, surface SF6 etc.) against an observational network (NOAA in-situ + TCCON / OCO satellite) | not started |
| Multi-month GPU production runs | The longest GPU validation run committed is 7 days. Multi-week stability has been spot-checked but not regression-tested. | committed test ceiling: 7-day; production target: ~30-day |
| Adjoint kernels | See Adjoint status. Tape + checkpoint + revolve (bisection variant) + four-scheme reverse pass + 4D-Var driver are on CI. Gaps: CMFMC convection adjoint, copy_corners reverse, optimal binomial Revolve, TM5-4DVAR cross-validation. | partial (shipped) |

Floating-point tolerance practice

Tolerances vary by operation; the canonical sources:

| Operation | F64 tolerance | F32 tolerance | Reference |
|---|---|---|---|
| Per-window replay gate | 1e-10 | 1e-4 | src/MetDrivers/ReplayContinuity.jl::replay_tolerance(FT) |
| Window-continuity verification (test variant) | 1e-12 | 1e-6 | test/test_replay_consistency.jl:84 |
| Per-step uniform-tracer invariance (relative) | 1e-6 | 1e-6 | test_advection_kernels.jl (@testset "uniform tracer") |
| 4-step total mass conservation (gradient IC, structured grid) | 1e-12 | 5e-5 | test_advection_kernels.jl (@testset "mass conservation") |
| CPU/GPU advection agreement | 4 * eps(FT) (Upwind 1-step) / 16 * eps(FT) (Upwind / Slopes / PPM 4-step) | same as F64 column | test_advection_kernels.jl (CUDA-gated @testset "CPU-GPU agreement"). LinRoodPPMScheme covered only by the CS smoke in test_cubed_sphere_runtime.jl. |
| Conservative regrid mass closure (script-level acceptance) | ≤ 1e-6 rel | same | test/test_ll_to_cs_regrid_script.jl:175–178 |
| Cross-day GEOS chain continuity | machine epsilon (5.94e-16 F64 measured) | ~3.5e-7 F32 measured | preprocessor stdout from process_day |

The F64 tolerances reflect double-precision noise floors at production resolutions; F32 tolerances reflect single-precision accumulation. Production runs on the L40S GPU use F32 by default — the F32 noise floor is the operational tolerance.
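
The `4 * eps(FT)` and `16 * eps(FT)` entries scale with the floating-point type: in Julia, `eps(Float64)` is 2^-52 and `eps(Float32)` is 2^-23. The sketch below shows one way such a bound could be applied. It is a Python illustration, not the project's test code, and it assumes the comparison is element-wise and relative to the larger magnitude of each pair; the helper name `agree_within` is hypothetical.

```python
import sys

# Machine epsilon per precision, matching Julia's eps(Float64) and eps(Float32).
EPS = {"F64": sys.float_info.epsilon, "F32": 2.0 ** -23}

def agree_within(a, b, n_eps, precision="F64"):
    """True if every pair differs by at most n_eps * eps, relative to the
    larger magnitude of the pair (an assumed convention for this sketch)."""
    tol = n_eps * EPS[precision]
    return all(abs(x - y) <= tol * max(abs(x), abs(y)) for x, y in zip(a, b))

cpu = [1.0, 2.0, 3.0]

# Stand-in for a GPU result that differs only by last-bit rounding.
gpu_like = [x * (1.0 + sys.float_info.epsilon) for x in cpu]

# Stand-in for a result with a genuine 1e-12 relative discrepancy.
drifted = [x * (1.0 + 1e-12) for x in cpu]

print(agree_within(cpu, gpu_like, 4, "F64"))
print(agree_within(cpu, drifted, 16, "F64"))
print(agree_within(cpu, drifted, 16, "F32"))
```

Because the bound is a multiple of `eps(FT)`, the same test expression applies to F32 and F64. Only the absolute size of the allowed difference changes.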

What this means for users

If you are doing:

  • Advection algorithm research → verification is solid, F32 / F64 noise-floor agreement is well-tested. Proceed.

  • CO2 intercomparison studies that need GCHP-equivalent fidelity → the underlying operators are TM5-faithful or GCHP-style; the end-to-end intercomparison report has not been written. Run a side-by-side and compare yourself; the run scripts in scripts/diagnostics/compare_* are the starting point.

  • Inverse modelling that needs an adjoint → the adjoint and 4D-Var stack ship on CS. See Adjoint status for the supported scheme matrix and the remaining gaps (CMFMC adjoint kernel, copy_corners reverse, TM5-4DVAR cross-validation).

  • Validation against observations → not in scope today; the forward model has the fidelity, but the observation-comparison diagnostics are external.

See also

  • Adjoint status — what the README claims vs what actually ships.

  • Conservation budgets — the explicit @test assertions that anchor the verification claims above.

  • Phase 7: Configuration & Runtime — TOML schema for the run configs that drive the validation work above.