Validation status

This page is the honest current-state report of what AtmosTransport has been validated against, what hasn't been validated yet, and the floating-point tolerances that hold in each case. The goal is to let an atmospheric-transport practitioner decide quickly whether the level of validation here meets their needs — and where the gaps are.

Verification vs validation

The terms in their canonical sense:

Verification ("are we solving the equations correctly?") — the test suite covers this for the three schemes (UpwindScheme, SlopesScheme, PPMScheme) that live in the shared kernel test matrix in test/test_advection_kernels.jl: uniform-invariance, mass-budget, and CPU/GPU-agreement tests all run for those three. LinRoodPPMScheme has a CPU CS runtime smoke test in test_cubed_sphere_runtime.jl:322 but is not covered by the same per-step kernel matrix. The replay gate enforces the discrete-conservation contract on every preprocessor write and every opt-in runtime load, regardless of scheme. See Conservation budgets for the per-test breakdown.
Validation ("are we solving the right equations?") — this page. Validation is comparison against external reference data (TM5, GCHP, observations, …) and is fundamentally less complete than verification.

If you only need the first — reproduction of a published algorithm under your own forcing within tolerances documented in the test suite — verification is solid (for the schemes that are in the test matrix) and you can proceed. If you need cross-model comparison or observational match, read the gaps below.

What HAS been validated

Synthetic-fixture suite (verification, comprehensive)

The ~75 core test files listed in test/runtests.jl run on every push and PR via the CI workflow, with no external data dependency. The count grows steadily; the current authoritative list lives between the core_tests = [ opening and its matching ] in test/runtests.jl. The table below anchors each verified property to its test files:

Property	Test files	Status
Uniform tracer invariance under a synthetic flow (relative err < 1e-6)	`test_advection_kernels.jl` covers CPU for `Upwind` / `Slopes` / `PPM` (line 153); GPU coverage is `Upwind` only (line 201). `LinRoodPPMScheme` is not in this matrix.	green where exercised
Global mass conservation (gradient IC, 4 steps)	`test_advection_kernels.jl` (CPU+GPU; lines 166-199), `test_cubed_sphere_advection.jl`	green
Cross-window replay closure	`test_replay_consistency.jl` (Plan 39 H gate)	green
Cross-day continuity (synthetic GEOS C8 fixture)	`test_geos_cs_passthrough.jl` (3467 cases)	green
GEOS native CS preprocessor end-to-end (synthetic fixture)	`test_geos_reader.jl` (48), `test_geos_cs_passthrough.jl`, `test_geos_convection.jl` (26)	green
Conservative regrid mass closure	`test/regridding/test_conservation.jl`, `test_ll_to_cs_regrid_script.jl` (script-level tol `1e-6`)	green
CPU / GPU agreement (4 ULP for Upwind 1-step; 16 ULP for Slopes / PPM 4-step, F32 and F64; LinRood smoke-only)	`test_advection_kernels.jl` (CUDA-gated `@testset "CPU-GPU agreement"`); LinRood CS smoke in `test_cubed_sphere_runtime.jl`	green where exercised
Operator dispatch (Strang palindrome ordering, NoOp dead branches)	`test_transport_model_convection.jl`, `test_tm5_convection.jl`, `test_cmfmc_convection.jl`	green

Total core-suite cases: thousands; CI breaks down pass/fail per file.
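
The first two rows of the table can be illustrated with a toy problem. The sketch below is a minimal 1D periodic donor-cell scheme in flux form, not the AtmosTransport kernels: `upwind_step!`, `run_checks`, and the synthetic air-mass and flux fields are all hypothetical. It only shows why a flux-form update keeps a uniform mixing ratio uniform and why the total tracer mass telescopes to a constant, up to rounding.

```julia
# Toy 1D periodic donor-cell advection in flux form. Every name here is
# hypothetical; this is not the AtmosTransport kernel.

# Advance tracer mass `rm` and air mass `m` by one step. `am[i]` is the air
# mass crossing the left face of cell `i` during the step (positive = rightward).
function upwind_step!(rm::Vector{FT}, m::Vector{FT}, am::Vector{FT}) where {FT}
    n = length(m)
    flux = similar(rm)
    for i in 1:n
        donor = am[i] >= 0 ? mod1(i - 1, n) : i   # upwind cell of face i
        flux[i] = am[i] * rm[donor] / m[donor]    # tracer mass through face i
    end
    for i in 1:n
        ip1 = mod1(i + 1, n)
        rm[i] += flux[i] - flux[ip1]              # face fluxes cancel in the global sum
        m[i]  += am[i] - am[ip1]
    end
    return nothing
end

function run_checks(::Type{FT}; n::Int = 64) where {FT}
    x  = FT[FT(i) / FT(n) for i in 1:n]
    m0 = FT[1 + FT(0.1) * sinpi(2 * xi) for xi in x]          # non-uniform air mass
    am = FT[FT(0.2) + FT(0.05) * cospi(2 * xi) for xi in x]   # divergent mass flux
    q0 = FT(4e-4)

    # Check 1: a uniform mixing ratio must stay uniform after one step.
    m, rm = copy(m0), q0 .* m0
    upwind_step!(rm, m, am)
    uniform_err = Float64(maximum(abs.(rm ./ m .- q0)) / q0)

    # Check 2: total tracer mass must be conserved over 4 steps (gradient IC).
    m, rm = copy(m0), q0 .* (1 .+ x ./ 2) .* m0
    mass_before = sum(Float64.(rm))
    for _ in 1:4
        upwind_step!(rm, m, am)
    end
    mass_drift = abs(sum(Float64.(rm)) - mass_before) / mass_before

    return (; uniform_err, mass_drift)
end

for FT in (Float32, Float64)
    r = run_checks(FT)
    println(FT, ": uniform_err = ", r.uniform_err, ", mass_drift = ", r.mass_drift)
end
```

On this toy problem both quantities should sit at rounding level, well inside the `1e-6` invariance bound and the `1e-12` / `5e-5` mass-conservation bounds quoted on this page. The real tests exercise the production kernels on structured and cubed-sphere grids, so they are the authoritative check.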

Real-data preprocessor smoke tests (verification with real input)

Path	What was verified	Status
ERA5 spectral → LL 72×37 F32, Dec 2021	preprocessor closes write-time replay gate; runtime steps cleanly; conservation tested via uniform IC	green (proven on disk; matches the quickstart bundle)
ERA5 spectral → LL 144×73 F32, Dec 2021	same	green
ERA5 spectral → CS C24 F32, Dec 2021	same; F32-CS path requires the `f3b3abf` fix to spectral_synthesis.jl	green (post-`f3b3abf`)
ERA5 spectral → CS C90 F32, Dec 2021	same	green (post-`f3b3abf`)
GEOS-IT C180 → CS C180 F64 native	preprocessor closes write-time replay gate; binary loads cleanly via `inspect_transport_binary.jl`; per-window snapshot output verified. The 2026-04-25 unified-chain validation flagged a runtime GPU step-1 blocker on the unmerged-vertical C180 binary; the production response targets the `merge_above_pressure = 0.25 hPa` 64-level product with adaptive substeps. Status against that product is tracked in current Catrine config notes.	preprocessor green; current C180 runtime path uses merged + adaptive-substep binaries
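
One plausible reading of "closes the write-time replay gate" is that replaying the stored mass fluxes from the stored window-start air mass reproduces the stored next-window air mass within a precision-dependent tolerance. The sketch below illustrates that idea in 1D. It is an assumption about the shape of the check, not a copy of `ReplayContinuity.jl`: `sketch_replay_tolerance`, `replay_residual`, and `passes_replay_gate` are hypothetical names, and only the two threshold values are taken from the tolerance table on this page.

```julia
# Hypothetical stand-ins for the documented per-window thresholds.
sketch_replay_tolerance(::Type{Float64}) = 1e-10
sketch_replay_tolerance(::Type{Float32}) = 1e-4

# Relative residual of a 1D periodic air-mass budget. `am[i]` is the air mass
# crossing the left face of cell `i` during the window, so a consistent
# window satisfies m_next[i] == m[i] + am[i] - am[i+1].
function replay_residual(m::Vector{FT}, am::Vector{FT}, m_next::Vector{FT}) where {FT}
    n = length(m)
    worst = 0.0
    for i in 1:n
        # Replay in Float64 so the check itself adds no single-precision noise.
        replayed = Float64(m[i]) + Float64(am[i]) - Float64(am[mod1(i + 1, n)])
        worst = max(worst, abs(replayed - Float64(m_next[i])))
    end
    return worst / Float64(maximum(abs, m_next))
end

passes_replay_gate(m, am, m_next) =
    replay_residual(m, am, m_next) <= sketch_replay_tolerance(eltype(m))

# A window that is consistent by construction, and one with a corrupted cell.
m      = Float32[1.0, 1.1, 0.9, 1.0]
am     = Float32[0.10, 0.12, 0.08, 0.11]
m_next = m .+ am .- circshift(am, -1)
m_bad  = copy(m_next)
m_bad[2] += 0.01f0

println("consistent window passes: ", passes_replay_gate(m, am, m_next))
println("corrupted window passes:  ", passes_replay_gate(m, am, m_bad))
```

The actual gate may check different or additional quantities (for example vertical fluxes or cross-window continuity); the source file named in the tolerance table is the reference.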

Model parity (TM5)

What	Tests	Status
TM5 four-field convection (`entu/detu/entd/detd`) parity with the TM5 F90 reference	`test_tm5_preprocessing.jl`, `test_tm5_preprocessing_rates.jl`, `test_tm5_vs_cmfmc_parity.jl`, `test_tm5_driven_simulation.jl`, `test_tm5_process_day.jl`, `test_tm5_vertical_remap.jl`	green
Russell-Lerner slopes vs TM5's `advectx__slopes` / `advecty__slopes`	line-for-line port; documented at `reconstruction.jl:266`, with the derivation at `reconstruction.jl:213-265`	green by construction (port verified via uniform-invariance + mass-budget tests)

The TM5 parity work is the most thoroughly validated cross-model comparison the runtime currently has.

What HAS NOT been validated end-to-end

The following work is on the roadmap but not yet done:

Gap	Why it matters	Status
GCHP parity for full-physics CS runs	The CMFMC convection and ImplicitVerticalDiffusion operators are independently unit-tested but a full multi-day GCHP-vs-AtmosTransport intercomparison on identical met forcing has not been published.	run scripts exist (`scripts/diagnostics/compare_*` family) but no committed parity report
CATRINE D7.1 intercomparison	The European CATRINE protocol is the natural validation target (4 tracers: CO2, fossil CO2, SF6, 222Rn; full-physics; multi-month). The configs (`config/runs/catrine_.toml`) exist; the runtime can produce the output. The gated 1-day smoke test `test/test_tm5_catrine_1day.jl` (in the `--all` suite) exercises the Catrine TM5-physics setup over a single day, but no full multi-month CATRINE-protocol regression test is committed and no protocol-vs-reference comparison memo has been published.	gated 1-day smoke test in place; output runs complete successfully (see `docs/validation/geosit_c180_unified_chain_2026_04_25.md`, an internal memo); full protocol regression not yet wired
Observational closure	Comparison of model output (column CO2, surface SF6 etc.) against an observational network (NOAA in-situ + TCCON / OCO satellite)	not started
Multi-month GPU production runs	The longest GPU validation run committed is 7 days. Multi-week stability has been spot-checked but not regression-tested.	committed test ceiling: 7-day; production target: ~30-day
Adjoint kernels	See Adjoint status. Tape + checkpoint + revolve (bisection variant) + four-scheme reverse pass + 4D-Var driver are on CI. Gaps: CMFMC convection adjoint, `copy_corners` reverse, optimal binomial Revolve, TM5-4DVAR cross-validation.	partial (shipped)

Floating-point tolerance practice

Tolerances vary by operation; the canonical sources:

Operation	F64 tolerance	F32 tolerance	Reference
Per-window replay gate	`1e-10`	`1e-4`	`src/MetDrivers/ReplayContinuity.jl::replay_tolerance(FT)`
Window-continuity verification (test variant)	`1e-12`	`1e-6`	`test/test_replay_consistency.jl:84`
Per-step uniform-tracer invariance (relative)	`1e-6`	`1e-6`	`test_advection_kernels.jl` (`@testset "uniform tracer"`)
4-step total mass conservation (gradient IC, structured grid)	`1e-12`	`5e-5`	`test_advection_kernels.jl` (`@testset "mass conservation"`)
CPU/GPU advection agreement	`4 * eps(FT)` (Upwind 1-step) / `16 * eps(FT)` (Upwind / Slopes / PPM 4-step)	same expressions, evaluated with `FT = Float32`	`test_advection_kernels.jl` (CUDA-gated `@testset "CPU-GPU agreement"`). `LinRoodPPMScheme` is covered only by the CS smoke test in `test_cubed_sphere_runtime.jl`.
Conservative regrid mass closure (script-level acceptance)	`≤ 1e-6` rel	same	`test/test_ll_to_cs_regrid_script.jl:175–178`
Cross-day GEOS chain continuity	near machine epsilon (`5.94e-16` measured, about 2.7 × `eps(Float64)`)	`~3.5e-7` measured (about 3 × `eps(Float32)`)	preprocessor stdout from `process_day`

The F64 tolerances reflect double-precision noise floors at production resolutions; F32 tolerances reflect single-precision accumulation. Production runs on the L40S GPU use F32 by default — the F32 noise floor is the operational tolerance.
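
To make the `eps(FT)`-scaled entries concrete: `eps(Float32)` is about `1.19e-7` and `eps(Float64)` about `2.22e-16`, so a 16 × `eps` bound is roughly `1.9e-6` in F32 and `3.6e-15` in F64. The sketch below shows one way such a bound could be applied to a pair of arrays. `agrees_within` is a hypothetical helper, and scaling by the largest magnitude in the reference array is an assumption; the test file defines how the comparison is actually normalised.

```julia
# Hypothetical helper: true when every element of `b` lies within
# `n_ulp * eps(FT)` of `a`, relative to the largest magnitude in `a`.
function agrees_within(a::AbstractArray{FT}, b::AbstractArray{FT},
                       n_ulp::Integer) where {FT<:AbstractFloat}
    scale = maximum(abs, a)
    scale == 0 && return all(iszero, b)   # degenerate all-zero reference
    return maximum(abs.(a .- b)) <= n_ulp * eps(FT) * scale
end

for FT in (Float32, Float64)
    reference = FT[1.0, 0.5, 0.25]

    # One representable step away from the reference: inside a 4-eps bound.
    close = copy(reference)
    close[1] = nextfloat(reference[1])

    # A relative error of 100 eps: outside a 16-eps bound.
    far = copy(reference)
    far[1] = reference[1] * (1 + 100 * eps(FT))

    println(FT, ": eps = ", eps(FT),
            ", close agrees = ", agrees_within(reference, close, 4),
            ", far agrees = ", agrees_within(reference, far, 16))
end
```

Because the bound scales with `eps(FT)`, the same test expression tightens automatically when the element type changes from `Float32` to `Float64`.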

What this means for users

If you are doing:

Advection algorithm research → verification is solid, F32 / F64 noise-floor agreement is well-tested. Proceed.
CO2 intercomparison studies that need GCHP-equivalent fidelity → the underlying operators are TM5-faithful or GCHP-style, but the end-to-end intercomparison report has not been written. Run a side-by-side comparison yourself; the run scripts in scripts/diagnostics/compare_* are the starting point.
Inverse modelling that needs an adjoint → the adjoint and 4D-Var stack ship on CS. See Adjoint status for the supported scheme matrix and the remaining gaps (CMFMC adjoint kernel, copy_corners reverse, TM5-4DVAR cross-validation).
Validation against observations → not in scope today; the forward model has the fidelity, but the observation-comparison diagnostics are external.
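
For the side-by-side route mentioned in the list above, a first pass usually reduces two model fields on the same grid to a few summary numbers. The sketch below is one hedged way to do that; `compare_fields` and its metric names are hypothetical and are not taken from the `scripts/diagnostics/compare_*` scripts. The weights `w` are assumed to be cell areas or air masses supplied by the caller.

```julia
# Hypothetical summary metrics for two fields `a` and `b` on the same grid,
# weighted by `w` (for example cell areas or air masses).
function compare_fields(a::AbstractArray, b::AbstractArray, w::AbstractArray)
    size(a) == size(b) == size(w) ||
        throw(DimensionMismatch("fields and weights must share a shape"))
    wsum = sum(w)
    d    = a .- b
    bias = sum(w .* d) / wsum                    # weighted mean difference
    rmse = sqrt(sum(w .* d .^ 2) / wsum)         # weighted root-mean-square difference
    rel_total = (sum(w .* a) - sum(w .* b)) / sum(w .* b)   # relative difference in weighted totals
    return (; bias, rmse, rel_total, max_abs = maximum(abs, d))
end

# Example with made-up numbers: `b` is `a` shifted by a constant offset.
a = [1.0, 2.0, 3.0]
b = a .+ 0.5
w = ones(3)
println(compare_fields(a, b, w))
```

A weighted bias and total-mass difference near rounding level point to agreement in the conserved quantity, while a large RMSE with a small bias points to differences in spatial structure.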

Where to read next

Adjoint status — what the README claims vs what actually ships.
Conservation budgets — the explicit @test assertions that anchor the verification claims above.
Phase 7: Configuration & Runtime — TOML schema for the run configs that drive the validation work above.
