ERA5 spectral path
The ERA5 spectral path takes ECMWF model-level vorticity (VO), divergence (D), and log surface pressure (LNSP) GRIB fields, synthesizes mass-conserving wind / mass-flux fields on a target grid via Holton-style continuity-consistent reconstruction, and writes a transport binary (format_version = 3).
All three target topologies (LL, RG, CS) run through the unified driver run_unified_preprocessor_day! with the source axis fixed to ERA5SpectralReader and the target axis chosen via [grid].type.
Required input
| File pattern | Contents | Hourly? |
|---|---|---|
era5_spectral_YYYYMMDD_lnsp.gb | log(surface pressure) spectral coefficients | yes |
era5_spectral_YYYYMMDD_vo_d.gb | vorticity + divergence spectral coefficients (137 model levels) | yes |
era5_thermo_ml_YYYYMMDD.nc | specific humidity (for dry-basis conversion + mass-fix) | yes |
The thermo file is mandatory if [output] mass_basis = "dry" (the default and only currently-tested binary basis for runtime consumption); without it, the dry-basis correction (1 − qv) cannot be applied. ECMWF distributes spectral data via CDS / MARS; the download script scripts/download_era5.py handles the CDS path.
The preprocessor prefers one extra day past the requested range so the last window of each day can read its forward-flux endpoints from the next day's hour-0 instantaneous fields. All three target paths fall back to a zero-tendency final-window closure when the next-day file is missing — the binary still builds, and replay continuity holds, but the very last window of the last day uses an implicit dm = 0 rather than a real next-day delta. As a rule of thumb, download [start, end+1] for production runs; quickstart- sized one-day jobs are fine without.
Targets
The spectral source supports all three target topologies; pick the one that matches your downstream run:
| Target | Topology-specific workspace |
|---|---|
LatLonTargetGeometry | LatLonSpectralWindowWorkspace (src/Preprocessing/transport_binary/latlon_spectral.jl) |
ReducedGaussianTargetGeometry | ReducedGaussianSpectralWindowWorkspace (src/Preprocessing/reduced_transport_helpers.jl) |
CubedSphereTargetGeometry | CubedSphereSpectralWindowWorkspace (src/Preprocessing/transport_binary/cubed_sphere_spectral.jl) |
All three workspaces subtype AbstractWindowWorkspace{G, FT} and are allocated once per day by allocate_window_workspace. The PreprocessorRunCache (allocated once per run) holds the shared expensive objects: the LL→CS conservative regridder, the RG compressed-Laplacian factorization. These persist across days within a single batch invocation.
The CS target adds a lat-lon staging step between spectral synthesis and CS panels — spectral-to-CS-direct is mathematically possible but the conservative regrid via an intermediate LL grid is cheaper and matches the LL/RG paths' precision. The staging resolution is configurable via staging_nlon, staging_nlat in the [grid] block; the defaults max(4·Nc, 360) × max(2·Nc + 1, 181) are what the LL bundle binaries use.
Per-window pipeline
For each of the day's 24 hourly windows:
- Spectral synthesis (
spectral_synthesis.jl::spectral_to_native_fields!)
Reconstruct LNSP → PS via inverse spectral transform.
Reconstruct VO + D → (u_stag, v_stag) via the standard IFS streamfunction / velocity-potential decomposition, then to mass fluxes (am, bm) directly from the Holton continuity formula.
137 native model levels are synthesized; the
level_top/level_botconfig window picks a slab.
- Vertical level merging (
vertical_coordinates.jl::merge_thin_levels,select_levels_echlevs)
The
[grid] echlevsmode (e.g."ml137_tropo34") selects a vertical-level recipe — typically a tropospheric subset merged to ~34 layers via the TM5 r1112 echlevs scheme.merge_min_thickness_Pa = 1000.0floors the merged layer thickness so no merged layer is thinner than ~1 km in the tropopause region.
- Mass-fix (
mass_support.jl::pin_global_mean_ps_using_qv!)
Optional, gated by
[mass_fix] enable = true.Computes a uniform PS offset per window so the global-mean dry surface pressure equals
target_ps_dry_pa.LL and CS targets use the hourly QV field for the offset.
RG targets use the constant
qv_global_climatologyscalar (the per-cell hourly QV is dry-converted later in the pipeline; the mass-fix step itself uses the climatology for cost reasons).
The preprocessor applies the offset in place while building the window — the binary's stored PS, mass, and fluxes are already pinned. The LL writer additionally records the per-window offsets in the binary header as
ps_offsets_pa_per_windowfor diagnostic traceability; the CS and RG writers carry onlymass_fix_enabled(and a handful of related metadata flags), no per-window offset vector. The runtime never re-applies the offsets in any case — there is nothing left to apply.Without
[mass_fix], raw ERA5 spectral analysis drifts by tens of Pa per window, which compounds to noticeable day-boundary tracer-mass jumps over a multi-week run.
- Dry-basis conversion (
mass_support.jl::apply_dry_basis_native!)
Multiplies cell-centered DELP and PS by
(1 − qv); multiplies face-staggered mass fluxes by face-averaged(1 − qv_face).Applied before the binary write; the binary is dry-basis end-to-end.
- Poisson balance — applied per target, with target-specific solver and finalisation:
LL:
apply_poisson_balance!(inlatlon_contracts.jl) uses an FFT-based Poisson solve over the periodic-longitude grid; followed byrecompute_cm_from_dm_target!so the explicit-dmclosure holds at write-time replay.RG: a CG-based solve over the compressed face Laplacian, streaming previous-window state forward; the per-day binary is built window-by-window rather than accumulated.
CS (spectral source): a CG-based solve on the per-panel cubed-sphere Laplacian; cm is then derived via
diagnose_cs_cm!rather thanrecompute_cm_from_dm_target!.All three converge to the same plan-39 dry-basis tolerance and satisfy
‖m_evolved − m_stored_{n+1}‖/‖m_{n+1}‖ ≤ replay_tolerance(FT)at the write-time gate.
- Write + replay gate —
verify_storage_continuity_*!re-runs the forward stepper on the just-written window and checks‖m_evolved − m_stored_{n+1}‖ / ‖m_stored_{n+1}‖ ≤ replay_tolerance(FT)(1e-10for Float64,1e-4for Float32). Failures throw rather than letting the bad binary land.
A worked LL preprocessing config
# Preprocesses 5° (72 × 37) ERA5 spectral → daily transport binary, F32.
# (config/preprocessing/era5_ll72x37_advresln_dec2021_f32.toml)
[input]
spectral_dir = "~/data/AtmosTransport/met/era5/0.5x0.5/spectral_hourly"
thermo_dir = "~/data/AtmosTransport/met/era5/0.5x0.5/physics"
coefficients = "config/era5_L137_coefficients.toml"
[output]
directory = "~/data/AtmosTransport/met/era5/ll72x37_advresln/transport_binary_v2_tropo34_dec2021_f32"
mass_basis = "dry"
[grid]
type = "latlon"
nlon = 72
nlat = 37
level_top = 1
level_bot = 137
echlevs = "ml137_tropo34"
merge_min_thickness_Pa = 1000.0
[numerics]
float_type = "Float32"
dt = 900.0
met_interval = 3600.0
[mass_fix]
enable = true
target_ps_dry_pa = 98726.0
qv_global_climatology = 0.00247The CS variant adds two [grid] keys (Nc, regridder_cache_dir, optionally staging_nlon / staging_nlat) and a different type = "cubed_sphere". See config/preprocessing/era5_cs_c24_transport_binary*.toml for a working example.
The synthetic-RG variant uses type = "synthetic_reduced_gaussian" + gaussian_number = N + nlon_mode = "octahedral"; see era5_o090_transport_binary.toml.
Performance notes
The slowest step is spectral synthesis: at LL 720×361 it's ~30 s per window F64 on a recent 8-core CPU, ~10 s F32. Multiplied by 24 windows × 30 days, a one-month F32 run is ~2 h.
The CS variant's conservative regrid dominates at high
Nc.PreprocessorRunCachebuilds the regridder once per batch run; the JLD2 cache (under~/.cache/AtmosTransport/cr_regridding/) makes the regridder persist across batch invocations.Replay-gate verification doubles the window's compute (one forward step per window). It is on by default; set the env var
ATMOSTR_NO_WRITE_REPLAY_CHECK=1to skip (not recommended for production binaries).All three target topologies (LL, RG, CS) run on F32 in production.
What's next
GEOS native cubed-sphere — the FV3-native path (no spectral synthesis, no Poisson balance; column-balanced fluxes with the raw next-hour dry endpoint).
Regridding — how the conservative weights are built and cached.
Conventions cheat sheet — units, tolerances, panel conventions.