Preprocessing API
For narrative coverage of the source × target dispatch model and per-source pipelines, see Preprocessing overview and the per-source pages ERA5 spectral path and GEOS native cubed-sphere.
AtmosTransport.Preprocessing.AbstractBinaryWriter Type
AbstractBinaryWriter{G <: AbstractTargetGeometry, FT,
Basis <: AbstractMassBasis}Typed nominal for the topology's streaming binary writer. The third type parameter encodes the on-disk mass-basis convention (reusing State.AbstractMassBasis so the same nominal flows through the runtime reader path) so a writer↔reader pairing mismatch is a compile-time MethodError.
P1 ships only the abstract type; concrete subtypes land in P2.
sourceAtmosTransport.Preprocessing.AbstractChainPolicy Type
AbstractChainPolicyType-system tag for cross-day mass-state carry. Concrete subtypes are NoChain (no carry; each day starts from the raw source endpoint) and ChainedMass{T} where T is the array shape of the seed (e.g. NTuple{6, Array{Float32, 3}} for GEOS cubed-sphere).
AtmosTransport.Preprocessing.AbstractFieldKind Type
AbstractFieldKindSingleton-type tag selecting the vertical-reduction rule apply_vertical! uses for one payload field. Concrete subtypes (all zero-size singletons) below.
AtmosTransport.Preprocessing.AbstractMetReader Type
AbstractMetReader{FT, S, CP}Typed met-reader nominal. Bundles a met-source's settings, per-day handle context, and chained-mass-policy carry into one struct that the unified preprocessor driver dispatches on. Type parameters:
FT <: AbstractFloat— preprocessing float type (Float32for GPU,Float64for diagnostic runs).S <: AbstractMetSettings— concrete settings type (GEOSITSettings,GEOSFPSettings,ERA5SpectralSettings, …).CP <: AbstractChainPolicy—NoChainorChainedMass{T}for some seed array typeT.
Concrete subtypes implement the seven trait functions below.
sourceAtmosTransport.Preprocessing.AbstractMetSettings Type
AbstractMetSettingsTop-level supertype for typed met-data source descriptors used by the preprocessor. Concrete subtypes (e.g. GEOSITSettings, MERRA2Settings, SpectralERA5Settings) carry source-specific paths and parameters and implement the read_window! / source_grid / windows_per_day interface.
process_day(date, grid::AbstractTargetGeometry, settings::AbstractMetSettings, vertical; ...) dispatches on settings to pick the reader.
AtmosTransport.Preprocessing.AbstractVerticalTransform Type
AbstractVerticalTransformTyped nominal selecting how native source levels are mapped to output levels. Concrete subtypes:
IdentityVertical— no merge.MergeByIndex— explicit native-level groups.MergeLayersThinnerThan— automatic local coarsening.MergeAbovePressure— upper-atmosphere coarsening.LevelSelection— echlevs-style level selection.PressureOverlap— pressure-thickness overlap onto a different hybrid coordinate.
A concrete transform T is consumed by plan_vertical(transform, native_vc) to produce a VerticalPlan{FT, T}. apply_vertical! dispatches on (plan, ::FieldKind) to select the right per-field rule.
AtmosTransport.Preprocessing.AbstractWindowContract Type
AbstractWindowContract{G <: AbstractTargetGeometry, FT}Typed nominal owning a topology's per-window gate policy and the worst-window accumulator state. Concrete subtypes:
CubedSphereContract{FT}(cubed_sphere_contracts.jl)LatLonContract{FT}(latlon_contracts.jl)ReducedGaussianContract{FT}(reduced_gaussian_contracts.jl)
A concrete contract validates its policy at construction time (so e.g. positivity_cfl_limit = 0.0 errors before any window is preprocessed) and exposes the four-method trait surface:
verify_window!(window, contract, win_idx) -> (replay, positivity)
update_accumulator!(contract, positivity_diag, win_idx) -> nothing
summarize_status!(contract; quarantine_path) -> nothingwindow is the topology-specific window payload (NamedTuple of typed buffers today, P2-typed ReadyWindow{G, FT} later).
Closes foot-gun (A) from DESIGN.md: contract knobs aren't drift-prone kwargs anymore — each topology constructs its own contract once from config, with whatever fields IT needs.
sourceAtmosTransport.Preprocessing.AbstractWindowWorkspace Type
AbstractWindowWorkspace{G <: AbstractTargetGeometry, FT}Typed nominal for the per-day target-shape workspace buffers. P1 ships only the abstract type; concrete subtypes land alongside the unified driver cutover in P2 (today's workspaces are NamedTuples constructed inside each topology's process_day orchestrator).
AtmosTransport.Preprocessing.ChainedMass Type
ChainedMass{T} <: AbstractChainPolicyThe reader carries an end-of-day mass field of shape T into the next day's open. The shape T is the seed array type (e.g. NTuple{6, Array{Float32, 3}}); the actual seed VALUE lives in the reader struct, but its STATIC type is encoded here so end_of_day_seed return-type is known at compile time.
AtmosTransport.Preprocessing.ConvectionInterfaceFlux Type
Convective interface flux (e.g. cmfmc). Same selection rule as PressureFluxField.
AtmosTransport.Preprocessing.ConvectionTendencyField Type
Convective center tendency (e.g. dtrain). Extensive at the layer; sum within merged group.
AtmosTransport.Preprocessing.CubedSphereContract Type
CubedSphereContract{FT} <: AbstractWindowContract{CubedSphereTargetGeometry, FT}Typed nominal owning a CS preprocessor's per-window gate policy and worst-window positivity accumulator.
Fields:
replay_tol::Float64— relative replay tolerance.positivity_cfl_limit::Float64— per-substep positivity CFL gate. Must satisfy0 < limit ≤ 1; validated at construction.require_substep_positivity::Bool— whethersummarize_status!errors (true) or warns (false) on a positivity violation.steps_per_window::Int— for the recommended-steps message in the summary's escape-hatch detail. Must be ≥ 1.halo_width::Int— passed through toverify_substep_positivity_cs!.worst::NamedTuple— mutable accumulator (initially zero).
Construct with explicit kwargs; defaults match the CS round-2/round-3 production policy.
sourceAtmosTransport.Preprocessing.ERA5PhysicsBinaryHeader Type
ERA5PhysicsBinaryHeaderIn-memory view of the ERA5 physics BIN header. Parsed from / serialized to a 4 KB JSON block at the start of each BIN file. Fields:
format_version— int, currently 1. Readers error on mismatch.date—Date, the calendar day the BIN represents (hours 00-23).Nlon, Nlat, Nlev, Nt— grid shape.Ntis always 24 for the hourly calendar-day BIN.var_offsets_bytes— NamedTuple mappingvar name → byte offsetinto the file (where that variable's payload starts, after the header block).var_nelems— NamedTuple mappingvar name → total element count.latitude_convention—:S_to_N(AtmosTransport orientation).longitude_range—(first, last, step)in degrees.latitude_range—(first, last, step)in degrees (S to N).provenance— Dict with source NC paths, timestamp, git sha.
AtmosTransport.Preprocessing.ERA5PhysicsBinaryReader Type
ERA5PhysicsBinaryReaderMmap view of one ERA5 physics BIN file. Per-variable getters return reshaped views with zero allocation on subsequent calls.
Fields
path— absolute BIN path.io— openIOStream(keep alive for mmap lifetime).header— parsedERA5PhysicsBinaryHeader.mmap— single flatVector{Float32}over the entire payload. Individual variable views reshape into this.
Usage
reader = open_era5_physics_binary(bin_dir, Date(2021, 12, 1))
try
udmf = get_era5_physics_field(reader, :udmf) # 4D (Nlon, Nlat, Nlev, 24)
slab = @view udmf[:, :, :, 5] # one hour
# ... use slab ...
finally
close_era5_physics_binary(reader)
endAtmosTransport.Preprocessing.ERA5SpectralReader Type
ERA5SpectralReader{FT, S} <: AbstractMetReader{FT, S, NoChain}Typed reader nominal for the ERA5 spectral path. ChainPolicy is fixed at NoChain because today's spectral path does not carry cross-day mass state (it pins global-mean ps per window instead). The per-window read still fuses with merge in today's process_window! and is deferred to P2.
AtmosTransport.Preprocessing.ERA5SpectralSettings Type
ERA5SpectralSettings <: AbstractMetSettingsTyped wrapper for the historical ERA5 spectral settings NamedTuple. The spectral preprocessor still performs spectral synthesis inside the topology workspaces, but the source axis is now explicit: TOML parsing constructs this settings type and topology methods dispatch on it.
AtmosTransport.Preprocessing.GEOSDayHandles Type
GEOSDayHandlesOpen NCDataset handles for one UTC day's GEOS collections plus the resolved level orientation and the hybrid coordinate (loaded once for endpoint-DELP reconstruction). The orchestrator opens these once at the start of process_day and closes them at the end.
AtmosTransport.Preprocessing.GEOSNativeReader Type
GEOSNativeReader{FT, S, CP, V, H} <: AbstractMetReader{FT, S, CP}Typed reader wrapping (GEOSSettings, handles) for one day, plus optional chained-mass state. Type parameters:
FT— preprocessing float type.S <: AbstractGEOSSettings— concrete settings (GEOSITSettings / GEOSFPSettings / …).CP <: AbstractChainPolicy—NoChainorChainedMass{T}.V— seed-field slot type.NothingforNoChain;Union{Nothing, NTuple{6, Array{FT, 3}}}forChainedMass. The V parameter is fixed at construction (Julia-style review round-1: value-dependent V was the type-instability foot-gun; we pin V to the unconditional Union on the chained path so the inner constructor specializes once).H— concrete handles type. Pinned totypeof(open_day(...))at construction soreader.handlesaccess is type-stable (replaces the previous:: Anyfield). Different GEOS flavors produce different handle structs (GEOSDayHandlesfor GEOS-IT,GEOSFPNativeDayHandlesfor GEOS-FP); each becomes its own concrete reader type via this parameter.
Constructor: open_reader(settings::AbstractGEOSSettings, date, FT; seed, next_day_handle). See the function docstring.
AtmosTransport.Preprocessing.GEOSSettings Type
GEOSSettings{flavor} <: AbstractGEOSSettingsSettings for one of the two supported GEOS flavors:
flavor = :geosit— GEOS-IT (file patternGEOSIT.{date}.{collection}.C{Nc}.nc).flavor = :geosfp— GEOS-FP (file patternGEOSFP.{date}.{collection}.C{Nc}.nc).
Auto-detection of level orientation runs at open_geos_day time when level_orientation = :auto. Set explicitly to :bottom_up or :top_down to skip the heuristic.
AtmosTransport.Preprocessing.IdentityVertical Type
No-op identity vertical transform. Nz_output = Nz_native, merge_map[k] = k.
AtmosTransport.Preprocessing.IntensiveCenterField Type
Center-level intensive field (e.g. T, Q). Mass-weighted mean within merged group; weights provided positionally.
AtmosTransport.Preprocessing.LatLonContract Type
LatLonContract{FT} <: AbstractWindowContract{LatLonTargetGeometry, FT}Typed nominal owning an LL preprocessor's per-window gate policy and worst-window positivity accumulator. Construction validates positivity_cfl_limit ∈ (0, 1] and steps_per_window ≥ 1 so an invalid TOML value fails before any window runs.
AtmosTransport.Preprocessing.LevelSelection Type
LevelSelection(echlevs)Typed wrapper for the existing select_levels_echlevs algorithm. echlevs is a vector of native level INTERFACE indices (0-based, bottom-up); levels between selected interfaces are summed. See vertical_coordinates.jl:64 and the ECHLEVS_ML137_* constants.
AtmosTransport.Preprocessing.MassField Type
Center-level extensive mass (e.g. delp, m). Sum native layers within each merged group.
AtmosTransport.Preprocessing.MassFluxField Type
Horizontal face mass flux (e.g. am, bm). Already integrated over the layer thickness; sum within merged group.
AtmosTransport.Preprocessing.MergeAbovePressure Type
MergeAbovePressure(pressure_Pa; target_min_thickness_Pa = Inf,
reference_surface_pressure_Pa = 101325.0)Upper-atmosphere coarsening: native layers whose midpoint pressure is LOWER than pressure_Pa (physically ABOVE the cutoff altitude) get greedily merged until each merged layer exceeds target_min_thickness_Pa. Below the cutoff, the native grid is preserved verbatim.
target_min_thickness_Pa = Inf merges every above-cutoff native layer into one top cap. The GEOS-IT L72 use case is pressure_Pa = 100.0 + target_min_thickness_Pa = 50.0 — merges the ~14 Pa mesospheric layers into ~50 Pa output layers while keeping the troposphere and stratosphere at native resolution.
AtmosTransport.Preprocessing.MergeByIndex Type
MergeByIndex(groups)Explicit native center-level groups. groups[l] is the UnitRange{Int} of native center levels merged into output level l. Validation at plan_vertical:
groups[1]starts at 1;groups[end]ends atNz_native;groups are contiguous (
groups[l+1].start == groups[l].stop + 1);each range is non-empty.
This is the most auditable transform for production reruns — the group list lives in the run TOML and is version-controlled.
sourceAtmosTransport.Preprocessing.MergeLayersThinnerThan Type
MergeLayersThinnerThan(min_thickness_Pa; reference_surface_pressure_Pa = 101325.0)Typed wrapper for the existing merge_thin_levels algorithm: greedily merge adjacent native layers until each output layer exceeds min_thickness_Pa at the reference surface pressure.
AtmosTransport.Preprocessing.NoChain Type
NoChain <: AbstractChainPolicyThe reader does not carry mass state across days. Day N starts from the raw source endpoint. end_of_day_seed(::reader{…, NoChain}) returns nothing (statically inferred).
AtmosTransport.Preprocessing.PreprocessorRunCache Type
PreprocessorRunCache{G, FT}()Small typed cache for artifacts that should be built once per preprocessing run instead of once per day/window (for example spectral LL->CS regridders or RG compressed Laplacians). P2b only introduces the nominal and storage; concrete drivers decide which keys they own as they migrate.
sourceAtmosTransport.Preprocessing.PressureFluxField Type
Vertical interface mass flux (e.g. cm). Interfaces are selected (not summed); top/bottom zeros preserved.
AtmosTransport.Preprocessing.PressureOverlap Type
PressureOverlap(target_coeff_path)Remap native layer integrals onto an independent target hybrid coordinate by pressure-thickness overlap. The target half-level coefficients are loaded from target_coeff_path. Full apply_vertical! implementation lands in P1 alongside the spectral driver cutover; plan_vertical constructs the plan today.
AtmosTransport.Preprocessing.PreverifiedWindow Type
PreverifiedWindow(ready::ReadyWindow, contract_diag; accumulated=false)Typed event wrapper for topology hooks that have already run verify_window! before handing a window to the generic driver. When accumulated=true, the hook also guarantees it has already folded the positivity diagnostic into the contract accumulator.
AtmosTransport.Preprocessing.RawWindow Type
RawWindow{FT, A2, A3}Per-window source-grid intermediate carrying both window endpoints (t_n and t_{n+1}) and the window-integrated horizontal mass fluxes between them.
The right endpoint of window n is the left endpoint of window n+1, so the reader caches it across calls. For the last window of the day, the right endpoint comes from the next day's first instantaneous file (the existing next_day_hour0 plumbing in the orchestrator).
Cell-center winds u, v (geographic frame) are filled only when the target grid differs from the source grid and fluxes must be reconstructed downstream. For native passthrough (source mesh == target mesh) u/v stay nothing and am/bm are written through directly after vertical merging.
cmfmc/dtrain are filled only when the source supports convection and the user has enabled it via settings.include_convection. vdiff carries optional Holtslag-Boville VDIFF source fields (u, v, t, qv) when the source can archive them into the transport binary for runtime diffusion.
AtmosTransport.Preprocessing.ReadyWindow Type
ReadyWindow{G, FT}(index, payload::NamedTuple)Typed wrapper for a topology-specific window payload that is ready to be verified and written. G is the target geometry, FT is the on-disk float type, and payload is the existing topology NamedTuple (m_cur/am/... for LL/CS, m_cur/hflux/... for RG).
Unknown property access is forwarded to payload, so existing verify_window!(window, contract, win_idx) methods can accept either a raw NamedTuple or a ReadyWindow.
AtmosTransport.Preprocessing.ReducedGaussianContract Type
ReducedGaussianContract{FT} <: AbstractWindowContract{ReducedGaussianTargetGeometry, FT}Typed nominal owning an RG preprocessor's per-window gate policy and worst-window positivity accumulator. Holds the face connectivity (face_left / face_right) so the per-window call site doesn't need to thread it through every call.
Construction validates replay_tol, positivity_cfl_limit ∈ (0, 1], steps_per_window ≥ 1, boundary_stub_tol ≥ 0, and length(face_left) == length(face_right).
boundary_stub_tol (default 0.0) is the absolute tolerance for the boundary-stub flux gate (verify_boundary_stub_flux_rg). Default is strict; tighten only with caution.
AtmosTransport.Preprocessing.SurfaceField Type
2D surface field (e.g. ps, pblh). No vertical reduction — identity passthrough.
AtmosTransport.Preprocessing.TM5PreprocessingWorkspace Type
TM5PreprocessingWorkspace{FT, R, B}Per-day scratch for the TM5 convection preprocessor. Holds the per-column vectors reused across all columns of an hour, plus the 4×(Nlon_src, Nlat_src, Nz_native) native-vertical buffers and the 4×(Nlon_src, Nlat_src, Nz) merged-vertical buffers on the ERA5 source grid. regridder is the conservative regridder from the ERA5 LL source mesh to the target mesh, or nothing when source and target are the same LL grid (identity fast-path).
physics_bufs::B holds optional FT-typed conversion buffers for the six physics inputs (udmf, ddmf, udrf, ddrf, t, q). When the workspace FT matches the physics BIN eltype (Float32 today), B === Nothing and the kernel reads the BIN views directly with zero copy. When FT differs (e.g. the F64 path), B === NTuple{6, Array{FT, 3}} and compute_tm5_merged_hour_on_source! upcasts hour views into them in-place — alloc once per workspace, reused across all 24 hours per day.
AtmosTransport.Preprocessing.TracerMassField Type
Center-level extensive tracer mass (e.g. qv-mass). Sum native layers within each merged group.
AtmosTransport.Preprocessing.UnifiedPreprocessorDay Type
UnifiedPreprocessorDay(reader, workspace, contract, writer; context=nothing)Bundle the four typed axes a unified preprocessing day needs. context is an opaque topology/source adapter payload used by hook methods during migration.
AtmosTransport.Preprocessing.VerticalPlan Type
VerticalPlan{FT, T <: AbstractVerticalTransform}Result of plan_vertical. Type-parameterized by the originating transform so the apply_vertical! dispatch is statically resolved.
Fields:
transform: the originatingAbstractVerticalTransformvalue.native_vc: the input native hybrid coordinate.merged_vc: the output hybrid coordinate.merge_map:Vector{Int}such thatmerge_map[k_native]is the output-level index for merge-map flavors.Int[]forPressureOverlap(uses overlap coefficients instead).groups:Vector{UnitRange{Int}}of native center-level ranges that map to each output level (derived frommerge_mapfor the merge-map flavors).Nz_output: the output level count.Nz_native: the input level count (cached for cheap access).
AtmosTransport.Preprocessing.TM5CleanupStats Method
TM5CleanupStats() -> NamedTuple of Ref{Int} countersDiagnostic counters bumped by ec2tm_from_rates!. Nothing-overhead when the function runs without stats (pass nothing); when stats are passed, each counter increments once per level/column that hit the corresponding cleanup branch.
columns_processed— total columns the function was called on.no_updraft— columns with no level satisfyingudmf > 0(after small-value clipping). entu/detu zeroed out.no_downdraft— columns with no level satisfyingddmf < 0.levels_udmf_clipped,levels_ddmf_clipped— levels where half-level mass flux magnitude was below 1e-6 kg/m²/s and got zeroed.levels_udrf_clipped,levels_ddrf_clipped— full-level detrainment rates zeroed (|rate| < 1e-10 kg/m³/s).levels_entu_neg,levels_detu_neg,levels_entd_neg,levels_detd_neg— levels where the indicated output went negative and got fixed via symmetric redistribution with its complementary rate.
Interpretation
On clean data we expect ~0% clipped levels and ~0 no-updraft columns (outside pure stratospheric columns). O(1%) redistribution firings are normal TM5 behaviour. O(50%) firings indicate a data pathology (wrong param IDs, wrong stream, bad units).
sourceAtmosTransport.Preprocessing.allocate_raw_window Function
allocate_raw_window(settings::AbstractMetSettings;
FT::Type, Nc=nothing, Nz=nothing) -> RawWindowAllocate a pre-zeroed RawWindow sized for one window of settings's source data. The orchestrator calls this once before the per-window loop and reuses the same buffer across all windows in a day. The shape parameters (Nc, Nz, …) are source-specific — concrete subtypes pick the keys they need.
AtmosTransport.Preprocessing.allocate_raw_window Method
allocate_raw_window(settings::GEOSSettings; FT, Nz) -> RawWindowPre-allocate a per-window workspace for the GEOS reader: 6 zero-filled panel arrays each for m, m_next, qv, qv_next, am, bm (shape (Nc, Nc, Nz)) and ps, ps_next (shape (Nc, Nc)).
When settings.include_convection, also allocates cmfmc (interfaces, shape (Nc, Nc, Nz + 1) per panel) and dtrain (centers, shape (Nc, Nc, Nz) per panel). Cross-topology winds (u, v) stay nothing here — they are produced by the orchestrator only when the target grid differs from the source.
AtmosTransport.Preprocessing.allocate_tm5_workspace Method
allocate_tm5_workspace(Nlon_src, Nlat_src, Nz_native, Nz, FT;
regridder=nothing, target_nlon=Nlon_src,
target_nlat=Nlat_src) -> TM5PreprocessingWorkspaceAllocate the TM5 preprocessing workspace. Pass regridder=nothing for identity (source and target grids match). Otherwise pass a ConservativeRegridding.Regridder built via build_regridder(source_mesh, target_mesh) from src/Regridding/.
AtmosTransport.Preprocessing.allocate_window_workspace Function
allocate_window_workspace(args...; kwargs...)Construct the topology-specific AbstractWindowWorkspace{G, FT} for one preprocessing day. Concrete methods land as production drivers migrate.
AtmosTransport.Preprocessing.apply_vertical! Function
apply_vertical!(buf_out, buf_in, plan::VerticalPlan, kind::AbstractFieldKind, args...)Apply the vertical transform to one window of buf_in, writing the result into buf_out. Dispatches on the (plan.transform, kind) combination:
Extensive center fields (
MassField,TracerMassField,MassFluxField,ConvectionTendencyField) sum native layers within each output group.Interface fields (
PressureFluxField,ConvectionInterfaceFlux) select the kept half-level interfaces.IntensiveCenterFieldtakes an additional positionalweightsargument (native mass-per-layer); produces the mass-weighted mean within each output group.SurfaceFieldis a passthrough copy (no vertical reduction).
buf_out and buf_in must be 3D arrays with the vertical axis on dim 3 (or 2D for SurfaceField).
AtmosTransport.Preprocessing.build_target_geometry Method
build_target_geometry(cfg_grid, FT=Float64) -> AbstractTargetGeometryDispatch configuration-driven target-geometry construction from the user-facing grid.type string.
AtmosTransport.Preprocessing.build_target_geometry Method
build_target_geometry(::Val{:cubed_sphere}, cfg_grid, FT)Build a cubed-sphere target geometry from the [grid] config section.
Required keys:
Nc :: Int— cells per panel edge
Optional keys:
definition—"equiangular_gnomonic"(legacy synthetic) or"gmao"(GEOS-IT/GEOS-FP equal-distance gnomonic)panel_conventionorconvention—"gnomonic"(default) or"geos_native". Ifdefinitionis omitted,"geos_native"selects"gmao"and"gnomonic"selects"equiangular_gnomonic".regridder_cache_dir— directory for CR.jl weight cache (default~/.cache/AtmosTransport/cr_regridding)staging_nlon,staging_nlat— override the internal LL staging grid size (defaults:max(4Nc, 360)×max(2Nc+1, 181))
AtmosTransport.Preprocessing.build_target_geometry Method
build_target_geometry(::Val{:era5_native_reduced_gaussian}, cfg_grid, FT)Build the reduced-Gaussian geometry metadata from a native ERA5 GRIB file. This is currently intended for geometry discovery and future native-grid preprocessing work rather than the active lat-lon synthesis path.
sourceAtmosTransport.Preprocessing.build_target_geometry Method
build_target_geometry(::Val{:latlon}, cfg_grid, FT) -> LatLonTargetGeometryBuild the regular lat-lon target geometry used by the current v4 spectral preprocessor.
sourceAtmosTransport.Preprocessing.build_target_geometry Method
build_target_geometry(::Val{:synthetic_reduced_gaussian}, cfg_grid, FT)Build a reduced-Gaussian target geometry from scratch without needing a source GRIB file. cfg_grid["gaussian_number"] controls the truncation N so the mesh has 2N Gauss-Legendre latitude rings. cfg_grid["nlon_mode"] selects the longitude layout:
"regular"(default): every ring has4Ncells — the classical "regular reduced" Gaussian grid used e.g. in the TL159/N80 family."octahedral": ECMWF octahedral layoutnlon = 4k + 16per hemisphere, mirrored between the two hemispheres withk = 1at the pole-adjacent ring.
Mesh latitudes are produced from FastGaussQuadrature.gausslegendre(2N), which returns ascending Gauss-Legendre nodes on (-1, 1). geometry_source_grib is left as an empty string to mark the grid as synthetic.
AtmosTransport.Preprocessing.close_day! Function
close_day!(ctx)Close all resources held by a day context. Must be idempotent; safe to call from a finally block.
AtmosTransport.Preprocessing.close_era5_physics_binary Method
close_era5_physics_binary(reader) -> nothingRelease the mmap and close the underlying IOStream.
AtmosTransport.Preprocessing.close_reader! Function
close_reader!(reader::AbstractMetReader) → nothingClose all per-day file handles and release scratch held by the reader. Idempotent — safe to call from a finally block. Concrete readers implement.
AtmosTransport.Preprocessing.close_streaming_binary! Function
close_streaming_binary!(writer)Close a streaming binary writer. Implementations should be idempotent.
sourceAtmosTransport.Preprocessing.compute_tm5_merged_hour_on_source! Method
compute_tm5_merged_hour_on_source!(ws, reader, hour, ps_hour, ak_full, bk_full,
Nz_native, merge_map; stats)Phase 1 of the per-hour TM5 step. Reads native-L137 ERA5 physics fields for hour from the mmap-backed reader, runs the TM5 math column-by-column into ws.*_native, then collapses to merged Nz into ws.*_merged_src. All work is on the ERA5-native horizontal grid (720×361 for standard physics BINs).
This path requires source and target grids to match (Nx, Ny) == (720, 361). PS comes from the caller (typically the preprocessor's transform.sp after spectral synthesis) because the physics BIN does not carry PS.
stats::Union{Nothing, NamedTuple} is the TM5CleanupStats bundle; when non-nothing, counters accumulate across all columns of all hours processed.
Shape guards:
readerfields shape(Nlon_src, Nlat_src, Nz_native, Nt)— must matchws.entu_native's leading dims.ps_hourshape(Nlon_src, Nlat_src).merge_maplengthNz_native.
AtmosTransport.Preprocessing.contract_cfl_limit Function
contract_cfl_limit(contract::AbstractWindowContract) -> Float64Per-substep positivity CFL gate the contract enforces.
sourceAtmosTransport.Preprocessing.contract_replay_tolerance Function
contract_replay_tolerance(contract::AbstractWindowContract) -> Float64Relative replay tolerance the contract's verify_window! uses. Concrete contracts return their stored policy value.
AtmosTransport.Preprocessing.contract_require_positivity Function
contract_require_positivity(contract::AbstractWindowContract) -> BoolWhether summarize_status! errors (true) or warns (false) on a positivity violation. Closes-the-escape-hatch toggle from CS round-2.
AtmosTransport.Preprocessing.convert_era5_physics_nc_to_bin Method
convert_era5_physics_nc_to_bin(nc_dir, bin_dir, date;
force_rewrite=false, verbose=true) -> bin_pathBuild one calendar-day ERA5 physics BIN from the archive NCs.
nc_dir: directory containingera5_convection_YYYYMMDD.nc+era5_thermo_ml_YYYYMMDD.nc.bin_dir: output staging directory; BIN lands at<bin_dir>/<YYYY>/era5_physics_<YYYYMMDD>.bin.date: calendar day (00:00-23:00) the BIN represents.force_rewrite: whenfalseand the BIN already exists with a valid header, skip (idempotent). Whentrue, overwrite.verbose: log progress (defaulttrue).
Requires the convection NC for both date - 1 day and date (because convection is forecast-based and a calendar-day BIN needs hours 00-06 from the previous day's file and hours 07-23 from the current day's file). Raises if either is missing.
The thermo NC is calendar-day aligned so only the target-date file is needed.
Writes the BIN atomically: first to <name>.bin.tmp, then renames. This is safe to run concurrently with a reader: the rename step is an atomic fs operation on a single filesystem.
AtmosTransport.Preprocessing.detect_level_orientation Method
detect_level_orientation(ctm_a1::NCDataset) -> SymbolReturn :bottom_up if k=1 is the surface (mass-thicker) and :top_down if k=1 is TOA. Heuristic is unambiguous: surface DELP is O(1000 Pa), TOA DELP is O(1 Pa).
AtmosTransport.Preprocessing.drain_ready_windows! Function
drain_ready_windows!(workspace) -> iterator of `ReadyWindow`Return the windows that became write-ready after the last ingest.
sourceAtmosTransport.Preprocessing.driver_after_write_window! Method
driver_after_write_window!(workspace, reader, ready, context)Post-write migration hook. GEOS-native CS uses this to advance its chained endpoint state; most topologies do nothing.
sourceAtmosTransport.Preprocessing.driver_drain_ready_windows! Method
driver_drain_ready_windows!(workspace, contract, win, context)Return ready windows produced by the most recent ingest. A hook may return a single ReadyWindow, a single PreverifiedWindow, or any iterator of either shape.
AtmosTransport.Preprocessing.driver_flush_final_windows! Method
driver_flush_final_windows!(workspace, reader, contract, context)Return final ready windows after all source windows have been ingested.
sourceAtmosTransport.Preprocessing.driver_ingest_window! Method
driver_ingest_window!(workspace, reader, win, context)Migration hook for ingesting one source window into the target workspace. Topology/source adapters may override while legacy workspace signatures are being collapsed.
sourceAtmosTransport.Preprocessing.driver_windows_per_day Method
driver_windows_per_day(reader, context) -> IntMigration hook for source readers whose window count needs adapter context.
sourceAtmosTransport.Preprocessing.dz_hydrostatic_constT! Method
dz_hydrostatic_constT!(dz, ps, ak, bk, Nz; T_ref=260) -> dzConstant-temperature hydrostatic layer thickness — fallback for use when T and Q are unavailable. Matches main's Julia port's shortcut (T_ref = 260 K). Biases entu/detu magnitudes by ~10-20% vs the virtual-temperature version; dz_hydrostatic_virtual! is preferred when T + Q are downloaded together with the convection fields.
AtmosTransport.Preprocessing.dz_hydrostatic_virtual! Method
dz_hydrostatic_virtual!(dz, T_col, Q_col, ps, ak, bk, Nz) -> dzCompute layer thickness dz[1:Nz] (m) at layer centers from a single column's temperature T_col[1:Nz] (K) and specific humidity Q_col[1:Nz] (kg/kg), plus surface pressure ps (Pa) and the hybrid-sigma coefficients ak, bk (length Nz+1).
Uses the hydrostatic approximation with virtual temperature:
p_top[k] = ak[k] + bk[k] * ps (Pa, higher-altitude side)
p_bot[k] = ak[k+1] + bk[k+1] * ps (Pa, lower-altitude side)
dp[k] = p_bot[k] - p_top[k] (> 0 in AtmosTransport orientation)
p_mid[k] = 0.5 · (p_top[k] + p_bot[k])
T_v[k] = T_col[k] · (1 + 0.608 · Q_col[k])
dz[k] = R · T_v[k] / g · dp[k] / p_mid[k]Orientation: AtmosTransport (k=1=TOA, k=Nz=surface). ak/bk are the full (Nz+1)-length ERA5 L137 hybrid coefficients.
For the TOA half-level we fall back to an ordinary scale-height estimate (T_v=T_top, p_mid=p_bot) when p_top → 0. In practice the top-level dz is never in the convection window so the approximation is irrelevant; it's a guard against divide-by-zero.
AtmosTransport.Preprocessing.ec2tm! Method
ec2tm!(entu, detu, entd, detd,
mflu_ec, mfld_ec, detu_ec, detd_ec) -> (entu, detu, entd, detd)Convert ECMWF convective mass-flux fields into TM5 (entu, detu, entd, detd) layer-center fields. All inputs / outputs in AtmosTransport orientation (k=1=TOA, k=Nz=surface). Operates in place on the four output arrays (no allocations).
Shapes:
entu, detu, entd, detd:(..., Nz)— layer centers.mflu_ec, mfld_ec:(..., Nz+1)— half levels (interfaces). Interfacekis the TOP of layerk; interfaceNz+1is the surface boundary.detu_ec, detd_ec:(..., Nz)— layer centers.
The leading dimensions are arbitrary (scalar, (Nx, Ny), (ncells,), or per-panel (Nc, Nc)) as long as the arrays all have consistent shape. This function is backend-agnostic pure Julia; call from the preprocessor (CPU) ahead of binary writeout.
Negative small values (<= 0) in detu_ec / detd_ec are clipped to zero, following the ECMWF-rounding-artifact clean-up documented in phys_convec_ec2tm.F90 (ECMWF diagnostics can produce ~-1e-19 values from rounding).
For every location, computes:
k = 1: entu[1] = detu[1] (updraft starts in layer 1)
entd[1] = detd[1] (no flux at TOA)
k ∈ 2:Nz: entu[k] = detu[k] + mflu_ec[k+1] - mflu_ec[k]
entd[k] = detd[k] - mfld_ec[k] + mfld_ec[k-1]
(where mfld_ec sign-flipped internally)Wait — above formula is written in TM5 convention (k=1=surface). Let me re-write in AtmosTransport orientation explicitly.
AtmosTransport orientation (k=1=TOA):
Layer k has interface
kat its TOP (higher altitude side) and interfacek+1at its BOTTOM (lower altitude side).Updraft flows upward: from interface
k+1into layerkvia entrainmententu[k], out of layerkvia interfacek. Continuity:mflu_out − mflu_in + detu − entu = 0⟹entu[k] = detu[k] + mflu_ec[k] − mflu_ec[k+1]wheremflu_ec[k]is the flux through interface k (above layer k) andmflu_ec[k+1]is the flux through interface k+1 (below layer k). Convention:mflu_ec >= 0(positive upward).Downdraft flows downward: from interface
kinto layerkvia entrainmententd[k], out through interfacek+1via detrainment. Sign convention in ECMWF:mfdo <= 0(negative). We definemfdo_abs = -mfdo_ec >= 0, thenentd[k] = detd[k] + mfdo_abs[k+1] − mfdo_abs[k].
Boundary conditions:
mflu_ec[1] = 0(no updraft above TOA).mflu_ec[Nz+1] = 0(no updraft into ground from below — the surface is the sink).mfld_ec[1] = 0(no downdraft above TOA).mfld_ec[Nz+1] = 0(no downdraft escapes the surface).
These boundaries are typically already zero in ECMWF output; we enforce explicitly by construction below.
sourceAtmosTransport.Preprocessing.ec2tm_from_rates! Method
ec2tm_from_rates!(entu, detu, entd, detd,
udmf, ddmf, udrf_rate, ddrf_rate,
dz, Nz; stats=nothing) -> nothingColumn-level port of TM5's ECconv_to_TMconv (see deps/tm5/base/src/phys_convec_ec2tm.F90:87-237). Fills the output arrays entu, detu, entd, detd (kg/m²/s at layer centers) in place from raw ERA5 physics inputs. All arrays use AtmosTransport orientation (k=1=TOA, k=Nz=surface).
Inputs
udmf::AbstractVector{FT}, lengthNz+1— updraft mass flux at half levels (kg/m²/s). Half levelkis the interface at the TOP of layerk;udmf[Nz+1]is the surface interface. Must be ≥ 0.ddmf::AbstractVector{FT}, lengthNz+1— downdraft mass flux at half levels, ECMWF convention (≤ 0).udrf_rate::AbstractVector{FT}, lengthNz— updraft detrainment RATE at layer centers (kg/m³/s).ddrf_rate::AbstractVector{FT}, lengthNz— downdraft detrainment RATE at layer centers (kg/m³/s).dz::AbstractVector{FT}, lengthNz— layer thickness (m), positive. Compute fromdz_hydrostatic_virtual!.Nz::Int— number of full levels.stats::Union{Nothing, NamedTuple}— cleanup-stats counters fromTM5CleanupStats(). Whennothing, no counter work.
Algorithm (line-for-line match to F90 lines 132-236)
Copy-with-clipping (F90 L132-144).
|udmf| < 1e-6 → 0,|ddmf| < 1e-6 → 0(applied asddmf > -1e-6 → 0),udrf_rate < 1e-10 → 0,ddrf_rate < 1e-10 → 0.dz integration (F90 L146-151):
detu = udrf × dz(kg/m³/s → kg/m²/s). Same for detd.uptop/dotop search (F90 L153-173): find the first level from TOA (
k=1..Nz) with nonzero flux. If none, zero out everything in that direction.Mass-budget closure (F90 L175-212): for each active direction,
entu[k] = udmf[k] - udmf[k+1] + detu[k]fromuptopdown. Aboveuptop-1stays zero.Symmetric negative redistribution (F90 L214-232): if
entu[k] < 0, add-entu[k]todetu[k]and zeroentu[k]. Same fordetu<0(adds to entu), and the same two for downdraft.
Mass conservation
Within the cloud window, the sum entu[k] - detu[k] + (udmf[k] - udmf[k+1]) should be zero per layer — equivalent to the mass-budget closure at step 4. Negative redistribution (step 5) preserves this sum because it only SWAPS between entu↔detu (or entd↔detd) without changing the net entu - detu.
AtmosTransport.Preprocessing.end_of_day_seed Function
end_of_day_seed(reader::AbstractMetReader) → seed_or_nothingReturn the seed value to thread into the next day's open_reader(..., seed = ...). Type-system guarantees:
end_of_day_seed(::AbstractMetReader{FT, S, NoChain})returnsnothing(statically inferred).end_of_day_seed(::AbstractMetReader{FT, S, ChainedMass{T}})returns a value of typeT(or throws if the reader has not yet produced the end-of-day endpoint).
Closes foot-gun (D).
sourceAtmosTransport.Preprocessing.endpoint_dry_mass! Method
endpoint_dry_mass!(delp_dry, ps_dry, ps_total, qv, vc) -> (delp_dry, ps_dry)Reconstruct dry DELP and dry PS at one endpoint hour from the moist PS and QV provided by GEOS, using the hybrid coordinate vc. The output is on top-down level convention (k=1 = TOA).
Algorithm:
DELP_full[k] = ΔA[k] + ΔB[k] * PS_total
DELP_dry[k] = (1 - QV[k]) * DELP_full[k]
PS_dry = Σ DELP_dry[k]In-place: writes into delp_dry and ps_dry.
AtmosTransport.Preprocessing.endpoint_dry_mass Method
Allocate (delp_dry, ps_dry) for one panel and run endpoint_dry_mass!.
AtmosTransport.Preprocessing.flush_final_windows! Function
flush_final_windows!(workspace, args...; kwargs...)Emit any final cross-day/zero-tendency windows once the source stream is exhausted.
sourceAtmosTransport.Preprocessing.geos_collection_path Method
geos_collection_path(settings::GEOSITSettings, date::Date, collection) -> StringResolve the on-disk path of one GEOS-IT collection for date. Searches a flat root_dir and the per-day root_dir/YYYYMMDD/ layout.
AtmosTransport.Preprocessing.get_era5_physics_field Method
get_era5_physics_field(reader, var::Symbol) -> Array viewReturn a zero-allocation reshaped view into the mmap for var ∈ (:udmf, :ddmf, :udrf_rate, :ddrf_rate, :t, :q). Shape (Nlon, Nlat, Nlev, Nt).
AtmosTransport.Preprocessing.has_convection Method
has_convection(settings::AbstractMetSettings) -> BoolWhether this source can populate RawWindow.cmfmc and RawWindow.dtrain. Defaults to false; sources that support convection override and gate the actual output behind a user flag (e.g. settings.include_convection).
AtmosTransport.Preprocessing.ingest_window! Function
ingest_window!(workspace, args...; kwargs...) -> nothingConsume one source/met window into the topology workspace. Ready windows are exposed by drain_ready_windows!.
AtmosTransport.Preprocessing.init_cs_positivity_accumulator Method
init_cs_positivity_accumulator() -> NamedTupleZero-valued state for accumulating the worst per-window positivity diagnostic across a preprocessing loop. Pair with update_cs_positivity_accumulator!.
AtmosTransport.Preprocessing.init_ll_positivity_accumulator Method
init_ll_positivity_accumulator() -> NamedTupleZero-valued state for accumulating the worst per-window LL positivity diagnostic across a preprocessing loop. Pair with update_ll_positivity_accumulator.
AtmosTransport.Preprocessing.init_rg_positivity_accumulator Method
init_rg_positivity_accumulator() -> NamedTupleZero-valued state for accumulating the worst per-window RG positivity diagnostic across a preprocessing loop.
sourceAtmosTransport.Preprocessing.load_hybrid_coefficients Method
load_hybrid_coefficients(coeff_path::String) -> HybridSigmaPressureLoad all hybrid sigma-pressure interface coefficients from a TOML file. Unlike load_era5_vertical_coordinate, this does not slice — useful for sources whose level count comes from the file rather than a config knob (e.g. GEOS-72, MERRA-2, native ERA5 L137 without sub-tropo selection).
AtmosTransport.Preprocessing.load_met_settings Method
load_met_settings(toml_path::String; root_dir, kwargs...) -> AbstractMetSettingsConstruct a typed met-source descriptor from toml_path. The TOML's [source].name key picks the concrete settings type. The [preprocessing] table (if present) supplies defaults for level_orientation, mass_flux_dt_seconds, include_surface, include_convection, and include_vdiff_fields; explicit keyword arguments override.
root_dir is the on-disk directory holding the source's daily NetCDF files (e.g. ~/data/AtmosTransport/met/geosit/C180/raw_catrine).
AtmosTransport.Preprocessing.log_tm5_cleanup_stats Method
log_tm5_cleanup_stats(stats, date_str)Pretty-print the per-day TM5 cleanup counters produced by TM5CleanupStats(). Zero-valued counters are omitted to keep the output compact; a fully-clean day yields a single line.
AtmosTransport.Preprocessing.mass_basis_from_symbol Method
mass_basis_from_symbol(s::Symbol) -> AbstractMassBasisConstruct the matching basis singleton from a header Symbol. Throws ArgumentError for unknown values.
AtmosTransport.Preprocessing.mass_basis_symbol Method
mass_basis_symbol(::AbstractMassBasis) -> SymbolMap a basis singleton (State.DryBasis / State.MoistBasis) to the on-disk binary-header value (:dry / :moist). Inverse of mass_basis_from_symbol.
AtmosTransport.Preprocessing.merge_tm5_field_3d! Method
merge_tm5_field_3d!(merged, native, merge_map)Accumulate a native-level TM5 field (Nlon, Nlat, Nz_native) onto merged output levels (Nlon, Nlat, Nz_merged) using the native-to- merged level map. Matches the semantics of merge_cell_field! for mass/flux fields: sum native layers that map to the same merged layer.
For TM5 entrainment/detrainment fluxes (kg/m²/s), summing over a consolidated layer preserves the column-integrated mass budget exactly.
sourceAtmosTransport.Preprocessing.native_vertical Function
native_vertical(reader::AbstractMetReader) → HybridSigmaPressure{FT_v}Native vertical coordinate of the source data (NOT the merged/output coordinate). The vertical-transform axis (P0b) consumes this to plan its mapping.
sourceAtmosTransport.Preprocessing.open_day Function
open_day(settings::AbstractMetSettings, date::Date; ...) -> ctxOpen per-day source-specific context (file handles, caches, vertical coefficients, …). The orchestrator calls this once per day at the start of process_day, threads ctx through every read_window! call, and calls close_day!(ctx) in a finally block.
ctx is opaque to the orchestrator — only the source knows its layout.
AtmosTransport.Preprocessing.open_day Method
open_day(settings::GEOSSettings, date::Date; next_day_handle=true) -> GEOSDayHandlesCanonical-contract alias for open_geos_day. The orchestrator calls this once per day and threads the returned handles through every per-window read_window!.
AtmosTransport.Preprocessing.open_era5_physics_binary Method
open_era5_physics_binary(bin_dir, date) -> ERA5PhysicsBinaryReaderOpen the BIN for date under bin_dir (with YYYY subdir), mmap the payload, parse the header. Caller must close_era5_physics_binary when done (or wrap in a try/finally).
AtmosTransport.Preprocessing.open_geos_day Method
open_geos_day(settings, date; next_day_handle=true) -> GEOSDayHandlesOpen per-collection NCDataset handles for date. When next_day_handle is true and the next-day CTM_I1 file exists, opens it too so the last window of date has its right endpoint available.
AtmosTransport.Preprocessing.open_reader Function
open_reader(settings::AbstractMetSettings, date::Date, ::Type{FT};
seed = nothing, next_day_handle::Bool = true)
→ reader::AbstractMetReader{FT, typeof(settings), CP}Construct a typed met-reader for one calendar day. The reader owns the underlying day-handle context (file handles, vertical coefficients, chained-mass seed) and exposes the per-window trait surface.
seed carries cross-day mass state for chained-mass readers; pass the return value of end_of_day_seed(prev_day_reader). For NoChain readers, seed must be nothing.
next_day_handle controls whether the reader opens a handle into the next day's first-hour instantaneous file. Required when the day produces endpoint mass for the last window (the standard case for hourly sources).
Dispatch by settings type to select the concrete reader; concrete readers register their own method.
AtmosTransport.Preprocessing.open_reader Method
open_reader(settings::ERA5SpectralSettings, date::Date, ::Type{FT};
seed = nothing, next_day_handle::Bool = true)P0a: opens the spectral day's GRIB and caches the typed nominal. seed/next_day_handle accepted for signature parity with the GEOS constructor; the spectral path's "next-day endpoint" is handled by the existing next_day_hour0 helper in configuration.jl:349 until P2.
AtmosTransport.Preprocessing.plan_vertical Function
plan_vertical(transform::AbstractVerticalTransform,
native_vc::HybridSigmaPressure{FT}) → VerticalPlan{FT, typeof(transform)}Materialize the planned vertical-coordinate mapping for one day's run. Called once per day (or once per run if native_vc is invariant); the returned plan is reused across all windows.
AtmosTransport.Preprocessing.process_day Method
process_day(date::Date, grid::AbstractTargetGeometry, settings, vertical; next_day_hour0=nothing)Topology-specific daily transport-binary preprocessor extension point.
Concrete target geometries implement this method with ordinary Julia multiple dispatch:
LatLonTargetGeometrywrites structured directional LL binaries.ReducedGaussianTargetGeometrywrites face-indexed RG binaries.CubedSphereTargetGeometrywrites panel-local CS binaries.
Every implementation must satisfy the same transport contract:
use explicit forward endpoint mass targets for every window, including the final cross-day window when
next_day_hour0is available;write declared payload semantics, including
delta_semantics;run a write-time replay check unless explicitly disabled for diagnostics;
produce binaries that the runtime driver can load with replay validation.
This fallback rejects unsupported source/target pairs after config parsing has already produced an AbstractTargetGeometry.
AtmosTransport.Preprocessing.process_day Method
process_day(date, grid::CubedSphereTargetGeometry,
settings::AbstractGEOSSettings, vertical;
out_path,
dt_met_seconds = 3600.0,
FT = Float64,
mass_basis = :dry,
replay_tol = replay_tolerance(FT),
seed_m = nothing,
next_day_hour0 = nothing,
chain_mass = true) -> NamedTupleBuild a v4 cubed-sphere transport binary at out_path from one UTC day of native GEOS data. Source mesh and target mesh must match (CS passthrough).
Stored mass targets the raw GEOS dry endpoint (DELP_dry) transformed to the output vertical grid. The native horizontal fluxes are column-balanced to that endpoint, then cm is diagnosed so the replay and positivity contracts are checked against the same endpoint the runtime will see.
For multi-day preprocessing with chain_mass = true, seed_m carries the raw endpoint from the previous day so adjacent daily binaries share a boundary mass: pass nothing (default) on day 1 to seed from raw GEOS DELP_dry, and on day N+1 pass the final_mreturned by day N'sprocess_day. Withchain_mass = false,seed_m is ignored and every window reinitializes from raw GEOS mass.
When chain_mass = true, the returned NamedTuple includes final_m::NTuple{6, Array{FT, 3}}, the raw-endpoint state at the END of the last window. With chain_mass = false, final_m is nothing.
next_day_hour0 is part of the inherited topology-dispatch contract but unused — the GEOS reader handles next-day endpoints internally via next_ctm_i1.
AtmosTransport.Preprocessing.process_day Method
process_day(date, grid::CubedSphereTargetGeometry, settings, vertical; ...)Spectral→CS transport binary: spectral synthesis to an internal LL staging grid, conservative regridding to CS panels, endpoint continuity closure, and streaming binary write. No on-disk LL intermediate.
sourceAtmosTransport.Preprocessing.process_day Method
process_day(date, grid::LatLonTargetGeometry, settings, vertical;
next_day_hour0=nothing, positivity_cfl_limit=0.95,
require_substep_positivity=true)Run the full one-day preprocessing workflow for the structured lat-lon target: read spectral input, process all windows, close continuity against forward mass endpoints, and write the final binary.
sourceAtmosTransport.Preprocessing.process_day Method
process_day(date, grid::ReducedGaussianTargetGeometry, settings, vertical;
next_day_hour0=nothing, positivity_cfl_limit=0.95,
require_substep_positivity=true)Streaming one-day preprocessing for reduced-Gaussian targets.
Uses a 2-window sliding buffer: at any time only two windows' worth of (m, hflux, cm, ps) are held in memory. Each window is Poisson-balanced and written to disk before the next pair is computed. This reduces peak memory from O(Nt) to O(1) and enables O160/O320 binary generation.
Pipeline per window: spectral synthesis → mass fix → level merge → (wait for next window) → Poisson balance using (m_cur, m_next) → cm recomputation → stream-write
sourceAtmosTransport.Preprocessing.process_day Method
process_day(cfg::Dict; day_override=nothing, start_date=nothing, end_date=nothing)Top-level TOML-driven preprocessor entry. Detects source type from cfg:
[source].toml = "config/met_sources/<source>.toml"→ typedAbstractMetSettingspath, supports cross-day state carry (e.g. GEOS pressure-fixer chained mass) and--start/--enddate ranges.otherwise → typed ERA5 spectral config path (
[input].spectral_dir).
Both paths converge on process_day(date, grid::AbstractTargetGeometry, settings, vertical; ...) for the per-day work. There is no parallel GEOS-only or per-source CLI — new sources plug in via AbstractMetSettings + load_met_settings.
AtmosTransport.Preprocessing.promote_streaming_binary! Function
promote_streaming_binary!(writer)Close and promote a staged binary file into its final path.
sourceAtmosTransport.Preprocessing.quarantine_streaming_binary! Function
quarantine_streaming_binary!(writer)Close and remove a staged binary file after a failed validation pass.
sourceAtmosTransport.Preprocessing.read_window! Function
read_window!(raw::RawWindow, settings::AbstractMetSettings, ctx,
date::Date, win_idx::Int) -> rawFill raw in place with one window of source data on the source's native grid. ctx is the day context returned by open_day. Implementations must NOT allocate per call beyond bounded scratch — raw is a pre-allocated workspace owned by the orchestrator — and should be idempotent in (date, win_idx).
AtmosTransport.Preprocessing.read_window! Method
read_window!(raw, settings, handles, date, win_idx) -> raw
Fill `raw` in place with one window of GEOS data on the source CS grid.
Both endpoints (t_n, t_{n+1}) carry dry DELP + dry PS reconstructed from
PS_total via the hybrid coordinate, plus the original QV. Dynamics-step
`am`/`bm` are MFXC/MFYC scaled by `1/mass_flux_dt`.The signature matches the canonical AbstractMetSettings contract declared in met_sources.jl::read_window!.
AtmosTransport.Preprocessing.regrid_ll_binary_to_cs Method
regrid_ll_binary_to_cs(ll_binary_path, cs_grid, out_path; FT=Float64, mass_basis=nothing)Regrid an existing LL transport binary to a cubed-sphere binary.
Reads each window from the LL binary, recovers cell-center winds from am/bm, conservatively regrids m/ps/winds to CS panels, rotates winds to panel-local coordinates, reconstructs CS face fluxes, closes continuity against forward mass endpoints, and stream-writes the CS binary.
This reuses the entire CS regrid/continuity/write infrastructure from the spectral→CS path — the only difference is the data source (binary reader instead of spectral synthesis).
Timestep metadata (dt_met_seconds, steps_per_window) is read directly from the source header by default. Passing steps_per_window overrides the output substep count: LL winds are recovered with the source scaling, then CS face fluxes are reconstructed with the output scaling.
Keyword arguments
FT::Type = Float64— on-disk float type for the output CS binary.mass_basis::Union{Nothing, Symbol} = nothing— output mass-basis label.nothing(default) = match the source. Setting this to a value that differs from the source'smass_basiscurrently errors: actual dry↔moist conversion requires loading the source'sqvand applyingapply_dry_basis_native!, which this function does not do. Invariant 14 mandates:dryend-to-end; use a dry source.steps_per_window::Union{Nothing, Integer} = nothing— output substep count.nothing= match the source; larger values reduce stored per-substep fluxes while preserving the window-integrated transport.allow_terminal_zero_tendency::Bool = false— diagnostic-only escape hatch for legacy LL sources that do not carrydm. Production-safe regrids should leave this atfalseso the final CS window is closed against an explicit endpoint target instead of an inferred zero-tendency fallback.run_cache = nothing— optionalPreprocessorRunCacheused to reuse the LL→CS conservative regridder across calls in the same preprocessing run.
AtmosTransport.Preprocessing.regrid_transport_binary Method
regrid_transport_binary(input_path, target_grid, out_path; kwargs...)Generic transport-binary regrid extension point. Concrete target/source combinations implement methods here instead of adding new topology-specific scripts. The currently implemented production pair is LL transport binary to cubed sphere.
sourceAtmosTransport.Preprocessing.reset_workspace! Function
reset_workspace!(workspace, day_state) -> workspaceReset a reusable workspace before ingesting a new day/source stream.
sourceAtmosTransport.Preprocessing.resolve_tm5_convection_settings Method
resolve_tm5_convection_settings(cfg) -> NamedTupleParse the optional [tm5_convection] section. When enable=true the preprocessor reads ERA5 physics binaries (built by convert_era5_physics_nc_to_bin, plan 24 Commit 2), computes TM5 entu/detu/entd/detd per hour via tm5_native_fields_for_hour! (plan 24 Commit 3), merges to the transport Nz, conservatively regrids to the target horizontal grid, and writes the four TM5 sections into the transport binary.
Fields:
tm5_convection_enable :: Bool— master switch.tm5_physics_bin_dir :: String— NVMe directory holdingera5_physics_YYYYMMDD.binfiles produced by Commit 2's converter. Empty when disabled.
AtmosTransport.Preprocessing.run_unified_preprocessor_day! Method
run_unified_preprocessor_day!(day::UnifiedPreprocessorDay; close_reader=true)Execute the additive unified-driver lifecycle for one day. The function is generic over concrete reader/workspace/contract/writer types and depends on hook methods for topology-specific ingest/drain/flush behavior.
The writer is closed before summarize_status!, so fatal positivity summaries can quarantine a closed staging file. Any exception before promotion closes and quarantines the writer, then closes the reader.
AtmosTransport.Preprocessing.set_end_of_day_seed! Method
set_end_of_day_seed!(reader::GEOSNativeReader, seed) → readerSet the end-of-day mass seed produced by the orchestrator's last window. Called once per day at the end of process_day. For NoChain readers this is a no-op.
AtmosTransport.Preprocessing.source_grid Function
source_grid(settings::AbstractMetSettings)Return the source grid mesh that read_window! produces data on. Used by the orchestrator to build the appropriate regridder for the target grid (or to detect the source==target passthrough case).
AtmosTransport.Preprocessing.source_grid Method
source_grid(settings::GEOSSettings) -> CubedSphereMeshThe native source mesh GEOS data is archived on (Nc × Nc per panel, GEOS-native panel convention).
AtmosTransport.Preprocessing.summarize_cs_positivity_status Method
summarize_cs_positivity_status(worst; cfl_limit, steps_per_window,
require_substep_positivity = true,
quarantine_path = nothing)Post-loop summary helper. Logs the worst-window outcome, and if it exceeds cfl_limit:
deletes
quarantine_path(if given) so a downstream consumer cannot pick up the half-written binary;errors when
require_substep_positivity = true, otherwise warns.
The error message includes a recommended steps_per_window value that would satisfy the gate, computed from the observed worst ratio.
AtmosTransport.Preprocessing.summarize_ll_positivity_status Method
summarize_ll_positivity_status(worst; cfl_limit, steps_per_window,
require_substep_positivity = true,
quarantine_path = nothing)Post-loop summary helper for the LL positivity accumulator. Logs the worst-window outcome, and if it exceeds cfl_limit:
deletes
quarantine_path(if given) so a downstream consumer cannot pick up the half-written binary;errors when
require_substep_positivity = true, otherwise warns.
The error/warn message includes a recommended steps_per_window value that would satisfy the gate, computed from the observed worst ratio. The "no representable rescue" branch from CS round-3 is mirrored here.
AtmosTransport.Preprocessing.summarize_rg_positivity_status Method
summarize_rg_positivity_status(worst; cfl_limit, steps_per_window,
require_substep_positivity = true,
quarantine_path = nothing)Post-loop summary helper for the RG positivity accumulator. Mirrors summarize_cs_positivity_status (CS round-2 + round-3 escape-hatch semantics).
AtmosTransport.Preprocessing.summarize_status! Function
summarize_status!(contract::AbstractWindowContract;
quarantine_path::Union{Nothing, AbstractString} = nothing)
-> nothingPost-loop summary helper. Logs the worst-window outcome; if the contract's policy demands and the worst exceeds the gate, deletes quarantine_path (if given) and errors. Otherwise warns when the gate is violated but the policy is set to record-and-continue.
AtmosTransport.Preprocessing.summarize_status! Method
summarize_status!(contract::CubedSphereContract;
quarantine_path::Union{Nothing, AbstractString} = nothing)Run the CS positivity post-loop summary using the contract's worst accumulator and policy fields. May log, warn, or error depending on the accumulator state and require_substep_positivity.
AtmosTransport.Preprocessing.target_summary Method
target_summary(grid) -> StringReturn a short human-readable summary of the configured target grid.
sourceAtmosTransport.Preprocessing.tm5_copy_or_regrid_ll! Method
tm5_copy_or_regrid_ll!(dst_3d, ws_field, ws)LL phase-2 helper: copy (identity, when shapes match) or regrid (conservative) a single source-grid merged TM5 field into the per-window target array dst_3d of shape (Nx, Ny, Nz).
AtmosTransport.Preprocessing.tm5_native_fields_for_hour! Method
tm5_native_fields_for_hour!(entu, detu, entd, detd,
udmf_hour, ddmf_hour, udrf_hour, ddrf_hour,
t_hour, q_hour, ps_hour,
ak_full, bk_full, Nz_native;
stats=nothing, scratch=nothing) -> nothingGrid-level entry point: for each column (i, j) in the 2D grid, compute dz from (T, Q, ps) then call ec2tm_from_rates!. Writes results into the 3D (Nlon, Nlat, Nz_native) output arrays.
All input 3D arrays are native-level (137 layers for ERA5); 2D ps_hour is surface pressure in Pa. stats counters are bumped across all columns. scratch is an optional 4-tuple of length-Nz_native vectors to avoid per-column allocation; when nothing, fresh ones are allocated inside.
AtmosTransport.Preprocessing.update_accumulator! Function
update_accumulator!(contract::AbstractWindowContract,
positivity_diag, win_idx::Int) -> nothingFold one window's positivity diagnostic into the contract's worst- window accumulator. Concrete contracts mutate their internal state and return nothing. Idempotent if positivity_diag.ratio does not exceed the current worst.
AtmosTransport.Preprocessing.update_accumulator! Method
update_accumulator!(contract::CubedSphereContract, positivity_diag, win_idx::Int)Fold one window's positivity diagnostic into the CS contract's worst-window accumulator. Mutates contract.worst.
AtmosTransport.Preprocessing.update_cs_positivity_accumulator Method
update_cs_positivity_accumulator(worst, diag, win_idx) -> NamedTupleReturn an updated accumulator from a fresh per-window diagnostic.
sourceAtmosTransport.Preprocessing.update_ll_positivity_accumulator Method
update_ll_positivity_accumulator(worst, diag, win_idx) -> NamedTupleReturn an updated LL accumulator from a fresh per-window diagnostic.
sourceAtmosTransport.Preprocessing.update_rg_positivity_accumulator Method
update_rg_positivity_accumulator(worst, diag, win_idx) -> NamedTupleReturn an updated RG accumulator from a fresh per-window diagnostic.
sourceAtmosTransport.Preprocessing.verify_boundary_stub_flux_rg Method
verify_boundary_stub_flux_rg(hflux, face_left, face_right;
tol = 0.0) -> NamedTupleExplicit-invariant scan: any non-zero hflux value on a boundary stub (face_left ≤ 0 or face_right ≤ 0) is a contract violation. The runtime advection silently discards such fluxes (StrangSplitting.jl:279), so a writer that produces them is emitting data the runtime cannot apply — almost always a sign-flip or boundary- masking bug in preprocessing.
Returns (violated, worst_flux, worst_face, worst_level):
violated :: Bool—trueiff any |flux| > tol on a boundary stub.worst_flux :: Float64— signed value of the worst-magnitude violation, or0.0if none.worst_face :: Int— face index of the worst violation, or0.worst_level :: Int— k-index of the worst violation, or0.
tol is the absolute tolerance below which a "near-zero" stub flux is permitted. Default 0.0 (strict) — RG writers should explicitly zero boundary stubs.
AtmosTransport.Preprocessing.verify_cs_window_contract! Method
verify_cs_window_contract!(m_cur, am, bm, cm, m_next, steps_per_window, win_idx;
replay_tol, positivity_cfl_limit = 0.95, halo_width = 0)Single canonical per-window CS binary contract check. Runs the replay gate (verify_write_replay_cs!, errors on failure) followed by the per-substep positivity scan (verify_substep_positivity_cs, returns a diagnostic). Every CS-producing preprocessor (spectral, regrid, GEOS-native) should call this so no path can silently skip a gate.
Returns (; replay, positivity) with both diagnostics. Positivity is non-fatal here — callers aggregate the worst window and pass it to summarize_cs_positivity_status after the loop, where the run-level require_substep_positivity policy decides whether to error or warn.
AtmosTransport.Preprocessing.verify_ll_window_contract! Method
verify_ll_window_contract!(m_cur, am, bm, cm, m_next, steps_per_window, win_idx;
replay_tol, positivity_cfl_limit = 0.95,
div_scratch = nothing)Single canonical per-window LL binary contract check. Runs the replay gate (errors on failure) followed by the per-substep positivity scan (verify_substep_positivity_ll!, returns a diagnostic).
div_scratch may be pre-allocated by the caller (workspace-owned scratch from P2) to suppress the per-window Array{Float64} the default-allocating verify_window_continuity_ll would otherwise produce. Default nothing → allocate locally.
Returns (; replay, positivity). Positivity is non-fatal here — callers aggregate the worst window across the loop and pass it to summarize_ll_positivity_status where the run-level require_substep_positivity policy decides whether to error or warn.
AtmosTransport.Preprocessing.verify_rg_window_contract! Method
verify_rg_window_contract!(m_cur, hflux, cm, m_next, face_left, face_right,
steps_per_window, win_idx;
replay_tol, positivity_cfl_limit = 0.95,
div_scratch = nothing,
outgoing_h = nothing, bad_h = nothing,
boundary_stub_tol = 0.0)Single canonical per-window RG binary contract check. Runs three gates in order:
Boundary-stub flux gate — errors hard if any boundary stub (
face_left ≤ 0/face_right ≤ 0) carries non-zerohfluxaboveboundary_stub_tol. Norequire_*escape hatch: such fluxes are silently discarded by the runtime (StrangSplitting.jl:279), so emitting them is always a writer bug.Replay gate —
verify_window_continuity_rg; errors on failure.Per-substep positivity scan —
verify_substep_positivity_rg!, returns a non-fatal diagnostic; the run-level accumulator +summarize_rg_positivity_statusdecides fatal-vs-warn.
div_scratch, outgoing_h, bad_h may be pre-allocated by the caller (workspace-owned scratch from P2) to suppress per-window allocation. Default nothing → allocate locally.
Returns (; replay, positivity). Boundary-stub failure does not return; it errors out before the replay gate so a broken writer cannot silently emit a binary the runtime would partially evaluate.
AtmosTransport.Preprocessing.verify_substep_positivity_cs! Method
verify_substep_positivity_cs!(m, am, bm, cm; cfl_limit = 0.95,
halo_width = 0, m_next = nothing)Verify the per-substep horizontal+vertical positivity contract that the runtime's _cs_static_subcycle_count depends on. For every interior cell on every panel:
The cell air mass itself must be positive (
m > 0). A non-positive cell mass is an immediate contract violation — the runtime divides bymand would produceInforNaNin the CFL scan. Such a cell is reported withratio = Infregardless of flux magnitude.The combined Strang-palindrome outgoing budget
2 * (out_x + out_y + out_z)must not exceedcfl_limit * m_ref, wherem_ref = min(m, m_next)whenm_nextis supplied by a caller that wants endpoint tightening, andm_ref = motherwise. The factor of 2 is required because the CS runtime applies the direction sequenceX-Y-Z-Z-Y-Xfor every met substep.
Returns a NamedTuple (direction, ratio, location, ok):
direction :: Union{Symbol, Nothing}— the dominant contributor among:x,:y,:z, ornothingwhen no cell was inspected.ratio :: Float64— worst observed palindrome outgoing budget over the reference mass, orInfif any inspected cell had invalid mass or flux.location :: NTuple{4, Int}—(panel, i, j, k)of the worst cell.ok :: Bool—trueiffratio <= cfl_limit.
The replay gate (verify_write_replay_cs!) only checks endpoint continuity. A binary that drives a cell mass negative mid-sweep can still pass replay because the cell re-fills from inflow before the window ends — but the runtime cannot recover. This gate is the actual contract the runtime depends on.
halo_width defaults to 0 (panel arrays are stored unhaloed at preprocess time); pass > 0 to scan only the interior of a haloed buffer.
AtmosTransport.Preprocessing.verify_substep_positivity_ll! Method
verify_substep_positivity_ll!(m, am, bm, cm; cfl_limit = 0.95)Per-substep horizontal+vertical positivity scan for a structured LL window. Mirrors verify_substep_positivity_cs! but operates on a single LL window (no panel dimension).
For every cell (i, j, k):
m > 0. A non-positive cell mass is reported withratio = Infand short-circuits this cell's CFL ratios; the runtime divides bymand wouldInf/NaNotherwise.Outgoing mass per substep, per direction, ≤
cfl_limit * m.
NaN/Inf cell mass and NaN/Inf fluxes are flagged as ratio = Inf (see CS round-2 fix in cubed_sphere_contracts.jl).
Returns (direction, ratio, location, ok) with:
direction :: Union{Symbol, Nothing}—:x/:y/:z/nothing(no inspection).ratio :: Float64— worstoutgoing / m.location :: NTuple{3, Int}—(i, j, k).ok :: Bool—ratio ≤ cfl_limit.
AtmosTransport.Preprocessing.verify_substep_positivity_rg! Method
verify_substep_positivity_rg!(m, hflux, cm, face_left, face_right;
cfl_limit = 0.95,
outgoing_h = nothing, bad_h = nothing)Per-substep horizontal+vertical positivity scan for a face-indexed RG window. Mirrors verify_substep_positivity_cs! / ..._ll! but operates on the face-indexed RG mass-flux representation.
For every cell (c, k):
m > 0. A non-positive cell mass is reported withratio = Inf.Horizontal outgoing mass per substep ≤
cfl_limit * m. Only interior faces (face_left > 0 && face_right > 0) contribute, matching the runtime advection inStrangSplitting.jl:279. Boundary stubs are not counted as outflow here — seeverify_boundary_stub_flux_rgfor the separate "non-zero flux on a boundary stub" invariant.Vertical outgoing mass per substep ≤
cfl_limit * m. Same as CS / LL:max(0, -cm[c, k]) + max(0, cm[c, k+1]).
NaN/Inf cell mass and NaN/Inf fluxes are flagged as ratio = Inf (matches the CS round-2 fix).
outgoing_h and bad_h can be passed in as workspace-owned scratch to suppress per-window allocation once P2 wires this into the unified driver. Default nothing → allocate locally.
Returns (direction, ratio, location, ok) with:
direction :: Union{Symbol, Nothing}—:h/:z/nothing.ratio :: Float64— worstoutgoing / m.location :: NTuple{2, Int}—(cell, level).ok :: Bool—ratio ≤ cfl_limit.
AtmosTransport.Preprocessing.verify_window! Function
verify_window!(window, contract::AbstractWindowContract, win_idx::Int)
-> (; replay, positivity)Run the contract's replay and positivity gates for one window. The replay gate throws on violation; the positivity gate is non-fatal at this layer (the run-level accumulator + summarize_status! decides fatal-vs-warn based on require_substep_positivity).
window is the topology's per-window payload (a NamedTuple in P1, a typed ReadyWindow{G, FT} in P2).
AtmosTransport.Preprocessing.verify_window! Method
verify_window!(window, contract::CubedSphereContract, win_idx::Int)
-> (; replay, positivity)Run the per-window CS contract on a NamedTuple window with fields m_cur, am, bm, cm, m_next (each a 6-tuple of panel arrays). Delegates to verify_cs_window_contract!; the replay gate throws on violation, the positivity gate is non-fatal here.
AtmosTransport.Preprocessing.verify_write_replay_cs! Method
verify_write_replay_cs!(m_cur, am, bm, cm, m_next, steps_per_window, tol_rel, win_idx;
div_scratch = nothing)Run the CS write-time replay gate for one window and return its diagnostic.
The check integrates the stored panel-local fluxes from m_cur under the runtime palindrome-continuity contract and verifies that the result matches the explicit endpoint m_next. A failure here means the binary would produce a runtime day-boundary or window-boundary mass inconsistency.
div_scratch may be pre-allocated by the caller (workspace-owned scratch from P2 / contract-owned lazy scratch from P1) so the gate doesn't allocate the panel-shared div_h per call. Default nothing → allocate locally. Shape must match size(m_cur[1]).
AtmosTransport.Preprocessing.window_metadata Function
window_metadata(reader::AbstractMetReader) → NamedTuplePer-source window timing metadata. Standard fields:
windows::Int—windows_per_day(reader).substeps::Int— sub-windows per write window (e.g. GEOS'sdt_met_seconds ÷ mass_flux_dt).dt_substep::Float64— substep wall-clock in seconds.
Concrete readers may add source-specific keys (e.g. GEOS's mass_flux_dt).
AtmosTransport.Preprocessing.windows_per_day Function
windows_per_day(settings::AbstractMetSettings, date::Date) -> IntNumber of preprocessing windows per UTC day for this source on date. For most sources this is constant (24 for hourly), but date-dependent sources (e.g. a leap-second day) may override.
AtmosTransport.Preprocessing.write_window! Function
write_window!(writer, ready)Write one validated ready window through a topology-specific binary writer. Concrete methods are topology-indexed by AbstractBinaryWriter and ReadyWindow so writer/window mismatches fail by dispatch.
AtmosTransport.Preprocessing.write_window! Method
write_window!(io, win_idx, storage, settings, merged, last_hour_next) -> Int64Write one window's payload blocks to the output stream in v4 on-disk order.
source