
Output schema

The runtime writes NetCDF4 snapshot files to the location given by [output] path in the run config. With split = "single" it writes one file per run; with split = "daily" it writes one file per daily binary. This page documents the exact variable layout, dimensions, units, and per-topology conventions, so that downstream tools (Python / Julia / NCO / CDO) can consume the output without consulting the writer source.

The writer entry point is write_snapshot_netcdf in src/Output/netcdf_writer.jl (line 81), which dispatches on the runtime mesh type to one of three per-topology writers.

The variable list is controlled by [output.fields]; by default every field below is written. Setting layers = "none" suppresses the per-level tracer VMR variables, while layers = "selected" writes the same variable names on the lev_selected dimension instead. Setting tracers = [...] restricts all tracer diagnostics to that subset, with optional per-tracer overrides in [output.fields.per_tracer.<name>].
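The [output.fields] rules above can be sketched as a small resolver that predicts which lat-lon tracer variables a run will write. This is a hypothetical, pure-Python illustration rather than the writer's code. The function name and the dict-shaped input (for example, the [output.fields] table as returned by a TOML parser) are assumptions. It also assumes that a missing layers key means "all levels", and it does not model per-tracer overrides or air_mass_layers.

```python
def tracer_variables(all_tracers, fields=None):
    """Hypothetical resolver for the lat-lon tracer variables a run would write.

    all_tracers: tracer names from the [tracers.<name>] blocks, in config order.
    fields: the [output.fields] table as a plain dict (assumed shape), e.g.
            {"layers": "selected", "tracers": ["co2"]}.
    Returns {variable_name: vertical_dimension}, with None for 2-D variables.
    """
    fields = fields or {}
    layers = fields.get("layers")              # assumption: absent => every level
    subset = set(fields.get("tracers", all_tracers))

    out = {}
    for name in all_tracers:
        if name not in subset:                 # tracers = [...] filters all diagnostics
            continue
        if layers != "none":                   # "none" drops the per-level VMR field
            out[name] = "lev_selected" if layers == "selected" else "lev"
        out[f"{name}_column_mean"] = None
        out[f"{name}_column_mass_per_area"] = None
    return out
```

For example, tracer_variables(["co2", "ch4"], {"layers": "none", "tracers": ["co2"]}) keeps only co2_column_mean and co2_column_mass_per_area.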

Global attributes

Every snapshot file carries a CF-style global header set by _define_common_attributes! in src/Output/netcdf_schema.jl:

| Attribute | Value |
|---|---|
| Conventions | "CF-1.8" |
| title | "AtmosTransport runtime snapshot" |
| source | "AtmosTransport.jl" |
| institution | ENV["ATMOSTR_INSTITUTION"] if set, else "Caltech / Frankenberg group" |
| grid | summary(mesh) string (e.g. "72×37 LatLonMesh{Float32}") |
| grid_type | "latlon" / "reduced_gaussian" / "cubed_sphere" |
| mass_basis | "dry" or "moist" (matches state.air_mass) |
| output_contract | version tag for the schema |
| creation_date | ISO-8601 UTC timestamp of the run |
| framework | "AtmosTransport.jl" |
| framework_commit | git SHA of the source tree at run time (or "unknown") |
| framework_dirty | "clean" or "dirty" (uncommitted-changes flag) |
| runtime | Julia + backend string (e.g. "julia 1.10.5 / CUDA 12.4") |
| hostname | Base.Libc.gethostname() at run start |
| user | $USER (or $USERNAME on Windows; "unknown" if neither is set) |
| output_options | float_type=…, deflate_level=…, shuffle=… (only present when writer options are passed) |
| history | CF-canonical chain; the writer prepends "<creation_date>: written by AtmosTransport.Output (commit <sha>[+dirty]) with N frame(s)" |

Every value is best-effort: non-git checkouts get framework_commit = "unknown", and environments where neither USER nor USERNAME is set get user = "unknown". No attribute is required at read time, but apart from output_options (present only when writer options are passed) every key is written unconditionally, so downstream tooling can rely on it being present.
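Because the writer prepends a fixed-format entry to history, a consumer can recover the commit, dirty flag, and frame count from a file even after other tools have appended their own entries. The following is a minimal pure-Python sketch. It assumes that history entries are newline-separated, which is the usual CF practice, and that the entry text matches the format in the table exactly. The helper name parse_history is hypothetical.

```python
import re

# Entry prepended by the writer, per the documented format:
# "<creation_date>: written by AtmosTransport.Output (commit <sha>[+dirty]) with N frame(s)"
# The timestamp format is not pinned down here, so everything before
# ": written by" is captured verbatim.
ENTRY = re.compile(
    r"^(?P<date>.+?): written by AtmosTransport\.Output "
    r"\(commit (?P<sha>[^)+\s]+)(?P<dirty>\+dirty)?\) "
    r"with (?P<frames>\d+) frame\(s\)$"
)


def parse_history(history: str):
    """Return one dict per AtmosTransport entry in a CF `history` string.

    Assumes newline-separated entries; lines written by other tools are skipped.
    """
    entries = []
    for line in history.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            entries.append({
                "date": m["date"],
                "commit": m["sha"],                 # may be "unknown"
                "dirty": m["dirty"] is not None,    # "+dirty" suffix present
                "frames": int(m["frames"]),
            })
    return entries
```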

Lat-lon snapshot

Dimensions:

| Dim | Length |
|---|---|
| lon | Nx (cell centers) |
| lat | Ny |
| lev | Nz (positive = "down": lev[1] is TOA, lev[end] is surface) |
| time | one entry per configured output time that actually fired |
| lev_selected | only present when [output.fields] layers = "selected" or air_mass_layers = "selected" |

Coordinate variables:

| Variable | Shape | Units (writer string) |
|---|---|---|
| lon | (lon,) | degrees_east |
| lat | (lat,) | degrees_north |
| lon_bounds | (lon, nv) (nv = 2) | degrees_east |
| lat_bounds | (lat, nv) (nv = 2) | degrees_north |
| cell_area | (lon, lat) | m2 |
| time | (time,) | hours since 2000-01-01 00:00:00 |
| lev | (lev,) | 1 (dimensionless level index; positive = "down") |
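The time coordinate stores offsets from a fixed epoch. The sketch below decodes them with the standard library only. It assumes a standard Gregorian calendar and the "<unit> since YYYY-MM-DD HH:MM:SS" units layout shown in the table. The helper decode_time is hypothetical.

```python
from datetime import datetime, timedelta


def decode_time(values, units="hours since 2000-01-01 00:00:00"):
    """Decode CF-style `time` values into naive datetimes (interpreted as UTC).

    Assumes a Gregorian calendar and a "<unit> since YYYY-MM-DD HH:MM:SS"
    units string, where <unit> is a timedelta keyword (days, hours, minutes, seconds).
    """
    unit, sep, ref = units.partition(" since ")
    if not sep:
        raise ValueError(f"not a CF time units string: {units!r}")
    epoch = datetime.strptime(ref.strip(), "%Y-%m-%d %H:%M:%S")
    return [epoch + timedelta(**{unit.strip(): float(v)}) for v in values]
```

In practice, netCDF4.num2date (or cftime.num2date) performs this conversion and also honours the variable's calendar attribute. The sketch only makes the epoch arithmetic explicit.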

Per-topology mass diagnostics (always written):

| Variable | Shape | Units (writer string) | Meaning |
|---|---|---|---|
| air_mass | (lon, lat, lev, time) | kg | per-cell air mass on mass_basis |
| air_mass_per_area | (lon, lat, lev, time) | kg m-2 | layer mass divided by cell_area |
| column_air_mass_per_area | (lon, lat, time) | kg m-2 | column total divided by cell_area |
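The two per-area diagnostics follow directly from air_mass and cell_area as described in the table, so they can be recomputed to sanity-check a file. Below is a minimal sketch for a single horizontal cell, using plain Python lists in place of arrays. The helper name is hypothetical.

```python
def mass_diagnostics(air_mass, cell_area):
    """Recompute the per-area mass diagnostics for one horizontal column.

    air_mass:  per-level air mass of the cell in kg (lev order: TOA first)
    cell_area: horizontal cell area in m2
    Returns (air_mass_per_area per level, column_air_mass_per_area), both kg m-2.
    """
    if cell_area <= 0:
        raise ValueError("cell_area must be positive")
    per_area = [m / cell_area for m in air_mass]   # layer mass / cell_area
    column = sum(air_mass) / cell_area             # column total / cell_area
    return per_area, column
```

As a physical check, a moist-basis column_air_mass_per_area should be close to surface pressure divided by g: about 1.03e4 kg m-2 at 1013.25 hPa, since 101325 / 9.80665 ≈ 10332. A dry-basis column is smaller by the column water-vapour mass.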

Per-tracer fields (one set per [tracers.<name>] block). The units string written into the NetCDF file reflects the runtime mass basis:

| Variable | Shape | Units (DryBasis writer string) | Units (MoistBasis writer string) |
|---|---|---|---|
| <tracer> | (lon, lat, lev, time) | mol mol-1 dry | mol mol-1 |
| <tracer>_column_mean | (lon, lat, time) | mol mol-1 dry | mol mol-1 |
| <tracer>_column_mass_per_area | (lon, lat, time) | kg m-2 | kg m-2 |

The full-3D per-tracer field <tracer> is a mixing ratio, not a mass; for tracer mass per unit area, use <tracer>_column_mass_per_area.
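To relate the per-level mixing ratio to the two column diagnostics, the sketch below weights each level by its air mass per area. It relies on two assumptions that this page does not confirm. First, <tracer>_column_mean is taken to be air-mass-weighted. Second, the conversion from mole fraction to tracer mass uses the molar-mass ratio with a dry-air molar mass of 28.9644 g mol-1. Treat the results as a cross-check rather than a statement of how the writer computes these fields. The helper names are hypothetical.

```python
M_DRY_AIR = 28.9644e-3  # kg mol-1; assumed constant (moist air is slightly lighter)


def column_mean(vmr, air_mass_per_area):
    """Air-mass-weighted column mean of a mole-fraction profile (mol mol-1).

    Assumption: this matches how <tracer>_column_mean is weighted.
    """
    total = sum(air_mass_per_area)
    return sum(c * m for c, m in zip(vmr, air_mass_per_area)) / total


def column_mass_per_area(vmr, air_mass_per_area, molar_mass_tracer,
                         molar_mass_air=M_DRY_AIR):
    """Tracer column mass per area (kg m-2) from mole fractions.

    Mole fraction -> mass fraction via the molar-mass ratio, then summed
    over levels weighted by each layer's air mass per area.
    """
    ratio = molar_mass_tracer / molar_mass_air
    return ratio * sum(c * m for c, m in zip(vmr, air_mass_per_area))
```

For CO2 (molar mass about 44.0095e-3 kg mol-1), pass that value as molar_mass_tracer.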

Reduced-Gaussian snapshot

Dimensions:

| Dim | Length |
|---|---|
| cell | ncells (flat, ring by ring; ring j starts at ring_offsets[j]) |
| lev, time | as for LL |
| lon, lat | rasterized regular LL diagnostic grid (for plotting) |
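Because the cell dimension is flat and ordered ring by ring, a consumer needs per-ring offsets to recover the ring structure. The sketch below derives 0-based offsets from the number of points on each ring and splits a flat field into rings. The input nlon_per_ring and both helper names are hypothetical. This page does not say whether ring sizes or ring_offsets are stored in the file. If the runtime's ring_offsets holds 1-based Julia start indices, then ring_offsets[j] - 1 should match the values computed here.

```python
from itertools import accumulate


def ring_offsets(nlon_per_ring):
    """0-based index of the first cell of each ring in the flat `cell` dimension."""
    return list(accumulate(nlon_per_ring, initial=0))[:-1]


def split_rings(flat, nlon_per_ring):
    """Split a flat per-cell sequence (length ncells) into one list per ring."""
    if len(flat) != sum(nlon_per_ring):
        raise ValueError("length of flat data does not match the ring sizes")
    return [list(flat[o:o + n])
            for o, n in zip(ring_offsets(nlon_per_ring), nlon_per_ring)]
```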

All horizontal fields are written in native, face-indexed form (dimension cell). For plotting tools that do not understand reduced Gaussian grids, a single rasterized variant (the per-tracer column mean) is also written on a regular LL grid, (lon, lat), via nearest-neighbor lookup. The native fields remain authoritative for any quantitative analysis.
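The following sketch illustrates a nearest-neighbor lookup from a regular lat-lon point to a reduced Gaussian cell. It does not claim to reproduce the runtime's implementation. The ring latitudes, the ring sizes, and the convention that each ring's points are equally spaced starting at 0°E are all assumptions. The sketch picks the nearest ring in latitude first, which is a cheap approximation of a true great-circle nearest neighbour and adequate for plotting.

```python
def nearest_cell(lat, lon, ring_lats, nlon_per_ring):
    """0-based flat `cell` index nearest to (lat, lon) on a reduced Gaussian grid.

    Assumes ring j has nlon_per_ring[j] points equally spaced in longitude,
    starting at 0°E. Chooses the closest ring in latitude, then the closest
    point on that ring (ties resolved arbitrarily).
    """
    j = min(range(len(ring_lats)), key=lambda k: abs(ring_lats[k] - lat))
    n = nlon_per_ring[j]
    i = round((lon % 360.0) / (360.0 / n)) % n   # wrap 360°E back to index 0
    return sum(nlon_per_ring[:j]) + i            # ring offset + position on ring


def rasterize(native, out_lats, out_lons, ring_lats, nlon_per_ring):
    """Nearest-neighbor raster of a native per-cell field, indexed [lat][lon]."""
    return [[native[nearest_cell(la, lo, ring_lats, nlon_per_ring)]
             for lo in out_lons]
            for la in out_lats]
```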

| Variable | Shape | Rasterized? |
|---|---|---|
| air_mass | (cell, lev, time) | no |
| air_mass_per_area | (cell, lev, time) | no |
| column_air_mass_per_area | (cell, time) | no |
| cell_area | (cell,) | no |
| <tracer> | (cell, lev, time) | no |
| <tracer>_column_mean_native | (cell, time) | no |
| <tracer>_column_mean | (lon, lat, time) | yes (nearest-neighbor; diagnostic only) |
| <tracer>_column_mass_per_area | (cell, time) | no |

The native fields are authoritative; the rasterized ones are for visualization.

Cubed-sphere snapshot

Dimensions:

| Dim | Length |
|---|---|
| Xdim | Nc (per-panel cell-x index) |
| Ydim | Nc (per-panel cell-y index) |
| nf | 6 (panel face index, ordered by the active panel_convention) |
| lev, time | as for LL |

The per-panel arrays are stacked into the nf dimension at write time (_cs_stack3 / _cs_stack2 in netcdf_writer.jl:36-52).
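The shapes on this page use Julia's column-major order, (Xdim, Ydim, nf, ...). Python tools see the same data in C order, (..., nf, Ydim, Xdim), as in the Python reading example. The sketch below (pure Python, 0-based indices, hypothetical helper names) shows why both views address the same storage offset. It does not reproduce _cs_stack3 / _cs_stack2. It assumes the panels are already ordered according to the active panel_convention.

```python
def flat_index(x, y, f, nc):
    """Storage offset of cell (x, y) on panel f (all 0-based).

    Julia's column-major (Xdim, Ydim, nf) array and the C-order
    (nf, Ydim, Xdim) view seen from Python share this offset: x varies fastest.
    """
    return x + nc * (y + nc * f)


def flatten_storage(panels):
    """Lay out six Nc x Nc panels, given as panels[f][y][x], in storage order."""
    if len(panels) != 6:
        raise ValueError("a cubed sphere has 6 panels")
    return [v for panel in panels for row in panel for v in row]
```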

Per-topology fields:

| Variable | Shape | Units (writer string) |
|---|---|---|
| air_mass | (Xdim, Ydim, nf, lev, time) | kg |
| air_mass_per_area | (Xdim, Ydim, nf, lev, time) | kg m-2 |
| column_air_mass_per_area | (Xdim, Ydim, nf, time) | kg m-2 |
| cell_area | (Xdim, Ydim, nf) | m2 |
| <tracer> | (Xdim, Ydim, nf, lev, time) | mol mol-1 dry (or mol mol-1 on moist basis) |
| <tracer>_column_mean | (Xdim, Ydim, nf, time) | same as <tracer> |
| <tracer>_column_mass_per_area | (Xdim, Ydim, nf, time) | kg m-2 |

A grid_mapping = "cubed_sphere" attribute is set on the horizontally resolved variables. The active CS definition, coordinate law, center law, panel convention (gnomonic / geos_native), and longitude offset are recorded in the global header, so consumers can reconstruct the panel layout if needed (see Cubed-sphere).

Reading the snapshot

ncdump

```bash
ncdump -h ~/data/.../my_run.nc | head -40
```

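Python (netCDF4)

```python
import os

import netCDF4 as nc

# netCDF4 does not expand "~", so expand it explicitly.
path = os.path.expanduser("~/data/.../my_run.nc")
with nc.Dataset(path) as ds:
    print(ds.dimensions)
    print(list(ds.variables.keys()))

    # LL example: Python sees dimensions in C order, i.e. reversed
    # relative to the Julia (lon, lat, time) shapes listed above.
    co2_cm = ds["co2_bl_column_mean"][:]   # shape (time, lat, lon)
    print(co2_cm.shape, co2_cm.min(), co2_cm.max(), co2_cm.mean())

# CS example
cs_path = os.path.expanduser("~/data/.../my_cs_run.nc")
with nc.Dataset(cs_path) as ds_cs:
    co2_cs = ds_cs["co2_bl_column_mean"][:]   # shape (time, nf, Ydim, Xdim)
    panel = co2_cs[-1, 0, :, :]               # last frame, first panel (nf index 0)
```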

Julia (NCDatasets.jl)

```julia
using NCDatasets

# "~" is not expanded automatically, so expand it explicitly.
ds = NCDataset(expanduser("~/data/.../my_run.nc"))
@show keys(ds)                                   # variable names

co2_cm  = ds["co2_bl_column_mean"][:, :, end]   # last frame, (lon, lat) for LL
co2_air = ds["air_mass"][:, :, :, end]          # full 3D, (lon, lat, lev) for LL
close(ds)
```

Fill value

Every payload variable is defined with _FillValue = 1.0e15 (plus missing_value = 1.0e15 for older tools). This matches the GEOS-Chem convention (Met_AD._FillValue == 1.0e15), so Panoply / ncview / IDV mask AtmosTransport and GEOS-Chem files using the same sentinel value. Float32 outputs round the sentinel to Float32(1e15), exactly 999999986991104, which sits comfortably below floatmax(Float32) ≈ 3.4e38 and outside any physical mass / mixing-ratio range. Because the sentinel is the declared _FillValue, the NetCDF4 library pre-fills storage with it, so cells the writer never reaches are still masked.
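The Float32 rounding matters if you mask fill values by hand, for example after disabling auto-masking with netCDF4's set_auto_mask(False). A value read from a Float32 file does not compare equal to the double 1e15. The following standard-library sketch shows the rounding and a mask that accounts for it. The helper names are hypothetical. In normal use, netCDF4-python (masked arrays) and NCDatasets.jl (missing) apply _FillValue automatically in the variable's own type.

```python
import struct

FILL_VALUE = 1.0e15  # _FillValue / missing_value on every payload variable


def to_float32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mask_fill(values, float32=True):
    """Replace fill-value sentinels with None.

    In a Float32 file the stored sentinel is Float32(1e15) = 999999986991104.0,
    which does not equal the double 1e15, so compare against the value
    rounded to the file's precision.
    """
    sentinel = to_float32(FILL_VALUE) if float32 else FILL_VALUE
    return [None if v == sentinel else v for v in values]
```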

Compression and packing

| Option | Default | Effect |
|---|---|---|
| [output] deflate_level | 0 (no compression) | NetCDF4 zlib level 0..9 |
| [output] shuffle | true | shuffle filter (only effective when deflate_level > 0) |

For long production runs, deflate_level = 4 with shuffle = true typically cuts file size by about 3-4× with negligible compute overhead. Higher levels (6+) hit diminishing returns and noticeably slow the writer.

The on-disk float type is determined by the runtime's [numerics].float_type: F32 runs write Float32 (NC_FLOAT) variables, and F64 runs write Float64 (NC_DOUBLE).
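The defaults and interactions in the table can be summarized as a small resolver, which is useful for checking a config before launching a long run. This is a hypothetical sketch, not the writer's code. It assumes that the [output] and [numerics] tables arrive as plain dicts, for example from a TOML parser. It passes float_type through unchanged because the exact config spelling is not documented here.

```python
def writer_settings(output=None, numerics=None):
    """Hypothetical resolution of NetCDF writer settings from the run config.

    output / numerics: the [output] and [numerics] tables as plain dicts;
    missing keys fall back to the documented defaults
    (deflate_level = 0, shuffle = true).
    """
    output = output or {}
    numerics = numerics or {}
    level = int(output.get("deflate_level", 0))
    if not 0 <= level <= 9:                      # zlib accepts levels 0..9 only
        raise ValueError(f"deflate_level must be in 0..9, got {level}")
    shuffle = bool(output.get("shuffle", True))
    return {
        "deflate_level": level,
        "shuffle_requested": shuffle,
        # the shuffle filter only has an effect when zlib compression is on
        "shuffle_effective": shuffle and level > 0,
        # on-disk float type follows [numerics].float_type (None if not given)
        "float_type": numerics.get("float_type"),
    }
```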