Meteorological Data: Preprocessing and Data Sources
New users: For step-by-step instructions on downloading data, preprocessing, and running the model, see the Quick Start Guide (
docs/QUICKSTART.mdin the repository root). This document is a deep-dive reference on met data formats, vertical coordinates, and the TM5 comparison.
This document explains how meteorological data flows into both TM5 and our Julia model, what preprocessing each requires, and why TM5's approach differs fundamentally from a simple gridpoint wind-based method.
CRITICAL: Model Levels Only — Never Pressure Levels
ERA5 pressure-level data (e.g. era5_pl_*.nc on fixed hPa levels) CANNOT be used for mass-flux transport.
The transport model requires model-level data (levtype: ml) on the native hybrid sigma-pressure grid, with lnsp (log surface pressure, param 152) for per-column pressure reconstruction:
Δp[i,j,k] = (A[k+1] - A[k]) + (B[k+1] - B[k]) × exp(lnsp[i,j])Pressure-level data interpolates to fixed isobaric surfaces, which destroys:
- Terrain-following coordinate — mountains are cut off or extrapolated below the surface, creating fictitious mass in sub-surface cells
- Column mass consistency —
sum(Δp)no longer equalsps - p_top, so the continuity equation cannot be satisfied - Vertical mass flux closure —
cmis derived from horizontal divergence and the B-ratio; this requiresΔpfrom hybrid A/B coefficients, not fixed pressure differences
Use scripts/download_era5_model_levels.jl which requests levtype: ml with params 131 (U), 132 (V), 135 (omega), 152 (LNSP). The resulting files contain variables u, v, w, lnsp on model_level dimensions.
Overview
| Aspect | TM5 | Julia AtmosTransport |
|---|---|---|
| Advection input | Mass fluxes (kg/s) through cell faces | Wind velocities (m/s) at cell centers |
| Mass conservation | Guaranteed by spectral integration (Bregman et al. 2003) | Depends on advection scheme discretization |
| Vertical coordinate | Hybrid sigma-pressure (A/B coefficients) | Hybrid sigma-pressure (A/B coefficients) |
| Convective fluxes | Read from ECMWF archive (eu, ed, du, dd) | Read from met data if available |
| Met format | Preprocessed NetCDF/HDF with specific naming | Standard NetCDF (ERA5, GEOS-FP, MERRA-2) |
TM5 Meteo Preprocessing
Why TM5 needs its own preprocessor
TM5 uses mass fluxes (kg/s through cell faces), not wind velocities, for tracer advection. Computing mass fluxes from ECMWF wind fields requires working in spherical harmonic (spectral) space to guarantee that the resulting mass fluxes satisfy the continuity equation exactly. A naive gridpoint computation (multiply u by dp/g and cell width) introduces mass imbalances that grow over time.
The foundational paper is:
Bregman, B., Segers, A., Krol, M., Meijer, E., and van Velthoven, P.: "On the use of mass-conserving wind fields in chemistry-transport models", Atmos. Chem. Phys., 3, 447-457, 2003. https://acp.copernicus.org/articles/3/447/2003/
TM5 is its own preprocessor
The preprocessing code lives inside TM5 itself, in the TMM (Transport Model Meteo) module:
base/src/tmm.F90: Core mass flux computationtmm_Read_MFUV(~line 1944): horizontal mass fluxes from spectral U/Vtmm_Read_MFW(~line 2234): vertical mass flux from spectral divergence
base/src/tmm_mf_ecmwf_tmpp.F90: Reads raw ECMWF GRIB filesbase/src/file_grib.F90: GRIB I/O using GRIBEX librarybase/src/grid_interpol.F90: Spectral-to-gridpoint transforms
The mass flux formulas (computed in spectral space):
mfu = (R/g) * integral[ U * (dA + dB * exp(LNSP)) / cos(lat) * dlat ]
mfv = (R/g) * integral[ V * (dA + dB * exp(LNSP)) * dlon ]
mfw = derived from spectral divergence + surface pressure tendencywhere A, B are hybrid level coefficients, LNSP is log surface pressure, R is Earth's radius, and g is gravitational acceleration. The integrals are evaluated analytically in spectral space.
Preprocessing mode
When TM5 is configured with tmm.output : T, it:
- Reads raw ECMWF GRIB data (spectral vorticity, divergence, LNSP)
- Computes mass-conserving mass fluxes via spectral integration
- Writes preprocessed files in TM5's NetCDF format
- Subsequent runs read from this archive with
tmm.output : F
Required ECMWF fields (spectral GRIB)
TM5's mass flux computation requires these ERA5 fields in spectral form (spherical harmonic coefficients), not gridpoint:
| Field | GRIB param | Description | Form |
|---|---|---|---|
| VO | 138 | Vorticity | Spectral |
| D | 155 | Divergence | Spectral |
| LNSP | 152 | Log surface pressure | Spectral |
| T | 130 | Temperature | Spectral or gridpoint |
| Q | 133 | Specific humidity | Gridpoint |
| CLWC | 246 | Cloud liquid water | Gridpoint |
| CIWC | 247 | Cloud ice water | Gridpoint |
| CC | 248 | Cloud cover | Gridpoint |
Convective fields (gridpoint, from ECMWF forecast):
| Field | GRIB param | Description |
|---|---|---|
| MUMF | 20 (table 162) | Updraft mass flux |
| MDMF | 21 (table 162) | Downdraft mass flux |
| EU | 44 | Updraft entrainment |
| ED | 45 | Updraft detrainment |
| DU | 46 | Downdraft entrainment |
| DD | 47 | Downdraft detrainment |
Surface fields (gridpoint):
| Field | GRIB param | Description |
|---|---|---|
| SP | 134 | Surface pressure |
| Z | 129 | Orography (geopotential) |
| LSM | 172 | Land-sea mask |
| BLH | 159 | Boundary layer height |
| 10U | 165 | 10m U-wind |
| 10V | 166 | 10m V-wind |
| 2T | 167 | 2m temperature |
| 2D | 168 | 2m dewpoint |
| SSHF | 146 | Surface sensible heat flux |
| SLHF | 147 | Surface latent heat flux |
TM5 preprocessed file format
After preprocessing, TM5 stores files as NetCDF with:
- Directory structure:
ec-ea-fc012up2tr3-ml137-glb100x100/YYYY/MM/ - File naming:
<paramkey>_YYYYMMDD_<tres>.nc(e.g.mfuv_20250201_00p03.nc) - Variable groups:
sp,mfuv,mfw,t,q,cld,convec - Dimensions:
lon,lat,lev,time - Hybrid coefficients:
ap,b(and bounds) for vertical coordinate - Surface pressure:
psincluded in every 3D file - Time: 3-hourly (
_00p03= 00:00, 03:00, 06:00, ...) - CF conventions: CF-1.4 with standard names
The I/O module is base/src/tmm_mf_tm5_nc.F90 (and the ERA5-specific variant in proj/era5_met/src/tmm_mf_tm5_nc.F90).
Prerequisites for running TM5 preprocessing
- ecCodes (or legacy GRIBEX) with Fortran bindings, compiled with
ifx - TM5 built with
with_tmm_ecmwfandwith_gribmacros enabled - ERA5 data downloaded as GRIB with spectral coefficients (not gridpoint NetCDF)
- CDS API access configured (
~/.cdsapirc)
See TM5LOCALSETUP.md for build instructions.
Julia Model Meteorological Input
Design Principle: Universal Hybrid Grid
All met data sources run on the same hybrid sigma-pressure vertical coordinate. The transport model never operates on raw pressure levels — every source provides A/B coefficients that define terrain-following levels:
p(k) = A[k] + B[k] × p_surfaceThis is a fundamental design decision:
- ERA5 (137 levels): spectral-harmonic native grid → distributed on regular lat-lon. A/B coefficients from
config/era5_L137_coefficients.toml. - GEOS-FP (72 levels): cubed-sphere internally → output on regular lat-lon. A/B from
config/geos_L72_coefficients.toml. - MERRA-2 (72 levels): same vertical coordinate as GEOS-FP, different horizontal resolution. Same
geos_L72_coefficients.toml. - Future sources: any met data that provides horizontal winds, surface pressure, and hybrid A/B coefficients can be added by writing a new TOML config and (if needed) a coefficient file. No Julia code changes required.
The HybridSigmaPressure type provides the universal abstraction. A vertical coordinate is constructed from any met source config with:
config = default_met_config("geosfp")
vc = build_vertical_coordinate(config) # 72-level HybridSigmaPressureThis means the transport model sees the same type regardless of whether the underlying data came from ECMWF spectral harmonics, NASA's cubed-sphere model, or any future gridded dataset. The physics code (advection, convection, diffusion) only depends on the HybridSigmaPressure interface.
Supported data sources
| Source | Config file | Resolution | Levels | Coefficients | Access |
|---|---|---|---|---|---|
| ERA5 | config/met_sources/era5.toml | 0.25-2 deg | 137 model | era5_L137_coefficients.toml | CDS API (~/.cdsapirc) |
| GEOS-FP (lat-lon) | config/met_sources/geosfp.toml | 0.25 deg | 72 hybrid | geos_L72_coefficients.toml | OPeNDAP (no auth) |
| GEOS-FP (native CS) | N/A (direct reader) | C720 (~12.5 km) | 72 hybrid | geos_L72_coefficients.toml | WashU archive (no auth) |
| MERRA-2 | config/met_sources/merra2.toml | 0.5x0.625 deg | 72 hybrid | geos_L72_coefficients.toml | NASA Earthdata (~/.netrc) |
Processing pipeline
- Read (
src/IO/met_reader.jl): Load NetCDF variables, apply unit conversions - Build vertical coordinate:
build_vertical_coordinate(config)loads A/B coefficients and constructsHybridSigmaPressure - Stagger (
src/IO/met_fields_bridge.jl): Interpolate cell-center winds to facesu-> x-faces (Nx+1, Ny, Nz)v-> y-faces (Nx, Ny+1, Nz)omega-> w at z-interfaces (Nx, Ny, Nz+1), sign flip (omega>0 downward, w>0 upward)
- Convection: Read
conv_mass_flux_up/downif available, else convection is a no-op - Diffusivity: Read
kh(eddy diffusivity) if available
Download scripts
| Script | Source | Output location |
|---|---|---|
scripts/download_era5_week.jl | ERA5 (CDS API) | ~/data/metDrivers/era5/ |
scripts/download_era5_model_levels.jl | ERA5 model levels (CDS API) | ~/data/metDrivers/era5/ |
scripts/download_geosfp_week.jl | GEOS-FP lat-lon (OPeNDAP) | ~/data/metDrivers/geosfp/ |
scripts/download_geosfp_cs_massflux.jl | GEOS-FP native C720 cubed-sphere (WashU) | ~/data/geosfp_cs/ |
scripts/download_test_data.jl | All three | ~/data/metDrivers/*/test/ |
Current status
- ERA5 model-level data (L137) supported via
scripts/download_era5_model_levels.jl - A/B coefficients available for both ERA5 (137 levels) and GEOS/MERRA (72 levels)
- Universal
build_vertical_coordinate()works for all met sources - Native GEOS-FP C720 cubed-sphere mass flux reader implemented (
src/IO/geosfp_cubed_sphere_reader.jl) - No convective mass flux variables in the ERA5 config yet
- No spectral processing — uses gridpoint winds directly (TM5 comparison ongoing)
GEOS-FP Native Cubed-Sphere Mass Fluxes
Data source
Native cubed-sphere mass fluxes from GEOS-FP are archived by the GEOS-Chem Support Team at Washington University in St. Louis:
- URL:
http://geoschemdata.wustl.edu/ExtData/GEOS_C720/ - Reference: Martin et al. (2022), GMD 15, 8731-8748
- Access: HTTP, no authentication required
- Also available via: Globus
This data is NOT available on the NASA NCCS portal — the NCCS portal only serves lat-lon interpolated GEOS-FP products. The native cubed-sphere data is extracted from GMAO's internal FP system and distributed via the WashU archive.
Collections
| Directory | File pattern | Size/file | Contents |
|---|---|---|---|
GEOS_FP_Native/ | tavg_1hr_ctm_c0720_v72 | ~2.7 GB | MFXC, MFYC, DELP, PS, CX, CY |
GEOS_FP_DerivedWind/ | tavg_1hr_ctmwind_c0720_v72 | ~1.7 GB | Derived U/V winds from mass fluxes |
GEOS_FP_Raw/ | Same as Native | Same | Mirror of Native |
For mass-flux transport, use GEOS_FP_Native (tavg_1hr_ctm collection).
File format
GEOS.fp.asm.tavg_1hr_ctm_c0720_v72.YYYYMMDD_HHMM.V01.nc4Timestamps at HH30 (center of hour-long averaging window), 24 files per day.
Dimensions: (Xdim=720, Ydim=720, nf=6, lev=72, time=1)
Note: nf (panel/face index) comes before lev (vertical) in the dimension ordering — not after as might be expected.
Variables:
| Variable | Dimensions | Units | Description |
|---|---|---|---|
MFXC | (720,720,6,72) | Pa⋅m² | Pressure-weighted accumulated eastward mass flux |
MFYC | (720,720,6,72) | Pa⋅m² | Pressure-weighted accumulated northward mass flux |
DELP | (720,720,6,72) | Pa | Pressure thickness |
PS | (720,720,6) | Pa | Surface pressure |
CX | (720,720,6,72) | — | Eastward accumulated Courant number |
CY | (720,720,6,72) | — | Northward accumulated Courant number |
lons | (720,720,6) | degrees | Cell-center longitudes |
lats | (720,720,6) | degrees | Cell-center latitudes |
corner_lons | (721,721,6) | degrees | Cell-corner longitudes |
corner_lats | (721,721,6) | degrees | Cell-corner latitudes |
C-grid convention: MFXC and MFYC use the same (Nc, Nc) indexing as cell-center variables. MFXC(i,j) is the flux through the east face of cell (i,j); MFYC(i,j) is the flux through the north face.
Unit conversion to kg/s: divide by g × Δt_met where g = 9.80665 m/s² and Δt_met = 3600 s (1-hour accumulation interval).
Data volume
| Period | # Files | Total Size |
|---|---|---|
| 1 day | 24 | ~65 GB |
| 1 month | 720 | ~1.9 TB |
| 1 year | 8,760 | ~23 TB |
Known gaps: June 16-18, 2024 are missing from the archive.
Also available: GEOS-IT (C180)
For longer reanalysis periods, GEOS-IT (the successor to MERRA-2) provides C180 (~50 km) cubed-sphere data from 1998 to present:
http://geoschemdata.wustl.edu/ExtData/GEOS_C180/GEOS_IT_Native/GEOS-IT uses the same file format and variable names as GEOS-FP.
Validation Strategy: TM5 vs Julia Model
Using the same ERA5 data
For a fair comparison, both models must process the same underlying ERA5 meteorology. The pipeline is:
ERA5 (ECMWF archive)
|
+--> Spectral GRIB -----> TM5 preprocessing (tmm.F90) -----> TM5 forward run
| mass-conserving mass fluxes
|
+--> Gridpoint NetCDF ---> Julia model (met_reader.jl) -----> Julia forward run
direct wind-based advectionBoth runs use the same ERA5 fields for the same time period. The comparison then tests whether the Julia model's gridpoint approach gives comparable tracer transport to TM5's spectral mass-flux approach.
What the comparison tests
- Advection accuracy: Gridpoint winds vs spectral mass fluxes
- Mass conservation: TM5 guarantees it; Julia model should be close
- Convection: Both use ECMWF-derived convective mass fluxes (once enabled in Julia)
- Overall transport: Column-integrated tracers, zonal means, time series
GEOS-FP validation (separate)
TM5 has no GEOS-FP reader. For GEOS-FP validation, compare the Julia model against:
- GEOS-Chem (which natively uses GEOS-FP)
- Observations (surface stations, satellite retrievals)
- The Julia model already has a working GEOS-FP reader
Mass fluxes vs. winds — lessons from GCHP v13
Martin et al. (2022, GMD 15, 8731-8748) demonstrate that directly using air mass fluxes instead of inferring them from archived winds significantly reduces transport errors in offline models. Key findings relevant to our model:
Surface pressure tendency error: Using winds gives 15× larger error than using mass fluxes directly (MAE 15 Pa vs 1 Pa for a 5-min timestep).
Restaggering error: Even on the native cubed-sphere grid, converting between A-grid (cell-center winds) and C-grid (face mass fluxes) weakens vertical advection by ~8% (regression slope = 0.92). Our
stagger_velocitiesfunction performs exactly this operation.Regridding + restaggering: The traditional GEOS-FP pipeline (C720 cubed-sphere → 0.25° lat-lon winds → regrid to model grid → restagger) introduces compounding errors that dominate vertical transport, especially in the stratosphere where vertical motion is weak.
Native cubed-sphere mass fluxes: Since March 2021, GEOS-FP operationally archives hourly C720 cubed-sphere mass fluxes. GEOS-IT provides C180 archives for 1998-present. These avoid all regridding/restaggering errors.
Implications for our model:
- Our current wind-based approach (cell-center → face staggering) introduces a known ~1% mass conservation error, consistent with our observed 0.9% drift
- Supporting direct mass flux ingestion from native cubed-sphere GEOS-FP/GEOS-IT archives would eliminate this error source entirely
- The
CubedSphereGridimplementation uses the same gnomonic projection as FV3, enabling direct use of native GEOS-FP mass flux archives in the future
CO2 Surface Flux Components
The model supports multiple CO2 flux components, following the ECMWF IFS/CHE nature run framework. Each component can be loaded and applied independently or combined via CompositeEmission.
Comparison with ECMWF IFS and CarbonTracker
| Component | ECMWF IFS (CY49R1) | CarbonTracker CT-NRT | Our Model |
|---|---|---|---|
| Anthropogenic | CAMS-GLOB-ANT v6.2 | Miller fossil fuel | EDGAR v8.0 |
| Biosphere | ECLand (online) | CASA + optimization | CT-NRT biofluxopt |
| Ocean | Jena CarboScope v2020 | OIF ocean prior + opt | Jena CarboScope oc_v2024 |
| Fire | GFAS v1.4 | GFED4.1s | CT-NRT firefluximp or GFAS |
Available flux datasets
| Source | Reader | Resolution | Temporal | Units (native) | Download Script |
|---|---|---|---|---|---|
| EDGAR v8.0 | load_edgar_co2 | 0.1° | Annual | Tonnes/yr | (manual) |
| CT-NRT v2025-1 | load_carbontracker_fluxes | 1×1° | 3-hourly | mol/m²/s | download_carbontracker_fluxes.jl |
| Jena CarboScope | load_jena_ocean_flux | 1×1° | Daily | PgC/yr/cell | download_ocean_flux.jl |
| CAMS GFAS v1.4 | load_gfas_fire_flux | 0.1° | Daily | kg/m²/s | download_gfas_fire.jl |
Tagged tracers (future)
The IFS CHE nature run uses tagged CO2 tracers (GRIB table 210) to track atmospheric enhancement by source:
| Parameter | Description |
|---|---|
| 210061 | Total CO2 mass mixing ratio |
| 68.210 | Natural biosphere CO2 flux |
| 69.210 | Anthropogenic CO2 emissions |
| 64.210 | CO2 column-mean molar fraction |
Our model can replicate this by adding separate tracers (co2_anthro, co2_bio, co2_ocean, co2_fire) transported independently, with total CO2 = sum of all components. This is implemented as a stretch goal.
Monthly-Sharded Mass Flux Preprocessing
For year-long simulations, mass flux files are sharded by month:
~/data/metDrivers/era5/era5_ml_1deg_20240101_20241231/
massflux_era5_202401_float32.nc (~5-12 GB)
massflux_era5_202401_float32.bin (mmap-ready binary)
massflux_era5_202402_float32.nc
...
massflux_era5_202412_float32.ncThe forward model (run.jl with a preprocessed config) accepts the preprocessed directory via the TOML configuration and chains through monthly shards automatically.
Data size estimates (1-degree, L88, full year)
| Component | Per Month | Full Year |
|---|---|---|
| ERA5 model-level downloads | ~3-5 GB | ~40-60 GB |
| Mass flux NetCDF (deflate=1) | ~5-12 GB | ~60-100 GB |
| Mass flux binary (mmap) | ~11 GB | ~135 GB |
| CT-NRT fluxes (1x1, 3-hourly) | ~60 MB | ~730 MB |
| GFAS fire (0.1, daily) | ~150 MB | ~2 GB |
| Jena CarboScope ocean | — | ~100 MB (single file) |
| EDGAR v8.0 | — | ~26 MB |
Total storage for a full-year 1-degree simulation: ~200-300 GB.
Data on disk
Current meteorological data available:
| Dataset | Location | Size | Details |
|---|---|---|---|
| ERA5 model levels (June 1-7) | ~/data/metDrivers/era5/era5_ml_10deg_20240601_20240607/ | ~1.1 GB | 1-deg, L88 (levels 50-137), 4x daily, u/v/w/lnsp |
| ERA5 model levels (June 8-30) | ~/data/metDrivers/era5/era5_ml_10deg_20240608_20240630/ | ~3.6 GB | 1-deg, L88 (levels 50-137), 4x daily, u/v/w/lnsp |
| ERA5 full year 2024 | ~/data/metDrivers/era5/era5_ml_10deg_20240101_20241231/ | ~31 GB (partial) | 1-deg, downloading via CDS API |
| GEOS-FP lat-lon week | ~/data/metDrivers/geosfp/ | 654 MB | 2025-02-01 to 07, ~4x5 deg |
| GEOS-FP native C720 June | ~/data/geosfp_cs/YYYYMMDD/ | ~1.9 TB (downloading) | C720 cubed-sphere, hourly, from WashU |
| EDGAR v8.0 CO2 | ~/data/emissions/edgar_v8/ | ~26 MB | 2022 annual totals, 0.1 deg |
| TM5 meteo archive | (not yet created) | – | Requires preprocessing step above |
Deleted (unusable):
era5/era5_025deg_20240601_20240831/— 0.25-deg ERA5 on fixed pressure levels (not model levels). Cannot be used for mass-flux transport. See "Critical: Model Levels Only" above.
Data server: NFS on curry.gps.caltech.edu, 7 TB volume, ~4.8 TB available.
References
- Bregman et al. (2003): Mass-conserving wind fields. ACP 3, 447-457.
- Krol et al. (2005): TM5 algorithm. ACP 5, 417-432.
- Williams et al. (2017): TM5-MP description. GMD 10, 721.
- Huijnen et al. (2010): TM5 tropospheric chemistry. GMD 3, 445.
- Martin et al. (2022): GCHP v13 — improved advection, native CS archives. GMD 15, 8731-8748. https://doi.org/10.5194/gmd-15-8731-2022
- Putman & Lin (2007): FV3 cubed-sphere advection. JCP 227, 55-78.
- Agusti-Panareda et al. (2023): CAMS GHG production system. GMD.
- Rödenbeck et al. (2013): Jena CarboScope ocean CO2 fluxes. Ocean Sci.
- Agustí-Panareda et al. (2019): CHE global nature run. Sci. Data 9, 1-16. https://doi.org/10.1038/s41597-022-01228-2
- TM5 Wiki: https://sourceforge.net/p/tm5/wiki/Meteo/