6 · Linearization — operator-level chain rule
For: retrieval / inversion developers; anyone who needs
∂R/∂xfor a parameterx. The runnable workflow is at User Guide → Compute Jacobians; this page explains the why.Prev: 5 · Surfaces · Next: 7 · Architecture-Agnostic Code
The MOM solver in Concepts/04 computes
The three-tier Jacobian
State parameters x: ForwardDiff
τ_ref, n_r, n_i, r_m, σ_g, ──────(upstream)──────►
VMR, BRDF, ...
CoreScatteringOpticalPropertiesLin per layer
(τ̇, ϖ̇, Ż⁺⁺, Ż⁻⁺) ── AD boundary ──
┌────────────────────────────────────────────┐
│ Hand-coded chain rule (analytic RT kernel) │
│ Sanghavi 2014 App. C, Eqs C.8-C.21 │
└────────────────────────────────────────────┘
│
▼
∂R/∂x , ∂T/∂xThe Jacobian is split into two zones with a clean boundary between them.
Upstream — the AD zone (Mie cross-sections, gas absorption, atmospheric profile, surface BRDF parameters). Here ForwardDiff Dual numbers carry parameter derivatives forward, or analytic derivatives are computed by hand for the cases where Mie series are AD-hostile (refractive index, size distribution). The output of this zone is one CoreScatteringOpticalPropertiesLin per atmospheric layer:
# src/CoreRT/types_lin.jl:119–149
struct CoreScatteringOpticalPropertiesLin{T1,T2,T3} <: AbstractOpticalPropertiesLin
τ̇::T1 # ∂τ/∂x
ϖ̇::T2 # ∂ϖ_0/∂x
Ż⁺⁺::T3 # ∂Z⁺⁺/∂x
Ż⁻⁺::T3 # ∂Z⁻⁺/∂x
endThe aliases OpticalPropertyJacobian = CoreScatteringOpticalPropertiesLin make this the explicit AD-boundary handoff struct.
Downstream — the pure-FT zone (elemental, doubling, interaction). Here every kernel is plain Float32 or Float64 and every derivative is computed by a hand-coded tangent-linear partner of the forward kernel.
The split has three benefits:
The hot loop stays pure-
FT. NoDual{T,V,N}arithmetic in the inner kernels — those would multiply work by1+N.Analytic derivatives are numerically stable through batched matrix inversion. ForwardDiff through
batch_inv!works (and is supported on GPU), but the analytic chain rule on(E − R·R)⁻¹is closed-form and avoids accumulating AD round-off.The chain rule on adding-doubling is closed-form. Sanghavi 2014 App. C derives compact expressions for the tangent-linear adding/doubling updates. They're roughly the same shape as the forward updates — same matrix structure, computed against the same operands.
Why this is fast: the matrix inversion is reused
A combined forward + linearized run costs less than 2× a forward-only run. Not
1 + N_params×forward. Not even2× forward. Less than 2×. This is the production-impact property of operator-level analytic linearization in MOM, and it's the headline reason vSmartMOM is usable inside Levenberg-Marquardt loops with hundreds of state vector elements.
The dominant cost in the forward solver is the batched matrix inversion at each doubling step (the geometric-series factor T*_inv factors in Concepts/04 § Adding).
The linearized partner needs the derivative of that inverse:
It reuses the already-computed
Compare:
| Approach | Cost ratio (lin+fwd : fwd) |
|---|---|
| vSmartMOM analytic linearization (this codebase) | < 2× |
ForwardDiff Dual{T,V,N} through the kernels | (1 + N) × (every operation, including the inversion, picks up N partial slabs) |
| Finite differences | (1 + N_state) × (one full forward per parameter) |
For a typical OCO-2 retrieval with
That property holds at the kernel level (one doubling iteration, one adding step) and aggregates through the entire RT solve. It is preserved on GPU because both batched_mul / batch_inv! interface — see Concepts/07.
Status — refined AD→analytic boundary in progress The current production codebase runs the production-fast linearized path described above; a cleaner AD-upstream → hand-coded-analytic-downstream boundary is in active development. The goal is to make the seam between the two zones (the CoreScatteringOpticalPropertiesLin handoff struct, δ-M chain-rule expansion, microphysical parameter forwarding) more idiomatic — fewer manual chain-rule expansions in upstream code, cleaner extension points for adding new state-vector parameters. None of this changes the production performance claim above (the matrix-inversion reuse is the kernel-level guarantee and stays put); it cleans up the ergonomics of the AD boundary. See docs/dev_notes/ for the work-in-progress plans. :::
The chain rule on adding-doubling
The full derivation is Sanghavi 2014 App. C. The structure (skipping arithmetic):
| Eq. | What it says | Source file |
|---|---|---|
| (C.5)–(C.7) | Differentiation rules for matrix products and inverses | (foundation; used everywhere below) |
| (C.8)–(C.10) | Elemental derivatives | elemental_lin.jl |
| (C.11)–(C.16) | Doubling/adding derivatives — same shape as the forward Eqs (23)–(28), tangent-linear | doubling_lin.jl, interaction_lin.jl |
| (C.17)–(C.20) | D-matrix symmetry on derivatives — halves the linearized doubling cost | doubling_lin.jl |
| (C.21) | Final assembled derivative form — what lin_added_layer_all_params! produces | lin_added_layer_all_params.jl |
| (C.22)–(C.24) | Layer-averaged | compEffectiveLayerProperties.jl (forward); compEffectiveLayerProperties_lin.jl (lin) |
| (C.25)–(C.26) | Chain rule from the elemental SS variables to the microphysical parameters | lin_added_layer_all_params.jl + Scattering/types_lin.jl |
| (C.27)–(C.31) | δ derivatives (elemental thickness from N_doubl) | compEffectiveLayerProperties_lin.jl |
| (C.32)–(C.39) | same | |
| (C.40) | compute_Z_matrices.jl (linearized variant) | |
| (C.41)–(C.42) | delta_m_truncation_lin.jl |
The _lin.jl files in src/CoreRT/CoreKernel/ are tangent-linear partners of elemental.jl, doubling.jl, interaction.jl. Each forward kernel has a _lin companion that computes the derivative quantities alongside the forward ones in a single sweep.
ParameterLayout — naming the Jacobian columns
The Jacobian matrix has rows indexed by (VZA, Stokes, λ) (the radiance output) and columns indexed by retrieval parameters. The column ordering is centralized in src/CoreRT/parameter_layout.jl:1–67:
struct ParameterLayout
aerosol_params::Int # = 7 (τ_ref, n_r, n_i, r_m, σ_g, p₀, σ_p)
n_aerosols::Int
n_gases::Int
n_surface::Int
end
n_total(layout) # total number of Jacobian columns
aerosol_range(layout, iaer) # column indices for aerosol iaer (length 7)
gas_range(layout) # column indices for gas VMRs
surface_range(layout) # column indices for surface paramsAlways use these accessors instead of hand-writing arithmetic like 7*NAer + NGas + 1. The seven aerosol parameters per mode are:
| # | Parameter | Meaning |
|---|---|---|
| 1 | τ_ref | aerosol optical depth at reference wavelength |
| 2 | n_r | real refractive index |
| 3 | n_i | imaginary refractive index |
| 4 | r_m | median radius of size distribution |
| 5 | σ_g | geometric standard deviation of size distribution |
| 6 | p₀ | central pressure of vertical Gaussian |
| 7 | σ_p | width of vertical Gaussian |
A run with two aerosol modes, four gases, and one surface parameter has a Jacobian with 2·7 + 4 + 1 = 19 columns, and aerosol_range(layout, 2) returns 8:14.
What goes through ForwardDiff vs analytic
Strategy by parameter type (as currently implemented and recommended for new parameters):
| Parameter | Recommended path | Reason |
|---|---|---|
| Lambertian albedo | analytic | trivial: |
| RPV / Ross-Li / Cox-Munk BRDF | ForwardDiff | low-dimensional, simple surface code |
τ_ref (aerosol OD) | analytic | trivial: |
p₀, σ_p (aerosol height) | analytic | already in atmo_prof_lin.jl |
n_r, n_i (refractive index) | analytic Mie | Mie series is AD-hostile (recurrences) |
r_m, σ_g (size distribution) | analytic Mie | same |
| Gas VMR scaling | analytic | |
| Surface pressure | ForwardDiff | affects many code paths (Rayleigh, profile, absorption) |
| Temperature profile | ForwardDiff (future) | affects absorption cross-sections nonlinearly |
For parameters in the "ForwardDiff" rows, the upstream code carries Dual{T,V,N} numbers; the resulting CoreScatteringOpticalPropertiesLin arrays are real-valued (the Dual partials having been extracted at the boundary) and the analytic RT chain rule takes over. This is the hybrid AD pattern from Concepts/07, and it's why ForwardDiff Duals need to flow through batched_mul and batch_inv! on GPU — both for the upstream Mie/profile/surface code (when the user picks ForwardDiff for those) and for forward-mode AD of higher-level functions involving rt_run.
Code anchors
| Concept | Source |
|---|---|
| Linearized RT entry | src/CoreRT/rt_run_lin.jl |
| Linearized model construction | src/CoreRT/tools/lin_model_from_parameters.jl |
| Elemental derivatives | src/CoreRT/CoreKernel/elemental_lin.jl |
| Doubling derivatives | src/CoreRT/CoreKernel/doubling_lin.jl |
| Interaction derivatives | src/CoreRT/CoreKernel/interaction_lin.jl |
| Chain-rule expansion to all parameters | src/CoreRT/CoreKernel/lin_added_layer_all_params.jl:1–100 |
| Three-core-variable lin type | src/CoreRT/types_lin.jl:119–149 |
| Optical-property Jacobian boundary | src/CoreRT/types.jl::CoreScatteringOpticalPropertiesLin |
| ParameterLayout | src/CoreRT/parameter_layout.jl:1–67 |
| Mie linearization (analytic) | src/Scattering/types_lin.jl |
| Linearized atmospheric profile | src/CoreRT/tools/atmo_prof_lin.jl |
| Linearized δ-M truncation | src/CoreRT/LayerOpticalProperties/delta_m_truncation_lin.jl |
| Linearized Cox-Munk | src/CoreRT/Surfaces/coxmunk_surface_lin.jl |
See also User Guide → Compute Jacobians for the runnable workflow, Tutorial_Jacobians for a hands-on example with finite-difference cross-checks, and Tutorial_HybridAD for ForwardDiff across the analytic RT kernel.
References
Sanghavi et al. (2014), JQSRT 133:412–433, doi:10.1016/j.jqsrt.2013.09.004, App. C. Primary linearization reference.
Sanghavi et al. (2013), Linearization of a scalar matrix operator method radiative transfer model, JQSRT 116:1–16, doi:10.1016/j.jqsrt.2012.10.021. (Scalar predecessor; same structure, simpler derivation.)
Hasekamp & Landgraf (2005), Linearization of vector radiative transfer with respect to aerosol properties and its use in satellite remote sensing, JGR 110:D04203. (Independent linearized vector RT for comparison.)
Crib sheet:
docs/dev_notes/theory_references.md§H.