Skip to content

6 · Linearization — operator-level chain rule

For: retrieval / inversion developers; anyone who needs ∂R/∂x for a parameter x. The runnable workflow is at User Guide → Compute Jacobians; this page explains the why.

Prev: 5 · Surfaces · Next: 7 · Architecture-Agnostic Code

The MOM solver in Concepts/04 computes for a given atmospheric state. Retrievals need for every parameter they're trying to estimate — aerosol optical depths, refractive indices, size distributions, gas VMRs, surface BRDF parameters. This page explains how vSmartMOM computes those derivatives: operator-level analytic chain rule on the adding-doubling formulas, with ForwardDiff used only upstream at the optical-property boundary.

The three-tier Jacobian

   State parameters x:                ForwardDiff
   τ_ref, n_r, n_i, r_m, σ_g,    ──────(upstream)──────►
   VMR, BRDF, ...

           CoreScatteringOpticalPropertiesLin per layer
                  (τ̇, ϖ̇, Ż⁺⁺, Ż⁻⁺)              ── AD boundary ──

                  ┌────────────────────────────────────────────┐
                  │ Hand-coded chain rule (analytic RT kernel) │
                  │ Sanghavi 2014 App. C, Eqs C.8-C.21         │
                  └────────────────────────────────────────────┘


                              ∂R/∂x ,  ∂T/∂x

The Jacobian is split into two zones with a clean boundary between them.

Upstream — the AD zone (Mie cross-sections, gas absorption, atmospheric profile, surface BRDF parameters). Here ForwardDiff Dual numbers carry parameter derivatives forward, or analytic derivatives are computed by hand for the cases where Mie series are AD-hostile (refractive index, size distribution). The output of this zone is one CoreScatteringOpticalPropertiesLin per atmospheric layer:

julia
# src/CoreRT/types_lin.jl:119–149
struct CoreScatteringOpticalPropertiesLin{T1,T2,T3} <: AbstractOpticalPropertiesLin
    τ̇::T1        # ∂τ/∂x
    ϖ̇::T2        # ∂ϖ_0/∂x
    Ż⁺⁺::T3      # ∂Z⁺⁺/∂x
    Ż⁻⁺::T3      # ∂Z⁻⁺/∂x
end

The aliases OpticalPropertyJacobian = CoreScatteringOpticalPropertiesLin make this the explicit AD-boundary handoff struct.

Downstream — the pure-FT zone (elemental, doubling, interaction). Here every kernel is plain Float32 or Float64 and every derivative is computed by a hand-coded tangent-linear partner of the forward kernel. and their derivatives propagate together through the same adding-doubling sequence.

The split has three benefits:

  1. The hot loop stays pure-FT. No Dual{T,V,N} arithmetic in the inner kernels — those would multiply work by 1+N.

  2. Analytic derivatives are numerically stable through batched matrix inversion. ForwardDiff through batch_inv! works (and is supported on GPU), but the analytic chain rule on (E − R·R)⁻¹ is closed-form      and avoids accumulating AD round-off.

  3. The chain rule on adding-doubling is closed-form. Sanghavi 2014 App. C derives compact expressions for the tangent-linear adding/doubling updates. They're roughly the same shape as the forward updates — same matrix structure, computed against the same operands.

Why this is fast: the matrix inversion is reused

A combined forward + linearized run costs less than 2× a forward-only run. Not 1 + N_params×forward. Not even 2× forward. Less than 2×. This is the production-impact property of operator-level analytic linearization in MOM, and it's the headline reason vSmartMOM is usable inside Levenberg-Marquardt loops with hundreds of state vector elements.

The dominant cost in the forward solver is the batched matrix inversion at each doubling step (the geometric-series factor   ) and at each inhomogeneous-adding step (the two T*_inv factors in Concepts/04 § Adding). per spectral point per layer per Fourier moment, LU-decomp + back-substitution, dispatched to CUBLAS / KA-LU.

The linearized partner needs the derivative of that inverse:

It reuses the already-computed . No second LU, no second back-substitution. The marginal cost of the linearized step is two extra batched matmuls (cheap, well-parallelized, full GPU bandwidth) per core parameter  , not another inversion.

Compare:

ApproachCost ratio (lin+fwd : fwd)
vSmartMOM analytic linearization (this codebase)< 2×
ForwardDiff Dual{T,V,N} through the kernels(1 + N) × (every operation, including the inversion, picks up N partial slabs)
Finite differences(1 + N_state) × (one full forward per parameter)

For a typical OCO-2 retrieval with  , this is the difference between a Jacobian costing ~2 forward runs and one costing ~30. Inside a Levenberg-Marquardt loop that hits the forward model ~10 times to converge, the savings are dispositive — 20 forward equivalents vs 300+.

That property holds at the kernel level (one doubling iteration, one adding step) and aggregates through the entire RT solve. It is preserved on GPU because both and the matmuls dispatch through the same batched_mul / batch_inv! interface — see Concepts/07.

Status — refined AD→analytic boundary in progress The current production codebase runs the production-fast linearized path described above; a cleaner AD-upstream → hand-coded-analytic-downstream boundary is in active development. The goal is to make the seam between the two zones (the CoreScatteringOpticalPropertiesLin handoff struct, δ-M chain-rule expansion, microphysical parameter forwarding) more idiomatic — fewer manual chain-rule expansions in upstream code, cleaner extension points for adding new state-vector parameters. None of this changes the production performance claim above (the matrix-inversion reuse is the kernel-level guarantee and stays put); it cleans up the ergonomics of the AD boundary. See docs/dev_notes/ for the work-in-progress plans. :::

The chain rule on adding-doubling

The full derivation is Sanghavi 2014 App. C. The structure (skipping arithmetic):

Eq.What it saysSource file
(C.5)–(C.7)Differentiation rules for matrix products and inverses(foundation; used everywhere below)
(C.8)–(C.10)Elemental derivatives , , w.r.t. the three core layer variables elemental_lin.jl
(C.11)–(C.16)Doubling/adding derivatives — same shape as the forward Eqs (23)–(28), tangent-lineardoubling_lin.jl, interaction_lin.jl
(C.17)–(C.20)D-matrix symmetry on derivatives — halves the linearized doubling costdoubling_lin.jl
(C.21)Final assembled derivative form — what lin_added_layer_all_params! produceslin_added_layer_all_params.jl
(C.22)–(C.24)Layer-averaged , , definitionscompEffectiveLayerProperties.jl (forward); compEffectiveLayerProperties_lin.jl (lin)
(C.25)–(C.26)Chain rule from the elemental SS variables to the microphysical parameters lin_added_layer_all_params.jl + Scattering/types_lin.jl
(C.27)–(C.31)δ derivatives (elemental thickness from N_doubl)compEffectiveLayerProperties_lin.jl
(C.32)–(C.39) and derivatives (post-truncation)same
(C.40) from generalized spherical harmonicscompute_Z_matrices.jl (linearized variant)
(C.41)–(C.42) and for the truncated casedelta_m_truncation_lin.jl

The _lin.jl files in src/CoreRT/CoreKernel/ are tangent-linear partners of elemental.jl, doubling.jl, interaction.jl. Each forward kernel has a _lin companion that computes the derivative quantities alongside the forward ones in a single sweep.

ParameterLayout — naming the Jacobian columns

The Jacobian matrix has rows indexed by (VZA, Stokes, λ) (the radiance output) and columns indexed by retrieval parameters. The column ordering is centralized in src/CoreRT/parameter_layout.jl:1–67:

julia
struct ParameterLayout
    aerosol_params::Int     # = 7   (τ_ref, n_r, n_i, r_m, σ_g, p₀, σ_p)
    n_aerosols::Int
    n_gases::Int
    n_surface::Int
end

n_total(layout)             # total number of Jacobian columns
aerosol_range(layout, iaer) # column indices for aerosol iaer (length 7)
gas_range(layout)           # column indices for gas VMRs
surface_range(layout)       # column indices for surface params

Always use these accessors instead of hand-writing arithmetic like 7*NAer + NGas + 1. The seven aerosol parameters per mode are:

#ParameterMeaning
1τ_refaerosol optical depth at reference wavelength
2n_rreal refractive index
3n_iimaginary refractive index
4r_mmedian radius of size distribution
5σ_ggeometric standard deviation of size distribution
6p₀central pressure of vertical Gaussian
7σ_pwidth of vertical Gaussian

A run with two aerosol modes, four gases, and one surface parameter has a Jacobian with 2·7 + 4 + 1 = 19 columns, and aerosol_range(layout, 2) returns 8:14.

What goes through ForwardDiff vs analytic

Strategy by parameter type (as currently implemented and recommended for new parameters):

ParameterRecommended pathReason
Lambertian albedoanalytictrivial:  
RPV / Ross-Li / Cox-Munk BRDFForwardDifflow-dimensional, simple surface code
τ_ref (aerosol OD)analytictrivial:  
p₀, σ_p (aerosol height)analyticalready in atmo_prof_lin.jl
n_r, n_i (refractive index)analytic MieMie series is AD-hostile (recurrences)
r_m, σ_g (size distribution)analytic Miesame
Gas VMR scalinganalytic 
Surface pressureForwardDiffaffects many code paths (Rayleigh, profile, absorption)
Temperature profileForwardDiff (future)affects absorption cross-sections nonlinearly

For parameters in the "ForwardDiff" rows, the upstream code carries Dual{T,V,N} numbers; the resulting CoreScatteringOpticalPropertiesLin arrays are real-valued (the Dual partials having been extracted at the boundary) and the analytic RT chain rule takes over. This is the hybrid AD pattern from Concepts/07, and it's why ForwardDiff Duals need to flow through batched_mul and batch_inv! on GPU — both for the upstream Mie/profile/surface code (when the user picks ForwardDiff for those) and for forward-mode AD of higher-level functions involving rt_run.

Code anchors

ConceptSource
Linearized RT entrysrc/CoreRT/rt_run_lin.jl
Linearized model constructionsrc/CoreRT/tools/lin_model_from_parameters.jl
Elemental derivativessrc/CoreRT/CoreKernel/elemental_lin.jl
Doubling derivativessrc/CoreRT/CoreKernel/doubling_lin.jl
Interaction derivativessrc/CoreRT/CoreKernel/interaction_lin.jl
Chain-rule expansion to all parameterssrc/CoreRT/CoreKernel/lin_added_layer_all_params.jl:1–100
Three-core-variable lin typesrc/CoreRT/types_lin.jl:119–149
Optical-property Jacobian boundarysrc/CoreRT/types.jl::CoreScatteringOpticalPropertiesLin
ParameterLayoutsrc/CoreRT/parameter_layout.jl:1–67
Mie linearization (analytic)src/Scattering/types_lin.jl
Linearized atmospheric profilesrc/CoreRT/tools/atmo_prof_lin.jl
Linearized δ-M truncationsrc/CoreRT/LayerOpticalProperties/delta_m_truncation_lin.jl
Linearized Cox-Munksrc/CoreRT/Surfaces/coxmunk_surface_lin.jl

See also User Guide → Compute Jacobians for the runnable workflow, Tutorial_Jacobians for a hands-on example with finite-difference cross-checks, and Tutorial_HybridAD for ForwardDiff across the analytic RT kernel.

References

  • Sanghavi et al. (2014), JQSRT 133:412–433, doi:10.1016/j.jqsrt.2013.09.004, App. C. Primary linearization reference.

  • Sanghavi et al. (2013), Linearization of a scalar matrix operator method radiative transfer model, JQSRT 116:1–16, doi:10.1016/j.jqsrt.2012.10.021. (Scalar predecessor; same structure, simpler derivation.)

  • Hasekamp & Landgraf (2005), Linearization of vector radiative transfer with respect to aerosol properties and its use in satellite remote sensing, JGR 110:D04203. (Independent linearized vector RT for comparison.)

  • Crib sheet: docs/dev_notes/theory_references.md §H.