Hybrid AD: Combining ForwardDiff with Analytic RT
vSmartMOM uses a hybrid approach to compute Jacobians efficiently: the outer derivatives (optical properties w.r.t. state vector parameters) can use ForwardDiff.jl, while the inner RT solver uses fully analytic derivatives for performance. This tutorial explains the architecture and shows how to use both paths on CPU and GPU.
This is a local walkthrough. It builds HITRAN absorption and aerosol Mie optical properties, so it is intentionally heavier than docs/test_examples.jl.
1) The AD Boundary
The key architectural idea is a clean separation between two zones:
State Vector x = [albedo, τ_aer, nᵣ, nᵢ, rₘ, σ_g, p₀, σ_p, VMR_gas, ...]
│
│ ← AD zone: ForwardDiff.Dual allowed here
▼
Optical Properties: τ(λ,z), ω(λ,z), Z(λ,z) per layer
│
│ ← extract values; store ∂(τ,ω,Z)/∂x as arrays
▼
RT Kernels: pure Float64/Float32 [analytic ∂R/∂(τ,ω,Z)]
│
│ ← chain rule: ∂R/∂x = ∂R/∂(τ,ω,Z) · ∂(τ,ω,Z)/∂x
▼
Output Jacobians: ∂R/∂xThe RT kernels (elemental, doubling, interaction) never see Dual numbers. They compute analytic derivatives with respect to 3 core optical properties (τ, ϖ, Z) using the linearized adding-doubling method. This is fast, GPU-compatible, and avoids the overhead of tracking Dual numbers through deep batched matrix operations.
2) Linearized RT on CPU
Let's run the linearized model and inspect the Jacobian layout.
using vSmartMOM
using vSmartMOM.CoreRT
params = read_parameters(
joinpath(pkgdir(vSmartMOM),
"test", "test_parameters", "JacobianTestFast.yaml"))
params.architecture = vSmartMOM.Architectures.CPU()Build the linearized model. LinMode() tells the constructor to compute derivative containers alongside the forward model.
model, lin_model = model_from_parameters(LinMode(), params)Run linearized RT: returns (R, T, dR, dT)
NAer = length(params.scattering_params.rt_aerosols)
NGas = size(lin_model.τ̇_abs[1], 1)
NSurf = 1
R_cpu, T_cpu, dR_cpu, dT_cpu = rt_run_lin(model, lin_model, NAer, NGas, NSurf)
println("R shape: ", size(R_cpu), " (nVZA × nStokes × nSpec)")
println("dR shape: ", size(dR_cpu), " (nVZA × nStokes × nSpec × nParams)")3) Understanding ParameterLayout
The Jacobian dimension of dR is ordered by ParameterLayout: each aerosol gets 7 sub-parameters, followed by gas VMRs, then surface.
layout = CoreRT.ParameterLayout(aerosol_params=7, n_aerosols=NAer,
n_gases=NGas, n_surface=NSurf)
println("\nParameter layout:")
println(" Total parameters: ", CoreRT.n_total(layout))
for ia in 1:NAer
rng = CoreRT.aerosol_range(layout, ia)
println(" Aerosol $ia: indices $rng [τ_ref, nᵣ, nᵢ, rₘ, σ_g, p₀, σ_p]")
end
if NGas > 0
println(" Gas VMRs: indices ", CoreRT.gas_range(layout))
end
println(" Surface: indices ", CoreRT.surface_range(layout))Extract specific Jacobians (first VZA, Stokes-I):
idx_albedo = CoreRT.surface_index(layout, 1)
idx_tau = CoreRT.aerosol_range(layout, 1)[1] # τ_ref
idx_nr = CoreRT.aerosol_range(layout, 1)[2] # nᵣ
println("\ndR/d(albedo) first 3 spec points: ",
round.(dR_cpu[1, 1, 1:min(3,end), idx_albedo], digits=6))
println("dR/d(τ_ref) first 3 spec points: ",
round.(dR_cpu[1, 1, 1:min(3,end), idx_tau], digits=6))
println("dR/d(nᵣ) first 3 spec points: ",
round.(dR_cpu[1, 1, 1:min(3,end), idx_nr], digits=6))4) Linearized RT on GPU
The same code runs on GPU by switching the architecture. All arrays are automatically moved to CuArray and the RT kernels use KernelAbstractions.jl for device-portable dispatch.
if vSmartMOM.Architectures.has_cuda()
params_gpu = read_parameters(
joinpath(pkgdir(vSmartMOM),
"test", "test_parameters", "JacobianTestFast.yaml"))
params_gpu.architecture = vSmartMOM.Architectures.GPU()
model_gpu, lin_model_gpu = model_from_parameters(LinMode(), params_gpu)
R_gpu, T_gpu, dR_gpu, dT_gpu = rt_run_lin(model_gpu, lin_model_gpu,
NAer, NGas, NSurf)
R_gpu_a = Array(R_gpu)
dR_gpu_a = Array(dR_gpu)
max_R_diff = maximum(abs.(R_gpu_a .- R_cpu))
max_dR_diff = maximum(abs.(dR_gpu_a .- dR_cpu))
println("\nGPU vs CPU comparison:")
println(" max |R_gpu - R_cpu| = ", round(max_R_diff, sigdigits=3))
println(" max |dR_gpu - dR_cpu| = ", round(max_dR_diff, sigdigits=3))
else
println("\nCUDA not available — skipping GPU section.")
end5) ForwardDiff for Mie Optical Properties
The Mie code supports ForwardDiff through compute_aerosol_optical_properties(model; autodiff=true). This wraps the NAI2 computation in ForwardDiff.jacobian with respect to the 4 Mie-sensitive parameters [rₘ, σ, nᵣ, nᵢ].
using vSmartMOM.Scattering
rt_aer = params.scattering_params.rt_aerosols[1]
truncation_type = Scattering.δBGE{Float64}(params.l_trunc, params.Δ_angle)
mie_model = make_mie_model(params.scattering_params.decomp_type,
rt_aer.aerosol, params.scattering_params.λ_ref,
params.polarization_type,
truncation_type,
params.scattering_params.r_max,
params.scattering_params.nquad_radius)Analytic derivatives (the default production path)
aer_optics_analytic, lin_aer_optics = compute_aerosol_optical_properties(
LinMode(), mie_model, Float64)
println("\nAnalytic Mie derivatives:")
println(" ω̃ = ", round(aer_optics_analytic.ω̃, digits=6))
println(" dω̃/d[nᵣ,nᵢ,rₘ,σ] = ", round.(lin_aer_optics.ω̃̇, digits=6))ForwardDiff derivatives (for validation / custom parameters)
aer_optics_ad = compute_aerosol_optical_properties(mie_model; autodiff=true)
println("\nForwardDiff Mie derivatives stored in AerosolOptics.derivs:")
println(" derivs shape: ", size(aer_optics_ad.derivs))
println(" (rows = flattened [α;β;γ;δ;ϵ;ζ;ω̃;k], cols = [rₘ,σ,nᵣ,nᵢ])")6) The δ-M Truncation Chain Rule
The RawAerosolJacobian struct defines the AD boundary for aerosols. It holds un-truncated derivatives:
struct RawAerosolJacobian{T}
τ̇_aer::T # ∂τ_aer/∂x
ω̃̇::T # ∂ω̃/∂x
ḟᵗ::T # ∂fᵗ/∂x ← truncation factor derivative
Ż⁺⁺::T # ∂Z⁺⁺/∂x
Ż⁻⁺::T # ∂Z⁻⁺/∂x
endThe function delta_m_truncation_lin then applies the analytic δ-M chain rule to transform these raw derivatives into the CoreScatteringOpticalPropertiesLin that the RT kernel expects:
This separation means you can compute RawAerosolJacobian via any method (analytic Mie, ForwardDiff, or finite differences) and the downstream RT chain rule handles the rest identically.
The current production code in createAero (compEffectiveLayerProperties_lin.jl) uses the analytic Mie path. The delta_m_truncation_linfunction indelta_m_truncation.jl provides a cleaner, composable interface for new code.
Summary
The hybrid AD architecture gives you flexibility:
| Component | Method | Why |
|---|---|---|
| RT kernels | Analytic (linearized adding-doubling) | Performance, GPU, batched matmul |
| Mie optics | Analytic (keep for speed) or ForwardDiff (for validation) | Mie series is AD-hostile |
| Surface albedo | Analytic (trivial) | Simple scalar derivative |
| Surface pressure | ForwardDiff (planned) | Affects many paths simultaneously |
| Gas VMR | Analytic (trivial: ∂τ_abs/∂VMR = cross_section) | Simple scaling |
| Aerosol profiles | Analytic | Already validated in atmo_prof_lin.jl |
The ParameterLayout struct generalizes the index arithmetic, and the RawAerosolJacobian / delta_m_truncation_lin pair defines a clean boundary where any upstream derivative method can plug in.
This page was generated using Literate.jl.