Run on GPU

For: users who want to run the solver on GPU backends.

Next: Quick Start, Configure a Scene, Tutorial: GPU, Architecture-Agnostic Code (Concepts).

GPU support is provided through optional backend extensions. CUDA is the mature NVIDIA path; Metal is a first-pass Apple Silicon path for Float32 runs. The full task page will cover:

  • when vSmartMOM.Architectures.GPU() is available;

  • when vSmartMOM.Architectures.MetalGPU() is available;

  • how array_type(model) and architecture dispatch select CPU or GPU arrays;

  • which workflows are GPU-safe today;

  • memory and precision caveats for Float32 and Float64 runs;

  • how to fall back cleanly to CPU.

For now, see the long-form GPU tutorial.

Performance Notes

Select the architecture with the radiative_transfer.architecture field in the scene configuration, or by setting params.architecture = GPU() after loading a configuration. GPU runs pay off when enough spectral points, layers, or viewing geometries are batched to amortize kernel-launch and host-device transfer costs. Use CPU() for small debugging scenes and for workflows that are not yet GPU-safe.
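The switch-and-fall-back pattern described above can be sketched in a few lines. This is a self-contained illustration, not vSmartMOM's implementation: the types CPU, GPU, and SceneParams and the helpers array_label and select_architecture! are stand-ins that only mirror the names used on this page. The gpu_usable flag is supplied by the caller; with CUDA.jl it would typically come from CUDA.functional().

```julia
# Stand-in types: they mirror the names on this page but are NOT
# vSmartMOM's own definitions.
abstract type AbstractArchitecture end
struct CPU <: AbstractArchitecture end
struct GPU <: AbstractArchitecture end

# Hypothetical parameter container with a mutable architecture field.
mutable struct SceneParams
    architecture::AbstractArchitecture
    n_spectral::Int
end

# Dispatch on the architecture to pick a storage label. A real backend
# would return an array constructor (host Array or a device array type).
array_label(::CPU) = "host Array"
array_label(::GPU) = "device array"
array_label(p::SceneParams) = array_label(p.architecture)

# Request the GPU only when the caller reports a usable device;
# otherwise keep (or restore) the CPU path.
function select_architecture!(p::SceneParams; want_gpu::Bool, gpu_usable::Bool)
    p.architecture = (want_gpu && gpu_usable) ? GPU() : CPU()
    return p
end

params = SceneParams(CPU(), 10_000)

# GPU requested but no usable device: the run stays on the CPU.
select_architecture!(params; want_gpu = true, gpu_usable = false)
println(array_label(params))
```

Keeping the decision in one function means a small debugging scene can force the CPU path by passing want_gpu = false, without touching the rest of the setup.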

For Apple Silicon experiments, load Metal.jl and use MetalGPU() with Float32 scene parameters. The current Metal path covers the core batched matrix multiply and inverse operations; fused kernels can be added after Mac validation. The portable inverse kernel uses Metal threadgroup memory and is intended for modest stream/Stokes dimensions: larger matrices fail early with a clear local-memory error rather than a driver launch failure. With the current 32 KiB guard, Float32 matrices with N = Nquad * nStokes >= 64 are rejected. Metal Jacobian workflows have not been validated yet, so use CPU() or the CUDA GPU() path for linearized runs.
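The Metal constraints above (Float32 only, N = Nquad * nStokes below 64, no linearized runs) can be checked before a run is launched. The helper below is a hypothetical pre-check, not part of vSmartMOM; the threshold is copied from the limit stated on this page and would need updating if the guard changes.

```julia
# Limit taken from this page: with the current 32 KiB guard, Float32
# matrices with N = Nquad * nStokes >= 64 are rejected.
const METAL_REJECT_FROM_N = 64

# Matrix dimension handled by the inverse kernel.
matrix_dim(Nquad::Integer, nStokes::Integer) = Nquad * nStokes

# Hypothetical helper: returns true only when a scene satisfies every
# Metal constraint listed on this page.
function metal_suitable(; FT::Type, Nquad::Integer, nStokes::Integer,
                        linearized::Bool = false)
    FT === Float32 || return false   # the Metal path targets Float32 runs
    linearized && return false       # Jacobian workflows are not validated
    return matrix_dim(Nquad, nStokes) < METAL_REJECT_FROM_N
end

# Example: 15 quadrature points with 4 Stokes components gives N = 60,
# which is below the limit; 16 points gives N = 64, which is rejected.
println(metal_suitable(FT = Float32, Nquad = 15, nStokes = 4))
println(metal_suitable(FT = Float32, Nquad = 16, nStokes = 4))
```

When the check returns false, falling back to CPU() (or to CUDA where available) avoids hitting the local-memory error at run time.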