gatac.tl.spectral
- gatac.tl.spectral(adata, n_comps=30, features='selected', random_state=0, weighted_by_sd=True, feature_weights=None, inplace=True, chunk_size=None)
GPU-accelerated spectral embedding via Laplacian Eigenmaps.
Converts the cell × feature count matrix into a lower-dimensional representation using the spectrum of the normalized graph Laplacian defined by pairwise cosine similarity between cells. The entire computation runs on the GPU via CuPy, using a matrix-free approach that never materializes the N × N similarity matrix.
This is a GPU-accelerated port of SnapATAC2’s tl.spectral.
- Parameters:
- adata AnnData
AnnData object. adata.X should be a sparse cell × tile (or peak) matrix, either binarized or count-valued.
- n_comps int
Number of spectral dimensions to compute. When weighted_by_sd=True (default), the result is insensitive to this value as long as it is large enough (e.g. 30).
- features str | ndarray | None
Which features (columns) to use. "selected" uses the boolean mask in adata.var["selected"] and requires a prior call to pp.select_features. You can also pass a NumPy boolean array of length n_vars, or None to use all features.
- random_state int
Seed for reproducibility of the Lanczos starting vector.
- weighted_by_sd bool
If True (default), weight each eigenvector by the square root of its eigenvalue and discard components with non-positive eigenvalues. This typically eliminates the need to manually choose the number of components.
- feature_weights ndarray | None
Optional per-feature IDF weights. If None, IDF weights are computed automatically from the data.
- inplace bool
If True, store the embedding in adata.obsm["X_spectral"] and the eigenvalues in adata.uns["spectral_eigenvalue"]. If False, return (eigenvalues, eigenvectors) as NumPy arrays.
- chunk_size int | None
When set, process the matrix in row batches of this many cells instead of loading the full matrix into GPU memory at once. Only one chunk resides on the GPU at a time during each Lanczos iteration, trading throughput for reduced peak GPU memory. Recommended values: 20 000 – 50 000. None (default) loads the full matrix to the GPU (fastest, but requires enough VRAM).
- Returns:
If inplace=True, stores the results in adata and returns None. If inplace=False, returns (eigenvalues, eigenvectors).
- Return type:
tuple[np.ndarray, np.ndarray] | None
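The effect of weighted_by_sd on the shape of the result can be illustrated with a small NumPy sketch. The helper name weight_by_sd below is hypothetical and is not part of gatac; it only mirrors the behavior described for the parameter (scale each eigenvector by the square root of its eigenvalue, drop components whose eigenvalue is not positive).

```python
import numpy as np


def weight_by_sd(eigenvalues, eigenvectors):
    """Hypothetical helper: scale each eigenvector (column) by sqrt(eigenvalue)
    and drop the columns whose eigenvalue is not strictly positive."""
    eigenvalues = np.asarray(eigenvalues, dtype=np.float64)
    eigenvectors = np.asarray(eigenvectors, dtype=np.float64)
    keep = eigenvalues > 0  # boolean mask over components
    # Broadcasting multiplies column j by sqrt(eigenvalues[j]).
    return eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])


# Toy input: 4 cells, 4 requested components, two of them non-positive.
evals = np.array([0.9, 0.4, 0.0, -0.2])
evecs = np.eye(4)
embedding = weight_by_sd(evals, evecs)
print(embedding.shape)  # (4, 2)
```

Under this reading, the number of columns that remain is what the Examples section calls n_effective_comps, which can be smaller than n_comps.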
Notes
In the default full-GPU path, the matrix is uploaded in its original integer dtype and only X.data is converted to float32 in place during normalization. The sparse index arrays stay int32, and peak GPU memory is roughly one float32 copy of the data array.
When chunk_size is set, peak GPU memory is roughly one chunk. The full matrix stays on the CPU, row chunks are streamed to the GPU for each eigsh matvec, and throughput is lower because each matvec requires two passes over the streamed chunks.
The algorithm:
1. Apply IDF weights and L2-normalize each row of the selected feature matrix so that row dot-products equal cosine similarities.
2. Define a matrix-free linear operator A v = X (X^T v) - D v where D is the degree vector (sum of cosine similarities per cell). This implicitly represents D^{-1/2} (S - I) D^{-1/2} with S = X X^T.
3. Compute the top-k eigenpairs via CuPy’s Lanczos (eigsh).
4. Optionally weight eigenvectors by sqrt(eigenvalue).
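The first three steps can be sketched on the CPU with NumPy and SciPy. This is an illustration under stated assumptions, not gatac's implementation: it uses scipy.sparse.linalg.eigsh in place of CuPy, the function name spectral_sketch is hypothetical, the IDF formula log(n_cells / document frequency) is an assumed form, and the operator is taken to be the symmetrically normalized D^{-1/2} (S - I) D^{-1/2} named in step 2.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh


def spectral_sketch(X, n_comps=5, random_state=0):
    """CPU illustration of a matrix-free Laplacian-eigenmaps embedding."""
    X = sp.csr_matrix(X, dtype=np.float64)
    n_cells = X.shape[0]

    # Step 1a: IDF weights (assumed form: log(n_cells / document frequency)).
    doc_freq = np.asarray((X > 0).sum(axis=0)).ravel()
    idf = np.log(n_cells / np.maximum(doc_freq, 1))
    X = X @ sp.diags(idf)

    # Step 1b: L2-normalize rows so row dot products are cosine similarities.
    row_norm = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
    X = sp.diags(1.0 / np.maximum(row_norm, 1e-12)) @ X

    # Degree of each cell: row sums of S - I with S = X X^T, without forming S.
    # Assumes every cell shares at least one feature with another cell.
    degree = X @ (X.T @ np.ones(n_cells)) - 1.0
    d_inv_sqrt = 1.0 / np.sqrt(degree)

    # Step 2: matrix-free operator for D^{-1/2} (S - I) D^{-1/2}.
    def matvec(v):
        u = d_inv_sqrt * np.ravel(v)
        return d_inv_sqrt * (X @ (X.T @ u) - u)

    op = LinearOperator((n_cells, n_cells), matvec=matvec, dtype=np.float64)

    # Step 3: largest eigenpairs via Lanczos, seeded starting vector.
    v0 = np.random.default_rng(random_state).standard_normal(n_cells)
    evals, evecs = eigsh(op, k=n_comps, which="LA", v0=v0)
    order = np.argsort(evals)[::-1]  # eigsh does not promise descending order
    return evals[order], evecs[:, order]


# Toy binarized cell x feature matrix: 40 cells, 60 features.
rng = np.random.default_rng(0)
toy = (rng.random((40, 60)) < 0.3).astype(np.float64)
evals, evecs = spectral_sketch(toy, n_comps=5)
print(evals.shape, evecs.shape)
```

The N × N similarity matrix never appears: each matvec costs two sparse products with X, which is what makes the approach feasible for large N. With this normalization the largest eigenvalue is 1, which is a convenient sanity check.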
Examples
>>> import gatac
>>> # After tile matrix creation and feature selection:
>>> gatac.pp.select_features(adata)
>>> gatac.tl.spectral(adata)
>>> adata.obsm["X_spectral"].shape
(n_cells, n_effective_comps)
For large datasets that cause GPU OOM:
>>> gatac.tl.spectral(adata, chunk_size=30_000)
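Why a chunked matvec needs two passes over the data can be seen in a small CPU sketch. The function chunked_gram_matvec is hypothetical and omits the host-to-GPU transfers; it only shows the access pattern for computing X (X^T v) when just chunk_size rows are available at a time. Degree scaling and the identity term are left out for brevity.

```python
import numpy as np
import scipy.sparse as sp


def chunked_gram_matvec(X, v, chunk_size):
    """Compute X @ (X.T @ v) while touching only `chunk_size` rows at a time."""
    n_cells, n_features = X.shape
    starts = range(0, n_cells, chunk_size)

    # Pass 1: accumulate t = X^T v, one row chunk at a time.
    t = np.zeros(n_features)
    for s in starts:
        chunk = X[s:s + chunk_size]  # a GPU version would upload this chunk here
        t += chunk.T @ v[s:s + chunk_size]

    # Pass 2: the full t is needed before any output row can be formed,
    # so the chunks have to be streamed a second time.
    out = np.empty(n_cells)
    for s in starts:
        chunk = X[s:s + chunk_size]
        out[s:s + chunk_size] = chunk @ t
    return out


# Compare against the unchunked product on a small random sparse matrix.
X_demo = sp.random(1000, 50, density=0.05, format="csr", random_state=0)
v_demo = np.random.default_rng(0).standard_normal(1000)
chunked = chunked_gram_matvec(X_demo, v_demo, chunk_size=128)
full = X_demo @ (X_demo.T @ v_demo)
print(np.allclose(chunked, full))
```

Only the length-n_features accumulator and one chunk are live at any time, which matches the memory and throughput trade-off described for chunk_size.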