gatac.tl.lda



gatac.tl.lda(adata, n_topics=20, layer=None, binarize=True, batch_size=256, n_epochs=5, e_step_iters=20, key_saved='X_lda', seed=0, **kwargs)

Learn topics from a binarized peak matrix using GPU-accelerated mini-batch latent Dirichlet allocation (LDA).

Uses online variational Bayes (Hoffman et al., 2013), implemented in JAX. Optimised for single-cell ATAC-seq data stored as binary cell-by-peak matrices.
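
The sketch below makes the online variational Bayes procedure concrete. It is a minimal NumPy/SciPy version in the style of Hoffman et al. and is not gatac's JAX implementation: it runs on the CPU and expects a dense matrix. Several details are illustrative assumptions rather than documented gatac defaults: the symmetric priors `alpha = eta = 1/n_topics`, the step-size schedule `rho_t = (tau0 + t) ** -kappa` with `tau0 = 10` and `kappa = 0.7`, the Gamma(100, 0.01) initialisation, and the final normalisation of the outputs. `online_lda`, `e_step` and `dirichlet_expectation` are hypothetical names.

```python
import numpy as np
from scipy.special import digamma


def dirichlet_expectation(a):
    """E[log x] under Dirichlet(a[i]) for each row i of a 2-D array."""
    return digamma(a) - digamma(a.sum(axis=1, keepdims=True))


def e_step(X, exp_elog_beta, alpha, n_iters, rng):
    """Update per-cell Dirichlet parameters gamma and return topic statistics."""
    n_topics = exp_elog_beta.shape[0]
    gamma = rng.gamma(100.0, 0.01, size=(X.shape[0], n_topics))
    for _ in range(n_iters):
        exp_elog_theta = np.exp(dirichlet_expectation(gamma))
        phinorm = exp_elog_theta @ exp_elog_beta + 1e-100  # (cells, peaks)
        gamma = alpha + exp_elog_theta * ((X / phinorm) @ exp_elog_beta.T)
    # Sufficient statistics for the topic-peak update, using the final gamma
    exp_elog_theta = np.exp(dirichlet_expectation(gamma))
    phinorm = exp_elog_theta @ exp_elog_beta + 1e-100
    sstats = (exp_elog_theta.T @ (X / phinorm)) * exp_elog_beta  # (topics, peaks)
    return gamma, sstats


def online_lda(X, n_topics=20, batch_size=256, n_epochs=5, e_step_iters=20,
               tau0=10.0, kappa=0.7, seed=0):
    """Online variational Bayes LDA on a dense (n_cells, n_peaks) array."""
    rng = np.random.default_rng(seed)
    n_cells, n_peaks = X.shape
    alpha = eta = 1.0 / n_topics  # symmetric Dirichlet priors (assumed)
    lam = rng.gamma(100.0, 0.01, size=(n_topics, n_peaks))  # topic-peak params
    t = 0
    for _ in range(n_epochs):
        order = rng.permutation(n_cells)  # reshuffle cells every epoch
        for start in range(0, n_cells, batch_size):
            batch = X[order[start:start + batch_size]]
            exp_elog_beta = np.exp(dirichlet_expectation(lam))
            _, sstats = e_step(batch, exp_elog_beta, alpha, e_step_iters, rng)
            # Rescale batch statistics to the full dataset, then blend them in
            lam_hat = eta + (n_cells / batch.shape[0]) * sstats
            rho = (tau0 + t) ** (-kappa)  # decaying step size
            lam = (1.0 - rho) * lam + rho * lam_hat
            t += 1
    # Final E-step over all cells to get per-cell topic proportions
    exp_elog_beta = np.exp(dirichlet_expectation(lam))
    gamma, _ = e_step(X, exp_elog_beta, alpha, e_step_iters, rng)
    theta = gamma / gamma.sum(axis=1, keepdims=True)     # (n_cells, n_topics)
    loadings = (lam / lam.sum(axis=1, keepdims=True)).T  # (n_peaks, n_topics)
    return theta, loadings
```

The `n_cells / batch.shape[0]` factor scales one mini-batch's statistics up to the size of the whole dataset. That scaling is what lets a small batch stand in for all cells in each online update. The two outputs have the same shapes as the arrays this function stores in `adata.obsm` and `adata.varm`.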

Parameters:
adata AnnData

Annotated data matrix of shape (n_cells, n_peaks) holding peak accessibility values (counts or binary calls).

n_topics int, optional (default: 20)

Number of topics to learn.

layer str or None, optional (default: None)

Key in adata.layers for the input matrix. If None, adata.X is used.

binarize bool, optional (default: True)

Clip values to the range [0, 1] before fitting, so that non-negative integer counts become binary {0, 1} accessibility calls.

batch_size int, optional (default: 256)

Number of cells per mini-batch. Reduce it for large peak sets to lower GPU memory use.

n_epochs int, optional (default: 5)

Number of passes over the data.

e_step_iters int, optional (default: 20)

Number of variational inference iterations per E-step, i.e. fixed-point updates of each cell's topic proportions within a mini-batch.

key_saved str, optional (default: "X_lda")

Key in adata.obsm where topic proportions are saved.

seed int, optional (default: 0)

Random seed for reproducibility.

**kwargs

Additional keyword arguments passed to MiniBatchLDA.
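
The helpers below sketch what `binarize`, `batch_size` and `n_epochs` mean in practice. They are hypothetical and not part of gatac. The memory estimate assumes that one mini-batch is materialised as a dense float32 (cells, peaks) array. gatac's real allocations are not documented here, and several arrays of that size may exist at the same time. The binarization helper assumes non-negative counts and keeps sparse input sparse, so a large peak matrix is never densified.

```python
import math

import numpy as np
import scipy.sparse as sp


def binarize_counts(X):
    """Clip non-negative counts so every value is 0 or 1; sparse stays sparse."""
    if sp.issparse(X):
        return X.tocsr().minimum(1)
    return np.clip(X, 0, 1)


def dense_batch_bytes(batch_size, n_peaks, bytes_per_value=4):
    """Bytes in one dense (batch_size, n_peaks) array, e.g. float32."""
    return batch_size * n_peaks * bytes_per_value


def n_minibatch_updates(n_cells, batch_size, n_epochs):
    """Number of topic (M-step) updates performed over all epochs."""
    return n_epochs * math.ceil(n_cells / batch_size)
```

Take 200,000 peaks with the default `batch_size=256`: one dense float32 batch then occupies about 205 MB, and halving `batch_size` halves that figure. With 10,000 cells, `batch_size=256` and the default `n_epochs=5`, the model performs 200 mini-batch updates.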

Returns:

MiniBatchLDA The fitted model. Results are also stored in place:

  • adata.obsm[key_saved]: topic proportions (n_cells, n_topics)

  • adata.varm['lda_topics']: peak loadings per topic (n_peaks, n_topics)
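
The two stored arrays are typically summarised in two ways: labelling each cell with its highest-weight topic, and listing the top-loading peaks of each topic. The sketch below assumes both entries are dense NumPy arrays with the shapes given above. `dominant_topic` and `top_peaks_per_topic` are hypothetical helpers, not gatac functions. The ranking within a topic does not change if gatac rescales each topic column by a positive constant, so it holds whether or not the loadings are normalised.

```python
import numpy as np


def dominant_topic(topic_props):
    """Index of the highest-weight topic for each cell (row)."""
    return np.argmax(topic_props, axis=1)


def top_peaks_per_topic(peak_loadings, n_top=10):
    """Row indices of the n_top highest-loading peaks for each topic (column).

    Returns an array of shape (n_topics, n_top), best peak first.
    """
    order = np.argsort(-peak_loadings, axis=0)  # sort each column descending
    return order[:n_top].T
```

With an AnnData object, call `top = top_peaks_per_topic(adata.varm["lda_topics"])`. The peak names for topic `k` are then `adata.var_names[top[k]]`, and `dominant_topic(adata.obsm["X_lda"])` gives one topic index per cell.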

Examples

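>>> import gatac as ga
>>> # adata: AnnData with cells in .obs and peaks in .var
>>> model = ga.tl.lda(adata, n_topics=20, n_epochs=10)
>>> # Cell × topic proportions are stored in adata.obsm["X_lda"]
>>> adata.obsm["X_lda"].shape == (adata.n_obs, 20)
True
>>> # Peak × topic loadings are stored in adata.varm["lda_topics"]
>>> adata.varm["lda_topics"].shape == (adata.n_vars, 20)
True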