gatac.tl.lda
- gatac.tl.lda(adata, n_topics=20, layer=None, binarize=True, batch_size=256, n_epochs=5, e_step_iters=20, key_saved='X_lda', seed=0, **kwargs)
Learn topics from a binarized peak matrix using GPU-accelerated mini-batch LDA.
Uses Online Variational Bayes (Hoffman et al., 2013) with JAX. Optimised for single-cell ATAC-seq binary peak-by-cell matrices.
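The core of an online variational Bayes update can be sketched in a few lines. The block below is a NumPy illustration only, not gatac's JAX implementation: the names `step_size`, `blend_topics`, `tau0`, and `kappa` are hypothetical and follow the usual notation of the online LDA literature, where the topic parameters are moved toward a mini-batch estimate with a decaying weight.

```python
import numpy as np

def step_size(t, tau0=1.0, kappa=0.7):
    """Decaying weight rho_t = (tau0 + t) ** -kappa for update number t (0-based).

    tau0 and kappa are assumed hyperparameters, not gatac arguments.
    """
    return (tau0 + t) ** (-kappa)

def blend_topics(lam, lam_hat, rho):
    """Move the current topic parameters lam toward the mini-batch estimate lam_hat."""
    return (1.0 - rho) * lam + rho * lam_hat

# Toy example: 2 topics x 3 peaks.
lam = np.ones((2, 3))            # current topic-by-peak parameters
lam_hat = np.full((2, 3), 3.0)   # estimate computed from one mini-batch
lam_new = blend_topics(lam, lam_hat, step_size(t=0))
```

Because the weight shrinks as `t` grows, early mini-batches shape the topics strongly while later ones only refine them.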
- Parameters:
- adata
AnnData Annotated data matrix with a peak accessibility matrix.
- n_topics
int, optional (default: 20) Number of topics to learn.
- layer
str or None, optional (default: None) Key in adata.layers for the input matrix. If None, adata.X is used.
- binarize
bool, optional (default: True) Clip values to {0, 1} before processing.
- batch_size
int, optional (default: 256) Cells per mini-batch. Reduce for large peak sets to save GPU memory.
- n_epochs
int, optional (default: 5) Number of passes over the data.
- e_step_iters
int, optional (default: 20) Variational inference iterations per E-step.
- key_saved
str, optional (default: "X_lda") Key in adata.obsm where topic proportions are saved.
- seed
int, optional (default: 0) Random seed for reproducibility.
- **kwargs
Additional keyword arguments passed to MiniBatchLDA.
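To make the roles of binarize, batch_size, and n_epochs concrete, here is a small NumPy sketch. It is an assumption about how such preprocessing and batching typically work, not gatac's actual code: the helper names `binarize_counts` and `iter_minibatches` are hypothetical, the input is assumed to hold non-negative counts, and whether gatac shuffles cells between epochs is not stated in this page.

```python
import numpy as np

def binarize_counts(X):
    """Map a non-negative count matrix to {0, 1}: any nonzero entry becomes 1."""
    return (np.asarray(X) > 0).astype(np.float32)

def iter_minibatches(n_cells, batch_size, n_epochs, seed=0):
    """Yield index arrays that cover every cell once per epoch.

    Shuffling per epoch is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs):
        order = rng.permutation(n_cells)
        for start in range(0, n_cells, batch_size):
            yield order[start:start + batch_size]

# 1000 cells with batch_size=256 give 4 batches per epoch (the last one smaller).
batches = list(iter_minibatches(n_cells=1000, batch_size=256, n_epochs=5))
```

Under these assumptions the number of updates is ceil(n_cells / batch_size) * n_epochs, so halving batch_size to save memory doubles the number of updates per epoch.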
- Returns:
MiniBatchLDA The fitted model. Results are also stored in-place:
- adata.obsm[key_saved]: topic proportions (n_cells, n_topics)
- adata.varm['lda_topics']: peak loadings per topic (n_peaks, n_topics)
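A common next step is to summarise the two stored arrays. The sketch below uses plain NumPy arrays as stand-ins for adata.obsm[key_saved] and adata.varm['lda_topics'], with the shapes documented above; the helper names `dominant_topic` and `top_peaks` are hypothetical and not part of gatac.

```python
import numpy as np

def dominant_topic(cell_topics):
    """Index of the highest-proportion topic for each cell; input is (n_cells, n_topics)."""
    return np.argmax(cell_topics, axis=1)

def top_peaks(peak_loadings, n_top=10):
    """Indices of the n_top highest-loading peaks per topic.

    Input is (n_peaks, n_topics); output is (n_topics, n_top).
    """
    order = np.argsort(-peak_loadings, axis=0)[:n_top]
    return order.T

# Toy stand-ins: 2 cells x 3 topics, and 3 peaks x 2 topics.
cell_topics = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.1, 0.8]])
peak_loadings = np.array([[0.1, 0.9],
                          [0.5, 0.2],
                          [0.4, 0.3]])

labels = dominant_topic(cell_topics)
best = top_peaks(peak_loadings, n_top=2)
```

With a real AnnData object, the same functions could be applied to adata.obsm["X_lda"] and adata.varm["lda_topics"], and the peak indices mapped back through adata.var_names.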
Examples
>>> import gatac as ga
>>> model = ga.tl.lda(adata, n_topics=20, n_epochs=10)
>>> # Cell × topic proportions are stored in adata.obsm["X_lda"]
>>> adata.obsm["X_lda"].shape
(n_cells, 20)