gatac.tl.MiniBatchLDA
- class gatac.tl.MiniBatchLDA(n_topics=20, *, alpha=None, eta=None, batch_size=256, n_epochs=5, kappa=0.7, tau=64.0, e_step_iters=20, use_full_gpu_matrix=False, seed=0, verbose=True)
Bases: object
GPU-accelerated Mini-batch LDA via Online Variational Bayes.
Learns latent topics from a binary cell-by-peak matrix using stochastic variational inference with CuPy GPU acceleration. The E-step works directly on the CSR sparsity pattern, so the per-cell working set is K × nnz_i instead of K × V, where K is the number of topics, nnz_i is the number of nonzero peaks in cell i, and V is the total number of peaks.
- Parameters:
- n_topics
int Number of topics.
- alpha
float or None Symmetric Dirichlet prior on cell-topic distributions. If None, defaults to 1 / n_topics.
- eta
float or None Symmetric Dirichlet prior on topic-peak distributions. If None, defaults to 1 / n_topics.
- batch_size
int Cells per mini-batch.
- n_epochs
int Full passes over the data.
- kappa
float Learning rate decay exponent in (0.5, 1].
- tau
float Learning rate offset (down-weights early updates).
- e_step_iters
int Coordinate-ascent iterations in each E-step.
- use_full_gpu_matrix
bool If True, try to cache the full CSR input matrix on the GPU. Disabled by default because very large matrices can exhaust GPU memory before training starts.
- seed
int Random seed.
- verbose
bool Print epoch progress.
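The roles of kappa and tau can be illustrated with the step-size schedule commonly used in online variational Bayes for LDA, rho_t = (tau + t) ** -kappa. This is a minimal sketch under that assumption: the function name `learning_rate` and the 0-based update index `t` are hypothetical, and the exact schedule used inside MiniBatchLDA is not stated on this page.

```python
def learning_rate(t, kappa=0.7, tau=64.0):
    """Step size for mini-batch update t, ASSUMING rho_t = (tau + t) ** -kappa.

    kappa in (0.5, 1] controls how fast the rate decays;
    a larger tau shrinks the earliest steps.
    """
    return (tau + t) ** (-kappa)


# Rates at a few update indices with the class defaults (kappa=0.7, tau=64.0).
rates = [learning_rate(t) for t in (0, 10, 100, 1000)]

# The same first update with a smaller offset takes a much larger step.
first_step_small_tau = learning_rate(0, tau=1.0)
first_step_default_tau = learning_rate(0, tau=64.0)
```

Under this schedule, later mini-batches move the topic-peak parameters less than earlier ones, and raising tau damps the first few updates, which matches the "down-weights early updates" description above.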
Examples
>>> import gatac as ga
>>> model = ga.tl.MiniBatchLDA(n_topics=20, n_epochs=10, verbose=True)
>>> model.fit_transform(peak_adata.X, binarize=True)
>>> # Inspect the most-weighted peaks per topic
>>> top = model.top_peaks(peak_adata.var_names, n_top=20)
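To make the K × nnz_i versus K × V working-set claim from the class description concrete, the sketch below counts elements per cell from a CSR row-pointer array. It is only an illustration of the arithmetic: the helper name `e_step_working_set` and the toy `indptr` are hypothetical, and it says nothing about how the library lays out its buffers.

```python
import numpy as np


def e_step_working_set(indptr, n_topics, n_peaks):
    """Element counts per cell for a sparse (K x nnz_i) vs dense (K x V) E-step."""
    nnz_per_cell = np.diff(indptr)                 # nonzero peaks in each cell
    sparse_elems = n_topics * nnz_per_cell         # K x nnz_i
    dense_elems = np.full_like(sparse_elems, n_topics * n_peaks)  # K x V
    return sparse_elems, dense_elems


# Toy CSR row pointer: 3 cells with 3, 2 and 4 accessible peaks out of 1000.
indptr = np.array([0, 3, 5, 9])
sparse_elems, dense_elems = e_step_working_set(indptr, n_topics=20, n_peaks=1000)
```

For this toy input the sparse working sets are 60, 40 and 80 elements, against 20000 elements per cell for the dense layout.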
- __init__(n_topics=20, *, alpha=None, eta=None, batch_size=256, n_epochs=5, kappa=0.7, tau=64.0, e_step_iters=20, use_full_gpu_matrix=False, seed=0, verbose=True)
Methods
__init__([n_topics, alpha, eta, batch_size, ...])
fit(X[, binarize]) Fit the model.
fit_transform(X[, binarize]) Fit and return topic proportions.
top_peaks(var_names[, n_top]) Return a DataFrame with the top-weighted peaks in each topic.
transform(X[, binarize]) Project cells onto learned topics.
Attributes
- property alpha
- property eta
- fit(X, binarize=True)
Fit the model.
- Parameters:
- X
array-like or sparse, shape (n_cells, n_peaks) Peak accessibility matrix. If binarize is True (default), values are clipped to {0, 1}.
- binarize
bool Clip input values to binary before processing.
- Returns:
self
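As a rough illustration of what clipping to {0, 1} means for a count matrix, the sketch below binarizes a small dense array with NumPy. The helper `binarize_counts` is hypothetical and only mirrors the documented behaviour for non-negative integer counts; how fit handles sparse input internally is not described here (for a SciPy CSR matrix, one common approach is to clip its `.data` array instead).

```python
import numpy as np


def binarize_counts(X):
    """Clip non-negative integer counts to {0, 1} (illustrative stand-in for binarize=True)."""
    return np.clip(np.asarray(X), 0, 1)


# Two cells by three peaks, with raw fragment counts.
counts = np.array([[0, 3, 1],
                   [2, 0, 0]])
binary = binarize_counts(counts)
```

After clipping, every peak with at least one count is recorded as accessible (1) and every other peak as closed (0).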
- transform(X, binarize=True)
Project cells onto learned topics.
- Parameters:
- X
array-like or sparse, shape (n_cells, n_peaks)
- binarize
bool Clip input values to binary before processing.
- Returns:
theta : ndarray, shape (n_cells, n_topics) Normalised topic proportions per cell.
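"Normalised" is read here as each row of theta summing to 1. The sketch below shows that normalisation on a hypothetical array `gamma` of unnormalised per-cell topic weights; both the name `gamma` and the helper `normalise_rows` are assumptions for illustration, not part of the documented API.

```python
import numpy as np


def normalise_rows(gamma):
    """Scale each row of non-negative topic weights so that it sums to 1."""
    gamma = np.asarray(gamma, dtype=float)
    return gamma / gamma.sum(axis=1, keepdims=True)


# Two cells by three topics of unnormalised weights.
gamma = np.array([[2.0, 1.0, 1.0],
                  [0.5, 0.5, 4.0]])
theta = normalise_rows(gamma)
```

With rows that sum to 1, each entry of theta can be interpreted as the proportion of a cell attributed to one topic.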
- fit_transform(X, binarize=True)
Fit and return topic proportions.