gatac.tl.MiniBatchLDA

class gatac.tl.MiniBatchLDA(n_topics=20, *, alpha=None, eta=None, batch_size=256, n_epochs=5, kappa=0.7, tau=64.0, e_step_iters=20, use_full_gpu_matrix=False, seed=0, verbose=True)

Bases: object

GPU-accelerated Mini-batch LDA via Online Variational Bayes.

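Learns latent topics from a binary cell-by-peak matrix using stochastic variational inference with CuPy GPU acceleration. The E-step works directly on the CSR sparsity pattern, so the per-cell working set is K × nnz_i rather than K × V, where K is the number of topics, V the number of peaks, and nnz_i the number of nonzero (accessible) peaks in cell i.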
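To make the memory claim concrete, the sketch below gathers only the topic-peak columns that a single cell touches. It runs on the CPU with NumPy and hand-built CSR arrays; the names lam, indptr, indices, and cell_working_set are illustrative assumptions, not gatac internals.

```python
import numpy as np

K, V = 4, 10  # topics, peaks (tiny values for illustration)
rng = np.random.default_rng(0)
lam = rng.gamma(1.0, 1.0, size=(K, V))  # hypothetical topic-peak parameters

# CSR layout of a binary 3-cell matrix: indices[indptr[i]:indptr[i + 1]]
# lists the accessible peaks of cell i.
indptr = np.array([0, 3, 5, 9])
indices = np.array([0, 4, 7, 2, 9, 1, 3, 5, 8])


def cell_working_set(i):
    """Return the K x nnz_i slice of lam that cell i actually needs."""
    cols = indices[indptr[i]:indptr[i + 1]]
    return lam[:, cols]


for i in range(len(indptr) - 1):
    block = cell_working_set(i)
    print(f"cell {i}: working set {block.shape} vs dense {(K, V)}")
```

With realistic sizes the gap is much larger: a cell with a few thousand accessible peaks out of several hundred thousand needs only a small fraction of the dense K × V block.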

Parameters:
n_topics int

Number of topics.

alpha float or None

Symmetric Dirichlet prior on cell-topic distributions. If None, defaults to 1 / n_topics.

eta float or None

Symmetric Dirichlet prior on topic-peak distributions. If None, defaults to 1 / n_topics.

batch_size int

Cells per mini-batch.

n_epochs int

Full passes over the data.

kappa float

Learning rate decay exponent in (0.5, 1].

tau float

Learning rate offset (down-weights early updates).

e_step_iters int

Coordinate-ascent iterations in each E-step.

use_full_gpu_matrix bool

If True, try to cache the full CSR input matrix on GPU. Disabled by default because very large matrices can exhaust GPU memory before training starts.

seed int

Random seed.

verbose bool

Print epoch progress.
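The interaction between kappa, tau, batch_size, and n_epochs can be sketched as follows. This assumes the standard online variational Bayes step size rho_t = (tau + t) ** -kappa and that a final partial batch counts as one update; neither detail is confirmed by this page, and the helper names learning_rate and n_updates are hypothetical.

```python
import math


def learning_rate(t, kappa=0.7, tau=64.0):
    """Step size for the t-th mini-batch update (t starts at 0).

    Assumes the schedule rho_t = (tau + t) ** -kappa.
    """
    if not 0.5 < kappa <= 1.0:
        raise ValueError("kappa must lie in (0.5, 1]")
    return (tau + t) ** (-kappa)


def n_updates(n_cells, batch_size=256, n_epochs=5):
    """Total mini-batch updates, assuming the last partial batch is kept."""
    return math.ceil(n_cells / batch_size) * n_epochs


total = n_updates(n_cells=10_000)
for t in (0, 1, 10, total - 1):
    print(f"update {t:>4}: rho = {learning_rate(t):.4f}")
```

Under this assumed schedule, a larger tau shrinks the first few step sizes, while a kappa closer to 1 makes the step size decay faster over the course of training.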

Examples

>>> import gatac as ga
>>> # peak_adata is assumed to be an AnnData object with a cell-by-peak matrix in .X
>>> model = ga.tl.MiniBatchLDA(n_topics=20, n_epochs=10, verbose=True)
>>> theta = model.fit_transform(peak_adata.X, binarize=True)
>>> # Inspect the most-weighted peaks per topic
>>> top = model.top_peaks(peak_adata.var_names, n_top=20)
__init__(n_topics=20, *, alpha=None, eta=None, batch_size=256, n_epochs=5, kappa=0.7, tau=64.0, e_step_iters=20, use_full_gpu_matrix=False, seed=0, verbose=True)

Methods

__init__([n_topics, alpha, eta, batch_size, ...])

fit(X[, binarize])

Fit the model.

fit_transform(X[, binarize])

Fit and return topic proportions.

top_peaks(var_names[, n_top])

Return a DataFrame with the top-weighted peaks in each topic.

transform(X[, binarize])

Project cells onto learned topics.

Attributes

property alpha
property eta

fit(X, binarize=True)

Fit the model.

Parameters:
X array-like or sparse, shape (n_cells, n_peaks)

Peak accessibility matrix. If binarize is True (default), values are clipped to {0, 1}.

binarize bool

Clip input values to binary before processing.

Returns:

self
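The effect of binarize can be illustrated with a small NumPy sketch. It assumes that clipping to {0, 1} means every positive count becomes 1, which for non-negative counts is the same as np.clip(X, 0, 1); the helper binarize_counts is hypothetical and says nothing about how gatac handles sparse input internally.

```python
import numpy as np


def binarize_counts(X):
    """Map non-negative counts to {0, 1}: any positive value becomes 1."""
    X = np.asarray(X)
    return (X > 0).astype(np.float32)


counts = np.array([[0, 3, 1],
                   [2, 0, 0]])
binary = binarize_counts(counts)
print(binary)
```

For a sparse matrix the zero pattern is unchanged by this step, so only the stored nonzero values would need to be set to 1.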

transform(X, binarize=True)

Project cells onto learned topics.

Parameters:
X array-like or sparse, shape (n_cells, n_peaks)

Peak accessibility matrix, with the same peaks (columns) as the matrix used to fit the model.

binarize bool

Clip input values to binary before processing.

Returns:

theta ndarray, shape (n_cells, n_topics)

Normalised topic proportions per cell.
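As a rough illustration of what "normalised" means for the returned array, the sketch below rescales each row of a non-negative cell-topic matrix so that it sums to 1, then reads off the dominant topic per cell. The input gamma and the helper normalise_rows are assumptions for illustration; the page does not state how gatac computes the proportions internally.

```python
import numpy as np


def normalise_rows(gamma):
    """Divide each row by its sum so every cell's proportions add to 1."""
    gamma = np.asarray(gamma, dtype=np.float64)
    return gamma / gamma.sum(axis=1, keepdims=True)


gamma = np.array([[2.0, 1.0, 1.0],
                  [0.5, 0.5, 4.0]])
theta = normalise_rows(gamma)

# Index of the highest-weighted topic for each cell.
dominant_topic = theta.argmax(axis=1)
print(theta)
print(dominant_topic)
```

Because each row sums to 1, rows of theta can be compared across cells regardless of how many peaks each cell has.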

fit_transform(X, binarize=True)

Fit and return topic proportions.

top_peaks(var_names, n_top=20)

Return a DataFrame with the top-weighted peaks in each topic.

Parameters:
var_names array-like

Peak / variable names (e.g. adata.var_names).

n_top int

Number of top peaks per topic.

Returns:

pd.DataFrame

Columns Topic_0 through Topic_{K-1}, one per topic; rows are ranks.
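The ranking behind such a table can be sketched with NumPy alone. The example assumes a topic-by-peak weight array of shape (K, n_peaks) and sorts each topic's weights in descending order; the function top_peaks_sketch and the toy weights and peak names are hypothetical, not the gatac implementation.

```python
import numpy as np


def top_peaks_sketch(topic_peak, var_names, n_top=20):
    """Return {"Topic_k": [peak names ranked by descending weight]}."""
    topic_peak = np.asarray(topic_peak)
    var_names = np.asarray(var_names)
    # Negate so argsort yields descending order, then keep the first n_top.
    order = np.argsort(-topic_peak, axis=1)[:, :n_top]
    return {f"Topic_{k}": var_names[order[k]].tolist()
            for k in range(topic_peak.shape[0])}


weights = np.array([[0.1, 0.6, 0.3],
                    [0.5, 0.2, 0.3]])
peaks = ["chr1:100-600", "chr1:900-1400", "chr2:50-550"]
table = top_peaks_sketch(weights, peaks, n_top=2)
print(table)
```

Passing the resulting dictionary to pandas.DataFrame would give one column per topic with ranks as rows, matching the layout described above.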