gatac.tl.MiniBatchLDA


class gatac.tl.MiniBatchLDA(n_topics=20, *, alpha=None, eta=None, batch_size=256, n_epochs=5, kappa=0.7, tau=64.0, e_step_iters=20, use_full_gpu_matrix=False, seed=0, verbose=True)

Bases: object

GPU-accelerated mini-batch LDA via online variational Bayes.

Learns latent topics from a binary cell-by-peak matrix using stochastic variational inference, with GPU acceleration through CuPy. The E-step works directly on the CSR sparsity pattern, so the per-cell working set is K × nnz_i instead of K × V, where K is the number of topics, nnz_i is the number of nonzero (accessible) peaks in cell i, and V is the total number of peaks.
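The sparse E-step can be sketched on the CPU with NumPy and SciPy. This is a minimal sketch that follows the per-document E-step of Hoffman, Blei and Bach's online variational Bayes for LDA (2010), assuming gatac's GPU kernel computes the same quantities; `exp_dirichlet_expectation` and `e_step_cell` are hypothetical helpers, not part of gatac. The point it illustrates is that each cell only reads the K × nnz_i slice `beta_i` of the topic matrix, taken from that cell's CSR row.

```python
import numpy as np
import scipy.sparse as sp
from scipy.special import digamma


def exp_dirichlet_expectation(param):
    """exp(E[log x]) for x ~ Dirichlet(param); row-wise for 2-D input."""
    if param.ndim == 1:
        return np.exp(digamma(param) - digamma(param.sum()))
    return np.exp(digamma(param) - digamma(param.sum(axis=1, keepdims=True)))


def e_step_cell(peak_ids, exp_elog_beta, alpha, n_iter=20, rng=None):
    """Coordinate-ascent E-step for one binarised cell (all counts equal 1).

    peak_ids:      column indices of the cell's nonzero peaks (one CSR row).
    exp_elog_beta: (K, V) array, exp(E[log beta]) for the current topics.
    Only the (K, nnz_i) slice of exp_elog_beta is read.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_topics = exp_elog_beta.shape[0]
    beta_i = exp_elog_beta[:, peak_ids]               # (K, nnz_i) working set
    gamma = rng.gamma(100.0, 1.0 / 100.0, size=n_topics)
    exp_elog_theta = exp_dirichlet_expectation(gamma)
    for _ in range(n_iter):
        phinorm = exp_elog_theta @ beta_i + 1e-100    # (nnz_i,) normaliser of phi
        gamma = alpha + exp_elog_theta * (beta_i @ (1.0 / phinorm))
        exp_elog_theta = exp_dirichlet_expectation(gamma)
    # phi restricted to this cell's peaks: its contribution to the topic statistics
    phinorm = exp_elog_theta @ beta_i + 1e-100
    sstats_i = np.outer(exp_elog_theta, 1.0 / phinorm) * beta_i   # (K, nnz_i)
    return gamma, sstats_i


# Toy data: 5 cells x 50 peaks, binary CSR
rng = np.random.default_rng(0)
X = sp.csr_matrix((rng.random((5, 50)) < 0.3).astype(np.float32))
K = 4
lam = rng.gamma(100.0, 1.0 / 100.0, size=(K, 50))    # topic-peak parameters
exp_elog_beta = exp_dirichlet_expectation(lam)

row = 0
peak_ids = X.indices[X.indptr[row]:X.indptr[row + 1]]
gamma, sstats_i = e_step_cell(peak_ids, exp_elog_beta, alpha=1.0 / K)
```

Each column of `sstats_i` sums to 1 (one unit of topic responsibility per accessible peak), and these columns are what a mini-batch would scatter back into a K × V statistics array at positions `peak_ids`.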

Parameters:

n_topics int: Number of topics.
alpha float or None: Symmetric Dirichlet prior on cell-topic distributions. If None, defaults to 1 / n_topics.
eta float or None: Symmetric Dirichlet prior on topic-peak distributions. If None, defaults to 1 / n_topics.
batch_size int: Number of cells per mini-batch.
n_epochs int: Number of full passes over the data.
kappa float: Learning-rate decay exponent, in (0.5, 1].
tau float: Learning-rate offset; larger values down-weight early updates.
e_step_iters int: Number of coordinate-ascent iterations in each E-step.
use_full_gpu_matrix bool: If True, try to cache the full CSR input matrix on the GPU. Disabled by default because very large matrices can exhaust GPU memory before training starts.
seed int: Random seed.
verbose bool: Print epoch progress.
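The roles of kappa and tau can be sketched with the step-size schedule of standard online variational Bayes for LDA, rho_t = (tau + t) ** (-kappa), followed by a blended update of the topic-peak parameters. This assumes gatac uses that schedule and that t counts mini-batch updates across all epochs; `learning_rate` and `update_topics` are hypothetical names. Restricting kappa to (0.5, 1] is what makes the steps satisfy the Robbins-Monro conditions (their sum diverges while the sum of their squares converges), and a large tau keeps the first steps small: with tau=64 and kappa=0.7, rho_0 is about 0.054 rather than 1.

```python
import numpy as np


def learning_rate(t, tau=64.0, kappa=0.7):
    """Step size rho_t = (tau + t) ** (-kappa) for the t-th mini-batch update (0-based)."""
    return (tau + t) ** (-kappa)


def update_topics(lam, sstats_batch, eta, n_cells, batch_size, t, tau=64.0, kappa=0.7):
    """Blend the current topic-peak parameters with a noisy mini-batch estimate.

    lam:          (K, V) variational topic-peak parameters.
    sstats_batch: (K, V) expected peak-topic assignments summed over the batch.
    The batch statistics are scaled by n_cells / batch_size, i.e. treated as if
    the whole dataset looked like this mini-batch.
    """
    rho = learning_rate(t, tau, kappa)
    lam_hat = eta + (n_cells / batch_size) * sstats_batch
    return (1.0 - rho) * lam + rho * lam_hat


n_topics = 20
alpha = eta = 1.0 / n_topics            # defaults used when alpha=None / eta=None
rates = np.array([learning_rate(t) for t in range(0, 1000, 100)])
```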

Examples

>>> import gatac as ga
>>> # peak_adata: AnnData with cells as rows (obs) and peaks as columns (var)
>>> model = ga.tl.MiniBatchLDA(n_topics=20, n_epochs=10, verbose=True)
>>> theta = model.fit_transform(peak_adata.X, binarize=True)  # (n_cells, 20)
>>> # Inspect the most-weighted peaks per topic
>>> top = model.top_peaks(peak_adata.var_names, n_top=20)

__init__(n_topics=20, *, alpha=None, eta=None, batch_size=256, n_epochs=5, kappa=0.7, tau=64.0, e_step_iters=20, use_full_gpu_matrix=False, seed=0, verbose=True)

Methods

`__init__`([n_topics, alpha, eta, batch_size, ...])
`fit`(X[, binarize])	Fit the model.
`fit_transform`(X[, binarize])	Fit and return topic proportions.
`top_peaks`(var_names[, n_top])	Return a DataFrame with the top-weighted peaks in each topic.
`transform`(X[, binarize])	Project cells onto learned topics.

Attributes

`alpha`
`eta`

property alpha

property eta

fit(X, binarize=True)

Fit the model.

Parameters:

X array-like or sparse, shape (n_cells, n_peaks): Peak accessibility matrix. If binarize is True (the default), values are clipped to {0, 1}.
binarize bool: Clip input values to binary before processing.
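For sparse input, binarisation can be done on the stored nonzeros alone, without ever densifying the matrix. A minimal sketch, assuming non-negative count data (for which setting every nonzero to 1 is the same as clipping to {0, 1}); `binarize_csr` is a hypothetical helper, not gatac's internal function.

```python
import numpy as np
import scipy.sparse as sp


def binarize_csr(X):
    """Return a float32 CSR copy of X with every stored nonzero set to 1.

    For non-negative count data this equals clipping values to {0, 1}.
    """
    X = sp.csr_matrix(X, dtype=np.float32, copy=True)
    X.eliminate_zeros()          # drop explicitly stored zeros first
    X.data[:] = 1.0              # only the stored entries are touched
    return X


counts = sp.csr_matrix(np.array([[0, 3, 1], [2, 0, 0]]))
Xb = binarize_csr(counts)
```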

Returns:

self

transform(X, binarize=True)

Project cells onto learned topics.

Parameters:

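X array-like or sparse, shape (n_cells, n_peaks): Peak accessibility matrix with the same peaks (columns, in the same order) as the matrix passed to fit.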
binarize bool: Clip input values to binary before processing.

Returns:

theta (ndarray, shape (n_cells, n_topics)): Normalised topic proportions per cell; each row sums to 1.
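In online-LDA implementations, normalised topic proportions are usually obtained by dividing each cell's variational Dirichlet parameters gamma by their sum, which is the posterior mean of theta under q(theta) = Dirichlet(gamma). The sketch below assumes gatac's transform does the same; `normalise_gamma` is a hypothetical helper, and `dominant_topic` only illustrates one common downstream use of theta.

```python
import numpy as np


def normalise_gamma(gamma):
    """Turn variational Dirichlet parameters (n_cells, K) into topic proportions.

    Each row is divided by its sum, i.e. the mean of Dirichlet(gamma_row).
    """
    gamma = np.asarray(gamma, dtype=np.float64)
    return gamma / gamma.sum(axis=1, keepdims=True)


gamma = np.array([[2.0, 1.0, 1.0], [0.5, 0.5, 3.0]])
theta = normalise_gamma(gamma)        # [[0.5, 0.25, 0.25], [0.125, 0.125, 0.75]]
dominant_topic = theta.argmax(axis=1)  # highest-proportion topic per cell
```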

fit_transform(X, binarize=True)

Fit and return topic proportions.

top_peaks(var_names, n_top=20)

Return a DataFrame with the top-weighted peaks in each topic.

Parameters:

var_names array-like: Peak (variable) names, e.g. adata.var_names, in the same column order as the fitted matrix.
n_top int: Number of top peaks per topic.

Returns:

pd.DataFrame: Columns Topic_0 … Topic_{K-1}, one per topic (K = n_topics); rows are ranks, with the highest-weighted peak first.
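A table of this shape can be sketched from any K × V topic-peak weight matrix by sorting each row in descending order. This assumes the ranking uses the variational topic-peak parameters (or their row-normalised form, which gives the same order within a topic because each row is divided by a positive constant); `top_peaks_table` and the peak names are hypothetical.

```python
import numpy as np
import pandas as pd


def top_peaks_table(topic_peak, var_names, n_top=20):
    """Build an (n_top, K) table of peak names ranked by weight within each topic.

    topic_peak: (K, V) array of topic-peak weights.
    Column Topic_k lists the n_top highest-weighted peaks of topic k, best first.
    """
    topic_peak = np.asarray(topic_peak)
    var_names = np.asarray(var_names)
    order = np.argsort(-topic_peak, axis=1)[:, :n_top]   # (K, n_top) peak indices
    return pd.DataFrame(
        {f"Topic_{k}": var_names[order[k]] for k in range(topic_peak.shape[0])}
    )


weights = np.array([[0.1, 0.7, 0.2], [0.5, 0.1, 0.4]])
peaks = ["chr1:100-600", "chr1:900-1400", "chr2:50-550"]
table = top_peaks_table(weights, peaks, n_top=2)
```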
