gatac.tl.compute_deviations

gatac.tl.compute_deviations(adata, *, batch_size=5000, motif_batch_size=-1, key_added='chromvar', return_adata=False)

Compute chromVAR TF deviation scores.

Computes per-cell, per-motif deviation scores normalized by background expectation. Requires prior setup:

  1. sample_bg_peaks() to generate adata.varm["bg_peaks"]

  2. scan_motifs() to generate adata.varm["motif_match"]

The algorithm:

  - For each cell, computes the observed accessibility of each motif.

  - Computes the expected accessibility from overall peak accessibility and cell depth.

  - Computes the same deviation for the background peaks.

  - Z-score normalizes: (observed_dev - mean_bg_dev) / std_bg_dev
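The steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not gatac's implementation: the function name `deviation_z_scores` is hypothetical, the raw deviation is taken as (observed - expected) / expected, and `bg_peaks` is assumed to hold one column of substitute peak indices per background iteration.

```python
import numpy as np

def deviation_z_scores(counts, motif_match, bg_peaks):
    """Hypothetical sketch of a chromVAR-style deviation z-score.

    counts      : (cells, peaks) accessibility counts
    motif_match : (peaks, motifs) 0/1 motif match matrix
    bg_peaks    : (peaks, n_bg) integer indices of background peaks
    """
    counts = np.asarray(counts, dtype=np.float64)
    motif_match = np.asarray(motif_match, dtype=np.float64)

    depth = counts.sum(axis=1, keepdims=True)       # reads per cell, (cells, 1)
    peak_frac = counts.sum(axis=0) / counts.sum()   # share of all reads per peak

    def raw_deviation(peak_idx):
        # Observed motif accessibility, using peak_idx in place of each peak.
        observed = counts[:, peak_idx] @ motif_match
        # Expected accessibility from overall peak accessibility and cell depth.
        expected = depth * (peak_frac[peak_idx] @ motif_match)
        return (observed - expected) / expected

    # Deviation for the real peaks (identity index).
    observed_dev = raw_deviation(np.arange(counts.shape[1]))
    # Deviation for each background iteration, stacked as (n_bg, cells, motifs).
    bg_dev = np.stack(
        [raw_deviation(bg_peaks[:, b]) for b in range(bg_peaks.shape[1])]
    )
    # Z-score: (observed_dev - mean_bg_dev) / std_bg_dev
    return (observed_dev - bg_dev.mean(axis=0)) / bg_dev.std(axis=0)
```

The sketch holds everything in dense arrays and does no guarding against motifs with zero expected accessibility or zero background variance; a real implementation would need both, plus batching for large inputs.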

Parameters:
adata AnnData

AnnData object with peak matrix (cells × peaks). Must have:

  - adata.varm["bg_peaks"]: Background peak indices from sample_bg_peaks()

  - adata.varm["motif_match"]: Motif match matrix from scan_motifs()

  - adata.uns["motif_name"]: Motif names from scan_motifs()

batch_size int, default 5000

Number of cells to process at once. Reduce if GPU memory is limited.

motif_batch_size int, default -1

Number of motifs to process at once. If -1, uses the default of 100 motifs per batch to balance memory usage and speed. Reduce further for very large datasets.

key_added str, default "chromvar"

Key under which the deviation DataFrame is stored in adata.obsm.

return_adata bool, default False

If True, also return a new AnnData with deviations as .X.
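To make the effect of batch_size and motif_batch_size concrete, the sketch below enumerates the (cell range, motif range) tiles that such batching would produce. The helper names `iter_batches` and `plan_batches` and the loop order are assumptions for illustration only; the source does not show how gatac schedules its batches internally.

```python
def iter_batches(n_items, batch_size):
    """Yield half-open (start, stop) ranges that cover range(n_items)."""
    for start in range(0, n_items, batch_size):
        yield start, min(start + batch_size, n_items)

def plan_batches(n_cells, n_motifs, batch_size=5000, motif_batch_size=-1):
    """Hypothetical helper: list every (cell_range, motif_range) tile."""
    # -1 resolves to 100 motifs per batch, as documented for motif_batch_size.
    if motif_batch_size == -1:
        motif_batch_size = 100
    return [
        (cell_range, motif_range)
        for motif_range in iter_batches(n_motifs, motif_batch_size)
        for cell_range in iter_batches(n_cells, batch_size)
    ]

# 12,000 cells and 250 motifs with the defaults: 3 cell batches x 3 motif batches.
tiles = plan_batches(12_000, 250)
```

Each tile bounds the size of the intermediate (cells × motifs) block held in memory at once, which is why lowering either value reduces peak memory at the cost of more iterations.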

Returns:

None or AnnData

Always stores deviations as a DataFrame in adata.obsm[key_added]. Returns an AnnData (cells × motifs) only when return_adata=True.

Return type:

AnnData | None

Examples

>>> import gatac as ga
>>>
>>> # 1. Create peak matrix
>>> peak_adata = ga.tl.make_peak_matrix(tile_adata, parquet_path)
>>>
>>> # 2. Compute biases and sample background
>>> ga.tl.compute_peak_bias(peak_adata, "genome.fa")
>>> ga.tl.sample_bg_peaks(peak_adata)
>>>
>>> # 3. Scan motifs
>>> motifs = ga.tl.read_motifs("motifs.meme")
>>> ga.tl.scan_motifs(peak_adata, motifs, "genome.fa")
>>>
>>> # 4. Compute deviations (stored in peak_adata.obsm["chromvar"])
>>> ga.tl.compute_deviations(peak_adata)
>>> peak_adata.obsm["chromvar"]  # DataFrame (cells × motifs)