gatac.tl.sample_bg_peaks

gatac.tl.sample_bg_peaks(adata, *, method='knn', n_iterations=50, bg_columns=['gc_content', 'reads_per_peak'], genome_fasta=None, n_neighbors=50, bs=50, w=0.1)

Sample background peaks for chromVAR analysis.

This function matches foreground peaks with background peaks that have similar biases (e.g., GC content and accessibility). Two methods are supported:

  1. “knn” (default): GPU-accelerated k-nearest-neighbor search using cuML. Faster, and the recommended choice.

  2. “chromvar”: the original chromVAR binning method. Slower, but faithful to the R package.
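To make the “knn” idea concrete, here is a minimal CPU sketch in NumPy. It is not gatac's implementation, and `knn_background_peaks` is a hypothetical name. Standardizing the bias columns, excluding each peak from its own neighbor list, and drawing the n_iterations background peaks uniformly with replacement from the n_neighbors nearest peaks are all assumptions. A GPU version would replace the brute-force distance matrix with a nearest-neighbor index such as cuML's NearestNeighbors.

```python
import numpy as np


def knn_background_peaks(bias, n_neighbors=50, n_iterations=50, seed=0):
    """Hypothetical k-NN background sampler (a sketch, not gatac's code).

    bias: (n_peaks, n_features) array of bias values, one column per feature.
    Returns an (n_peaks, n_iterations) array of background peak indices.
    """
    bias = np.asarray(bias, dtype=float)
    n_peaks = bias.shape[0]
    if n_neighbors >= n_peaks:
        raise ValueError("n_neighbors must be smaller than the number of peaks")
    rng = np.random.default_rng(seed)
    # Standardize each column so no single bias dominates the distance
    # (assumes no column is constant).
    z = (bias - bias.mean(axis=0)) / bias.std(axis=0)
    # Brute-force squared Euclidean distances: O(n_peaks^2) memory.
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # never use a peak as its own background
    # The n_neighbors closest peaks for every peak.
    neighbors = np.argsort(d2, axis=1)[:, :n_neighbors]
    # Draw n_iterations of those neighbors per peak, uniformly with replacement.
    picks = rng.integers(0, n_neighbors, size=(n_peaks, n_iterations))
    return np.take_along_axis(neighbors, picks, axis=1)
```

With real data the accessibility column is usually put on a log scale first (chromVAR uses log10), e.g. `np.column_stack([gc_content, np.log10(reads_per_peak)])`. The O(n_peaks²) distance matrix is the reason a dedicated k-NN index, on GPU or CPU, is preferable at scale.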

Parameters:
adata AnnData

AnnData object with peak matrix. Must have bias columns in adata.var (e.g., from compute_peak_bias).

method {"knn", "chromvar"}, default "knn"

Background sampling method:

- “knn”: cuML nearest neighbors (GPU, faster)
- “chromvar”: original chromVAR binning (CPU, slower)

n_iterations int, default 50

Number of background peaks to sample for each peak; this is the number of columns of adata.varm["bg_peaks"].

bg_columns list[str], default ["gc_content", "reads_per_peak"]

Columns in adata.var to use for bias matching. Any column listed here that is absent from adata.var will be computed automatically when genome_fasta is provided.

genome_fasta str or Path, optional

Path to the genome FASTA file. Required when bg_columns contains "gc_content" and that column is not already present in adata.var.

n_neighbors int, default 50

Number of nearest neighbors for the k-NN method (only used if method="knn").

bs int, default 50

Bin size for the chromVAR method (only used if method="chromvar").

w float, default 0.1

Gaussian kernel width for the chromVAR method (only used if method="chromvar").
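The bs and w parameters come from chromVAR's binning scheme. The sketch below follows one reading of `getBackgroundPeaks` in the chromVAR R package: peaks are placed in a whitened (log10 accessibility, GC) space and snapped to a bs × bs grid of bins. Each background peak is drawn by choosing a bin with probability proportional to a Gaussian kernel of width w around the peak's own bin, then choosing a peak uniformly within that bin. It is a hypothetical NumPy illustration (`chromvar_background_peaks` is not a gatac function). Tie-breaking, random-number handling, and performance will differ from both chromVAR and gatac.

```python
import numpy as np


def chromvar_background_peaks(gc, reads, n_iterations=50, bs=50, w=0.1, seed=0):
    """Sketch modeled on chromVAR's binning scheme (not gatac's code)."""
    rng = np.random.default_rng(seed)
    # Work in (log10 accessibility, GC) space; every peak needs reads > 0.
    x = np.column_stack([np.log10(reads), gc])
    # Whiten with the Cholesky factor of the covariance (cov = L @ L.T).
    chol = np.linalg.cholesky(np.cov(x, rowvar=False))
    t = np.linalg.solve(chol, x.T).T
    # bs evenly spaced bin centers per axis; the 2-D bins are their Cartesian product.
    g0 = np.linspace(t[:, 0].min(), t[:, 0].max(), bs)
    g1 = np.linspace(t[:, 1].min(), t[:, 1].max(), bs)
    # On a Cartesian grid, the nearest 2-D center is the nearest center on each axis.
    i = np.abs(t[:, [0]] - g0).argmin(axis=1)
    j = np.abs(t[:, [1]] - g1).argmin(axis=1)
    membership = i * bs + j
    density = np.bincount(membership, minlength=bs * bs)
    # 1-D Gaussian kernels of width w; the 2-D kernel is their outer product
    # because squared Euclidean distance on the grid splits into per-axis terms.
    k0 = np.exp(-0.5 * ((g0[:, None] - g0[None, :]) / w) ** 2)
    k1 = np.exp(-0.5 * ((g1[:, None] - g1[None, :]) / w) ** 2)
    out = np.empty((len(membership), n_iterations), dtype=np.int64)
    for b in np.unique(membership):
        kernel_row = np.outer(k0[b // bs], k1[b % bs]).ravel()  # kernel(b, every bin)
        # Dividing by bin occupancy makes each bin's total mass equal its kernel
        # weight: pick a bin in proportion to the kernel, then a peak uniformly in it.
        p = kernel_row[membership] / density[membership]
        p /= p.sum()
        fg = np.flatnonzero(membership == b)
        out[fg] = rng.choice(len(membership), size=(len(fg), n_iterations), p=p)
    return out
```

Because w is measured in whitened units while bin centers sit roughly range/(bs - 1) apart, a small w concentrates nearly all sampling in the peak's own bin, and a larger w borrows more from neighboring bins.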

Returns:

None. Adds adata.varm["bg_peaks"], an array of shape (n_peaks, n_iterations) whose row i holds the indices of the background peaks sampled for peak i.

Return type:

None
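Each column of adata.varm["bg_peaks"] can be read as one bias-matched replacement of the whole peak set: entry (p, i) is the peak that stands in for peak p in iteration i. The hypothetical helpers below (not part of gatac) show that indexing on a toy array. In a chromVAR-style analysis, the total accessibility of a foreground peak set is compared with per-iteration background totals of this kind.

```python
import numpy as np


def background_peak_sets(bg_peaks, peak_mask):
    """Return one matched background index set per iteration (hypothetical helper)."""
    fg = np.flatnonzero(peak_mask)  # indices of the foreground peaks
    # Column i maps every foreground peak to its stand-in for iteration i.
    return [bg_peaks[fg, i] for i in range(bg_peaks.shape[1])]


def background_totals(bg_peaks, peak_mask, reads_per_peak):
    """Sum reads over each iteration's background set (hypothetical helper)."""
    fg = np.flatnonzero(peak_mask)
    # (n_foreground, n_iterations) reads, summed over peaks: one total per iteration.
    return reads_per_peak[bg_peaks[fg]].sum(axis=0)


# Toy data: 4 peaks, 2 iterations.
bg_peaks = np.array([[2, 3], [0, 2], [1, 0], [0, 1]])
peak_mask = np.array([True, False, False, True])  # foreground = peaks 0 and 3
reads_per_peak = np.array([10, 20, 30, 40])

print(background_peak_sets(bg_peaks, peak_mask))  # [array([2, 0]), array([3, 1])]
print(background_totals(bg_peaks, peak_mask, reads_per_peak))  # [40 60]
print(reads_per_peak[peak_mask].sum())  # foreground total: 50
```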

Examples

>>> import gatac as ga
>>> # Option A: precompute biases separately
>>> ga.tl.compute_peak_bias(peak_adata, "genome.fa")
>>> ga.tl.sample_bg_peaks(peak_adata, method="knn")
>>>
>>> # Option B: let sample_bg_peaks compute gc_content on the fly
>>> ga.tl.sample_bg_peaks(peak_adata, method="knn", genome_fasta="genome.fa")
>>> peak_adata.varm["bg_peaks"]  # Background indices