gatac.tl.call_peaks#
- gatac.tl.call_peaks(adata, groupby, parquet_path, *, genome='hg38', q_thresh=0.05, d_treat=200, d_ctrl=10000, max_gap=30, peak_amp=200, fe_cutoff=1.0, key_added='gmacs', inplace=True, verbose=True, batch_size=10, filter_chromosomes=True)#
GPU-accelerated peak calling per cluster using gmacs algorithm.
Calls peaks for each group of cells defined by groupby, using the fragment data from parquet files. Similar to SnapATAC2’s macs3 function but using GPU-accelerated gmacs.
- Parameters:
- adata
AnnData Annotated data matrix (typically a tiled matrix). If source_file is in obs, it’s used to map cells to parquet files.
- groupby
str Key in adata.obs containing cluster/group assignments.
- parquet_path
strorPath Path to parquet file or directory containing parquet files. If directory, uses source_file obs to find specific files.
- genome
strordict Genome name (e.g., ‘hg38’, ‘mm10’) or dict of chromosome sizes. Used to compute genome_length for significance calculations and for chromosome filtering.
- q_thresh
float Q-value threshold for peak calling (default: 0.05)
- d_treat
int Treatment extension distance in bp (default: 200)
- d_ctrl
int Large local control extension distance in bp (default: 10000). Corresponds to MACS3’s –llocal parameter.
- max_gap
int Maximum gap for merging peaks (default: 30)
- peak_amp
int Minimum peak amplitude in bp (default: 200)
- fe_cutoff
float Fold enrichment cutoff (default: 1.0) Peaks with signal_value < fe_cutoff are filtered out
- key_added
str Key in adata.uns where peaks will be stored (default: ‘gmacs’)
- inplace
bool If True, store results in adata.uns[key_added] (default: True)
- verbose
bool If True, show tqdm progress bars (default: True)
- batch_size
int Number of parquet row groups to load at once (default: 10). Increase for faster processing, decrease to reduce memory usage.
- filter_chromosomes
bool If True, only keep fragments from chromosomes recognized by the provided genome (keys if genome is a dict, or standard chromosomes for the named genome) (default: True).
- adata
- Returns:
dict[str, pd.DataFrame] or None If inplace=False, returns dict mapping group names to peak DataFrames. Each DataFrame has columns: chrom, start, end, name, score, strand, signal_value, p_value, q_value, peak.
- Return type:
Optional[dict]
Examples
>>> import gatac >>> # Load tiled AnnData with cluster assignments >>> adata = sc.read_h5ad("tiled_matrix.h5ad") >>> # Call peaks per cluster (default settings for ATAC-seq) >>> gatac.tl.call_peaks( ... adata, ... groupby="leiden", ... parquet_path="/path/to/fragments.parquet", ... ) >>> # Access peaks for cluster "0" >>> peaks_cluster0 = adata.uns["gmacs"]["0"]