gatac.tl.marker_peaks


gatac.tl.marker_peaks(adata, groupby, groups=None, reference='rest', max_cells=500, min_pct=0.05, min_log2_fc=1.0, use_raw=False, key_added='marker_peaks', seed=42)

GPU-accelerated marker peak detection using a binomial test.

Identifies differentially accessible peaks between cell groups by applying a binomial test to binarized accessibility data. This approach follows ArchR’s getMarkerFeatures with testMethod=“binomial”.

For each group and each peak, compares the fraction of cells in which the peak is accessible in the foreground group against the same fraction in the background (all other cells, or a specific reference group).
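The foreground/background comparison starts from per-peak accessibility fractions. The sketch below computes them with NumPy under stated assumptions: `group_accessibility` is a hypothetical helper (not part of gatac), `X` is a dense cells × peaks matrix in which any count above zero counts as accessible, and each side is subsampled to `max_cells` without replacement. gatac's actual GPU code path and sampling scheme may differ.

```python
import numpy as np


def group_accessibility(X, labels, group, reference="rest", max_cells=500, seed=42):
    """Hypothetical helper: per-peak fraction of accessible cells in fg and bg.

    Assumes X is a dense (cells x peaks) array; counts > 0 are treated as accessible.
    Returns (mean_fg, mean_bg, n_fg, n_bg).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)

    # Foreground: cells of the tested group
    fg_idx = np.flatnonzero(labels == group)
    # Background: all other cells ("rest") or a single reference group
    if reference == "rest":
        bg_idx = np.flatnonzero(labels != group)
    else:
        bg_idx = np.flatnonzero(labels == reference)

    # Subsample each side to at most max_cells (assumed: without replacement)
    if fg_idx.size > max_cells:
        fg_idx = rng.choice(fg_idx, size=max_cells, replace=False)
    if bg_idx.size > max_cells:
        bg_idx = rng.choice(bg_idx, size=max_cells, replace=False)

    # Binarize, then average over cells to get the accessible fraction per peak
    binary = X > 0
    mean_fg = binary[fg_idx].mean(axis=0)
    mean_bg = binary[bg_idx].mean(axis=0)
    return mean_fg, mean_bg, fg_idx.size, bg_idx.size
```

For a real AnnData object, `X` would usually be sparse; converting a slice with `.toarray()` before calling a helper like this is one option for small examples.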

Parameters:

adata AnnData: Annotated data matrix of shape cells × peaks.
groupby str: Column name in adata.obs containing group labels.
groups str or list of str, optional: Groups to test. If None, all groups in groupby are tested.
reference str: Reference group for comparison:
  - “rest”: compare each group against all other cells (default)
  - group name: compare each group against the specified group
max_cells int: Maximum number of cells to sample from each group (foreground and background). This controls test sensitivity: larger values detect smaller differences but may also flag many biologically irrelevant peaks. Default: 500 (following ArchR).
min_pct float: Minimum fraction of cells (0-1), in either group, in which the peak is accessible for the peak to be included in the results. Default: 0.05
min_log2_fc float: Minimum absolute log2 fold change for a peak to be included in the results. Default: 1.0
use_raw bool: If True, use adata.raw.X. Default: False
key_added str: Key under which results are stored in adata.uns. Default: “marker_peaks”
seed int: Random seed for reproducible cell subsampling. Default: 42
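The min_pct and min_log2_fc thresholds act as a post-test filter on each peak. A minimal sketch, assuming "in either group" means the larger of the two fractions must reach min_pct and "absolute" means depleted peaks with a negative log2 fold change also pass; `passes_filters` is a hypothetical helper, not a gatac function.

```python
import numpy as np


def passes_filters(mean_fg, mean_bg, log2_fc, min_pct=0.05, min_log2_fc=1.0):
    """Hypothetical helper: boolean mask of peaks that would be kept in the results."""
    mean_fg = np.asarray(mean_fg, dtype=float)
    mean_bg = np.asarray(mean_bg, dtype=float)
    log2_fc = np.asarray(log2_fc, dtype=float)

    # Peak must be accessible in at least min_pct of cells in either group
    pct_ok = np.maximum(mean_fg, mean_bg) >= min_pct
    # Effect size must be large enough in either direction (enrichment or depletion)
    fc_ok = np.abs(log2_fc) >= min_log2_fc
    return pct_ok & fc_ok
```

Under these assumptions, a peak accessible in only 2% of foreground and 1% of background cells is dropped even if its fold change is large, because neither group reaches the default min_pct of 0.05.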

Returns:

dict[str, pd.DataFrame] Dictionary mapping group names to DataFrames with columns:
  - “feature”: Peak/feature name
  - “log2_fc”: Log2 fold change (foreground vs background)
  - “mean_fg”: Mean accessibility in foreground (proportion of cells)
  - “mean_bg”: Mean accessibility in background (proportion of cells)
  - “mean_diff”: mean_fg - mean_bg
  - “p_value”: Raw p-value from the two-sided binomial test
  - “fdr”: Benjamini-Hochberg adjusted p-value

Return type:

dict[str, DataFrame]
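To make the shape of each returned DataFrame concrete, the sketch below assembles one from per-peak arrays. It is illustrative only: the pseudocount in log2_fc is an assumption (this page does not state how zero fractions are handled), and `bh_adjust` and `build_result_frame` are hypothetical helpers implementing the standard Benjamini-Hochberg step-up adjustment, not gatac internals.

```python
import numpy as np
import pandas as pd


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (step-up, made monotone, capped at 1)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity by taking the running minimum from the largest p-value down
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def build_result_frame(features, mean_fg, mean_bg, p_value, pseudocount=1e-4):
    """Hypothetical helper: one group's result table with the documented columns."""
    mean_fg = np.asarray(mean_fg, dtype=float)
    mean_bg = np.asarray(mean_bg, dtype=float)
    # Assumed pseudocount keeps the ratio finite when a fraction is zero
    log2_fc = np.log2((mean_fg + pseudocount) / (mean_bg + pseudocount))
    return pd.DataFrame({
        "feature": list(features),
        "log2_fc": log2_fc,
        "mean_fg": mean_fg,
        "mean_bg": mean_bg,
        "mean_diff": mean_fg - mean_bg,
        "p_value": np.asarray(p_value, dtype=float),
        "fdr": bh_adjust(p_value),
    })
```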

Notes

The binomial test compares the observed count of accessible cells in the foreground to what would be expected under the null hypothesis that the foreground has the same accessibility rate as the background.
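The null hypothesis above can be illustrated with scipy.stats.binomtest: the number of accessible foreground cells is the observed success count, the number of sampled foreground cells is the number of trials, and the background fraction is the hypothesized success probability. This is a sketch of that formulation on the CPU; gatac's GPU implementation is not shown here and may compute the p-value differently. `binomial_peak_test` is a hypothetical name.

```python
from scipy.stats import binomtest


def binomial_peak_test(k_fg, n_fg, bg_rate):
    """Two-sided binomial test of k_fg accessible cells out of n_fg against bg_rate.

    k_fg    : number of foreground cells in which the peak is accessible
    n_fg    : number of (subsampled) foreground cells
    bg_rate : fraction of background cells in which the peak is accessible
    """
    return binomtest(int(k_fg), int(n_fg), p=bg_rate, alternative="two-sided").pvalue


# Example: peak open in 300 of 500 foreground cells vs. 40% of background cells
p = binomial_peak_test(300, 500, 0.40)
```

If the foreground matches the background exactly (200 of 500 cells at a background rate of 0.40), the two-sided p-value is 1.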

Important: The test is highly sensitive to sample size. With many cells, even tiny differences become statistically significant. The max_cells parameter (default 500, following ArchR) subsamples both foreground and background to control sensitivity. Adjust min_log2_fc and min_pct to focus on biologically meaningful differences.
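The sample-size sensitivity is easy to see by holding the foreground and background fractions fixed and increasing the number of cells. The sketch below uses scipy.stats.binomtest as a stand-in for the library's test, with an illustrative 44% vs 40% difference (a log2 fold change of about 0.14, well below the default min_log2_fc of 1.0). `pvalues_by_size` is a hypothetical helper.

```python
from scipy.stats import binomtest


def pvalues_by_size(fg_rate, bg_rate, sizes):
    """Two-sided binomial p-values for the same fg/bg fractions at several cell counts."""
    return {n: binomtest(round(fg_rate * n), n, p=bg_rate).pvalue for n in sizes}


for n, p in pvalues_by_size(0.44, 0.40, (500, 5000, 50000)).items():
    print(f"n={n:>6}  p={p:.2e}")
```

At 500 cells this small difference is not significant at conventional thresholds, while at 5,000 and 50,000 cells it becomes highly significant, which is why capping the sample with max_cells and filtering on effect size both matter.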

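A two-sided test is performed (following ArchR), where fg_rate and bg_rate are the fractions of accessible cells in the foreground and background:
  - If fg_rate >= bg_rate: tests for enrichment
  - If fg_rate < bg_rate: tests for depletion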
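One common way to express the enrichment/depletion split is to pick the binomial tail on the side of the observed fraction and double it. The vectorized sketch below does this with scipy.stats.binom over many peaks at once. Note that this "twice the one-sided tail" definition is a swapped-in approximation that can differ slightly from scipy's binomtest two-sided p-value; whether gatac or ArchR use exactly this form is an assumption. `two_sided_binom_pvalues` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import binom


def two_sided_binom_pvalues(k_fg, n_fg, bg_rate):
    """Vectorized two-sided p-values as twice the tail on the observed side.

    Returns (p_values, direction), where direction is "enrichment" or "depletion".
    """
    k_fg = np.asarray(k_fg)
    n_fg = np.asarray(n_fg)
    bg_rate = np.asarray(bg_rate, dtype=float)

    fg_rate = k_fg / n_fg
    enriched = fg_rate >= bg_rate

    # Enrichment: upper tail P(X >= k) = sf(k - 1); depletion: lower tail P(X <= k)
    upper = binom.sf(k_fg - 1, n_fg, bg_rate)
    lower = binom.cdf(k_fg, n_fg, bg_rate)
    tail = np.where(enriched, upper, lower)

    p_values = np.minimum(1.0, 2.0 * tail)
    direction = np.where(enriched, "enrichment", "depletion")
    return p_values, direction
```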

Results are sorted by FDR (ascending) then by absolute log2_fc (descending).
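The ordering rule can be reproduced with pandas, for example to re-sort a table after further filtering. This is a sketch; `sort_markers` is a hypothetical helper that sorts on fdr ascending and breaks ties by the absolute value of log2_fc, descending.

```python
import pandas as pd


def sort_markers(df):
    """Order rows by fdr ascending, then by |log2_fc| descending."""
    return (
        df.assign(_abs_fc=df["log2_fc"].abs())
        .sort_values(["fdr", "_abs_fc"], ascending=[True, False])
        .drop(columns="_abs_fc")
        .reset_index(drop=True)
    )
```

A temporary column is used because sort_values sorts on stored values, and depleted peaks (negative log2_fc) should rank by effect size, not sign.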

Examples

>>> import gatac as ga
>>> # Find marker peaks for all clusters
>>> results = ga.tl.marker_peaks(adata, groupby="cluster")
>>>
>>> # Get top markers for a specific cluster (each value is a pandas DataFrame)
>>> cd8_markers = results["CD8_T"].query("fdr < 0.05").head(100)
>>>
>>> # For more sensitive detection (more hits), increase max_cells and lower min_log2_fc:
>>> results = ga.tl.marker_peaks(adata, groupby="cluster", max_cells=1000, min_log2_fc=0.5)
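Because each value in the returned dictionary is a pandas DataFrame already sorted by FDR, the per-group tables can be stacked into one long table for export or plotting. The sketch below assumes only the documented return type; `combine_markers` is a hypothetical helper, not part of gatac.

```python
import pandas as pd


def combine_markers(results, fdr_cutoff=0.05, top_n=100):
    """Stack per-group marker tables into one DataFrame with a 'group' column."""
    frames = []
    for group, df in results.items():
        # Rows are already sorted by FDR, so head() keeps the strongest markers
        top = df[df["fdr"] < fdr_cutoff].head(top_n)
        frames.append(top.assign(group=group))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the output of the first example, `combine_markers(results)` would give up to 100 significant peaks per cluster in a single DataFrame.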
