gatac.tl.marker_peaks
- gatac.tl.marker_peaks(adata, groupby, groups=None, reference='rest', max_cells=500, min_pct=0.05, min_log2_fc=1.0, use_raw=False, key_added='marker_peaks', seed=42)
GPU-accelerated marker peak detection using a binomial test.
Identifies differentially accessible peaks between cell groups by applying a binomial test to binarized accessibility data. This approach follows ArchR's getMarkerFeatures with testMethod="binomial".
For each group and each peak, compares the proportion of cells in which the peak is accessible in the foreground group against the background (all other cells, or a specific reference group).
- Parameters:
- adata
AnnData Annotated data matrix with cells × peaks.
- groupby
str Column name in adata.obs containing group labels.
- groups
str or list of str, optional Groups to test. If None, test all groups in groupby.
- reference
str Reference group for comparison:
  - "rest": compare each group against all other cells (default)
  - group name: compare each group against the specified group
- max_cells
int Maximum number of cells to sample from each group (foreground and background). This controls test sensitivity: larger values detect smaller differences but may flag many biologically irrelevant peaks. Default: 500 (following ArchR).
- min_pct
float Minimum fraction of cells (0-1) in which a peak must be accessible, in either group, for the peak to be included in the results. Default: 0.05
- min_log2_fc
float Minimum absolute log2 fold change threshold for results. Default: 1.0
- use_raw
bool If True, use adata.raw.X. Default: False
- key_added
str Key to store results in adata.uns. Default: "marker_peaks"
- seed
int Random seed for cell subsampling reproducibility. Default: 42
- Returns:
Dictionary mapping group names to DataFrames with columns:
  - "feature": Peak/feature name
  - "log2_fc": Log2 fold change (foreground vs background)
  - "mean_fg": Mean accessibility in foreground (proportion of cells)
  - "mean_bg": Mean accessibility in background
  - "mean_diff": mean_fg - mean_bg
  - "p_value": Raw p-value from two-sided binomial test
  - "fdr": Benjamini-Hochberg adjusted p-value
- Return type:
dict[str, pd.DataFrame]
Notes
The binomial test compares the observed count of accessible cells in the foreground to what would be expected under the null hypothesis that the foreground has the same accessibility rate as the background.
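As a rough illustration of the test described above, the following sketch runs a binomial test for a single peak with SciPy. The arrays `fg` and `bg` are hypothetical binarized accessibility vectors, and the use of `scipy.stats.binomtest` is an assumption made for illustration; gatac's GPU implementation may differ in detail.

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical binarized accessibility for one peak (True = accessible).
fg = np.zeros(500, dtype=bool)
fg[:150] = True               # 150 of 500 foreground cells accessible
bg = np.zeros(500, dtype=bool)
bg[:50] = True                # 50 of 500 background cells accessible

k = int(fg.sum())             # observed accessible foreground cells
n = fg.size                   # number of foreground cells tested
bg_rate = float(bg.mean())    # accessibility rate under the null hypothesis

# Null hypothesis: foreground cells are accessible at the background rate.
result = binomtest(k, n, p=bg_rate, alternative="two-sided")
print(f"fg_rate={k / n:.2f} bg_rate={bg_rate:.2f} p_value={result.pvalue:.3g}")
```

Here the foreground rate (0.30) is three times the background rate (0.10), so the resulting p-value is very small.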
Important: The test is highly sensitive to sample size. With many cells, even tiny differences become statistically significant. The max_cells parameter (default 500, following ArchR) subsamples both foreground and background to control sensitivity. Adjust min_log2_fc and min_pct to focus on biologically meaningful differences.
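The effect of sample size can be seen in a small sketch that holds the accessibility rates fixed (12% foreground vs 10% background, values chosen purely for illustration) and varies only the number of cells. It uses `scipy.stats.binomtest` as a stand-in and is not gatac's own code.

```python
from scipy.stats import binomtest

bg_rate = 0.10                     # assumed background accessibility rate
p_values = {}
for n_cells in (500, 5_000, 50_000):
    k = n_cells * 12 // 100        # exactly 12% of foreground cells accessible
    p_values[n_cells] = binomtest(k, n_cells, p=bg_rate).pvalue

for n_cells, p in p_values.items():
    print(f"n_cells={n_cells:>6}  p_value={p:.3g}")
```

The same 2-percentage-point difference is not significant at 500 cells but becomes highly significant at 50,000. The plain ratio of the two rates corresponds to a log2 fold change of only about 0.26, well below the default min_log2_fc of 1.0, which is why the effect-size thresholds matter alongside the p-value.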
A two-sided test is performed (following ArchR):
  - If fg_rate >= bg_rate: tests for enrichment
  - If fg_rate < bg_rate: tests for depletion
Results are sorted by FDR (ascending) then by absolute log2_fc (descending).
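The adjustment and ordering can be sketched with NumPy and pandas as follows. The helper `bh_adjust` and the toy table are hypothetical and only mirror the documented column names; gatac's internal implementation and tie-breaking may differ.

```python
import numpy as np
import pandas as pd

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (illustrative helper)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

# Toy results table using the documented column names.
df = pd.DataFrame({
    "feature": ["peak_a", "peak_b", "peak_c", "peak_d"],
    "log2_fc": [1.2, -2.5, 1.8, 3.0],
    "p_value": [0.04, 0.001, 0.001, 0.03],
})
df["fdr"] = bh_adjust(df["p_value"])

# Sort by FDR (ascending), then by absolute log2_fc (descending).
df = (
    df.assign(abs_log2_fc=df["log2_fc"].abs())
      .sort_values(["fdr", "abs_log2_fc"], ascending=[True, False])
      .drop(columns="abs_log2_fc")
      .reset_index(drop=True)
)
print(df)
```

In this toy table, peak_b and peak_c share the lowest FDR, and peak_b is listed first because its absolute log2_fc is larger.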
Examples
>>> import gatac as ga
>>> # Find marker peaks for all clusters
>>> results = ga.tl.marker_peaks(adata, groupby="cluster")
>>>
>>> # Get the top markers for a specific cluster (each value is a pandas DataFrame)
>>> cd8_markers = results["CD8_T"].query("fdr < 0.05").head(100)
>>>
>>> # For more sensitive detection (more hits), increase max_cells and lower min_log2_fc:
>>> results = ga.tl.marker_peaks(adata, groupby="cluster", max_cells=1000, min_log2_fc=0.5)