gatac.tl.sample_gc_matched_background
- gatac.tl.sample_gc_matched_background(regions, genome_fasta, *, background_pool, n_background=None, n_bins=50, replace=True, random_state=0)
Sample background peaks whose GC-content distribution matches target peaks.
- Parameters:
- regions
list[str] or dict[str, list[str]] Target peaks to match. When a dict is provided, each group is sampled independently and the return value mirrors the same keys.
- genome_fasta
str or Path Path to the reference genome FASTA used to compute GC content.
- background_pool
list[str] or dict[str, list[str]] Candidate background peaks to sample from. When regions is a dict, this can be either one shared pool or a dict keyed like regions.
- n_background
int, optional Number of peaks to sample per target group. Defaults to the size of the corresponding target set.
- n_bins
int, default 50 Matching resolution. Larger values enforce tighter GC matching.
- replace
bool, default True Whether sampled background peaks may be reused. Set to False to require unique sampled peaks within each returned background set.
- random_state
int, optional Seed for deterministic sampling.
- Returns:
GC-matched background peaks with the same container shape as regions.
- Return type:
list[str] or dict[str, list[str]]
Examples
>>> matched_bg = ga.tl.sample_gc_matched_background(
...     da_peaks,
...     genome_fasta="genome.fa",
...     background_pool=all_peaks,
... )
>>> matched_bg_by_group = ga.tl.sample_gc_matched_background(
...     marker_peaks,
...     genome_fasta="genome.fa",
...     background_pool=list(peak_adata.var_names),
...     replace=False,
... )
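To make the roles of n_bins, n_background, replace, and random_state more concrete, the following is a minimal sketch of one common way to do binned GC matching with NumPy. It is not gatac's implementation, which this page does not describe. The helper names gc_fraction and sample_matched are hypothetical, and the sketch assumes GC fractions have already been computed for every target and pool peak.

```python
import numpy as np


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous (A/C/G/T) bases; 0.0 if none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def sample_matched(target_gc, pool_gc, n_background=None, n_bins=50,
                   replace=True, random_state=0):
    """Hypothetical helper: return indices into pool_gc whose GC
    histogram follows that of target_gc."""
    rng = np.random.default_rng(random_state)
    target_gc = np.asarray(target_gc, dtype=float)
    pool_gc = np.asarray(pool_gc, dtype=float)
    n = len(target_gc) if n_background is None else n_background

    # Equal-width GC bins over [0, 1]; np.digitize only needs interior edges.
    edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    target_bins = np.digitize(target_gc, edges)
    pool_bins = np.digitize(pool_gc, edges)

    # Draw n bin labels in proportion to the target's bin occupancy.
    drawn = rng.choice(target_bins, size=n, replace=True)

    picked = []
    for b, count in zip(*np.unique(drawn, return_counts=True)):
        candidates = np.flatnonzero(pool_bins == b)
        if candidates.size == 0:
            continue  # this sketch skips GC bins the pool cannot fill
        if not replace:
            # Without reuse, a bin can supply at most its own size.
            count = min(count, candidates.size)
        picked.extend(rng.choice(candidates, size=count, replace=replace))
    return np.array(picked, dtype=int)


# Toy data: four target peaks, 200 pool peaks spread evenly over GC.
target = [0.31, 0.33, 0.71, 0.73]
pool = (np.arange(200) + 0.5) / 200
idx = sample_matched(target, pool, n_bins=10, replace=False)
matched_gc = pool[idx]  # GC fractions of the sampled background peaks
```

In this sketch, a larger n_bins narrows each bin, so sampled peaks track the target GC values more closely but each bin offers fewer candidates. With replace=False, a sparsely populated bin may return fewer peaks than requested.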