gatac.tl.sample_gc_matched_background

gatac.tl.sample_gc_matched_background(regions, genome_fasta, *, background_pool, n_background=None, n_bins=50, replace=True, random_state=0)

Sample background peaks whose GC-content distribution matches target peaks.

Parameters:

regions list[str] or dict[str, list[str]]: Target peaks to match. When a dict is provided, each group is sampled independently and the return value mirrors the same keys.
genome_fasta str or Path: Path to the reference genome FASTA used to compute GC content.
background_pool list[str] or dict[str, list[str]]: Candidate background peaks to sample from. When regions is a dict, this can be either one shared pool or a dict keyed like regions.
n_background int, optional: Number of peaks to sample per target group. Defaults to the size of the corresponding target set.
n_bins int, default 50: Number of bins used for GC matching (the matching resolution). Larger values enforce tighter GC matching.
replace bool, default True: Whether sampled background peaks may be reused. Set to False to require unique sampled peaks within each returned background set.
random_state int, default 0: Seed for deterministic sampling.
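
The roles of n_bins, replace, and random_state can be illustrated with a minimal sketch of bin-based GC matching. This is not gatac's implementation: gc_fraction, gc_bin, and gc_matched_sample are hypothetical helpers, sequences are passed in memory instead of being read from a FASTA file, and skipping targets whose bin has no candidates is an assumption made only to keep the example short.

```python
import random
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G/C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_bin(gc, n_bins):
    """Map a GC fraction in [0, 1] to a bin index in [0, n_bins - 1]."""
    return min(int(gc * n_bins), n_bins - 1)


def gc_matched_sample(target_seqs, pool_seqs, n_bins=50, replace=True, random_state=0):
    """Hypothetical helper: draw one pool name per target from the same GC bin."""
    rng = random.Random(random_state)  # seeded, so repeated calls give the same draw

    # Group candidate background names by their GC bin.
    pool_by_bin = defaultdict(list)
    for name, seq in pool_seqs.items():
        pool_by_bin[gc_bin(gc_fraction(seq), n_bins)].append(name)

    sampled = []
    for seq in target_seqs.values():
        candidates = pool_by_bin[gc_bin(gc_fraction(seq), n_bins)]
        if not candidates:
            continue  # assumption: skip targets with no candidate in their bin
        pick = rng.choice(candidates)
        if not replace:
            candidates.remove(pick)  # each background name is used at most once
        sampled.append(pick)
    return sampled


# Toy data: two GC-rich and two AT-rich candidates.
targets = {"t1": "GGCC", "t2": "ATAT"}
pool = {"b1": "GCGC", "b2": "TTAA", "b3": "CCGG", "b4": "AATT"}
matched = gc_matched_sample(targets, pool, n_bins=10, replace=False)
```

With more bins, each target can only be paired with candidates from a narrower GC range, which is why larger n_bins values give tighter matching but leave fewer candidates per bin.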

Returns:

list[str] or dict[str, list[str]]: GC-matched background peaks, returned in the same container shape as regions.

Return type:

list[str] | dict[str, list[str]]
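
The shape rules described for regions and background_pool can be sketched as a small dispatch wrapper. This only illustrates the documented behavior, not gatac's code: sample_per_group, the sampler callback, and the placeholder peak names are all hypothetical.

```python
def sample_per_group(regions, background_pool, sampler):
    """Hypothetical wrapper: apply sampler(targets, pool) and mirror the input shape."""
    if not isinstance(regions, dict):
        # A plain list of targets yields a plain list of background peaks.
        return sampler(regions, background_pool)

    result = {}
    for group, targets in regions.items():
        # A dict pool is looked up by group key; a list pool is shared by every group.
        if isinstance(background_pool, dict):
            pool = background_pool[group]
        else:
            pool = background_pool
        result[group] = sampler(targets, pool)
    return result


def take_first(targets, pool):
    """Stand-in sampler for the demo: take as many pool entries as there are targets."""
    return pool[: len(targets)]


grouped = sample_per_group(
    {"g1": ["p1", "p2"], "g2": ["p3"]},  # placeholder peak names
    ["b1", "b2", "b3"],                  # one pool shared by both groups
    take_first,
)
flat = sample_per_group(["p1", "p2"], ["b1", "b2", "b3"], take_first)
```

Here grouped keeps the keys "g1" and "g2", while flat is a plain list, matching the "same container shape as regions" rule.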

Examples

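>>> import gatac as ga
>>> # Single target set, sampled from a shared candidate pool.
>>> matched_bg = ga.tl.sample_gc_matched_background(
...     da_peaks,
...     genome_fasta="genome.fa",
...     background_pool=all_peaks,
... )
>>> # Per-group sampling without reuse within each returned background set.
>>> matched_bg_by_group = ga.tl.sample_gc_matched_background(
...     marker_peaks,
...     genome_fasta="genome.fa",
...     background_pool=list(peak_adata.var_names),
...     replace=False,
... )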
