gatac doublets#
Detect doublet / multiplet cells in a single-cell ATAC-seq sample using the AMULET Poisson method of Thibodeau et al. (2021). Cells with significantly more overlapping insertions than expected are flagged as doublets.
Synopsis#
gatac doublets <input.parquet> -g <genome> [-o OUTPUT]
[-m MIN_FRAGS] [--q Q] [--q-rep Q_REP]
[--expected-overlap N] [--max-insert N]
[--min-overlap N] [--repeat-filter BED]
[-j THREADS]
Arguments#
Positional#
Argument |
Description |
|---|---|
|
Path to the fragment Parquet file |
Options#
Flag |
Default |
Description |
|---|---|---|
|
required |
Genome name ( |
|
|
Output CSV path |
|
|
Minimum unique fragments per cell to include |
|
|
FDR threshold for doublet calling |
|
|
FDR threshold for inferring repetitive regions |
|
|
Expected number of reads overlapping (Poisson mean) |
|
|
Maximum fragment insert size in bp |
|
|
Minimum overlap length in bp |
|
— |
BED file of known repetitive regions to exclude |
|
|
Parallel workers for overlap detection |
Output#
A CSV with one row per cell:
Column |
Description |
|---|---|
|
Cell barcode |
|
Poisson p-value for the observed overlap count |
|
Benjamini–Hochberg adjusted p-value |
|
|
Examples#
Basic usage#
gatac doublets pbmc.parquet -g hg38 -o pbmc_doublets.csv
Stricter threshold#
gatac doublets pbmc.parquet -g hg38 --q 0.001 -o pbmc_doublets.csv
Exclude known repetitive regions#
gatac doublets pbmc.parquet -g hg38 --repeat-filter hg38_repeats.bed
Python equivalent#
import gatac as ga
result = ga.pp.detect_doublets(
"pbmc.parquet",
chrom_sizes="hg38",
min_fragments=100,
q_threshold=0.01,
)
print(result.head())
# cell_id p_value q_value is_doublet
# 0 AAAC... 0.8321 0.9412 False
# 1 AAAC... 0.0001 0.0034 True
# ...
# Drop doublets from an AnnData
doublets = set(result.loc[result["is_doublet"], "cell_id"])
adata = adata[~adata.obs_names.isin(doublets)].copy()
Notes#
AMULET’s Poisson model assumes a uniform single-copy background and is
designed for autosomes only. Make sure the supplied chrom_sizes
covers the autosomes of your species (the built-in hg38, hg19, mm10,
mm39 aliases do).