gatac metrics
Compute per-barcode quality metrics from a fragment Parquet file. All computation runs on the GPU and streams the file in batches of Parquet row groups, so files larger than GPU memory (VRAM) are processed without being loaded in full.
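The batching idea can be sketched in plain Python. This is a minimal illustration that assumes `--batch-size` simply groups consecutive row groups; `row_group_batches` is a hypothetical helper, not part of gatac.

```python
from typing import Iterator, List


def row_group_batches(num_row_groups: int, batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive row-group indices, at most batch_size per batch.

    Hypothetical helper: assumes batches are runs of adjacent row groups.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(0, num_row_groups, batch_size):
        # The last batch may be shorter than batch_size.
        yield list(range(first, min(first + batch_size, num_row_groups)))


# A file with 5 row groups and a batch size of 2 is processed in 3 passes.
print(list(row_group_batches(5, 2)))  # [[0, 1], [2, 3], [4]]
```

With PyArrow, each batch could be read via `pyarrow.parquet.ParquetFile(path).read_row_groups(batch)`, and per-barcode partial counts merged across batches, so only one batch needs to be resident at a time.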
Synopsis
gatac metrics <input.parquet> -g <annotations.gtf>
[-o OUTPUT] [--min-frags N]
[--batch-size N] [--memory-resource RESOURCE]
Arguments
Positional
| Argument | Description |
|---|---|
| `<input.parquet>` | Path to the fragment Parquet file |
Options
| Flag | Default | Description |
|---|---|---|
| `-g <annotations.gtf>` | required | GTF annotation file (used to extract TSS positions) |
| `-o OUTPUT` | | Output CSV path |
| `--min-frags N` | | Minimum unique fragments; barcodes below this are excluded |
| `--batch-size N` | | Parquet row groups processed per GPU batch |
| `--memory-resource RESOURCE` | | RAPIDS memory resource (…) |
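TSS positions can be derived from a GTF by taking the 5' end of each record. The sketch below assumes gatac uses `gene` records (it may use `transcript` records instead); `tss_from_gtf` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Tuple


def tss_from_gtf(lines: Iterable[str], feature: str = "gene") -> Iterator[Tuple[str, int, str]]:
    """Yield unique (chrom, tss, strand) tuples from GTF lines (hypothetical helper)."""
    seen = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue  # keep only the chosen feature type
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        # GTF coordinates are 1-based and inclusive; the TSS is the 5' end,
        # i.e. `start` on the + strand and `end` on the - strand.
        tss = start if strand == "+" else end
        key = (chrom, tss, strand)
        if key not in seen:
            seen.add(key)
            yield key


demo = [
    "#!genome-build GRCh38",
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "A";',
    'chr1\tHAVANA\tgene\t14404\t29570\t.\t-\t.\tgene_id "B";',
    'chr1\tHAVANA\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "A";',
]
print(list(tss_from_gtf(demo)))  # [('chr1', 11869, '+'), ('chr1', 29570, '-')]
```

For a gzipped annotation such as `GRCh38.gtf.gz`, `gzip.open(path, "rt")` can be passed as `lines`.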
Computed metrics
| Column | Description |
|---|---|
| | Cell barcode |
| `tsse_score` | TSS enrichment score (signal/background ratio) |
| `n_unique` | Number of unique (deduplicated) fragments |
| | Fraction of total reads that are duplicates |
| | Fraction of fragments mapping to mitochondrial contigs |
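The metrics above can be approximated in a few lines of Python. This sketch rests on assumptions not documented here: fragments are `(chrom, start, end, barcode, read_count)` tuples, mitochondrial contigs are named `chrM` or `MT`, and the TSS enrichment score uses one common formulation (insertion density within ±100 bp of a TSS divided by the density in the outermost 100 bp of a ±2 kb window). gatac's exact definitions may differ; `tss_enrichment` and `barcode_metrics` are hypothetical.

```python
from collections import defaultdict

# Assumption: mitochondrial contig names depend on the reference build.
MITO_CONTIGS = {"chrM", "MT"}


def tss_enrichment(fragments, tsss, flank=2000, core=100, edge=100):
    """Core insertion density around TSSs divided by flank-edge density.

    One common formulation, not necessarily gatac's. tsss holds
    (chrom, pos, strand); windows are symmetric, so strand is ignored.
    """
    tss_by_chrom = defaultdict(list)
    for chrom, pos, _strand in tsss:
        tss_by_chrom[chrom].append(pos)
    core_hits = edge_hits = 0
    for chrom, start, end, _barcode, _reads in fragments:
        for cut in (start, end):  # both fragment ends are Tn5 insertion sites
            for tss in tss_by_chrom.get(chrom, ()):
                d = abs(cut - tss)
                if d <= core:
                    core_hits += 1
                elif flank - edge < d <= flank:
                    edge_hits += 1
    signal = core_hits / (2 * core + 1)           # per-bp density in the core
    background = max(edge_hits, 1) / (2 * edge)  # floor of 1 avoids division by zero
    return signal / background


def barcode_metrics(fragments, tsss, min_frags=1):
    """Per-barcode metrics; barcodes with fewer than min_frags fragments are dropped."""
    by_barcode = defaultdict(list)
    for frag in fragments:
        by_barcode[frag[3]].append(frag)
    out = {}
    for barcode, frags in by_barcode.items():
        n_unique = len(frags)  # each record is one deduplicated fragment
        if n_unique < min_frags:
            continue  # analogous to --min-frags
        n_reads = sum(f[4] for f in frags)
        n_mito = sum(f[0] in MITO_CONTIGS for f in frags)
        out[barcode] = {
            "tsse_score": tss_enrichment(frags, tsss),
            "n_unique": n_unique,
            "frac_dup": 1 - n_unique / n_reads,  # reads beyond one per fragment
            "frac_mito": n_mito / n_unique,
        }
    return out
```

The nested loop over TSSs is only for illustration; a real implementation would use sorted or indexed intervals.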
Examples
Basic usage
gatac metrics pbmc.parquet -g GRCh38.gtf.gz -o pbmc_metrics.csv
Adjust batch size for limited VRAM
gatac metrics pbmc.parquet -g GRCh38.gtf.gz --batch-size 32
Use managed memory (helpful for very large files)
gatac metrics pbmc.parquet -g GRCh38.gtf.gz \
--memory-resource managed-pool
Python equivalent
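import gatac as ga

# Python counterpart of the CLI; keyword names differ from the flag names
metrics = ga.pp.compute_metrics(
    "pbmc.parquet",           # fragment Parquet file
    "GRCh38.gtf.gz",          # GTF annotation used for TSS positions
    min_unique_frags=100,     # presumably the same threshold as --min-frags
    row_groups_per_batch=64,  # presumably the same setting as --batch-size
)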
metrics.to_pandas().to_csv("pbmc_metrics.csv", index=False)
Downstream use
The output CSV is accepted by `gatac filter`, `gatac tile`, and `gatac gene` via the `--metrics` flag for on-the-fly quality filtering:
gatac filter pbmc.parquet \
--metrics pbmc_metrics.csv \
--filter "tsse_score > 5 and n_unique > 1000" \
-o pbmc_filtered.parquet
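The filter expression above acts as a boolean predicate over the metric columns. A minimal stdlib sketch of the same selection, assuming the barcode column is named `barcode` (hypothetical; the metrics table does not name it):

```python
import csv
import io
from typing import List, TextIO


def passing_barcodes(f: TextIO, min_tsse: float = 5.0, min_unique: float = 1000) -> List[str]:
    """Barcodes with tsse_score > min_tsse and n_unique > min_unique."""
    reader = csv.DictReader(f)
    return [
        row["barcode"]  # hypothetical column name
        for row in reader
        if float(row["tsse_score"]) > min_tsse and float(row["n_unique"]) > min_unique
    ]


# In-memory example; for a real file use open("pbmc_metrics.csv", newline="").
demo = io.StringIO("barcode,tsse_score,n_unique\nAAA,7.2,1500\nBBB,3.1,5000\nCCC,9.0,800\n")
print(passing_barcodes(demo))  # ['AAA']
```

With pandas, `pd.read_csv("pbmc_metrics.csv").query("tsse_score > 5 and n_unique > 1000")` selects the same rows; whether `gatac filter` accepts the full pandas query grammar is not stated here.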