gatac metrics

Contents

`gatac metrics`#

Compute per-barcode quality metrics from a fragment Parquet file. All computation is GPU-accelerated using streaming row-group processing, so files larger than GPU VRAM are handled transparently.

Synopsis#

gatac metrics <input.parquet> -g <annotations.gtf>
              [-o OUTPUT] [--min-frags N]
              [--batch-size N] [--memory-resource RESOURCE]

Arguments#

Positional#

Argument	Description
`input.parquet`	Path to the fragment Parquet file

Options#

Flag	Default	Description
`-g`, `--gtf`	required	GTF annotation file (used to extract TSS positions)
`-o`, `--output`	`<input>_metrics.csv`	Output CSV path
`--min-frags`	`100`	Minimum unique fragments; barcodes below this are excluded
`--batch-size`	`64`	Parquet row-groups processed per GPU batch
`--memory-resource`	`cuda-async`	RAPIDS memory resource (`cuda-async`, `managed`, `managed-pool`, `cuda`)

Computed metrics#

Column	Description
`barcode`	Cell barcode
`tsse_score`	TSS enrichment score (signal/background ratio)
`n_unique`	Number of unique (deduplicated) fragments
`duplicate_fraction`	Fraction of total reads that are duplicates
`mito_fraction`	Fraction of fragments mapping to mitochondrial contigs

Examples#

Basic usage#

gatac metrics pbmc.parquet -g GRCh38.gtf.gz -o pbmc_metrics.csv

Adjust batch size for limited VRAM#

gatac metrics pbmc.parquet -g GRCh38.gtf.gz --batch-size 32

Use managed memory (helpful for very large files)#

gatac metrics pbmc.parquet -g GRCh38.gtf.gz \
    --memory-resource managed-pool

Python equivalent#

import gatac as ga

metrics = ga.pp.compute_metrics(
    "pbmc.parquet",
    "GRCh38.gtf.gz",
    min_unique_frags=100,
    row_groups_per_batch=64,
)
metrics.to_pandas().to_csv("pbmc_metrics.csv", index=False)

Downstream use#

The output CSV is accepted by gatac filter, gatac tile, and gatac gene via the --metrics flag for on-the-fly quality filtering:

gatac filter pbmc.parquet \
    --metrics pbmc_metrics.csv \
    --filter "tsse_score > 5 and n_unique > 1000" \
    -o pbmc_filtered.parquet