gatac tile
Build a cell × genomic-tile count matrix from either an ATAC-seq fragment
Parquet file or an interval matrix stored as .h5ad or 10x .h5. The output
is an AnnData (.h5ad) object compatible with Scanpy and SnapATAC2.
Synopsis
gatac tile <input.parquet|input.h5ad|input.h5> -g <genome|chr_sizes>
    [-o OUTPUT] [-t TILE_SIZE] [-m MIN_FRAGS]
    [-e CHROMS ...] [--metrics METRICS] [--filter QUERY]
    [--count-strategy STRATEGY] [--barcode-prefix PREFIX]
    [--low-memory]
Arguments
Positional
| Argument | Description |
|---|---|
| `input` | Path to a fragment |
Options
| Flag | Default | Description |
|---|---|---|
| `-g` | required | Genome name ( |
| `-o` |  | Output h5ad path |
| `-t` |  | Tile / bin size in base pairs |
| `-m` |  | Minimum unique fragments per barcode |
| `-e` |  | Chromosomes to exclude (space-separated) |
| `--metrics` | none | Metrics CSV for quality-based cell filtering |
| `--filter` | none | Polars query string applied to metrics |
| `--count-strategy` | `unique` | Counting strategy: |
| `--barcode-prefix` | none | String prepended to barcodes |
| `--low-memory` | off | Enable low-memory (single row-group) mode |
Count strategies
| Strategy | Description | Compatibility |
|---|---|---|
| `unique` | Unique fragment insertions per tile (deduplicated) | SnapATAC2-compatible (default) |
| `count` | All fragment insertions (not deduplicated) | none |
| `binarize` | Binary accessibility (0 or 1) | ArchR-compatible |
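To make the difference between the three strategies concrete, here is a minimal pure-Python sketch on toy data. It is not GATAC's implementation: the fragment tuples, the `tile_counts` helper, and the simplification that each fragment contributes a single insertion at its start coordinate are all assumptions made for illustration.

```python
from collections import Counter

TILE_SIZE = 500  # bp, illustrative value

# Toy fragments: (barcode, chrom, start, end). The second record is an
# exact duplicate of the first.
fragments = [
    ("AAAC", "chr1", 120, 310),
    ("AAAC", "chr1", 120, 310),
    ("AAAC", "chr1", 450, 700),
    ("AAAC", "chr1", 980, 1200),
    ("TTTG", "chr1", 510, 640),
]


def tile_counts(frags, strategy="unique", tile_size=TILE_SIZE):
    """Count fragments per (barcode, chrom, tile index).

    Assumption: one insertion per fragment, placed at its start coordinate.
    """
    if strategy not in {"unique", "count", "binarize"}:
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "unique":
        # Collapse identical records, preserving first-seen order.
        frags = list(dict.fromkeys(frags))
    counts = Counter(
        (barcode, chrom, start // tile_size)
        for barcode, chrom, start, _end in frags
    )
    if strategy == "binarize":
        # Any non-empty tile becomes 1.
        counts = Counter({key: 1 for key in counts})
    return counts


unique = tile_counts(fragments, "unique")
count = tile_counts(fragments, "count")
binary = tile_counts(fragments, "binarize")

key = ("AAAC", "chr1", 0)
print(unique[key], count[key], binary[key])
```

In this toy input the duplicated record is counted once under `unique`, twice under `count`, and any non-empty tile collapses to 1 under `binarize`.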
For .h5ad and 10x .h5 inputs, GATAC identifies interval-like features by
parsing `var_names` entries such as `chr1:100-200` or `chr1;100-200`, then
aggregates those features into fixed tiles by overlap. In this mode,
`unique` and `count` both preserve the original matrix values, while
`binarize` clips the final tile matrix to 0/1.
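The overlap-based aggregation can be pictured with a short sketch. This is an illustration under stated assumptions, not GATAC's code: the `parse_interval`, `overlapping_tiles`, and `aggregate` helpers are hypothetical, coordinates are treated as 0-based half-open, and a feature's value is added in full to every tile it overlaps, which may differ from what GATAC actually does.

```python
import re
from collections import defaultdict

# Accepts names such as "chr1:100-200" or "chr1;100-200".
INTERVAL_RE = re.compile(r"^(?P<chrom>[^:;]+)[:;](?P<start>\d+)-(?P<end>\d+)$")


def parse_interval(name):
    """Return (chrom, start, end), or None if the name is not interval-like."""
    match = INTERVAL_RE.match(name)
    if match is None:
        return None
    return match["chrom"], int(match["start"]), int(match["end"])


def overlapping_tiles(start, end, tile_size):
    """Indices of fixed-width tiles overlapped by the half-open [start, end)."""
    if end <= start:
        return range(0)
    return range(start // tile_size, (end - 1) // tile_size + 1)


def aggregate(feature_values, tile_size=500, binarize=False):
    """Sum one cell's feature values into (chrom, tile index) bins."""
    tiles = defaultdict(int)
    for name, value in feature_values.items():
        parsed = parse_interval(name)
        if parsed is None:
            continue  # skip features that are not interval-like
        chrom, start, end = parsed
        for idx in overlapping_tiles(start, end, tile_size):
            tiles[(chrom, idx)] += value
    if binarize:
        # Clip the final tile values to 0/1.
        return {key: min(value, 1) for key, value in tiles.items()}
    return dict(tiles)


# One cell's row: two interval-like features and one that is not.
cell = {"chr1:100-200": 3, "chr1;450-620": 2, "GeneX": 7}
summed = aggregate(cell)
clipped = aggregate(cell, binarize=True)
print(summed)
print(clipped)
```

Here the second feature spans the boundary at 500 bp, so it contributes to both tile 0 and tile 1 of `chr1`, while `GeneX` is ignored because its name does not parse as an interval.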
Built-in genomes
Pass a genome name string to use built-in chromosome sizes:
| Name | Assembly |
|---|---|
| `hg38` | Human GRCh38 (Gencode v41) |
|  | Human GRCh37 |
|  | Mouse GRCm38 (Gencode vM25) |
|  | Mouse GRCm39 (Gencode vM30) |
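Chromosome sizes and the tile size together fix the column space of the output matrix. The sketch below shows that arithmetic under stated assumptions: the sizes are toy values rather than a real assembly, `tile_layout` is a hypothetical helper, and the last tile of each chromosome is assumed to be truncated at the chromosome end.

```python
import math

# Toy chromosome sizes in bp (not a real assembly).
chrom_sizes = {"chr1": 2300, "chr2": 1000, "chrM": 160}


def tile_layout(sizes, tile_size, exclude=()):
    """Return (chrom, start, end) for every tile, skipping excluded chroms."""
    tiles = []
    for chrom, length in sizes.items():
        if chrom in exclude:
            continue
        n_tiles = math.ceil(length / tile_size)
        for i in range(n_tiles):
            start = i * tile_size
            # The final tile may be shorter than tile_size.
            tiles.append((chrom, start, min(start + tile_size, length)))
    return tiles


layout = tile_layout(chrom_sizes, tile_size=500, exclude={"chrM"})
print(len(layout))
print(layout[4])
```

With these toy sizes, `chr1` yields five tiles and `chr2` two, so halving the tile size would roughly double the number of matrix columns.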
Examples
Quick start
gatac tile pbmc.parquet -g hg38 -t 500 -m 100
From a 10x peak matrix
gatac tile filtered_peak_bc_matrix.h5 -g hg38 -t 500 -o pbmc_tile.h5ad
With quality filtering
gatac tile pbmc.parquet -g hg38 \
    --metrics pbmc_metrics.csv \
    --filter "tsse_score > 5 and n_unique > 1000" \
    -o pbmc_tile.h5ad
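As a rough illustration of which cells the filter in this command selects, the following standard-library sketch applies the same two thresholds to a toy metrics table. The CSV contents and the `barcode` column name are made up for the example; the option itself is documented as a Polars query string, and this code only mimics its effect.

```python
import csv
import io

# Toy metrics table; the barcode column name is hypothetical.
metrics_csv = """barcode,tsse_score,n_unique
AAAC,7.2,1500
CCGT,4.1,5000
GGTA,9.0,800
TTTG,5.5,1001
"""

rows = csv.DictReader(io.StringIO(metrics_csv))

# Same condition as --filter "tsse_score > 5 and n_unique > 1000".
kept = [
    row["barcode"]
    for row in rows
    if float(row["tsse_score"]) > 5 and int(row["n_unique"]) > 1000
]
print(kept)
```

Both conditions must hold, so a cell with many fragments but a low TSS enrichment score is dropped, as is a cell with a high score but too few fragments.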
Custom tile size, exclude chrM and chrY
gatac tile pbmc.parquet -g hg38 \
    -t 1000 \
    -e chrM chrY \
    -o pbmc_tile_1kb.h5ad
Binarized matrix (ArchR-style)
gatac tile pbmc.parquet -g hg38 --count-strategy binarize
Low-memory mode for constrained GPUs
gatac tile pbmc.parquet -g hg38 --low-memory -o pbmc_tile.h5ad
Python equivalent
import gatac as ga

# Build the cell × tile matrix with the same settings as the CLI examples.
adata = ga.pp.make_tile_matrix(
    "pbmc.parquet",
    chrom_sizes="hg38",
    tile_size=500,
    min_fragments_per_cell=100,
    exclude_chroms=["chrM", "chrY"],
    metrics="pbmc_metrics.csv",
    filter_query="tsse_score > 5 and n_unique > 1000",
    count_strategy="unique",
)
# Write the result to disk as .h5ad.
adata.write_h5ad("pbmc_tile.h5ad")
Output AnnData structure
| Slot | Content |
|---|---|
| `X` | Sparse cell × tile count matrix ( |
| `obs` | Barcode metadata (barcode string as index) |
| `var` | Tile metadata: |