gatac.pp.make_gene_score_matrix
- gatac.pp.make_gene_score_matrix(input_parquet, gene_anno, output_path=None, gene_model='exp(-abs(x)/5000) + exp(-1)', tile_size=500, extend_upstream=(1000, 100000), extend_downstream=(1000, 100000), gene_upstream=5000, gene_downstream=0, use_gene_boundaries=True, use_tss=False, ceiling=4, gene_scale_factor=5.0, scale_to=10000.0, exclude_chroms=('chrY', 'chrM'), min_fragments_per_cell=100, metrics=None, filter_query=None, barcode_prefix=None, low_memory=False, cell_batch_size=None, gene_name_key='gene_name', gene_id_key='gene_id')
GPU-accelerated ArchR-style gene activity score matrix.
Faithful port of ArchR addGeneScoreMatrix. See module docstring for the algorithm. Parameter defaults match ArchR’s defaults.
- Parameters:
- input_parquet str | Path
ATAC fragments parquet (columns: chrom, start, end, barcode, count).
- gene_anno str | Path
GTF/GFF gene annotation, or a CSV with columns symbol, seqnames, start, end, strand.
- gene_model Union[str, Callable]
ArchR geneModel: an expression string in x (signed distance to TSS), or a Python callable f(x) -> weight.
- tile_size int
ArchR tiling parameter: width in bp of the tiles that fragments are binned into.
- ceiling int
ArchR capping parameter: maximum count allowed per tile.
- scale_to float
ArchR normalisation parameter: total that each cell’s gene scores are scaled to.
- gene_scale_factor float
ArchR gene-width-weight parameter: upper bound of the inverse-gene-length weight, which ranges from 1 to this value.
- extend_upstream Tuple[int, int]
(min, max) bp extension used for the regulatory search window.
- extend_downstream Tuple[int, int]
(min, max) bp extension used for the regulatory search window.
- gene_upstream int
bp by which the gene body is extended upstream before the model is applied.
- gene_downstream int
bp by which the gene body is extended downstream before the model is applied.
- use_gene_boundaries bool
Clip windows so tiles cannot contribute across a neighbouring gene.
- use_tss bool
Build the model on the 1bp TSS rather than the gene body.
- output_path Optional[str | Path]
- exclude_chroms Optional[list]
- min_fragments_per_cell int
- metrics Optional[str | Path | 'cudf.DataFrame']
- filter_query Optional[str]
- barcode_prefix Optional[str]
- low_memory bool
- cell_batch_size Optional[int]
- gene_name_key str
- gene_id_key str
- Returns:
AnnData of shape (cells, genes) with normalised gene scores.
- Return type:
sc.AnnData
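To make the roles of gene_model, ceiling and scale_to more concrete, the following is a minimal NumPy sketch on toy data. It is not gatac's implementation: the tile counts and tile-to-TSS distances are invented for illustration, gene-width weighting, extension windows and gene boundaries are left out, and it assumes that normalisation scales each cell's total score to scale_to.

```python
import numpy as np


def gene_model(x):
    """Default weight from the signature: exp(-abs(x)/5000) + exp(-1)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-np.abs(x) / 5000.0) + np.exp(-1.0)


# Hypothetical toy data: raw counts for 2 cells x 3 tiles.
tile_counts = np.array([[0.0, 2.0, 9.0],
                        [1.0, 0.0, 3.0]])

# Hypothetical signed distances (bp) from each tile to the TSS of 2 genes,
# shape (genes, tiles).
tile_distance = np.array([[-500.0, 0.0, 500.0],
                          [-1000.0, -500.0, 0.0]])

ceiling = 4
scale_to = 10000.0

# Cap per-tile counts so that a single tile cannot dominate.
capped = np.minimum(tile_counts, ceiling)

# One weight per (gene, tile) pair from the distance-decay model.
weights = gene_model(tile_distance)

# Weighted sum of tile counts per gene, shape (cells, genes).
scores = capped @ weights.T

# Assumed normalisation: scale every cell's total score to scale_to.
normalised = scores / scores.sum(axis=1, keepdims=True) * scale_to
```

A callable such as gene_model above has the f(x) -> weight form that the gene_model parameter is documented to accept as an alternative to the expression string.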
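A call might look like the sketch below. The file names are hypothetical placeholders, the keyword arguments simply restate defaults from the signature, and the import path is assumed from the qualified name gatac.pp.make_gene_score_matrix.

```python
from pathlib import Path

# Hypothetical input locations; replace with real files.
FRAGMENTS = Path("fragments.parquet")   # columns: chrom, start, end, barcode, count
GENE_ANNOTATION = Path("genes.gtf")     # GTF/GFF, or a CSV as described above

# Keyword arguments copied from the documented signature. All of these are
# the defaults and are spelled out only for illustration.
GENE_SCORE_KWARGS = dict(
    gene_model="exp(-abs(x)/5000) + exp(-1)",
    tile_size=500,
    ceiling=4,
    scale_to=10000.0,
    exclude_chroms=("chrY", "chrM"),
    min_fragments_per_cell=100,
)


def compute_gene_scores(fragments=FRAGMENTS, annotation=GENE_ANNOTATION):
    """Run the gene score computation with the keyword arguments above."""
    # Imported inside the function so this snippet can be loaded on a
    # machine without gatac installed; the import path is an assumption.
    from gatac.pp import make_gene_score_matrix

    return make_gene_score_matrix(fragments, annotation, **GENE_SCORE_KWARGS)
```

Calling compute_gene_scores() would then be expected to return an AnnData of shape (cells, genes), as stated in the Returns entry.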