gatac.pp.read_fragments_parquet

gatac.pp.read_fragments_parquet(input_path, low_memory=True, columns=None)

Read ATAC fragments from a Parquet file into a cudf.DataFrame, with loading optimized for GPU memory.

Note: Parquet files store their schema as intrinsic metadata. This function expects the file to contain columns matching the keys of FRAGMENT_DTYPES: 'chrom', 'start', 'end', 'barcode', and 'count'.
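Because the function relies on the column names stored in the Parquet schema, it can help to check a file before loading it onto the GPU. The following is a minimal sketch, not part of gatac: `missing_fragment_columns` and `REQUIRED_FRAGMENT_COLUMNS` are hypothetical names, and the list only restates the column names given in the note. It checks names only, not dtypes.

```python
from typing import Iterable, List

# Hypothetical constant: restates the column names the docstring says
# read_fragments_parquet expects (the keys of gatac's FRAGMENT_DTYPES).
# Dtypes are deliberately not checked, since they are not documented here.
REQUIRED_FRAGMENT_COLUMNS: List[str] = ["chrom", "start", "end", "barcode", "count"]


def missing_fragment_columns(column_names: Iterable[str]) -> List[str]:
    """Return the required fragment columns absent from column_names.

    Hypothetical helper, not part of gatac. The result preserves the
    order of REQUIRED_FRAGMENT_COLUMNS; an empty list means every
    required name is present (extra columns are ignored).
    """
    present = set(column_names)
    return [col for col in REQUIRED_FRAGMENT_COLUMNS if col not in present]
```

For example, a file whose schema names the cell column "cell" instead of "barcode" would yield ['barcode']. In practice, the names could be read from the file's metadata without loading any data via `pyarrow.parquet.read_schema(path).names`, assuming pyarrow is installed.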

Examples

>>> import gatac as ga
>>> df = ga.pp.read_fragments_parquet("pbmc.parquet")
>>> df.columns.tolist()
['chrom', 'start', 'end', 'barcode', 'count']
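
Parameters:
input_path (str | Path): Path to the Parquet file containing the fragments.
low_memory (bool, default True)
columns (List[str] | None, default None): Columns to read from the file.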

Return type:

cudf.DataFrame