gatac.pp.select_features_multi

gatac.pp.select_features_multi(input_paths, output_path, n_features=500000, filter_lower_quantile=0.005, filter_upper_quantile=0.005, binarize=True)

Streaming feature selection across multiple h5ad files.

Processes the input files one at a time to avoid running out of memory (OOM), and writes a single combined output containing the selected features from all inputs.
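
The streaming idea can be illustrated with a minimal pure-Python sketch. It is not gatac's implementation: the helper names `iter_row_chunks` and `accumulate_feature_counts` are hypothetical, nested lists stand in for the count matrices stored in the h5ad files, and all inputs are assumed to share the same feature columns. The point is only that per-feature totals can be built up while holding one block of rows in memory at a time.

```python
from typing import Iterable, Iterator, List

Matrix = List[List[int]]  # rows are cells, columns are features


def iter_row_chunks(matrix: Matrix, chunk_size: int) -> Iterator[Matrix]:
    """Yield consecutive blocks of at most chunk_size rows."""
    for start in range(0, len(matrix), chunk_size):
        yield matrix[start:start + chunk_size]


def accumulate_feature_counts(files: Iterable[Matrix], chunk_size: int) -> List[int]:
    """Sum every feature column over all inputs, one chunk at a time."""
    totals: List[int] = []
    for matrix in files:  # stand-in for opening one h5ad file at a time
        for chunk in iter_row_chunks(matrix, chunk_size):
            for row in chunk:
                if not totals:
                    # Assumption: every input has the same number of features.
                    totals = [0] * len(row)
                for j, value in enumerate(row):
                    totals[j] += value
    return totals


# Two toy "files" with three features each (illustrative data only).
sample_a = [[1, 0, 2], [0, 0, 1]]
sample_b = [[3, 0, 0], [1, 1, 0], [0, 0, 4]]
feature_totals = accumulate_feature_counts([sample_a, sample_b], chunk_size=2)
print(feature_totals)  # [5, 1, 7]
```

Only the running totals, whose size depends on the number of features rather than the number of cells, persist across chunks and files.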

Parameters:
input_paths list of str or Path

List of paths to h5ad files

output_path str or Path

Output path for combined h5ad file

n_features int

Target number of features to select (default: 500000)

filter_lower_quantile float

Lower quantile threshold for filtering (default: 0.005)

filter_upper_quantile float

Upper quantile threshold for filtering (default: 0.005)

chunk_size int

Chunk size for row processing (default: 6000)

binarize bool

Whether to binarize the output matrix (default: True)
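
To make the interplay of `n_features`, the two quantile filters, and `binarize` concrete, here is a hedged pure-Python sketch. It rests on assumptions that this page does not confirm: that `filter_lower_quantile` and `filter_upper_quantile` are fractions of features trimmed from the low and high ends of the per-feature count distribution, and that the remaining features are ranked by total count. The helpers `select_feature_indices` and `binarize_rows` are hypothetical and are not part of gatac.

```python
from typing import List


def select_feature_indices(
    totals: List[int], n_features: int, lower_q: float, upper_q: float
) -> List[int]:
    """Trim both tails of the count distribution, then keep the top n_features."""
    order = sorted(range(len(totals)), key=lambda j: totals[j])  # ascending by count
    n_low = int(len(order) * lower_q)    # assumed: features dropped from the low end
    n_high = int(len(order) * upper_q)   # assumed: features dropped from the high end
    kept = order[n_low:len(order) - n_high]
    n = min(max(n_features, 0), len(kept))
    return sorted(kept[len(kept) - n:])  # column indices in original order


def binarize_rows(rows: List[List[int]]) -> List[List[int]]:
    """Replace every nonzero count with 1."""
    return [[1 if value != 0 else 0 for value in row] for row in rows]


# Hypothetical per-feature totals for ten features.
totals = [5, 1, 7, 0, 3, 9, 2, 4, 6, 8]
selected = select_feature_indices(totals, n_features=3, lower_q=0.1, upper_q=0.1)
print(selected)  # [2, 8, 9]

# Subset an illustrative count matrix to the selected columns, then binarize.
counts = [
    [1, 0, 2, 0, 0, 3, 0, 1, 0, 4],
    [0, 0, 5, 0, 1, 0, 0, 0, 2, 0],
]
subset = [[row[j] for j in selected] for row in counts]
print(binarize_rows(subset))  # [[1, 0, 1], [1, 1, 0]]
```

In this toy run the feature with the highest total (index 5) and the one with the lowest (index 3) are trimmed before the top three of the remainder are kept.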

Examples

>>> import gatac as ga
>>> ga.pp.select_features_multi(
...     ["sampleA.h5ad", "sampleB.h5ad"],
...     output_path="combined.h5ad",
...     n_features=500_000,
... )