gatac.pp.select_features_multi
- gatac.pp.select_features_multi(input_paths, output_path, n_features=500000, filter_lower_quantile=0.005, filter_upper_quantile=0.005, binarize=True)
Streaming feature selection across multiple h5ad files.
Processes the input files one at a time to avoid out-of-memory (OOM) errors, and writes a single combined output containing the selected features from all inputs.
- Parameters:
- input_paths
list of str or Path List of paths to h5ad files
- output_path
str or Path Output path for combined h5ad file
- n_features
int Target number of features to select (default: 500000)
- filter_lower_quantile
float Lower quantile threshold for filtering (default: 0.005)
- filter_upper_quantile
float Upper quantile threshold for filtering (default: 0.005)
- chunk_size
int Chunk size for row processing (default: 6000)
- binarize
bool Whether to binarize the output matrix (default: True)
Examples
>>> import gatac as ga
>>> ga.pp.select_features_multi(
...     ["sampleA.h5ad", "sampleB.h5ad"],
...     output_path="combined.h5ad",
...     n_features=500_000,
... )
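
The streaming idea described above can be illustrated with a minimal NumPy sketch. This is not gatac's implementation: the helper names `accumulate_feature_counts` and `pick_features` are hypothetical, and the interpretation of the quantile parameters (drop features below the lower quantile and above `1 - upper quantile` of the nonzero per-feature counts, then keep the `n_features` most frequent) is an assumption made for illustration only. Each chunk stands in for a block of rows read from one input file.

```python
import numpy as np


def accumulate_feature_counts(chunks, n_total_features, binarize=True):
    """Sum per-feature counts over an iterable of (cells x features) arrays.

    Only one chunk is held in memory at a time, which is the point of
    streaming. Hypothetical helper, not part of gatac.
    """
    totals = np.zeros(n_total_features, dtype=np.int64)
    for chunk in chunks:
        block = np.asarray(chunk)
        if binarize:
            # Count each feature at most once per cell.
            block = block > 0
        totals += block.sum(axis=0).astype(np.int64)
    return totals


def pick_features(totals, n_features, lower_q=0.005, upper_q=0.005):
    """Return sorted indices of the selected features.

    Assumed semantics: ignore all-zero features, drop features whose count
    falls outside the [lower_q, 1 - upper_q] quantile range of the nonzero
    counts, then keep the n_features with the highest counts.
    """
    nonzero = totals[totals > 0]
    if nonzero.size == 0:
        return np.array([], dtype=np.int64)
    lo = np.quantile(nonzero, lower_q)
    hi = np.quantile(nonzero, 1.0 - upper_q)
    candidates = np.flatnonzero((totals > 0) & (totals >= lo) & (totals <= hi))
    # Stable sort on negated counts gives descending order with ties
    # resolved by original feature index.
    order = np.argsort(-totals[candidates], kind="stable")
    return np.sort(candidates[order[:n_features]])


# Two toy "files", each yielding one chunk of cells x 4 features.
chunks = [
    np.array([[1, 0, 2, 0], [0, 0, 3, 1]]),
    np.array([[1, 0, 1, 0]]),
]
totals = accumulate_feature_counts(chunks, n_total_features=4)
selected = pick_features(totals, n_features=2, lower_q=0.0, upper_q=0.0)
```

With the toy input, the binarized totals are `[2, 0, 3, 1]` and the two most frequent features (indices 0 and 2) are selected. A second pass over the files would then be needed to write only the selected columns to the combined output.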