scgpt.tasks package
Submodules
scgpt.tasks.cell_emb module
- scgpt.tasks.cell_emb.embed_data(adata_or_file: Union[AnnData, str, PathLike], model_dir: Union[str, PathLike], cell_type_key: str = 'cell_type', gene_col: str = 'feature_name', max_length=1200, batch_size=64, obs_to_save: Optional[list] = None, device: Union[str, device] = 'cuda', return_new_adata: bool = False) → AnnData
Preprocess anndata and embed the data using the model.
- Parameters:
adata_or_file (Union[AnnData, str, PathLike]) – The AnnData object, or the path to a file containing one.
model_dir (PathLike) – The path to the model directory.
cell_type_key (str) – The key in adata.obs that contains the cell type labels. Defaults to “cell_type”.
gene_col (str) – The column in adata.var that contains the gene names. Defaults to “feature_name”.
max_length (int) – The maximum length of the input sequence. Defaults to 1200.
batch_size (int) – The batch size for inference. Defaults to 64.
obs_to_save (Optional[list]) – The list of obs columns to save in the output adata. If None, only the cell_type_key column is kept. Defaults to None.
device (Union[str, torch.device]) – The device to use. Defaults to “cuda”.
return_new_adata (bool) – Whether to return a new AnnData object. If False, the cell embeddings are added to adata.obsm under the key “X_scGPT”. Defaults to False.
- Returns:
The AnnData object with the cell embeddings.
- Return type:
AnnData
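A minimal usage sketch for embed_data, based only on the signature and parameter descriptions above. The helper name embed_cells and its embed_fn argument are hypothetical (embed_fn exists only so the wrapper can be exercised without a GPU or a model checkpoint), and the file and directory paths are placeholders.

```python
from pathlib import Path


def embed_cells(adata_path, model_dir, gene_col="feature_name",
                batch_size=64, device="cuda", embed_fn=None):
    """Return the per-cell embedding matrix for the cells in an AnnData file.

    `embed_fn` is a hypothetical hook for substituting a stand-in function;
    when omitted, scgpt.tasks.cell_emb.embed_data is used.
    """
    if embed_fn is None:
        # Imported lazily so the helper can be defined without scGPT installed.
        from scgpt.tasks.cell_emb import embed_data as embed_fn

    adata = embed_fn(
        Path(adata_path),
        Path(model_dir),
        gene_col=gene_col,
        batch_size=batch_size,
        device=device,
        return_new_adata=False,  # embeddings are stored in adata.obsm
    )
    # With return_new_adata=False the embeddings live under this key.
    return adata.obsm["X_scGPT"]
```

Calling embed_cells("cells.h5ad", "path/to/model_dir") with scGPT installed would be expected to return one embedding row per cell. Pass device="cpu" if no GPU is available.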
- scgpt.tasks.cell_emb.get_batch_cell_embeddings(adata, cell_embedding_mode: str = 'cls', model=None, vocab=None, max_length=1200, batch_size=64, model_configs=None, gene_ids=None, use_batch_labels=False) → ndarray
Get the cell embeddings for a batch of cells.
- Parameters:
adata (AnnData) – The AnnData object.
cell_embedding_mode (str) – The mode used to compute the cell embeddings. Defaults to “cls”.
model (TransformerModel, optional) – The model. Defaults to None.
vocab (GeneVocab, optional) – The vocabulary. Defaults to None.
max_length (int) – The maximum length of the input sequence. Defaults to 1200.
batch_size (int) – The batch size for inference. Defaults to 64.
model_configs (dict, optional) – The model configurations. Defaults to None.
gene_ids (np.ndarray, optional) – The gene vocabulary ids. Defaults to None.
use_batch_labels (bool) – Whether to use batch labels. Defaults to False.
- Returns:
The cell embeddings.
- Return type:
np.ndarray
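The reference above does not spell out what the “cls” mode computes. The sketch below illustrates one common reading, under two stated assumptions: that a <cls> token sits at position 0 of each input sequence, and that each resulting row is scaled to unit length. The function cls_cell_embeddings and the random input are illustrative stand-ins, not part of the scGPT API.

```python
import numpy as np


def cls_cell_embeddings(token_states: np.ndarray) -> np.ndarray:
    """Take the first-token state of each cell and scale rows to unit length.

    token_states has shape (n_cells, seq_len, dim) and stands in for
    transformer outputs; position 0 is assumed to hold the <cls> token.
    """
    cls_states = token_states[:, 0, :]
    # Unit-length scaling is an assumption made for this illustration.
    norms = np.linalg.norm(cls_states, axis=1, keepdims=True)
    return cls_states / norms


rng = np.random.default_rng(0)
states = rng.normal(size=(4, 1200, 8))  # 4 cells, seq_len 1200, dim 8
emb = cls_cell_embeddings(states)
print(emb.shape)  # (4, 8)
```

The result has one row per cell, matching the np.ndarray return type documented above.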