scgpt.tasks package
Submodules
scgpt.tasks.cell_emb module
- scgpt.tasks.cell_emb.embed_data(adata_or_file: AnnData | str | PathLike, model_dir: str | PathLike, gene_col: str = 'feature_name', max_length=1200, batch_size=64, obs_to_save: list | None = None, device: str | device = 'cuda', use_fast_transformer: bool = True, return_new_adata: bool = False) -> AnnData
Preprocess anndata and embed the data using the model.
- Parameters:
adata_or_file (Union[AnnData, str, PathLike]) – An AnnData object, or the path to a file containing one.
model_dir (PathLike) – The path to the model directory.
gene_col (str) – The column in adata.var that contains the gene names. Defaults to “feature_name”.
max_length (int) – The maximum length of the input sequence. Defaults to 1200.
batch_size (int) – The batch size for inference. Defaults to 64.
obs_to_save (Optional[list]) – The list of obs columns to save in the output adata. Useful for retaining metadata in the output. Defaults to None.
device (Union[str, torch.device]) – The device to use. Defaults to “cuda”.
use_fast_transformer (bool) – Whether to use flash-attn. Defaults to True.
return_new_adata (bool) – Whether to return a new AnnData object. If False, the cell embeddings are added to adata.obsm under the key “X_scGPT”. Defaults to False.
- Returns:
The AnnData object with the cell embeddings.
- Return type:
AnnData
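As a rough illustration of how the parameters above fit together, the following sketch wraps embed_data in two small helpers. The helper names (embed_kwargs, embed_file), the "cell_type" obs column, and the use of an .h5ad file are assumptions made for this example and are not part of scGPT; switching off use_fast_transformer on CPU also assumes that flash-attn requires a GPU.

```python
from pathlib import Path


def embed_kwargs(on_gpu: bool = True) -> dict:
    """Keyword arguments for embed_data, named as in the signature above."""
    return {
        "gene_col": "feature_name",      # column in adata.var holding gene names
        "max_length": 1200,              # documented default
        "batch_size": 64,                # documented default
        "obs_to_save": ["cell_type"],    # hypothetical obs column to carry over
        "device": "cuda" if on_gpu else "cpu",
        "use_fast_transformer": on_gpu,  # assumption: flash-attn needs a GPU
        "return_new_adata": False,       # keep embeddings in adata.obsm
    }


def embed_file(h5ad_path, model_dir, on_gpu: bool = True):
    """Embed the cells stored in a file and return the embedding matrix."""
    # Imported lazily so this sketch can be loaded without scGPT installed.
    from scgpt.tasks.cell_emb import embed_data

    adata = embed_data(Path(h5ad_path), Path(model_dir), **embed_kwargs(on_gpu))
    # With return_new_adata=False the embeddings are stored under "X_scGPT".
    return adata.obsm["X_scGPT"]
```

Calling embed_file("cells.h5ad", "path/to/model_dir", on_gpu=False) would then be expected to return one embedding row per cell; both paths here are placeholders.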
- scgpt.tasks.cell_emb.get_batch_cell_embeddings(adata, cell_embedding_mode: str = 'cls', model=None, vocab=None, max_length=1200, batch_size=64, model_configs=None, gene_ids=None, use_batch_labels=False) -> ndarray
Get the cell embeddings for a batch of cells.
- Parameters:
adata (AnnData) – The AnnData object.
cell_embedding_mode (str) – The mode used to compute the cell embeddings. Defaults to “cls”.
model (TransformerModel, optional) – The model. Defaults to None.
vocab (GeneVocab, optional) – The vocabulary. Defaults to None.
max_length (int) – The maximum length of the input sequence. Defaults to 1200.
batch_size (int) – The batch size for inference. Defaults to 64.
model_configs (dict, optional) – The model configurations. Defaults to None.
gene_ids (np.ndarray, optional) – The gene vocabulary ids. Defaults to None.
use_batch_labels (bool) – Whether to use batch labels. Defaults to False.
- Returns:
The cell embeddings.
- Return type:
np.ndarray
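The gene_ids argument holds the vocabulary id of each gene. One possible way to prepare it is sketched below, under the assumption that the columns of adata must line up one-to-one with gene_ids and that genes missing from the vocabulary are dropped first. The helper names (known_gene_ids, embed_cells) are hypothetical, and a plain dict stands in for a GeneVocab in the helper.

```python
def known_gene_ids(gene_names, vocab):
    """Return (keep_mask, ids) for the genes that appear in `vocab`.

    `vocab` is any mapping from gene name to integer id that supports
    `in` and indexing; a plain dict is enough for illustration.
    """
    keep_mask = [name in vocab for name in gene_names]
    ids = [vocab[name] for name, keep in zip(gene_names, keep_mask) if keep]
    return keep_mask, ids


def embed_cells(adata, model, vocab, model_configs, gene_col="feature_name"):
    """Call get_batch_cell_embeddings on the genes that `vocab` knows."""
    # Lazy imports so the helper above works without NumPy or scGPT installed.
    import numpy as np
    from scgpt.tasks.cell_emb import get_batch_cell_embeddings

    keep_mask, ids = known_gene_ids(adata.var[gene_col].tolist(), vocab)
    # Assumption: the remaining columns must match gene_ids one-to-one.
    adata = adata[:, np.array(keep_mask)]
    return get_batch_cell_embeddings(
        adata,
        cell_embedding_mode="cls",
        model=model,
        vocab=vocab,
        max_length=1200,
        batch_size=64,
        model_configs=model_configs,
        gene_ids=np.array(ids, dtype=int),
        use_batch_labels=False,
    )
```

The model, vocab, and model_configs objects are expected to come from a loaded scGPT checkpoint; how they are loaded is outside the scope of this sketch.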