scgpt.tasks package

Submodules

scgpt.tasks.cell_emb module

scgpt.tasks.cell_emb.embed_data(adata_or_file: Union[AnnData, str, PathLike], model_dir: Union[str, PathLike], cell_type_key: str = 'cell_type', gene_col: str = 'feature_name', max_length=1200, batch_size=64, obs_to_save: Optional[list] = None, device: Union[str, device] = 'cuda', return_new_adata: bool = False) → AnnData

Preprocess the AnnData data and compute cell embeddings with the model loaded from model_dir.

Parameters:
  • adata_or_file (Union[AnnData, str, PathLike]) – The AnnData object, or the path to a file containing it.

  • model_dir (Union[str, PathLike]) – The path to the model directory.

  • cell_type_key (str) – The key in adata.obs that contains the cell type labels. Defaults to “cell_type”.

  • gene_col (str) – The column in adata.var that contains the gene names. Defaults to “feature_name”.

  • max_length (int) – The maximum length of the input sequence. Defaults to 1200.

  • batch_size (int) – The batch size for inference. Defaults to 64.

  • obs_to_save (Optional[list]) – The list of adata.obs columns to keep in the output AnnData. If None, only the cell_type_key column is kept. Defaults to None.

  • device (Union[str, torch.device]) – The device to use. Defaults to “cuda”.

  • return_new_adata (bool) – Whether to return a new AnnData object. If False, the cell embeddings are added to adata.obsm under the key “X_scGPT”. Defaults to False.

Returns:

The AnnData object with the cell embeddings.

Return type:

AnnData
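As a hedged illustration of downstream use, the sketch below assumes the embeddings produced by embed_data are available as a NumPy array of shape (n_cells, embedding_dim), for example read from adata.obsm["X_scGPT"] when return_new_adata=False, and transfers labels from reference cells to query cells by cosine-similarity nearest neighbour. The helper transfer_labels and the toy arrays are hypothetical and are not part of scgpt.tasks.

```python
import numpy as np


def transfer_labels(ref_emb: np.ndarray, ref_labels: np.ndarray, query_emb: np.ndarray) -> np.ndarray:
    """Hypothetical helper: give each query cell the label of its most
    cosine-similar reference cell."""
    # L2-normalise rows so that a dot product equals cosine similarity.
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    query = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    # (n_query, n_ref) cosine-similarity matrix.
    sims = query @ ref.T
    # Index of the best-matching reference cell for every query cell.
    best = sims.argmax(axis=1)
    return ref_labels[best]


# Toy stand-ins for embeddings in adata.obsm["X_scGPT"]
# and labels in adata.obs["cell_type"].
rng = np.random.default_rng(0)
ref_emb = rng.normal(size=(6, 8))
ref_labels = np.array(["T cell", "B cell", "NK cell", "T cell", "B cell", "NK cell"])
query_emb = ref_emb[[2, 0]] * 3.0  # rescaled copies of reference cells 2 and 0
print(transfer_labels(ref_emb, ref_labels, query_emb))
```

Cosine similarity is a choice made for this sketch; on L2-normalised vectors it yields the same neighbour ranking as Euclidean distance, since the squared distance between unit vectors a and b equals 2 - 2(a·b).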

scgpt.tasks.cell_emb.get_batch_cell_embeddings(adata, cell_embedding_mode: str = 'cls', model=None, vocab=None, max_length=1200, batch_size=64, model_configs=None, gene_ids=None, use_batch_labels=False) → ndarray

Get the cell embeddings for a batch of cells.

Parameters:
  • adata (AnnData) – The AnnData object.

  • cell_embedding_mode (str) – The method used to derive the cell embeddings. Defaults to “cls”, i.e. the output embedding of the <cls> token.

  • model (TransformerModel, optional) – The pretrained model used for inference. Defaults to None.

  • vocab (GeneVocab, optional) – The gene vocabulary. Defaults to None.

  • max_length (int) – The maximum length of the input sequence. Defaults to 1200.

  • batch_size (int) – The batch size for inference. Defaults to 64.

  • model_configs (dict, optional) – The model configurations. Defaults to None.

  • gene_ids (np.ndarray, optional) – The gene vocabulary ids. Defaults to None.

  • use_batch_labels (bool) – Whether to use batch labels. Defaults to False.

Returns:

The cell embeddings.

Return type:

np.ndarray
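get_batch_cell_embeddings runs inference over batch_size cells at a time and returns a single array. The sketch below shows that batching pattern in plain NumPy. The function embed_in_batches is hypothetical, encode_batch is a stand-in for the transformer forward pass (a fixed random linear projection here), and the final L2 normalisation is an assumption of this sketch rather than a statement about the library.

```python
from typing import Callable

import numpy as np


def embed_in_batches(
    expr: np.ndarray,
    encode_batch: Callable[[np.ndarray], np.ndarray],
    batch_size: int = 64,
) -> np.ndarray:
    """Hypothetical sketch: run `encode_batch` over `expr` in chunks of
    `batch_size` rows and stack the per-batch outputs."""
    chunks = []
    for start in range(0, expr.shape[0], batch_size):
        # Slicing past the end is safe; the last batch may be smaller.
        batch = expr[start:start + batch_size]
        chunks.append(encode_batch(batch))
    emb = np.concatenate(chunks, axis=0)
    # Assumption: rows are L2-normalised so cells compare by cosine similarity.
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


# Stand-in "model": a fixed random projection from 50 genes to 16 dimensions.
rng = np.random.default_rng(42)
projection = rng.normal(size=(50, 16))
expr = rng.poisson(1.0, size=(130, 50)).astype(float)

emb = embed_in_batches(expr, lambda x: x @ projection, batch_size=64)
print(emb.shape)  # (130, 16)
```

Here batching only bounds peak memory: for a deterministic, row-wise encoder like this one, the result does not depend on batch_size.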

scgpt.tasks.grn module

class scgpt.tasks.grn.GeneEmbedding(embeddings: Mapping)

Bases: object

static average_vector_results(vec1, vec2, fname)
cluster_definitions_as_df(top_n=20)
compute_similarities(gene, subset=None, feature_type=None)
generate_network(threshold=0.5)
generate_vector(genes)
generate_weighted_vector(genes, weights)
get_adata(resolution=20)
get_metagenes(gdata)
get_similar_genes(vector)
plot_metagene(gdata, mg=None, title='Gene Embedding')
plot_metagenes_scores(adata, metagenes, column, plot=None)
plot_similarities(gene, n_genes=10, save=None)
read_embedding(filename)
static read_vector(vec)
score_metagenes(adata, metagenes)
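The GeneEmbedding methods above are listed without descriptions. Judging only from their names (an assumption, not documented behaviour), compute_similarities, generate_weighted_vector and generate_network suggest cosine-similarity ranking of genes, (weighted) averaging of gene vectors, and a thresholded similarity graph over the embeddings Mapping passed to the constructor. The standalone NumPy sketch below illustrates those three operations on a toy mapping of gene names to vectors; the function names and the toy vectors are hypothetical and do not reproduce the library's implementation.

```python
from typing import List, Mapping, Optional, Sequence, Tuple

import numpy as np


def cosine_similarities(embeddings: Mapping[str, np.ndarray], gene: str) -> List[Tuple[str, float]]:
    """Rank every other gene by cosine similarity to `gene`, highest first."""
    query = embeddings[gene]
    scores = []
    for other, vec in embeddings.items():
        if other == gene:
            continue
        sim = float(np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec)))
        scores.append((other, sim))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def weighted_vector(
    embeddings: Mapping[str, np.ndarray],
    genes: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> np.ndarray:
    """Average the vectors of `genes`; a plain mean when `weights` is None."""
    vectors = np.stack([embeddings[g] for g in genes])
    return np.average(vectors, axis=0, weights=weights)


def similarity_network(
    embeddings: Mapping[str, np.ndarray], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Edges between gene pairs whose cosine similarity exceeds `threshold`."""
    genes = list(embeddings)
    matrix = np.stack([embeddings[g] for g in genes])
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = unit @ unit.T
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if sims[i, j] > threshold:
                edges.append((genes[i], genes[j], float(sims[i, j])))
    return edges


# Hypothetical toy embeddings: two T-cell genes and two B-cell genes.
toy = {
    "CD3E": np.array([1.0, 0.1, 0.0]),
    "CD3D": np.array([0.9, 0.2, 0.0]),
    "MS4A1": np.array([0.0, 1.0, 0.1]),
    "CD79A": np.array([0.1, 0.9, 0.0]),
}
print(cosine_similarities(toy, "CD3E")[0][0])  # CD3D
print(similarity_network(toy, threshold=0.9))
print(weighted_vector(toy, ["CD3E", "CD3D"], weights=[3, 1]))
```

With threshold=0.9 the toy graph keeps only the two within-lineage pairs (CD3E with CD3D, MS4A1 with CD79A), which is the kind of gene module a thresholded similarity network is meant to expose.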

Module contents