scgpt.tasks package

Submodules

scgpt.tasks.cell_emb module

scgpt.tasks.cell_emb.embed_data(adata_or_file: AnnData | str | PathLike, model_dir: str | PathLike, gene_col: str = 'feature_name', max_length=1200, batch_size=64, obs_to_save: list | None = None, device: str | device = 'cuda', use_fast_transformer: bool = True, return_new_adata: bool = False) → AnnData

Preprocess the AnnData object and embed its cells using the model.

Parameters:

adata_or_file (Union[AnnData, str, PathLike]) – An AnnData object, or the path to a saved AnnData file.
model_dir (Union[str, PathLike]) – The path to the model directory.
gene_col (str) – The column in adata.var that contains the gene names. Defaults to “feature_name”.
max_length (int) – The maximum length of the input sequence. Defaults to 1200.
batch_size (int) – The batch size for inference. Defaults to 64.
obs_to_save (Optional[list]) – The obs columns to keep in the output AnnData. Useful for carrying metadata through to the output. Defaults to None.
device (Union[str, torch.device]) – The device to use. Defaults to “cuda”.
use_fast_transformer (bool) – Whether to use flash-attn. Defaults to True.
return_new_adata (bool) – Whether to return a new AnnData object. If False, the cell embeddings are added to adata.obsm under the key “X_scGPT”. Defaults to False.

Returns:

The AnnData object with the cell embeddings.

Return type:

AnnData
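As an illustration of the parameters above, the following is a minimal sketch of a wrapper around embed_data. The helper name embed_cells and its arguments are hypothetical, and the sketch assumes scgpt and a pretrained model directory are available. It relies only on the documented behaviour that, with return_new_adata=False, the embeddings are stored under adata.obsm["X_scGPT"].

```python
def embed_cells(adata_or_file, model_dir, obs_columns=None, device="cpu"):
    """Hypothetical helper: run embed_data and return the embedding matrix."""
    # Imported lazily so this file can be loaded without scgpt installed.
    from scgpt.tasks.cell_emb import embed_data

    adata = embed_data(
        adata_or_file,               # AnnData object or path to a saved AnnData file
        model_dir,                   # directory holding the pretrained model
        gene_col="feature_name",     # column of adata.var with gene names
        max_length=1200,
        batch_size=64,
        obs_to_save=obs_columns,     # obs columns to carry through to the output
        device=device,
        # flash-attn targets CUDA devices, so only request it there.
        use_fast_transformer=str(device).startswith("cuda"),
        return_new_adata=False,      # embeddings are written to adata.obsm
    )
    return adata.obsm["X_scGPT"]
```

A call might look like embed_cells("cells.h5ad", "path/to/model", obs_columns=["cell_type"]), where both paths and the column name are placeholders.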

scgpt.tasks.cell_emb.get_batch_cell_embeddings(adata, cell_embedding_mode: str = 'cls', model=None, vocab=None, max_length=1200, batch_size=64, model_configs=None, gene_ids=None, use_batch_labels=False) → ndarray

Get the cell embeddings for a batch of cells.

Parameters:

adata (AnnData) – The AnnData object.
cell_embedding_mode (str) – The mode used to compute the cell embeddings. Defaults to “cls”.
model (TransformerModel, optional) – The model. Defaults to None.
vocab (GeneVocab, optional) – The vocabulary. Defaults to None.
max_length (int) – The maximum length of the input sequence. Defaults to 1200.
batch_size (int) – The batch size for inference. Defaults to 64.
model_configs (dict, optional) – The model configurations. Defaults to None.
gene_ids (np.ndarray, optional) – The gene vocabulary ids. Defaults to None.
use_batch_labels (bool) – Whether to use batch labels. Defaults to False.

Returns:

The cell embeddings.

Return type:

np.ndarray
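To make the “cls” mode more concrete, here is a small NumPy sketch of one common reading of it: take the model output at the first (classification-token) position of each cell's sequence as that cell's embedding. This is an assumption about what the mode means, not a description of the library's code; the helper cls_pool and the optional L2 normalization are hypothetical.

```python
import numpy as np

def cls_pool(hidden_states: np.ndarray, normalize: bool = True) -> np.ndarray:
    """Hypothetical helper: keep the vector at sequence position 0 for each cell."""
    # hidden_states has shape (n_cells, seq_len, d_model).
    cell_emb = hidden_states[:, 0, :]
    if normalize:
        # L2-normalize each row; this step is an assumption, not documented above.
        norms = np.linalg.norm(cell_emb, axis=1, keepdims=True)
        cell_emb = cell_emb / np.clip(norms, 1e-12, None)
    return cell_emb

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 6, 8))  # 4 cells, 6 tokens, 8-dimensional model
emb = cls_pool(hidden)
print(emb.shape)  # (4, 8)
```

The result has one row per cell, matching the np.ndarray return type documented above.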

scgpt.tasks.grn module

class scgpt.tasks.grn.GeneEmbedding(embeddings: Mapping)

Bases: object

static average_vector_results(vec1, vec2, fname)

cluster_definitions_as_df(top_n=20)

compute_similarities(gene, subset=None, feature_type=None)

generate_network(threshold=0.5)

generate_vector(genes)

generate_weighted_vector(genes, weights)

get_adata(resolution=20)

get_metagenes(gdata)

get_similar_genes(vector)

plot_metagene(gdata, mg=None, title='Gene Embedding')

plot_metagenes_scores(adata, metagenes, column, plot=None)

plot_similarities(gene, n_genes=10, save=None)

read_embedding(filename)

static read_vector(vec)

score_metagenes(adata, metagenes)
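The methods above are listed without descriptions, so the following sketch only illustrates the general idea suggested by names such as compute_similarities and generate_network(threshold=0.5). It assumes the embeddings mapping goes from gene name to vector and that similarity is cosine similarity. Both are assumptions, and the functions rank_similar and similarity_edges are hypothetical stand-ins rather than the class's implementation.

```python
import numpy as np

def cosine(u, v) -> float:
    """Cosine similarity between two vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_similar(embeddings, gene):
    """Rank all other genes by cosine similarity to `gene`, highest first."""
    query = embeddings[gene]
    scores = {g: cosine(query, v) for g, v in embeddings.items() if g != gene}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def similarity_edges(embeddings, threshold=0.5):
    """Return (gene_a, gene_b, similarity) for pairs above the threshold."""
    genes = sorted(embeddings)
    edges = []
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            s = cosine(embeddings[a], embeddings[b])
            if s > threshold:
                edges.append((a, b, s))
    return edges

# Toy gene -> vector mapping; the names and values are made up.
toy = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.9, 0.1],
    "GENE_C": [0.0, 1.0],
}
ranking = rank_similar(toy, "GENE_A")
edges = similarity_edges(toy, threshold=0.5)
```

In this toy mapping, GENE_B ranks above GENE_C for the query GENE_A, and only the GENE_A/GENE_B pair passes the 0.5 threshold.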
