scgpt package

Modules

scgpt.data_collator

scgpt.data_sampler

class scgpt.data_sampler.SubsetSequentialSampler(indices: Sequence[int])

Bases: Sampler

Samples elements sequentially from a given list of indices, without replacement.

Parameters:

indices (Sequence[int]) – A sequence of indices to sample from.
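The behavior described above can be sketched without PyTorch. The class below is a hypothetical stand-in (SequentialSubsetSketch is not part of scgpt) that only illustrates the idea: the indices are yielded once each, in the order given.

```python
from typing import Iterator, Sequence


class SequentialSubsetSketch:
    """Illustrative stand-in, not the scgpt class: yields indices in order."""

    def __init__(self, indices: Sequence[int]) -> None:
        self.indices = indices

    def __iter__(self) -> Iterator[int]:
        # No shuffling and no replacement: each index appears exactly once.
        return iter(self.indices)

    def __len__(self) -> int:
        return len(self.indices)


sampler = SequentialSubsetSketch([4, 2, 7])
order = list(sampler)  # same order as the input sequence
```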

class scgpt.data_sampler.SubsetsBatchSampler(subsets: List[Sequence[int]], batch_size: int, intra_subset_shuffle: bool = True, inter_subset_shuffle: bool = True, drop_last: bool = False)

Bases: Sampler[List[int]]

Samples batches of indices from a list of subsets of indices. Each subset represents one data subset and is sampled without replacement, either randomly or sequentially. In particular, each batch contains indices from a single subset only. Use this sampler when samples must be drawn from multiple subsets separately.

Parameters:

subsets (List[Sequence[int]]) – A list of subsets of indices.
batch_size (int) – Size of mini-batch.
intra_subset_shuffle (bool) – If True, the sampler will shuffle the indices within each subset.
inter_subset_shuffle (bool) – If True, the sampler will shuffle the order of subsets.
drop_last (bool) – If True, the sampler will drop the last batch if its size would be less than batch_size.
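A minimal pure-Python sketch of how these parameters could interact is shown below. It is not the scgpt implementation: iter_subset_batches and its seed argument are hypothetical, and two points are assumptions: inter_subset_shuffle is modeled as randomly interleaving batches from different subsets, and drop_last is applied to the short final batch of each subset.

```python
import random
from typing import Iterator, List, Optional, Sequence


def iter_subset_batches(
    subsets: List[Sequence[int]],
    batch_size: int,
    intra_subset_shuffle: bool = True,
    inter_subset_shuffle: bool = True,
    drop_last: bool = False,
    seed: Optional[int] = None,  # hypothetical, only for reproducibility
) -> Iterator[List[int]]:
    rng = random.Random(seed)
    batches: List[List[int]] = []
    for subset in subsets:
        indices = list(subset)
        if intra_subset_shuffle:
            rng.shuffle(indices)  # shuffle within this subset only
        # Chunk one subset at a time, so a batch never mixes subsets.
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            if drop_last and len(batch) < batch_size:
                continue
            batches.append(batch)
    if inter_subset_shuffle:
        # Assumption: batches from different subsets are interleaved at random.
        rng.shuffle(batches)
    yield from batches


subsets = [[0, 1, 2, 3, 4], [10, 11, 12]]
batches = list(iter_subset_batches(subsets, batch_size=2, seed=0))
ordered = list(iter_subset_batches(subsets, 2, False, False))
```

With both shuffle flags disabled, the sketch yields the batches of the first subset in order, followed by those of the second.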

scgpt.loss

scgpt.loss.criterion_neg_log_bernoulli(input: Tensor, target: Tensor, mask: Tensor) → Tensor: Compute the negative log-likelihood of a Bernoulli distribution.
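The computation can be illustrated on plain Python lists. This sketch rests on several assumptions that the description above does not state: input holds probabilities in (0, 1), targets are binarized as target > 0, and the summed loss is averaged over the masked positions. The name neg_log_bernoulli_sketch is hypothetical.

```python
import math
from typing import Sequence


def neg_log_bernoulli_sketch(
    probs: Sequence[float], target: Sequence[float], mask: Sequence[bool]
) -> float:
    total = 0.0
    count = 0
    for p, t, m in zip(probs, target, mask):
        if not m:
            continue  # positions outside the mask do not contribute
        y = 1.0 if t > 0 else 0.0  # assumption: any positive target counts as 1
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
        count += 1
    return total / count


nll = neg_log_bernoulli_sketch([0.5, 0.5, 0.9], [3.0, 0.0, 1.0], [True, True, False])
```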

scgpt.loss.masked_mse_loss(input: Tensor, target: Tensor, mask: Tensor) → Tensor: Compute the masked MSE loss between input and target.
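As a rough illustration, a masked MSE can be written on plain lists as the squared error summed over masked positions and divided by the number of masked positions. The normalization by the mask count is an assumption, and masked_mse_sketch is a hypothetical name, not the library function.

```python
from typing import Sequence


def masked_mse_sketch(
    values: Sequence[float], target: Sequence[float], mask: Sequence[bool]
) -> float:
    # Only positions where the mask is set contribute to the loss.
    squared = [(v - t) ** 2 for v, t, m in zip(values, target, mask) if m]
    return sum(squared) / len(squared)


mse = masked_mse_sketch([1.0, 2.0, 10.0], [0.0, 4.0, 0.0], [True, True, False])
```

Here the third position has a large error but is masked out, so only the first two positions are averaged.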

scgpt.loss.masked_relative_error(input: Tensor, target: Tensor, mask: LongTensor) → Tensor: Compute the masked relative error between input and target.
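One way to read "masked relative error" is the mean of |input - target| / target over the masked positions. The sketch below follows that reading; the small eps added to the denominator to avoid division by zero, and the function name itself, are assumptions for illustration.

```python
from typing import Sequence


def masked_relative_error_sketch(
    values: Sequence[float],
    target: Sequence[float],
    mask: Sequence[bool],
    eps: float = 1e-6,  # assumed guard against zero targets
) -> float:
    errors = [
        abs(v - t) / (t + eps)
        for v, t, m in zip(values, target, mask)
        if m  # skip positions outside the mask
    ]
    return sum(errors) / len(errors)


rel_err = masked_relative_error_sketch(
    [2.0, 3.0, 100.0], [4.0, 2.0, 1.0], [True, True, False]
)
```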

scgpt.preprocess

class scgpt.preprocess.Preprocessor(use_key: Optional[str] = None, filter_gene_by_counts: Union[int, bool] = False, filter_cell_by_counts: Union[int, bool] = False, normalize_total: Union[float, bool] = 10000.0, result_normed_key: Optional[str] = 'X_normed', log1p: bool = False, result_log1p_key: str = 'X_log1p', subset_hvg: Union[int, bool] = False, hvg_use_key: Optional[str] = None, hvg_flavor: str = 'seurat_v3', binning: Optional[int] = None, result_binned_key: str = 'X_binned')

Bases: object

Prepares data for the training, validation, and test splits. Normalizes raw expression values and applies binning or other transforms to produce the preset model input format.
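To make the normalization step concrete, the sketch below processes a single cell's counts with plain Python. It assumes that normalize_total rescales each cell so its counts sum to the given target (10000.0 is the default in the signature above) and that log1p, when enabled, is applied afterwards. Both helper names are hypothetical, and the actual class operates on AnnData objects rather than lists.

```python
import math
from typing import List


def normalize_total_sketch(counts: List[float], target_sum: float = 10000.0) -> List[float]:
    total = sum(counts)
    if total == 0:
        return list(counts)  # an all-zero cell is left unchanged
    # Rescale so that the cell's counts sum to target_sum.
    return [c * target_sum / total for c in counts]


def log1p_sketch(values: List[float]) -> List[float]:
    # log(1 + x) keeps zeros at zero and compresses large values.
    return [math.log1p(v) for v in values]


cell = [1.0, 3.0, 0.0, 6.0]
normed = normalize_total_sketch(cell)
logged = log1p_sketch(normed)
```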

check_logged(adata: AnnData, obs_key: Optional[str] = None) → bool

Check if the data is already log1p transformed.

Args:

adata (AnnData) – The AnnData object to preprocess.
obs_key (str, optional) – The key of AnnData.obs to use for batch information. This arg is used in the highly variable gene selection step.
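The description does not say how the check is performed. One plausible heuristic, shown here purely as an assumption, is to inspect the value range: log1p-transformed data is non-negative, rarely exceeds a moderate ceiling, and usually has a smallest nonzero value below 1. The function name and the threshold of 30.0 are hypothetical.

```python
from typing import Iterable


def looks_logged_sketch(values: Iterable[float], max_logged: float = 30.0) -> bool:
    values = list(values)
    # Very large or negative values are inconsistent with log1p output.
    if max(values) > max_logged or min(values) < 0:
        return False
    positives = [v for v in values if v > 0]
    # Raw counts have a smallest nonzero value of at least 1,
    # whereas log1p(1) is about 0.69.
    if positives and min(positives) >= 1:
        return False
    return True


raw_counts = [0.0, 1.0, 5.0, 120.0]
small_counts = [0.0, 1.0, 2.0, 5.0]
logged_counts = [0.0, 0.6931, 1.0986, 1.7918]
```

A heuristic of this kind can misclassify some inputs, such as normalized data whose smallest nonzero value happens to be 1 or more after the transform.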

scgpt.preprocess.binning(row: Union[ndarray, Tensor], n_bins: int) → Union[ndarray, Tensor]: Bin the values of row into n_bins bins.
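The binning scheme is not specified above. The sketch below assumes an equal-frequency (quantile-style) scheme in which zeros stay in bin 0 and nonzero values are spread over bins 1 to n_bins - 1 by rank. It works on a plain list, and binning_sketch is a hypothetical name rather than the library function.

```python
import bisect
import math
from typing import List, Sequence


def binning_sketch(row: Sequence[float], n_bins: int) -> List[int]:
    nonzero = sorted(v for v in row if v > 0)
    if not nonzero:
        return [0] * len(row)
    binned: List[int] = []
    for v in row:
        if v <= 0:
            binned.append(0)  # assumption: zeros keep their own bin
            continue
        # Fraction of nonzero values that are <= v, in (0, 1].
        rank = bisect.bisect_right(nonzero, v) / len(nonzero)
        binned.append(max(1, math.ceil(rank * (n_bins - 1))))
    return binned


bins = binning_sketch([0.0, 1.0, 2.0, 3.0, 4.0], n_bins=3)
```

Equal values always receive the same bin, and larger values never receive a smaller bin than smaller ones.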