Description

Single-cell RNA sequencing (scRNA-seq) datasets often contain unwanted variation arising from technical differences in sample collection, protocol, sequencing depth, and experimental lab, as well as from biological factors. These nuisance factors, known as batch effects, are especially common in newer datasets that span multiple conditions and hundreds of donors. To correct for batch effects, integration methods such as single-cell variational inference (scVI) combine samples into a single, self-consistent representation for downstream analysis. In this thesis, we benchmark scVI's current performance on complex integration tasks involving datasets with more than 100 donors, evaluating its ability both to remove batch effects and to retain important biological information. We further propose adding a donor embedding to the model architecture and demonstrate that the embedding is effective for interpreting batch correction across confounding covariates. Finally, we assess scVI integration with respect to gene expression through a scoring protocol that measures the batch sensitivity of each gene.
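To make the idea of a per-gene batch sensitivity score concrete, the sketch below uses a generic stand-in: the fraction of each gene's variance explained by batch (e.g. donor) labels, computed as a one-way ANOVA R². This is an assumption for illustration only and is not the scoring protocol used in this thesis; the function name `batch_variance_fraction` and its inputs are hypothetical.

```python
import numpy as np


def batch_variance_fraction(expr, batches):
    """Hypothetical per-gene batch sensitivity score (one-way ANOVA R^2).

    expr: (n_cells, n_genes) array of normalized expression values.
    batches: length-n_cells array of batch labels (e.g. donor IDs).
    Returns a length-n_genes array in [0, 1]; higher means more of the
    gene's variance is explained by batch membership.
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    grand_mean = expr.mean(axis=0)
    # Total sum of squares per gene.
    total_ss = ((expr - grand_mean) ** 2).sum(axis=0)
    # Between-batch sum of squares per gene.
    between_ss = np.zeros(expr.shape[1])
    for b in np.unique(batches):
        group = expr[batches == b]
        between_ss += group.shape[0] * (group.mean(axis=0) - grand_mean) ** 2
    # Genes with zero total variance get a score of 0 instead of NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(total_ss > 0, between_ss / total_ss, 0.0)


if __name__ == "__main__":
    # Gene 0 differs only between batches; gene 1 varies only within batches.
    expr = np.array([[1.0, 5.0], [1.0, 3.0], [3.0, 5.0], [3.0, 3.0]])
    batches = np.array(["a", "a", "b", "b"])
    print(batch_variance_fraction(expr, batches))  # → [1. 0.]
```

A variance-explained score is used here only because it is simple and transparent; applied before and after integration, it would indicate which genes' expression is most strongly tied to batch.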
