13. Pseudotemporal ordering#
13.1. Motivation#
Single-cell sequencing assays provide high resolution measurements of biological tissues [Islam et al., 2011], [Hwang et al., 2018]. Consequently, such technologies can help decipher and understand cellular heterogeneity [Briggs et al., 2018], [Sikkema et al., 2022] and the dynamics of a biological process [Jardine et al., 2021], [He et al., 2022]. Corresponding studies include quantifying cellular fates as well as identifying genes driving the process. However, as cells are destroyed when sequenced in classical single-cell RNA sequencing (scRNA-seq) protocols, it is impossible to track their development and, for example, gene expression profile over time. Although recent technological advances allow recording the transcriptome sequentially [Chen et al., 2022], they are experimentally challenging and currently fail to scale to larger dataset. Consequently, the underlying dynamic process needs to be estimated from the measured snapshot data, instead.
Even though samples are, traditionally, taken from a single experimental time point, a multitude of cell types can be observed. This diversity stems from the asynchronous nature of biological processes. As such, a range of the developmental process can be observed. Reconstructing the developmental landscape is the goal of the field coined trajectory inference (TI). This task is achieved by ordering the observed cellular states according to the developmental process. States are aligned along the developmental direction by mapping discrete annotations to a continuous domain - the so-called pseudotime.
Pseudotimes rank cells relative to each other according to their respective stage in the developmental process. Less mature cells are assigned small, mature cells large values. Studying a bone marrow sample for example, hematopoietic stem cells are assigned a low, and erythroid cells a high pseudotime. The assignment is, in case of single-cell RNA sequencing data, based on the transcriptomic profile of a cell. Additionally, the construction usually requires the specification of an initial, or, equivalently, root cell where the overall process starts.
13.2. Pseudotime construction#
Pseudotime construction generally follows a common workflow: As a first step, the ultra high-dimensional single-cell data is projected onto a lower dimensional representation. This procedure is justified by the observation that dynamical processes progress on a low-dimensional manifold [Wagner et al., 2016]. In practice, pseudotime methods may rely on principal components (for example Palantir [Setty et al., 2019]) or diffusion components (for example diffusion pseudotime (DPT) [Haghverdi et al., 2016]). Following, pseudotimes are constructed based on one of the following principles.
Observations are first clustered and, following, connections between these clusters identified. The clusters can be ordered and, thereby, a pseudotime constructed. Henceforth, we will refer to this approach as the cluster approach. Classical cluster algorithms include \(k\)-means [Lloyd, 1982], [MacQueen, 1967], Leiden [Traag et al., 2019], or hierarchical clustering [Müllner, 2011]. Clusters may be connected based on similarity, or by constructing a minimum spanning tree (MST) [Pettie and Ramachandran, 2002].
The graph approach first finds connections between the lower dimensional representation of the observations. This procedure defines a graph based on which clusters, and thus an ordering, are defined. PAGA [Wolf et al., 2019], for example, partitions the graph into Leiden clusters and estimates connections between them. Intuitively, this approach preserves the global topology of the data while analyzing it at a lower resolution. Consequently, the computational efficiency is increased.
Manifold-learning based approaches proceed similar to the cluster approach. However, connections between clusters are defined by using principal curves or graphs to estimate the underlying trajectories. Principal curves find a one-dimensional curve that connects cellular observations in the higher dimensional space. A notable representation of this approach is Slingshot [Street et al., 2018].
Probabilistic frameworks assign transition probabilities to ordered cell-cell pairs. Each transition probability quantifies how likely the reference cell is the ancestor of the other cell. These probabilities define random processes that are used to define a pseudotime. DPT, for example, is defined as the difference between consecutive states of a random walk. Contrastingly, Palantir [Setty et al., 2019] models trajectories themselves as Markov chains. While both approaches rely on a probabilistic framework, they require a root cell to be specified. The pseudotime itself is computed with respect to this cell.
TI is a well-studied field providing a rich set of methods. To apply the appropriate method to analyze a single-cell dataset, the biological process itself needs to be understood first. This understanding especially includes the nature of the process, i.e., if it, for example, is linear, cyclic, or branching. Similarly, orthogonal processes within one and the same dataset limits the TI methods applicable. To help identifying appropriate tools, dynguidelines [Deconinck et al., 2021] provides an exhaustive overview of algorithms and their characteristics.
13.3. Down-stream tasks and outlook#
Even though TI and pseudotime can already provide valuable insight, they usually act as a stepping stone for more fine grained analysis. Identifying terminal states, for example, is a classical biological question that can be studied. Similarly, lineage bifurcation and genes driving fate decisions can be identified based on TI and pseudotime. Which question can answer and how the answer is found is usually method specific. Palantir, for example, identifies terminal states as absorbing states of its constructed Markov chain.
The success of trajectory inference is well documented and, consequently, many methods have been proposed. However, with the advances of sequencing technologies, new sources of information become available. ATAC-seq [Buenrostro et al., 2015], CITE-seq [Stoeckius et al., 2017], and DOGMA-seq [Mimitou et al., 2021], for example, measure additional modalities beyond the transcriptome. Lineage tracing [] and metabolic labeling [Erhard et al., 2019], [Battich et al., 2020], [Qiu et al., 2020], [Erhard et al., 2022] even provide the (likely) future state of a given cell. Consequently, future TI tools will be able to include more information to estimate trajectories and pseudotime more accurately and robustly, and allow answering novel questions. For example, RNA velocity [Manno et al., 2018], [Bergen et al., 2020], [Bergen et al., 2021] is one technique that uses unspliced and spliced mRNA to infer directed, dynamic information beyond classical, static snapshot data.
13.4. Inferring pseudotime for adult human bone marrow#
To show how a pseudotime can be constructed and different pseudotimes compared, we study a dataset of adult human bone marrow [Setty et al., 2019].
13.4.1. Environment setup#
from pathlib import Path
import scanpy as sc
13.4.2. General settings#
DATA_DIR = Path("../../data/")
DATA_DIR.mkdir(parents=True, exist_ok=True)
FILE_NAME = DATA_DIR / "bone_marrow.h5ad"
13.4.3. Data loading#
adata = sc.read(
filename=FILE_NAME,
backup_url="https://figshare.com/ndownloader/files/35826944",
)
adata
AnnData object with n_obs × n_vars = 5780 × 27876
obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
var: 'palantir'
uns: 'clusters_colors', 'palantir_branch_probs_cell_types'
obsm: 'MAGIC_imputed_data', 'X_tsne', 'palantir_branch_probs'
layers: 'spliced', 'unspliced'
To construct pseudotimes, the data must be preprocessed. Here, we filter out genes expressed in only a few number of cells (here, at least 20). Notably, the construction of the pseudotime later on is robust to the exact choice of the threshold. Following to this first gene filtering, the cell size is normalized, and counts log1p transformed to reduce the effect of outliers. As usual, we also identify and annotate highly variable genes. Finally, a nearest neighbor graph is constructed based on which we will define the pseudotime. The number of principle components is chosen based on the explained variance.
sc.pp.filter_genes(adata, min_counts=20)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)
sc.tl.pca(adata)
sc.pp.neighbors(adata, n_pcs=10)
The two-dimensional t-SNE representation colored by cell type annotations shows that the cell types cluster together well. Additionally, the developmental hierarchy is visible.
sc.pl.scatter(adata, basis="tsne", color="clusters")
13.4.4. Pseudotime construction#
To calculate diffusion pseudotime (DPT), first, the corresponding diffusion maps need to be calculated.
sc.tl.diffmap(adata)
The differentiation hierarchy in bone marrow is well understood. However, it is only know that the developmental process starts in the form of hematopoietic stem cells but not at which cell exactly in the corresponding cluster in our dataset. To identify a putative initial cell, we study the individual diffusion components. We identify the stem cell with the most extreme diffusion component in one dimension (in our case dimension 3).
# Setting root cell as described above
root_ixs = adata.obsm["X_diffmap"][:, 3].argmin()
sc.pl.scatter(
adata,
basis="diffmap",
color=["clusters"],
components=[2, 3],
)
adata.uns["iroot"] = root_ixs
sc.tl.dpt(adata)
Different pseudotime methods give different results. Sometimes, one pseudotime captures the underlying developmental processes more accurately. Here, we compare the just computed DPT with the pre-computed Palantir pseudotime (see here for the corresponding tutorial). One option to compare different pseudotimes is by coloring the low dimensional embedding of the data (here, t-SNE). Here, DPT is extremely high in the cluster of CLPs compared to all other cell types. Contrastingly, the Palantir pseudotime increases continuously with developmental maturity.
sc.pl.scatter(
adata,
basis="tsne",
color=["dpt_pseudotime", "palantir_pseudotime"],
color_map="gnuplot2",
)
Instead of coloring the lower dimensional representation of the data, we can study the distribution of pseudotime values assigned to each cell type cluster. This representation again shows that the CLP cluster forms an outlier in case of DPT. Additionally, clusters such as HSC_1 and HSC_2 include several cells with increased pseudotime. These inflated values contrast our prior biological knowledge that these clusters form the beginning of the developmental process.
sc.pl.violin(
adata,
keys=["dpt_pseudotime", "palantir_pseudotime"],
groupby="clusters",
rotation=45,
order=[
"HSC_1",
"HSC_2",
"Precursors",
"Ery_1",
"Ery_2",
"Mono_1",
"Mono_2",
"CLP",
"DCs",
"Mega",
],
)
Considering these observations and prior knowledge about the development in bone marrow, we would conclude to continue working with the Palantir pseudotime.
13.5. Key takeaways#
TI methods require the start of the biological process to be known (approximately).
The nature of the biological process defines which TI algorithms can be used. dynguidelines helps selecting the appropriate TI method.
13.6. References#
Nico Battich, Joep Beumer, Buys de Barbanson, Lenno Krenning, Chloé S. Baron, Marvin E. Tanenbaum, Hans Clevers, and Alexander van Oudenaarden. Sequencing metabolically labeled transcripts in single cells reveals mRNA turnover strategies. Science, 367(6482):1151–1156, March 2020. URL: https://doi.org/10.1126/science.aax3072, doi:10.1126/science.aax3072.
Volker Bergen, Marius Lange, Stefan Peidli, F. Alexander Wolf, and Fabian J. Theis. Generalizing RNA velocity to transient cell states through dynamical modeling. Nature Biotechnology, 38(12):1408–1414, August 2020. URL: https://doi.org/10.1038/s41587-020-0591-3, doi:10.1038/s41587-020-0591-3.
Volker Bergen, Ruslan A Soldatov, Peter V Kharchenko, and Fabian J Theis. RNA velocity—current challenges and future perspectives. Molecular Systems Biology, August 2021. URL: https://doi.org/10.15252/msb.202110282, doi:10.15252/msb.202110282.
James A. Briggs, Caleb Weinreb, Daniel E. Wagner, Sean Megason, Leonid Peshkin, Marc W. Kirschner, and Allon M. Klein. The dynamics of gene expression in vertebrate embryogenesis at single-cell resolution. Science, June 2018. URL: https://doi.org/10.1126/science.aar5780, doi:10.1126/science.aar5780.
Jason D. Buenrostro, Beijing Wu, Ulrike M. Litzenburger, Dave Ruff, Michael L. Gonzales, Michael P. Snyder, Howard Y. Chang, and William J. Greenleaf. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature, 523(7561):486–490, June 2015. URL: https://doi.org/10.1038/nature14590, doi:10.1038/nature14590.
Wanze Chen, Orane Guillaume-Gentil, Pernille Yde Rainer, Christoph G. Gäbelein, Wouter Saelens, Vincent Gardeux, Amanda Klaeger, Riccardo Dainese, Magda Zachara, Tomaso Zambelli, Julia A. Vorholt, and Bart Deplancke. Live-seq enables temporal transcriptomic recording of single cells. Nature, 608(7924):733–740, August 2022. URL: https://doi.org/10.1038/s41586-022-05046-9, doi:10.1038/s41586-022-05046-9.
Louise Deconinck, Robrecht Cannoodt, Wouter Saelens, Bart Deplancke, and Yvan Saeys. Recent advances in trajectory inference from single-cell omics data. Current Opinion in Systems Biology, 27:100344, September 2021. URL: https://doi.org/10.1016/j.coisb.2021.05.005, doi:10.1016/j.coisb.2021.05.005.
Florian Erhard, Marisa A. P. Baptista, Tobias Krammer, Thomas Hennig, Marius Lange, Panagiota Arampatzi, Christopher S. Jürges, Fabian J. Theis, Antoine-Emmanuel Saliba, and Lars Dölken. scSLAM-seq reveals core features of transcription dynamics in single cells. Nature, 571(7765):419–423, July 2019. URL: https://doi.org/10.1038/s41586-019-1369-y, doi:10.1038/s41586-019-1369-y.
Florian Erhard, Antoine-Emmanuel Saliba, Alexandra Lusser, Christophe Toussaint, Thomas Hennig, Bhupesh K. Prusty, Daniel Kirschenbaum, Kathleen Abadie, Eric A. Miska, Caroline C. Friedel, Ido Amit, Ronald Micura, and Lars Dölken. Time-resolved single-cell RNA-seq using metabolic RNA labelling. Nature Reviews Methods Primers, September 2022. URL: https://doi.org/10.1038/s43586-022-00157-z, doi:10.1038/s43586-022-00157-z.
Laleh Haghverdi, Maren Büttner, F Alexander Wolf, Florian Buettner, and Fabian J Theis. Diffusion pseudotime robustly reconstructs lineage branching. Nature Methods, 13(10):845–848, August 2016. URL: https://doi.org/10.1038/nmeth.3971, doi:10.1038/nmeth.3971.
Peng He, Kyungtae Lim, Dawei Sun, Jan Patrick Pett, Quitz Jeng, Krzysztof Polanski, Ziqi Dong, Liam Bolt, Laura Richardson, Lira Mamanova, Monika Dabrowska, Anna Wilbrey-Clark, Elo Madissoon, Zewen Kelvin Tuong, Emma Dann, Chenqu Suo, Isaac Goh, Masahiro Yoshida, Marko Z Nikolić, Sam M Janes, Xiaoling He, Roger A Barker, Sarah A Teichmann, John C. Marioni, Kerstin B Meyer, and Emma L Rawlins. A human fetal lung cell atlas uncovers proximal-distal gradients of differentiation and key regulators of epithelial fates. bioRxiv, 2022. URL: https://www.biorxiv.org/content/early/2022/09/30/2022.01.11.474933, arXiv:https://www.biorxiv.org/content/early/2022/09/30/2022.01.11.474933.full.pdf, doi:10.1101/2022.01.11.474933.
Byungjin Hwang, Ji Hyun Lee, and Duhee Bang. Single-cell RNA sequencing technologies and bioinformatics pipelines. Experimental &$\mathsemicolon $ Molecular Medicine, 50(8):1–14, August 2018. URL: https://doi.org/10.1038/s12276-018-0071-8, doi:10.1038/s12276-018-0071-8.
Saiful Islam, Una Kjällquist, Annalena Moliner, Pawel Zajac, Jian-Bing Fan, Peter Lönnerberg, and Sten Linnarsson. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research, 21(7):1160–1167, May 2011. URL: https://doi.org/10.1101/gr.110882.110, doi:10.1101/gr.110882.110.
Laura Jardine, Simone Webb, Issac Goh, Mariana Quiroga Londoño, Gary Reynolds, Michael Mather, Bayanne Olabi, Emily Stephenson, Rachel A. Botting, Dave Horsfall, Justin Engelbert, Daniel Maunder, Nicole Mende, Caitlin Murnane, Emma Dann, Jim McGrath, Hamish King, Iwo Kucinski, Rachel Queen, Christopher D. Carey, Caroline Shrubsole, Elizabeth Poyner, Meghan Acres, Claire Jones, Thomas Ness, Rowen Coulthard, Natalina Elliott, Sorcha O'Byrne, Myriam L. R. Haltalli, John E. Lawrence, Steven Lisgo, Petra Balogh, Kerstin B. Meyer, Elena Prigmore, Kirsty Ambridge, Mika Sarkin Jain, Mirjana Efremova, Keir Pickard, Thomas Creasey, Jaume Bacardit, Deborah Henderson, Jonathan Coxhead, Andrew Filby, Rafiqul Hussain, David Dixon, David McDonald, Dorin-Mirel Popescu, Monika S. Kowalczyk, Bo Li, Orr Ashenberg, Marcin Tabaka, Danielle Dionne, Timothy L. Tickle, Michal Slyper, Orit Rozenblatt-Rosen, Aviv Regev, Sam Behjati, Elisa Laurenti, Nicola K. Wilson, Anindita Roy, Berthold Göttgens, Irene Roberts, Sarah A. Teichmann, and Muzlifah Haniffa. Blood and immune development in human fetal bone marrow and down syndrome. Nature, 598(7880):327–331, September 2021. URL: https://doi.org/10.1038/s41586-021-03929-x, doi:10.1038/s41586-021-03929-x.
S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, March 1982. URL: https://doi.org/10.1109/tit.1982.1056489, doi:10.1109/tit.1982.1056489.
Daniel Müllner. Modern hierarchical, agglomerative clustering algorithms. 2011. URL: https://arxiv.org/abs/1109.2378, doi:10.48550/ARXIV.1109.2378.
J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In L. M. Le Cam and J. Neyman, editors, Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, 281–297. University of California Press, 1967.
Gioele La Manno, Ruslan Soldatov, Amit Zeisel, Emelie Braun, Hannah Hochgerner, Viktor Petukhov, Katja Lidschreiber, Maria E. Kastriti, Peter Lönnerberg, Alessandro Furlan, Jean Fan, Lars E. Borm, Zehua Liu, David van Bruggen, Jimin Guo, Xiaoling He, Roger Barker, Erik Sundström, Gonçalo Castelo-Branco, Patrick Cramer, Igor Adameyko, Sten Linnarsson, and Peter V. Kharchenko. RNA velocity of single cells. Nature, 560(7719):494–498, August 2018. URL: https://doi.org/10.1038/s41586-018-0414-6, doi:10.1038/s41586-018-0414-6.
Eleni P. Mimitou, Caleb A. Lareau, Kelvin Y. Chen, Andre L. Zorzetto-Fernandes, Yuhan Hao, Yusuke Takeshima, Wendy Luo, Tse-Shun Huang, Bertrand Z. Yeung, Efthymia Papalexi, Pratiksha I. Thakore, Tatsuya Kibayashi, James Badger Wing, Mayu Hata, Rahul Satija, Kristopher L. Nazor, Shimon Sakaguchi, Leif S. Ludwig, Vijay G. Sankaran, Aviv Regev, and Peter Smibert. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nature Biotechnology, 39(10):1246–1258, June 2021. URL: https://doi.org/10.1038/s41587-021-00927-2, doi:10.1038/s41587-021-00927-2.
Seth Pettie and Vijaya Ramachandran. An optimal minimum spanning tree algorithm. Journal of the ACM, 49(1):16–34, January 2002. URL: https://doi.org/10.1145/505241.505243, doi:10.1145/505241.505243.
Qi Qiu, Peng Hu, Xiaojie Qiu, Kiya W. Govek, Pablo G. Cámara, and Hao Wu. Massively parallel and time-resolved RNA sequencing in single cells with scNT-seq. Nature Methods, 17(10):991–1001, August 2020. URL: https://doi.org/10.1038/s41592-020-0935-4, doi:10.1038/s41592-020-0935-4.
Manu Setty, Vaidotas Kiseliovas, Jacob Levine, Adam Gayoso, Linas Mazutis, and Dana Pe'er. Characterization of cell fate probabilities in single-cell data with palantir. Nature Biotechnology, 37(4):451–460, March 2019. URL: https://doi.org/10.1038/s41587-019-0068-4, doi:10.1038/s41587-019-0068-4.
Lisa Sikkema, Daniel C Strobl, Luke Zappia, Elo Madissoon, Nikolay S Markov, Laure-Emmanuelle Zaragosi, Meshal Ansari, Marie-Jeanne Arguel, Leonie Apperloo, Christophe Becavin, Marijn Berg, Evgeny Chichelnitskiy, Mei-I Chung, Antoine Collin, Aurore C A Gay, Baharak Hooshiar Kashani, Manu Jain, Theodore Kapellos, Tessa M Kole, Christoph H Mayr, Von Michael Papen, Lance Peter, Ciro Ramirez-Suastegui, Janine Schniering, Chase J Taylor, Thomas Walzthoeni, Chuan Xu, Linh T Bui, Carlo de Donno, Leander Dony, Minzhe Guo, Austin J Gutierrez, Lukas Heumos, Ni Huang, Ignacio L Ibarra, Nathan D Jackson, Preetish Kadur Lakshminarasimha Murthy, Mohammad Lotfollahi, Tracy Tabib, Carlos Talavera-Lopez, Kyle J Travaglini, Anna Wilbrey-Clark, Kaylee B Worlock, Masahiro Yoshida, Tushar J Desai, Oliver Eickelberg, Christine Falk, Naftali Kaminski, Mark A Krasnow, Robert Lafyatis, Marko Z Nikoli, Joseph E Powell, Jayaraj Rajagopal, Orit Rozenblatt-Rosen, Max A Seibold, Dean Sheppard, Douglas P Shepherd, Sarah A Teichmann, Alexander M Tsankov, Jeffrey Whitsett, Yan Xu, Nicholas E Banovich, Pascal Barbry, Thu E Duong, Kerstin B Meyer, Jonathan A Kropski, Dana Pe'er, Herbert B Schiller, Purushothama Rao Tata, Joachim L Schultze, Alexander V Misharin, Martijn C Nawijn, Malte D Luecken, and Fabian J Theis. An integrated cell atlas of the human lung in health and disease. bioRxiv, pages 2022.03.10.483747, March 2022. doi:10.1101/2022.03.10.483747.
Marlon Stoeckius, Christoph Hafemeister, William Stephenson, Brian Houck-Loomis, Pratip K Chattopadhyay, Harold Swerdlow, Rahul Satija, and Peter Smibert. Simultaneous epitope and transcriptome measurement in single cells. Nature Methods, 14(9):865–868, July 2017. URL: https://doi.org/10.1038/nmeth.4380, doi:10.1038/nmeth.4380.
Kelly Street, Davide Risso, Russell B. Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics, June 2018. URL: https://doi.org/10.1186/s12864-018-4772-0, doi:10.1186/s12864-018-4772-0.
V. A. Traag, L. Waltman, and N. J. van Eck. From louvain to leiden: guaranteeing well-connected communities. Scientific Reports, March 2019. URL: https://doi.org/10.1038/s41598-019-41695-z, doi:10.1038/s41598-019-41695-z.
Allon Wagner, Aviv Regev, and Nir Yosef. Revealing the vectors of cellular identity with single-cell genomics. Nature Biotechnology, 34(11):1145–1160, November 2016. URL: https://doi.org/10.1038/nbt.3711, doi:10.1038/nbt.3711.
F. Alexander Wolf, Fiona K. Hamey, Mireya Plass, Jordi Solana, Joakim S. Dahlin, Berthold Göttgens, Nikolaus Rajewsky, Lukas Simon, and Fabian J. Theis. Paga: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biology, 20(1):59, Mar 2019. URL: https://doi.org/10.1186/s13059-019-1663-x, doi:10.1186/s13059-019-1663-x.
13.7. Contributors#
We gratefully acknowledge the contributions of:
13.7.2. Reviewers#
Lukas Heumos