The most recent breakthrough in genomics is the ability to determine the molecular profiles of single cells in a high-throughput fashion. Data from hundreds of thousands, even millions, of cells is becoming available, measuring the genomes, transcriptomes and proteomes of individual cells. These advances are generating a wealth of data, allowing researchers to capture the hidden heterogeneity in each sample. For example: which cells are present in blood, and what do their transcriptional profiles look like? Or: which cells make up the hippocampus of the brain, and what tasks do they perform? In cancer, this offers the possibility to capture the heterogeneity of different cell clones and to follow how they evolve over the course of the disease to escape targeted therapies. Clearly, the sheer amount of data stresses the limits of current analysis techniques. Moreover, we also observe stochastic behavior between similar cells: one cell may contain a few RNA molecules of a specific gene, whereas a similar cell contains none. This calls for new analysis methods that can deal with such stochasticity. In LCBC, we develop new algorithms that analyze single-cell data to answer biological questions.
Processes within the living cell, tissue and body are regulated on various molecular levels, many of which can be interrogated in detail using a specific so-called ‘omics’ platform. Perhaps the best-known class of ‘omics’ platforms is the ‘genomics’ approaches, in which variations in the DNA or RNA sequence can be measured at nucleotide resolution using next-generation sequencing technology. Other ‘omics’ approaches include, for instance, ‘epigenomics’ (platforms that employ next-generation sequencing technology to measure genome-wide epigenetic modifications) and ‘metabolomics’ (platforms that employ nuclear magnetic resonance or mass spectrometry to quantify metabolite levels). Each of these omics platforms captures a unique aspect of the molecular processes that may play a role in the investigated traits or diseases. By finding smart ways of jointly analysing these data sources, we may gain a more comprehensive view of the determinants of health and disease.
Recent developments in data acquisition, especially for biomedical applications, lead to datasets of ever-increasing size and complexity. Discovering the unexpected in such data requires exploration beyond purely computational analytics. Our goal is to combine computational approaches with interactive visualization, enabling efficient exploration of the data and providing tools for sound analytical reasoning about its contents, driven by visual inspection. A large amount of biological data is so-called high-dimensional data, i.e., each data point is described by a large number of values, ranging from a handful of dimensions in flow and tensor data, to tens of proteins in proteomics data, to thousands of genes in genomics data. The analysis of such high-dimensional data is the main focus of our visualization and visual analytics research, where we develop specialized tools that combine dimensionality reduction with clustering in interactive, integrated approaches.
Cells develop over time, interact with each other, and perform different functions in different tissues. Consequently, when studying life, cells need to be recorded and modelled across time and space. This realization has prompted large international efforts to generate atlases of genomic data, as well as the development of new molecular measurement technologies that are aware of space and time. For example, The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) collect data on cancers in different tissues. The BrainSpan atlas collected RNA expression data across different brain regions, from fetal to adult brains. The Genotype-Tissue Expression (GTEx) project collects data on a wide range of tissues throughout the body, and the Human Cell Atlas (HCA) goes a step further by collecting single-cell data. Novel technologies can even measure at subcellular resolution, such as imaging CyTOF for measuring protein levels, or multiplexed single-molecule FISH (smFISH) for measuring RNA across tissue slices.
LCBC uses such spatio-temporal data to gain new biological insights. For example, based on the BrainSpan data, we characterized how the expression of autism-related genes changes over development, with an important shift at birth. We have also clustered TCGA data to reveal overlap between cancer types as well as new subtypes. In addition, we have shown that different isoforms of the DMD gene are active in different tissues, and explained possible cognitive effects of a malfunctioning DMD gene. To gain these insights, LCBC develops new computational analyses that deal with the different data types and modelling questions. Examples include lineage-tracing modelling from single-cell data in heterogeneous tumours, and imaging-genetics modelling, in which we related genetic variants observed in Alzheimer's patients to physiological effects observed in MRI scans of the brain within a single structural equation model.