84 episodes

Die Universitätsbibliothek (UB) verfügt über ein umfangreiches Archiv an elektronischen Medien, das von Volltextsammlungen über Zeitungsarchive, Wörterbücher und Enzyklopädien bis hin zu ausführlichen Bibliographien und mehr als 1000 Datenbanken reicht. Auf iTunes U stellt die UB unter anderem eine Auswahl an Dissertationen der Doktorandinnen und Doktoranden an der LMU bereit. (Dies ist der 2. von 2 Teilen der Sammlung 'Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU'.)

Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02 Ludwig-Maximilians-Universität München

- Education

- 29 APR 2016
Network-based analysis of gene expression data

Network-based analysis of gene expression data

The methods of molecular biology for the quantitative measurement of gene
expression have undergone a rapid development in the past two decades.
High-throughput assays with the microarray and RNA-seq technology now enable whole-genome studies in which several thousands of genes can be
measured at a time. However, this has also imposed serious challenges on data storage and analysis, which are subject of the young, but rapidly developing field of computational biology.
To explain observations made on such a large scale requires suitable and accordingly scaled models of gene regulation. Detailed models, as
available for single genes, need to be extended and assembled in larger networks of regulatory interactions between genes and gene products.
Incorporation of such networks into methods for data analysis is crucial to identify molecular mechanisms that are drivers of the observed expression. As methods for this purpose emerge in parallel to each other and without knowing the standard of truth, results need to be critically checked in a competitive setup and in the context of the available rich literature corpus.
This work is centered on and contributes to the following subjects, each of which represents important and distinct research topics in the field of computational biology: (i) construction of realistic gene regulatory network models; (ii) detection of subnetworks that are significantly
altered in the data under investigation; and (iii) systematic biological interpretation of detected subnetworks.
For the construction of regulatory networks, I review existing methods with a focus on curation and inference approaches. I first describe how
literature curation can be used to construct a regulatory network for a specific process, using the well-studied diauxic shift in yeast as an
example. In particular, I address the question how a detailed understanding, as available for the regulation of single genes, can be
scaled-up to the level of larger systems.
I subsequently inspect methods for large-scale network inference showing that they are significantly skewed towards master regulators.
A recalibration strategy is introduced and applied, yielding an improved genome-wide regulatory network for yeast.
To detect significantly altered subnetworks, I introduce GGEA as a method for network-based enrichment analysis. The key idea is to score regulatory interactions within functional gene sets for consistency with the observed
expression. Compared to other recently published methods, GGEA yields results that consistently and coherently align expression changes with
known regulation types and that are thus easier to explain. I also suggest and discuss several significant enhancements to the original method that are improving its applicability, outcome and runtime.
For the systematic detection and interpretation of subnetworks, I have developed the EnrichmentBrowser software package. It implements several state-of-the-art methods besides GGEA, and allows to combine and explore results across methods. As part of the Bioconductor repository, the package provides a unified access to the different methods and, thus, greatly simplifies the usage for biologists. Extensions to this framework, that support automating of biological interpretation routines, are also presented.
In conclusion, this work contributes substantially to the research field of network-based analysis of gene expression data with respect to regulatory network construction, subnetwork detection, and their biological interpretation. This also includes recent developments as well as areas of ongoing research, which are discussed in the context of
current and future questions arising from the new generation of genomic data.
- 22 APR 2016
Context-based RNA-seq mapping

Context-based RNA-seq mapping

In recent years, the sequencing of RNA (RNA-seq) using next generation sequencing (NGS) technology has become a powerful tool for analyzing the transcriptomic state of a cell. Modern NGS platforms allow for performing RNA-seq experiments in a few days, resulting in millions of short sequencing reads. A crucial step in analyzing RNA-seq data generally is determining the transcriptomic origin of the sequencing reads (= read mapping). In principal, read mapping is a sequence alignment problem, in which the short sequencing reads (30 - 500 nucleotides) are aligned to much larger reference sequences such as the human genome (3 billion nucleotides).
In this thesis, we present ContextMap, an RNA-seq mapping approach that evaluates the context of the sequencing reads for determining the most likely origin of every read. The context of a sequencing read is defined by all other reads aligned to the same genomic region. The ContextMap project started with a proof of concept study, in which we showed that our approach is able to improve already existing read mapping results provided by other mapping programs. Subsequently, we developed a standalone version of ContextMap. This implementation no longer relied on mapping results of other programs, but determined initial alignments itself using a modification of the Bowtie short read alignment program. However, the original ContextMap implementation had several drawbacks. In particular, it was not able to predict reads spanning over more than two exons and to detect insertions or deletions (indels). Furthermore, ContextMap depended on a modification of a specific Bowtie version. Thus, it could neither benefit of Bowtie updates nor of novel developments (e.g. improved running times) in the area of short read alignment software.
For addressing these problems, we developed ContextMap 2, an extension of the original ContextMap algorithm. The key features of ContextMap 2 are the context-based resolution of ambiguous read alignments and the accurate detection of reads crossing an arbitrary number of exon-exon junctions or containing indels. Furthermore, a plug-in interface is provided that allows for the easy integration of alternative short read alignment programs (e.g. Bowtie 2 or BWA) into the mapping workflow. The performance of ContextMap 2 was evaluated on real-life as well as synthetic data and compared to other state-of-the-art mapping programs. We found that ContextMap 2 had very low rates of misplaced reads and incorrectly predicted junctions or indels. Additionally, recall values were as high as for the top competing methods. Moreover, the runtime of ContextMap 2 was at least two fold lower than for the best competitors.
In addition to the mapping of sequencing reads to a single reference, the ContextMap approach allows the investigation of several potential read sources (e.g. the human host and infecting pathogens) in parallel. Thus, ContextMap can be applied to mine for infections or contaminations or to map data from meta-transcriptomic studies. Furthermore, we developed methods based on mapping-derived statistics that allow to assess confidence of mappings to identified species and to detect false positive hits. ContextMap was evaluated on three real-life data sets and results were compared to metagenomics tools. Here, we showed that ContextMap can successfully identify the species contained in a sample. Moreover, in contrast to most other metagenomics approaches, ContextMap also provides read mapping results to individual species. As a consequence, read mapping results determined by ContextMap can be used to study the gene expression of all species contained in a sample at the same time. Thus, ContextMap might be applied in clinical studies, in which the influence of infecting agents on host organisms is investigated.
The methods presented in this thesis allow for an accurate and fast mapping of RNA-seq data. As the amount of available sequencing data increases constantl
- 8 APR 2016
Computing hybridization networks using agreement forests

Computing hybridization networks using agreement forests

Rooted phylogenetic trees are widely used in biology to represent the evolutionary history of certain species. Usually, such a tree is a simple binary tree only containing internal nodes of in-degree one and out-degree two representing specific speciation events. In applied phylogenetics, however, trees can contain nodes of out-degree larger than two because, often, in order to resolve some orderings of speciation events, there is only insufficient information available and the common way to model this uncertainty is to use nonbinary nodes (i.e., nodes of out-degree of at least three), also denoted as polytomies.
Moreover, in addition to such speciation events, there exist certain biological events that cannot be modeled by a tree and, thus, require the more general concept of rooted phylogenetic networks or, more specifically, of hybridization networks. Examples for such reticulate events are horizontal gene transfer, hybridization, and recombination.
Nevertheless, in order to construct hybridization networks, the less general concept of a phylogenetic tree can still be used as building block. More precisely, often, in a first step, phylogenetic trees for a set of species, each based on a distinctive orthologous gene, are constructed. In a second step, specific sets containing common subtrees of those trees, known as maximum acyclic agreement forests, are calculated, which are then glued together to a single hybridization network. In such a network, hybridization nodes (i.e., nodes of in-degree larger than or equal to two) can exist representing potential reticulate events of the underlying evolutionary history. As such events are considered as rare phenomena, from a biological point of view, especially those networks representing a minimum number of reticulate events, which is denoted as hybridization number, are of high interest.
Consequently, in a mathematical aspect, the problem of calculating hybridization networks can be briefly described as follows. Given a set T of rooted phylogenetic trees sharing the same set of taxa, compute a hybridization network N displaying T with minimum hybridization number. In this context, we say that such a network N displays a phylogenetic tree T, if we can obtain T from N by removing as well as contracting some of its nodes and edges. Unfortunately, this is a computational hard problem (i.e., it is NP-hard), even for the simplest case given just two binary input trees.
In this thesis, we present several methods tackling this NP-hard problem. Our first approach describes how to compute a representative set of minimum hybridization networks for two binary input trees. For that purpose, our approach implements the first non-naive algorithm - called allMAAFs - calculating all maximum acyclic agreement forests for two rooted binary phylogenetic trees on the same set of taxa. In a subsequent step, in order to maximize the efficiency of the algorithm allMAAFs, we have developed additionally several modifications each reducing the number of computational steps and, thus, significantly improving its practical runtime.
Our second approach is an extension of our first approach making the underlying algorithm accessible to more than two binary input trees. For this purpose, our approach implements the algorithm allHNetworks being the first algorithm calculating all relevant hybridization networks displaying a set of rooted binary phylogenetic trees on the same set of taxa, which is a preferable feature when studying hybridization events.
Lastly, we have developed a generalization of our second approach that can now deal with multiple nonbinary input trees. For that purpose, our approach implements the first non-naive algorithm - called allMulMAAFs - calculating a relevant set of nonbinary maximum acyclic agreement forests for two rooted (nonbinary) phylogenetic trees on the same set of taxa.
Each of the algorithms above is integrated into our user friendly Java-based software
- 3 MAR 2016
Exploiting autobiographical memory for fallback authentication on smartphones

Exploiting autobiographical memory for fallback authentication on smartphones

Smartphones have advanced from simple communication devices to multipurpose devices that capture almost every single moment in our daily lives and thus contain sensitive data like photos or contact information. In order to protect this data, users can choose from a variety of authentication schemes. However, what happens if one of these schemes fails, for example, when users are not able to provide the correct password within a limited number of attempts? So far, situations like this have been neglected by the usable security and privacy community that mainly focuses on primary authentication schemes. But fallback authentication is comparably important to enable users to regain access to their devices (and data) in case of lockouts. In theory, any scheme for primary authentication on smartphones could also be used as fallback solution. In practice, fallback authentication happens less frequently and imposes different requirements and challenges on its design.
The aim of this work is to understand and address these challenges. We investigate the oc- currences of fallback authentication on smartphones in real life in order to grasp the charac- teristics that fallback authentication conveys. We also get deeper insights into the difficulties that users have to cope with during lockout situations. In combination with the knowledge from previous research, these insights are valuable to provide a detailed definition of fall- back authentication that has been missing so far. The definition covers usability and security characteristics and depicts the differences to primary authentication.
Furthermore, we explore the potential of autobiographical memory, a part of the human memory that relates to personal experiences of the past, for the design of alternative fall- back schemes to overcome the well-known memorability issues of current solutions. We present the design and evaluation of two static approaches that are based on the memory of locations and special drawings. We also cover three dynamic approaches that relate to re- cent smartphone activities, icon arrangements and installed apps. This series of work allows us to analyze the suitability of different types of memories for fallback authentication. It also helps us to extend the definition of fallback authentication by identifying factors that influence the quality of fallback schemes.
The main contributions of this thesis can be summarized as follows: First, it gives essen- tial insights into the relevance, frequency and problems of fallback authentication on smart- phones in real life. Second, it provides a clear definition of fallback authentication to classify authentication schemes based on usability and security properties. Third, it shows example implementations and evaluations of static and dynamic fallback schemes that are based on different autobiographical memories. Finally, it discusses the advantages and disadvantages of these memories and gives recommendations for their design, evaluation and analysis in the context of fallback authentication.
- 29 FEB 2016
Efficient data mining algorithms for time series and complex medical data

Efficient data mining algorithms for time series and complex medical data
- 5 FEB 2016
Cross-species network and transcript transfer

Cross-species network and transcript transfer

Metabolic processes, signal transduction, gene regulation, as well as gene and protein expression are largely controlled by biological networks. High-throughput experiments allow the measurement of a wide range of cellular states and interactions. However, networks are often not known in detail for specific biological systems and conditions. Gene and protein annotations are often transferred from model organisms to the species of interest. Therefore, the question arises whether biological networks can be transferred between species or whether they are specific for individual contexts. In this thesis, the following aspects are investigated: (i) the conservation and (ii) the cross-species transfer of eukaryotic protein-interaction and gene regulatory (transcription factor- target) networks, as well as (iii) the conservation of alternatively spliced variants.
In the simplest case, interactions can be transferred between species, based solely on the sequence similarity of the orthologous genes. However, such a transfer often results either in the transfer of only a few interactions (medium/high sequence similarity threshold) or in the transfer of many speculative interactions (low sequence similarity threshold). Thus, advanced network transfer approaches also consider the annotations of orthologous genes involved in the interaction transfer, as well as features derived from the network structure, in order to enable a reliable interaction transfer, even between phylogenetically very distant species. In this work, such an approach for the transfer of protein interactions is presented (COIN). COIN uses a sophisticated machine-learning model in order to label transferred interactions as either correctly transferred (conserved) or as incorrectly transferred (not conserved).
The comparison and the cross-species transfer of regulatory networks is more difficult than the transfer of protein interaction networks, as a huge fraction of the known regulations is only described in the (not machine-readable) scientific literature. In addition, compared to protein interactions, only a few conserved regulations are known, and regulatory elements appear to be strongly context-specific. In this work, the cross-species analysis of regulatory interaction networks is enabled with software tools and databases for global (ConReg) and thousands of context-specific (CroCo) regulatory interactions that are derived and integrated from the scientific literature, binding site predictions and experimental data.
Genes and their protein products are the main players in biological networks. However, to date, the aspect is neglected that a gene can encode different proteins. These alternative proteins can differ strongly from each other with respect to their molecular structure, function and their role in networks. The identification of conserved and species-specific splice variants and the integration of variants in network models will allow a more complete cross-species transfer and comparison of biological networks. With ISAR we support the cross-species transfer and comparison of alternative variants by introducing a gene-structure aware (i.e. exon-intron structure aware) multiple sequence alignment approach for variants from orthologous and paralogous genes.
The methods presented here and the appropriate databases allow the cross-species transfer of biological networks, the comparison of thousands of context-specific networks, and the cross-species comparison of alternatively spliced variants. Thus, they can be used as a starting point for the understanding of regulatory and signaling mechanisms in many biological systems.