Comparison of public pathway databases
Multiple pathway databases (e.g., KEGG, Reactome, BiGG) describe the human metabolic network. They have proven useful in many applications, ranging from the analysis and interpretation of high-throughput data to their use as a reference repository. However, the human metabolic networks described by these databases had not previously been systematically compared, nor had the extent of their differences been quantified. A researcher using these databases to analyse human metabolism needs to know how much their content differs and why. Moreover, the outcomes of such a comparison are important for ongoing integration efforts. We have compared several pathway databases to gain insight into differences not only in content but also in the way they represent pathway information (Stobbe, 2011, 2013, 2014). Results from this comparison were used in the reconstruction of a map of human metabolism (Thiele, 2013). We also specifically investigated the differences between ten pathway databases for the well-known TCA cycle. Remarkably, none of the descriptions given by these databases is entirely correct. Moreover, consensus exists on only three reactions (Stobbe, 2012).
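As a rough illustration of what quantifying such differences can look like, the sketch below computes the consensus (reactions present in every database) and the pairwise Jaccard overlap between reaction sets. It is a minimal sketch and not the method used in the cited studies: the database names (`db_A`, `db_B`, `db_C`) and reaction labels (`r1`, `r2`, ...) are hypothetical placeholders, and it assumes that reactions have already been mapped to shared identifiers, which is itself a substantial part of any real comparison.

```python
from itertools import combinations

# Hypothetical toy data: database names and reaction labels are placeholders,
# not the actual content of any real pathway database.
reactions_by_db = {
    "db_A": {"r1", "r2", "r3", "r4"},
    "db_B": {"r1", "r2", "r5"},
    "db_C": {"r1", "r2", "r3", "r6"},
}


def consensus(sets_by_db):
    """Return the reactions that occur in every database."""
    return set.intersection(*sets_by_db.values())


def jaccard(a, b):
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(sets_by_db):
    """Return the Jaccard overlap for every unordered pair of databases."""
    return {
        (x, y): jaccard(sets_by_db[x], sets_by_db[y])
        for x, y in combinations(sorted(sets_by_db), 2)
    }


core = consensus(reactions_by_db)
overlap = pairwise_overlap(reactions_by_db)

print("consensus reactions:", sorted(core))
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

With this toy input the consensus consists of the two reactions shared by all three placeholder databases, which shows how a small core can coexist with a much larger union of database-specific reactions.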
Compendiumdb: an R package for retrieval and storage of functional genomics data
We have developed a gene expression compendium database that allows users to automatically extract one or more data sets from the public Gene Expression Omnibus (GEO) database of the NCBI. This compendium database is embedded in the statistical package R, which allows the subsequent analysis of the integrated data sets (Nandal, in prep).
Development of domain specific knowledge bases
One important area of clinical genomics research involves the elucidation of molecular mechanisms underlying (complex) disorders, which may eventually lead to new diagnostic or drug targets. To further advance this area, one of the main challenges is the acquisition and integration of data, information and expert knowledge for specific biomedical domains and diseases. Currently the required information is poorly organized: it is scattered over biological and biomedical databases, basic textbooks, scientific literature and experts’ minds, and may be highly specific, heterogeneous, complex and voluminous. We developed a novel framework to construct knowledge bases, using concept maps for the presentation of information and the Web Ontology Language (OWL) for its representation. We demonstrated this framework through the construction of a peroxisomal knowledge base, which focuses on four key peroxisomal pathways and several related genetic disorders (Willemsen, 2008). As part of the EpiPredict project we will work on a knowledge base about tamoxifen resistance in breast cancer.
Advanced distributed data and computing infrastructures, also known as e-infrastructures, enable biomedical researchers to manage and process (omics) data and facilitate collaboration. However, these researchers often lack the technical knowledge required to fully exploit these infrastructures. Science Gateways (SGs) have emerged to address this challenge. Our research focussed on the investigation, design, development and evaluation of state-of-the-art SGs to access e-infrastructures for biomedical research (Shahand, 2011). Currently we aim to extend our SG framework to research data management, with immunology research as a test case.