Information management (KEBB)

Comparison of public pathway databases

Multiple pathway databases (e.g., KEGG, Reactome, BiGG) are available and describe the human metabolic network. They have proven their usefulness in many applications, ranging from the analysis and interpretation of high-throughput data to their use as a reference repository. However, the human metabolic networks described by these databases had not been systematically compared and contrasted, nor had the extent to which they differ been quantified. For a researcher using these databases for particular analyses of human metabolism, it is crucial to know the extent of the differences in content and their underlying causes. Moreover, the outcomes of such a comparison are important for ongoing integration efforts. We have compared several pathway databases to gain insight into differences in content and in the way they represent pathway information (Stobbe, 2011, 2013, 2014). Results from this comparison were used in the reconstruction of a map of human metabolism (Thiele, 2013). We also specifically investigated the differences between ten pathway databases for the well-known TCA cycle. Remarkably, none of the descriptions given by these databases is entirely correct. Moreover, consensus exists on only three reactions (Stobbe, 2012).

Figure. Comparison of the TCA cycle in five metabolic pathway databases (Stobbe, 2011). Map illustrating the (lack of) consensus for the TCA cycle. Metabolites are represented by rectangles, genes by rounded rectangles, and EC numbers by parallelograms. Color indicates how many of the five databases agree on a specific entity (gene, EC number, reaction). Color of an arrow indicates the number of databases that agree upon an entire reaction, i.e., all its metabolites. ‘x’ denotes a missing EC number. A comparison of ten databases is published in Stobbe (2012).
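The notion of agreement "upon an entire reaction, i.e., all its metabolites" can be illustrated with a small Python sketch. This is not the method used in the cited publications; it is a minimal illustration under stated assumptions: metabolite names are already harmonized across databases (in practice a major source of mismatch), reaction direction is ignored, and the databases `db_a`, `db_b`, `db_c` and their contents are hypothetical toy data.

```python
from collections import Counter


def reaction_key(substrates, products):
    """Build a direction-insensitive identifier from the two metabolite sets.

    Two reactions match only if ALL their metabolites match.
    """
    return frozenset([frozenset(substrates), frozenset(products)])


def consensus_counts(databases):
    """Return, for each reaction, the number of databases that contain it."""
    counts = Counter()
    for reactions in databases.values():
        counts.update(set(reactions))  # count each reaction once per database
    return counts


# Three TCA cycle reactions, written with harmonized metabolite names.
aconitase = reaction_key(["citrate"], ["isocitrate"])
fumarase = reaction_key(["fumarate", "H2O"], ["malate"])
malate_dh = reaction_key(["malate", "NAD+"], ["oxaloacetate", "NADH"])

# Hypothetical database contents (toy data, not taken from any real database).
databases = {
    "db_a": {aconitase, fumarase, malate_dh},
    # Same fumarase reaction written in the opposite direction.
    "db_b": {aconitase, reaction_key(["malate"], ["fumarate", "H2O"])},
    "db_c": {aconitase, malate_dh},
}

counts = consensus_counts(databases)

# Reactions on which every database agrees.
full_consensus = {r for r, n in counts.items() if n == len(databases)}
```

In this toy example only the citrate/isocitrate reaction appears in all three sets. The per-reaction counts correspond to the color scale described in the figure caption.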

Compendiumdb: an R package for retrieval and storage of functional genomics data

We have developed a gene expression compendium database that allows the automatic extraction of one or more data sets from the public Gene Expression Omnibus (GEO) database of the NCBI. This compendium database is embedded in the statistical package R, which allows the subsequent analysis of the integrated data sets (Nandal, in prep).
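To give a sense of what automatic retrieval from GEO involves, the sketch below builds the download location of a series matrix file from a GEO series accession. It does not show the Compendiumdb interface, which is not described here. The function name `series_matrix_url` is hypothetical, and the code assumes the directory layout of the public GEO FTP area, in which series are grouped into folders whose names replace the last three digits of the accession by "nnn". Series that span several platforms use a different file name (with the platform accession included), which this sketch does not handle.

```python
import re

# Assumed root of the public GEO series area on the NCBI FTP server.
GEO_SERIES_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def series_matrix_url(accession):
    """Return the expected URL of the series matrix file for a GSE accession.

    Hypothetical helper: assumes a single-platform series.
    """
    match = re.fullmatch(r"GSE(\d+)", accession)
    if match is None:
        raise ValueError(f"not a GEO series accession: {accession!r}")
    digits = match.group(1)
    # Folder name: drop the last three digits and append 'nnn',
    # e.g. GSE1234 -> GSE1nnn and GSE12 -> GSEnnn.
    folder = "GSE" + digits[:-3] + "nnn"
    return (
        f"{GEO_SERIES_ROOT}/{folder}/{accession}"
        f"/matrix/{accession}_series_matrix.txt.gz"
    )


example_url = series_matrix_url("GSE1234")
```

The resulting file is a gzip-compressed, tab-delimited text file that can then be loaded into a local database or an analysis environment such as R.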

Development of domain specific knowledge bases

One important area of clinical genomics research involves the elucidation of molecular mechanisms underlying (complex) disorders, which may eventually lead to new diagnostic or drug targets. To further advance this area, one of the main challenges is the acquisition and integration of data, information and expert knowledge for specific biomedical domains and diseases. Currently the required information is poorly organized and scattered over biological and biomedical databases, basic textbooks, scientific literature and experts’ minds, and it may be highly specific, heterogeneous, complex and voluminous. We developed a novel framework to construct knowledge bases, using concept maps for the presentation of information and the Web Ontology Language (OWL) for the representation of information. We demonstrated this framework through the construction of a peroxisomal knowledge base, which focuses on four key peroxisomal pathways and several related genetic disorders (Willemsen, 2008). As part of the EpiPredict project (www.epipredict.eu) we will work on a knowledge base about Tamoxifen resistance in breast cancer.

Figure. Example concept map. This concept map shows the four key peroxisomal pathways in our knowledge base. Icons associated with specific concepts provide links to other concept maps or information resources (PDF documents).

Science gateways

Advanced distributed data and computing infrastructures, also known as e-infrastructures, enable biomedical researchers to manage and process (omics) data and facilitate collaboration. However, these researchers often do not have the advanced technical knowledge that is required to fully exploit these advanced infrastructures. Science Gateways (SGs) have emerged to address this challenge. Our research focussed on the investigation, design, development and evaluation of state-of-the-art SGs to access e-infrastructures for biomedical research (Shahand, 2011). Currently we aim to extend our SG framework for research data management while focussing on immunology research as a test case.

Bioinformatics Laboratory

Prof dr Antoine H.C. van Kampen
Clinical Epidemiology, Biostatistics and Bioinformatics (KEBB)
Academic Medical Center

Visiting address
Meibergdreef 9
Location J1B-208
Amsterdam Zuidoost

Postal address
P.O. Box 22700
1100 DE Amsterdam
the Netherlands

Tel.: +31-20-5667096
Mobile: 06-41768067

Skype
antoine.van.kampen
