LinkHub: a Semantic Web system that facilitates cross-database queries and information retrieval in proteomics
Open Access
- 9 May 2007
- journal article
- review article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 8 (S3), S5
- https://doi.org/10.1186/1471-2105-8-s3-s5
Abstract
Background
A key abstraction in representing proteomics knowledge is the notion of unique identifiers for individual entities (e.g. proteins) and the massive graph of relationships among them. These relationships are sometimes simple (e.g. synonyms) but are often more complex (e.g. one-to-many relationships in protein family membership).
Results
We have built a software system called LinkHub, based on the Semantic Web standard RDF (Resource Description Framework), that manages the graph of identifier relationships and allows exploration with a variety of interfaces. For efficiency, we also provide relational-database access and translation between the relational and RDF versions. LinkHub is practically useful for creating small, local hubs on common topics and then connecting these to major portals in a federated architecture; we have used LinkHub to establish such a relationship between UniProt and the North East Structural Genomics Consortium. LinkHub also facilitates queries and access to information and documents related to identifiers spread across multiple databases, acting as "connecting glue" between different identifier spaces. We demonstrate this with example queries that discover "interologs" of yeast protein interactions in the worm and that explore the relationship between gene essentiality and pseudogene content. We also show how "protein family based" retrieval of documents can be achieved. LinkHub is available at hub.gersteinlab.org and hub.nesg.org, along with supplementary material, database models, and full source code.
Conclusion
LinkHub leverages Semantic Web standards-based integrated data to provide novel information retrieval of identifier-related documents through relational graph queries, simplifies and manages connections to major hubs such as UniProt, and provides useful interactive and query interfaces for exploring the integrated data.
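To make the idea of an interolog query over an identifier graph more concrete, here is a minimal Python sketch. It is not LinkHub's code or query interface: it assumes a toy in-memory list of RDF-style (subject, predicate, object) triples, hypothetical predicate names `interactsWith` and `homologOf`, and placeholder identifiers such as `yeast:A` rather than real database accessions. An interaction between yeast proteins A and B is transferred to the worm whenever both partners have worm homologs; a one-to-many homolog mapping yields several candidate pairs.

```python
from collections import defaultdict
from itertools import product

# Toy identifier graph as (subject, predicate, object) triples.
# Predicates and identifiers are hypothetical placeholders, not LinkHub data.
TRIPLES = [
    ("yeast:A", "interactsWith", "yeast:B"),
    ("yeast:A", "homologOf", "worm:A1"),
    ("yeast:A", "homologOf", "worm:A2"),  # one-to-many mapping
    ("yeast:B", "homologOf", "worm:B1"),
    ("yeast:C", "interactsWith", "yeast:D"),
    ("yeast:C", "homologOf", "worm:C1"),  # yeast:D has no worm homolog
]


def objects_by_subject(triples, predicate):
    """Index triples with the given predicate as subject -> set of objects."""
    index = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            index[s].add(o)
    return index


def predict_interologs(triples):
    """Return candidate worm pairs (a', b') such that a interactsWith b,
    a homologOf a', and b homologOf b'. Pairs are stored in sorted order."""
    homologs = objects_by_subject(triples, "homologOf")
    predicted = set()
    for s, p, o in triples:
        if p != "interactsWith":
            continue
        # .get avoids inserting empty entries into the defaultdict
        for a2, b2 in product(homologs.get(s, ()), homologs.get(o, ())):
            predicted.add(tuple(sorted((a2, b2))))
    return predicted


if __name__ == "__main__":
    for pair in sorted(predict_interologs(TRIPLES)):
        print(pair)
```

Run as a script, this prints ('worm:A1', 'worm:B1') and ('worm:A2', 'worm:B1'); the C-D interaction is dropped because yeast:D has no worm homolog. Against a real store, the same two-hop join could be expressed as a SPARQL query over the RDF graph or as a SQL join over its relational form.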