The Genome Sequence DataBase (GSDB): Improving data quality and data access
In 1997 the primary focus of the Genome Sequence DataBase (GSDB; www.ncgr.org/gsdb) located at the National Center for Genome Resources was to improve data quality and accessibility. Efforts to increase the quality of data within the database included two major projects; one to identify and remove all vector contamination from sequences in the database and one to create premier sequence sets (including both alignments and discontiguous sequences). Data accessibility was improved during the course of the last year in several ways. First, a graphical database sequence viewer was made available to researchers. Second, an update process was implemented for the web-based query tool, Maestro. Third, a web-based tool, Excerpt, was developed to retrieve selected regions of any sequence in the database. And lastly, a GSDB flatfile that contains annotation unique to GSDB (e.g., sequence analysis and alignment data) was developed. Additionally, the GSDB web site provides a tool for the detection of matrix attachment regions (MARs), which can be used to identify regions of high coding potential. The ultimate goal of this work is to make GSDB a more useful resource for genomic comparison studies and gene level studies by improving data quality and by providing data access capabilities that are consistent with the needs of both types of studies.