Establishing a method of vector contamination identification in database sequences

GA Seluja; A Farmer; M McLeod; C Harger; Peter Schad

Seluja, GA., Farmer, A., McLeod, M., Harger, C., & Schad, P. (1999). Establishing a method of vector contamination identification in database sequences. Bioinformatics, 15(2), 106-110. https://doi.org/10.1093/bioinformatics/15.2.106

Abstract

MOTIVATION: The nucleotide sequence databases are invaluable tools for both the private and the academic research communities, from the retrieval of sequences to homology searching. The databases face several data-quality issues, such as sequencing artifacts and errors. We investigated a major source of these errors: the presence of vector-contaminated sequences.

RESULTS: Using a panel of 180 vector polylinker sequences, we found 3029 vector-matching sequences (0.36%) in GenBank Release 95-96, with an average vector-matching length of 72 nucleotides. The number of vector-contaminated sequences has grown with the database; however, the percent contamination remained approximately constant, averaging 0.28% from 1982 to 1996.

AVAILABILITY: Access to the database of vector polylinker sequences via sequence similarity searching is available at http://seqsim.ncgr.org/vector/

CONTACT: gas@molinfo.com
