The biomedical research community is increasingly employing shotgun proteomics for large-scale identification of proteins from enzymatic digests. Typically, proteins and peptides are identified from tandem mass spectral data by matching each experimentally generated tandem mass spectrum to its best-scoring theoretical match from a protein database. Here, we present the potential difficulties of using such an approach without statistical consideration of the false positive rate, especially when large databases, as are encountered in eukaryotes, are considered. This is illustrated by searching a dataset generated from a multidimensional separation of a eukaryotic tryptic digest against an in silico generated random protein database; the search produced a significant number of positive matches, even when previously suggested score filtering criteria were used.
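As a rough illustration of the idea, the number of matches that pass a score filter in the random-database search can be compared with the number that pass in the real-database search. The sketch below is not the procedure used in this work: the function names, the single score threshold, and the assumption that the random database is the same size as the real one are all hypothetical choices made for the example.

```python
from typing import Sequence


def count_passing(scores: Sequence[float], threshold: float) -> int:
    """Count matches whose score meets or exceeds the filter threshold."""
    return sum(1 for s in scores if s >= threshold)


def estimated_false_positive_rate(
    real_scores: Sequence[float],
    random_scores: Sequence[float],
    threshold: float,
) -> float:
    """Estimate the fraction of accepted real-database matches that are spurious.

    Assumption (hypothetical): the random database has the same size as the
    real one, so hits against it approximate the number of chance matches
    expected among the accepted real-database hits.
    """
    real_hits = count_passing(real_scores, threshold)
    random_hits = count_passing(random_scores, threshold)
    if real_hits == 0:
        return 0.0  # nothing accepted, so no false positives to estimate
    return min(1.0, random_hits / real_hits)
```

Under these assumptions, a filter that accepts four real-database matches and one random-database match would give an estimated false positive rate of 0.25.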