There is a need to better understand and handle the “dark matter” of proteomics: the vast diversity of post-translational and chemical modifications that are unaccounted for in a typical analysis and thus remain unidentified. We present a novel fragment-ion indexing method, and its implementation in the peptide identification tool MSFragger, that enables an over 100-fold improvement in speed over most existing tools. Using some of the largest proteomic datasets to date, we demonstrate how MSFragger empowers the open database search concept for comprehensive identification of peptides and all their modified forms, uncovering dramatic differences in the modification rates across experimental samples and conditions. We further illustrate its utility using protein-RNA crosslinked peptide data, and using affinity purification experiments where we observe on average a 300% increase in the number of identified spectra for enriched proteins. We also discuss the benefits of open searching for improved false discovery rate estimation in proteomics.

Peptide identification algorithms have served as a cornerstone of shotgun proteomics for several decades [1]. The most commonly used computational strategy is based on searching acquired tandem mass (MS/MS) spectra against a protein sequence database using database search algorithms [2]. However, even given significant improvements in the quality of MS/MS data acquired on modern mass spectrometers, a very significant fraction of spectra remains unexplained. We and others have been fascinated by the underlying complexity of the “dark matter” in shotgun proteomics [3], including the vast diversity of post-translational modifications (PTMs) as well as novel sequences (e.g. mutations and splice isoforms), that are unaccounted for in traditional database search and thus remain unidentified [4–8]. A number of computational strategies emerged for the detection of such peptides, including multi-step database search [9, 10], curated modifications search [9, 11], spectral-pair based methods screening for modified versions of peptides initially identified in unmodified form [12–15], sequence tagging [16–20], and spectral alignment [21, 22] (reviewed in [23]). However, the proteomics community continues to search for practical computational tools for this task. These efforts are exemplified by a recent report [4] exploring the feasibility of “open” (i.e. using wide precursor mass tolerance of hundreds of Daltons allowing for identification of modified peptides) searches using conventional database search tools.

In our quest to develop a broadly applicable and fast computational strategy for open database search, we designed a novel fragment ion indexing method that provides orders of magnitude improvement in speed over existing tools. We implemented this method in a new database search tool, MSFragger. MSFragger makes open searches feasible even for very large datasets containing millions of MS/MS spectra, helping to reconstruct modification profiles and to uncover dramatic differences in the modification rates across different experiments. It is capable of performing open searches with variable modifications, making it applicable to data from labeling-based quantitative proteomics experiments. We further demonstrate MSFragger’s utility in the analysis of protein-RNA crosslinked peptides and affinity purification mass spectrometry (AP-MS) data. Finally, open searching uncovers, and provides a potential solution to, the problem of inaccurate false discovery rate (FDR) estimates in traditional narrow window searches due to unaccounted peptide modifications. MSFragger is platform independent, not limited to data from a particular MS instrument, and can be easily incorporated into most existing data analysis pipelines.

Database search strategies and the MSFragger algorithm
(a) Conventional database search involves in-silico digestion of a protein database into candidate peptides from which theoretical spectra are sequentially generated and compared against experimental spectra one at a time. (b) MSFragger digests a protein database and generates a non-redundant set of peptides that are arranged in a peptide index. This index is then used as a reference to generate a fragment index that allows for rapid retrieval of theoretical spectra that contain a fragment of a query mass.
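
To make the fragment-index idea more concrete, here is a minimal Python sketch. It is not MSFragger's actual implementation: the function names, the bin width, the tolerances, the reduced amino acid table, and the simple shared-fragment count used as a score are all assumptions chosen for illustration. It digests proteins into a non-redundant, mass-sorted peptide list, bins singly charged b and y fragment masses into a lookup table, and then retrieves candidate peptides for a query spectrum by looking up each peak instead of scoring every peptide in turn.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da); only a subset of amino acids, for brevity.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "R": 156.10111,
}
WATER, PROTON = 18.01056, 1.00728
BIN_WIDTH = 0.02  # hypothetical fragment bin width in Da


def digest(protein):
    """Simplified tryptic digest: cleave after K/R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_mzs(peptide):
    """Singly charged b and y ion m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y


def build_index(proteins):
    """Peptide index (unique, mass-sorted) plus a binned fragment index."""
    unique = {pep for prot in proteins for pep in digest(prot)}
    peptides = sorted(unique, key=lambda p: (peptide_mass(p), p))
    masses = [peptide_mass(p) for p in peptides]
    fragment_index = defaultdict(list)  # bin -> [(fragment m/z, peptide id)]
    for pid, pep in enumerate(peptides):
        for mz in fragment_mzs(pep):
            fragment_index[int(mz / BIN_WIDTH)].append((mz, pid))
    return peptides, masses, fragment_index


def search(peaks, precursor_mass, masses, fragment_index,
           precursor_tol, fragment_tol=0.02):
    """Count matched fragments per peptide within the precursor window."""
    counts = defaultdict(int)
    for peak in peaks:
        lo = int((peak - fragment_tol) / BIN_WIDTH)
        hi = int((peak + fragment_tol) / BIN_WIDTH)
        for b in range(lo, hi + 1):
            for mz, pid in fragment_index.get(b, ()):
                if (abs(mz - peak) <= fragment_tol
                        and abs(masses[pid] - precursor_mass) <= precursor_tol):
                    counts[pid] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy example: a made-up protein and a spectrum whose precursor carries an
# unaccounted +15.9949 Da shift (fragments left unshifted for simplicity).
peptides, masses, index = build_index(["GASPVKTLNDER"])
query_peaks = fragment_mzs("TLNDER")
query_mass = peptide_mass("TLNDER") + 15.9949
open_hits = search(query_peaks, query_mass, masses, index, precursor_tol=500.0)
narrow_hits = search(query_peaks, query_mass, masses, index, precursor_tol=0.02)
```

In this toy case the wide (open) precursor window recovers the peptide despite the mass shift, while the narrow window returns no candidates. In a real modified peptide only some fragments would stay unshifted, so fewer peaks would match.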