What it is
The GVP Finder is an excel spreadsheet that searches through exported peptide data from X!Tandem using the Global Proteome Machine (GPM). The spreadsheet uses logical formulas to find previously-identified genetically variant peptides (GVPs) and reports random match probabilities (RMP) based on information from the thousand genomes consortium. Only GVPs found in hair and skin are currently used here.
How to use it
Data must be first analyzed in the GPM Fury software. Default GPM Fury settings are used in the advanced tab except for exclusion of viruses and prokaryotes, peptide and protein log(e) of -1, fragment mass error of 20 ppm, parent mass error of 100 ppm, and inclusion of point mutations. Note that these settings may change based on the instrumentation used. After analysis, export the peptide data and paste columns A-Q into the first tab of the GVP Finder as shown...
It may take a minute for the spreadsheet to run through the calculations. After the calculations are finished, click on the Results tab in excel to obtain the binary results for the GVP matrix and corresponding RMPs.
How it works
Excel logical formulas are used to search for specific peptide sequences, quality scores, and modifications for every peptide identified in the dataset. Quality scores are set based on false positive detections. RMP is calculated based on allele calls and assumes that the detection of a single allele may either be heterozygous or homozygous. To account for linkage disequilibrium (LD), all GVPs in the same open reading frame are considered to be in the same linkage block. Therefore, the combination of identified alleles in each block is searched for together in the thousand genomes consortium for each of the five populations, and a new genotype frequency is assigned for the linkage block.
A word of caution
Estimation of RMP is conducted assuming LD within each open reading frame and no LD between open reading frames. This is not a complete solution to account for LD, but an intermediate step in the process. Future versions of GVP Finder may incorporate linkage block combinations based on a linkage analysis.
False positive rates have not yet been determined, and may be due to proteomic artifacts, software analysis, or incorrect excel formulas. It is advised that you have matching genomic data to assess false positives in your data. If you notice a problematic GVP or an error in the spreadsheet, we'd love to hear about it and help fix the issue. Email gjparker@ucdavis.edu
Disclaimer
Dr. Glendon Parker has a patent based on the use of genetically variant peptides for human identification (US 8,877,455 B2, Australian Patent 2011229918, Canadian Patent CA 2794248, and European Patent EP11759843.3). The patent is owned by Parker Proteomics LLC. Protein-Based Identification Technologies LLC (PBIT) has an exclusive license to develop the intellectual property and is co-owned by Utah Valley University and GJP. This ownership of PBIT and associated intellectual property does not alter policies on sharing data and materials. These financial conflicts of interest are administered by the Research Integrity and Compliance Office, Office of Research at the University of California, Davis to ensure compliance with University of California Policy.
Citations
Please cite the following manuscripts if you use the GVP Finder for publishable works.
Goecker, Z. C., Salemi, M. R., Karim, N., Phinney, B. S., Rice, R. H., & Parker, G. J. (2020). Optimal Processing for Proteomic Genotyping of Single Human Hairs. Forensic Science International: Genetics, 102314.
Craig, R., & Beavis, R. C. (2004). TANDEM: matching proteins with tandem mass spectra. Bioinformatics, 20(9), 1466-1467.
Craig, R., Cortens, J. P., & Beavis, R. C. (2004). Open source system for analyzing, validating, and storing protein identification data. Journal of proteome research, 3(6), 1234-1242.
Files