mgf_search_result_annotator

The mgf_search_result_annotator embeds identification data in MGF files so that they can be processed by the spectra-cluster tool suite.

This tool adds search results to an MGF file by writing each identified peptide sequence into the SEQ= field of the corresponding spectrum. The spectra-cluster tools pick up this identification data and add it to the .clustering output files. Modification data is currently omitted (unlike in the internal PRIDE Cluster pipeline).
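The annotation step described above can be sketched roughly as follows. This is a minimal illustration and not the tool's actual implementation: the function name `annotate_mgf`, the matching of spectra by their 0-based position in the file, and the placement of the SEQ= line directly after BEGIN IONS are all assumptions made for the example.

```python
def annotate_mgf(lines, sequences):
    """Return MGF lines with a SEQ= field added to identified spectra.

    lines:     iterable of MGF lines without trailing newlines
    sequences: dict mapping a 0-based spectrum index to a peptide sequence
               (hypothetical matching scheme, chosen for this sketch)
    """
    annotated = []
    spectrum_index = -1
    for line in lines:
        annotated.append(line)
        if line.strip() == "BEGIN IONS":
            # A new spectrum starts here; look up its identification, if any.
            spectrum_index += 1
            sequence = sequences.get(spectrum_index)
            if sequence is not None:
                annotated.append("SEQ=" + sequence)
    return annotated


example_mgf = [
    "BEGIN IONS",
    "TITLE=spectrum 1",
    "PEPMASS=500.25",
    "CHARGE=2+",
    "100.1 10.0",
    "END IONS",
    "BEGIN IONS",
    "TITLE=spectrum 2",
    "PEPMASS=610.30",
    "CHARGE=2+",
    "120.2 15.0",
    "END IONS",
]

# Only the first spectrum was identified in this made-up example.
annotated_mgf = annotate_mgf(example_mgf, {0: "PEPTIDEK"})
```

Spectra without an identification are passed through unchanged, so the output remains a valid MGF file that contains every input spectrum.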

The spectra-cluster tools (such as spectra-cluster-cli) expect identification data to be embedded in the processed MGF files. Even though this method is unorthodox, it significantly simplifies the development of clustering tools, as they do not need to handle different search engines or search result formats. Additionally, when building the PRIDE Cluster resource we have to rely on this technique, since the identification and spectrum data are exported from the PRIDE Archive database.

Usage:
mgf_search_result_annotator.py --input=<spectra.mgf> --search=<search_result.mzid> --output=<annotated_spectra.mgf>
[--format=<MSGF+>] [--fdr=<0.01>] [--decoy_string=<REVERSED>]

mgf_search_result_annotator.py (--help | --version)

Options:
-i, --input=<spectra.mgf>
 The original MGF file to use as input.
-s, --search=<search_result.mzid>
 The path to the search result. Note: The search must have been performed on the input MGF file directly. Otherwise, the matching between identification data and spectra may go wrong.
-o, --output=<annotated_spectra.mgf>
 Path where the annotated MGF file should be written.
-f, --format=<MSGF+>
 The format of the search results. Possible options are "MSGF+", "MSGF_ident" (MSGF+ mzIdentML files), "MSAmanda", "Scaffold", "XTandem". [default: "MSGF+"]
-d, --fdr=<0.01>
 The FDR threshold by which the input search results are filtered. If the FDR is set to '2' for Scaffold output, the original cut-off is used. [default: 0.01]
--decoy_string=<REVERSED>
 The string to use to identify decoy proteins. [default: REVERSED]
-h, --help Print this help message.
-v, --version Print the current version.
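
To illustrate how the --fdr and --decoy_string options interact, the following sketch shows one common target-decoy filtering approach. It is not taken from the tool's source and the actual implementation may differ: the function name `filter_by_fdr`, the PSM tuple layout, the assumption that higher scores are better, and the FDR estimate (decoys divided by targets) are all assumptions made for this example.

```python
def filter_by_fdr(psms, fdr=0.01, decoy_string="REVERSED"):
    """Keep target PSMs that pass a target-decoy FDR threshold.

    psms: list of (score, sequence, protein) tuples, where a higher
          score means a better match (assumption for this sketch)
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    n_target = 0
    n_decoy = 0
    cutoff = 0  # number of top-ranked PSMs to keep
    for position, (_score, _sequence, protein) in enumerate(ranked, start=1):
        # A PSM counts as a decoy if its protein name contains the decoy string.
        if decoy_string in protein:
            n_decoy += 1
        else:
            n_target += 1
        # Remember the lowest-ranked position at which the estimated FDR is acceptable.
        if n_target > 0 and n_decoy / n_target <= fdr:
            cutoff = position
    # Report only target hits above the cut-off.
    return [psm for psm in ranked[:cutoff] if decoy_string not in psm[2]]


# Made-up PSMs for demonstration only.
example_psms = [
    (10.0, "AAAK", "P1"),
    (9.0, "BBBK", "P2"),
    (8.0, "CCCK", "REVERSED_P3"),
    (7.0, "DDDK", "P4"),
]

strict = filter_by_fdr(example_psms, fdr=0.01)
lenient = filter_by_fdr(example_psms, fdr=0.5)
```

With the strict threshold only the two PSMs ranked above the first decoy are kept, while the lenient threshold also admits the target PSM ranked below it.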