spectra-cluster-py - Analysing clustering results

Welcome

The spectra-cluster-py project is a collection of tools and APIs that help analyse and work with MS/MS spectrum clustering results in the .clustering format.

The .clustering format

The .clustering format is currently used by the spectra-cluster algorithm (and API) and is the output format of the tools derived from it.

Up-to-date documentation of the .clustering format can be found at the Java API clustering-file-reader project page.

Tools

The spectra-cluster-py project contains a set of end-user ready tools to analyse MS/MS clustering results in the .clustering format.

The id_transferer_cli transfers identification data to unidentified spectra. This can be used to improve the accuracy of label-free quantitation data.
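The idea of transferring identifications can be illustrated with a minimal sketch. It is not the implementation used by id_transferer_cli: the function name `transfer_ids`, the `min_ratio` threshold, and the representation of a cluster as a list of `(spectrum_id, sequence)` pairs are all assumptions made for illustration. The sketch assigns the most common peptide sequence in a cluster to its unidentified spectra, provided that sequence is sufficiently dominant.

```python
from collections import Counter


def transfer_ids(cluster, min_ratio=0.7):
    """Return {spectrum_id: sequence} for unidentified spectra in one cluster.

    `cluster` is a list of (spectrum_id, sequence) pairs, where sequence is
    None for unidentified spectra. Hypothetical structure, not the library's.
    """
    identified = [seq for _, seq in cluster if seq is not None]
    if not identified:
        return {}  # nothing to transfer from

    # Most common sequence among the identified spectra of this cluster
    top_seq, top_count = Counter(identified).most_common(1)[0]

    # Only transfer if the dominant sequence is reliable enough
    if top_count / len(identified) < min_ratio:
        return {}

    return {spec_id: top_seq for spec_id, seq in cluster if seq is None}


example_cluster = [
    ("s1", "PEPTIDEK"),
    ("s2", "PEPTIDEK"),
    ("s3", None),
    ("s4", "OTHERK"),
    ("s5", None),
]
transferred = transfer_ids(example_cluster, min_ratio=0.6)
```

With the stricter default threshold of 0.7, the same cluster yields no transfers, because only two of its three identified spectra agree.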

The clustering_stats tool creates simple tab-delimited files with basic QC measurements of the clustering results. This is only possible if identification data is present in the .clustering file, as the identifications are used as the gold standard (see id_transferer_cli).
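As a rough illustration of such a QC table, the sketch below writes one tab-delimited row per cluster. The column names, the `write_stats` function, and the choice of measurements (cluster size, number of identified spectra, and the fraction of identified spectra sharing the most common sequence) are assumptions for this example and may differ from what clustering_stats actually reports.

```python
import csv
import io
from collections import Counter

FIELDS = ["cluster_id", "size", "identified", "max_ratio"]


def cluster_stats(cluster_id, sequences):
    """Basic measurements for one cluster; `sequences` holds None if unidentified."""
    identified = [s for s in sequences if s is not None]
    if identified:
        top_count = Counter(identified).most_common(1)[0][1]
        max_ratio = f"{top_count / len(identified):.3f}"
    else:
        max_ratio = "NA"  # no gold standard available for this cluster
    return {
        "cluster_id": cluster_id,
        "size": len(sequences),
        "identified": len(identified),
        "max_ratio": max_ratio,
    }


def write_stats(clusters, handle):
    """Write a tab-delimited table with one row per cluster to `handle`."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for cluster_id, sequences in clusters.items():
        writer.writerow(cluster_stats(cluster_id, sequences))


buffer = io.StringIO()
write_stats({"c1": ["AAK", "AAK", "BBK", None], "c2": [None, None]}, buffer)
table = buffer.getvalue()
```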

The cluster_features_cli creates a matrix with the input MGF file names as column headers and the clusters as rows. Each cell contains the number of spectra per file and cluster. This can be used, for example, to run a principal component analysis of the input files based on the clusters.
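The matrix described above can be sketched in a few lines. This is an illustration only, not the code of cluster_features_cli: the function `build_feature_matrix` and the input structure (a mapping from cluster id to the source file name of each member spectrum) are hypothetical.

```python
from collections import Counter


def build_feature_matrix(clusters):
    """Count spectra per input file and cluster.

    `clusters` maps a cluster id to a list with the source file name of
    each member spectrum (hypothetical structure). Returns the sorted
    file names (column headers) and one row of counts per cluster.
    """
    files = sorted({name for spectra in clusters.values() for name in spectra})
    rows = {}
    for cluster_id, spectra in clusters.items():
        counts = Counter(spectra)
        # Files that contribute no spectra to this cluster get a 0
        rows[cluster_id] = [counts.get(name, 0) for name in files]
    return files, rows


header, matrix = build_feature_matrix(
    {
        "cluster_1": ["run_a.mgf", "run_a.mgf", "run_b.mgf"],
        "cluster_2": ["run_c.mgf"],
    }
)
```

Each row can then be treated as a feature vector, and each column as one input file, when preparing data for a principal component analysis.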

The protein_annotator maps peptides listed in a text file to proteins from a FASTA file. Additionally, basic protein inference can be performed.
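The mapping step can be sketched as a plain substring search. This is a simplified illustration under stated assumptions, not the protein_annotator implementation: the helper names `parse_fasta` and `map_peptides` are hypothetical, the accession is taken to be the first whitespace-delimited token of the header line, and no isoleucine/leucine equivalence or protein inference is attempted.

```python
def parse_fasta(lines):
    """Parse FASTA-formatted lines into {accession: sequence}."""
    parts = {}
    accession = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the accession is the first token after '>'
            accession = line[1:].split()[0]
            parts[accession] = []
        elif accession is not None:
            parts[accession].append(line)
    return {acc: "".join(chunks) for acc, chunks in parts.items()}


def map_peptides(peptides, proteins):
    """Map each peptide to the sorted accessions whose sequence contains it."""
    return {
        peptide: sorted(acc for acc, seq in proteins.items() if peptide in seq)
        for peptide in peptides
    }


fasta_lines = [
    ">P1 example protein",
    "MKTAYIAK",
    "QRQISFVK",
    ">P2",
    "MSFVKAAA",
]
proteins = parse_fasta(fasta_lines)
mapping = map_peptides(["ISFVK", "SFVK", "XXXX"], proteins)
```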

The cluster_result_comparator can be used to compare two clustering results (in the .clustering format). The comparison is performed by creating a network representation in which clusters are nodes and edges are created based on shared spectra. If Cytoscape is running before the script is launched, the network is automatically displayed in Cytoscape.
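The network idea can be sketched without any graph library. The example below is an assumption-laden illustration rather than the code of cluster_result_comparator: the function `build_comparison_edges` is hypothetical, each result is represented as a mapping from cluster id to a set of spectrum ids, every spectrum is assumed to belong to at most one cluster per result, and the edge weight is taken to be the number of shared spectra.

```python
def build_comparison_edges(result_a, result_b):
    """Return {(cluster_in_a, cluster_in_b): number_of_shared_spectra}."""
    # Index the second result by spectrum id for fast lookup
    spectrum_to_b = {}
    for cluster_id, spectra in result_b.items():
        for spectrum_id in spectra:
            spectrum_to_b[spectrum_id] = cluster_id

    edges = {}
    for cluster_a, spectra in result_a.items():
        for spectrum_id in spectra:
            cluster_b = spectrum_to_b.get(spectrum_id)
            if cluster_b is None:
                continue  # spectrum missing from the second result
            key = (cluster_a, cluster_b)
            edges[key] = edges.get(key, 0) + 1
    return edges


result_a = {"a1": {"s1", "s2", "s3"}, "a2": {"s4"}}
result_b = {"b1": {"s1", "s2"}, "b2": {"s3", "s4"}}
edges = build_comparison_edges(result_a, result_b)
```

In this example, cluster a1 is connected to both b1 and b2, which indicates that the second result split its spectra across two clusters.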

The complete list of tools can be found here.

In the Python package, the source code of these tools is located in spectra_cluster.ui.

Python API

This collection of classes is intended to help you develop your own scripts to analyse MS/MS clustering results in the .clustering format.

You can find the complete API documentation here.