Data Formats and Pre-processing
Mass spectrometry data analysis is plagued by an overabundance of file formats. The good news is that the mass spectrometry community, including many instrument vendors, has developed a standard file format for raw data, mzML. The bad news is that many of the older formats are still in widespread use, and most instruments don't produce mzML natively. The reference implementation of the mzML standard is a software suite called ProteoWizard. ProteoWizard includes a very handy tool called msconvert that is capable of converting raw data from most instruments into mzML or into one of many other formats. In addition to format conversion, msconvert can also perform a wide variety of noise filtering and peak-picking functions to prepare data for analysis. A typical pre-processing workflow involves:
- Conversion from instrument .raw to mzML
- Peak picking on both MS1 and MS2 data using vendor-native peak picking routines (built into msconvert)
- Denoising of MS2 data, either by thresholding or by keeping only the largest peaks within a moving window
- Conversion of spectrum identifiers into a standardized format
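As a rough illustration, the steps above could be expressed as a single msconvert command line. The sketch below only assembles that command in Python; it is not a definitive recipe. The file names sample.raw and converted are hypothetical, the moving-window values (6 peaks per 30 Da window) are assumed defaults, and the titleMaker pattern is just one possible identifier format. Filter syntax has changed between ProteoWizard releases, so check the output of msconvert --help for your installed version before relying on these strings.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def build_msconvert_command(raw_file, out_dir, ms2_peaks_in_window=6, ms2_window_da=30):
    """Assemble an msconvert argument list for the pre-processing steps above.

    The filter strings are assumptions based on recent ProteoWizard releases;
    older releases use a different peakPicking syntax.
    """
    return [
        "msconvert",
        str(raw_file),
        "--mzML",                 # write mzML output
        "-o", str(out_dir),       # output directory
        # Vendor-native peak picking on MS1 and MS2 spectra
        "--filter", "peakPicking vendor msLevel=1-2",
        # Keep only the largest peaks within a moving m/z window (MS2 only)
        "--filter", f"MS2Denoise {ms2_peaks_in_window} {ms2_window_da}",
        # Rewrite spectrum titles into a uniform pattern (illustrative choice)
        "--filter", "titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState>",
    ]


if __name__ == "__main__":
    # "sample.raw" and "converted" are hypothetical names used for illustration.
    cmd = build_msconvert_command(Path("sample.raw"), Path("converted"))
    print(shlex.join(cmd))
    # Only attempt the conversion if msconvert and the input file are present.
    if shutil.which("msconvert") and Path("sample.raw").exists():
        subprocess.run(cmd, check=True)
```

Filters are applied in the order given, so peak picking is placed first here; the denoising step then operates on centroided peaks. Building the command as a list rather than a single string avoids shell quoting problems with the angle brackets in the titleMaker pattern.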
To convert files from raw, instrument-native formats to mzML, a Windows PC is required. If you need to do this, be sure to download ProteoWizard with vendor reader support. This package comes with MSConvertGUI, which allows conversion of raw files using a graphical interface. Once files are in mzML or mgf format, they can be converted to various other formats using the msconvert3 tool in Galaxy.