As of May 23, 3006 this has been removed from the standard tutorial, and it is not being updated to reflect more sensitive centroid and population analysis. It is replaced by HD-analysis-script, which does almost everything this does as well as much more.
Besides viewing the spectra, the DEX program package currently offers two other methods of analysis: the centroid and the populations of each peak. Both are handled by the same script, HD-analysis-script. To run this script, use the following format:
./HD-analysis-script-old (auto/cent(roid)/both) filename new-filename (1/2) (view) number
where the first argument, auto/cent(roid)/both, selects the output: “auto” lets the program decide whether each peptide has enough signal to determine the individual population profiles, “cent” outputs the centroid only, and “both” outputs the centroids and the populations. With “auto”, the program checks whether the signal-to-noise is high enough for an accurate population profile. If it is not, only the centroid values are output; if it is, both the centroid and the population profile are output. It is strongly suggested that you check the output when using “auto” to see whether the program is sensitive enough for your needs. If it outputs only the centroids when you want both populations and centroids, type “both” instead.
Argument 2 is the name of the file you want to analyze, and argument 3 is the new file that will contain the calculations selected in argument 1. Argument 4 is either “1” or “2”: choose 1 to calculate the centroid and populations of the deconvoluted weights, or 2 to calculate them for the experimental spectrum without deconvolution.
Argument 5 is “view”, which may be typed if you wish to look at the peptides one at a time, with the GRACE graph of the result displayed automatically for each one. This is the most useful method for exploratory investigation. Argument 6 is the number of the sequence in the list that you wish to analyze (required when argument 5 is “view”). If view is run, each sequence mass envelope in the file is displayed in a GRACE window one at a time, the populations and centroid values for first the experimental and then the deconvoluted weights are printed to the standard error of the command line, and the new file will not contain any saved information.
The view option is thus most useful for interactive viewing, while leaving arguments 5 and 6 empty is best for an automatic analysis.
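The argument rules above can be summarized in a short Python sketch. The helper build_command is hypothetical and not part of the DEX package; it only assembles an argument list in the order given in the format line and rejects combinations the text rules out. It assumes that the first argument is spelled exactly as quoted in the text (“auto”, “cent” or “both”).

```python
from typing import List, Optional

SCRIPT = "./HD-analysis-script-old"  # script name as written in the format line


def build_command(mode: str, filename: str, new_filename: str,
                  data: int, view_start: Optional[int] = None) -> List[str]:
    """Assemble the argument list for the analysis script (hypothetical helper)."""
    if mode not in ("auto", "cent", "both"):
        raise ValueError("argument 1 must be auto, cent or both")
    if data not in (1, 2):
        raise ValueError("argument 4 must be 1 (deconvoluted) or 2 (experimental)")
    args = [SCRIPT, mode, filename, new_filename, str(data)]
    if view_start is not None:
        # Argument 6 is required whenever argument 5 ("view") is given.
        if view_start < 1:
            raise ValueError("argument 6 must be a sequence number of 1 or more")
        args += ["view", str(view_start)]
    return args


# Illustrative call using file names that appear in this tutorial.
print(" ".join(build_command("both", "examples/mass1260.dat",
                             "examples/mass1260.pop", 1)))
```

Leaving view_start unset corresponds to the automatic analysis, where arguments 5 and 6 are left empty.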
To calculate the deconvoluted populations for all envelopes in the file mass1260.dat that was created above using Grab-1-peptide-script type:
This will run the HD-populations program, and some information about the calculation will be displayed on the screen. To view your output, type “vi examples/mass1260.pop”. You can also check your output against the standard provided by typing: “diff examples/mass1260.pop examples/mass1260check.pop”. The first 11 lines of the new file mass1260.pop are shown below to illustrate the format of the summarized result:
From file: examples/mass1260.dat
Background Std.Dev. min max ALIGN_OFFSET interval
135.12 314.27 -548.54 39.24 1.5000 0.1000
Peak Mass-Range Area Peak/Noise Percent Weights
0 1260.5951 1260.8951 36676.11 67.2 89.08 100.00
1 1261.5951 1261.8951 -312.76 -0.6 -0.76
2 1262.4951 1262.7951 -3589.33 -6.6 -8.72
3 1263.4951 1263.7951 -593.23 -1.1 -1.44
Centroid= 0.00 for all %Weights listed
The first line of the file (above) is a one-line identification of the data file used for the calculation. Then, for each sequence envelope in the file, there is one line giving the sequence and original file location followed by the mass/charge of the peptide. The next line is the number of that sequence in the file, and the line after that gives background information in the first four columns (Background, Std.Dev., min and max). The fifth column is the alignment offset (ALIGN_OFFSET), which will be the standard 1.5 if there is no misalignment in the m/z values but may differ if the original MS spectrum was not calibrated properly. The sixth column is the “interval”, either specified by the user or chosen automatically by the program. The next set of column identifiers for each sequence describes the peak populations. This population information is for either the original MS peaks or the deconvoluted results, depending on the option chosen when running HD-analysis-script. In this example the deconvoluted peaks were chosen, so the areas, peak/noise and percent values reported above are those of the first peptide. The “Peak” column is the identifier, with 0 being the monoisotopic mass. The “Mass-Range” columns are the second and third columns, which give the range over which each peak was taken for the calculation. This can be checked against the graphs to see that the correct peaks were identified and the tails of the peaks were appropriately found. The “Area” column is the area above zero for each mass peak, after removing the average background value shown under the “Background” label. The “Peak/Noise” column gives a good estimate of the signal-to-noise and hence the quality of the peak. It is found by dividing the “Area” by the “Background” and by the number of points used from the file for each peak. The “Percent” column is the normalized percentage for each peak, with negative peaks counted by their absolute values in the normalization.
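As a check on how the “Percent” column can be read, the following Python sketch normalizes the “Area” values from the listing above by the sum of their absolute values. The function name percent_column is hypothetical, and this is only one reading of the description, not the HD-populations source code; it does, however, give back the Percent values shown in the listing.

```python
def percent_column(areas):
    """Normalize peak areas to percent, counting negative areas by absolute value."""
    total = sum(abs(a) for a in areas)
    if total == 0:
        raise ValueError("all peak areas are zero")
    # Each peak keeps its sign; only the normalizing total uses absolute values.
    return [100.0 * a / total for a in areas]


# "Area" column for peaks 0-3 in the mass1260.pop listing above.
areas = [36676.11, -312.76, -3589.33, -593.23]
print([round(p, 2) for p in percent_column(areas)])
# prints [89.08, -0.76, -8.72, -1.44]
```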
The “Weights” column is a further refinement of the “Percent” column, using an algorithm to improve the characterization of the number of peaks and their real percentage of the total signal. This algorithm does not report negative peaks, as they are physically unrealistic, and it stops when it finds the first negative value after a significant peak. This program can greatly simplify the calculation of the profiles, but the result must be checked carefully by visual inspection to see that all the peaks were properly accounted for. The populations of mass envelopes with overlapping spectra or low signal-to-noise (largest Peak/Noise below about 10-15) are currently not calculated very well by this program, and it is strongly suggested that the user check all values for accuracy. If a problem exists, the “Percent” column values can be used to recalculate the centroid or the re-normalized populations manually, using only the peaks you desire. In this spectrum, 100% of the population (Weights column) was found in the first peak (the monoisotopic peak). Since this is deconvoluted data, that suggests that all of the peptides observed were without any backbone amide deuteration. See the other time points for evidence of increasing backbone amide deuteration.
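The stopping rule described for the “Weights” column can be sketched in Python as follows. This is not the algorithm used by HD-populations, whose details are not given here. It is a minimal illustration that assumes any positive Percent value counts as a “significant” peak and that negative values before the first such peak are given zero weight. The function name weights_from_percent is hypothetical.

```python
def weights_from_percent(percents):
    """Keep peaks up to the first negative value after a positive peak, then renormalize."""
    kept = []
    seen_peak = False
    for p in percents:
        if p < 0:
            if seen_peak:
                break          # first negative after a peak: stop here
            kept.append(0.0)   # assumption: leading negatives get zero weight
            continue
        seen_peak = seen_peak or p > 0
        kept.append(p)
    total = sum(kept)
    if total == 0:
        return []
    return [100.0 * (p / total) for p in kept]


# "Percent" column for peaks 0-3 in the mass1260.pop listing above.
print(weights_from_percent([89.08, -0.76, -8.72, -1.44]))
# prints [100.0]
```

With the values from the listing, only peak 0 survives, which agrees with the single 100.00 entry in the Weights column.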
The centroid of all the peaks in the “Weights” column is listed after the population information. If the “Weights” column has all the peaks you want, then this centroid needs no adjusting. If the “Weights” column is too short, then a centroid will have to be calculated manually using the values from the “Percent” column. In this case, the centroid is 0.00, which indicates that none of the peptides measured had any backbone amide deuteration.
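A manual centroid of this kind can be sketched in Python. The sketch assumes the centroid is the intensity-weighted mean of the peak number (peak 0 being the monoisotopic peak), which is consistent with the value 0.00 reported when all of the weight is in peak 0, but is not confirmed by the program documentation here. The function name centroid and the second set of Percent values are made up for illustration.

```python
def centroid(values, peaks=None):
    """Intensity-weighted mean peak number over the chosen peaks."""
    if peaks is None:
        peaks = range(len(values))
    pairs = [(k, values[k]) for k in peaks]
    total = sum(v for _, v in pairs)
    if total <= 0:
        raise ValueError("chosen peaks must have a positive total")
    return sum(k * v for k, v in pairs) / total


# Weights column from the listing above: everything is in peak 0.
print(f"{centroid([100.0]):.2f}")   # prints 0.00

# Made-up Percent values; peak 2 is negative, so only peaks 0, 1 and 3 are used.
percent = [50.0, 30.0, -1.0, 20.0]
print(f"{centroid(percent, peaks=[0, 1, 3]):.2f}")   # prints 0.90
```

Passing an explicit list of peaks mirrors the manual case, where only the peaks you trust from the “Percent” column are included.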
To calculate the centroids for all the deconvoluted envelopes in the file mass1260.dat type:
This will run HD-populations to output only the background information and the centroids into the file mass1260.cent. This file will have the same identifiers as the previous one, but it will not have the peak population columns and information. All the same information applies to this data as it does to the calculation of the centroid with the “both” option.
To view as well as calculate the populations and/or centroids for both the experimental spectrum and the deconvoluted weights for file mass1260.dat starting with the first envelope type:
This will bring up the first envelope’s GRACE window using the program view-HD-spectrum, which was also used in View-HD-spectrum-script (above). The script also calculates the populations and centroids for the original MS spectrum and the deconvoluted results, in that order. It does not matter whether the 4th argument is 1 or 2, as the script calculates both the experimental (undeconvoluted) and the deconvoluted population data. This option is well suited to exploratory work. It does not save any of the populations or centroid values, though; they are only printed to the command line screen. After you type the line above into the command line, the window will appear with the two profiles. Then scroll up the command line window until you see the line “This is the EXPERIMENTAL (UNDECONVOLUTED) data.” Immediately below this line, the population result from HD-populations and the centroid value for the original MS experimental values (black solid lines on the graph) are displayed for the first sequence mass envelope. Below this there is some diagnostic information, and then the population results for the deconvoluted data (red dashed lines on the graph) appear under the line “This is the DECONVOLUTED (POPULATIONS) data.” The centroid for this population of peaks is displayed immediately below.
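Because the view option does not save these values, it can help to keep a copy of the screen text and split it at the two marker lines. The Python sketch below assumes the text for one envelope has already been captured into a string (for example by redirecting standard error to a file and reading it back). The function name split_view_output is hypothetical, and the sample lines are placeholders, not real program output.

```python
EXP_MARK = "This is the EXPERIMENTAL (UNDECONVOLUTED) data."
DEC_MARK = "This is the DECONVOLUTED (POPULATIONS) data."


def split_view_output(text):
    """Return (experimental_lines, deconvoluted_lines) for one envelope's output."""
    sections = {"exp": [], "dec": []}
    current = None
    for line in text.splitlines():
        if EXP_MARK in line:
            current = "exp"
        elif DEC_MARK in line:
            current = "dec"
        elif current is not None:
            # Note: diagnostic lines printed before the second marker
            # end up in the experimental section.
            sections[current].append(line)
    return sections["exp"], sections["dec"]


# Placeholder text standing in for captured output of one envelope.
sample = "\n".join([
    "(lines before the first marker)",
    EXP_MARK,
    "(experimental population and centroid lines)",
    DEC_MARK,
    "(deconvoluted population and centroid lines)",
])
exp_lines, dec_lines = split_view_output(sample)
print(len(exp_lines), len(dec_lines))   # prints 1 1
```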
Near the bottom of the display is the phrase “Real spectrum is number =”. The number following it is the actual number of this envelope in the file, not the number reported next to the title in the GRACE window. To view the next envelope in the file, simply close the GRACE window, and the next one pops up automatically, with the populations and centroids calculated and displayed on the command screen as for the first one. Continue clicking through until all the envelopes have been viewed, and the program then ends automatically.