DEX – Installation/Tutorial
FFTW – Fast Fourier transform libraries
FFTW must be downloaded and installed before DEX is installed. DEX uses FFTW 2.1.5, which is available at www.fftw.org. Follow their instructions for proper installation, or use the instructions I found to work in the DEX User’s Guide under the heading “Downloading fftw-2.1.5 directly from the FFTW website.”
GRACE – Graphing package for visualization of results
GRACE (http://plasma-gate.weizmann.ac.il/Grace/) is not required for running the deconvolution engine in DEX, but all of the visualization tools use this graphing software. Installation of GRACE is highly recommended for anyone wanting to analyze H/D MS peptide data. Follow their instructions for proper installation. After installation, add the alias “xmgrace”, pointing to the location of the xmgrace executable, to your shell startup file. You have correctly added the alias when typing “xmgrace” on a command line brings up a window with an empty graph in it. For some help, see the DEX User’s Guide under the heading “Downloading GRACE directly from the website.”
DEX – Deconvolution of EXchange Data
DEX must be installed after FFTW (and preferably GRACE) has been successfully installed. See the DEX home page for download instructions.
Installation of DEX
If you do not have a UNIX-like environment (e.g. Linux or Macintosh) on your computer, see the DEX User’s Guide for information on how to download and install one before following the instructions below. Create a new folder by typing "mkdir Dex", and "mv DEX.tgz Dex" to move the file to the new folder. See DEX User’s Guide if there is no file called “DEX.tgz”. "cd Dex" to change to the new folder, and expand the file "DEX.tgz", by typing:
Then to extract the tar archive file type:
To compile all the DEX programs (written in C and C++), make sure that FFTW version 2.1.5 is already installed. See the DEX User’s Guide under the title “Downloading fftw-2.1.5 directly from the FFTW website” for more information. You will need the standard gcc and g++ compilers installed for this process. Once fftw-2.1.5 is correctly installed, just type:
where: pathname1 is the pathname to the FFTW include files and
pathname2 is the pathname to the FFTW libraries.
Cygwin Users: If you installed FFTW in the suggested place (/home) just type this:
The programs should now all be compiled properly, and you should be able to run DEX and the analysis programs. If you have trouble finding the right path to the FFTW files and need to run it again, just retype the line above, and it will automatically remove the old files. See the DEX User’s Guide for more information on installing FFTW from the web. (The “./” at the beginning of the command is not always necessary and may be deleted if your setup looks for files in the current directory. It doesn’t hurt to include it though.)
For Cygwin or other platform users
If you type “ls” and you see that some of the files have “.exe” as a suffix, then the program has renamed the files for you and you will need to rename them back with the script Filename-change-script by typing:
Where “.exe” can be changed to another suffix if another one is used and “./” indicates to search in the present folder.
Checking programs for correct installation
If you wish to check the installation of the programs, type:
This will run the program to generate the isotopic abundances (called isotopic-fast-profiles) and the main DEX program to deconvolute the provided example data (called DEX-resample-script). This Run-test1 script will print to the screen the first 10 lines of the newly generated file, temp.profile, and compare it with the standard provided, examples/PKA-C-subunit-profiles. These files should be identical, or nearly so. The script also prints out the first ten lines of the newly deconvoluted file Temp-0min.dat with the standard provided called examples/time-0mincheck.dat. They should also be exactly the same. If the files differ, check to make sure that you have properly installed fftw-2.1.5 (see the DEX User’s Guide under the heading “Downloading fftw-2.1.5 …” for more information) and that you have followed the instructions above. A second check to see that you have the optional graphing program GRACE installed is run by typing:
This will run the program that generates the GRACE graph and automatically brings it up on the screen. The graph should look like this graph. If the graph does not appear or there is a problem generating the graph, then you know that either GRACE is not installed properly or that the alias “xmgrace” is not linked properly in your environment. Please refer to the “Downloading GRACE …” section in the DEX User’s Guide for brief instructions on how to download GRACE.
Running DEX (with examples tutorial)
Update: Some sections (viewing multiple graphs, etc.) have been completely redone as of April 2006. Please download the newest version of DEX to make sure you have the correct version.
Below is a tutorial for running DEX on real MALDI-TOF Mass Spectrometry (MS) data of the C-subunit of PKA in the free state. It should also work with any other high resolution mass spectrometry data such as that of ESI-TOF and FTMS. Please read the descriptions of the method as written in the journal Protein Science:
When running the scripts, typing only the script name in the command line with no arguments will always cause the script to display information about the proper arguments needed to run each script. This is useful for proper execution and helps reduce the amount of information needed for reference.
As of the beginning of 2006, the DEX software package also supports multiply charged peptides such as those found in ESI-TOF mass spectrometry data. This is accomplished by adding the m/z charge after each sequence before calculating the isotopic profile.
Generating isotopic and fast exchanging profiles
The folder “examples” contains sample data to test the software and become familiar with how each part of the package works. The file PKA-C-subunit-sequences contains 5 sequences from the C-subunit of PKA. The first step is to create the isotopic profiles from sequences using the program isotopic-fast-profiles. The general format should look similar to this:
./isotopic-fast-profiles sequence-file new-filename %-deuteration (1,2,3) abundance-file (optional)
To generate model profiles for the tutorial data provided, in the main DEX folder, type:
The sequence file should have all the single letter amino acids in CAPITAL letters for the peptides desired. The m/z or charge of the peptide should follow the sequence by one space. If no m/z value is given, the program assumes a m/z= +1. Other groups are not currently supported, but may be added in the source code which is provided. Argument 4 has 3 options (1,2,3); for isotopic distribution only (1), fast exchanging side chain profile only (2), or both profiles together (3). The abundance-file is optional, as there is a standard one provided and these values will be displayed on the command screen during execution. See the DEX User’s Guide for more information on how to make your own isotopic abundance file. The %-deuteration is ignored when the isotopic distribution only is selected.
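The sequence-file rules described above can be sketched as a small Python check. This is only an illustration and not part of DEX: the function name parse_sequence_line is hypothetical, and the sketch assumes that the supported residues are the 20 standard one-letter amino acid codes and that the charge is written as an integer.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard one-letter codes

def parse_sequence_line(line):
    """Return (sequence, charge) for one line of a sequence file.

    Assumed layout: SEQUENCE in capital letters, optionally followed by
    one space and the charge. A missing charge defaults to +1.
    """
    fields = line.split()
    if not fields:
        return None  # blank line
    sequence = fields[0]
    unsupported = set(sequence) - STANDARD_RESIDUES
    if unsupported:
        raise ValueError("unsupported residue(s): " + "".join(sorted(unsupported)))
    charge = int(fields[1]) if len(fields) > 1 else 1
    return sequence, charge

print(parse_sequence_line("IYRDLKPENL"))    # no charge given, so +1 is assumed
print(parse_sequence_line("SKGYNKAVDW 2"))  # explicit charge of 2
```

Running a check like this on your own sequence file before generating profiles may catch lower-case letters or stray characters early.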
To view the new output file type:
You can use the command “vi filename” to view any file or output for this tutorial. You can exit the vi program by typing “ZZ”. The new output in this case is the combination profile of the isotopic and fast exchanging side chain profiles in the file PKA-C-subunit-profile. This file should be identical to the file named PKA-C-subunit-profiles, which is provided and is in a standard format for other programs to use. If you wish to make a profile with another program, follow this format for use with other DEX programs. In the file, the sequence comes first, followed by the m/z charge. Below the sequence, column 1 is the mass units above the monoisotopic mass; column 2 is the mass for each peak, with the first one being the monoisotopic mass. Column 3 is the percent of the population with that mass, and column 4 is column 3 normalized to the highest peak, which is the value used in DEX. The masses of these 5 peptides range from m/z = 1088 to 1267.
Analyzing protein sequences for the number of backbone amides and fast exchanging side chains
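If you build a profile with another program, the relationship between columns 3 and 4 can be sketched as below. This is a minimal sketch, assuming column 4 is simply column 3 divided by its largest value; the function name and the sample populations are made up for illustration and are not DEX output.

```python
def normalize_to_highest_peak(percent_population):
    """Scale peak populations so that the largest peak equals 1.0."""
    highest = max(percent_population)
    if highest <= 0:
        raise ValueError("profile has no positive peak")
    return [value / highest for value in percent_population]

# Made-up populations (percent) for four isotopic peaks.
example_percent = [50.0, 30.0, 15.0, 5.0]
example_normalized = normalize_to_highest_peak(example_percent)
print(example_normalized)
```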
A separate program and script have been written to automatically calculate the maximum number of backbone amides in each sequence and the number of fast exchanging side chains plus terminal hydrogens that exchange under quench conditions. It will also determine the centroid value for the side chain exchange given a final deuteration percentage in the quench phase. This program is quite helpful to sum all the potential side chain exchanging sites; otherwise the process is laborious and prone to error. To run the script type:
where the first argument is the filename with just the sequences in capital letters (the same file as for generating the model profiles above), the second argument is the new filename for the results and the third argument is the final percentage of deuterium in the quench conditions, written as a decimal fraction rather than a percentage. In this case, the value .045 means 4.5% of the final solution is D2O. The output in the new filename lists the sequence first, followed by a number identifying it in the file. The m/z charge is not displayed in the output. The next two lines are the description and value for the maximum number of exchanging backbone amides, the number of side chain and terminal hydrogens that exchange during quench, the centroid of the fast exchanging side chains given the user-chosen percentage of final deuterium in solution, and finally the maximum total centroid value for the peptide given 100% deuterium conditions during the deuteration part of the experiment. This value can be useful for determining the back exchange correction factor in the control experiments.
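The third argument and the side chain centroid can be illustrated with a short calculation. This is a hedged sketch, not the DEX implementation: it assumes each fast exchanging site is deuterated independently with a probability equal to the D2O fraction, so the expected centroid is the number of sites times that fraction. The function names and the site count of 8 are hypothetical.

```python
def percent_to_fraction(percent):
    """Convert a percentage such as 4.5 into the decimal fraction 0.045."""
    return percent / 100.0

def expected_side_chain_centroid(n_fast_sites, d2o_fraction):
    """Expected number of deuterons on the fast exchanging sites.

    Assumption: independent sites, each deuterated with probability
    d2o_fraction (the mean of a binomial distribution).
    """
    return n_fast_sites * d2o_fraction

quench_fraction = percent_to_fraction(4.5)   # the value to pass as argument 3
example_centroid = expected_side_chain_centroid(8, quench_fraction)
print(quench_fraction, example_centroid)
```

Compare any such estimate against the values the script itself reports, since the script's exact calculation is documented in the DEX User’s Guide rather than here.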
Deconvoluting the experimental MS data
The next step after generating the theoretical model profiles is to deconvolute the MS data. The five files beginning with “time” in the “examples” folder are MALDI-TOF data for 5 time points, which differ in the length of deuteration before quench. Each file contains all the sequences at that time point. The MS data files must contain at most three title lines at the top. The deconvolution program will only read ASCII txt files with two columns of information. The first column contains the m/z and the second column has the value at that specific m/z. Any files with more than 2 columns should be reduced to these two before running this script. To deconvolute one file (time point) with all the sequences in a model profile, one should follow this format:
./DEX-resample-script model-spectrum MS-spectrum new-output-file interval(optional)
To deconvolute time-0min.txt from PKA-C-subunit-profile in the tutorial example, type:
The interval is the spacing between points that determines the resampling level. For data with a m/z charge of +1, an interval value of 0.100 is normal, but it may be made lower. Interval values that are supported in this software include 0.020, 0.025, 0.0333, 0.040, 0.050, 0.100, 0.200 and 0.250. Values that are lower than the spacing in the MS experimental data will not work at the moment. For the average user, the interval value does not need to be explicitly stated, as the program will choose an appropriate one for each m/z charge value of the model peptides. The program will look at the m/z of the model spectrum and automatically determine the correct part of the MS file to extract. The 0 minute quench only file is a good spectrum to start with, and the model spectrum is the file PKA-C-subunit-profile generated above.
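The two preparation steps described above (reducing a file to two columns and checking which intervals are not lower than the data spacing) can be sketched in Python. This is an illustration under stated assumptions, not part of DEX: the function names are hypothetical, the sample lines are made up, and it assumes the m/z and intensity are the first two columns of the exported file.

```python
SUPPORTED_INTERVALS = [0.020, 0.025, 0.0333, 0.040, 0.050, 0.100, 0.200, 0.250]

def reduce_to_two_columns(lines, title_lines=0):
    """Keep title lines as they are and only the first two columns of data lines."""
    reduced = []
    for index, line in enumerate(lines):
        if index < title_lines:
            reduced.append(line.rstrip("\n"))
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        reduced.append(fields[0] + " " + fields[1])
    return reduced

def median_spacing(mz_values):
    """Median gap between consecutive m/z points."""
    gaps = sorted(b - a for a, b in zip(mz_values, mz_values[1:]))
    return gaps[len(gaps) // 2]

def usable_intervals(data_spacing, tolerance=1e-9):
    """Supported intervals that are not lower than the data spacing."""
    return [step for step in SUPPORTED_INTERVALS if step >= data_spacing - tolerance]

# Made-up three-column export with one title line.
raw_lines = ["Example title line",
             "1087.10 427.7 0.5",
             "1087.15 586.5 0.4",
             "1087.20 722.8 0.6"]
two_column = reduce_to_two_columns(raw_lines, title_lines=1)
mz = [float(line.split()[0]) for line in two_column[1:]]
print(two_column)
print(usable_intervals(median_spacing(mz)))
```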
The new deconvoluted data file “time-0min.dat” can be compared with the correct standard provided in the examples folder by typing:
“diff examples/time-0min.dat examples/time-0mincheck.dat” and no lines should be displayed as a result. The DEX-resample-script will resample the MS data on the run according to each peptide’s required interval spacing. In this case, the interval 0.100 was used. This will create the deconvoluted output file time-0min.dat for all five sequences in the model profile. The first 6 lines of the DEX deconvoluted output file are shown below:
M/Z model exp_spec weights
1087.1580 0.000 427.70 0.00
1087.2579 0.000 586.49 0.00
1087.3579 0.000 722.81 0.00
1087.4580 0.000 436.49 0.00
(If the numbers look slightly off, you have an old version of DEX; please install the new one, as it has added functionality.) The output begins with the sequence, a space and then the m/z charge. The next line contains the 4 column headers. Column 1 is the m/z mass values; column 2 is the model (theoretical) spectrum, which is the 4th column in the model profile file in fraction form. Column 3 is the resampled values from the experimental MS spectrum, according to the interval used for that peptide, and column 4 is the deconvoluted values (populations) at the same m/z, on the same vertical scale as column 3. This output file format is read by all the analysis programs. After running this script, the last two lines will be options to automatically view all the results and analytically analyze them. Copying these commands into the command line should allow you to easily check the results, but these features will be explained later in this tutorial.
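If you want to read the deconvoluted output in your own scripts, the layout described above can be parsed roughly as follows. This is a sketch under assumptions, not a DEX tool: it assumes each mass envelope starts with a line whose first field is purely alphabetic (the sequence), followed by the header line and four-column numeric rows. The function name and the sequence PEPTIDEK are hypothetical; the two numeric rows are taken from the listing above.

```python
def parse_dex_output(lines):
    """Split DEX-style output into envelopes of (m/z, model, exp_spec, weights) rows."""
    envelopes = []
    current = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0].isalpha():
            # Sequence line: sequence, then (optionally) the charge.
            current = {"sequence": fields[0],
                       "charge": fields[1] if len(fields) > 1 else None,
                       "rows": []}
            envelopes.append(current)
            continue
        try:
            row = tuple(float(value) for value in fields)
        except ValueError:
            continue  # header line such as "M/Z model exp_spec weights"
        if len(row) == 4 and current is not None:
            current["rows"].append(row)
    return envelopes

sample = ["PEPTIDEK 1",
          "M/Z model exp_spec weights",
          "1087.1580 0.000 427.70 0.00",
          "1087.2579 0.000 586.49 0.00"]
parsed = parse_dex_output(sample)
print(parsed[0]["sequence"], len(parsed[0]["rows"]))
```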
Instead of running the DEX-resample-script each time to deconvolute one file, DEX-master-script will run all the files in a folder with a given model sequence. To run the rest of the experimental MS .txt files in the “examples” folder type:
where txt is the suffix of the files you want to deconvolute, “examples” is the folder where the experimental MS spectra are located and examples/PKA-C-subunit-profile is the model profile file. Running this deconvolutes all the files in the folder you specified, with the new filenames being the same as the experimental names but with the suffix “dat” replacing the suffix searched for in argument 1. The output files have the same format for both DEX-resample-script and DEX-master-script.
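The naming rule for the output files can be sketched as below, which may help when checking that every input file produced an output. The function name is hypothetical and the sketch only mirrors the rule as described here (replace the searched-for suffix with “dat”); it does not call DEX.

```python
import os

def deconvoluted_name(ms_filename, suffix="txt"):
    """Return the expected output name, or None if the suffix does not match."""
    root, extension = os.path.splitext(ms_filename)
    if extension != "." + suffix:
        return None  # this file would not be picked up for that suffix
    return root + ".dat"

print(deconvoluted_name("examples/time-0min.txt"))
print(deconvoluted_name("examples/PKA-C-subunit-profile"))
```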
Viewing original and deconvoluted spectra
(Note: GRACE must be installed prior to running any of these scripts for viewing the results. See the top of this page for more information on installing GRACE. You have correctly installed GRACE and can run these scripts when you type “xmgrace” and an empty graph automatically pops up on the screen). (Note: If you are running Cygwin, you should have typed “startx” to bring up the graphical window before proceeding with the following commands in the new window). To view the original MS spectra, without any resampling, type:
where argument 1 and 2 are the lower and upper m/z bound, respectively, for the graphing window, and examples/time-0min.txt is the original MS spectra you want to view. Executing this command will automatically pop up the GRACE window with the spectra with argument 1 and 2 for x axis range. Click here to see an image of the result. You can use the GRACE interface to change the x or y axis (push the “AY” button to the left to auto scale the y axis), and view all the spectra within the original limits of the MS file (push the “As” button on the top left to auto scale in all directions). This will allow the user to see any overlapping regions not shown in the deconvoluted graphs as discussed below.
Viewing the deconvoluted results is of more interest, and there are two ways to view them. You can choose to view one sequence in a window at a time, cycling through each sequence. Alternatively, you can choose to view up to 8 graphs at once, stacked next to each other in a single window (which is useful for watching changes over a series of experiments).
To view the sequences one at a time, use the script:
where (all/one) identifies if you want to view only one spectrum, or all of them in the deconvoluted spectra file. You can select the sequence you want to view by searching for a specific sequence, mass (rounded down to nearest whole number) or number in the file, and argument 4 must be the correct identifier for the search type. You can automatically print out the files to a user chosen printer, but this is optional. The printer-name is required if “hardcopy” is written for argument 5. If the first argument is “all”, then the user must use the “num” as the sequence identifier in argument 3. It will display the nth sequence in the file (argument 4), and after the user closes the GRACE window, the next sequence in the file will automatically pop up. This is a very efficient way to view all the sequences in the file. You can not search by mass or sequence when choosing to view “all” the files. The user may save the graphs to a GRACE file and create PostScript or many other types of images of the results with the GRACE window interface.
To view all the sequences in the time-0min.dat file, starting with the 2nd sequence in the file, type:
This will automatically bring up the first GRACE window, which has the 2nd sequence in the file examples/time-0min.dat, sequence IYRDLKPEN. Click here to see an image of what the graph should look like. The black solid line represents the original MS spectra resampled by the interval (either user-defined or automatically determined), while the red dashed lines represent the deconvoluted results. If just the isotopic model profile was deconvoluted, then the results are the total deuteration (side chain plus backbone exchange for MALDI data). If the isotopic and fast exchanging side chain profile was the model to extract, then the red dashed lines represent the backbone amide populations for MALDI data. ESI data may not have side chain deuteration, and so only the isotopic model needs to be deconvoluted from the experimental data. The sequence is displayed as the title, and next to the sequence is the sequence’s number in the file. Below the sequence the monoisotopic peak m/z is displayed, the m/z is shown, and finally, the filename where the graph came from. The monoisotopic peak is set 1.5 mass units from the x axis minimum, and any peak before this could indicate a contamination. The x axis range is roughly 1.5 times the sequence length, but the values higher than the end of the mass envelope have no mathematical impact on the mass envelope’s values on the lower side. The presence of negative values in the red dashed lines could suggest a mismatch between the theoretical profile and what is observed in the MS file. To observe the next sequence in the file, simply close the GRACE window, and the next mass envelope automatically comes up. The title again displays the information about this sequence, and the graph can be visually analyzed for accuracy. Continue clicking through all the sequences until you have finished, or return directly to the command line by typing “Ctrl c”. A message will show on the command line that all envelopes were viewed.
Viewing multiple graphs
This section has been completely redone as of April 1, 2006. Please download the newest version of DEX to make sure you have the correct version.
Viewing the same sequence over a series of different time points and seeing the progression of the deuteration exchange can be very helpful in analyzing the data. Viewing multiple spectra at once is very easy and customizable using many user-specified options. There are many ways to customize the files and spectra used, so this tutorial will only show a few examples. For more information about the options for selecting and viewing multiple spectra, see the Command-Line-Options webpage and look under “View-many-HD-spectra-script”. To view all time points or trials of the same sequence use this format:
where sequence/number/mass is the method of identifying the sequence to display and the identifier is the sequence, monoisotopic mass or number in files to view (like the search method used above). After the sequence identifier, up to 8 filenames must be added to the command line. If more than 8 filenames are added, only the first 8 selected will be used in all cases. To view all five time points in alphabetical order (0, 1, 2, 5 and 10 minutes deuteration) in one stacked window for sequence SKGYNKAVDW, type:
This will result in a GRACE window selecting sequence SKGYNKAVDW (m/z=1167.57) from all the files found to start with “time” and end with “min.dat” in the examples/ folder. Click here for an image of this result. This program will display the files in alphabetical order, with the legend displaying the filename for each graph. The sequence is shown at the top, along with the monoisotopic mass and mass/charge ratio. Notice that the 10 minute time point is before the 1, 2 and 5 minute time points. If you want to look at this sequence in chronological order, then you will need to either explicitly add all the files on the command line or make a file that contains a key to all the files. This file is already made for the example data, and is called Examples-time-key.list in the examples/ folder. Here are the contents of the file Examples-time-key.list:
# Filename Time Description Line_num
time-0min.dat 0 F_0m 1
time-1min.dat 1 F_1m 1
time-2min.dat 2 F_2m 1
time-5min.dat 5 F_5m 1
time-10min.dat 10 F_10m 1
The first line must begin with a # if you want it to be a description of the columns, but it may be omitted in your own files. If it is omitted, the columns will be assumed to be in the order shown above. The 2nd to 4th columns describe the time of deuteration, the arbitrary description of the data, and the line number (for centroid, peak width and population graphs), respectively. Only the filenames need to be in the list; the other information is not needed to select certain files. Suppose you would like to use all the files here in chronological order and have the legend be the description, not the filename as was used earlier. You will need to use the --filelist FILELIST option to tell the script to read the filenames from the file Examples-time-key.list. Do this by typing:
The prefix-for-files examples/ is necessary to tell the program where the filenames in Examples-time-key.list are located. Now the user can view the progression of the deuteration in the sequence SKGYNKAVDW in chronological order. Click here for an image. The filenames have been replaced by the Description column in the file list for the legend of each graph. In this example, the backbone deuteration (red dashed lines) shows at least 4 deuteron sites that exchange by the 10 minute time point.
There are many more options the user can choose to automatically view any list of spectra once the filelist is generated. See the Command-Line-Options webpage and look under “View-many-HD-spectra-script”. Here is a more complex search:
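The difference between alphabetical and chronological ordering, and the layout of the key list, can be illustrated with a short Python sketch. The parser below is hypothetical and only follows the column order shown in Examples-time-key.list (filename, time, description, line number); it is not how the DEX scripts read the file.

```python
KEY_LIST = """\
# Filename Time Description Line_num
time-0min.dat 0 F_0m 1
time-1min.dat 1 F_1m 1
time-2min.dat 2 F_2m 1
time-5min.dat 5 F_5m 1
time-10min.dat 10 F_10m 1
"""

def parse_key_list(text):
    """Read key-list lines; only the filename column is required."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or column description
        entry = {"filename": fields[0]}
        if len(fields) > 1:
            entry["time"] = float(fields[1])
        if len(fields) > 2:
            entry["description"] = fields[2]
        if len(fields) > 3:
            entry["line_num"] = int(fields[3])
        entries.append(entry)
    return entries

entries = parse_key_list(KEY_LIST)
alphabetical = sorted(entry["filename"] for entry in entries)
chronological = [entry["filename"]
                 for entry in sorted(entries, key=lambda entry: entry["time"])]
print(alphabetical)    # time-10min.dat sorts before time-1min.dat
print(chronological)   # ordered by the Time column instead
```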
Here is the result, with the second file (--file_num 2) chosen from the Examples-time-key.list, all the spectra that were selected are from just one file (--one_file), the legend is now the sequence (--legend SEQ) and the title is called “1min.spectra”. Click here for an image.
Creating a file of all trials of one sequence
This section has been completely redone as of April 1, 2006. Please download the newest version of DEX to make sure you have the correct version.
Often it is helpful to put all the trials and time points of one sequence into one file. For example, suppose you just want to look at the deconvoluted result from sequence IYRDLKPENL (m/z=1260.70), which is the 5th sequence in each of the five time point files in the examples folder. By using Grab-1-peptide-script, it is possible to copy each deconvoluted mass envelope from all the files into one file for viewing or analysis. This is done by typing:
where sequence/number/mass is the method of identifying the sequence to display and the identifier is the sequence, monoisotopic mass or number in files to view (like the search method used above). For Grab-1-peptide-script, you can put many filenames into one file (not just up to 8 as for viewing many spectra). There are several other options that allow you to specify the output, see the Command-Line-Options webpage for more information under the “Grab-1-peptide-script” program.
You can use this program to directly input the files from the command line, or use a filelist key, similar to the process above for viewing many spectra. To just grab all the deconvoluted files with the sequence IYRDLKPENL in the examples/ folder that begin with “time” and end with “.dat”, type:
With this program, any file that is not found or does not have the correct format is skipped and the next file is automatically read. If you do not specify an output, the program will output the results to the screen. If you want to output the results to a file, either type “> output_file” at the end of the commands, or use the option --file_output FILE_OUTPUT, where FILE_OUTPUT is the name of the file you want to store the results to.
You can use the same prepared filelist key that was used above (Examples-time-key.list) to explicitly use the files you want. To do this, you should use the --filelist option as well as the --prefix-for-files option to specify the location of the files, as done in the viewing many spectra section. Here is an example to look at the same files as above, but in the order specified by the filelist (chronological order) and place the results in a new file called mass1260.dat.
The header line in the Examples-time-key.list file is ignored, as are any other columns other than the first one (which must have the filenames). All the files in the filelist are used in the order of the appearance in the file, as long as they are valid files with the right format.
Analyzing the deconvoluted results
This section has been completely redone as of May 23, 2006. Please download the newest version of DEX to make sure you have the correct version.
Other than viewing the spectra for comparison, an analysis of the centroids (first moment averages), the population of each peak and the peak widths is useful to quantify the difference between experiments or trials. A script has been created to help you automatically and reliably determine these values in both text and visual formats. The program contains a list of options that make it customizable and make it quick to reproduce and vary the calculation for rapid analysis. It relies on many of the same options used in the script View-many-HD-spectra-script and requires that the mass spectrometry data is first processed by DEX-master-script or DEX-resample-script and that the results be in the DEX format (such as the file time-1min.dat). For more information about the options for selecting and analyzing many spectra, see the Command-Line-Options webpage under the “HD-analysis-script” program heading.
The basic arguments required for displaying the centroids or peak widths using GRACE and for outputting the results in a text format are:
./HD-analysis-script --program (cent,width) --sequence/number/mass identifier filenames(up to 100)
./HD-analysis-script --program cent --sequence/number/mass identifier --filelist (FILELIST_NAME) --prefix-for-files (PREFIX)
Here the --program cent option will display the centroids, while the --program width will display the peak widths. For example, to display the centroids for both the original observed mass spectrometry data (resampled to the interval previously chosen) and the deconvoluted backbone amide values for the fourth peptide (M(0)=1194.64) data in the examples/ folder using the provided key list, type:
The resulting graph should automatically pop up displaying this information (click here for an image). Here, the filled squares represent the Observed (resampled) centroids for the sequence DRIKTLGTGSF while the open circles represent the deconvoluted centroids, where the natural isotopic abundance and (in this case) the side chain deuteration is removed. The “1” in the legend refers to the “Line_num” column in the key Examples-time-key.list, and separates these values from other line numbers. On the bottom of the graph, below the x-axis, the filenames of each file show where the data came from, while the sequence and its information are shown at the top in the title. The x-axis is Time in this case (taken from the “Time” column in the key file), with Deuterons (relative to the M(0) for each value) on the y-axis. Notice the continual difference between the two values, with the deconvoluted results lower by about 1.0 Deuteron at each time point. This is due to the removal of the offsets due to both the natural heavier isotopes as well as the side chain deuteration (explained above). This offset is approximately 1.35 D. Also notice that the 0 minute deuteration (quench only) is 1.24 for the observed, but 0.03 for the deconvoluted values, which is consistent with our model within error. To see and change the exact values shown on the graph, double click on a line and then double click on one of the sets. You can change the values if you find one incorrect by retyping a value and clicking on another value to save it.
The numerical data that is displayed in the graph is stored in a file called “x.env_temp” where “x” is the Line_num from the key list. In this case, the centroids are displayed in the file “1.env_temp”. Type “vi 1.env_temp” to view it. The file and sequence from which each centroid was derived is preserved as a comment. The file that contains all the Grace parameters for the Grace program to read is called “grace_param_analysis”, but it may not be of much value to those only interested in the output values. The file that contains all the information used to generate the centroids is called “HD_analysis_output”, and it shows the lower and upper bounds used to determine each peak population from which the overall centroid for that peptide is determined.
Note: It is important to check that the centroids were calculated as you desire, as the program has difficulty getting the correct value with low signal-to-noise or overlapping peptides. (Details about this format are described below under “Calculating the Centroids, Peak Widths and Populations”.)
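When checking a centroid by hand, the first moment average relative to M(0) can be computed as in the sketch below. This is a simplified illustration, not the HD-analysis-script algorithm: DEX first determines peak populations between lower and upper bounds, whereas this sketch takes one m/z value and one weight per peak. The function name and the three-peak example are hypothetical.

```python
def centroid_in_deuterons(mz_values, weights, monoisotopic_mz, charge=1):
    """First moment of the weights, in mass units above M(0).

    The m/z offset is multiplied by the charge so that a multiply
    charged envelope is still reported in whole-deuteron units.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("no positive population to average")
    first_moment = sum(weight * (mz - monoisotopic_mz)
                       for mz, weight in zip(mz_values, weights))
    return charge * first_moment / total

# Made-up envelope: three peaks spaced 1 mass unit apart, populations 1:2:1.
example_centroid_d = centroid_in_deuterons([1000.0, 1001.0, 1002.0],
                                           [1.0, 2.0, 1.0],
                                           monoisotopic_mz=1000.0)
print(example_centroid_d)
```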
If you don’t want to make a key list (highly recommended if you want to graph multiple lines or have complex search options), you can just type in the files and use the --find_time option to have the program find the time in the filename (it must be in the same place from the beginning for every file). To find and graph the same files as above, type:
This reveals the same information as above (shown here again). Suppose, using the key Examples-time-key.list, you want to display the centroid values for just the deconvoluted data on the graph, display the x-axis in logarithmic scale, label each point with the “Description” column in the key list, display the centroid values above each point, start with the second file in the key (time-1min.dat) and use a back exchange factor of 33%. To do this type:
Click here for an image. Now, the values are almost in a straight line, with the values displayed above each deconvoluted centroid. The description is shown on the x-axis for the label (from the “Description” column in the key list). Simply change the “--number 4” to “--number 3” to view the previous sequence (SKGYNKAVDW) in each of the files. You can see how easy and fast it is to generate centroid plots for all the data you collect, with suitable customization for every application. There are many other options, and a complete list can be seen in the Command-Line-Options webpage under “HD-analysis-script”.
You can also use a single file and display the centroids for all the sequences in that file. This is helpful if you have used Grab-1-peptide-script and have combined all the information from one peptide into one file or you just want to see the disparity between values for different sequences at the same time point. For example, suppose you want to view the centroids for all the peptides in the 0 minute deuteration (quench only) and want to display the sequence of each as the descriptor below the x-axis. Just type:
The result is shown here. Now the centroids of all the sequences of each peptide are shown. There is no time scale for the x-axis, so the files are just displayed using an index starting at 1. The file is displayed on the top, as well as the first sequence in the file (along with its M(0)). Notice that all the “Observed” values are between 1.24 and 1.61, which is approximately the amount the natural isotopic abundance and fast exchanging side chains contribute to the offset under the quench only conditions for each sequence. The “Deconvoluted” values are all at 0.0, showing how the natural offsets have been effectively removed by the deconvolution. Change the file to “time-1min.dat” and you will quickly have all the values for the same sequences after 1 minute deuteration. A comparison shows the variety of exposure levels between different peptides in a protein. Click here for an image.
Peak widths can be displayed in much the same way as the centroids. The peak widths are for distinguishing EX1 from EX2 kinetics, which is not always necessary or important for your system. Replace “--program cent” with “--program width” in any of the previous commands and the peak widths will be displayed. For example, take the first command in this section and change “cent” to “width” as below:
Click here for an image. The peak widths are measured at 20% of the maximum height (the default value), but the x-axis label and title are the same as in the first centroid image. Usually, the values are fairly constant after 0 minutes for EX2 kinetics, but rise and fall for EX1 kinetics. In this case, the data is constant from 1-10 minutes, which is consistent with EX2 kinetics.
The text output for the values displayed in the graph is again in “x.env_temp”, while the text output for all the populations, centroids and peak widths is again in “HD_analysis_output”. For the same spectra, this file will be identical to the centroid output, except that the additional peak width information appears below the centroid value. The peak width is determined by linear interpolation of the populations in the “%Weights” column. (Details about this format are described below under “Calculating the Centroids, Peak Widths and Populations”.)
The user may specify the percentage of the maximum height at which the peak width is calculated with the option --percent_width x%. For example, to use the same sequence and files with the peak width calculated at 40%, type:
Click here for an image. Notice that the pattern is similar for both thresholds at the same time points, except that the higher threshold (40% vs. 20%) gives a narrower width, as expected. If you are not sure what value is best for your data, you can have the two default peak widths (20% and 50%) as well as the user-specified value displayed at once. To do this for the 4th sequence, but only for the Observed data, add the “--width_output all” option to the line.
This will show only the “Observed” data for the three values (20%, 40% and 50%) in different colors and symbols. Click here for an image. You can now use this to choose a suitable value, though 20% is suggested unless there is an overlap. To read the complete list of options, click on Command-Line-Options webpage and look under “HD-analysis-script”.
Population graphs (coming soon)
The population graphs will be similar in function to the multiple graphs, but will show only the peak on an adjusted (normalized) scale. This will make it easy to see which populations are present in the data as well as see what values were used for the centroids and peak width calculations.
Calculating the Centroids, Peak Widths and Populations
The centroids and peak widths are calculated by first determining the population of each peak from the beginning of the profile to the end of the significant peaks. The beginning almost always starts with the monoisotopic peak, but the end is variable. With good data and no overlapping spectra, the end is where the signal drifts to the baseline. When the signal never drifts to the baseline or another profile causes contamination, the program attempts to make the best estimate of the cutoff for the upper end of the profile. At times, the user, who may have extra information, may disagree with the cutoff determined by the program and may need to compute a centroid or peak width manually. This can easily be done from the population values given in the output file HD_analysis_output. For the first centroid command, here are the first 13 lines of the text output from the file HD_analysis_output:
File: time-0min.dat Sequence: DRIKTLGTGSF
OBSERVED DATA is below
Background  Std.Dev.  min     max     ALIGN_OFFSET  interval
665.23      180.46    287.00  655.41  1.5000        0.1000
Peak  Mass-Range            Area      Peak/Noise  Percent  %Weights
0     1194.5481  1194.8481  43403.30  16.3        27.60    27.94
1     1195.5481  1195.8481  56205.86  21.1        35.74    36.18
2     1196.5481  1196.8481  35312.77  13.3        22.45    22.73
3     1197.5481  1197.8481  16113.79  6.1         10.25    10.37
4     1198.5481  1198.8481  4332.52   1.6         2.75     2.79
5     1199.4481  1199.7481  1243.08   0.5         0.79
6     1200.4481  1200.7481  659.34    0.2         0.42
Centroid= 1.24 for all %Weights listed
The file and sequence are displayed first, followed by a note that the observed data is below. Information about the average background noise level is given on lines 3 and 4. The background noise is determined by averaging the signal values between the peaks over the entire spectrum.
On line 5, the columns are named for lines 6-12. The first column, “Peak”, is the offset from the M(0); an offset of 0 is the monoisotopic peak. Columns 2-3 are the lower and upper bounds, in mass units, used for the integration of each peak. If these bounds are not aligned properly with the peaks, the resulting values may be off as well. Column 4 is the total “Area” of the peak above the average background noise: the integration of the points between the lower and upper bounds minus the average background noise. Column 5, “Peak/Noise”, is an indicator of the quality of the data. It is the area divided by the product of the average background noise and the number of points integrated. Column 6, “Percent”, is the first attempt at determining the normalized percent weights for each peak offset. By design, this column is quite loose with the upper bound of the envelope, and for low Peak/Noise data it sometimes includes all the peaks over the entire envelope. Column 7, “%Weights”, holds the values from which the centroids, peak widths and populations are determined. The program uses its tuned algorithm to determine the proper lower and upper bounds of the profile so that an accurate centroid value can be calculated.
Line 13 is the calculated “Centroid” value from the %Weights (column 7). This is the value used for the Grace graphs when displaying the centroids. At line 14, there is a break and then the next set of data is shown. If both the observed and deconvoluted data were specified, the second profile will be for the same spectrum as the first, but it will hold the deconvoluted data (dashed red line on spectra graphs). Otherwise, it will be the next file and sequence.
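The column relationships described above can be checked against the example output with a short script. This is only a sketch in Python and is not part of the DEX distribution. The variable and function names are made up for illustration. It assumes that each integration window includes both end points (4 points for a 0.3 mass-unit window at a 0.1 interval), and it takes the %Weights cutoff (peaks 0-4) from the output rather than trying to reproduce the program's cutoff algorithm.

```python
# Areas (column 4) for peaks 0-6, copied from the example output.
areas = [43403.30, 56205.86, 35312.77, 16113.79, 4332.52, 1243.08, 659.34]
background = 665.23                   # average background noise (line 4)
interval = 0.1                        # spacing between data points (line 4)
lower, upper = 1194.5481, 1194.8481   # mass range of peak 0 (columns 2-3)

# Assumption: the integration window includes both end points.
n_points = round((upper - lower) / interval) + 1


def peak_to_noise(area):
    """Column 5: area divided by (average background * number of points)."""
    return area / (background * n_points)


def normalize(values):
    """Scale a list of areas so that the values sum to 100 percent."""
    total = sum(values)
    return [100.0 * v / total for v in values]


percent = normalize(areas)       # column 6: every listed peak
weights = normalize(areas[:5])   # column 7: peaks 0-4, the cutoff used here

for offset, area in enumerate(areas):
    w = f"{weights[offset]:.2f}" if offset < len(weights) else ""
    print(offset, f"{peak_to_noise(area):.1f}", f"{percent[offset]:.2f}", w)
```

For this example, the printed Peak/Noise, Percent and %Weights values agree with columns 5-7 of the output to the displayed precision.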
If the user disagrees with the cutoff range used for the “%Weights” column, the automatically calculated centroid will differ from the desired value by an amount that depends on the disparity. In this example, a user who wants to include the 5th peak above the M(0) would have to recalculate the centroid using the “Percent” column. Determining the first moment centroid for peaks 0-5 for this data results in a centroid value of 1.27, which is quite close to the automatically calculated value of 1.24 in this case. In other cases, the difference may be considerable. Generally, the calculated values are very accurate for high to moderate quality data, but the accuracy declines with poor quality data. It is suggested that you check all the values both visually and against the lower and upper bound limits where appropriate.
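A manual recalculation of this kind can be sketched in a few lines of Python. The script is illustrative only and is not part of DEX. The function name first_moment_centroid and its last_peak parameter are hypothetical. The population values are copied from the “Percent” and “%Weights” columns of the example output.

```python
# Populations copied from the example output (peak offsets 0, 1, 2, ...).
percent = [27.60, 35.74, 22.45, 10.25, 2.75, 0.79, 0.42]   # column 6
weights = [27.94, 36.18, 22.73, 10.37, 2.79]               # column 7


def first_moment_centroid(populations, last_peak=None):
    """First moment of the populations for peak offsets 0..last_peak.

    The selected values are renormalized by their own sum, so the
    result does not depend on how the column was scaled.
    """
    if last_peak is None:
        last_peak = len(populations) - 1
    selected = populations[: last_peak + 1]
    total = sum(selected)
    return sum(offset * value for offset, value in enumerate(selected)) / total


automatic = first_moment_centroid(weights)            # program's cutoff
manual = first_moment_centroid(percent, last_peak=5)  # user's cutoff
print(f"automatic = {automatic:.2f}, manual = {manual:.2f}")
```

With the values above, the two results round to 1.24 and 1.27, matching the numbers quoted in the text.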
When using the peak width option, additional information is written to the HD_analysis_output file. For the first peak width command above, here is the extra output (lines 15-16) that follows what is displayed above for the centroids:
20% Percent linear width: From 0.000 to 3.414 width= 3.414 bumps= 0
20% Weights linear width: From 0.000 to 3.414 width= 3.414 bumps= 0
The peak width is found by locating the lower and upper bounds where the linear interpolation of the profile equals the user-defined percentage of the maximum. Normally, this is 20% of the maximum peak in the spectrum. Line 15 is the linear interpolation of the “Percent” values (column 6), showing the lower and upper bounds and the width. If a point within this range falls below the threshold, it is reported as a bump. Here the linear interpolation of the profile is smooth and there are no bumps. Line 16 is the “%Weights” peak width, with the same information as the previous line. Since measuring the width at a threshold is a way of excluding noise, the two peak widths are often the same, as is the case here. If the user uses the --width_output option to display other threshold percentages, those will also be listed here before the next data set is displayed.
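The interpolation can be illustrated with a small Python sketch. It is not the DEX source code. The function name linear_width is hypothetical, and the way bumps are counted (points between the bounds that fall below the threshold) is an assumption based on the description above. The profile values are the “Percent” column of the example output.

```python
def linear_width(profile, percent=20.0):
    """Return (lower, upper, width, bumps) at a percentage of the maximum.

    The bounds are given in peak-offset units, found by linear
    interpolation between neighbouring peaks.
    """
    threshold = max(profile) * percent / 100.0
    above = [i for i, v in enumerate(profile) if v >= threshold]
    first, last = above[0], above[-1]

    # Lower bound: interpolate up to the threshold, unless the profile
    # already starts above it (then the bound is the first peak).
    if first == 0:
        lower = 0.0
    else:
        a, b = profile[first - 1], profile[first]
        lower = (first - 1) + (threshold - a) / (b - a)

    # Upper bound: interpolate down to the threshold after the last
    # peak that is still above it.
    if last == len(profile) - 1:
        upper = float(last)
    else:
        a, b = profile[last], profile[last + 1]
        upper = last + (a - threshold) / (a - b)

    # Assumed bump count: points inside the range below the threshold.
    bumps = sum(1 for v in profile[first:last + 1] if v < threshold)
    return lower, upper, upper - lower, bumps


percent_column = [27.60, 35.74, 22.45, 10.25, 2.75, 0.79, 0.42]
lower, upper, width, bumps = linear_width(percent_column, 20.0)
print(f"From {lower:.3f} to {upper:.3f} width= {width:.3f} bumps= {bumps}")
```

For the “Percent” column this gives a range from 0.000 to about 3.414 with no bumps, in agreement with the output above. Calling linear_width(percent_column, 40.0) gives a narrower width, as expected for a higher threshold.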
Looking for the old version of HD-analysis-script? As of May 23, 2006, it has been removed from the standard tutorial, and it is not being updated to reflect the more sensitive centroid and population analysis. The program is still included in the DEX distribution under the name “HD-analysis-script-old”. You can see the documentation for that here.
Congratulations, you have now finished the tutorial! Hopefully all programs worked properly, and you are now on your way to running your own files and discovering new results. Good luck!
Written by Matthew Hotchko and last updated on May 24, 2006
Please contact: Matthew Hotchko at ccms-help at sdsc.edu for questions regarding the DEX deconvolution package or questions about installing and running DEX or any problems with the software when running the example tutorial provided.