Sedimentation Equilibrium Programs
Programs for Analysis of Sedimentation Equilibrium Data

This is a minimalist text version of the manual, with the figures and special formatting removed. A full version is available as MANUAL.RTF in RTF format, readable by MS WORD on both Macintosh and MS-DOS machines.

Greg Ralston, Department of Biochemistry, University of Sydney
© G.B. Ralston, 1994

Introduction

This group of programs is intended for analysing data from the photoelectric absorbance optical system of the Beckman XL-A Analytical Ultracentrifuge, through the calculation of the distributions of the Omega function and apparent weight-average molecular weight. These programs have been designed to be as closely compatible with one another as possible, and also to be compatible with the pre-existing programs for the XL-A. Data files generated by the programs described here have been designed, as far as possible, to be compatible with other programs developed by others for XL-A data.

The programs will run on IBM-compatible computers (XT, AT, and 386 or 486 machines). Although a maths coprocessor is not mandatory, its use will speed operation considerably. Although the programs will run without a printer connected, a printer is highly desirable. Graphics may be printed if the MS-DOS GRAPHICS program is loaded in advance and the key combination for printing the graphics screen is used. All data files are saved in text format, and may be viewed or printed with the DOS TYPE or PRINT commands. The hardware display must be at least of CGA resolution, but EGA and VGA are also supported and are preferred because of their higher resolution.

Sections of the fitting programs related to assessing goodness of fit were closely based on programs written by Dr. Allen Minton, for whose help I am most grateful.

Some test data is available for trying out the programs. The simulated data sets entitled TEST*.RA1, placed on the RASMB network by Dr. Allen Minton, have been used as a basis for preparing secondary data files with the "run number" apmn (e.g. MapmnA1.DAT). Experiment with these files and with real data. These programs are in a very preliminary stage, and I would greatly appreciate comments and feedback.

The XL-A Programs

Run identification

The massive amount of data that can be collected in one experiment requires careful planning of file names. There are three cells, each potentially with three channels. Three different wavelengths can be examined in any one scan, and a number of scans can be made at different times: e.g., at 4 h intervals to check for equilibrium; at different temperatures; at different speeds. One set of data must be distinguished uniquely from another with regard to the variables: cell, channel, wavelength, time, speed, sample and temperature. The XL-A only allows 4 characters to designate a scan, appending 4 of its own to indicate cell number, Abs/Transm, and wavelength. However, once data have been selected for saving and further analysis, the XL-A allows 8 characters to distinguish most of the run variables, with the characters in the extension automatically referring to Radial/Wavelength, Abs/Transm, and cell number. Absorbance measured in radial scans will be used exclusively in this work, so these options need not be distinguished. The following is suggested as a possible means for naming files that will be created by XLAMEN or XLAMW.

Run number: 4 digits will take us quite a way into the future. At one run per day, it will take 30 years to exhaust the run numbers.
After that, we can start again at 0001.

Run conditions: All of the condition variables such as time, temperature, wavelength, etc., are going to be much more variable. It would be sensible to assign the letters "a" to "z" to arbitrary "scans" that might have different conditions (e.g., time, temperature, speed...), all of which will be recorded under the run number in the log book.

Cell number: An additional digit (1-3) can be used to identify the cell number.

Channel: The numerals 1-3 are used to designate the channel number. The data from three separate channels of a given cell may be stored sequentially in a single data file created by XLAMEN or XLAMW. Data from each channel is labelled with the channel number as part of the file header.

File name: The overall file name should thus be along the following lines: MnnnnXY (.BLN) or OnnnnXY (.BLN), where 'M' stands for 'Molecular weight data', 'O' stands for 'Omega data', 'nnnn' designates the run number, 'X' the particular scan, and 'Y' the cell number. The channel number is specified within the file as part of the header. The optional extension indicates that the data have been corrected for a baseline scan. While the M or O and .BLN are added automatically, the rest of the file name is left to the user.

Raw Data Files: Considerable latitude exists here. It would be wise, however, to adhere to the principles above. Raw data file names may be built up as follows: DnnnnXY.RAZ, where the 'D' indicates 'data', nnnn the run number, X the particular scan, Y the channel number and Z the cell number. (This can be confusing, however, because of the mobile nature of the cell indicator.) If all the data scans from a single experiment are stored within a single sub-directory, there will be no need to distinguish the run number, and the conventions can be relaxed, e.g.: NameA1.RAZ, where Name is a user-specified identifier, A can indicate 'Absorbance (uncorrected)', B can indicate 'Baseline correction', and C 'baseline-Corrected'. The Z identifier indicates the cell number, and is provided by the XL-A automatically. The 'Name' identifier should uniquely specify the particular scan conditions (time, temperature, speed, wavelength...).

1. XLABASE

The XLABASE program optionally smooths the experimental and baseline scans, subtracts the interpolated (smoothed) baseline values from the (smoothed) run scan, and optionally smooths the corrected scan. The baseline correction is normally made from A360 data, which is usually collected along with A280 data. The correction interpolates the baseline data when it lies between the run data points. The corrected run data are saved in the output file. The name of this file is left to the user, but the conventions listed above may help to avoid confusion.

Inputs: Both run data and baseline data are read in as XL-A-compatible files; i.e., files with the extension '.RA2', etc., where the '2' stands for the cell number. It is advisable to distinguish the three sample channels of a six-channel cell; i.e., NAMEa1.RA2, NAMEb1.RA2 and NAMEc1.RA2 could specify the absorbance, baseline, and baseline-corrected data respectively for channel number 1 in experiment NAME. (See above under Raw Data Files.) The program assumes that all three sets of data will be on the same drive/directory. Having to specify the drive and directory for each one makes it awfully long-winded when they are all on the same drive.
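As an illustration of the correction described under Process below, the following Python sketch interpolates a baseline scan onto the radial positions of a run scan and subtracts it. This is a minimal sketch only, not the Turbo Pascal program itself: it assumes numpy is available, and the simple two-column text layout in read_scan is an assumption rather than the exact XL-A file format.

import numpy as np

def read_scan(path):
    # Read radius and absorbance columns from a text scan file.
    # The two header lines and two-column layout are assumptions, not the
    # documented XL-A format; adjust to suit the real files.
    r, a = np.loadtxt(path, skiprows=2, usecols=(0, 1), unpack=True)
    return r, a

def baseline_correct(run_file, baseline_file, out_file):
    r_run, a_run = read_scan(run_file)        # e.g. the A280 run scan
    r_bln, a_bln = read_scan(baseline_file)   # e.g. the A360 baseline scan
    # Interpolate the baseline onto the radial positions of the run scan
    # (radii are assumed to increase along the scan) and subtract.
    a_corrected = a_run - np.interp(r_run, r_bln, a_bln)
    np.savetxt(out_file, np.column_stack([r_run, a_corrected]), fmt="%.4f  %.5f")

# Example (hypothetical file names following the conventions above):
# baseline_correct("NAMEa1.RA2", "NAMEb1.RA2", "NAMEc1.RA2")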
Outputs: The output data from the baseline correction is itself in XL-A format, so that it can be operated on by the programs XLAMW, XLAMEN, XLAEQ, EQASSOC, etc.

Process: The XLABASE program interpolates the A360 data onto the radial positions of the A280 scans and subtracts the optionally smoothed values from the optionally smoothed run file. The baseline-corrected values are stored in a designated output file.

Operation: The program asks for the names of the files containing the scan (e.g., A280) data, the baseline (e.g., A360) data, and the corrected data. The option of smoothing is offered, but it is probably best not to use it under normal circumstances. Smoothing tends to introduce ripples into the data as noise is propagated through adjacent data. If smoothing is not selected, the program displays graphs of the raw, baseline and corrected data. If smoothing is selected, the raw and smoothed data are each displayed before subtraction. Smoothing may be applied to the run data, the baseline data, or the corrected data, or any combination. After correction, the three files are displayed graphically, and it can be seen that non-random irregularities in both the data and baseline are removed during the correction. The standard errors of the absorbance measurements are preserved in the corrected files.

2. XLAMW

This program allows the analysis of sedimentation equilibrium experiments through the use of absorption measurements. Input files may be baseline-corrected or uncorrected. If the input file has been corrected, the output files will automatically be designated by the .BLN extension; otherwise no extension is added by the program.

Inputs: Some user-input data are requested by the program: name, date, v-bar, solvent density, centrepiece thickness (in mm), absorbance coefficient (in units of cm L/g; i.e., the absorbance of a 1 g/L solution in a 1 cm cuvette) and the number of channels. If the absorbance coefficient is not known, use the value of 1.0, and all subsequent data will be in the scale of absorbance (corrected for solution thickness). The individual file names of the data files for the separate channels (e.g., C1.RA2, C2.RA2, C3.RA2) are requested in a separate input screen. These files are required to be in the format of the raw data from the XL-A, although the program is tolerant of the exact format of real numbers.

Outputs: The output files comprise the molecular weight data (Mnnnn files) and Omega data (Onnnn files). The output files can contain subfiles for up to three separate channels of a multichannel cell. File names for output are constructed automatically as follows: MnnnnA1 (.BLN) or OnnnnA1 (.BLN), where the 'O' specifies Omega data and 'M' specifies molecular weight data. The identifier nnnn is a user-specified run number. This is best related to the run number in the log books, thereby unambiguously defining the run and all its conditions. The 'A' (or any other character) is a user-specified character that identifies the particular scan in the run. This should also be recorded in the log book. The '1' in this case specifies the cell number (no longer to be part of the extension in subsequent files). The '.BLN' signifies that the data have been baseline-corrected.

Operation: The concentrations are used to determine the point-average weight-average molecular weights and the Omega function at each measured point. (Note that 15 points are required for the linear regression of ln(c) vs r^2; because of this, the point-average molecular weights of the first and last 7 points are not available.)
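As an illustration of this sliding 15-point fit of ln(c) vs r^2, a minimal Python sketch is given below. It assumes numpy and cgs units (r in cm, density in g/mL, R = 8.314e7 erg/(mol K)), and it is not the code used by XLAMW; the local slope is converted with Mw,app = slope x 2RT / ((1 - vbar rho) omega^2).

import numpy as np

R = 8.314e7   # gas constant in erg/(mol K); cgs units are an assumption of this sketch

def point_average_mw(r, c, vbar, rho, rpm, T, window=15):
    # Sliding linear fit of ln(c) against r^2; the local slope is converted to
    # an apparent weight-average molecular weight. The first and last
    # (window // 2) points are lost, as noted above.
    omega = rpm * 2.0 * np.pi / 60.0          # rotor speed in rad/s
    x, y = r**2, np.log(c)
    half = window // 2
    r_mid, mw = [], []
    for i in range(half, len(r) - half):
        slope = np.polyfit(x[i - half:i + half + 1], y[i - half:i + half + 1], 1)[0]
        mw.append(slope * 2.0 * R * T / ((1.0 - vbar * rho) * omega**2))
        r_mid.append(r[i])
    return np.array(r_mid), np.array(mw)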
The option of subtracting a constant baseline allows a correction for residual absorbance of solvent components that is not removed by dialysis. The value of this residual absorbance can be determined by a brief spin at high speed to clear the meniscus of macromolecules. Alternatively, prior investigation with one of the programs that fits models to the c vs r distribution may have indicated a baseline offset value.

A graph of c vs r is initially displayed, to allow the user to inspect for bad points, which often occur at the ends of the solution column and may arise from attempting to read data before the meniscus and beyond the bottom of the cell. Then a graph of ln c vs r^2 is displayed, allowing the user to trim away possible bad data at the ends with the function keys. The function keys are used to move the cursors one at a time; the left cursor is checked first. F1 and F8 allow big steps (100 points) to the left and right, respectively; F2 and F7 allow moderate steps (10 points) to the left and right, respectively; and F3 and F6 allow small steps (1 point at a time) to the left and right, respectively. Pressing Enter accepts the cursor position. If a mistake is made, the user has a chance to recover; the user is asked whether the cursor positions are correct before proceeding. Y allows the process to continue; N allows the user to redefine the cursor positions. NB: the Enter or Return key is required after these entries!

Once the valid data have been defined by the cursors, the entire valid set of ln c vs r^2 data is used to estimate the weight-average molecular weight for the whole cell. This estimate would only be meaningful for a single, non-associating ideal solute. The point-average weight-average molecular weight data are calculated by means of a sliding 15-point linear fit. Because of the differentiation step, these plots are frequently noisy.

Omega Analysis: The operator is asked whether an Omega analysis is required. Reply with Y or N. NB: the Enter or Return key is required after this entry! If an Omega calculation is required, the appropriate protomer molecular weight is needed, together with a value for the reference concentration. Since the reference radius is interpolated from the data, it is important that the reference concentration actually be encompassed by the data. To ensure this, the user is prompted with the range of accessible concentrations. With the input M1 and c data, the Omega distribution is computed and displayed. Note that in order to check for overlap of the Omega function distribution in the three channels, it is essential that the same reference concentration be chosen in all channels (but see Morris, M., Ph.D. thesis, Sydney University, 1985). If the data are satisfactory, the user is prompted to save them. However, it may be that the effects of a different reference concentration or protomer molar mass need to be examined. The user may see the effects of these changes, and may save the new data, or may wish not to save them. Type Y to save, N not to save. NB: the Enter or Return key is required after this entry! Once the data from one channel have been analysed, the next file of data that was specified in the input list of files is loaded, and the process is repeated.

3. XLAMEN

This program allows the analysis of meniscus depletion sedimentation equilibrium experiments (Yphantis, 1964).
This method is especially useful if the sample contains very low molecular weight material with significant absorbance (such as oxidised dithiothreitol) that may not be exactly matched in the reference sector. In analysing meniscus depletion experiments, the residual absorbance at the meniscus is subtracted from the data throughout the rest of the cell. Input files may be baseline-corrected or uncorrected. If the input file has been baseline-corrected, the output file will automatically be designated by the .BLN extension; otherwise no extension is added.

Inputs: Some user-input data are requested by the program: name, date, v-bar, solvent density, centrepiece thickness (in mm), absorbance coefficient (in units of cm L/g; i.e., the absorbance of a 1 g/L solution in a 1 cm cuvette) and the number of channels. If the absorbance coefficient is not known, use the value of 1.0, and all subsequent data will be in the scale of absorbance (corrected for solution thickness). The individual file names of the data files for the separate channels (e.g., C1.RA2, C2.RA2, C3.RA2) are requested in a separate input screen. These files are required to be in the format of the raw data from the XL-A, although the program is very tolerant of the exact format of real numbers.

Outputs: The output files comprise the molecular weight data (Mnnnn files) and Omega data (Onnnn files). The output files can contain subfiles for up to three separate channels of a multichannel cell. File names for output are constructed automatically as follows: MnnnnA1 (.BLN) or OnnnnA1 (.BLN), where the 'O' specifies Omega data and 'M' specifies molecular weight data. The identifier nnnn is a user-specified run number. This is best related to the run number in the log books, thereby unambiguously defining the run and all its conditions. The 'A' (or any other character) is a user-specified character that identifies the particular scan in the run. This should also be recorded in the log book. The '1' in this case specifies the cell number (no longer to be part of the extension in subsequent files). The '.BLN' signifies that the data have been baseline-corrected.

Operation: The program determines the mean ordinate of the flat section of the concentration distribution at the meniscus, and subtracts this value from all subsequent points. Those points showing significant absorbance are then stored. The meniscus-subtracted concentrations are then used to determine the point-average weight-average molecular weights and the Omega function at each measured point. (Note that 15 points are required for the linear regression of ln(c) vs r^2; because of this, the point-average molecular weights of the first and last 7 points are not available.) A graph of c vs r is initially displayed, to allow the user to inspect for bad points, which often occur at the ends of the solution column and may arise from attempting to read data before the meniscus and beyond the bottom of the cell.

The program allows some correction for the meniscus concentration not being exactly zero (provided that this value is not excessive; Teller, 1973). The curve of ln(c) vs r is extrapolated to the meniscus, to yield a revised value of c0 at the meniscus. This value is added to all concentrations and the process repeated as often as necessary. Usually two iterations are sufficient; continued iteration may lead to divergence.
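One plausible reading of this iterative correction is sketched below in Python (numpy assumed; the extrapolation here is performed against r^2, as for the other plots in this manual, and this is not XLAMEN's own code).

import numpy as np

def meniscus_correction(r, c_obs, r_meniscus, fit_slice, iterations=2):
    # c_obs: meniscus-subtracted concentrations; fit_slice: the region chosen
    # for the extrapolation (avoid the curved region near the meniscus and the
    # upward-curving region near the cell base, as discussed below).
    c0 = 0.0
    for _ in range(iterations):               # usually two passes suffice
        coeffs = np.polyfit(r[fit_slice]**2, np.log(c_obs[fit_slice] + c0), 1)
        c0 = np.exp(np.polyval(coeffs, r_meniscus**2))   # revised meniscus value
    return c_obs + c0, c0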
In choosing which part of the curve to extrapolate, bear in mind that it is the region closest to the meniscus that will be affected most by errors in the meniscus concentration. Do not extend the chosen range too far down into the region near the meniscus showing obvious curvature (see the figure below). Secondly, toward the bottom of the cell, association or high molecular weight species will cause the plot to curve upwards. Avoid allowing this region to influence the extrapolation.

Once the meniscus correction is satisfactory, a graph of ln c vs r^2 is displayed, allowing the user to trim away possible bad data at the ends with the function keys. The function keys are used to move the cursors one at a time; the left cursor is checked first. F1 and F8 allow big steps (100 points) to the left and right, respectively; F2 and F7 allow moderate steps (10 points) to the left and right, respectively; and F3 and F6 allow small steps (1 point at a time) to the left and right, respectively. Pressing Enter accepts the cursor position. If a mistake is made, the user has a chance to recover; the user is asked whether the cursor positions are correct before proceeding. Y allows the process to continue; N allows the user to redefine the cursor positions. NB: the Enter or Return key is required after this entry.

Once the valid data have been defined by the cursors, the entire valid set of ln c vs r^2 data is used to estimate the weight-average molecular weight for the whole cell. This estimate would only be meaningful for a single, non-associating ideal solute. The point-average weight-average molecular weight data are calculated by means of a sliding 15-point linear fit. Because of the differentiation step, these plots are frequently noisy. In meniscus depletion experiments, the concentration of solute approaches zero at the meniscus, and the number-average molecular weight may also be obtained (Yphantis, 1964). These data are available in the output files from XLAMEN (i.e., the Mxxxx files).

Omega Analysis: The operator is asked whether an Omega analysis is required. Reply with Y or N. NB: the Enter or Return key is required after this entry. If an Omega calculation is required, the appropriate protomer molecular weight is needed, together with a value for the reference concentration. Since the reference radius is interpolated from the data, it is important that the reference concentration actually be encompassed by the data. To ensure this, the user is prompted with the range of accessible concentrations. With the input M1 and c data, the Omega distribution is computed and displayed. Note that in order to check for overlap of the Omega function distribution in the three channels, it is essential that the same reference concentration be chosen in all channels. If the data are satisfactory, the user is prompted to save them. However, it may be that the effects of a different reference concentration or protomer molar mass need to be examined. The user may see the effects of these changes, and may save the new data, or may wish not to save them. Once the data from one channel have been analysed, the next file of data that was specified in the input list of files is loaded, and the process is repeated.

4. MWMERGE

Point-average molecular weight data obtained from XLAMW or XLAMEN may be collected from several channels that employ different loading concentrations; e.g., the use of the Yphantis 6-channel cell gives three sets of data.
These three sets of data may all be pooled and sorted in increasing concentration by the program MWMERGE. The output file is designated with the suffix .MGE. The merged data are plotted at the completion of the sorting process, to give an impression of the degree of overlap.

5. OMMERGE

Omega data obtained from XLAMW or XLAMEN may be collected from several channels that employ different loading concentrations; e.g., the use of the Yphantis 6-channel cell gives three sets of data. These three sets of data may all be pooled and sorted in increasing concentration by the program OMMERGE. The output file is designated with the suffix .MGE. The merged data are plotted at the completion of the sorting process, to give an impression of the degree of overlap.

6. PREAVMW

The pooled and sorted Mw data in the .MGE files can be averaged in groups of 2 to 40 points, and the mean value of Mw, together with the standard error of that mean, is tabulated at the mean value of c for each group. The output file is designated with the suffix .DAT. This approach is only meaningful if there is acceptable overlap of the several channels (as evidenced, for example, from the graphical output of the program MWMERGE or SHOWMW). The number of points averaged for each group may be specified (between 2 and 40 points). Once the pooled and sorted data have been smoothed by box-car averaging, the mean Mw ± SE at the mean c is saved to file, and a plot is shown on the screen. An output file in a format acceptable to Cricket Graph is also saved, so that the data may be printed in graphical form. (A sketch of the pooling and box-car averaging appears after the description of SHOWMW below.)

7. PREAVOM

The pooled and sorted Omega data from the .MGE files can be averaged in groups of 2 to 40 points, and the mean value of Omega, together with the standard error of that mean, is tabulated at the mean value of c for each group. The output file is designated with the suffix .DAT. This approach is only meaningful if there is acceptable overlap of the several channels (as evidenced, for example, from the graphical output of the program OMMERGE). The number of points averaged for each group may be specified (between 2 and 40 points). Once the pooled and sorted data have been smoothed by box-car averaging, the mean Omega ± SE at the mean c is saved to file, and a plot is shown on the screen. An output file in a format acceptable to Cricket Graph is also saved, so that the data may be printed in graphical form.

8. PLOTMW

The data in the files of preaveraged Mw data may be shown graphically on the screen. This program reads and displays the data in the .MGE and .DAT files. With the data from .DAT files, the plot shows error bars representing ± the SEM. The data from the .MGE files are shown in only a single colour; once the data are merged, each point loses the identity of the channel from which it came.

9. PLOTOM

The data in the files of preaveraged Omega data may be shown graphically on the screen. This program reads and displays the data in the .MGE and .DAT files. With the data from .DAT files, the plot shows error bars representing ± the SEM. The data from the .MGE files are shown in only a single colour; once the data are merged, each point loses the identity of the channel from which it came.

10. SHOWMW

This program displays on the screen the multiple sets of Mw data from different loading concentrations (either from the .BLN files, or from the uncorrected data) so that overlap may be assessed. The different sets are designated with different colours on screens capable of displaying colour.
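The pooling, sorting and box-car averaging performed by MWMERGE/OMMERGE and PREAVMW/PREAVOM (described above) can be sketched in Python as follows. This is an illustration of the idea only: numpy is assumed, the in-memory arrays stand in for the actual .MGE and .DAT file formats, and the group size of 10 is simply an example within the allowed 2-40 range.

import numpy as np

def merge_channels(datasets):
    # datasets: a list of (c, value) array pairs, one per channel
    c = np.concatenate([d[0] for d in datasets])
    v = np.concatenate([d[1] for d in datasets])
    order = np.argsort(c)                     # sort the pooled data by concentration
    return c[order], v[order]

def boxcar_average(c, v, group=10):
    # Average the sorted data in fixed-size groups; report mean +/- SEM at mean c.
    c_mean, v_mean, v_sem = [], [], []
    for i in range(0, len(c) - group + 1, group):
        cs, vs = c[i:i + group], v[i:i + group]
        c_mean.append(cs.mean())
        v_mean.append(vs.mean())
        v_sem.append(vs.std(ddof=1) / np.sqrt(group))   # standard error of the mean
    return np.array(c_mean), np.array(v_mean), np.array(v_sem)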
11. SHOWOM

This program displays on the screen the multiple sets of Omega data from different loading concentrations (either from the .BLN files, or from the uncorrected data) so that overlap may be assessed. The different sets are designated with different colours on screens capable of displaying colour.

Nonlinear Regression Programs

The nonlinear regression programs in this package are based on the Marquardt algorithm, as implemented by Duggleby (1984). The original BASIC program of Duggleby has been rewritten in Turbo Pascal as the core of the present set of programs. Procedures for calculating the functional dependence of the appropriate dependent variable on the independent variable have been written for a number of simple models, in which the number of equilibrium constants is kept to a minimum. Nonideality can be taken into account through the use of a single second virial coefficient, assumed to be the same (in units of L mol g^-2) for all oligomers (the Adams-Fujita approximation; Adams and Fujita, 1963).

DUGOM: Fitting to the Omega Function

In the case of the Omega function, the reference concentration, as well as the concentration at each of the data points, is subject to experimental error. Assuming that the reference point is known with absolute accuracy leads to a propagation of systematic error through the whole data set: the Omega function is thereby constrained to pass through the point (cref, 1.0). In order to avoid this bias, the 'true' value of the reference concentration must be treated as a parameter to be estimated at each round of iteration. For each updated set of parameters (cref, the various K and B), a1,ref, the thermodynamic activity of the protomer at the reference radius, is calculated through the use of the appropriate Func1 procedure (specific for the relevant model). With the updated value of a1,ref, together with the current set of cref, K and B, the set of Omega values is calculated over the domain of c through the use of the procedure Func (also specific for the relevant model). The following chart illustrates the process:

1. Read data.
2. Select model.
3. Input estimates of parameters.
4. Calculate a1,ref, the activity of the protomer at cref, the current reference concentration.
5. Use the current parameter estimates and a1,ref to evaluate the object function for each point (c, Omega).
6. If chi-squared is reduced, revise the parameter estimates and switch towards the Gauss-Newton algorithm for the next round; if chi-squared is not reduced, switch towards steepest descent, take a smaller step and re-revise the parameter estimates.

For the simpler models, such as Monomer-Dimer and Isodesmic (SEK I), the object function is an explicit function of the independent variable, c, and can be calculated directly. For the cooperative isodesmic (SEK III) model (Adams et al., 1978), however, the protomer concentration is required as a function of the total concentration, but is only available as an implicit function. The Newton-Raphson algorithm may be used to find c1 in these cases, but it is essential that starting estimates of the root be chosen such that the process converges on the correct root. The process may be carried out incrementally. At very low concentrations, c1 will be near zero, and zero is thus an appropriate starting estimate for the root, c1. At subsequent points, provided that the concentration increases monotonically, the previous root is an appropriate starting estimate for the next.
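The incremental Newton-Raphson procedure just described can be sketched in Python as follows (numpy assumed; the function and parameter names are illustrative, and this is not the Turbo Pascal routine used by DUGOM). The caller supplies the model relation c = f(c1) and its derivative, and the k·c1 < 1 check discussed below is applied to each iterate.

import numpy as np

def solve_c1(c_total, f, dfdc1, k_iso, tol=1e-10, max_iter=100):
    # c_total must increase monotonically so that each root can seed the next.
    roots = []
    c1 = 0.0                                   # c1 is near zero at the lowest concentration
    for c in c_total:
        guess = c1                             # previous root as the starting estimate
        for _ in range(max_iter):
            step = (f(guess) - c) / dfdc1(guess)
            new = guess - step
            # keep the iterate in the physically meaningful range 0 <= k*c1 < 1;
            # leaving it usually means the correct root has been overshot
            halvings = 0
            while (new < 0.0 or new * k_iso >= 1.0) and halvings < 50:
                step *= 0.5
                new = guess - step
                halvings += 1
            guess = new
            if abs(step) < tol:
                break
        c1 = guess
        roots.append(c1)
    return np.array(roots)

# Example with the SEK III relation quoted under The Models below:
# f = lambda c1: c1 * (1.0 + k2 * c1 * (2.0 - k * c1) / (1.0 - k * c1)**2)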
Since the object functions for the isodesmic and cooperative isodesmic models are closed only when kiso·c1 < 1, it is important to check at each solution whether c1 is indeed less than 1/kiso; if not, it is likely that the correct root has been bypassed; the starting estimate for the root should be reduced, and the Newton-Raphson procedure should be restarted. It has been found that, for the SEK III model, a bisection approach to the root in the interval 0 to 1/k will avoid the incorrect root, but is exasperatingly slow.

Operation: In running these programs, initial estimates are required for the parameters. In addition, it is possible to fix the values of these parameters ('masking') so that they do not change after each iteration. It may be desirable to constrain possible parameter values to certain limits (e.g., concentrations and equilibrium constants should be positive quantities). However, while a negative value of k or B is physically meaningless, there may be times when the estimated quantity returned from fitting is near zero. In these cases, it may be wise to allow the final parameter value to take on a small negative value (often less in magnitude than its uncertainty) to aid convergence to a final set of parameters. Without allowing a small negative value, the algorithm constantly tries to find only positive solutions, and this slows down convergence and may bias the estimation of other parameters.

Recall that the reference concentration is a parameter to be estimated. Its measured value should be a good starting estimate. However, the returned value should not differ by more than a few percent from the experimentally measured value. Larger drifts in the returned value of cref may indicate that the chosen model is inappropriate.

It should be noted that the equilibrium constants, k, estimated in these programs are the quantities Ki/M1; i.e., the values in the molar scale, expressed in the g/L scale. These values are related to the quantities in the g/L scale, k(L/g), by: k(L/g) = i Ki/M1.

Below is an example run of the program DUGOM. If a .DAT file is selected, the SEM data may be used to provide an estimate of the variance for the purposes of statistical weighting.

Do you want to use weighting? [y/n] y
Input std. dev. ? [y/n] y
Do you have a printer connected? [y/n] y

At this point, the relevant model may be selected from the list below. (This list does not match that from DUGMW, either in cardinal or ordinal number! It will be made consistent real soon...)

Please select the model...
Do you want to test:
1 : Monomer-Dimer ?
2 : Monomer-nmer ?
3 : SEK1
4 : SEK3 ?
5 : SEK4 ?
6 : SEK5 ?
7 : AK1 ?
8 : AK4 ?
9 : Dimer-Tetramer-Hexamer
A : Dimer-Tetramer-Hexamer-Octamer
B : Monomer-Dimer-Tetramer
C : Single Nonideal Species
Please indicate [1-C], and press RETURN... 3

The data may be trimmed by moving the left and right cursors with the function keys. F1 moves the cursor 100 points leftwards, F2 moves it 10 points leftwards and F3 moves it one point at a time leftwards. F6 moves the cursor right one point at a time, F7 moves it 10 points at a time, and F8 moves it 100 points at a time. Start with the left cursor, and trim away excessively noisy or otherwise dubious points. Press the Return key when you are satisfied with the cursor position. Repeat for the right-hand cursor.
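The quantity minimised during the fit is the sum of squared residuals; when SEM values are available (the weighting prompt above, with .DAT input), each residual may be weighted by 1/SEM^2 so that a chi-squared is minimised instead. A minimal Python sketch of that object function (numpy assumed; not the program's own code):

import numpy as np

def sum_of_squares(y_obs, y_fit, sem=None):
    # Unweighted sum of squares, or a chi-squared when standard errors are given.
    residuals = y_obs - y_fit
    if sem is None:
        return np.sum(residuals**2)
    weights = 1.0 / sem**2                     # variance estimated from the SEM
    return np.sum(weights * residuals**2)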
The program then indicates the number of 'active' parameters, and calculates the initial sum of squares of residuals:

3 parameters
Initial sums of squares = 1219.6763

During the fitting process, information on the fitting is provided to the user: the updated parameter values, the current sum of squares (so that it can be checked that the sum of squares is indeed decreasing!), and the Marquardt lambda, indicating the success of the optimization. For each successful iteration, lambda is reduced. If, during an iteration, the sum of squares increases, lambda is increased, and smaller steps are taken in a 'more steepest descent' mode. You have the option of watching successive iterations graphically; it slows down the action a little, but you get to see what is going on. If all is going well, the fitted line should move gradually to fit the points better and better. If it doesn't, then:

1) The algorithm may be stuck in a "local minimum"; this means that in order to find a lower sum of squares, the program would have to allow the sum of squares to increase for a step or two, but this is not allowed. A way around this is to try slightly different sets of initial parameter estimates. Try values both above and below those you first tried. If you eventually find the parameters of "best" fit, you should be able to converge on the same values both from below and from above.

2) The model may not be the most appropriate; the "best" fit may not be a "good" fit. If you can't improve by trying new initial parameter sets, try another model.

3) The parameter value may be stuck in a region that is insensitive. For example, if the equilibrium constant is extremely large, there will be effectively no monomer detectable, and the algorithm cannot distinguish its current large value of K from any other large value of K. Similarly, if B is set to zero, its value may not be able to change; try values like 1e-8 instead.

After 10 iterations, the operator has the option of trying new parameter estimates (if the initial ones seem to cause the algorithm to get stuck), continuing with the process (if things seem to keep improving), or giving the whole thing away as a bad job!

Stopped after 10 iterations
continue [c] new parameters [n] accept current estimates [a] quit [q]
Enter letter [c,n,a,q] : c

If all is well, several rounds of continuation may result in satisfactory convergence. However, sometimes the process gets to a point where the sum of squares doesn't improve, yet the parameter values don't settle down sufficiently within the set tolerance. This can happen with noisy data and a model in which parameters are highly correlated. It may be best in these circumstances to accept the current set, if the fit looks visually satisfactory.

The final parameter values are :
k12: 1.764E+00 +/- 3.771E-02
B (L mol/g^2): 2.404E-07 +/- 5.308E-08
cref: 1.739E-01 +/- 8.463E-04
Sum of squares is : 2.392E+02

Large drifts in the returned value of cref may indicate that the chosen model is inappropriate. After a successful convergence, statistics on the goodness of fit are provided. If a printer is connected, the fitted data, the parameters and their approximate (asymptotic) standard errors, and some statistics about the fit will be printed out. It is also possible to print out the covariance and correlation matrix.

Runs test indicates >5% probability of randomly distributed residuals.

In our laboratory we find it useful to have an output file that can be read and plotted by Cricket Graph:

Do you want a cricket graph output file? [y/n] n
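The Marquardt lambda bookkeeping described above can be sketched as follows: a successful iteration (reduced sum of squares) accepts the step and decreases lambda, moving towards Gauss-Newton behaviour, while a failed one increases lambda and takes smaller, more steepest-descent-like steps. This Python sketch illustrates the idea only and is not Duggleby's implementation; numpy is assumed, and the Jacobian is taken to be the matrix of derivatives of the fitted curve with respect to the parameters.

import numpy as np

def marquardt_step(params, jacobian, residuals, lam):
    # One damped (Levenberg-Marquardt) parameter update for the current lambda.
    jtj = jacobian.T @ jacobian
    damped = jtj + lam * np.diag(np.diag(jtj))   # Marquardt's diagonal scaling
    delta = np.linalg.solve(damped, jacobian.T @ residuals)
    return params + delta

def update_lambda(lam, old_ssq, new_ssq, factor=10.0):
    # Decrease lambda after an improvement; increase it (smaller steps) otherwise.
    return lam / factor if new_ssq < old_ssq else lam * factor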
WTDUGMW and DUGMW

These two programs have separate existences because of the file structures of the .DAT, .MGE and .BLN files. The .BLN files contain information on r^2, ln c and Mn that is not carried by the .DAT and .MGE files. Consequently, DUGMW is provided to allow fitting to the individual channels in a .BLN file. For fitting to merged or preaveraged data, WTDUGMW must be used, since this program does not expect to find those additional data. WTDUGMW can also use the SEM data in the .DAT files to perform a weighted fit.

The Omega Function

Although recent trends justly favour analysis of association reactions in terms of the c vs r distribution, there are good reasons to consider the distribution of apparent weight-average molecular weight as a function of concentration. Firstly, the behaviour of this distribution often (but not always) gives a clear indication of the type of self-association behaviour that is occurring, and whether nonideality is evident; to the eye, the exponential shape of the c vs r distribution does not reveal a great deal. Secondly, in a purely associating system, at equilibrium, and with all macromolecular solute capable of taking part in the reactions, the distribution of Mw,app is a well-defined function of the total concentration of the solute and the various equilibrium constants; the concentration loaded into the cell, the radial position at which measurements were made, and the speed of rotation do not influence the Mw,app distribution. On the other hand, for a system of two or more non-reacting, independent solutes, the distribution of Mw,app with concentration will be influenced by these operational variables. As a consequence, Mw,app data obtained from a purely associating system at equilibrium, over a range of different loading concentrations, speeds, and radial positions, will all lie on a single continuous curve, whose shape will depend on the values of the various equilibrium constants and virial coefficients. If the reacting system has not attained equilibrium, if there are species present that are for some reason incapable of taking part in the reaction, or if there are gross contaminants present, the Mw,app vs c data will not lie on a single continuous curve. In principle, then, the Mw,app distribution from several experiments can give important information on the attainment of equilibrium and the presence of contaminants, as well as the parameters of association.

In practice, however, this information is often compromised by experimental error. In the calculation of the point-average weight-average molecular weights, the original c vs r data must be transformed and differentiated. In the process, the noise in the original experimental data is increased and its distribution is distorted. This effect is illustrated below with data calculated from simulated c vs r distributions for (a) a purely associating system, and (b) a mixture of two different, non-interacting species. The original simulated distributions were computed by Dr. Allen Minton as TEST1xx.RA1 and TEST2xx.RA1 (where xx designates the loading concentration in g/L) and placed on the RASMB file server. While it appears that the data in (a) might lie on a single continuous curve, the noise inherent in the data makes this conclusion unconvincing. Real experimental errors, which tend not to be purely random but are often correlated or contaminated with systematic errors, make matters even worse.
The Omega function was introduced by Milthorpe, Jeffrey and Nichol as a means of avoiding the differentiation required in computing Mw,app. Omega was conceived as the ratio of the actual measured concentration at any point in the centrifuge cell to the thermodynamic activity of a single, non-associating, ideal monomeric species whose activity was equal to the concentration of the real solute at some chosen reference position (ref, c(ref)). Since the distribution of such an ideal species is given by:

a1(r) = a1(ref) exp[f M1 (r^2 - ref^2)]   (1)

where

f = (1 - vbar·rho)·omega^2 / 2RT   (2)

(vbar being the partial specific volume, rho the solvent density and omega the angular velocity), and when a1(ref) is set to c(ref), it follows that:

Omega = c(r)/a1(r) = c(r) exp[f M1 (ref^2 - r^2)] / c(ref)   (3)

Since the thermodynamic activity of a monomer in an association equilibrium in a centrifuge cell will be distributed according to eq (1), it follows that, when a1(ref) refers to the actual thermodynamic activity of the real monomer in equilibrium at the reference concentration:

Omega = a1(ref) c(r) / [a1(r) c(ref)]   (4)

Milthorpe et al. (1975) showed that Omega0, Omega extrapolated to zero concentration, yields a1(ref)/c(ref). Since we are able to choose c(ref), we may calculate a1(ref), the thermodynamic activity of the monomer at the reference point, from which the thermodynamic activity of the monomer at any other point can be determined:

a1(r) = a1(ref) exp[f M1 (r^2 - ref^2)] = a1(ref) c(r) / [Omega(r) c(ref)]   (5)

Morris and Ralston (1985) showed that the Omega distribution, like the distribution of Mw,app, is itself a continuous and unique function of the total concentration for a self-associating solute, and that the parameters of self-association can be estimated by fitting the Omega vs c distribution. Importantly, because the Omega distribution is not distorted by differentiation, it is much easier to detect heterogeneity or failure to attain equilibrium. Compare the curves (a) and (b) below, each derived from the same set of simulated data as shown in the corresponding figures above. It is clear, even in the presence of random noise, that the data of curve (a) come from a reversible self-association, and that the data of curve (b) indicate heterogeneity. The Omega function is sensitive to the presence of both high molecular weight and low molecular weight contaminants. Low molecular weight contaminants (e.g. from proteolysis) result in an abrupt turning up of the Omega curve as the concentration approaches zero (Ralston and Morris, 1992). It is interesting to note in this context that, while estimation of the true parameters of self-association requires knowledge of M1, the test of homogeneity does not require such prior knowledge. Any reasonable estimate of M1 may be made, provided that the same value is used for all sets of data that are compared.

Extrapolation of the Omega function to zero concentration may be aided by fitting the distribution with appropriate models. Over a limited concentration range near zero, the exact form of the model may not be important, as at low concentrations the contribution of higher oligomers will be small. The parameters returned from a good fit allow computation of a1(ref), equivalent to extrapolation to zero concentration. With this value available, it is possible to calculate the distribution of thermodynamic activity as a function of total concentration. This has recently been done in the case of human erythrocyte spectrin (Ralston, 1994), for which independent experiments had provided concentrations of several associated species as a function of total concentration. Combination of the two sets of data allowed an assessment of the activity coefficient of the spectrin protomer as a function of the total concentration.
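As an illustration of eq (3), the following Python sketch computes the Omega distribution from a c(r) scan, a protomer molecular weight M1 and a chosen reference concentration. It assumes numpy and cgs units (as in the earlier sketches), and it is not the code used by XLAMW or XLAMEN; the reference radius is obtained by interpolation, so cref must lie within the range of the measured concentrations.

import numpy as np

R = 8.314e7   # erg/(mol K); cgs units are an assumption of this sketch

def omega_function(r, c, M1, cref, vbar, rho, rpm, T):
    # Omega(r) = c(r) exp[f M1 (ref^2 - r^2)] / cref, with f = (1 - vbar rho) omega^2 / 2RT
    omega_rot = rpm * 2.0 * np.pi / 60.0
    f = (1.0 - vbar * rho) * omega_rot**2 / (2.0 * R * T)
    order = np.argsort(c)                      # interpolate the reference radius from the data
    rref = np.interp(cref, c[order], r[order])
    return c * np.exp(f * M1 * (rref**2 - r**2)) / cref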
The present group of programs provides the means of fitting the Omega distributions (calculated from the radial distribution of concentration) with various reaction models, both discrete (i.e., a limited number of species) and indefinite (i.e., reaction schemes in which monomer can add to pre-existing oligomers without limitation in the size of the oligomer). The specific models and the relevant equations are listed below.

The Models

Omega

For all models, the monomer concentration is calculated from the total concentration of the macromolecular component and the various equilibrium constants of the reactions, i.e. by solving the equation:

c = f(c1, k1...kn)

(Note that this presupposes that the equilibrium constants are unaffected by nonideality, a consequence of the Adams-Fujita approximation.) For some of the simpler models, an explicit solution may be derived; for others, c1 must be solved by means of a Newton-Raphson process. The monomer activity is calculated from the monomer concentration, the total concentration of the associating component and the second virial coefficient, by assuming the Adams-Fujita approximation:

a1 = c1 exp(B M1 c)

On the first pass with a new set of parameters, a1ref, the activity of the monomer at the reference point, is calculated from the value of c = cref. With a value of a1ref available, Omega may be calculated at any other radial position from the relationship (Milthorpe et al., 1975):

Omega = c a1ref / (a1 cref)

For each model, then, the specific parameters and equations are those that relate c to c1.

Single, nonideal, non-associating species:
At all concentrations, c1 = c.

Monomer-Dimer:
c = c1 + 2 k2 c1^2
c1 = (-1 + sqrt(8 k2 c + 1)) / (4 k2)     (k2 = K2/M1)

Monomer-n-mer:
c = c1 + n kn c1^n     (kn = Kn/M1)

Monomer-dimer-trimer:
c = c1 + 2 k2 c1^2 + 3 k3 c1^3     (ki = Ki/M1)

Monomer-dimer-trimer-tetramer:
c = c1 + 2 k2 c1^2 + 3 k3 c1^3 + 4 k4 c1^4     (ki = Ki/M1)

Isodesmic (SEK I):
c = c1 / (1 - k c1)^2     (k = K/M1)
c1 = (2kc + 1 - sqrt(4kc + 1)) / (2 k^2 c)

Cooperative Isodesmic (SEK III):
c = c1 [1 + k2 c1 (2 - k c1) / (1 - k c1)^2]     (if k c1 < 1)

SEK IV:
2A = A2 (K2); A2 + A2 = A4, A2 + A4 = A6, ... (K)
c = c1 [1 + 2 k2 c1 / (1 - k2 k c1^2)^2]     (k = K/M1, k2 = K2/M1)

SEK V:
2A = A2 (K2); A2 + A2 = A4 (K24); A2 + A4 = A6, ... (K)
c = c1 + 2 k2 (1 + k24 c) c1^2 - 2 k2 k24 c1^3 - k2^2 k24^2 c c1^4 + k2^2 k24^2 c1^5

Attenuated Indefinite (AK I):
c = c1 exp(k c1)
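As an illustration of how these relations are used in the fitting, the following Python sketch evaluates the Omega distribution predicted by the Monomer-Dimer model for given k2, B, M1 and cref, using c1 from the explicit solution above, a1 = c1 exp(B M1 c) and Omega = c a1ref / (a1 cref). The function names are illustrative and numpy is assumed; this is a sketch, not the Func/Func1 procedures of the programs.

import numpy as np

def c1_monomer_dimer(c, k2):
    # Explicit protomer concentration for the Monomer-Dimer model (k2 = K2/M1).
    return (-1.0 + np.sqrt(8.0 * k2 * c + 1.0)) / (4.0 * k2)

def omega_monomer_dimer(c, cref, k2, B, M1):
    # Model Omega(c) under the Adams-Fujita approximation.
    a1 = c1_monomer_dimer(c, k2) * np.exp(B * M1 * c)
    a1ref = c1_monomer_dimer(cref, k2) * np.exp(B * M1 * cref)
    return c * a1ref / (a1 * cref)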
References

Adams, E.T. and Fujita, H. (1963) Sedimentation equilibrium in reacting systems. In Ultracentrifugal Analysis in Theory and Experiment (J.W. Williams, ed.), Academic Press, New York.
Adams, E.T., Tang, L.-H., Sarquis, J.L., Barlow, G.H. and Norman, W.M. (1978) Self-association in protein solutions. In Physical Aspects of Protein Interactions (N. Catsimpoolas, ed.), Elsevier/North-Holland, Amsterdam, p. 1.
Duggleby, R.G. (1984) Regression analysis of nonlinear Arrhenius plots: an empirical model and a computer program. Comp. Biol. Med. 14, 447-455.
Milthorpe, B.K., Jeffrey, P.D. and Nichol, L.W. (1975) The direct analysis of sedimentation equilibrium results obtained with polymerising systems. Biophys. Chem. 3, 169-176.
Morris, M. and Ralston, G.B. (1985) Analysis of sedimentation equilibrium through direct fitting of the Omega function. Biophys. Chem. 23, 49-61.
Ralston, G.B. and Morris, M.B. (1992) The use of the Omega function for sedimentation equilibrium analysis. In Analytical Ultracentrifugation in Biochemistry and Polymer Science (S.E. Harding, A.J. Rowe and J.C. Horton, eds.), Royal Society of Chemistry, Cambridge, pp. 253-274.
Ralston, G.B. (1994) The concentration dependence of the activity coefficient of the human spectrin heterodimer: a quantitative test of the Adams-Fujita approximation. Biophys. Chem., in press.
Teller, D.C. (1973) Characterisation of proteins by sedimentation equilibrium in the analytical ultracentrifuge. Methods in Enzymology 23, 346-441.
Yphantis, D.A. (1964) Equilibrium ultracentrifugation in dilute solutions. Biochemistry 3, 297-317.

System Requirements

The following system is recommended for running these programs:

- MS-DOS computer (a 286 processor is recommended as a minimum configuration; significant enhancement in speed will be experienced with a 386 or 486 processor).
- A minimum of 640 kB RAM.
- Hard disk drive with at least 1 Mbyte of free space.
- Printer connected to the parallel port (optional).
- A maths co-processor is highly recommended, but not essential.
- Monitor and graphics controller of at least CGA resolution.
- A mouse is not yet implemented.
- MS-DOS, version 5 or later.

Installation

Data are assumed to be located on the hard disk, drive C:, although the user can override this default. No specific directory is assumed. For the graphics, the Borland Graphics Interface files (the *.BGI files) should be present in the same directory as the programs. The programs may be loaded and run individually, or, if they all reside on the same drive with the *.BGI files and the menu program, OMMENU, that menu program will allow the user to select the appropriate programs, one by one, with a brief description of their functions. Control is returned to the menu program after each program finishes, and the menu program can be terminated by the choice of QUIT. The programs were written in Borland's Turbo Pascal, version 6.