Sedimentation Equilibrium Programs
Programs for Analysis of Sedimentation Equilibrium Data

This is a minimalist text version of the manual, with the figures and special formatting removed. A full version is available as MANUAL.RTF in RTF format, readable by MS WORD on both Macintosh and MS-DOS machines.

Greg Ralston, Department of Biochemistry, University of Sydney
© G.B. Ralston, 1994

Introduction

This group of programs is intended for analysing data from the photoelectric absorbance optical system of the Beckman XL-A Analytical Ultracentrifuge, through the calculation of the distributions of the Omega function and apparent weight-average molecular weight. These programs have been designed to be as closely compatible with one another as possible, and also to be compatible with the pre-existing programs for the XL-A. Data files generated by the programs described here have been designed, as far as possible, to be compatible with other programs developed by others for XL-A data.

The programs will run on IBM-compatible computers (XT, AT, and 386 or 486 machines). Although a maths coprocessor is not mandatory, its use will speed operation considerably. Although the programs will run without a printer connected, a printer is highly desirable. Graphics may be printed if the MS-DOS GRAPHICS program is loaded in advance and the key combination for printing the graphics screen is used. All data files are saved in text format, and may be viewed or printed with the DOS TYPE or PRINT commands. The hardware display must be at least of CGA resolution, but EGA and VGA are also supported and are preferred because of their higher resolution.

Sections of the fitting programs related to assessing goodness of fit were closely based on programs written by Dr. Allen Minton, for whose help I am most grateful.

Some test data is available for trying out the programs. The simulated data sets entitled TEST*.RA1, placed on the RASMB network by Dr. Allen Minton, have been used as a basis for preparing secondary data files with the "run number" apmn (e.g. MapmnA1.DAT). Experiment with these files and with real data. These programs are in a very preliminary stage, and I would greatly appreciate comments and feedback.

The XL-A Programs

Run identification

The massive amount of data that can be collected in one experiment requires careful planning of file names. There are three cells, each potentially with three channels. Three different wavelengths can be examined in any one scan, and a number of scans can be made at different times: e.g., at 4 h intervals to check for equilibrium; at different temperatures; at different speeds. One set of data must be distinguished uniquely from another with regard to the variables: cell, channel, wavelength, time, speed, sample and temperature. The XL-A only allows 4 characters to designate a scan, appending 4 of its own to indicate cell number, Abs/Transm, and wavelength. However, once data have been selected for saving and further analysis, the XL-A allows 8 characters to distinguish most of the run variables, with the characters in the extension automatically referring to Radial/Wavelength, Abs/Transm, and cell number. Absorbance measured in radial scans will be used exclusively in this work, so these options need not be distinguished. The following is suggested as a possible means for naming files that will be created by XLAMEN or XLAMW.

Run number: 4 digits will take us quite a way into the future. At one run per day, it will take 30 years to exhaust the run numbers.
After that, we can start again at 0001.

Run conditions: All of the condition variables such as time, temperature, wavelength, etc., are going to be much more variable. It would be sensible to assign the letters "a" to "z" to arbitrary "scans" that might have different conditions (e.g., time, temperature, speed...), all of which will be recorded under the run number in the log book.

Cell number: An additional digit (1-3) can be used to identify the cell number.

Channel: The numerals 1-3 are used to designate the channel number. The data from three separate channels of a given cell may be stored sequentially in a single data file created by XLAMEN or XLAMW. Data from each channel is labelled with the channel number as part of the file header.

File name: The overall file name should thus be along the following lines: MnnnnXY (.BLN) or OnnnnXY (.BLN), where 'M' stands for 'Molecular weight data', 'O' stands for 'Omega data', 'nnnn' designates the run number, 'X' the particular scan, and 'Y' the cell number. The channel number is specified within the file as part of the header. The optional extension indicates that the data have been corrected for a baseline scan. While the M or O and .BLN are added automatically, the rest of the file name is left to the user.

Raw Data Files: Considerable latitude exists here. It would be wise, however, to adhere to the principles above. Raw data file names may be built up as follows: DnnnnXY.RAZ, where the 'D' indicates 'data', nnnn the run number, X the particular scan, Y the channel number and Z the cell number. (This can be confusing, however, because of the mobile nature of the cell indicator.) If all the data scans from a single experiment are stored within a single sub-directory, there will be no need to distinguish the run number, and the conventions can be relaxed, e.g.: NameA1.RAZ, where Name is a user-specified identifier, A can indicate 'Absorbance (uncorrected)', B can indicate 'Baseline correction', and C 'baseline-Corrected'. The Z identifier indicates the cell number, and is provided by the XL-A automatically. The 'Name' identifier should uniquely specify the particular scan conditions (time, temperature, speed, wavelength...).

1. XLABASE

The XLABASE program optionally smooths the experimental and baseline scans, subtracts the interpolated (smoothed) baseline values from the (smoothed) run scan, and optionally smooths the corrected scan. The baseline correction is normally made from A360 data, which is usually collected along with A280 data. The correction interpolates the baseline data when it lies between the run data points. The corrected run data are saved in the output file. The name of this file is left to the user, but the conventions listed above may help to avoid confusion.

Inputs: Both run data and baseline data are read in as XL-A-compatible files; i.e., files with the extension '.RA2', etc., where the '2' stands for the cell number. It is advisable to distinguish the three sample channels of a six-channel cell; i.e., NAMEa1.RA2, NAMEb1.RA2 and NAMEc1.RA2 could specify the absorbance, baseline, and baseline-corrected data respectively for channel number 1 in experiment NAME. (See above under Raw Data Files.) The program assumes that all three sets of data will be on the same drive/directory. Having to specify the drive and directory for each one makes it awfully long-winded when they are all on the same drive.
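As an illustration of the correction described under Process below, the following Python sketch interpolates a baseline scan onto the radial positions of a run scan and subtracts it. This is a minimal sketch only, not the Turbo Pascal program itself: it assumes numpy is available, and the simple two-column text layout in read_scan is an assumption rather than the exact XL-A file format.

import numpy as np

def read_scan(path):
    # Read radius and absorbance columns from a text scan file.
    # The two header lines and two-column layout are assumptions, not the
    # documented XL-A format; adjust to suit the real files.
    r, a = np.loadtxt(path, skiprows=2, usecols=(0, 1), unpack=True)
    return r, a

def baseline_correct(run_file, baseline_file, out_file):
    r_run, a_run = read_scan(run_file)        # e.g. the A280 run scan
    r_bln, a_bln = read_scan(baseline_file)   # e.g. the A360 baseline scan
    # Interpolate the baseline onto the radial positions of the run scan
    # (radii are assumed to increase along the scan) and subtract.
    a_corrected = a_run - np.interp(r_run, r_bln, a_bln)
    np.savetxt(out_file, np.column_stack([r_run, a_corrected]), fmt="%.4f  %.5f")

# Example (hypothetical file names following the conventions above):
# baseline_correct("NAMEa1.RA2", "NAMEb1.RA2", "NAMEc1.RA2")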
Outputs: The output data from the baseline correction is itself in XL-A format, so that it can be operated on by the programs XLAMW, XLAMEN, XLAEQ, EQASSOC, etc.

Process: The XLABASE program interpolates the A360 data onto the radial positions of the A280 scans and subtracts the optionally smoothed values from the optionally smoothed run file. The baseline-corrected values are stored in a designated output file.

Operation: The program asks for the names of the files containing the scan (e.g., A280) data, the baseline (e.g., A360) data, and the corrected data. The option of smoothing is offered, but it is probably best not to use it under normal circumstances. Smoothing tends to introduce ripples into the data as noise is propagated through adjacent data. If smoothing is not selected, the program displays graphs of the raw, baseline and corrected data. If smoothing is selected, the raw and smoothed data are each displayed before subtraction. Smoothing may be applied to the run data, the baseline data, or the corrected data, or any combination. After correction, the three files are displayed graphically, and it can be seen that non-random irregularities in both the data and baseline are removed during the correction. The standard errors of the absorbance measurements are preserved in the corrected files.

2. XLAMW

This program allows the analysis of sedimentation equilibrium experiments through the use of absorption measurements. Input files may be baseline-corrected or uncorrected. If the input file has been corrected, the output files will automatically be designated by the .BLN extension; otherwise no extension is added by the program.

Inputs: Some user-input data are requested by the program: name, date, v-bar, solvent density, centrepiece thickness (in mm), absorbance coefficient (in units of cm L/g; i.e., the absorbance of a 1 g/L solution in a 1 cm cuvette) and the number of channels. If the absorbance coefficient is not known, use the value of 1.0, and all subsequent data will be in the scale of absorbance (corrected for solution thickness). The individual file names of the data files for the separate channels (e.g., C1.RA2, C2.RA2, C3.RA2) are requested in a separate input screen. These files are required to be in the format of the raw data from the XL-A, although the program is tolerant of the exact format of real numbers.

Outputs: The output files comprise the molecular weight data (Mnnnn files) and Omega data (Onnnn files). The output files can contain subfiles for up to three separate channels of a multichannel cell. File names for output are constructed automatically as follows: MnnnnA1 (.BLN) or OnnnnA1 (.BLN), where the 'O' specifies Omega data and 'M' specifies molecular weight data. The identifier nnnn is a user-specified run number. This is best related to the run number in the log books, thereby unambiguously defining the run and all its conditions. The 'A' (or any other character) is a user-specified character that identifies the particular scan in the run. This should also be recorded in the log book. The '1' in this case specifies the cell number (no longer to be part of the extension in subsequent files). The '.BLN' signifies that the data have been baseline-corrected.

Operation: The concentrations are used to determine the point-average weight-average molecular weights and the Omega function at each measured point. (Note that 15 points are required for the linear regression of ln(c) vs r^2; because of this, the point-average molecular weights of the first and last 7 points are not available.)
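As an illustration of this sliding 15-point fit of ln(c) vs r^2, a minimal Python sketch is given below. It assumes numpy and cgs units (r in cm, density in g/mL, R = 8.314e7 erg/(mol K)), and it is not the code used by XLAMW; the local slope is converted with Mw,app = slope x 2RT / ((1 - vbar rho) omega^2).

import numpy as np

R = 8.314e7   # gas constant in erg/(mol K); cgs units are an assumption of this sketch

def point_average_mw(r, c, vbar, rho, rpm, T, window=15):
    # Sliding linear fit of ln(c) against r^2; the local slope is converted to
    # an apparent weight-average molecular weight. The first and last
    # (window // 2) points are lost, as noted above.
    omega = rpm * 2.0 * np.pi / 60.0          # rotor speed in rad/s
    x, y = r**2, np.log(c)
    half = window // 2
    r_mid, mw = [], []
    for i in range(half, len(r) - half):
        slope = np.polyfit(x[i - half:i + half + 1], y[i - half:i + half + 1], 1)[0]
        mw.append(slope * 2.0 * R * T / ((1.0 - vbar * rho) * omega**2))
        r_mid.append(r[i])
    return np.array(r_mid), np.array(mw)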
The option of subtracting a constant baseline allows a correction for residual absorbance of solvent components that is not removed by dialysis. The value of this residual absorbance can be determined by a brief spin at high speed to clear the meniscus of macromolecules. Alternatively, prior investigation with one of the programs that fits models to the c vs r distribution may have indicated a baseline offset value.

A graph of c vs r is initially displayed, to allow the user to inspect for bad points, which often occur at the ends of the solution column and may arise from attempting to read data before the meniscus and beyond the bottom of the cell. Then a graph of ln c vs r^2 is displayed, allowing the user to trim away possible bad data at the ends with the function keys. The function keys are used to move the cursors one at a time; the left cursor is checked first. F1 and F8 allow big steps (100 points) to the left and right, respectively; F2 and F7 allow moderate steps (10 points) to the left and right, respectively; and F3 and F6 allow small steps (1 point at a time) to the left and right, respectively. Pressing Enter accepts the cursor position. If a mistake is made, the user has a chance to recover; the user is asked whether the cursor positions are correct before proceeding. Y allows the process to continue; N allows the user to redefine the cursor positions. NB: the Enter or Return key is required after these entries!

Once the valid data have been defined by the cursors, the entire valid set of ln c vs r^2 data is used to estimate the weight-average molecular weight for the whole cell. This estimate would only be meaningful for a single, non-associating ideal solute. The point-average weight-average molecular weight data are calculated by means of a sliding 15-point linear fit. Because of the differentiation step, these plots are frequently noisy.

Omega Analysis: The operator is asked whether an Omega analysis is required. Reply with Y or N. NB: the Enter or Return key is required after this entry! If an Omega calculation is required, the appropriate protomer molecular weight is needed, together with a value for the reference concentration. Since the reference radius is interpolated from the data, it is important that the reference concentration actually be encompassed by the data. To ensure this, the user is prompted with the range of accessible concentrations. With the input M1 and c data, the Omega distribution is computed and displayed. Note that in order to check for overlap of the Omega function distribution in the three channels, it is essential that the same reference concentration be chosen in all channels (but see Morris, M., Ph.D. thesis, Sydney University, 1985). If the data are satisfactory, the user is prompted to save them. However, it may be that the effects of a different reference concentration or protomer molar mass need to be examined. The user may see the effects of these changes, and may save the new data, or may wish not to save them. Type Y to save, N not to save. NB: the Enter or Return key is required after this entry! Once the data from one channel have been analysed, the next file of data that was specified in the input list of files is loaded, and the process is repeated.

3. XLAMEN

This program allows the analysis of meniscus depletion sedimentation equilibrium experiments (Yphantis, 1964).
This method is especially useful if the sample contains very low molecular weight material with significant absorbance (such as oxidised dithiothreitol) that may not be exactly matched in the reference sector. In analysing meniscus depletion experiments, the residual absorbance at the meniscus is subtracted from the data throughout the rest of the cell. Input files may be baseline-corrected or uncorrected. If the input file has been baseline-corrected, the output file will automatically be designated by the .BLN extension; otherwise no extension is added.

Inputs: Some user-input data are requested by the program: name, date, v-bar, solvent density, centrepiece thickness (in mm), absorbance coefficient (in units of cm L/g; i.e., the absorbance of a 1 g/L solution in a 1 cm cuvette) and the number of channels. If the absorbance coefficient is not known, use the value of 1.0, and all subsequent data will be in the scale of absorbance (corrected for solution thickness). The individual file names of the data files for the separate channels (e.g., C1.RA2, C2.RA2, C3.RA2) are requested in a separate input screen. These files are required to be in the format of the raw data from the XL-A, although the program is very tolerant of the exact format of real numbers.

Outputs: The output files comprise the molecular weight data (Mnnnn files) and Omega data (Onnnn files). The output files can contain subfiles for up to three separate channels of a multichannel cell. File names for output are constructed automatically as follows: MnnnnA1 (.BLN) or OnnnnA1 (.BLN), where the 'O' specifies Omega data and 'M' specifies molecular weight data. The identifier nnnn is a user-specified run number. This is best related to the run number in the log books, thereby unambiguously defining the run and all its conditions. The 'A' (or any other character) is a user-specified character that identifies the particular scan in the run. This should also be recorded in the log book. The '1' in this case specifies the cell number (no longer to be part of the extension in subsequent files). The '.BLN' signifies that the data have been baseline-corrected.

Operation: The program determines the mean ordinate of the flat section of the concentration distribution at the meniscus, and subtracts this value from all subsequent points. Those points showing significant absorbance are then stored. The meniscus-subtracted concentrations are then used to determine the point-average weight-average molecular weights and the Omega function at each measured point. (Note that 15 points are required for the linear regression of ln(c) vs r^2; because of this, the point-average molecular weights of the first and last 7 points are not available.) A graph of c vs r is initially displayed, to allow the user to inspect for bad points, which often occur at the ends of the solution column and may arise from attempting to read data before the meniscus and beyond the bottom of the cell.

The program allows some correction for the meniscus concentration not being exactly zero (provided that this value is not excessive; Teller, 1973). The curve of ln(c) vs r is extrapolated to the meniscus, to yield a revised value of c0 at the meniscus. This value is added to all concentrations and the process repeated as often as necessary. Usually two iterations are sufficient; continued iteration may lead to divergence.
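One plausible reading of this iterative correction is sketched below in Python (numpy assumed; the extrapolation here is performed against r^2, as for the other plots in this manual, and this is not XLAMEN's own code).

import numpy as np

def meniscus_correction(r, c_obs, r_meniscus, fit_slice, iterations=2):
    # c_obs: meniscus-subtracted concentrations; fit_slice: the region chosen
    # for the extrapolation (avoid the curved region near the meniscus and the
    # upward-curving region near the cell base, as discussed below).
    c0 = 0.0
    for _ in range(iterations):               # usually two passes suffice
        coeffs = np.polyfit(r[fit_slice]**2, np.log(c_obs[fit_slice] + c0), 1)
        c0 = np.exp(np.polyval(coeffs, r_meniscus**2))   # revised meniscus value
    return c_obs + c0, c0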
In choosing which part of the curve to extrapolate, bear in mind that it is the region closest to the meniscus that will be affected most by errors in the meniscus concentration. Do not extend the chosen range too far down into the region near the meniscus showing obvious curvature (see the figure below). Secondly, toward the bottom of the cell, association or high molecular weight species will cause the plot to curve upwards. Avoid allowing this region to influence the extrapolation.

Once the meniscus correction is satisfactory, a graph of ln c vs r^2 is displayed, allowing the user to trim away possible bad data at the ends with the function keys. The function keys are used to move the cursors one at a time; the left cursor is checked first. F1 and F8 allow big steps (100 points) to the left and right, respectively; F2 and F7 allow moderate steps (10 points) to the left and right, respectively; and F3 and F6 allow small steps (1 point at a time) to the left and right, respectively. Pressing Enter accepts the cursor position. If a mistake is made, the user has a chance to recover; the user is asked whether the cursor positions are correct before proceeding. Y allows the process to continue; N allows the user to redefine the cursor positions. NB: the Enter or Return key is required after this entry.

Once the valid data have been defined by the cursors, the entire valid set of ln c vs r^2 data is used to estimate the weight-average molecular weight for the whole cell. This estimate would only be meaningful for a single, non-associating ideal solute. The point-average weight-average molecular weight data are calculated by means of a sliding 15-point linear fit. Because of the differentiation step, these plots are frequently noisy. In meniscus depletion experiments, the concentration of solute approaches zero at the meniscus, and the number-average molecular weight may also be obtained (Yphantis, 1964). These data are available in the output files from XLAMEN (i.e., the Mxxxx files).

Omega Analysis: The operator is asked whether an Omega analysis is required. Reply with Y or N. NB: the Enter or Return key is required after this entry. If an Omega calculation is required, the appropriate protomer molecular weight is needed, together with a value for the reference concentration. Since the reference radius is interpolated from the data, it is important that the reference concentration actually be encompassed by the data. To ensure this, the user is prompted with the range of accessible concentrations. With the input M1 and c data, the Omega distribution is computed and displayed. Note that in order to check for overlap of the Omega function distribution in the three channels, it is essential that the same reference concentration be chosen in all channels. If the data are satisfactory, the user is prompted to save them. However, it may be that the effects of a different reference concentration or protomer molar mass need to be examined. The user may see the effects of these changes, and may save the new data, or may wish not to save them. Once the data from one channel have been analysed, the next file of data that was specified in the input list of files is loaded, and the process is repeated.

4. MWMERGE

Point-average molecular weight data obtained from XLAMW or XLAMEN may be collected from several channels that employ different loading concentrations; e.g., the use of the Yphantis 6-channel cell gives three sets of data.
These three sets of data may all be pooled and sorted in increasing concentration by the program MWMERGE. The output file is designated with the suffix .MGE. The merged data are plotted at the completion of the sorting process, to give an impression of the degree of overlap.

5. OMMERGE

Omega data obtained from XLAMW or XLAMEN may be collected from several channels that employ different loading concentrations; e.g., the use of the Yphantis 6-channel cell gives three sets of data. These three sets of data may all be pooled and sorted in increasing concentration by the program OMMERGE. The output file is designated with the suffix .MGE. The merged data are plotted at the completion of the sorting process, to give an impression of the degree of overlap.

6. PREAVMW

The pooled and sorted Mw data in the .MGE files can be averaged in groups of 2 to 40 points, and the mean value of Mw, together with the standard error of that mean, is tabulated at the mean value of c for each group. The output file is designated with the suffix .DAT. This approach is only meaningful if there is acceptable overlap of the several channels (as evidenced, for example, from the graphical output of the program MWMERGE or SHOWMW). The number of points averaged for each group may be specified (between 2 and 40 points). Once the pooled and sorted data have been smoothed by box-car averaging, the mean Mw ± SE at the mean c is saved to file, and a plot is shown on the screen. An output file in a format acceptable to Cricket Graph is also saved, so that the data may be printed in graphical form. (A sketch of the pooling and box-car averaging appears after the description of SHOWMW below.)

7. PREAVOM

The pooled and sorted Omega data from the .MGE files can be averaged in groups of 2 to 40 points, and the mean value of Omega, together with the standard error of that mean, is tabulated at the mean value of c for each group. The output file is designated with the suffix .DAT. This approach is only meaningful if there is acceptable overlap of the several channels (as evidenced, for example, from the graphical output of the program OMMERGE). The number of points averaged for each group may be specified (between 2 and 40 points). Once the pooled and sorted data have been smoothed by box-car averaging, the mean Omega ± SE at the mean c is saved to file, and a plot is shown on the screen. An output file in a format acceptable to Cricket Graph is also saved, so that the data may be printed in graphical form.

8. PLOTMW

The data in the files of preaveraged Mw data may be shown graphically on the screen. This program reads and displays the data in the .MGE and .DAT files. With the data from .DAT files, the plot shows error bars representing ± the SEM. The data from the .MGE files are shown in only a single colour; once the data are merged, each point loses the identity of the channel from which it came.

9. PLOTOM

The data in the files of preaveraged Omega data may be shown graphically on the screen. This program reads and displays the data in the .MGE and .DAT files. With the data from .DAT files, the plot shows error bars representing ± the SEM. The data from the .MGE files are shown in only a single colour; once the data are merged, each point loses the identity of the channel from which it came.

10. SHOWMW

This program displays on the screen the multiple sets of Mw data from different loading concentrations (either from the .BLN files, or from the uncorrected data) so that overlap may be assessed. The different sets are designated with different colours on screens capable of displaying colour.
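The pooling, sorting and box-car averaging performed by MWMERGE/OMMERGE and PREAVMW/PREAVOM (described above) can be sketched in Python as follows. This is an illustration of the idea only: numpy is assumed, the in-memory arrays stand in for the actual .MGE and .DAT file formats, and the group size of 10 is simply an example within the allowed 2-40 range.

import numpy as np

def merge_channels(datasets):
    # datasets: a list of (c, value) array pairs, one per channel
    c = np.concatenate([d[0] for d in datasets])
    v = np.concatenate([d[1] for d in datasets])
    order = np.argsort(c)                     # sort the pooled data by concentration
    return c[order], v[order]

def boxcar_average(c, v, group=10):
    # Average the sorted data in fixed-size groups; report mean +/- SEM at mean c.
    c_mean, v_mean, v_sem = [], [], []
    for i in range(0, len(c) - group + 1, group):
        cs, vs = c[i:i + group], v[i:i + group]
        c_mean.append(cs.mean())
        v_mean.append(vs.mean())
        v_sem.append(vs.std(ddof=1) / np.sqrt(group))   # standard error of the mean
    return np.array(c_mean), np.array(v_mean), np.array(v_sem)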
11. SHOWOM

This program displays on the screen the multiple sets of Omega data from different loading concentrations (either from the .BLN files, or from the uncorrected data) so that overlap may be assessed. The different sets are designated with different colours on screens capable of displaying colour.

Nonlinear Regression Programs

The nonlinear regression programs in this package are based on the Marquardt algorithm, as implemented by Duggleby (1984). The original BASIC program of Duggleby has been rewritten in Turbo Pascal as the core of the present set of programs. Procedures for calculating the functional dependence of the appropriate dependent variable on the independent variable have been written for a number of simple models, in which the number of equilibrium constants is kept to a minimum. Nonideality can be taken into account through the use of a single second virial coefficient, assumed to be the same (in units of L mol g^-2) for all oligomers (the Adams-Fujita approximation; Adams and Fujita, 1963).

DUGOM: Fitting to the Omega Function

In the case of the Omega function, the reference concentration, as well as the concentration at each of the data points, is subject to experimental error. Assuming that the reference point is known with absolute accuracy leads to a propagation of systematic error through the whole data set: the Omega function is thereby constrained to pass through the point (cref, 1.0). In order to avoid this bias, the 'true' value of the reference concentration must be treated as a parameter to be estimated at each round of iteration. For each updated set of parameters (cref, the various K and B), a1,ref, the thermodynamic activity of the protomer at the reference radius, is calculated through the use of the appropriate Func1 procedure (specific for the relevant model). With the updated value of a1,ref, together with the current set of cref, K and B, the set of Omega values is calculated over the domain of c through the use of the procedure Func (also specific for the relevant model). The following chart illustrates the process:

1. Read data.
2. Select model.
3. Input estimates of parameters.
4. Calculate a1,ref, the activity of the protomer at cref, the current reference concentration.
5. Use the current parameter estimates and a1,ref to evaluate the object function for each point (c, Omega).
6. If chi-squared is reduced, revise the parameter estimates and switch towards the Gauss-Newton algorithm for the next round; if chi-squared is not reduced, switch towards steepest descent, take a smaller step and re-revise the parameter estimates.

For the simpler models, such as Monomer-Dimer and Isodesmic (SEK I), the object function is an explicit function of the independent variable, c, and can be calculated directly. For the cooperative isodesmic (SEK III) model (Adams et al., 1978), however, the protomer concentration is required as a function of the total concentration, but is only available as an implicit function. The Newton-Raphson algorithm may be used to find c1 in these cases, but it is essential that starting estimates of the root be chosen such that the process converges on the correct root. The process may be carried out incrementally. At very low concentrations, c1 will be near zero, and zero is thus an appropriate starting estimate for the root, c1. At subsequent points, provided that the concentration increases monotonically, the previous root is an appropriate starting estimate for the next.
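The incremental Newton-Raphson procedure just described can be sketched in Python as follows (numpy assumed; the function and parameter names are illustrative, and this is not the Turbo Pascal routine used by DUGOM). The caller supplies the model relation c = f(c1) and its derivative, and the k·c1 < 1 check discussed below is applied to each iterate.

import numpy as np

def solve_c1(c_total, f, dfdc1, k_iso, tol=1e-10, max_iter=100):
    # c_total must increase monotonically so that each root can seed the next.
    roots = []
    c1 = 0.0                                   # c1 is near zero at the lowest concentration
    for c in c_total:
        guess = c1                             # previous root as the starting estimate
        for _ in range(max_iter):
            step = (f(guess) - c) / dfdc1(guess)
            new = guess - step
            # keep the iterate in the physically meaningful range 0 <= k*c1 < 1;
            # leaving it usually means the correct root has been overshot
            halvings = 0
            while (new < 0.0 or new * k_iso >= 1.0) and halvings < 50:
                step *= 0.5
                new = guess - step
                halvings += 1
            guess = new
            if abs(step) < tol:
                break
        c1 = guess
        roots.append(c1)
    return np.array(roots)

# Example with the SEK III relation quoted under The Models below:
# f = lambda c1: c1 * (1.0 + k2 * c1 * (2.0 - k * c1) / (1.0 - k * c1)**2)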
Since the object functions for the isodesmic and cooperative isodesmic models are closed only when kiso·c1 < 1, it is important to check at each solution whether c1 is indeed less than 1/kiso; if not, it is likely that the correct root has been bypassed; the starting estimate for the root should be reduced, and the Newton-Raphson procedure should be restarted. It has been found that, for the SEK III model, a bisection approach to the root in the interval 0 to 1/k will avoid the incorrect root, but is exasperatingly slow.

Operation: In running these programs, initial estimates are required for the parameters. In addition, it is possible to fix the values of these parameters ('masking') so that they do not change after each iteration. It may be desirable to constrain possible parameter values to certain limits (e.g., concentrations and equilibrium constants should be positive quantities). However, while a negative value of k or B is physically meaningless, there may be times when the estimated quantity returned from fitting is near zero. In these cases, it may be wise to allow the final parameter value to take on a small negative value (often less in magnitude than its uncertainty) to aid convergence to a final set of parameters. Without allowing a small negative value, the algorithm constantly tries to find only positive solutions, and this slows down convergence and may bias the estimation of other parameters.

Recall that the reference concentration is a parameter to be estimated. Its measured value should be a good starting estimate. However, the returned value should not differ by more than a few percent from the experimentally measured value. Larger drifts in the returned value of cref may indicate that the chosen model is inappropriate.

It should be noted that the equilibrium constants, k, estimated in these programs are the quantities Ki/M1; i.e., the values in the molar scale, expressed in the g/L scale. These values are related to the quantities in the g/L scale, k(L/g), by: k(L/g) = i Ki/M1.

Below is an example run of the program DUGOM. If a .DAT file is selected, the SEM data may be used to provide an estimate of the variance for the purposes of statistical weighting.

Do you want to use weighting? [y/n] y
Input std. dev. ? [y/n] y
Do you have a printer connected? [y/n] y

At this point, the relevant model may be selected from the list below. (This list does not match that from DUGMW, either in cardinal or ordinal number! It will be made consistent real soon...)

Please select the model...
Do you want to test:
1 : Monomer-Dimer ?
2 : Monomer-nmer ?
3 : SEK1
4 : SEK3 ?
5 : SEK4 ?
6 : SEK5 ?
7 : AK1 ?
8 : AK4 ?
9 : Dimer-Tetramer-Hexamer
A : Dimer-Tetramer-Hexamer-Octamer
B : Monomer-Dimer-Tetramer
C : Single Nonideal Species
Please indicate [1-C], and press RETURN... 3

The data may be trimmed by moving the left and right cursors with the function keys. F1 moves the cursor 100 points leftwards, F2 moves it 10 points leftwards and F3 moves it one point at a time leftwards. F6 moves the cursor right one point at a time, F7 moves it 10 points at a time, and F8 moves it 100 points at a time. Start with the left cursor, and trim away excessively noisy or otherwise dubious points. Press the Return key when you are satisfied with the cursor position. Repeat for the right-hand cursor.
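The quantity minimised during the fit is the sum of squared residuals; when SEM values are available (the weighting prompt above, with .DAT input), each residual may be weighted by 1/SEM^2 so that a chi-squared is minimised instead. A minimal Python sketch of that object function (numpy assumed; not the program's own code):

import numpy as np

def sum_of_squares(y_obs, y_fit, sem=None):
    # Unweighted sum of squares, or a chi-squared when standard errors are given.
    residuals = y_obs - y_fit
    if sem is None:
        return np.sum(residuals**2)
    weights = 1.0 / sem**2                     # variance estimated from the SEM
    return np.sum(weights * residuals**2)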
The program then indicates the number of 'active' parameters, and calculates the initial sum of squares of residuals:

3 parameters
Initial sums of squares = 1219.6763

During the fitting process, information on the fitting is provided to the user: the updated parameter values, the current sum of squares (so that it can be checked that the sum of squares is indeed decreasing!), and the Marquardt lambda, indicating the success of the optimization. For each successful iteration, lambda is reduced. If, during an iteration, the sum of squares increases, lambda is increased, and smaller steps are taken in a 'more steepest descent' mode. You have the option of watching successive iterations graphically; it slows down the action a little, but you get to see what is going on. If all is going well, the fitted line should move gradually to fit the points better and better. If it doesn't, then:

1) The algorithm may be stuck in a "local minimum"; this means that in order to find a lower sum of squares, the program would have to allow the sum of squares to increase for a step or two, but this is not allowed. A way around this is to try slightly different sets of initial parameter estimates. Try values both above and below those you first tried. If you eventually find the parameters of "best" fit, you should be able to converge on the same values both from below and from above.

2) The model may not be the most appropriate; the "best" fit may not be a "good" fit. If you can't improve by trying new initial parameter sets, try another model.

3) The parameter value may be stuck in a region that is insensitive. For example, if the equilibrium constant is extremely large, there will be effectively no monomer detectable, and the algorithm cannot distinguish its current large value of K from any other large value of K. Similarly, if B is set to zero, its value may not be able to change; try values like 1e-8 instead.

After 10 iterations, the operator has the option of trying new parameter estimates (if the initial ones seem to cause the algorithm to get stuck), continuing with the process (if things seem to keep improving), or giving the whole thing away as a bad job!

Stopped after 10 iterations
continue [c] new parameters [n] accept current estimates [a] quit [q]
Enter letter [c,n,a,q] : c

If all is well, several rounds of continuation may result in satisfactory convergence. However, sometimes the process gets to a point where the sum of squares doesn't improve, yet the parameter values don't settle down sufficiently within the set tolerance. This can happen with noisy data and a model in which parameters are highly correlated. It may be best in these circumstances to accept the current set, if the fit looks visually satisfactory.

The final parameter values are :
k12: 1.764E+00 +/- 3.771E-02
B (L mol/g^2): 2.404E-07 +/- 5.308E-08
cref: 1.739E-01 +/- 8.463E-04
Sum of squares is : 2.392E+02

Large drifts in the returned value of cref may indicate that the chosen model is inappropriate. After a successful convergence, statistics on the goodness of fit are provided. If a printer is connected, the fitted data, the parameters and their approximate (asymptotic) standard errors, and some statistics about the fit will be printed out. It is also possible to print out the covariance and correlation matrix.

Runs test indicates >5% probability of randomly distributed residuals.

In our laboratory we find it useful to have an output file that can be read and plotted by Cricket Graph:

Do you want a cricket graph output file? [y/n] n
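The Marquardt lambda bookkeeping described above can be sketched as follows: a successful iteration (reduced sum of squares) accepts the step and decreases lambda, moving towards Gauss-Newton behaviour, while a failed one increases lambda and takes smaller, more steepest-descent-like steps. This Python sketch illustrates the idea only and is not Duggleby's implementation; numpy is assumed, and the Jacobian is taken to be the matrix of derivatives of the fitted curve with respect to the parameters.

import numpy as np

def marquardt_step(params, jacobian, residuals, lam):
    # One damped (Levenberg-Marquardt) parameter update for the current lambda.
    jtj = jacobian.T @ jacobian
    damped = jtj + lam * np.diag(np.diag(jtj))   # Marquardt's diagonal scaling
    delta = np.linalg.solve(damped, jacobian.T @ residuals)
    return params + delta

def update_lambda(lam, old_ssq, new_ssq, factor=10.0):
    # Decrease lambda after an improvement; increase it (smaller steps) otherwise.
    return lam / factor if new_ssq < old_ssq else lam * factor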
WTDUGMW and DUGMW

These two programs have separate existences because of the file structures of the .DAT, .MGE and .BLN files. The .BLN files contain information on r^2, ln c and Mn that is not carried by the .DAT and .MGE files. Consequently, DUGMW is provided to allow fitting to the individual channels in a .BLN file. For fitting to merged or preaveraged data, WTDUGMW must be used, since this program does not expect to find those additional data. WTDUGMW can also use the SEM data in the .DAT files to perform a weighted fit.

The Omega Function

Although recent trends justly favour analysis of association reactions in terms of the c vs r distribution, there are good reasons to consider the distribution of apparent weight-average molecular weight as a function of concentration. Firstly, the behaviour of this distribution often (but not always) gives a clear indication of the type of self-association behaviour that is occurring, and whether nonideality is evident; to the eye, the exponential shape of the c vs r distribution does not reveal a great deal. Secondly, in a purely associating system, at equilibrium, and with all macromolecular solute capable of taking part in the reactions, the distribution of Mw,app is a well-defined function of the total concentration of the solute and the various equilibrium constants; the concentration loaded into the cell, the radial position at which measurements were made, and the speed of rotation do not influence the Mw,app distribution. On the other hand, for a system of two or more non-reacting, independent solutes, the distribution of Mw,app with concentration will be influenced by these operational variables. As a consequence, Mw,app data obtained from a purely associating system at equilibrium, over a range of different loading concentrations, speeds, and radial positions, will all lie on a single continuous curve, whose shape will depend on the values of the various equilibrium constants and virial coefficients. If the reacting system has not attained equilibrium, if there are species present that are for some reason incapable of taking part in the reaction, or if there are gross contaminants present, the Mw,app vs c data will not lie on a single continuous curve. In principle, then, the Mw,app distribution from several experiments can give important information on the attainment of equilibrium and the presence of contaminants, as well as the parameters of association.

In practice, however, this information is often compromised by experimental error. In the calculation of the point-average weight-average molecular weights, the original c vs r data must be transformed and differentiated. In the process, the noise in the original experimental data is increased and its distribution is distorted. This effect is illustrated below with data calculated from simulated c vs r distributions for (a) a purely associating system, and (b) a mixture of two different, non-interacting species. The original simulated distributions were computed by Dr. Allen Minton as TEST1xx.RA1 and TEST2xx.RA1 (where xx designates the loading concentration in g/L) and placed on the RASMB file server. While it appears that the data in (a) might lie on a single continuous curve, the noise inherent in the data makes this conclusion unconvincing. Real experimental errors, which tend not to be purely random but are often correlated or contaminated with systematic errors, make matters even worse.
The Omega function was introduced by Milthorpe, Jeffrey and Nichol as a means of avoiding the differentiation required in computing Mw,app. Omega was conceived as the ratio of the actual measured concentration at any point in the centrifuge cell to the thermodynamic activity of a single, non-associating, ideal monomeric species whose activity was equal to the concentration of the real solute at some chosen reference position (ref, c(ref)). Since the distribution of such an ideal species is given by:

a1(r) = a1(ref) exp[f M1 (r^2 - ref^2)]   (1)

where

f = (1 - vbar·rho)·omega^2 / 2RT   (2)

(vbar being the partial specific volume, rho the solvent density and omega the angular velocity), and when a1(ref) is set to c(ref), it follows that:

Omega = c(r)/a1(r) = c(r) exp[f M1 (ref^2 - r^2)] / c(ref)   (3)

Since the thermodynamic activity of a monomer in an association equilibrium in a centrifuge cell will be distributed according to eq (1), it follows that, when a1(ref) refers to the actual thermodynamic activity of the real monomer in equilibrium at the reference concentration:

Omega = a1(ref) c(r) / [a1(r) c(ref)]   (4)

Milthorpe et al. (1975) showed that Omega0, Omega extrapolated to zero concentration, yields a1(ref)/c(ref). Since we are able to choose c(ref), we may calculate a1(ref), the thermodynamic activity of the monomer at the reference point, from which the thermodynamic activity of the monomer at any other point can be determined:

a1(r) = a1(ref) exp[f M1 (r^2 - ref^2)] = a1(ref) c(r) / [Omega(r) c(ref)]   (5)

Morris and Ralston (1985) showed that the Omega distribution, like the distribution of Mw,app, is itself a continuous and unique function of the total concentration for a self-associating solute, and that the parameters of self-association can be estimated by fitting the Omega vs c distribution. Importantly, because the Omega distribution is not distorted by differentiation, it is much easier to detect heterogeneity or failure to attain equilibrium. Compare the curves (a) and (b) below, each derived from the same set of simulated data as shown in the corresponding figures above. It is clear, even in the presence of random noise, that the data of curve (a) come from a reversible self-association, and that the data of curve (b) indicate heterogeneity. The Omega function is sensitive to the presence of both high molecular weight and low molecular weight contaminants. Low molecular weight contaminants (e.g. from proteolysis) result in an abrupt turning up of the Omega curve as the concentration approaches zero (Ralston and Morris, 1992). It is interesting to note in this context that, while estimation of the true parameters of self-association requires knowledge of M1, the test of homogeneity does not require such prior knowledge. Any reasonable estimate of M1 may be made, provided that the same value is used for all sets of data that are compared.

Extrapolation of the Omega function to zero concentration may be aided by fitting the distribution with appropriate models. Over a limited concentration range near zero, the exact form of the model may not be important, as at low concentrations the contribution of higher oligomers will be small. The parameters returned from a good fit allow computation of a1(ref), equivalent to extrapolation to zero concentration. With this value available, it is possible to calculate the distribution of thermodynamic activity as a function of total concentration. This has recently been done in the case of human erythrocyte spectrin (Ralston, 1994), for which independent experiments had provided concentrations of several associated species as a function of total concentration. Combination of the two sets of data allowed an assessment of the activity coefficient of the spectrin protomer as a function of the total concentration.
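As an illustration of eq (3), the following Python sketch computes the Omega distribution from a c(r) scan, a protomer molecular weight M1 and a chosen reference concentration. It assumes numpy and cgs units (as in the earlier sketches), and it is not the code used by XLAMW or XLAMEN; the reference radius is obtained by interpolation, so cref must lie within the range of the measured concentrations.

import numpy as np

R = 8.314e7   # erg/(mol K); cgs units are an assumption of this sketch

def omega_function(r, c, M1, cref, vbar, rho, rpm, T):
    # Omega(r) = c(r) exp[f M1 (ref^2 - r^2)] / cref, with f = (1 - vbar rho) omega^2 / 2RT
    omega_rot = rpm * 2.0 * np.pi / 60.0
    f = (1.0 - vbar * rho) * omega_rot**2 / (2.0 * R * T)
    order = np.argsort(c)                      # interpolate the reference radius from the data
    rref = np.interp(cref, c[order], r[order])
    return c * np.exp(f * M1 * (rref**2 - r**2)) / cref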
The present group of programs provides the means of fitting the Omega distributions (calculated from the radial distribution of concentration) with various reaction models, both discrete (i.e., a limited number of species) and indefinite (i.e., reaction schemes in which monomer can add to pre-existing oligomers without limitation in the size of the oligomer). The specific models and the relevant equations are listed below.

The Models

Omega

For all models, the monomer concentration is calculated from the total concentration of the macromolecular component and the various equilibrium constants of the reactions, i.e. by solving the equation:

c = f(c1, k1...kn)

(Note that this presupposes that the equilibrium constants are unaffected by nonideality, a consequence of the Adams-Fujita approximation.) For some of the simpler models, an explicit solution may be derived; for others, c1 must be solved by means of a Newton-Raphson process. The monomer activity is calculated from the monomer concentration, the total concentration of the associating component and the second virial coefficient, by assuming the Adams-Fujita approximation:

a1 = c1 exp(B M1 c)

On the first pass with a new set of parameters, a1ref, the activity of the monomer at the reference point, is calculated from the value of c = cref. With a value of a1ref available, Omega may be calculated at any other radial position from the relationship (Milthorpe et al., 1975):

Omega = c a1ref / (a1 cref)

For each model, then, the specific parameters and equations are those that relate c to c1.

Single, nonideal, non-associating species:
At all concentrations, c1 = c.

Monomer-Dimer:
c = c1 + 2 k2 c1^2
c1 = (-1 + sqrt(8 k2 c + 1)) / (4 k2)     (k2 = K2/M1)

Monomer-n-mer:
c = c1 + n kn c1^n     (kn = Kn/M1)

Monomer-dimer-trimer:
c = c1 + 2 k2 c1^2 + 3 k3 c1^3     (ki = Ki/M1)

Monomer-dimer-trimer-tetramer:
c = c1 + 2 k2 c1^2 + 3 k3 c1^3 + 4 k4 c1^4     (ki = Ki/M1)

Isodesmic (SEK I):
c = c1 / (1 - k c1)^2     (k = K/M1)
c1 = (2kc + 1 - sqrt(4kc + 1)) / (2 k^2 c)

Cooperative Isodesmic (SEK III):
c = c1 [1 + k2 c1 (2 - k c1) / (1 - k c1)^2]     (if k c1 < 1)

SEK IV:
2A = A2 (K2); A2 + A2 = A4, A2 + A4 = A6, ... (K)
c = c1 [1 + 2 k2 c1 / (1 - k2 k c1^2)^2]     (k = K/M1, k2 = K2/M1)

SEK V:
2A = A2 (K2); A2 + A2 = A4 (K24); A2 + A4 = A6, ... (K)
c = c1 + 2 k2 (1 + k24 c) c1^2 - 2 k2 k24 c1^3 - k2^2 k24^2 c c1^4 + k2^2 k24^2 c1^5

Attenuated Indefinite (AK I):
c = c1 exp(k c1)
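As an illustration of how these relations are used in the fitting, the following Python sketch evaluates the Omega distribution predicted by the Monomer-Dimer model for given k2, B, M1 and cref, using c1 from the explicit solution above, a1 = c1 exp(B M1 c) and Omega = c a1ref / (a1 cref). The function names are illustrative and numpy is assumed; this is a sketch, not the Func/Func1 procedures of the programs.

import numpy as np

def c1_monomer_dimer(c, k2):
    # Explicit protomer concentration for the Monomer-Dimer model (k2 = K2/M1).
    return (-1.0 + np.sqrt(8.0 * k2 * c + 1.0)) / (4.0 * k2)

def omega_monomer_dimer(c, cref, k2, B, M1):
    # Model Omega(c) under the Adams-Fujita approximation.
    a1 = c1_monomer_dimer(c, k2) * np.exp(B * M1 * c)
    a1ref = c1_monomer_dimer(cref, k2) * np.exp(B * M1 * cref)
    return c * a1ref / (a1 * cref)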
References

Adams, E.T. and Fujita, H. (1963) Sedimentation equilibrium in reacting systems. In Ultracentrifugal Analysis in Theory and Experiment (J.W. Williams, ed.), Academic Press, New York.
Adams, E.T., Tang, L.-H., Sarquis, J.L., Barlow, G.H. and Norman, W.M. (1978) Self-association in protein solutions. In Physical Aspects of Protein Interactions (N. Catsimpoolas, ed.), Elsevier/North-Holland, Amsterdam, p. 1.
Duggleby, R.G. (1984) Regression analysis of nonlinear Arrhenius plots: an empirical model and a computer program. Comp. Biol. Med. 14, 447-455.
Milthorpe, B.K., Jeffrey, P.D. and Nichol, L.W. (1975) The direct analysis of sedimentation equilibrium results obtained with polymerising systems. Biophys. Chem. 3, 169-176.
Morris, M. and Ralston, G.B. (1985) Analysis of sedimentation equilibrium through direct fitting of the Omega function. Biophys. Chem. 23, 49-61.
Ralston, G.B. and Morris, M.B. (1992) The use of the Omega function for sedimentation equilibrium analysis. In Analytical Ultracentrifugation in Biochemistry and Polymer Science (S.E. Harding, A.J. Rowe and J.C. Horton, eds.), Royal Society of Chemistry, Cambridge, pp. 253-274.
Ralston, G.B. (1994) The concentration dependence of the activity coefficient of the human spectrin heterodimer: a quantitative test of the Adams-Fujita approximation. Biophys. Chem., in press.
Teller, D.C. (1973) Characterisation of proteins by sedimentation equilibrium in the analytical ultracentrifuge. Methods in Enzymology 23, 346-441.
Yphantis, D.A. (1964) Equilibrium ultracentrifugation in dilute solutions. Biochemistry 3, 297-317.

System Requirements

The following system is recommended for running these programs:

- MS-DOS computer (a 286 processor is recommended as a minimum configuration; significant enhancement in speed will be experienced with a 386 or 486 processor).
- A minimum of 640 kB RAM.
- Hard disk drive with at least 1 Mbyte of free space.
- Printer connected to the parallel port (optional).
- A maths co-processor is highly recommended, but not essential.
- Monitor and graphics controller of at least CGA resolution.
- A mouse is not yet implemented.
- MS-DOS, version 5 or later.

Installation

Data are assumed to be located on the hard disk, drive C:, although the user can override this default. No specific directory is assumed. For the graphics, the Borland Graphics Interface files (the *.BGI files) should be present in the same directory as the programs. The programs may be loaded and run individually, or, if they all reside on the same drive with the *.BGI files and the menu program, OMMENU, that menu program will allow the user to select the appropriate programs, one by one, with a brief description of their functions. Control is returned to the menu program after each program finishes, and the menu program can be terminated by the choice of QUIT. The programs were written in Borland's Turbo Pascal, version 6.