Statistical Approach for Comparison of Two Analytical Assay Methods: A Review
Abstract
The objective of this article is to illustrate how to establish the comparability of two analytical test methods for the same analyte in a pharmaceutical formulation, using a small number of data and appropriate statistical tools. Two common statistical tools are used for the comparison: Student's t-test and the variance ratio test (F-test). The t-ratio and F-ratio are calculated and compared with the tabulated values of the standard t-distribution and F-distribution. Student's t-test is adopted when the data are generated through repeated tests on a single commercial batch, and the paired t-test is adopted when the data are obtained by analyzing successive batches simultaneously with the two methods. The results of the presented case study show that the two analytical methods are comparable; in other words, their mean and precision values do not differ significantly. From the results and their interpretation it is concluded that the applied statistical tools can be easily adopted in any analytical laboratory, particularly in exploratory investigations, to get a good notion of how two test methods compare.
Key Words: Comparison of two analytical test methods, Statistical tools, Student's t-test, Paired t-test, Variance ratio test (F-test)
Introduction
In a quality control laboratory, situations often arise in which an old assay method has to be replaced with a new one. Various reasons may be responsible for this: inclusion of a newer test method in the Pharmacopoeia; development of stability-indicating test methods; quicker release of the drug with the new method, as when a microbial method is replaced with an HPLC method; avoiding the use of poisonous chemicals such as pyridine that the old method required; or reducing the cost of chemicals or the usage of resources by adopting a new method.
As per pharmaceutical regulatory bodies [1,2], no comparison study is required for such a replacement, but they advise performing a method validation study prior to adoption. Particularly in exploratory investigations, before performing such elaborate method validation studies, it may sometimes be necessary to establish that the new method is at least capable of generating results similar to those provided by the old method. Such decisions can be reached using a small number of test results. Merely averaging the sample results cannot provide such a comparison. It is a statistical task to find a measure for the results and to ascertain the chance of occurrence of various values of this measure when there is no difference. These decisions are called statistical decisions [3].
With the help of an example, the article cites two models along with relevant statistical calculations and inferences that can be followed for the purpose.
Methods
An established method (A) for estimation of L-Lysine Hydrochloride [4] in a formulation is to be replaced with a new one (B). The new method is based on a different scientific principle. The old method needs separation of the ingredient using Thin Layer Chromatography (TLC), followed by an extraction step with a solvent and subsequent quantification with a spectrophotometer. The TLC separation and extraction were time consuming: development of the plate required keeping it overnight, resulting in about two days of work. The new method utilizes reflux condensation with a suitable reagent followed by determination with a spectrophotometer, and can be completed within four hours. Adoption of the new method would result in quicker release of the batch.
In attempting to reach statistical decisions, the steps [5-11] followed are:
1. We make assumptions or guesses about the population of the data generated. Thus, if we want to decide whether one method is comparable with the other, we assume that there is no difference between the methods (i.e. any observed differences are merely due to fluctuations in sampling from the same batch, etc.). This is called the null hypothesis and is denoted by H0. Any hypothesis which differs from H0 is called an alternate hypothesis and is denoted by H1.
2. Then we perform suitable calculations (Student's t-test, paired t-test, or F-test) either to accept this assumption or to reject it.
If we reject this hypothesis when it should be accepted, we say that a Type I error has been made. If, on the other hand, we accept this hypothesis when it should be rejected, we say that a Type II error has been made. In either case a wrong decision has occurred. For tests of hypotheses to be good, they must be designed so as to minimize errors of decision. The only way to reduce both types of error is to increase the sample size, which may or may not be possible.
In testing a hypothesis, the maximum probability with which we would be willing to risk a Type I error is called the level of significance (α) of the test. In practice, a level of significance of 5% is customary in pharmaceutical analysis. In other words, testing the hypothesis that the methods are comparable at the 0.05 level of significance (95% confidence level) means that there are about 5 chances in 100 that we would reject the hypothesis when it should be accepted; i.e. when the methods are truly comparable, we reach the right decision about 95% of the time.
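The meaning of the 5% level of significance can be checked numerically. The following is a minimal Python sketch and not part of the original study: it repeatedly draws two sets of ten results from the same normal population (so the null hypothesis is true by construction), applies the pooled two-sample t-test, and counts how often the hypothesis is wrongly rejected. The population mean (210) and standard deviation (2.2), the number of trials, the seed and the function names are illustrative assumptions; 2.101 is the tabulated two-tailed 5% value for 18 degrees of freedom quoted in the Result section.

```python
import random
from math import sqrt
from statistics import mean, variance

# Tabulated two-tailed 5% t-value for 18 degrees of freedom (quoted in the article).
T_CRITICAL = 2.101


def pooled_t(sample_1, sample_2):
    """Two-sample t-value based on the pooled standard deviation."""
    n1, n2 = len(sample_1), len(sample_2)
    pooled_sd = sqrt(((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2))
                     / (n1 + n2 - 2))
    return (mean(sample_1) - mean(sample_2)) / (pooled_sd * sqrt(1 / n1 + 1 / n2))


def type_one_error_rate(trials=4000, n=10, mu=210.0, sigma=2.2, seed=1):
    """Fraction of trials in which a true null hypothesis is rejected.

    mu, sigma, trials and seed are assumed values chosen only for illustration.
    """
    rng = random.Random(seed)
    false_rejections = 0
    for _ in range(trials):
        # Both "methods" sample the same population, so H0 holds.
        method_a = [rng.gauss(mu, sigma) for _ in range(n)]
        method_b = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(pooled_t(method_a, method_b)) > T_CRITICAL:
            false_rejections += 1
    return false_rejections / trials


rate = type_one_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should lie close to 0.05, which is what "5 chances in 100" of a Type I error means in practice.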
For the comparison, either of the following two models can be used:
Model I: Tests may be performed on a single commercial batch, analyzing it separately with the two methods.
Model II: Tests may be performed by using different commercial batches, analyzing each batch with the two methods.
Model I: A single commercial batch is chosen and analyzed for L-Lysine Hydrochloride ten times separately with each of the two methods. The data obtained are tabulated in Table No. 1.
Comparison of precision values of two test methods:
The first step in the comparison is to find out whether there is any significant difference between the precision values. Here we assume that there is no significant difference between the precision values of the two methods. This hypothesis is represented as follows:
Null hypothesis: H0: $\sigma_1^2 = \sigma_2^2$, where $\sigma_1^2$ and $\sigma_2^2$ represent the variances of the two methods.
Alternate hypothesis: H1: $\sigma_1^2 \neq \sigma_2^2$
To test the hypothesis we calculate the F-ratio [5-11], given by the formula below:
$$F = \frac{S_1^2}{S_2^2} \qquad (1)$$
$S_1$ and $S_2$ are the sample standard deviations of the old method and the new method respectively. The larger value of S is to be used in the numerator so that the value of F is never less than unity.
The value obtained for F is then compared with the value in the F-table corresponding to the numbers of degrees of freedom (namely 9, 9) for the two sets of data at the 5% probability level. If the calculated value of F is less than the tabulated value, the null hypothesis cannot be rejected.
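As an illustration, the variance ratio of equation (1) can be evaluated directly on the Table No. 1 data. This is a minimal Python sketch using only the standard library; the function and variable names are assumptions made for the example, and the critical value 3.18 is the tabulated figure quoted in the Result section rather than something the code computes.

```python
from statistics import variance

# Assay values (mg/5 ml) copied from Table No. 1.
old_method = [211.754, 206.105, 211.393, 209.654, 210.341,
              206.186, 212.827, 212.436, 212.337, 210.283]
new_method = [206.970, 207.250, 210.690, 209.750, 210.380,
              207.221, 210.882, 211.443, 210.291, 212.994]

# Tabulated F-value for (9, 9) degrees of freedom, as quoted in the article.
F_CRITICAL = 3.18


def f_ratio(sample_a, sample_b):
    """Variance ratio with the larger sample variance in the numerator (F >= 1)."""
    var_a = variance(sample_a)  # sample variance, divisor n - 1
    var_b = variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)


f_value = f_ratio(old_method, new_method)
reject_h0 = f_value > F_CRITICAL
print(f"F = {f_value:.3f}; reject H0 (equal precision): {reject_h0}")
```

Taking the maximum and minimum of the two variances implements the rule that the larger variance goes in the numerator, so the same function works whichever method is passed first.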
Comparison of the mean values of two test methods
For the comparison of the means of the two test methods, Student's t-test [5-10] is applied. As stated earlier, we assume that there is no difference between the average values ($\mu_1$ and $\mu_2$ respectively) of the two methods. In other words, we test the hypothesis H0: $\mu_1 = \mu_2$, which is equivalent to testing H0: $\mu_d = 0$, where $\mu_d = \mu_1 - \mu_2$ is the difference in the average values of the test results.
Alternate hypothesis: H1: $\mu_d \neq 0$
In this case, the value of t [5-11] is given by:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \qquad (2)$$
where $\bar{x}_1$ and $\bar{x}_2$ are the mean values of the analyte tested by the two methods, $n_1$ and $n_2$ are the numbers of replicates for the old method and the new method,
and $S_p$ [5-10], the pooled standard deviation, is
$$S_p = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}} \qquad (3)$$
The value obtained for t is then compared with the value in the t-table corresponding to $(n_1 + n_2 - 2)$ degrees of freedom, i.e. 18, at the 5% probability level. If the calculated value of t is less than the tabulated value, the null hypothesis cannot be rejected.
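The Model I comparison of means can be sketched in a few lines of Python. The example below is illustrative only: it applies the standard pooled two-sample t-test (pooled standard deviation, $n_1 + n_2 - 2$ degrees of freedom) to the Table No. 1 data. The function name is an assumption, and the critical value 2.101 is the tabulated figure quoted in the Result section.

```python
from math import sqrt
from statistics import mean, stdev

# Assay values (mg/5 ml) copied from Table No. 1.
old_method = [211.754, 206.105, 211.393, 209.654, 210.341,
              206.186, 212.827, 212.436, 212.337, 210.283]
new_method = [206.970, 207.250, 210.690, 209.750, 210.380,
              207.221, 210.882, 211.443, 210.291, 212.994]

# Tabulated two-tailed 5% t-value for 18 degrees of freedom, as quoted in the article.
T_CRITICAL = 2.101


def pooled_t_test(sample_1, sample_2):
    """Return (t, pooled standard deviation, degrees of freedom)."""
    n1, n2 = len(sample_1), len(sample_2)
    s1, s2 = stdev(sample_1), stdev(sample_2)  # sample SDs, divisor n - 1
    dof = n1 + n2 - 2
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / dof)
    t = (mean(sample_1) - mean(sample_2)) / (pooled_sd * sqrt(1 / n1 + 1 / n2))
    return t, pooled_sd, dof


t_value, pooled_sd, dof = pooled_t_test(old_method, new_method)
reject_h0 = abs(t_value) > T_CRITICAL  # two-tailed decision
print(f"Sp = {pooled_sd:.3f}, t = {t_value:.3f}, d.f. = {dof}, reject H0: {reject_h0}")
```

The absolute value of t is compared with the tabulated value because the alternate hypothesis is two-sided.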
Model II: The two test methods are compared by analyzing successive production batches for L-Lysine Hydrochloride using both methods. The results are tabulated in Table No. 2.
Comparison of the mean values of two test methods
Each of the 10 batches has a different L-Lysine Hydrochloride content owing to inherent variations in the production operation. As a result, a paired comparison is to be made for each batch [8,10]. Data of this type are analyzed using the paired t-test, where we make inferences about the difference in the mean assay readings of the two methods, i.e. $(\mu_1 - \mu_2)$, by making inferences about the mean of the differences, $\mu_d$.
Testing hypothesis: H0: $\mu_1 = \mu_2$, which is equivalent to testing H0: $\mu_d = 0$
Alternate hypothesis: H1: $\mu_d \neq 0$
The test statistic [5-11] for this hypothesis is
$$t = \frac{\bar{d}\,\sqrt{n}}{S_d} \qquad (4)$$
where $\bar{d} = (1/n) \sum d_j$ is the sample mean of the differences, and
$d_j = (d_1 - d_2)$, where $d_1$ and $d_2$ are the values obtained for batch j by analyzing the analyte with the old and new method respectively (Table No. 2).
The sample standard deviation [5-11] of the differences is given by:
$$S_d = \sqrt{\frac{\sum_{j=1}^{n} (d_j - \bar{d})^2}{n - 1}} \qquad (5)$$
The value obtained for t is then compared with the value in the t-table corresponding to 9 degrees of freedom at the 5% probability level. If the calculated value of t is less than the tabulated value, the null hypothesis cannot be rejected.
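The paired comparison of Model II can likewise be sketched in Python. The example below is a hedged illustration, not the authors' own calculation sheet: it forms the batch-wise differences from the Table No. 2 data and applies the standard paired t-test ($t = \bar{d}\sqrt{n}/S_d$, with $S_d$ computed using the divisor $n - 1$). The function name is an assumption, and the critical value 2.262 is the tabulated figure quoted in the Result section.

```python
from math import sqrt
from statistics import mean, stdev

# Assay values (mg/5 ml) for batches x-20 to x-29, copied from Table No. 2.
old_method = [215.940, 206.410, 206.280, 206.610, 210.400,
              208.904, 210.050, 210.095, 211.295, 205.920]
new_method = [209.570, 206.574, 208.514, 212.532, 211.433,
              210.437, 209.991, 208.921, 210.496, 211.469]

# Tabulated two-tailed 5% t-value for 9 degrees of freedom, as quoted in the article.
T_CRITICAL = 2.262


def paired_t_test(first, second):
    """Return (t, mean difference, SD of differences, degrees of freedom)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(first, second)]  # d_j = d1 - d2
    n = len(differences)
    d_bar = mean(differences)
    s_d = stdev(differences)  # divisor n - 1
    t = d_bar * sqrt(n) / s_d
    return t, d_bar, s_d, n - 1


t_value, d_bar, s_d, dof = paired_t_test(old_method, new_method)
reject_h0 = abs(t_value) > T_CRITICAL  # two-tailed decision
print(f"d_bar = {d_bar:.4f}, Sd = {s_d:.3f}, t = {t_value:.3f}, reject H0: {reject_h0}")
```

Only the differences enter the calculation, so the batch-to-batch level of the assay values does not affect the result.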
Result
In Model I, for comparison of the precision of the two test methods, it was found in the present case that $S_1 = 2.434$ and $S_2 = 2.017$ (Table No. 1).
Therefore, $F = (2.434)^2 / (2.017)^2 \approx 1.46$ by equation (1), or 1.457 when calculated from the unrounded variances.
Degrees of freedom for the old and new method ($\nu_1$ and $\nu_2$) = (no. of replicates - 1) = (10 - 1) = 9.
For comparison of the precision of the two methods, referring to the F-distribution table [5-10] at the 5% probability level, the tabulated value of $F_{\alpha, \nu_1, \nu_2}$, i.e. $F_{0.05, 9, 9}$, is 3.18. Any F value above 3.18 would lead to the conclusion that the precisions of the two methods are different.
Since in the present case the calculated F (1.457) is less than the tabulated F value, the null hypothesis cannot be rejected at the 5% level: there is no evidence of a significant difference between the precisions of the two methods [6].
In Model I, for comparison of the means of the two test methods, it was found in the present case that
$n_1 = 10$, $n_2 = 10$, $S_1 = 2.434$, $S_2 = 2.017$ (Table No. 1).
Therefore, $S_p = 2.235$, calculated by using equation (3).
Hence, t = 0.545, calculated by using equation (2).
For comparison of the means of the two methods, referring to the t-distribution table [8], at the 5% probability level the tabulated value of t for a two-tailed distribution and $(n_1 + n_2 - 2)$, i.e. 18, degrees of freedom is 2.101. Note that this value refers to what is known as a two-tailed distribution, as the alternative hypothesis specified here covers the probability of occurrence of either of the two cases, $\mu_1 > \mu_2$ or $\mu_1 < \mu_2$. In such cases, refer to the table column corresponding to the 2.5% probability level in each tail [4].
Since the calculated t-value (0.545) is less than the tabulated t-value, there is no significant difference between the mean values of the two methods. Hence, the two methods have comparable precision and comparable mean values.
In Model II, for comparison of the two test methods, it was found in the present study that
n = 10 and $\bar{d} = -0.8033$ (Table No. 2).
Therefore, $S_d = 3.50$, calculated by using equation (5).
Therefore, $t = (-0.8033 \times \sqrt{10}) / 3.50 = -0.726$, calculated by using equation (4).
Considering the modulus value, |t| = 0.726.
For the paired comparison under Model II, referring to the t-distribution table, at the 5% probability level the tabulated value of t for a two-tailed distribution and (10 - 1) = 9 degrees of freedom is 2.262.
Since the calculated t-value (0.726) is less than the tabulated t-value, we cannot reject the null hypothesis H0; in other words, there is no significant difference between the means of the two methods.
Comparison of precision of the two methods is not possible in this model as the variation of data involved here is not purely due to variation in the method, but also due to batch-to-batch variation as a result of inherent variability in the production process.
Discussion
The formulae for performing the t-test are different in the two models. One utilizes the pooled standard deviation, Sp, of the samples, whilst the other uses the sample standard deviation of the differences, Sd. In either case, the approach is to calculate the t-value and to compare it with the tabulated value from the t-table at the 95% confidence level, i.e. at α = 0.025 in each tail (for a two-tailed distribution the overall α = 0.050 is split equally between the two tails). If the calculated value is less than the tabulated value, the null hypothesis cannot be rejected at the 5% level of significance; that is, the data give no evidence that the methods disagree with each other.
Though books on statistics prefer making comparisons within matched pairs [7], as in Model II, in this particular problem with L-Lysine Hydrochloride in a pharmaceutical formulation Model I is preferred over Model II. A little reflection will reveal a serious disadvantage in the paired comparison method. Inherent variability in the production process (even with a validated production process) will contribute to the variability in assay results and will tend to inflate the experimental error, thus making a true difference between the results of the two testing methods harder to detect.
Future Prospects
In case of any replacement of an old method with a new one the two models cited here along with their statistical calculations and inferences could prove to be useful in any analytical testing laboratory.
Acknowledgement:
The authors are thankful to Sri. Avijit Gupta, Head Of Department, SQC and Operational Research, Indian Statistical Institute, Kolkata and Sri. M.M.Das, Manager, Quality Assurance, East India Pharmaceutical Works Limited, Kolkata for providing valuable suggestions during the study on L-Lysine Hydrochloride at East India Pharmaceutical Works Limited.
References:
1. Quality assurance of pharmaceuticals: A compendium of guidelines and related materials, Volume 2, 2nd updated ed., Good manufacturing practices and inspection. World Health Organization: 129-131.
2. Guidance for Industry, Q2 (R1) Validation of Analytical Procedures: Methodology. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), International Conference on Harmonization, Geneva, November 2005.
3. Youden WJ. Statistical Methods for Chemists, 1st ed. New York, London: John Wiley & Sons, Inc.; 1961: 24-32.
4. Martindale. Reynolds JE, ed. Kathleen Parfitt. The complete drug reference, 32nd ed. The Pharmaceutical Press, London SE1 7JN, UK; 1999: 1350.
5. Spiegel Murray R. Theories and problems of Statistics, Schaum’s Outline Series, 1st ed. McGraw Hill Book Company; 1961: 167-171.
6. Mendham J, Denney RC. Vogel’s textbook of quantitative analysis, 6th ed. Pearson Education; 2002: 128-132.
7. Montgomery Douglas C. Design and analysis of experiments, 4th ed. John Wiley & Sons Publication; 1996: 20-56.
8. Montgomery Douglas C. Introduction to Statistical Quality Control, 3rd ed. John Wiley & Sons Publication; 1996: 77-119.
9. Himmelblau David M. Process Analysis by statistical methods, 1st ed. John Wiley & Sons, Inc. Publication; 1969: 49-94.
10. Box George E.P., Connor Lewis R., et al.; Davies Owen L., ed. The design and analysis of industrial experiments, 2nd ed. Published for Imperial Chemical Industries Limited by Oliver and Boyd; 1967: 12-56.
11. Dux James P. Handbook of Quality Assurance for the analytical chemistry laboratory. Van Nostrand Reinhold; 1986: 7-16.
Table No. 1: Comparison data for L-Lysine Hydrochloride using Model I
| Sl. No. | Assay value with old method (mg/5 ml) | Assay value with new method (mg/5 ml) |
|---|---|---|
| 1 | 211.754 | 206.970 |
| 2 | 206.105 | 207.250 |
| 3 | 211.393 | 210.690 |
| 4 | 209.654 | 209.750 |
| 5 | 210.341 | 210.380 |
| 6 | 206.186 | 207.221 |
| 7 | 212.827 | 210.882 |
| 8 | 212.436 | 211.443 |
| 9 | 212.337 | 210.291 |
| 10 | 210.283 | 212.994 |
| Std. Dev. (S) | 2.434 | 2.017 |
| Mean ($\bar{x}$) | 210.332 | 209.787 |
Table No.2: Comparison data for L-Lysine Hydrochloride using Model II
| Sl. No. | Batch No. | Value by old method (mg/5 ml) (d1) | Value by new method (mg/5 ml) (d2) | dj = d1 - d2 |
|---|---|---|---|---|
| 1 | x-20 | 215.940 | 209.570 | 6.370 |
| 2 | x-21 | 206.410 | 206.574 | -0.164 |
| 3 | x-22 | 206.280 | 208.514 | -2.234 |
| 4 | x-23 | 206.610 | 212.532 | -5.922 |
| 5 | x-24 | 210.400 | 211.433 | -1.033 |
| 6 | x-25 | 208.904 | 210.437 | -1.533 |
| 7 | x-26 | 210.050 | 209.991 | 0.059 |
| 8 | x-27 | 210.095 | 208.921 | 1.174 |
| 9 | x-28 | 211.295 | 210.496 | 0.799 |
| 10 | x-29 | 205.920 | 211.469 | -5.549 |
| $\bar{d} = (1/n) \sum d_j$ | | | | -0.8033 |
About Authors:
Tanushri Mukherjee, Debasis Bhattacharjee, and Arup Manna


paired t test
Dear Mrs. Mukherjee
Your paper rightly focuses on a very good and versatile statistical tool that is used in a number of pharmaceutical situations. I want to add that the paired t test is mostly used in situations where on the same subject/ animal at two time points the same parameter is determined. Good examples are the weight of animals before and after a special diet is given to them or cholesterol values of subjects before and after a particular drug is administered to them or marks of students before and after a special coaching is given to them. These are the usual examples in all statistics text books.
Vijaya Ratna
http://www.pharmainfo.net/vijayaratna/biography