Standard Practice for Statistical Analysis of One-Sample and Two-Sample Interlaboratory Proficiency Testing Programs

ABSTRACT
This practice describes methods for the statistical analysis of laboratory results obtained from interlaboratory proficiency testing programs. In accordance with Practice E1301, proficiency testing is the use of interlaboratory comparisons for the determination of laboratory testing or measurement performance. The methods provide direction for assessing and categorizing the performance of individual laboratories based on the relative likelihood of occurrence of their test results, and for determining estimates of testing variation associated with repeatability and reproducibility. The methods assume that a majority of the participating laboratories execute the test method properly and that the samples are sufficiently homogeneous that the results represent each laboratory testing essentially the same material. Each laboratory receives the same instructions or protocol.
SIGNIFICANCE AND USE
5.1 This practice is specifically designed to describe simple robust statistical methods for use in proficiency testing programs.  
5.2 Proficiency testing programs can use the methods in this practice for the purpose of comparing testing results obtained from a group of participating laboratories. The practice describes evaluation of individual laboratory results using the interquartile range and Tukey inner and outer fences.  
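The fence evaluation described in 5.2 can be sketched in a few lines of Python. This is a minimal, non-authoritative illustration built on the definitions in the standard text sample further down this page: the median in 3.1.4, hinges in 3.2.1, and fences in 3.2.2, 3.2.4 and 4.4. The function names `hinges`, `fences` and `categorize` are hypothetical, and so are the category labels. The rule that a result lying exactly on a fence counts as inside it is also an assumption, not taken from the practice. Applied to the Table 1 results, the only laboratories falling outside the inner fences are Laboratories 5 and 27, the two results that 6.1.3.1 describes as removed from the rest of the data.

```python
from statistics import median

# Results transcribed from Table 1 (laboratory ID -> single test result).
TABLE_1 = {
    1: 1.22, 2: 1.62, 3: 1.82, 4: 0.60, 5: 2.75, 6: 1.55, 7: 1.17, 8: 1.76,
    9: 1.35, 10: 1.18, 11: 1.19, 12: 1.71, 13: 2.03, 14: 1.10, 15: 1.84,
    16: 1.39, 17: 1.13, 18: 1.66, 19: 1.28, 20: 1.24, 21: 0.69, 22: 1.54,
    23: 1.43, 24: 0.84, 25: 0.98, 26: 1.97, 27: 4.89, 28: 1.85, 29: 1.09,
    30: 1.07,
}


def hinges(values):
    """Return (lower hinge, upper hinge): the medians of the lower and upper
    halves of the sorted data. For odd n the middle value joins both halves."""
    data = sorted(values)
    half = (len(data) + 1) // 2
    return median(data[:half]), median(data[len(data) - half:])


def fences(values):
    """Inner fences lie 1.5 x IQR beyond the hinges; outer fences 3.0 x IQR."""
    lower, upper = hinges(values)
    iqr = upper - lower
    return {
        "inner": (lower - 1.5 * iqr, upper + 1.5 * iqr),
        "outer": (lower - 3.0 * iqr, upper + 3.0 * iqr),
    }


def categorize(x, limits):
    """Place one result relative to the fences (hypothetical labels).
    Assumption: a result lying exactly on a fence is treated as inside it."""
    inner_lo, inner_hi = limits["inner"]
    outer_lo, outer_hi = limits["outer"]
    if inner_lo <= x <= inner_hi:
        return "inside inner fences"
    if outer_lo <= x <= outer_hi:
        return "between inner and outer fences"
    return "outside outer fences"


limits = fences(TABLE_1.values())
categories = {lab: categorize(x, limits) for lab, x in TABLE_1.items()}
flagged = {lab: c for lab, c in categories.items() if c != "inside inner fences"}
print(round(median(TABLE_1.values()), 2))  # consensus value: 1.37
print(hinges(TABLE_1.values()))  # (1.13, 1.76)
print(flagged)  # {5: 'between inner and outer fences', 27: 'outside outer fences'}
```

With hinges of 1.13 and 1.76 (IQR 0.63), the inner fences come out at about 0.185 and 2.705 and the outer fences at about −0.76 and 3.65. Laboratory 5 (2.75) therefore lies between the inner and outer fences, and Laboratory 27 (4.89) lies outside the outer fences.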
5.3 In addition, the data obtained in proficiency testing programs may contain information regarding repeatability (within-lab) and reproducibility (between-lab) testing variation. Repeatability information is possible only if the program uses more than one sample. See Method B. Proficiency testing programs often have a greater number of participants than might be available for conducting an interlaboratory study to determine the precision of a test method (such as described in Practice E691). Precision estimates obtained for the larger number of participants in a proficiency testing program, along with the corresponding wider variation of test conditions, can provide useful information to standards developers regarding the precision of test results that can be expected for a test method when in actual use in the general testing community.  
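Section 4.8 of the practice estimates S_r and S_R by dividing the interquartile range of the appropriate data set by 1.35. Section 4.8.1 ties that factor to an assumption of approximate normality. The short Python check that follows shows where the number comes from. It is a sketch, not the practice's own procedure. For a normal distribution the quartiles lie about 0.6745 standard deviations on either side of the mean, so the IQR spans roughly 1.349 standard deviations. `sd_from_iqr` is a hypothetical helper name. The last line applies it to the Table 1 hinges. Reading that result as a reproducibility estimate assumes, as 5.3 suggests, that a single-sample program carries between-laboratory information but not within-laboratory information.

```python
from statistics import NormalDist

# For a normal distribution the quartiles sit at about +/-0.6745 sigma, so the
# IQR spans about 1.349 sigma; the practice rounds this factor to 1.35.
STD_NORMAL = NormalDist()
NORMAL_IQR_FACTOR = STD_NORMAL.inv_cdf(0.75) - STD_NORMAL.inv_cdf(0.25)


def sd_from_iqr(iqr, factor=1.35):
    """Estimate a standard deviation as IQR / factor (normal-theory factor)."""
    return iqr / factor


print(round(NORMAL_IQR_FACTOR, 3))  # 1.349
# Table 1 hinges are 1.13 and 1.76, so IQR = 0.63.
print(round(sd_from_iqr(1.76 - 1.13), 3))  # 0.467
```

As 4.8.1 cautions, the estimate is only as good as the normal approximation. For strongly skewed data, the 1.35 divisor can misstate the spread.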
5.4 To estimate the precision of a test method, the participants must use the same test method to obtain their test results, and testing must be performed under the conditions required for repeatability and reproducibility. The precision estimates are applicable to the property levels and material types included in the testing program. The precision of a test method may vary considerably for different material types and at different property levels.  
5.5 This practice may be useful to proficiency testing program administrators and provides examples of statistical methods along with explanations of some of the advantages of the suggested methods of analysis. The analyses resulting from the applicatio...
SCOPE
1.1 This practice describes methods for the statistical analysis of laboratory results obtained from interlaboratory proficiency testing programs. In accordance with Practice E1301, proficiency testing is the use of interlaboratory comparisons for the determination of laboratory testing or measurement performance. Conversely, a collaborative study (or collaborative trial) is the use of interlaboratory comparisons for the determination of the precision of a test method, as covered by Practice E691.
1.1.1 Method A covers testing programs using single test results obtained by testing a single sample (each laboratory submits a single test result).  
1.1.2 Method B covers testing programs using paired test results obtained by testing two samples (each laboratory submits one test result for each of the two samples). The two samples should be of the same material or two materials similar enough to have approximately the same degree of variation in test results.  
1.2 Methods A and B are applicable to proficiency test...

General Information

Status: Published
Publication Date: 30-Nov-2021
Technical Committee: E11 - Quality and Statistics


Overview

ASTM E2489-21, Standard Practice for Statistical Analysis of One-Sample and Two-Sample Interlaboratory Proficiency Testing Programs, is published by ASTM International for proficiency testing program administrators, participating laboratories, accreditation bodies, and standards developers. It outlines robust, easy-to-apply statistical methods for analyzing data from interlaboratory proficiency testing. These methods serve two purposes: evaluating the performance of individual laboratories and estimating the precision of test methods.
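The robustness claim in 4.2.2 of the practice is that the median and interquartile range are not affected by the magnitude of extreme values. This claim is easy to check numerically. The illustrative Python sketch here takes the Table 1 results and replaces Laboratory 27's 4.89 with a hypothetical 48.9. It then compares the median and IQR with the mean and sample standard deviation, before and after the change. The hinge rule follows the definition in 3.2.1; the helper names `hinges` and `summarize` are hypothetical.

```python
from statistics import mean, median, stdev

# Table 1 results from the practice, listed by laboratory 1..30.
results = [1.22, 1.62, 1.82, 0.60, 2.75, 1.55, 1.17, 1.76, 1.35, 1.18,
           1.19, 1.71, 2.03, 1.10, 1.84, 1.39, 1.13, 1.66, 1.28, 1.24,
           0.69, 1.54, 1.43, 0.84, 0.98, 1.97, 4.89, 1.85, 1.09, 1.07]


def hinges(values):
    """Medians of the lower and upper halves (odd n: middle value in both)."""
    data = sorted(values)
    half = (len(data) + 1) // 2
    return median(data[:half]), median(data[len(data) - half:])


def summarize(values):
    """Robust (median, IQR) next to classical (mean, stdev) statistics."""
    lo, hi = hinges(values)
    return {"median": median(values), "iqr": hi - lo,
            "mean": mean(values), "stdev": stdev(values)}


# Hypothetical what-if: make Laboratory 27's extreme result ten times larger.
altered = results.copy()
altered[26] = 48.9
before, after = summarize(results), summarize(altered)
for key in before:
    print(f"{key:>6}: {before[key]:.3f} -> {after[key]:.3f}")
```

The median and IQR come out identical in both runs. By contrast, the mean rises by (48.9 − 4.89)/30 ≈ 1.47, and the standard deviation grows from below 1 to above 8.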

Proficiency testing, as defined in this standard, is the determination of laboratory testing performance by means of interlaboratory comparisons. ASTM E2489-21 provides procedures for assessing the performance of individual laboratories. It also covers estimating repeatability (within-laboratory) and reproducibility (between-laboratory) variation, using one-sample (Method A) and two-sample (Method B) designs. The approach rests on three assumptions: a majority of participating laboratories execute the test method properly, every laboratory receives the same instructions or protocol, and the samples are homogeneous enough that all laboratories test essentially the same material.

Key Topics

  • One-Sample and Two-Sample Programs:
    • Method A: Each laboratory submits one result from a single sample.
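    • Method B: Each laboratory submits one result for each of two samples. The two samples should be of the same material, or of two materials similar enough to have approximately the same degree of variation in test results.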
  • Robust Statistical Methods:
    • Use of the median as a consensus value.
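    • Use of the interquartile range (IQR) to estimate data spread. Because the median and IQR are not affected by the magnitude of extreme values, outliers do not need to be identified separately.
    • Categorization of laboratory results via Tukey inner and outer fences, located 1.5×IQR and 3×IQR beyond the hinges (the estimates of the 25th and 75th percentiles).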
  • Evaluating Laboratory Performance:
    • Simple procedures to identify typical, unusual, and extremely unusual results.
    • Step-by-step approach for calculating and applying statistical measures to proficiency testing results.
  • Estimating Repeatability and Reproducibility:
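    • Methods for estimating within-laboratory and between-laboratory standard deviations from proficiency test data by dividing the IQR of the appropriate data set by 1.35. Within-laboratory (repeatability) estimates require the two-sample design (Method B).
    • A minimum of 10 participating laboratories, with 30 or more desirable when estimating test method precision.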
  • Visual Data Representation:
    • Use of tables, dot plots, and scatter diagrams to illustrate results and distribution patterns.
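Sections 6.1.3 and 6.1.3.2 of the practice describe two dot-diagram conventions. In the first, repeated values are stacked by occurrence number. In the second, an alternative display is built from contiguous size classes 0.10 wide. A sketch like the following can prepare the plotting data for either display. It is an assumption-laden illustration, not the practice's procedure. The helper names `dot_points` and `size_class_counts` are hypothetical. Results are assumed to be reported to two decimal places, so they can be binned exactly in integer hundredths. Each class is assumed to include its lower end and exclude its upper end, a boundary rule the excerpt does not state. Any plotting library can then draw the (value, occurrence) pairs.

```python
from collections import Counter


def dot_points(results):
    """(value, occurrence) pairs for a dot diagram: the k-th repeat of a value
    is plotted at height k, so no dot is hidden behind another."""
    seen = Counter()
    points = []
    for x in sorted(results):
        seen[x] += 1
        points.append((x, seen[x]))
    return points


def size_class_counts(results, width_hundredths=10):
    """Count results per contiguous size class, keyed by the class lower end.
    Assumes two-decimal results and classes of the form [lower, lower + width);
    width_hundredths=10 gives the 0.10-wide classes used in the practice."""
    counts = Counter()
    for x in results:
        cents = round(x * 100)  # exact integer hundredths, avoids float drift
        lower = cents // width_hundredths * width_hundredths
        counts[lower / 100] += 1
    return dict(sorted(counts.items()))


print(dot_points([1.2, 1.5, 1.2, 1.2]))
# [(1.2, 1), (1.2, 2), (1.2, 3), (1.5, 1)]
print(size_class_counts([0.60, 0.69, 1.13, 1.19, 1.20]))
# {0.6: 2, 1.1: 2, 1.2: 1}
```

Binning in integer hundredths sidesteps cases such as 0.60 / 0.10 evaluating to slightly less than 6 in floating point, which would otherwise drop 0.60 into the 0.5 class.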

Applications

ASTM E2489-21 supports a broad range of scientific and industrial laboratory environments where interlaboratory proficiency testing is critical. Practical applications include:

  • Quality Assurance Programs:
    • Regular assessment of laboratory testing performance for compliance, accreditation, and continuous improvement.
  • Method Validation:
    • Determining if a testing method delivers consistent results across multiple labs to support decision-making on method adoption or improvement.
  • Accreditation Bodies:
    • Evaluation of laboratory performance as part of accreditation or certification requirements.
  • Standards Development:
    • Giving standards committees data-driven insights into actual test method performance under diverse conditions.

Testing program administrators, standards developers, and laboratory managers can use this practice to ensure their interlaboratory proficiency testing programs deliver statistically sound and actionable results.

Related Standards

ASTM E2489-21 references and complements several other important ASTM and international documents:

  • ASTM E1301: Guide for Proficiency Testing by Interlaboratory Comparisons (withdrawn 2012; historical reference)
  • ASTM E691: Standard Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method
  • ASTM E177: Standard Practice for Use of the Terms Precision and Bias in ASTM Test Methods
  • ASTM E178: Standard Practice for Dealing With Outlying Observations
  • ASTM E456: Terminology Relating to Quality and Statistics
  • ASTM E2586: Practice for Calculating and Using Basic Statistics

These related standards provide additional guidance on terminology, test method precision studies, and statistical analysis approaches for quality and bias evaluation.


Keywords: ASTM E2489-21, proficiency testing, interlaboratory comparison, statistical analysis, laboratory performance, repeatability, reproducibility, quality assurance, test method precision, Tukey fences, interquartile range, laboratory accreditation.

Buy Documents

Standard

ASTM E2489-21 - Standard Practice for Statistical Analysis of One-Sample and Two-Sample Interlaboratory Proficiency Testing Programs

English language (13 pages)
Standard

REDLINE ASTM E2489-21 - Standard Practice for Statistical Analysis of One-Sample and Two-Sample Interlaboratory Proficiency Testing Programs

English language (13 pages)


Frequently Asked Questions

ASTM E2489-21 is a standard published by ASTM International. Its full title is "Standard Practice for Statistical Analysis of One-Sample and Two-Sample Interlaboratory Proficiency Testing Programs".


ASTM E2489-21 is classified under the following ICS (International Classification for Standards) category: 71.040.40 - Chemical analysis. The ICS classification helps identify the subject area and facilitates finding related standards.

ASTM E2489-21 is linked to the following related standards: ASTM E456-13a(2022)e1, ASTM E2586-19e1, ASTM E456-13A(2017)e1, ASTM E456-13A(2017)e3, ASTM E178-16, ASTM E2586-14, ASTM E177-14, ASTM E456-13a, ASTM E456-13ae1, ASTM E456-13ae3, ASTM E456-13ae2, ASTM E2586-13, ASTM E456-13, ASTM E177-13, ASTM E691-13. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.


Standards Content (Sample)


This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
Designation: E2489 − 21
An American National Standard
Standard Practice for Statistical Analysis of One-Sample and Two-Sample Interlaboratory Proficiency Testing Programs
This standard is issued under the fixed designation E2489; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope*

1.1 This practice describes methods for the statistical analysis of laboratory results obtained from interlaboratory proficiency testing programs. As in accordance with Practice E1301, proficiency testing is the use of interlaboratory comparisons for the determination of laboratory testing or measurement performance. Conversely, collaborative study (or collaborative trial) is the use of interlaboratory comparisons for the determination of the precision of a test method, as covered by Practice E691.
1.1.1 Method A covers testing programs using single test results obtained by testing a single sample (each laboratory submits a single test result).
1.1.2 Method B covers testing programs using paired test results obtained by testing two samples (each laboratory submits one test result for each of the two samples). The two samples should be of the same material or two materials similar enough to have approximately the same degree of variation in test results.
1.2 Methods A and B are applicable to proficiency testing programs containing a minimum of 10 participating laboratories.
1.3 The methods provide direction for assessing and categorizing the performance of individual laboratories based on the relative likelihood of occurrence of their test results, and for determining estimates of testing variation associated with repeatability and reproducibility. Assumptions are that a majority of the participating laboratories execute the test method properly and that samples are of sufficient homogeneity that the testing results represent results obtained from each laboratory testing essentially the same material. Each laboratory receives the same instructions or protocol.
1.4 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.5 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

2. Referenced Documents

2.1 ASTM Standards:
E177 Practice for Use of the Terms Precision and Bias in ASTM Test Methods
E178 Practice for Dealing With Outlying Observations
E456 Terminology Relating to Quality and Statistics
E691 Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method
E1301 Guide for Proficiency Testing by Interlaboratory Comparisons (Withdrawn 2012)
E2586 Practice for Calculating and Using Basic Statistics

3. Terminology

3.1 Definitions—Unless otherwise noted in this standard, all terms relating to quality and statistics are defined in Terminology E456.
3.1.1 collaborative study, n—interlaboratory study in which each laboratory uses the defined method of analysis to analyze identical portions of homogeneous materials to assess the performance characteristics obtained for that method of analysis. Horwitz
3.1.2 collaborative trial, n—see collaborative study.
3.1.3 interlaboratory comparison, n—organization, performance, and evaluation of tests on the same or similar test items by two or more laboratories in accordance with predetermined conditions.

This practice is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.20 on Test Method Evaluation and Quality Control.
Current edition approved Dec. 1, 2021. Published December 2021. Originally approved in 2006. Last previous edition approved in 2016 as E2489–16. DOI: 10.1520/E2489-21.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.
The last approved version of this historical standard is referenced on www.astm.org.
Horwitz, W., "Protocol for the Design, Conduct and Interpretation of Collaborative Studies," Pure and Applied Chemistry, Vol 60, No. 6, 1988, pp. 855–864.
*A Summary of Changes section appears at the end of this standard
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
3.1.4 median, X̃, n—the 50th percentile in a population or sample. E2586
3.1.4.1 Discussion—The sample median is the [(n + 1)/2] order statistic if the sample size n is odd and is the average of the [n/2] and [n/2 + 1] order statistics if n is even.
3.1.5 outlier, n—see outlying observation. E178
3.1.6 outlying observation, n—observation that appears to deviate markedly in value from other members of the sample in which it appears. E178
3.1.7 proficiency testing, n—determination of laboratory testing performance by means of interlaboratory comparisons.
3.1.8 repeatability standard deviation (S_r), n—standard deviation of test results obtained under repeatability conditions. E177
3.1.9 reproducibility standard deviation (S_R), n—standard deviation of test results obtained under reproducibility conditions. E177
3.2 Definitions of Terms Specific to This Standard:
3.2.1 hinge (upper or lower), n—median of the upper or lower half of a set of data when the data is arranged in order of size.
3.2.1.1 Discussion—When there is an odd number of items in the data set, the middle value is included in both the upper and lower halves. The upper hinge is an estimate of the 75th percentile; the lower hinge is an estimate of the 25th percentile.
3.2.2 inner fence (upper or lower), n—value equal to the upper or lower hinge of a data set plus (upper) or minus (lower) 1.5 times the difference between upper and lower hinges.
3.2.3 interquartile range, n—distance between the upper and lower hinges of a data set.
3.2.4 outer fence (upper or lower), n—value equal to the upper or lower hinge of a data set plus (upper) or minus (lower) three times the difference between upper and lower hinges.

4. Summary of Practice

4.1 This practice describes methods of displaying interlaboratory data that visually show individual laboratory results.
4.2 The methods described in this practice can be applied to large and small sample populations from any distribution expected to have a general mound shape. It is recommended that in cases in which it is suspected that the data may be highly unsymmetrical or very unusual in some other manner a statistician should be consulted regarding the applicability of the analysis method.
4.2.1 The median is used as the "consensus" value of the measured test property.
4.2.2 The interquartile range (IQR) is used as the basis for estimating the spread in the data. Because the median and the interquartile range are not affected by the magnitude of extreme values of a data set, the analysis approach presented in this practice effectively eliminates the need to identify outlying observations (outliers).
4.3 Laboratory results are categorized according to how far the results lie outside of the interquartile range.
4.4 The upper and lower ends of the interquartile range are referred to as the hinges. The limits for categorizing laboratory results lying outside of the interquartile range are determined by multiplying the extent of the interquartile range by the fixed factors of 1.5 and 3.0. The upper and lower limits lying a distance of 1.5 times the range of the IQR beyond the hinges are referred to as the inner fences. The upper and lower limits for results lying at 3.0 times the range of the IQR beyond the hinges are referred to as the outer fences.
4.5 Guidance is provided for proficiency testing programs wishing to establish additional limits (or fences). The user is directed to Guide E1301 for additional guidance.
4.6 When using the methods in this practice, the number of participating laboratories should be at least ten. Since the degree of confidence is lower for analyses performed on small sample populations, caution should be used in applying statistics obtained from small sample populations.
4.7 When possible, it is generally desirable to have 30 or more participants when estimating the precision of test methods.
4.8 Estimates of the repeatability standard deviation and the reproducibility standard deviation are determined by dividing the interquartile ranges of appropriate data sets by a factor of 1.35.
4.8.1 The number 1.35 used in determining the repeatability and reproducibility standard deviations is based on an assumption of similarity to a normal distribution. Therefore, the estimate of the standard deviation using the methods described in this practice may not supply the desired accuracy if the distribution differs too much from the general shape of a normal curve. It is beyond the scope of this practice to describe procedures for determining when the analysis methods described in this practice are not applicable.

5. Significance and Use

5.1 This practice is specifically designed to describe simple robust statistical methods for use in proficiency testing programs.
5.2 Proficiency testing programs can use the methods in this practice for the purpose of comparing testing results obtained from a group of participating laboratories. The practice describes evaluation of individual laboratory results using the interquartile range and Tukey inner and outer fences.
5.3 In addition, the data obtained in proficiency testing programs may contain information regarding repeatability (within-lab) and reproducibility (between-lab) testing variation. Repeatability information is possible only if the program uses more than one sample. See Method B. Proficiency testing programs often have a greater number of participants than might be available for conducting an interlaboratory study to determine the precision of a test method (such as described in Practice E691). Precision estimates obtained for the larger number of participants in a proficiency testing program, along with the corresponding wider variation of test conditions, can provide useful information to standards developers regarding the precision of test results that can be expected for a test method when in actual use in the general testing community.
5.4 To estimate the precision of a test method, the participants must use the same test method to obtain their test results, and testing must be performed under the conditions required for repeatability and reproducibility. The precision estimates are applicable to the property levels and material types included in the testing program. The precision of a test method may vary considerably for different material types and at different property levels.
5.5 This practice may be useful to proficiency testing program administrators and provides examples of statistical methods along with explanations of some of the advantages of the suggested methods of analysis. The analyses resulting from the application of methods described in this practice may be used by laboratories as part of their quality control procedures, accrediting bodies to assist in the evaluation of laboratory performance, and ASTM International technical committees (and other organizations charged with the task of writing, maintaining, or improving test methods) to obtain information regarding reproducibility and repeatability.
5.6 There are many types of proficiency testing programs in existence and many methods exist for analyzing the data resulting from the interlaboratory testing. It is not the intention of this practice to call into question the integrity of programs using other methods of analysis. Testing programs using replicate testing of one or more samples (each laboratory submits two or more results for each sample) are directed to Practice E691 or other practices for the description of a method of analysis that may be more suitable to that type of program.

6. Analysis of a One-Sample Program (Method A)

6.1 Display of Data:
6.1.1 When possible, display the data in a table to show the actual results submitted by each laboratory. This may not be practical if the number of participants is too large.
6.1.1.1 To assist in maintaining confidentiality, give each laboratory an identification number if one does not already exist.
6.1.1.2 List the laboratory results in increasing order by laboratory identification number to make it easy to locate the results for a particular laboratory. See Table 1.

TABLE 1 Original Data for a One-Sample Program
Lab   Test Result
 1    1.22
 2    1.62
 3    1.82
 4    0.60
 5    2.75
 6    1.55
 7    1.17
 8    1.76
 9    1.35
10    1.18
11    1.19
12    1.71
13    2.03
14    1.10
15    1.84
16    1.39
17    1.13
18    1.66
19    1.28
20    1.24
21    0.69
22    1.54
23    1.43
24    0.84
25    0.98
26    1.97
27    4.89
28    1.85
29    1.09
30    1.07

6.1.2 Sort the laboratory results in decreasing order by test result to show the range and distribution of the test results. See Table 2. Besides the laboratory identification number and corresponding test results, Table 2 contains columns of additional information that will be explained in the following sections of this practice.
6.1.3 Display the data in a dot diagram to show the location of each laboratory's test result in the distribution of all test ... three times in the data, plot the test result value three times, once with an occurrence of "one," once with an occurrence of "two," and once with an occurrence of "three." The consequence is that each laboratory's test result will be plotted as an individual dot and no dots will be concealed by being plotted on top of one another.
6.1.3.1 Fig. 1 shows the dot diagram for the data in Table 2. There are no repeat values in the test results, so Column 3 of Table 2 shows that the number of occurrences is "one" for each test result and the dots in Fig. 1 appear in a single horizontal row. The dot diagram in Fig. 1 also shows that the test result for Laboratory 5, at (2.75, 1), is slightly removed from the rest of the data. The test result for Laboratory 27, at (4.89, 1), is farther removed.
6.1.3.2 A dot diagram with a different appearance can be obtained by classifying the results into multiple contiguous size classes such that each class contains a portion of the data, but together, the classes cover the entire data range. Table 3 shows the number of occurrences in each size class when the range of each class is 0.10. When the numbers of occurrences in each size class are plotted versus the corresponding values of the lower ends of each size class (see Fig. 2), the display has the advantage of being more compact, and it is more apparent how test results are clustered. The dot diagram in Fig. 2 still shows
results.Foreachtestresult,plotoccurrencenumberofthattest thatthetestresultforLaboratory5isslightlyremovedfromthe
result value versus the value of the test result. As points are rest of the data and that the test result for Laboratory 27 is
plotted from the top of Table 2 to the bottom, the first time a farther removed.
test value occurs assign it an occurrence of “one.” The next 6.1.3.3 Other ranges for the size classes are permitted to be
time that test result value occurs, assign it an occurrence of used to classify the test results. For example, each size class
“two.” If the test result value appears a third time, assign it an could have a range of 0.20 or 0.05. The corresponding dot
occurrenceof“three”andsoforth.Ifatestresultvalueappears diagrams will each have a different appearance.
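The occurrence-numbering rule in 6.1.3 can be sketched in a few lines of Python. This is a minimal illustration, not part of the practice: the function name `occurrence_numbers` is hypothetical, and the sketch assumes the results are already in the descending order of Table 2 and recorded to a fixed number of decimal places, so that equal results compare exactly equal.

```python
from collections import defaultdict


def occurrence_numbers(results):
    """Return the dot-diagram occurrence number for each result (6.1.3).

    The first time a value appears it is assigned 1, the next time 2, and so on,
    so repeated values are stacked instead of being plotted on top of each other.
    Values are compared exactly, so results should share a fixed number of decimals.
    """
    seen = defaultdict(int)  # value -> how many times it has been seen so far
    numbers = []
    for value in results:
        seen[value] += 1
        numbers.append(seen[value])
    return numbers


if __name__ == "__main__":
    # Hypothetical results with one value repeated three times (not data from this practice).
    demo = [2.10, 1.95, 1.95, 1.95, 1.80]
    print(list(zip(demo, occurrence_numbers(demo))))
    # [(2.1, 1), (1.95, 1), (1.95, 2), (1.95, 3), (1.8, 1)]
```

Each (value, occurrence) pair is one dot. Plotting the pairs with any charting tool gives a diagram like Fig. 1, in which no dot hides another.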
6.1.3.4 The range of the size classes used for grouping the laboratory test results should be chosen carefully to show as much information (regarding individual laboratory test results and the overall distribution of the test results) as possible in the dot diagram. One consideration should be the number of test results that must be plotted. Generally, it is desirable to limit the number of classes to be plotted along the x-axis of the dot diagram. For larger data sets, the range of each of the classes must be wider to contain a larger number of test results. Another consideration should be the overall range of the test results in the data set. All size classes should have the same width and each size class must be sufficiently wide to limit the number of classes to be plotted along the x-axis of the dot diagram.
6.1.3.5 Various computer software programs can be used to generate similar types of diagrams. When other types of diagrams are used, it is generally preferable to choose one in which each individual laboratory’s result is displayed as a single point on the diagram. For example, Fig. 2 is similar in appearance to a histogram, but a typical histogram does not show individual data points. A stem-and-leaf plot is another example of a diagram that displays each individual result.
TABLE 2 Data in Descending Order for One-Sample Program
Count of Labs | Lab | Test Result | Number of Occurrences | Category
27 4.89 1 Extremely Unusual
5 2.75 1 Unusual
13 2.03 1 Typical
26 1.97 1 Typical
28 1.85 1 Typical
15 1.84 1 Typical
3 1.82 1 Typical
8th from Top 8 1.76 1 Typical
12 1.71 1 Typical
18 1.66 1 Typical
2 1.62 1 Typical
6 1.55 1 Typical
22 1.54 1 Typical
23 1.43 1 Typical
15th from Top 16 1.39 1 Typical
16th from Top 9 1.35 1 Typical
19 1.28 1 Typical
20 1.24 1 Typical
1 1.22 1 Typical
11 1.19 1 Typical
10 1.18 1 Typical
7 1.17 1 Typical
8th from Bottom 17 1.13 1 Typical
14 1.10 1 Typical
29 1.09 1 Typical
30 1.07 1 Typical
25 0.98 1 Typical
24 0.84 1 Typical
21 0.69 1 Typical
4 0.60 1 Typical
Shown Below Is Determination of “Fences” for Data Above
Median of All Test Results = 1.37
Upper Hinge (Median of Top Half) = 1.76
Lower Hinge (Median of Bottom Half) = 1.13
Interquartile Range (IQR) = (1.76 – 1.13) = 0.63
(3 × IQR) = 1.89
Outer Fence (Upper) = (1.76 + 1.89) = 3.65
Outer Fence (Lower) = (1.13 – 1.89) = –0.76
(1.5 × IQR) = 0.945
Inner Fence (Upper) = (1.76 + 0.945) = 2.705
Inner Fence (Lower) = (1.13 – 0.945) = 0.185
Reproducibility Standard Deviation = (IQR / 1.35) = 0.467
6.2 Steps for Evaluating Laboratory Performance:
6.2.1 Visually examine the dot plot (or graphic of the data) to confirm that the distribution is approximately mound shaped and unimodal. If either condition is not met, the analysis prescribed may not be appropriate. See 4.2.
6.2.2 The steps for evaluating a laboratory’s performance are to determine the median and interquartile range (IQR), locate the inner and outer fences, and then categorize the laboratories according to where their results lie relative to the fences.
6.2.3 The method for determining the median depends on whether there is an odd or even number of results in the data set.
6.2.3.1 Sort the data set into ascending or descending order. If there is an odd number of results in the data set, the median is the middle number of the sorted data set. For example, consider the five results in the data set 9, 1, 5, 4, 5. When placed in ascending order, the result is 1, 4, 5, 5, 9. The middle number, or median, is the third value, 5. It does not matter that one of the numbers is repeated.
6.2.3.2 If there is an even number of results in the data set, after the results are placed in ascending or descending order, the median is the average of the middle two numbers in the data set. For example, consider the eight results in the data set 2, 8, 5, 11, 4, 6, 9, 4. When placed in ascending order, the result is 2, 4, 4, 5, 6, 8, 9, 11. The middle two numbers are 5 and 6. The average is (5 + 6)/2, or 5.5, so the median is 5.5.
6.2.4 The method for determining the interquartile range is to determine the middle number (or median) of the top and bottom halves of the data set.
6.2.4.1 If there is an odd number of results in the data set, the median of the entire data set is included in both halves. For example, consider again the data set 1, 4, 5, 5, 9. The middle value (the third value, 5) is included in both halves. So, the middle number (or median) of the top half of the data set, 5, 5, 9, is 5. The median of the top half of the data set is referred to as the upper hinge. The middle number (or median) of the bottom half of the data set, 1, 4, 5, is 4. The median of the bottom half of the data set is referred to as the lower hinge.
6.2.4.2 The IQR is the range from the upper hinge (the median of the top half of the data set) to the lower hinge (the median of the bottom half of the data set).
6.2.4.3 For the data set 1, 4, 5, 5, 9, the IQR is the range from the upper hinge, 5, to the lower hinge, 4, so the IQR is (5 – 4), or 1.
6.2.4.4 If there is an even number of results in the data set, the data set is simply divided into a top half and a bottom half, each containing an equal number of test results. For example, consider the data set 2, 4, 4, 5, 6, 8, 9, 11. The top half contains 6, 8, 9, 11 and the median (or upper hinge) is the average of 8 and 9, or 8.5. The bottom half contains 2, 4, 4, 5 and the median (or lower hinge) is the average of 4 and 4, or 4.
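The median, hinge, and IQR rules of 6.2.3 and 6.2.4 can be checked with a short Python sketch. The helper names `hinges` and `iqr` are illustrative, not defined by this practice; `statistics.median` from the standard library supplies the median. The sketch follows the hinge definition used here (for an odd count, the overall median is shared by both halves) rather than calling a general-purpose quartile routine, because functions such as `statistics.quantiles` use interpolation rules that can give different values.

```python
from statistics import median


def hinges(data):
    """Return (lower_hinge, upper_hinge) following 6.2.4.

    For an odd number of results the overall median belongs to both halves;
    for an even number the sorted data are split into two equal halves.
    """
    x = sorted(data)
    n = len(x)
    half = (n + 1) // 2  # size of each half (includes the middle value when n is odd)
    return median(x[:half]), median(x[n - half:])


def iqr(data):
    """Interquartile range: upper hinge minus lower hinge (6.2.4.2)."""
    lower, upper = hinges(data)
    return upper - lower


if __name__ == "__main__":
    odd = [9, 1, 5, 4, 5]             # example in 6.2.3.1 and 6.2.4.1
    even = [2, 8, 5, 11, 4, 6, 9, 4]  # example in 6.2.3.2 and 6.2.4.4
    print(median(odd), hinges(odd), iqr(odd))     # 5 (4, 5) 1
    print(median(even), hinges(even), iqr(even))  # 5.5 (4.0, 8.5) 4.5
```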
FIG. 1 Dot Diagram for Original Data
TABLE 3 Data Classified by Tenths
Lab | Test Result | Size Class Range (Lower End ≤ X < Upper End) | Number of Occurrences
27 4.89 4.80 ≤ X < 4.90 1
5 2.75 2.70 ≤ X < 2.80 1
13 2.03 2.00 ≤ X < 2.10 1
26 1.97 1.90 ≤ X < 2.00 1
28 1.85 1.80 ≤ X < 1.90 1
15 1.84 1.80 ≤ X < 1.90 2
3 1.82 1.80 ≤ X < 1.90 3
8 1.76 1.70 ≤ X < 1.80 1
12 1.71 1.70 ≤ X < 1.80 2
18 1.66 1.60 ≤ X < 1.70 1
2 1.62 1.60 ≤ X < 1.70 2
6 1.55 1.50 ≤ X < 1.60 1
22 1.54 1.50 ≤ X < 1.60 2
23 1.43 1.40 ≤ X < 1.50 1
16 1.39 1.30 ≤ X < 1.40 1
9 1.35 1.30 ≤ X < 1.40 2
19 1.28 1.20 ≤ X < 1.30 1
20 1.24 1.20 ≤ X < 1.30 2
1 1.22 1.20 ≤ X < 1.30 3
11 1.19 1.10 ≤ X < 1.20 1
10 1.18 1.10 ≤ X < 1.20 2
7 1.17 1.10 ≤ X < 1.20 3
17 1.13 1.10 ≤ X < 1.20 4
14 1.10 1.10 ≤ X < 1.20 5
29 1.09 1.00 ≤ X < 1.10 1
30 1.07 1.00 ≤ X < 1.10 2
25 0.98 0.90 ≤ X < 1.00 1
24 0.84 0.80 ≤ X < 0.90 1
21 0.69 0.60 ≤ X < 0.70 1
4 0.60 0.60 ≤ X < 0.70 2
6.2.4.5 For the data set 2, 4, 4, 5, 6, 8, 9, 11, the IQR is the range from the upper hinge, 8.5, to the lower hinge, 4, so the IQR is (8.5 – 4), or 4.5.
6.2.5 Once the IQR is determined, the outer fence is located three times the IQR (3 × IQR) beyond the upper and lower hinges. See Fig. 3 and guidance provided in 4.8.1.
$$\text{Outer Fence (Upper)} = (\text{Upper Hinge}) + (3 \times \text{IQR}) \tag{1}$$
$$\text{Outer Fence (Lower)} = (\text{Lower Hinge}) - (3 \times \text{IQR}) \tag{2}$$
6.2.5.1 For testing performed in strict accordance with a testing protocol, laboratory test results beyond the outer fence have an extremely low likelihood of occurrence. Laboratory results occurring beyond the outer fence are categorized as “extremely unusual.”
6.2.6 The inner fence is located 1.5 times the IQR (1.5 × IQR) beyond the upper and lower hinges.
$$\text{Inner Fence (Upper)} = (\text{Upper Hinge}) + (1.5 \times \text{IQR}) \tag{3}$$
$$\text{Inner Fence (Lower)} = (\text{Lower Hinge}) - (1.5 \times \text{IQR}) \tag{4}$$
6.2.6.1 Laboratory test results lying beyond the inner fence, but within the outer fence, have a low probability of occurrence when testing is properly performed in accordance with the prescribed testing protocol. Laboratory results occurring beyond the inner fence but within the outer fence are categorized as “unusual.”
6.2.7 Most of the test results will fall within the inner fence. Laboratory test results falling at or within the inner fence are categorized as “typical.”
6.2.8 If desired, other limits or fences can be used. Table 4 suggests several intervals that could be used to establish other fences and gives the probabilities for results lying outside of each of the intervals listed in the table.
6.3 Example for Evaluating Laboratory Performance Using the Data in Table 2:
6.3.1 Table 2 shows test results for 30 laboratories, an even number of results, in descending order by test result. The median of the data set is the average of the results for the 15th and 16th laboratories from the top of the table. The 15th and 16th laboratories are #16 and #9. The median is the average of their results, (1.39 + 1.35)/2, or 1.37. See the analysis at the bottom of Table 2.
6.3.2 There are 15 results in the top half of the data in Table 2 and 15 in the bottom half. The middle (or median) value of the top half is the eighth test result from the top, 1.76. This value, 1.76, is referred to as the upper hinge. The middle (or median) value of the bottom half is the eighth result from the bottom, 1.13. This value, 1.13, is referred to as the lower hinge.
6.3.3 The IQR is the range from the upper hinge (median of the top half) to the lower hinge (median of the bottom half), (1.76 – 1.13), or 0.63.
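The fence equations (1) to (4), the categories of 6.2.5.1 to 6.2.7, and the worked example of 6.3 can be reproduced with the Python sketch below. It is an illustration under stated assumptions, not a reference implementation: the names `TABLE_1`, `hinges`, `fences`, and `categorize` are hypothetical, the hinge rule repeats 6.2.4, and the reproducibility estimate uses the IQR / 1.35 factor of 4.8. Binary floating point can shift a computed fence in its last digit, so `decimal.Decimal` may be preferable when a result could lie exactly on a fence.

```python
from statistics import median

# Test results from Table 1, keyed by laboratory identification number.
TABLE_1 = {
    1: 1.22, 2: 1.62, 3: 1.82, 4: 0.60, 5: 2.75, 6: 1.55, 7: 1.17, 8: 1.76,
    9: 1.35, 10: 1.18, 11: 1.19, 12: 1.71, 13: 2.03, 14: 1.10, 15: 1.84,
    16: 1.39, 17: 1.13, 18: 1.66, 19: 1.28, 20: 1.24, 21: 0.69, 22: 1.54,
    23: 1.43, 24: 0.84, 25: 0.98, 26: 1.97, 27: 4.89, 28: 1.85, 29: 1.09,
    30: 1.07,
}


def hinges(data):
    """(lower hinge, upper hinge); the middle value is shared when n is odd (6.2.4)."""
    x = sorted(data)
    half = (len(x) + 1) // 2
    return median(x[:half]), median(x[len(x) - half:])


def fences(data):
    """Outer and inner fences, Eqs (1) to (4), plus the IQR / 1.35 estimate of S_R (4.8)."""
    lower, upper = hinges(data)
    q = upper - lower  # interquartile range
    return {
        "outer": (lower - 3.0 * q, upper + 3.0 * q),
        "inner": (lower - 1.5 * q, upper + 1.5 * q),
        "s_R": q / 1.35,
    }


def categorize(result, f):
    """Category per 6.2.5.1 to 6.2.7; a result exactly on a fence gets the less severe label."""
    (out_lo, out_hi), (in_lo, in_hi) = f["outer"], f["inner"]
    if result < out_lo or result > out_hi:
        return "Extremely Unusual"
    if result < in_lo or result > in_hi:
        return "Unusual"
    return "Typical"


if __name__ == "__main__":
    f = fences(TABLE_1.values())
    print("outer fences:", [round(v, 3) for v in f["outer"]])  # [-0.76, 3.65]
    print("inner fences:", [round(v, 3) for v in f["inner"]])  # [0.185, 2.705]
    print("S_R estimate:", round(f["s_R"], 3))                  # 0.467
    for lab, result in sorted(TABLE_1.items(), key=lambda item: -item[1]):
        category = categorize(result, f)
        if category != "Typical":
            print(lab, result, category)  # 27 4.89 Extremely Unusual, then 5 2.75 Unusual
```

The printed fences match the summary at the bottom of Table 2 once rounded, and only Laboratories 27 and 5 fall outside the "typical" category, as in 6.3.5 and 6.3.7.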
FIG. 2 Dot Diagram—Data Classified by Tenths
FIG. 3 Explanation of Hinges, Fences, and Categories
TABLE 4 Alternative Intervals for Fences
Interval Beyond the Upper and Lower Hinges | Approx. Number of Standard Deviations from the Consensus Value (Median)^A | Approx. Two-Tailed Probability for Results Outside of the Interval^A | Suggested Descriptive Label for Results Occurring Outside of the Interval
1.0 IQR 2 0.04 typical
1.5 IQR 2.7 0.007 unusual
2.0 IQR 3.37 0.0008 very unusual
3.0 IQR 4.725 0.000 002 extremely unusual
^A The number of standard deviations from the consensus value and the probabilities for being outside of the intervals are based on the assumption of a normal distribution. The probabilities may vary for distributions that cannot be approximated by a normal distribution.
6.3.4 Since the outer fence is located three times the IQR beyond the hinges, the outer fence for the upper end of the data set is located at [1.76 + (3 × IQR)], or [1.76 + (3 × 0.63)], or 3.65. The outer fence for the lower end of the data set is located at [1.13 – (3 × 0.63)], or –0.76.
6.3.5 Test results greater than 3.65 or less than –0.76 are categorized as “extremely unusual.” Only one test result, 4.89, is beyond the outer fence. That test result, for laboratory #27, is greater than 3.65 and is categorized as “extremely unusual.” See Column 5 of Table 2. There are no test results below –0.76.
6.3.6 The inner fence is located 1.5 times the IQR beyond the hinges. The inner fence for the upper end of the data set is located at [1.76 + (1.5 × 0.63)], or 2.705. The inner fence for the lower end of the data set is located at [1.13 – (1.5 × 0.63)], or 0.185.
TABLE 5 Original Data for a Two-Sample Program
Lab | Sample X Test Result | Sample Y Test Result
1 1.22 1.26
2 1.62 1.91
6.3.7 On the upper end of the data, test results lying beyond the inner fence, but not beyond the outer fence (greater than 2.705, but less than or equal to 3.65) are categorized as “unusual.” Test result 2.75, for laboratory #5, falls into that range and is categorized as “unusual.” Correspondingly, at the lower end of the data set, test results less than 0.185 and greater than or equal to –0.76 are also categ
...


This document is not an ASTM standard and is intended only to provide the user of an ASTM standard an indication of what changes have been made to the previous version. Because
it may not be technically possible to adequately depict all changes accurately, ASTM recommends that users consult prior editions as appropriate. In all cases only the current version
of the standard as published by ASTM is to be considered the official document.
Designation: E2489 − 21 An American National Standard
Standard Practice for
Statistical Analysis of One-Sample and Two-Sample
Interlaboratory Proficiency Testing Programs
This standard is issued under the fixed designation E2489; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (´) indicates an editorial change since the last revision or reapproval.
1. Scope*
1.1 This practice describes methods for the statistical analysis of laboratory results obtained from interlaboratory proficiency
testing programs. In accordance with Guide E1301, proficiency testing is the use of interlaboratory comparisons for the
determination of laboratory testing or measurement performance. Conversely, collaborative study (or collaborative trial) is the use
of interlaboratory comparisons for the determination of the precision of a test method, as covered by Practice E691.
1.1.1 Method A covers testing programs using single test results obtained by testing a single sample (each laboratory submits a
single test result).
1.1.2 Method B covers testing programs using paired test results obtained by testing two samples (each laboratory submits one
test result for each of the two samples). The two samples should be of the same material or two materials similar enough to have
approximately the same degree of variation in test results.
1.2 Methods A and B are applicable to proficiency testing programs containing a minimum of 10 participating laboratories.
1.3 The methods provide direction for assessing and categorizing the performance of individual laboratories based on the relative
likelihood of occurrence of their test results, and for determining estimates of testing variation associated with repeatability and
reproducibility. Assumptions are that a majority of the participating laboratories execute the test method properly and that samples
are of sufficient homogeneity that the testing results represent results obtained from each laboratory testing essentially the same
material. Each laboratory receives the same instructions or protocol.
1.4 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility
of the user of this standard to establish appropriate safety, health, and environmental practices and determine the
applicability of regulatory limitations prior to use.
1.5 This international standard was developed in accordance with internationally recognized principles on standardization
established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued
by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
2. Referenced Documents
2.1 ASTM Standards:
This practice is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.20 on Test Method
Evaluation and Quality Control.
Current edition approved Dec. 1, 2021. Published December 2021. Originally approved in 2006. Last previous edition approved in 2016
as E2489 – 16. DOI: 10.1520/E2489-21.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards
volume information, refer to the standard’s Document Summary page on the ASTM website.
*A Summary of Changes section appears at the end of this standard
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
E177 Practice for Use of the Terms Precision and Bias in ASTM Test Methods
E178 Practice for Dealing With Outlying Observations
E456 Terminology Relating to Quality and Statistics
E691 Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method
E1301 Guide for Proficiency Testing by Interlaboratory Comparisons (Withdrawn 2012)
E2586 Practice for Calculating and Using Basic Statistics
3. Terminology
3.1 Definitions—Unless otherwise noted in this standard, all terms relating to quality and statistics are defined in Terminology E456.
3.1.1 collaborative study, n—interlaboratory study in which each laboratory uses the defined method of analysis to analyze
identical portions of homogeneous materials to assess the performance characteristics obtained for that method of analysis. Horwitz
3.1.2 collaborative trial, n—see collaborative study.
3.1.3 interlaboratory comparison, n—organization, performance, and evaluation of tests on the same or similar test items by two
or more laboratories in accordance with predetermined conditions.
3.1.4 median, X̃, n—the 50th percentile in a population or sample. E2586
3.1.4.1 Discussion—
The sample median is the [(n + 1) ⁄2] order statistic if the sample size n is odd and is the average of the [n/2] and [n/2 + 1] order
statistics if n is even.
3.1.5 outlier, n—see outlying observation. E178
3.1.6 outlying observation, n—observation that appears to deviate markedly in value from other members of the sample in which
it appears. E178
3.1.7 proficiency testing, n—determination of laboratory testing performance by means of interlaboratory comparisons.
3.1.8 repeatability standard deviation ($S_r$), n—standard deviation of test results obtained under repeatability conditions. E177
3.1.9 reproducibility standard deviation ($S_R$), n—standard deviation of test results obtained under reproducibility conditions. E177
3.2 Definitions of Terms Specific to This Standard:
3.2.1 hinge (upper or lower), n—median of the upper or lower half of a set of data when the data is arranged in order of size.
3.2.1.1 Discussion—
When there is an odd number of items in the data set, the middle value is included in both the upper and lower halves. The upper
hinge is an estimate of the 75th percentile; the lower hinge is an estimate of the 25th percentile.
3.2.2 inner fence (upper or lower), n—value equal to the upper or lower hinge of a data set plus (upper) or minus (lower) 1.5 times
the difference between upper and lower hinges.
3.2.3 interquartile range, n—distance between the upper and lower hinges of a data set.
3.2.4 outer fence (upper or lower), n—value equal to the upper or lower hinge of a data set plus (upper) or minus (lower) three
times the difference between upper and lower hinges.
4. Summary of Practice
4.1 This practice describes methods of displaying interlaboratory data that visually show individual laboratory results.
The last approved version of this historical standard is referenced on www.astm.org.
Horwitz, W., “Protocol for the Design, Conduct and Interpretation of Collaborative Studies,” Pure and Applied Chemistry, Vol 60, No. 6, 1988, pp. 855–864.
4.2 The methods described in this practice can be applied to large and small sample populations from any distribution expected
to have a general mound shape. In cases in which the data are suspected to be highly asymmetrical or very unusual in some other
manner, it is recommended that a statistician be consulted regarding the applicability of the analysis method.
4.2.1 The median is used as the “consensus” value of the measured test property.
4.2.2 The interquartile range (IQR) is used as the basis for estimating the spread in the data. Because the median and the
interquartile range are not affected by the magnitude of extreme values of a data set, the analysis approach presented in this practice
effectively eliminates the need to identify outlying observations (outliers).
4.3 Laboratory results are categorized according to how far the results lie outside of the interquartile range.
4.4 The upper and lower ends of the interquartile range are referred to as the hinges. The limits for categorizing laboratory results
lying outside of the interquartile range are determined by multiplying the extent of the interquartile range by the fixed factors of
1.5 and 3.0. The upper and lower limits lying a distance of 1.5 times the range of the IQR beyond the hinges are referred to as
the inner fences. The upper and lower limits for results lying at 3.0 times the range of the IQR beyond the hinges are referred to
as the outer fences.
4.5 Guidance is provided for proficiency testing programs wishing to establish additional limits (or fences). The user is directed
to Guide E1301 for additional guidance.
4.6 When using the methods in this practice, the number of participating laboratories should be at least ten. Since the degree of
confidence is lower for analyses performed on small sample populations, caution should be used in applying statistics obtained
from small sample populations.
4.7 When possible, it is generally desirable to have 30 or more participants when estimating the precision of test methods.
4.8 Estimates of the repeatability standard deviation and the reproducibility standard deviation are determined by dividing the
interquartile ranges of appropriate data sets by a factor of 1.35.
4.8.1 The number 1.35 used in determining the repeatability and reproducibility standard deviations is based on an assumption of
similarity to a normal distribution. Therefore, the estimate of the standard deviation using the methods described in this practice
may not supply the desired accuracy if the distribution differs too much from the general shape of a normal curve. It is beyond
the scope of this practice to describe procedures for determining when the analysis methods described in this practice are not
applicable.
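Under the normal-distribution assumption of 4.8.1, each hinge lies about 0.6745 standard deviations from the median, so the IQR spans about 1.349 standard deviations; rounding gives the factor 1.35. The same assumption yields the approximate entries of Table 4. The Python sketch below checks that arithmetic using only the standard library's `statistics.NormalDist`; it is an illustration under the normal model, not a procedure prescribed by this practice, and `fence_distance_sd` and `two_tailed_probability` are illustrative names.

```python
from statistics import NormalDist

Z = NormalDist()      # standard normal distribution (mean 0, SD 1)
Q3 = Z.inv_cdf(0.75)  # upper hinge of a normal population, in SD units (about 0.6745)
IQR_IN_SD = 2 * Q3    # IQR / sigma, about 1.349; this practice rounds it to 1.35


def fence_distance_sd(k):
    """Distance from the median, in SDs, of a fence placed k x IQR beyond a hinge."""
    return Q3 + k * IQR_IN_SD


def two_tailed_probability(k):
    """Probability, under normality, of a result lying beyond either fence."""
    # Uses the lower tail, cdf(-d), to avoid cancellation in 1 - cdf(d) for large d.
    return 2 * Z.cdf(-fence_distance_sd(k))


if __name__ == "__main__":
    print(f"IQR / sigma = {IQR_IN_SD:.3f}")
    for k in (1.0, 1.5, 2.0, 3.0):
        print(f"{k} IQR: {fence_distance_sd(k):.2f} SD, P = {two_tailed_probability(k):.2g}")
```

The printed values should come out close to the approximate entries in Table 4. Small differences reflect rounding; for example, Table 4 uses the rounded factor 1.35, which gives 4.725 rather than about 4.72 standard deviations for the 3.0 IQR fence.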
5. Significance and Use
5.1 This practice is specifically designed to describe simple robust statistical methods for use in proficiency testing programs.
5.2 Proficiency testing programs can use the methods in this practice for the purpose of comparing testing results obtained from
a group of participating laboratories. The practice describes evaluation of individual laboratory results using the interquartile range
and Tukey inner and outer fences.
5.3 In addition, the data obtained in proficiency testing programs may contain information regarding repeatability (within-lab) and
reproducibility (between-lab) testing variation. Repeatability information is possible only if the program uses more than one
sample. See Method B. Proficiency testing programs often have a greater number of participants than might be available for
conducting an interlaboratory study to determine the precision of a test method (such as described in Practice E691). Precision
estimates obtained for the larger number of participants in a proficiency testing program, along with the corresponding wider
variation of test conditions, can provide useful information to standards developers regarding the precision of test results that can
be expected for a test method when in actual use in the general testing community.
5.4 To estimate the precision of a test method, the participants must use the same test method to obtain their test results, and testing
must be performed under the conditions required for repeatability and reproducibility. The precision estimates are applicable to the
property levels and material types included in the testing program. The precision of a test method may vary considerably for
different material types and at different property levels.
TABLE 1 Original Data for a One-Sample Program
Lab | Test Result
1 1.22
2 1.62
3 1.82
4 0.60
5 2.75
6 1.55
7 1.17
8 1.76
9 1.35
10 1.18
11 1.19
12 1.71
13 2.03
14 1.10
15 1.84
16 1.39
17 1.13
18 1.66
19 1.28
20 1.24
21 0.69
22 1.54
23 1.43
24 0.84
25 0.98
26 1.97
27 4.89
28 1.85
29 1.09
30 1.07
5.5 This practice may be useful to proficiency testing program administrators and provides examples of statistical methods along
with explanations of some of the advantages of the suggested methods of analysis. The analyses resulting from the application of
methods described in this practice may be used by laboratories as part of their quality control procedures, accrediting bodies to
assist in the evaluation of laboratory performance, and ASTM International technical committees (and other organizations charged
with the task of writing, maintaining, or improving test methods) to obtain information regarding reproducibility and repeatability.
5.6 There are many types of proficiency testing programs in existence and many methods exist for analyzing the data resulting
from the interlaboratory testing. It is not the intention of this practice to call into question the integrity of programs using other
methods of analysis. Testing programs using replicate testing of one or more samples (each laboratory submits two or more results
for each sample) are directed to Practice E691 or other practices for the description of a method of analysis that may be more
suitable to that type of program.
6. Analysis of a One-Sample Program (Method A)
6.1 Display of Data:
6.1.1 When possible, display the data in a table to show the actual results submitted by each laboratory. This may not be practical
if the number of participants is too large.
6.1.1.1 To assist in maintaining confidentiality, give each laboratory an identification number if one does not already exist.
6.1.1.2 List the laboratory results in increasing order by laboratory identification number to make it easy to locate the results for
a particular laboratory. See Table 1.
6.1.2 Sort the laboratory results in decreasing order by test result to show the range and distribution of the test results. See Table
2. Besides the laboratory identification number and corresponding test results, Table 2 contains columns of additional information
that will be explained in the following sections of this practice.
TABLE 2 Data in Descending Order for One-Sample Program
Count of Labs | Lab | Test Result | Number of Occurrences | Category
27 4.89 1 Extremely Unusual
5 2.75 1 Unusual
13 2.03 1 Typical
26 1.97 1 Typical
28 1.85 1 Typical
15 1.84 1 Typical
3 1.82 1 Typical
8th from Top 8 1.76 1 Typical
12 1.71 1 Typical
18 1.66 1 Typical
2 1.62 1 Typical
6 1.55 1 Typical
22 1.54 1 Typical
23 1.43 1 Typical
15th from Top 16 1.39 1 Typical
16th from Top 9 1.35 1 Typical
19 1.28 1 Typical
20 1.24 1 Typical
1 1.22 1 Typical
11 1.19 1 Typical
10 1.18 1 Typical
7 1.17 1 Typical
8th from Bottom 17 1.13 1 Typical
14 1.10 1 Typical
29 1.09 1 Typical
30 1.07 1 Typical
25 0.98 1 Typical
24 0.84 1 Typical
21 0.69 1 Typical
4 0.60 1 Typical
Shown Below Is Determination of “Fences” for Data Above
Median of All Test Results = 1.37
Upper hinge (Median of Top Half) = 1.76
Lower Hinge (Median of Bottom Half) = 1.13
Interquartile Range (IQR) = (1.76 – 1.13) = 0.63
(3 × IQR) = 1.89
Outer Fence (Upper) = (1.76 + 1.89) = 3.65
Outer Fence (Lower) = (1.13 – 1.89) = –0.76
(1.5 × IQR) = 0.945
Inner Fence (Upper) = (1.76 + 0.945) = 2.705
Inner Fence (Lower) = (1.13 – 0.945) = 0.185
Reproducibility Standard Deviation = (IQR / 1.35) = 0.467
6.1.3 Display the data in a dot diagram to show the location of each laboratory’s test result in the distribution of all test results.
For each test result, plot the occurrence number of that test result value versus the value of the test result. As points are plotted from
the top of Table 2 to the bottom, the first time a test value occurs, assign it an occurrence of “one.” The next time that test result
value occurs, assign it an occurrence of “two.” If the test result value appears a third time, assign it an occurrence of “three,” and
so forth. If a test result value appears three times in the data, plot the test result value three times, once with an occurrence of “one,”
once with an occurrence of “two,” and once with an occurrence of “three.” The consequence is that each laboratory’s test result
will be plotted as an individual dot and no dots will be concealed by being plotted on top of one another.
6.1.3.1 Fig. 1 shows the dot diagram for the data in Table 2. There are no repeat values in the test results, so the Number of Occurrences column of Table
2 shows that the number of occurrences is “one” for each test result and the dots in Fig. 1 appear in a single horizontal row. The
dot diagram in Fig. 1 also shows that the test result for Laboratory 5, at (2.75, 1), is slightly removed from the rest of the data.
The test result for Laboratory 27, at (4.89, 1), is farther removed.
6.1.3.2 A dot diagram with a different appearance can be obtained by classifying the results into multiple contiguous size classes such that each class contains a portion of the data, but together, the classes cover the entire data range. Table 3 shows the number of occurrences in each size class when the range of each class is 0.10. When the numbers of occurrences in each size class are plotted versus the corresponding values of the lower ends of each size class (see Fig. 2), the display has the advantage of being more compact, and it is more apparent how test results are clustered. The dot diagram in Fig. 2 still shows that the test result for Laboratory 5 is slightly removed from the rest of the data and that the test result for Laboratory 27 is farther removed.
E2489 − 21
FIG. 1 Dot Diagram for Original Data
TABLE 3 Data Classified by Tenths
Lab | Test Result | Size Class Range (Lower End ≤ X < Upper End) | Number of Occurrences
27 | 4.89 | 4.80 ≤ X < 4.90 | 1
5 | 2.75 | 2.70 ≤ X < 2.80 | 1
13 | 2.03 | 2.00 ≤ X < 2.10 | 1
26 | 1.97 | 1.90 ≤ X < 2.00 | 1
28 | 1.85 | 1.80 ≤ X < 1.90 | 1
15 | 1.84 | 1.80 ≤ X < 1.90 | 2
3 | 1.82 | 1.80 ≤ X < 1.90 | 3
8 | 1.76 | 1.70 ≤ X < 1.80 | 1
12 | 1.71 | 1.70 ≤ X < 1.80 | 2
18 | 1.66 | 1.60 ≤ X < 1.70 | 1
2 | 1.62 | 1.60 ≤ X < 1.70 | 2
6 | 1.55 | 1.50 ≤ X < 1.60 | 1
22 | 1.54 | 1.50 ≤ X < 1.60 | 2
23 | 1.43 | 1.40 ≤ X < 1.50 | 1
16 | 1.39 | 1.30 ≤ X < 1.40 | 1
9 | 1.35 | 1.30 ≤ X < 1.40 | 2
19 | 1.28 | 1.20 ≤ X < 1.30 | 1
20 | 1.24 | 1.20 ≤ X < 1.30 | 2
1 | 1.22 | 1.20 ≤ X < 1.30 | 3
11 | 1.19 | 1.10 ≤ X < 1.20 | 1
10 | 1.18 | 1.10 ≤ X < 1.20 | 2
7 | 1.17 | 1.10 ≤ X < 1.20 | 3
17 | 1.13 | 1.10 ≤ X < 1.20 | 4
14 | 1.10 | 1.10 ≤ X < 1.20 | 5
29 | 1.09 | 1.00 ≤ X < 1.10 | 1
30 | 1.07 | 1.00 ≤ X < 1.10 | 2
25 | 0.98 | 0.90 ≤ X < 1.00 | 1
24 | 0.84 | 0.80 ≤ X < 0.90 | 1
21 | 0.69 | 0.60 ≤ X < 0.70 | 1
4 | 0.60 | 0.60 ≤ X < 0.70 | 2
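The classification used for Table 3 can be reproduced with a short Python sketch. The helper `classify_by_width` is hypothetical. It assumes results are reported to two decimal places and works in integer hundredths, because in binary floating point an expression such as 0.60 / 0.10 evaluates to slightly less than 6 and would place 0.60 in the wrong class. Occurrences are numbered within each class in the order the results are listed (descending, as in Table 3).

```python
from collections import defaultdict


def classify_by_width(results, width_hundredths=10):
    """Return (result, lower_end, upper_end, occurrence) for each result,
    where lower_end <= result < upper_end and occurrence counts results
    within the same class. Assumes two-decimal results (hundredths)."""
    counts = defaultdict(int)  # lower end (in hundredths) -> results seen so far
    rows = []
    for value in results:
        hundredths = round(value * 100)  # integer units avoid float boundary errors
        lower = (hundredths // width_hundredths) * width_hundredths
        counts[lower] += 1
        rows.append((value, lower / 100, (lower + width_hundredths) / 100, counts[lower]))
    return rows


# Test results from Table 3, in descending order.
results = [4.89, 2.75, 2.03, 1.97, 1.85, 1.84, 1.82, 1.76, 1.71, 1.66,
           1.62, 1.55, 1.54, 1.43, 1.39, 1.35, 1.28, 1.24, 1.22, 1.19,
           1.18, 1.17, 1.13, 1.10, 1.09, 1.07, 0.98, 0.84, 0.69, 0.60]

for row in classify_by_width(results):
    print(row)
```

Changing `width_hundredths` to 20 or 5 gives the 0.20 or 0.05 class widths mentioned in 6.1.3.3.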
6.1.3.3 Other ranges for the size classes are permitted to be used to classify the test results. For example, each size class could
have a range of 0.20 or 0.05. The corresponding dot diagrams will each have a different appearance.
6.1.3.4 The range of the size classes used for grouping the laboratory test results should be chosen carefully to show as much information (regarding individual laboratory test results and the overall distribution of the test results) as possible in the dot diagram. One consideration should be the number of test results that must be plotted. Generally, it is desirable to limit the number of classes to be plotted along the x-axis of the dot diagram, so for larger data sets each class must be wider to contain a larger number of test results. Another consideration should be the overall range of the test results in the data set. All size classes should have the same width, and each size class must be sufficiently wide to limit the number of classes to be plotted along the x-axis of the dot diagram.
FIG. 2 Dot Diagram—Data Classified by Tenths
6.1.3.5 Various computer software programs can be used to generate similar types of diagrams. When other types of diagrams are used, it is generally preferable to choose one in which each individual laboratory’s result is displayed as a single point on the diagram. For example, Fig. 2 is similar in appearance to a histogram, but a typical histogram does not show individual data points. A stem-and-leaf plot is another type of diagram that retains each individual result.
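A stem-and-leaf display can be produced with a few lines of Python. This is a minimal sketch under stated assumptions. The function `stem_and_leaf` is hypothetical, results are assumed to be nonnegative and reported to two decimal places, the stem is the value in tenths, and the leaf is the hundredths digit. Each laboratory’s result therefore appears as exactly one leaf digit.

```python
from collections import defaultdict


def stem_and_leaf(results):
    """Return stem-and-leaf lines, highest stem first. Stem = value in tenths,
    leaf = hundredths digit (assumes nonnegative, two-decimal results)."""
    stems = defaultdict(list)
    for value in results:
        hundredths = round(value * 100)
        stems[hundredths // 10].append(hundredths % 10)
    lines = []
    # Walk every stem from highest to lowest so empty classes (gaps) stay visible.
    for stem in range(max(stems), min(stems) - 1, -1):
        leaves = "".join(str(d) for d in sorted(stems.get(stem, [])))
        lines.append(f"{stem / 10:4.1f} | {leaves}")
    return lines


for line in stem_and_leaf([1.85, 1.84, 1.82, 1.76, 1.62]):
    print(line)
```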
6.2 Steps for Evaluating Laboratory Performance:
6.2.1 Visually examine the dot diagram (or other graphic of the data) to confirm that the distribution is approximately mound shaped and unimodal. If either condition is not met, the prescribed analysis may not be appropriate. See 4.2.
6.2.2 The steps for evaluating a laboratory’s performance are to determine the median and interquartile range (IQR), locate the
inner and outer fences, and then categorize the laboratories according to where their results lie relative to the fences.
6.2.3 The method for determining the median depends on whether there is an odd or even number of results in the data set.
6.2.3.1 Sort the data set into ascending or descending order. If there is an odd number of results in the data set, the median is the middle number of the sorted data set. For example, consider the five results in the data set 9, 1, 5, 4, 5. When placed in ascending order, the result is 1, 4, 5, 5, 9. The middle (third) number, 5, is the median. It does not matter that one of the numbers is repeated.
6.2.3.2 If there is an even number of results in the data set, after the results are placed in ascending or descending order, the median
is the average of the middle two numbers in the data set. For example, consider the eight results in the data set 2, 8, 5, 11, 4, 6,
9, 4. When placed in ascending order, the result is 2, 4, 4, 5, 6, 8, 9, 11. The middle two numbers are 5 and 6. The average is (5
+ 6)/2 or 5.5, so the median is 5.5.
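The odd/even rule for the median can be expressed as a short Python function. The name `median` is illustrative. Python’s standard `statistics.median` follows the same convention and could be used instead.

```python
def median(results):
    """Median per 6.2.3: the middle value of the sorted data for an odd
    count, the average of the two middle values for an even count."""
    ordered = sorted(results)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([9, 1, 5, 4, 5]))            # odd count: middle value
print(median([2, 8, 5, 11, 4, 6, 9, 4]))  # even count: average of 5 and 6
```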
6.2.4 To determine the interquartile range, first determine the middle number (or median) of the top half and of the bottom half of the data set.
6.2.4.1 If there is an odd number of results in the data set, the median of the entire data set is included in both halves. For example, consider again the data set 1, 4, 5, 5, 9. The median (the middle 5) is included in both halves. So, the middle number (or median) of the top half of the data set, 5, 5, 9, is 5. The median of the top half of the data set is referred to as the upper hinge. The middle number (or median) of the bottom half of the data set, 1, 4, 5, is 4. The median of the bottom half of the data set is referred to as the lower hinge.
FIG. 3 Explanation of Hinges, Fences, and Categories
6.2.4.2 The IQR is the difference between the upper hinge (the median of the top half of the data set) and the lower hinge (the median of the bottom half of the data set).
6.2.4.3 Since the IQR of the data set 1, 4, 5, 5, 9, is the range from the upper hinge, 5, to the lower hinge, 4, the IQR is (5 – 4),
or 1.
6.2.4.4 If there is an even number of results in the data set, the data set is simply divided into a top half and a bottom half, each
containing an equal number of test results. For example, consider the data set 2, 4, 4, 5, 6, 8, 9, 11. The top half contains 6, 8,
9, 11 and the median (or upper hinge) is the average of 8 and 9, or 8.5. The bottom half contains 2, 4, 4, 5 and the median (or
lower hinge) is the average of 4 and 4, or 4.
6.2.4.5 Since the IQR of the data set 2, 4, 4, 5, 6, 8, 9, 11, is the range from the upper hinge, 8.5, to the lower hinge, 4, the IQR
is (8.5 – 4), or 4.5.
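The hinge and IQR steps in 6.2.4 can be sketched in Python as follows. The function names `hinges` and `iqr` are hypothetical. For an odd count, the halves overlap by one value so that the overall median belongs to both. These hinges may differ from the quartiles produced by other conventions, for example the default method of Python’s `statistics.quantiles`, so a general-purpose quartile function should not be assumed to give the same IQR.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def hinges(results):
    """Return (lower_hinge, upper_hinge) per 6.2.4: the medians of the bottom
    and top halves, with the overall median in both halves when n is odd."""
    ordered = sorted(results)
    n = len(ordered)
    half = (n + 1) // 2  # odd n: each half includes the middle value
    return median(ordered[:half]), median(ordered[n - half:])


def iqr(results):
    lower, upper = hinges(results)
    return upper - lower


print(hinges([1, 4, 5, 5, 9]), iqr([1, 4, 5, 5, 9]))
print(hinges([2, 4, 4, 5, 6, 8, 9, 11]), iqr([2, 4, 4, 5, 6, 8, 9, 11]))
```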
6.2.5 Once the IQR is determined, the outer fences are located a distance of three times the IQR (3 × IQR) beyond the upper and lower hinges. See Fig. 3 and the guidance provided in 4.8.1.
$$\text{Outer Fence (Upper)} = \text{Upper Hinge} + (3 \times \text{IQR}) \tag{1}$$
$$\text{Outer Fence (Lower)} = \text{Lower Hinge} - (3 \times \text{IQR}) \tag{2}$$
6.2.5.1 For testing performed in strict accordance with a testing protocol, laboratory test results beyond the outer fence have an
extremely low likelihood of occurrence. Laboratory results occurring beyond the outer fence are categorized as “extremely
unusual.”
6.2.6 The inner fences are located a distance of 1.5 times the IQR (1.5 × IQR) beyond the upper and lower hinges.
$$\text{Inner Fence (Upper)} = \text{Upper Hinge} + (1.5 \times \text{IQR}) \tag{3}$$
$$\text{Inner Fence (Lower)} = \text{Lower Hinge} - (1.5 \times \text{IQR}) \tag{4}$$
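Equations 1 through 4 differ only in the multiplier applied to the IQR, so a single helper can compute either pair of fences. This is a sketch. The function `fences` is hypothetical, and the example uses the hinges 4 and 8.5 from the even-count example in 6.2.4.4.

```python
def fences(lower_hinge, upper_hinge, multiplier):
    """Return (lower_fence, upper_fence) located multiplier x IQR beyond the
    hinges: multiplier 3 gives the outer fences (Eq 1 and 2), 1.5 gives the
    inner fences (Eq 3 and 4)."""
    spread = upper_hinge - lower_hinge  # the IQR
    return lower_hinge - multiplier * spread, upper_hinge + multiplier * spread


print(fences(4, 8.5, 3))    # outer fences for the data set 2, 4, 4, 5, 6, 8, 9, 11
print(fences(4, 8.5, 1.5))  # inner fences for the same data set
```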
6.2.6.1 Laboratory test results lying beyond the inner fence, but within the outer fence, have a low probability of occurrence when
testing is properly performed in accordance with the prescribed testing protocol. Laboratory results occurring beyond the inner
fence but within the outer fence are categorized as “unusual.”
TABLE 4 Alternative Intervals for Fences
Interval Beyond the Upper and Lower Hinges | Approx. Number of Standard Deviations from the Consensus Value (Median)^A | Approx. Two-Tailed Probability for Results Outside of the Interval^A | Suggested Descriptive Label for Results Occurring Outside of the Interval
1.0 IQR | 2 | 0.04 | typical
1.5 IQR | 2.7 | 0.007 | unusual
2.0 IQR | 3.37 | 0.0008 | very unusual
3.0 IQR | 4.725 | 0.000 002 | extremely unusual
^A The number of standard deviations from the consensus value and the probabilities for being outside of the intervals are based on the assumption of a normal distribution. The probabilities may vary for distributions that cannot be approximated by a normal distribution.
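The normal-distribution entries in Table 4 can be approximately reproduced with Python’s `statistics.NormalDist` (Python 3.8 or later). For a normal population, each hinge lies about 0.674 standard deviations from the median and the IQR is about 1.349 standard deviations, so a fence k × IQR beyond a hinge lies about 0.674 + 1.349k standard deviations from the median. The helper `table4_row` is hypothetical, and its results agree with Table 4 only to that table’s approximate rounding.

```python
from statistics import NormalDist


def table4_row(k):
    """For a fence k x IQR beyond the hinges of a normal population, return
    (distance from the median in SDs, two-tailed probability beyond it)."""
    std_normal = NormalDist()               # mean 0, standard deviation 1
    q3 = std_normal.inv_cdf(0.75)           # hinge distance from the median, ~0.674 SD
    iqr_sd = 2 * q3                         # IQR in standard deviations, ~1.349
    z = q3 + k * iqr_sd                     # fence distance from the median
    p_outside = 2 * (1 - std_normal.cdf(z))  # probability beyond either fence
    return z, p_outside


for k in (1.0, 1.5, 2.0, 3.0):
    z, p = table4_row(k)
    print(f"{k:.1f} IQR: {z:.3f} SD, p = {p:.2g}")
```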
6.2.7 Most of the test results will fall within the inner fence. Laboratory test results falling at or within the inner fence are
categorized as “typical.”
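The categories in 6.2.5.1, 6.2.6.1, and 6.2.7 can be sketched as a single Python function. The name `categorize` is hypothetical. Strict comparisons implement “beyond” a fence, so a result exactly on an inner fence is “typical” and a result exactly on an outer fence is “unusual”. Because of binary floating-point rounding, a result that falls exactly on a fence may be misclassified. Rounding the fences to the reporting precision first is one way to avoid this.

```python
def categorize(result, lower_hinge, upper_hinge):
    """Label a result per 6.2.5 to 6.2.7: 'typical' at or within the inner
    fences, 'unusual' beyond an inner fence but at or within the outer fence,
    'extremely unusual' beyond an outer fence."""
    spread = upper_hinge - lower_hinge  # the IQR
    if result > upper_hinge + 3 * spread or result < lower_hinge - 3 * spread:
        return "extremely unusual"
    if result > upper_hinge + 1.5 * spread or result < lower_hinge - 1.5 * spread:
        return "unusual"
    return "typical"


# Hinges 1.13 and 1.76 are those derived for the Table 2 data in 6.3.2.
for value in (4.89, 2.75, 2.03, 0.60):
    print(value, categorize(value, 1.13, 1.76))
```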
6.2.8 If desired, other limits or fences can be used. Table 4 suggests several intervals that could be used to establish other fences
and gives the probabilities for results lying outside of each of the intervals listed in the table.
6.3 Example for Evaluating Laboratory Performance Using the Data in Table 2:
6.3.1 Table 2 shows test results for 30 laboratories, an even number of results, in descending order by test result. The median of
the data set is the average of the results for the 15th and 16th laboratories from the top of the table. The 15th and 16th laboratories
are #16 and #9. The median is the average of the results, (1.39 + 1.35)/2 or 1.37. See the analysis at the bottom of Table 2.
6.3.2 There are 15 results in the top half of the data in Table 2 and 15 in the bottom half. The middle (or median) value of the
top half is the eighth test result from the top, 1.76. This value, 1.76, is referred to as the upper hinge. The middle (or median) of
the bottom half is the eighth result from the bottom, 1.13. This value, 1.13, is referred to as the lower hinge.
6.3.3 The IQR is the range from the upper hinge (median of the top half) to the lower hinge (median of the bottom half), (1.76
– 1.13), or 0.63.
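The calculations in 6.3.1 through 6.3.3 can be checked with a short Python sketch. The helper `median` is illustrative. The half-size rule matches 6.2.4, which gives 15 results per half for the 30 results in Table 2, and the printed values should match the median, upper hinge, lower hinge, and IQR stated above.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


# Test results from Table 2 (the same values are listed in Table 3).
results = [4.89, 2.75, 2.03, 1.97, 1.85, 1.84, 1.82, 1.76, 1.71, 1.66,
           1.62, 1.55, 1.54, 1.43, 1.39, 1.35, 1.28, 1.24, 1.22, 1.19,
           1.18, 1.17, 1.13, 1.10, 1.09, 1.07, 0.98, 0.84, 0.69, 0.60]

ordered = sorted(results)
half = (len(ordered) + 1) // 2         # 15 results in each half for n = 30
lower_hinge = median(ordered[:half])   # eighth result from the bottom
upper_hinge = median(ordered[-half:])  # eighth result from the top
iqr = round(upper_hinge - lower_hinge, 2)  # rounding removes float noise

print(round(median(results), 2), upper_hinge, lower_hinge, iqr)
```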
6.3.4 Since the outer fence is located three times the IQR beyond the hinges, the outer fence for the upper end of the data set is
located at [1.76 + (3 × IQR)], or [1.76 + (3 × 0.63)], or 3.65. The outer fence for the lower end of the data set is located at [1.13
– (3 × 0.63)], or –0.76.
6.3.5 Test results greater than 3.65 or less than –0.76 are categorized as “extremely unusual.” Only one test result, 4.89, is beyond
the outer fence. That test result, for laboratory #27, is greater than 3.65 and is categorized as “extremely unusual.” See Column
5 of Table 2. There are no test results below –0.76.
6.3.6 The inner fence is located 1.5 times the IQR beyond the hinges. The inner fence for the upper end of the data set is located
at [1.76 + (1.5 × 0.63)], or 2.705. The inner fence for the lower end of the data set is located at [1.13 – (1.5 × 0.63)], or 0.185.
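The fence locations in 6.3.4 and 6.3.6 and the resulting categories can be verified with the following Python sketch. It starts from the hinges determined in 6.3.2. The names `lab_results`, `category`, and `flagged` are hypothetical, and the fences are rounded to three decimals to suppress binary floating-point noise before results are compared against them.

```python
# Laboratory number -> test result, from Table 2 (also listed in Table 3).
lab_results = {27: 4.89, 5: 2.75, 13: 2.03, 26: 1.97, 28: 1.85, 15: 1.84,
               3: 1.82, 8: 1.76, 12: 1.71, 18: 1.66, 2: 1.62, 6: 1.55,
               22: 1.54, 23: 1.43, 16: 1.39, 9: 1.35, 19: 1.28, 20: 1.24,
               1: 1.22, 11: 1.19, 10: 1.18, 7: 1.17, 17: 1.13, 14: 1.10,
               29: 1.09, 30: 1.07, 25: 0.98, 24: 0.84, 21: 0.69, 4: 0.60}

lower_hinge, upper_hinge = 1.13, 1.76      # from 6.3.2
iqr = round(upper_hinge - lower_hinge, 2)  # 0.63, from 6.3.3
outer = (round(lower_hinge - 3 * iqr, 3), round(upper_hinge + 3 * iqr, 3))
inner = (round(lower_hinge - 1.5 * iqr, 3), round(upper_hinge + 1.5 * iqr, 3))


def category(x):
    if x < outer[0] or x > outer[1]:
        return "extremely unusual"
    if x < inner[0] or x > inner[1]:
        return "unusual"
    return "typical"


flagged = {lab: category(x) for lab, x in lab_results.items() if category(x) != "typical"}
print(outer, inner, flagged)
```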
6.3.7 On the upper end of the data, test results lying beyond the inner fence, but not beyond the outer fence (greater than 2.705,
but less than or equal to 3.65) are categorized as “unusual.” Test result 2.75, for laboratory #5, falls into that r
...
