Statistical process control: How to quantify product quality

Authors

Greg McMillan and Stan Weiner bring their wits and more than 80 years of process control experience to bear on your questions, comments and problems. Write to them at [email protected]. Follow McMillan's Control Talk Blog.


Greg: When all is said and done with plant performance metrics, it comes down to how well the product meets customer quality specifications. As with all things in manufacturing, you can only control what you can measure. Measuring product quality poses challenges because observed deviations can occur on different time scales due to batch and other sequential operations, operator actions, control loop performance, equipment performance, sample techniques, at-line analyzer errors, lab data entry mistakes, and changes in raw materials and ambient conditions.

Stan: Knowing the time scale of variability is a critical step in being able to understand the source of the variability and to know the process capability and performance. Fortunately, we have Richard Miller, a retired Monsanto-Solutia Fellow like Greg and me, dedicated to improving process performance, except Ric's expertise is statistical process control. Ric continues to advance our understanding and ability to measure product quality as a senior quality engineer at Ascend Performance Materials. Ric, what are the two primary statistical metrics that you use to control Ascend's many processes?

Ric: We compute the Cpk metric to give us the "capability of the process" and the Ppk metric to tell us "product performance," both essential to understanding the natural voice of the process. For a process with min/max specifications, both metrics use the same numerator: the smaller of the two distances from the process population's mean (μ) to the "max spec" and to the "min spec" (see the equations on slide 1 of the online "Understanding Ppk"). The key distinction between Cpk and Ppk is their dependence upon the short-term sigma (σST) and the long-term sigma (σLT), respectively, both of which include the additive effect of the measurement sigma (σm). Note that the term sigma is typically used for population variability, whereas the equivalent term, standard deviation, is most often used for sample or measurement variability, where about 68% of the measured values fall within plus and minus one standard deviation about the mean of a normal distribution. Long-term sigma is the conventional standard deviation of a population of samples collected at some regular frequency, while short-term sigma is calculated from their absolute two-point moving range.
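As a rough illustration of the two metrics Ric describes, the following Python sketch computes both from one series of samples. It assumes the common control-chart convention of dividing the average two-point moving range by the constant 1.128 to get short-term sigma, and a denominator of three sigma (consistent with a Ppk of 1.0 corresponding to a 3-sigma distance). Neither the constant nor the function names come from the column or its slides, so treat this as a sketch rather than Ric's exact equations.

```python
import statistics

# Assumption: standard control-chart constant (d2) for two-point moving ranges.
D2_TWO_POINT = 1.128


def short_term_sigma(values):
    """Estimate short-term sigma from the average absolute two-point moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.mean(moving_ranges) / D2_TWO_POINT


def long_term_sigma(values):
    """Conventional sample standard deviation of all the values."""
    return statistics.stdev(values)


def capability_index(mean, sigma, min_spec, max_spec):
    """Distance from the mean to the nearest specification, in units of 3 sigma."""
    return min(max_spec - mean, mean - min_spec) / (3.0 * sigma)


def cpk_ppk(values, min_spec, max_spec):
    """Return (Cpk, Ppk): same numerator, short-term vs. long-term sigma."""
    mean = statistics.mean(values)
    cpk = capability_index(mean, short_term_sigma(values), min_spec, max_spec)
    ppk = capability_index(mean, long_term_sigma(values), min_spec, max_spec)
    return cpk, ppk


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    samples = [10.0, 12.0, 10.0, 12.0, 10.0]
    cpk, ppk = cpk_ppk(samples, min_spec=4.8, max_spec=16.8)
    print(f"Cpk = {cpk:.3f}, Ppk = {ppk:.3f}")
```

A series that drifts slowly tends to show a long-term sigma larger than its short-term sigma, so Ppk falls below Cpk. Both estimates still contain the measurement sigma.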

Greg: How do we get the measurement sigma?

Ric: I have found a relatively quick and easy way is to divide a plant sample into thirds and send each of them to the lab blindly. In other words, the two backup samples are held and sent to the lab at different times. This is repeated five times over a two-week period, preferably using different lab technicians and different pieces of lab equipment. The square root of the pooled variance of the triad results is the standard deviation I refer to as the measurement sigma. Using routine plant samples rather than special samples enables us to test the lab under commercial operation rather than special lab conditions (see reference 1 for more information on our "5/3" testing for measurement system validation).
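The pooled-variance calculation behind this "5/3" test can be sketched in a few lines of Python. The function name and the lab results below are hypothetical. The sketch assumes each triad is one plant sample split three ways, so the variation within a triad reflects only the measurement system.

```python
import math
import statistics


def measurement_sigma(triads):
    """Square root of the pooled within-group variance of the lab results."""
    sum_of_squares = 0.0
    degrees_of_freedom = 0
    for group in triads:
        n = len(group)
        # Weight each group's sample variance by its degrees of freedom.
        sum_of_squares += (n - 1) * statistics.variance(group)
        degrees_of_freedom += n - 1
    return math.sqrt(sum_of_squares / degrees_of_freedom)


if __name__ == "__main__":
    # Hypothetical results: five plant samples, each split into thirds.
    lab_results = [
        [50.1, 50.4, 49.8],
        [51.2, 51.0, 51.5],
        [49.5, 49.9, 49.6],
        [50.8, 50.3, 50.6],
        [50.0, 50.2, 49.7],
    ]
    print(f"measurement sigma = {measurement_sigma(lab_results):.3f}")
```

Because each triad is centered on its own mean, real process movement between the five sampling occasions does not inflate the estimate.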

Stan: Could you do something similar for at-line analyzers?

Ric: Since an analyzer is probably set up for calibration by the introduction of samples, you could use a plant sample rather than a standard sample and follow a similar procedure to get the measurement sigma for an at-line analyzer.

Greg: What is the importance of the measurement sigma?

Ric: The measurement sigma becomes increasingly important as it approaches the size of the short-term and long-term sigma. Even if it is initially negligible, as you improve the process capability and process performance through reduction of the short-term and long-term sigma, respectively, the measurement sigma becomes more of an issue. Thus, part of process control improvement involves improving analyzer technology, sample preparation and handling, and automated data entry. The lab result is taken as correct, so improvement of lab procedures and data entry must be done upfront. Significant measurement variability in the lab has occasionally been traced to a mistake made in manual data entry. Thus, lab analyzers that offer the capability of automatically storing results in the plant data historian offer advantages in terms of more accurate and accessible data. Of course, these analyzers can generate data more frequently and provide results immediately, rather than waiting on lab results.

Greg: The automatic inclusion of analysis results in the data historian is also important for visual analysis and automated analysis by data analytics software that is looking at the whole process and doing principal component analysis (PCA) and predictions by partial least squares (PLS). Often, the most valuable predictions are quality variables (QVs) that are directly or indirectly obtained from lab analysis results. For more on how to deal with the deluge of information from automation systems, see the four-part series of Control Talk columns that started in February 2010 with "Drowning in Data; Starving for Information - 1."

Stan: How can you improve the analysis of the Product Performance?

Ric: A process has its own "natural voice," defined by its Statistical Process Control (SPC) three-sigma limits (based on short-term sigma). The customer imposes restrictions on that process by means of the product quality specifications "Max Spec" and "Min Spec," which are used in the numerator of the "Capability of the Process" and "Product Performance" ratios. Centering the process mean between these specifications improves Ppk (and Cpk) by maximizing that numerator. For a centered process, a Ppk of 1.0 translates to a distance of 3 sigma with 2,700 ppm off-spec, and a Ppk of 1.33 translates to 4 sigma with 63 ppm off-spec. As the process mean shifts off center, the value of Ppk worsens; as long-term sigma is reduced, Ppk improves. Slide 2 of the online "Understanding Ppk" illustrates the concept. For example, consider a case where a centered process has a Ppk equal to 0.5, which means the distance from the population mean to either specification is 1.5 long-term sigma. If that process' sigma can be reduced by half, Ppk will improve to 1.0, going from a process producing 133,614 defects per million to one producing only 2,700. Of course, another way to improve Ppk's value is to widen the process' specifications, which should be set by customer needs as opposed to process capabilities.
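The off-spec figures quoted above can be checked with a short Python sketch. It assumes a normally distributed, centered process, so the off-spec fraction is the two-sided tail area beyond 3 × Ppk sigma. The function name is hypothetical, and the quoted Ppk of 1.33 is treated as exactly 4/3 (a 4-sigma distance).

```python
import math


def off_spec_ppm_centered(ppk):
    """Two-sided off-spec rate in ppm for a centered, normally distributed process."""
    z = 3.0 * ppk  # distance from the mean to either specification, in sigma
    one_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # area beyond +z
    return 2.0 * one_tail * 1e6


if __name__ == "__main__":
    for ppk in (0.5, 1.0, 4.0 / 3.0):
        print(f"Ppk = {ppk:.2f}: {off_spec_ppm_centered(ppk):,.0f} ppm off-spec")
```

For an off-center process, the two tails are no longer equal and would have to be computed separately from the distance to each specification.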

Greg: What smart sampling techniques have you developed to minimize the number of samples without inflating long-term sigma?

Ric: I have used a variable-interval sampling strategy where the values of the three most recent samples determine when the next sample is taken. If two of the last three measurements fall within the middle 50% of the SPC chart as defined by probabilities, the next sample is taken at twice the regular sampling interval. If, however, two of the three samples fall in either outside quarter of the SPC chart, but within its three-sigma limits, I sample at the regular frequency.
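A minimal Python sketch of this decision rule follows. It assumes the "middle 50%" means the central half of a normal distribution about the chart centerline (roughly ±0.67 sigma). It also assumes that any point beyond the three-sigma limits sends sampling back to the regular interval, a case the column does not address. The function and parameter names are hypothetical.

```python
from statistics import NormalDist

# Half-width of the central 50% of a standard normal distribution (about 0.6745).
MIDDLE_HALF_Z = NormalDist().inv_cdf(0.75)


def next_sampling_interval(last_three, center, sigma, regular_interval):
    """Choose the time until the next sample from the three most recent values."""
    if len(last_three) != 3:
        raise ValueError("exactly three recent measurements are required")

    # Assumption: a point outside the three-sigma limits never relaxes sampling.
    if any(abs(x - center) > 3.0 * sigma for x in last_three):
        return regular_interval

    in_middle = sum(abs(x - center) <= MIDDLE_HALF_Z * sigma for x in last_three)
    if in_middle >= 2:
        return 2 * regular_interval  # process looks quiet: sample half as often
    return regular_interval


if __name__ == "__main__":
    # Hypothetical chart: centerline 10.0, sigma 1.0, regular interval 60 minutes.
    print(next_sampling_interval([10.1, 9.9, 10.5], 10.0, 1.0, 60))
    print(next_sampling_interval([11.5, 8.2, 10.0], 10.0, 1.0, 60))
```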

Greg: The process deadtime has been used by Joseph Shunta at DuPont and by Bill Bialkowski at Entech to make sure the process capability metric is more realistic. We know from first-principle relationships that the ultimate limit to the peak error and integrated error for a PID controller with the most aggressive tuning settings for exactly known and constant process dynamics is proportional to the deadtime and deadtime squared, respectively, for a step disturbance at the process input. The equations that derive these fundamental limitations are detailed in Appendix C - Controller Performance of my 2015 Momentum Press book, Tuning and Control Loop Performance 4th Edition. How do you take into account the deadtime in determining the short-term sigma?

Ric: The process deadtime decreases the process capability by increasing the short-term sigma. For example, if I'm using an analyzer to characterize a process and want to estimate its measurement-system sigma in a "5/3" test using five groupings of three consecutive measures, the analyzer's deadtime would increase the time required for each sample in each triad, which would add the potential for natural process variation to inflate the measurement's contribution to short-term sigma.

Stan: How do you deal with subjective measurements of quality where humans are doing the analysis?

Ric: The human factor was very evident in grading carpets, e.g., estimating carpet body. Generally, rankings by sales professionals and management are noisier than those of the professional grader, though both agree as to the general direction of the testing, e.g., carpet one has more body than does carpet two. In one case, however, the grading by the manager at a customer for a grouping of 10 carpets was the opposite of that of the professional grader reporting to him, who also agreed with our professional graders and our sales management. The customer's manager had a very different perception of what was important with regard to carpet body and what he was seeing. A significant tool for improving subjective grading by adding a statistical metric to this type of grading is "Quad Analysis." The qualitative measurement of beverage and food taste testing has similar considerations and opportunities.

Each quad (group of 4) is subjected to a systematic paired comparison. For example, consider an analysis of 10 carpets (detailed in Reference 2). The quad analysis progresses through a series of five paired comparisons in each quad evaluation of the 30-quad design for this number of carpets. Unbeknownst to the grader, the carpets actually decrease in body (A > B > C > D) and they need to be sorted in the opposite direction. The presenter pulls four carpets (the quad) from the larger group and presents them to the grader as two pairs (pairs 1 and 2). The grader then takes over and chooses the greater-body member of each pair to form pair 3 (pair of winners) and pair 4 (pair of losers). He/she judges these new pairs and then forms a final pair where the loser of pair 3 is judged against the winner of pair 4. Thus, the four are rearranged and ranked A to D, best (1) to worst (4). Larger groupings, say quints or sextuples, do not lend themselves to this flow of pairs. In summary, each quad is presented as two pairs to the grader, who in turn judges the initial two, then forms and judges three others. The typical quad takes 1½ minutes to evaluate (45 minutes for the 30 quads in a 10-carpet design). This strategy simplifies the grader's decision from trying to rank order a group of 10 carpets all at once to more accurate decisions based on pairs. It also provides probability estimates for the degree of difference among the 10 carpets.
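The flow of five paired comparisons within a single quad can be sketched in Python as follows. The function name and the `prefers` callback, which stands in for the human grader, are hypothetical. The sketch covers only the ranking of one quad, not the 30-quad design or the probability estimates described in Reference 2.

```python
def rank_quad(quad, prefers):
    """Rank four items best to worst using exactly five paired comparisons.

    `prefers(x, y)` returns True when the grader judges x to have more body than y.
    """
    a, b, c, d = quad

    def judge(x, y):
        # Return (winner, loser) of one paired comparison.
        return (x, y) if prefers(x, y) else (y, x)

    winner1, loser1 = judge(a, b)             # pair 1
    winner2, loser2 = judge(c, d)             # pair 2
    first, loser3 = judge(winner1, winner2)   # pair 3: the two winners
    winner4, fourth = judge(loser1, loser2)   # pair 4: the two losers
    second, third = judge(loser3, winner4)    # final pair
    return [first, second, third, fourth]


if __name__ == "__main__":
    # Hypothetical "true" body scores that a perfectly consistent grader would follow.
    body = {"A": 4, "B": 3, "C": 2, "D": 1}
    ranking = rank_quad(["D", "B", "A", "C"], lambda x, y: body[x] > body[y])
    print(ranking)
```

With a consistent grader, the winner of pair 3 is the best of the four and the loser of pair 4 is the worst, so only the middle two positions remain for the final pair to settle.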

Stan: Are there some best practices you can share in getting better metrics?

Ric: Every estimate of process variation is inflated by a measurement system. You cannot measure the former without the latter, so always evaluate the quality of your measurement systems. You need to know how much variation the measurement system is contributing to your overall estimate of process variance:

$$\sigma^2_{\text{Total Process}} = \sigma^2_{\text{Process}} + \sigma^2_{\text{Measurement System}}$$

As a rule of thumb, the measurement system is acceptable if its ratio to total process variation is less than 30%.
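A small Python sketch of this variance bookkeeping follows. The column does not say whether the 30% rule of thumb applies to the ratio of sigmas or of variances, so the sketch reports both and, as an assumption, applies the threshold to the sigma ratio. All names are hypothetical.

```python
import math


def measurement_contribution(sigma_total, sigma_measurement, max_ratio=0.30):
    """Split total variance into process and measurement parts and apply the rule of thumb."""
    var_total = sigma_total ** 2
    var_measurement = sigma_measurement ** 2
    if var_measurement > var_total:
        raise ValueError("measurement variance cannot exceed total variance")

    sigma_ratio = sigma_measurement / sigma_total
    return {
        # Variances add, so the process sigma is found by subtracting variances.
        "sigma_process": math.sqrt(var_total - var_measurement),
        "sigma_ratio": sigma_ratio,
        "variance_share": var_measurement / var_total,
        # Assumption: the 30% threshold is applied to the sigma ratio.
        "acceptable": sigma_ratio < max_ratio,
    }


if __name__ == "__main__":
    # Hypothetical values: total sigma 5.0, measurement sigma 3.0.
    print(measurement_contribution(5.0, 3.0))
```

In this hypothetical case the underlying process sigma is 4.0, and the measurement system fails the rule of thumb under either reading.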

Process capability metrics are simple ratios of position in the numerator and variation in the denominator of the equations on page 1 of the online "Understanding Ppk." Improve Ppk and Cpk by: (1) making the numerator larger by increasing the distance between your population mean and its nearest specification or by widening the specifications themselves; or (2) making the denominator smaller by reducing long- and short-term sigma. Accomplish (1) first, since it is easier, then work to understand the process' short- and long-term sigma and develop strategies to reduce them.

Greg: Also see the online version for more ideas on how to improve the Ppk analysis, some best practices, and the "Top 10 things you don't want to hear from a statistician." I actually heard something similar to the list's "Number 1 Thing" from a very young engineer in a presentation at a recent Users Group Conference who said, "I know everything about how to make measurements work in your plants because I just got my four-year degree in Electrical Engineering."

"Top 10 things you don't want to hear from a statistician"

10. We want a process sample every hour.
9. Put all the control loops in manual so I can see the natural voice of the process.
8. I'm from quality, and I'm here to help.
7. Your Cpk shows that your process needs significant improvement (short-term sigma is too high and it's not because of the measurement system).
6. Your long-term sigma is being inflated by process drifts and shifts.
5. Your process mean is too close to a specification, i.e. it needs centering.
4. Your measurement system is contributing 70% of your short- and long-term variation (your SPC chart is just reflecting the measurement system, not the process).
3. Your Ppk is negative (population mean is on the wrong side of the specification).
2. The customer says he needs tighter specifications (Ppk falls).
1. I just graduated with a four-year degree in math so I know everything.

For more details on the advances presented here, see these papers by Ric:

Amin, R.W. and Miller, R.W. (1993), "A Robustness Study of Xbar Charts with Varying Sampling Intervals," Journal of Quality Technology, Vol. 25 (No. 1), pp. 36-44.
Miller, R.W. (2002), "Subjective Property Characterization by 'Quad Analysis': An Efficient Method for Conducting Paired Comparisons," Textile Research Journal, Vol. 72 (No. 12), pp. 1041-1051. http://trj.sagepub.com/content/72/12/1041
Miller, Richard (August 2010), "Reducing Sample Costs: Implementing a Variable Sampling Interval Strategy," ISixSigma.com.
Miller, Richard (September 2010), "Gain Continuous Measurement System Validation with a 5/3 Strategy," ISixSigma.com.
Miller, Richard W. (2011), "Quad Folding: a Simple Idea for the Subjective Property Characterization of Large Sample Sets," Journal of Applied Statistics. DOI: 10.1080/02664763.2011.604307.