The presence of autocorrelation on the T2 control chart of Harold Hotelling

Roberto Campos Leoni1,2, Antônio Fernando Branco Costa1, Marcela Aparecida Guerreiro Machado1

1 São Paulo State University (UNESP), School of Engineering, Guaratinguetá

2 Military Academy of Agulhas Negras (AMAN)


ABSTRACT

Autocorrelation violates the assumption of independent observations that underlies the statistical control charts used in manufacturing. This article uses the Mahalanobis distance to examine graphically the effect of autocorrelation on two measurable quality characteristics, X and Y, whose correlation and autocorrelation structures follow a VAR(1) model. The graphical evaluation shows that autocorrelation cannot be neglected by practitioners who rely on control charts to monitor processes.

Keywords: Autocorrelation; Control Chart; Hotelling's T2.


INTRODUCTION

The products of an industrial process have quality requirements that are defined by means of variables, that is, measurable quantities. Because every process is subject to numerous random causes that are economically unfeasible to eliminate, the process must be controlled using information extracted from samples collected during manufacturing. This information is used to judge the state of the process: in statistical control, that is, under the influence of random causes only, or out of statistical control, that is, under the influence not only of random causes but also of special causes, which alter the characteristics of the product but can be eliminated (Costa et al., 2005).

Monitoring the various characteristics of a process is prominent in industry, since each of them can affect the final quality of the product. Processes with several such characteristics are called multivariate processes. Among the most widely used tools for this type of monitoring are control charts, statistical tools that signal changes in the process based on the behavior of one or several quality characteristics of interest. Hotelling (1947) pioneered techniques for monitoring two or more quality characteristics simultaneously with control charts.

Monitoring these characteristics individually is not effective when they are dependent. Using one univariate control chart per variable is possible, but it may be less efficient than a multivariate control chart, which monitors and controls several related variables simultaneously (Montgomery, 2004).

Although control charts are widely known in manufacturing, the conditions required for their use may be violated in some cases. Montgomery (2004) notes that practically all processes are driven by inertial elements and that, when the interval between samples is small relative to these forces, the observations are correlated over time. According to Mason et Young (2002), many continuous-flow industrial operations exhibit autocorrelation, and one possible cause is the gradual wear of critical process components. Kim et al. (2010) state that the hypothesis of independence between observations of a variable can be violated by high production rates, which generate correlation and dependence between observations of products manufactured close together in time.

The monitoring of multivariate processes with autocorrelated observations appears in recent publications. Mastrangelo et Forrest (2002) provided a program that generates autocorrelated data and can simulate shifts in the mean of the monitored variable. Kalgonda et Kulkarni (2004) presented the Z control chart to monitor observations that follow a VAR(1) model. The advantage of the Z chart is that it identifies which quality characteristic has been affected by a special cause that shifted its mean. Pan et Jarrett (2007) and Jarrett et Pan (2007) proposed using the residuals of a VAR(p) model to monitor autocorrelated processes. The technique requires fitting the model to the process data and then plotting the residuals on the $T^2$ chart. Arkat et al. (2007) use artificial neural networks to monitor autocorrelated multivariate processes. Issam et Mohamad (2008) propose the support vector regression (SVR) method to monitor changes in the mean vector of autocorrelated processes with the MCUSUM control chart. Hwarng et Wang (2010) propose neural networks that are able to identify shifts in the mean vector of autocorrelated processes. Other papers on monitoring autocorrelated processes include Apley et Tsung (2002), Jiang (2004), Vargas et al. (2009) and Chen et Nembhard (2011).

This article therefore aims to evaluate graphically the effect of autocorrelation on two measurable quality characteristics, X and Y, when the observations of X and Y are correlated with each other, the observations of each characteristic are dependent over time, and this correlation and autocorrelation structure follows a VAR(1) model. The evaluation assumes that a shift in the mean is the disturbance of greatest interest and that the mean vector and the covariance matrix are known or estimated with precision.

MODEL DESCRIBING THE QUALITY CHARACTERISTICS

The classic control procedures for multivariate processes rest on the basic hypothesis that the observations are independent and follow a multivariate normal distribution with mean vector $\mu_0$ and variance-covariance matrix $\Sigma_0$:

$$X_t = \mu_0 + e_t \tag{1}$$

in which $X_t$ is the vector of observations, of order p x 1 (p is the number of variables), and the $e_t$ are independent random vectors of order p x 1 with multivariate normal distribution, zero mean and variance-covariance matrix $\Sigma_e$ (so that, under this model, $\Sigma_0 = \Sigma_e$).

The independence hypothesis is violated in many manufacturing processes, which makes equation (1) inadequate to represent such observations. The first-order vector autoregressive model, or VAR(1), equation (2), has been used to model multivariate processes with temporal correlation between observations of the same variable and correlation between observations of different quality characteristics (Mastrangelo et Forrest, 2002; Biller et Nelson, 2003; Kalgonda et Kulkarni, 2004; Arkat et al., 2007; Jarrett et Pan, 2007; Issam et Mohamad, 2008; Pfaff, 2008; Niaki et Davoodi, 2009; Hwarng et Wang, 2010; Kim et al., 2010; Kalgonda, 2012).

In autocorrelated multivariate processes, the VAR(1) model is represented by:

$$X_t - \mu_0 = \Phi\,(X_{t-1} - \mu_0) + e_t \tag{2}$$

in which $X_t$ is the data vector of order p x 1; $\mu_0$ is the mean vector of order p x 1; $\Phi$ is the matrix of autoregressive parameters, of order p x p; and the $e_t$ are independent random vectors of order p x 1 with multivariate normal distribution, zero mean and variance-covariance matrix $\Sigma_e$.
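As a minimal sketch of how data from a VAR(1) model can be generated, the Python snippet below simulates a bivariate process in which each deviation from the mean equals the autoregressive matrix times the previous deviation plus a random error. It uses only the standard library. The parameter values (a diagonal autoregressive matrix with entries 0.7, unit error variances, error correlation 0.7, zero mean vector) and the names `simulate_var1` and `lag1_autocorr` are illustrative assumptions, not values prescribed by the model.

```python
import math
import random

def simulate_var1(n, phi, rho_e, mu=(0.0, 0.0), burn_in=200, seed=1):
    """Simulate n observations of a bivariate VAR(1) process.

    phi   : 2x2 autoregressive matrix (list of lists), assumed stationary
    rho_e : correlation between the two error components
            (unit error variances are assumed)
    """
    rng = random.Random(seed)
    dev = [0.0, 0.0]           # deviation X_t - mu, started at zero
    out = []
    for t in range(n + burn_in):
        # Correlated standard normal errors via a 2x2 Cholesky factor
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        e = (z1, rho_e * z1 + math.sqrt(1 - rho_e ** 2) * z2)
        dev = [phi[0][0] * dev[0] + phi[0][1] * dev[1] + e[0],
               phi[1][0] * dev[0] + phi[1][1] * dev[1] + e[1]]
        if t >= burn_in:        # discard the start-up transient
            out.append((mu[0] + dev[0], mu[1] + dev[1]))
    return out

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a univariate series."""
    m = sum(series) / len(series)
    num = sum((a - m) * (b - m) for a, b in zip(series[1:], series[:-1]))
    den = sum((a - m) ** 2 for a in series)
    return num / den

data = simulate_var1(5000, phi=[[0.7, 0.0], [0.0, 0.7]], rho_e=0.7)
r1 = lag1_autocorr([x for x, _ in data])
print(round(r1, 2))  # expected to be close to the autoregressive coefficient 0.7
```

Setting both diagonal entries of `phi` to zero yields independent observations over time, which is the classical model.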

If $\Phi$ is a null matrix, equation (2) reduces to equation (1), the classical model for data that are independent over time. Otherwise, the data are dependent over time and the variation structure of the model is represented by the cross-covariance matrix (Shumway et Stoffer, 2006). Under the assumption that the process is stationary, $E(X_t) = \mu_0$ for all t, the cross-covariance matrix is:

$$\Gamma(h) = E\left[(X_{t+h} - \mu_0)(X_t - \mu_0)'\right] \tag{3}$$

Being stationary means that $E(X_t)$ is constant for every t and that the cross-covariance matrix does not depend on t; it depends only on h, the time lag between the vectors $X_{t+h}$ and $X_t$.

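The matrix $\Gamma(h)$ is formed by the elements $\gamma_{ij}(h)$ given by:

$$\gamma_{ij}(h) = E\left[(X_{t+h,i} - \mu_{0,i})(X_{t,j} - \mu_{0,j})\right], \quad i, j = 1, \ldots, p \tag{4}$$

where $X_{t,i}$ and $\mu_{0,i}$ denote the i-th components of $X_t$ and $\mu_0$.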

The cross-covariance matrix for h = 0, $\Gamma(0)$, when $\Phi$ and $\Sigma_e$ are known, can be obtained from the Yule-Walker relation (Lütkepohl, 2005):

$$\Gamma(0) = \Phi\,\Gamma(0)\,\Phi' + \Sigma_e \tag{5}$$
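The Yule-Walker relation for h = 0 follows in a few steps from the VAR(1) model. The derivation below is a sketch under the stationarity assumption; it writes $\Phi$ for the autoregressive matrix, $\Sigma_e$ for the error covariance matrix, $\mu_0$ for the mean vector and $\Gamma(0)$ for the covariance matrix of $X_t$, and it uses the fact that $e_t$ is independent of $X_{t-1}$, so the cross terms vanish.

```latex
\begin{aligned}
\Gamma(0) &= E\big[(X_t-\mu_0)(X_t-\mu_0)'\big] \\
          &= E\big[\big(\Phi(X_{t-1}-\mu_0)+e_t\big)\big(\Phi(X_{t-1}-\mu_0)+e_t\big)'\big] \\
          &= \Phi\,E\big[(X_{t-1}-\mu_0)(X_{t-1}-\mu_0)'\big]\,\Phi' + E\big[e_t e_t'\big] \\
          &= \Phi\,\Gamma(0)\,\Phi' + \Sigma_e .
\end{aligned}
```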

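Supposing that $X_t$ is a data vector with a p-variate distribution that follows the model described in equation (2), according to Kalgonda et Kulkarni (2004) and Kalgonda (2012),

$$\mathrm{Vec}\,\Gamma(0) = (I_{p^2} - \Phi \otimes \Phi)^{-1}\,\mathrm{Vec}\,\Sigma_e \tag{6}$$

in which Vec stacks the columns of a matrix into a vector, $\otimes$ is the Kronecker product and $I_{p^2}$ is the identity matrix of order $p^2$. If the process is in statistical control, $X_t$ follows a multivariate normal distribution with mean vector $\mu_0$ and cross-covariance matrix $\Gamma(0)$.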
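The lag-zero cross-covariance matrix can also be obtained numerically. The sketch below, in plain Python, solves the Yule-Walker relation (the covariance matrix equals the autoregressive matrix times the covariance matrix times its transpose, plus the error covariance matrix) by fixed-point iteration, which converges for a stationary process. The iteration is a substitute chosen here for simplicity, not the method of the cited authors, and the parameter values and the name `gamma0` are assumptions. For a diagonal autoregressive matrix the result is checked against the closed form $\gamma_{ij} = \sigma_{e,ij} / (1 - \phi_{ii}\phi_{jj})$.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def gamma0(phi, sigma_e, tol=1e-12, max_iter=10000):
    """Solve Gamma = Phi Gamma Phi' + Sigma_e by fixed-point iteration."""
    p = len(phi)
    gamma = [row[:] for row in sigma_e]
    phi_t = transpose(phi)
    for _ in range(max_iter):
        prod = matmul(matmul(phi, gamma), phi_t)
        new = [[prod[i][j] + sigma_e[i][j] for j in range(p)] for i in range(p)]
        diff = max(abs(new[i][j] - gamma[i][j])
                   for i in range(p) for j in range(p))
        gamma = new
        if diff < tol:
            return gamma
    raise ValueError("iteration did not converge; is the process stationary?")

# Assumed illustrative parameters: Phi = diag(0.7, 0.7), unit error
# variances and error correlation 0.7.
phi = [[0.7, 0.0], [0.0, 0.7]]
sigma_e = [[1.0, 0.7], [0.7, 1.0]]
g = gamma0(phi, sigma_e)

# Closed form, valid only for a diagonal Phi, used here as a check.
closed = [[sigma_e[i][j] / (1 - phi[i][i] * phi[j][j]) for j in range(2)]
          for i in range(2)]
print(g)
print(closed)
```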

HOTELLING'S T2 CONTROL CHART

One solution for monitoring processes with two or more quality characteristics was proposed by Hotelling (1947), based on the $T^2$ statistic. Hotelling's $T^2$ chart is the multivariate version of Shewhart's $\bar{X}$ control chart (Shewhart, 1931) and is the most widely used device for monitoring the mean vector of a process. The $T^2$ statistic can be calculated from a single observation of each quality characteristic or from the sample means of several quality characteristics monitored simultaneously. The probability distribution of $T^2$ makes it possible to establish adequate control limits for Hotelling's $T^2$ chart (Mason et Young, 2002; Bersimis et al., 2007).

Assuming that the mean vector ($\mu_0$) and the covariance matrix ($\Sigma_0$) are known, the $T^2$ control chart uses the statistical distance $T_t^2$, equation (7), which has a chi-square distribution with p degrees of freedom ($\chi^2_p$) when the process is in statistical control (Alt, 1985).

$$T_t^2 = n\,(\bar{X}_t - \mu_0)'\,\Sigma_0^{-1}\,(\bar{X}_t - \mu_0) \tag{7}$$

where n is the size of the t-th rational subgroup and $\bar{X}_t$ is the vector of sample means of the p variables for the t-th rational subgroup. When n = 1, the $T_t^2$ statistic reduces to:

$$T_t^2 = (X_t - \mu_0)'\,\Sigma_0^{-1}\,(X_t - \mu_0) \tag{8}$$

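In the $T^2$ control chart, the process is considered to remain in statistical control while the $T_t^2$ statistic is below the upper control limit (UCL), that is,

$$T_t^2 < \mathrm{UCL} = \chi^2_{p,\alpha} \tag{9}$$

in which $\chi^2_{p,\alpha}$ is the upper $\alpha$ quantile of the chi-square distribution with p degrees of freedom and $\alpha$ is the false alarm probability.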
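The calculation for a single bivariate observation (n = 1) and its comparison with the control limit can be sketched as follows. The in-control parameters (zero means, unit variances, correlation 0.7), the test points and the function names are assumptions made for illustration. For p = 2 the chi-square tail probability is exp(-x/2), so the upper quantile is -2 ln(alpha); for other values of p a numerical quantile routine would be needed.

```python
import math

def t2_single(x, mu0, sigma0):
    """Hotelling T2 for one bivariate observation (n = 1)."""
    (s11, s12), (s21, s22) = sigma0
    det = s11 * s22 - s12 * s21
    d1, d2 = x[0] - mu0[0], x[1] - mu0[1]
    # Quadratic form d' Sigma^{-1} d using the explicit 2x2 inverse
    return (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det

def ucl_p2(alpha):
    """Upper alpha quantile of the chi-square distribution with 2 d.f."""
    return -2.0 * math.log(alpha)

# Assumed in-control parameters, for illustration only.
mu0 = (0.0, 0.0)
sigma0 = [[1.0, 0.7], [0.7, 1.0]]
ucl = ucl_p2(0.005)   # one false alarm per 200 samples, on average

for point in [(1.0, 1.0), (2.5, -2.5)]:
    t2 = t2_single(point, mu0, sigma0)
    print(point, round(t2, 4), "signal" if t2 >= ucl else "in control")
```

With positively correlated variables, a point such as (2.5, -2.5), which goes against the correlation, lies far from the mean in statistical distance even though each coordinate taken alone is only 2.5 standard deviations away.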

When the mean vector ($\mu_0$) and the covariance matrix ($\Sigma_0$) are unknown and need to be estimated, the control limits are calculated according to the monitoring phase (Bersimis et al., 2007).

If a special cause acts on the process mean, moving it to a new level, the new mean vector $\mu_1$ can be represented by:

$$\mu_1 = \mu_0 + \delta \tag{10}$$

in which $\delta$ indicates the magnitude of the shift in the mean; the $T_t^2$ statistic then follows a non-central chi-square distribution with p degrees of freedom and non-centrality parameter $\lambda$:

$$T_t^2 \sim \chi^2_p(\lambda) \tag{11}$$

Some studies of multivariate process control schemes use the non-centrality parameter $\lambda$ as a measure of the shift in the mean vector of the process (Alt, 1985; Aparisi, 1996; Aparisi et Haro, 2001; Mason et Young, 2002).

$$\lambda = n\,\delta'\,\Sigma_0^{-1}\,\delta \tag{12}$$

The average run length (ARL), that is, the mean number of samples until the $T^2$ control chart signals an out-of-control condition, is a function of the non-centrality parameter $\lambda$ of the non-central chi-square distribution with p degrees of freedom:

$$\mathrm{ARL} = \frac{1}{\Pr\left[\chi^2_p(\lambda) > \mathrm{UCL}\right]} \tag{13}$$
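The dependence of the ARL on the non-centrality parameter can be illustrated numerically. The sketch below assumes independent observations and p = 2, and takes the ARL as the reciprocal of the probability that a non-central chi-square variable exceeds the control limit. The non-central tail is written as a Poisson mixture of central chi-square tails, which have a closed form for even degrees of freedom. The function names and the values of the non-centrality parameter are hypothetical.

```python
import math

def chi2_sf_even(x, df):
    """P(chi-square with an even number of d.f. > x), closed form."""
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= (x / 2.0) / j
        total += term
    return math.exp(-x / 2.0) * total

def ncx2_sf(x, df, lam, terms=200):
    """P(non-central chi-square > x) as a Poisson(lam / 2) mixture
    of central chi-square tails with df + 2k degrees of freedom."""
    weight = math.exp(-lam / 2.0)    # Poisson probability of k = 0
    total = 0.0
    for k in range(terms):
        total += weight * chi2_sf_even(x, df + 2 * k)
        weight *= (lam / 2.0) / (k + 1)
    return total

def arl_t2(lam, alpha=0.005):
    """ARL of the T2 chart for independent bivariate data (p = 2 assumed)."""
    ucl = -2.0 * math.log(alpha)     # chi-square quantile, valid for p = 2 only
    return 1.0 / ncx2_sf(ucl, 2, lam)

for lam in (0.0, 1.0, 4.0):
    print(lam, round(arl_t2(lam), 1))
```

With no shift the function returns the in-control value of 200 samples, and the ARL decreases as the non-centrality parameter grows.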

In the presence of autocorrelation, the $T^2$ statistic no longer follows a chi-square distribution with p degrees of freedom ($\chi^2_p$), even when the mean vector and the covariance matrix are known, so a control limit based on that distribution is no longer adequate. Similarly, when the mean vector shifts, the $T^2$ statistic no longer follows a non-central chi-square distribution $\chi^2_p(\lambda)$.

EFFECT OF AUTOCORRELATION IN BIVARIATE PROCESSES

Hotelling's $T^2$ chart is one of the best known in manufacturing, and its application is documented in numerous articles, as can be seen in the multidisciplinary reference database Web of Science, which is integrated with the ISI Web of Knowledge. A search for the keywords Hotelling and chart in the titles of the publications available in December 2013 returned 28 articles, cited 162 times, which shows the importance of this technique in the scientific and academic environment. Figure 1 shows the distribution of these articles per year.

Figure 1. Distribution of articles found in the ISI Web of Knowledge database

Source: The authors.

Hotelling's $T^2$ chart was designed for situations in which the observations of each quality characteristic are independent over time. Ignoring a violation of this hypothesis is quite detrimental to the performance of the control chart, so independence has to be checked when monitoring a process.

This paper considers the distance from the vector X to the mean vector $\mu_0$, called the statistical distance or Mahalanobis distance (Mahalanobis, 1936). This is the same distance used in Hotelling's $T^2$ control chart.

$$D^2 = (X - \mu_0)'\,\Sigma_0^{-1}\,(X - \mu_0) \tag{14}$$

The relationship between the cross-covariance matrix, $\Gamma(0)$, and the elements of the matrices $\Phi$ and $\Sigma_e$ is obtained using equation (5). Considering the autocorrelation and correlation of the VAR(1) model, the Mahalanobis distance becomes:

$$D^2 = (X - \mu_0)'\,\Gamma(0)^{-1}\,(X - \mu_0) \tag{15}$$

Without loss of generality, considering the bivariate case in which Equação and Equação, when Equação and the vector Equação the distance Equação is equivalent to:

Equação

Equation (16) reveals the influence of the autoregressive parameters on the distance $D^2$.

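If $\Phi$ is the null matrix, that is, there is no autocorrelation, then $\Gamma(0) = \Sigma_e$ by equation (5) and the distance $D^2$ reduces to:

$$D^2 = (X - \mu_0)'\,\Sigma_e^{-1}\,(X - \mu_0) \tag{17}$$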

When there is no autocorrelation, that is, the data are independent, $D^2$ has a chi-square distribution with p degrees of freedom ($\chi^2_p$). To evaluate the effect of autocorrelation, the bivariate case (p = 2) and $\alpha = 0.005$ were considered, in which case $\chi^2_{2,\,0.005} = 10.5966$.

The performance of a control chart can be evaluated by the number of samples the chart needs to detect a shift in the monitored characteristic. When there is no shift, the process is in statistical control, and any signal given by the chart is a false alarm. The value UCL = 10.5966 corresponds, on average, to one false alarm for every 200 samples when Hotelling's $T^2$ chart is used (Costa et al., 2005).

Based on the VAR(1) model, the in-control mean vector of the process ($\mu_0$) can undergo a shift of magnitude $\delta$ to a new level $\mu_1 = \mu_0 + \delta$, where $\delta$ is a vector of order p x 1 whose elements represent the magnitude of the shift in the mean of each of the p variables. To show what happens to the process mean after a shift, the VAR(1) model is written here as a function of the error vector ($e_t$) and the mean vector ($\mu_0$).

Equação

If the shift in the in-control mean vector occurs at some instant t = T, the mean of $X_t$ changes from $\mu_0$ to:

Equação

Without loss of generality, considering Equação, the change in the means vector can be represented in three stages:

Equação

The chart with the best performance is the one that, from the instant t = T onward, most rapidly detects the change in the mean of the monitored quality characteristics.

In the graphical evaluation of the effect of autocorrelation, the shift was assumed to follow equation (19). For example, in a bivariate process, the occurrence of a special cause shifts the mean vector $\mu_0$ to a new level $\mu_1$. Sections 4.1 and 4.2 present the graphical evaluation for the in-control process ($\delta = 0$) and for the out-of-control process ($\delta \neq 0$), respectively.

Graphical evaluation of the effect of autocorrelation with the in-control process

In an autocorrelation-free process, with Equação and Equação = 0.7, we have Equação. The ellipse representing the level curve of the distribution for $D^2 = 10.5966$ is illustrated in Figure 2.

Figure 2. Ellipse: Equação and Equação = 0.7

Source: The authors.

Figure 3. Ellipse: Equação and Equação = 0.7

Source: The authors.

More generally, for Equação, Figure 4 shows that the greater the autocorrelation, the larger the elliptic region; that is, autocorrelation increases the variability of the monitored process variables.

Figure 4. Ellipses: Equação and Equação = 0.7

Source: The authors.

If the data are normally distributed, each ellipse in Figure 4 contains all points that are at the same Mahalanobis distance from the origin. All points on the same ellipse therefore have the same probability density under a multivariate normal distribution centered at (0,0), since $\mu_0 = 0$. In Hotelling's $T^2$ chart, a control limit (UCL) equal to $\chi^2_{2,\,0.005} = 10.5966$ generates, on average, one false alarm for every 200 samples when there is no autocorrelation. The same does not occur when autocorrelation is present: the average false alarm rate no longer corresponds to one alarm for every 200 samples, even if the UCL used is 10.5966. In practice, this means that using Hotelling's $T^2$ chart with the UCL taken from the chi-square distribution with p degrees of freedom ($\chi^2_p$) in the presence of autocorrelation gives a false alarm rate different from the desired one.
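The growth of the elliptic region with autocorrelation can be checked numerically. The sketch below computes the area of the region in which the Mahalanobis distance is at most c, which equals pi times c times the square root of the determinant of the covariance matrix. It assumes, for illustration only, an autoregressive matrix equal to a times the identity, unit error variances and error correlation 0.7; under these assumptions the lag-zero covariance matrix is the error covariance matrix divided by (1 - a^2). The values of a and the function name are hypothetical and are not taken from Figure 4.

```python
import math

def ellipse_area(a, rho_e=0.7, c=10.5966):
    """Area of the region D^2 <= c when Phi = a * I (assumed) and the
    errors have unit variances and correlation rho_e (assumed)."""
    # With Phi = a * I, Gamma(0) = Sigma_e / (1 - a^2)
    scale = 1.0 / (1.0 - a * a)
    det_gamma = (scale ** 2) * (1.0 - rho_e ** 2)
    return math.pi * c * math.sqrt(det_gamma)

areas = {a: ellipse_area(a) for a in (0.0, 0.3, 0.5, 0.7)}
for a, area in areas.items():
    print(a, round(area, 2))
```

Under these assumptions the area is proportional to 1 / (1 - a^2), so it roughly doubles between a = 0 and a = 0.7.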

Graphical evaluation of the effect of autocorrelation with the out-of-control process

Figure 5 shows a process free of autocorrelation, with Equação and Equação = 0.7. The dashed ellipse centered at (0,0) represents the in-control process, and its equation is $D^2 = 10.5966$. The other ellipses represent the occurrence of a special cause that shifts the mean vector $\mu_0$ to a new level:

Equação

Figure 6 shows a process with autocorrelation, with Equação and Equação = 0.7. The dashed ellipse centered at (0,0) represents the in-control process, and its equation is $D^2 = 10.06$. The value 10.06 was used to allow a fair comparison: in the presence of autocorrelation, it keeps the average false alarm rate equal to one alarm per 200 samples. The other ellipses represent the occurrence of a special cause that shifts the mean vector $\mu_0$ to a new level:

Equação

Figure 5. Ellipses: Equação and Equação = 0.7

Source: The authors.

Figure 6. Ellipses: Equação and Equação = 0.7

Source: The authors.

Figure 5 shows that, in processes without autocorrelation, a shift in the mean vector caused by a special cause moves the ellipses away from the center at (0,0), so that the $T^2$ chart performs better in this case than when autocorrelation is present. In Figure 6, the ellipses tend to remain close to the center at (0,0) when the mean vector shifts, which means that the performance of the $T^2$ chart is lower when autocorrelation is present.
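One way to see why a shifted mean stays closer to the center under autocorrelation is to compute the squared Mahalanobis distance between the shifted and the in-control mean vectors, with and without autocorrelation. The sketch below assumes that the process mean has already settled at its new level, that the autoregressive matrix equals a times the identity, and that the errors have unit variances and correlation 0.7. The shift (1, 1) and the function name are hypothetical.

```python
def shift_distance(delta, a, rho_e=0.7):
    """delta' Gamma(0)^{-1} delta for Phi = a * I and unit-variance errors
    with correlation rho_e (all assumed for illustration)."""
    d1, d2 = delta
    # Quadratic form with the inverse of the 2x2 error correlation matrix
    q = (d1 * d1 - 2.0 * rho_e * d1 * d2 + d2 * d2) / (1.0 - rho_e ** 2)
    # Gamma(0) = Sigma_e / (1 - a^2)  =>  Gamma(0)^{-1} = (1 - a^2) Sigma_e^{-1}
    return (1.0 - a * a) * q

for a in (0.0, 0.7):
    print(a, round(shift_distance((1.0, 1.0), a), 4))
```

Under these assumptions the same shift is smaller in statistical distance by the factor (1 - a^2), so it is harder to distinguish from the in-control variation.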

CONCLUSION

This article evaluated the effect of autocorrelation on the $T^2$ control chart, one of the most popular tools in the academic and industrial environments. The Mahalanobis distance, the same statistic used in the $T^2$ chart, was used to represent geometrically the behavior of a process in the presence and absence of special causes that affect the mean of the monitored quality characteristics.

Violation of the independence hypothesis should be taken seriously and checked before control charts are used, since autocorrelation affects the performance of traditional control charts, reducing their ability to detect shifts in the mean vector.

The ellipses illustrated how process data behave in the presence of autocorrelation, which masks the effect of the shift that occurs when the quality characteristics move from the in-control to the out-of-control state.

Future work should present statistics or techniques that improve the performance of control charts in the presence of autocorrelation.


REFERENCES

Alt, F. B. (1985), Multivariate control charts. Encyclopedia of Statistical Sciences. Kotz, S. et Johnson, N. L. (Eds.), Wiley.

Aparisi, F. (1996), “Hotelling’s T2 control chart with adaptive sample sizes”, International Journal of Production Research, Vol. 34. pp. 2853-2862.

Aparisi, F. et Haro, C.L. (2001), “Hotelling’s T2 control chart with variable sampling intervals”, International Journal of Production Research, Vol. 39. pp. 3127-3140.

Apley, D.W. et Tsung, F. (2002), “The autoregressive T2 chart for monitoring univariate autocorrelated processes”, Journal of Quality Technology, Vol. 34. pp. 80-96.

Arkat, J., Niaki, S.T.A., Abbasi, B. (2007), “Artificial neural networks in applying MCUSUM residuals charts for AR(1) processes”, Applied Mathematics and Computation, Vol. 189. pp. 1889-1901.

Bersimis, S., Psarakis, S., Panaretos, J. (2007), “Multivariate Statistical Process Control Charts: An Overview”, Quality and Reliability Engineering International, Vol. 23. pp. 517-543.

Biller, B. et Nelson, B. (2003), “Modeling and generating multivariate time-series input processes using a vector autoregressive technique”, ACM Transactions on Modeling and Computer Simulation, Vol. 13. No. 3. pp. 211-237.

Chen, S. et Nembhard, H.B. (2011), “Multivariate cuscore control charts for monitoring the mean vector in autocorrelated process”, IIE Transactions, Vol. 43. pp. 291-307.

Costa, A. F. B., Epprecht, E.K., Carpinetti, L.C.R. (2005), Controle Estatístico de Qualidade. 2nd ed., São Paulo: Editora Atlas.

Hotelling, H. (1947), “Multivariate quality control, illustrated by the air testing of sample bombsights”, Techniques of Statistical Analysis, pp.111-184. New York, McGraw Hill.

Hwarng, H.B. et Wang, Y. (2010), “Shift detection and source identification in multivariate autocorrelated process”, International Journal of Production Research, Vol. 48. No. 3. pp. 835-859.

Issam, B.K. et Mohamad, L. (2008), “Support vector regression based residual MCUSUM control chart for autocorrelated process”, Applied Mathematics and Computation, Vol. 201. pp. 565-574.

Jarrett, J.E. et Pan, X. (2007), “The quality control chart for monitoring multivariate autocorrelated processes”, Computational Statistics & Data Analysis, Vol. 51. pp. 3862-3870.

Jiang, W. (2004), “Multivariate control charts for monitoring autocorrelated processes”, Journal of Quality Technology, Vol. 36. pp. 367-379.

Kalgonda, A.A. (2012), “A Note on generalization of Z Graph”, Journal of Academia and Industrial Research, Vol. 1. No.6. pp. 286-289.

Kalgonda, A.A. et Kulkarni, S.R. (2004), “Multivariate quality control chart for autocorrelated processes”, Journal of Applied Statistics, Vol. 31. pp. 317-327.

Kim, S.B., Jitpitaklert, W., Sukchotrat, T. (2010), “One-Class Classification-Based Control Charts for Monitoring Autocorrelated Multivariate Processes”, Communications in Statistics - Simulation and Computation, Vol. 39. No. 3. pp. 461-474.

Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis. New York: Springer.

Mahalanobis, P.C. (1936), “On the generalised distance in statistics”, Proceedings of the National Institute of Sciences of India, Vol. 2, No. 1. pp. 49-55.

Mason, R. et Young, J.C. (2002), Multivariate statistical process control with industrial applications. Alexandria. Society for Industrial and Applied Mathematics.

Mastrangelo, C.M. et Forrest, D. R. (2002), “Multivariate Autocorrelated Processes: Data and Shift Generation”, Journal of Quality Technology, Vol. 34, No. 2. pp. 216-220.

Montgomery, D.C. (2004), Introduction to statistical quality control. New York: John Wiley & Sons, Inc.

Niaki, S.T.A. et Davoodi, M. (2009), “Designing a multivariate-multistage quality control system using artificial neural networks”, International Journal of Production Research, Vol. 47. pp. 251-271.

Pan, X. et Jarrett, J.E. (2007), “Using vector autoregressive residuals to monitor multivariate processes in the presence of serial correlation”, International Journal of Production Economics, Vol. 106. pp. 204-216.

Pfaff, B. (2008), “VAR, SVAR and SVEC models: implementation within R package vars”, Journal of Statistical Software, Vol. 27. No. 4. pp. 1-32.

Shewhart, W.A. (1931), Economic control of quality of manufactured product. 1st ed. New York: D. Van Nostrand Company.

Shumway, R. H. et Stoffer, D. S. (2006), Time Series Analysis and Its Applications: With R Examples. 2nd ed. New York: Springer.

Vargas, M., Alfaro, J.L., Mondéjar, J. (2009), “On the run length of a state-space control chart for multivariate autocorrelated process data”, Communications in Statistics - Simulation and Computation, Vol. 38. pp. 1823-1833.