1 São Paulo State University (UNESP), School of Engineering, Guaratinguetá
2 Military Academy of Agulhas Negras (AMAN)
The presence of autocorrelation violates the assumption of data independence on which statistical control charts used in manufacturing are based. This article uses the Mahalanobis distance to examine graphically the effect of autocorrelation on two measurable quality characteristics, X and Y, whose correlation and autocorrelation structure follows a VAR(1) model. The graphical evaluation shows that practitioners who use control charts to monitor processes cannot neglect the presence of autocorrelation.
Keywords: Autocorrelation; Control Chart; Hotelling's T².
The products of an industrial process have quality requirements that are defined by means of variables, that is, measurable quantities. Because these variables are affected by numerous random causes that are economically unfeasible to eliminate, the process must be controlled using information extracted from samples collected during manufacturing. This information is used to judge the state of the process: in statistical control, that is, subject only to random causes, or out of statistical control, that is, subject not only to random causes but also to special causes that alter the characteristics of the product and that can be eliminated (Costa et al., 2005).
Monitoring the various characteristics of a process is important in industry, since each of them can affect the final quality of the product. Processes monitored in this way are called multivariate processes. One of the most widely used tools for this type of monitoring is the control chart, a statistical tool that signals changes in the process based on the behavior of one or more quality characteristics of interest. Hotelling (1947) pioneered the use of control charts to monitor two or more quality characteristics simultaneously.
Monitoring these characteristics individually is not effective when they depend on one another. Using a univariate control chart for each process variable is a possible solution; however, it may be less efficient than a multivariate control chart, a technique that monitors and controls several related variables simultaneously (Montgomery, 2004).
Although control charts are widely known in manufacturing, the conditions for their use may be violated in some cases. Montgomery (2004) notes that essentially all processes are driven by inertial elements and, when the interval between samples is small relative to these forces, the observations become correlated over time. According to Mason and Young (2002), many continuous-flow industrial operations exhibit autocorrelation, and one possible cause is the gradual wear of critical process components. Kim et al. (2010) state that the assumption of independence between observations of a variable can be violated by high production rates, which generate correlation and dependence between observations of products manufactured close together in time.
The monitoring of autocorrelated multivariate processes has been addressed in recent publications. Mastrangelo and Forrest (2002) provided a program to generate autocorrelated data in which shifts in the mean of the monitored variables can be simulated. Kalgonda and Kulkarni (2004) presented the Z control chart to monitor observations that follow a VAR(1) model. The advantage of the Z chart is that it identifies which quality characteristic has undergone a change in its mean value, that is, which characteristic has been affected by a special cause. Pan and Jarrett (2007) and Jarrett and Pan (2007) proposed using the residuals of a VAR(p) model to monitor autocorrelated processes; the technique requires fitting the model to the process data and then plotting the residuals on the chart. Arkat et al. (2007) used artificial neural networks to monitor autocorrelated multivariate processes. Issam and Mohamad (2008) proposed the support vector regression (SVR) method to monitor changes in the mean vector of autocorrelated processes with the MCUSUM control chart. Hwarng and Wang (2010) proposed neural networks able to identify shifts in the mean vector of autocorrelated processes. Several other papers address the monitoring of autocorrelated processes, among them Apley and Tsung (2002), Jiang (2004), Vargas et al. (2009) and Chen and Nembhard (2011).
Therefore, this article aims to evaluate graphically the effect of autocorrelation on two measurable quality characteristics, X and Y, when the observations of X and Y are correlated with each other, the observations of X and the observations of Y are each dependent over time, and this correlation and autocorrelation structure follows a VAR(1) model. The evaluation assumes that a shift in the mean is the change of main interest and that the mean vector and the covariance matrix are known or estimated precisely.
Classical control procedures for multivariate processes assume that the observations are independent and follow a multivariate normal distribution with mean vector $\boldsymbol{\mu}$ and variance-covariance matrix $\boldsymbol{\Sigma}$:
$$\mathbf{X}_t = \boldsymbol{\mu} + \boldsymbol{\varepsilon}_t \tag{1}$$
where $\mathbf{X}_t$ is the $p \times 1$ vector of observations at time $t$ ($p$ is the number of variables) and the $\boldsymbol{\varepsilon}_t$ are independent $p \times 1$ random vectors with a multivariate normal distribution with mean zero and variance-covariance matrix $\boldsymbol{\Sigma}$.
The independence assumption is violated in many manufacturing processes, which makes equation (1) inadequate to represent such observations. First-order vector autoregressive models, VAR(1), equation (2), have been used to model multivariate processes with temporal correlation between observations of the same variable and correlation between observations of different quality characteristics (Mastrangelo and Forrest, 2002; Biller and Nelson, 2003; Kalgonda and Kulkarni, 2004; Arkat et al., 2007; Jarrett and Pan, 2007; Issam and Mohamad, 2008; Pfaff, 2008; Niaki and Davoodi, 2009; Hwarng and Wang, 2010; Kim et al., 2010; Kalgonda, 2012).
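In autocorrelated multivariate processes, the VAR(1) model is represented by:
$$\mathbf{X}_t = \boldsymbol{\mu} + \boldsymbol{\Phi}\,(\mathbf{X}_{t-1} - \boldsymbol{\mu}) + \boldsymbol{\varepsilon}_t \tag{2}$$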
where $\mathbf{X}_t$ is the $p \times 1$ data vector, $\boldsymbol{\mu}$ is the $p \times 1$ mean vector, $\boldsymbol{\Phi}$ is the $p \times p$ matrix of autoregressive parameters, and the $\boldsymbol{\varepsilon}_t$ are independent $p \times 1$ random vectors with a multivariate normal distribution with mean zero and variance-covariance matrix $\boldsymbol{\Sigma}$.
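A minimal pure-Python sketch of equation (2) for the bivariate case is shown below. It simulates the VAR(1) recursion, discards a burn-in period so that the series is approximately stationary, and compares sample correlations with the parameters. The helper names and the parameter values ($\boldsymbol{\Phi} = 0.6\,\mathbf{I}$, unit variances with correlation 0.7, a 500-observation burn-in) are illustrative assumptions, not values taken from this article. Setting `PHI` to the null matrix reproduces the independent model of equation (1).

```python
import math
import random


def cholesky_2x2(sigma):
    """Return (l11, l21, l22) of the lower-triangular L with L L' = sigma (2x2, positive definite)."""
    (a, b), (_, d) = sigma
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return l11, l21, l22


def simulate_var1(mu, phi, sigma, n, seed=None, burn_in=500):
    """Simulate X_t = mu + Phi (X_{t-1} - mu) + eps_t, eps_t ~ N_2(0, sigma), as in equation (2).

    The recursion starts at the mean; the first `burn_in` values are discarded so that the
    returned n observations are approximately stationary.
    """
    rng = random.Random(seed)
    l11, l21, l22 = cholesky_2x2(sigma)
    dev = (0.0, 0.0)                                   # X_{t-1} - mu
    series = []
    for t in range(burn_in + n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps = (l11 * z1, l21 * z1 + l22 * z2)          # innovation with covariance sigma
        dev = (phi[0][0] * dev[0] + phi[0][1] * dev[1] + eps[0],
               phi[1][0] * dev[0] + phi[1][1] * dev[1] + eps[1])
        if t >= burn_in:
            series.append((mu[0] + dev[0], mu[1] + dev[1]))
    return series


def lag_correlation(a, b, h=0):
    """Sample correlation between a_{t+h} and b_t (means and variances over the full series)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((a[t + h] - ma) * (b[t] - mb) for t in range(n - h)) / n
    va = sum((v - ma) ** 2 for v in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / math.sqrt(va * vb)


# Illustrative (assumed) parameters: Phi = 0.6 I, unit variances, correlation 0.7.
SIGMA = [[1.0, 0.7], [0.7, 1.0]]
PHI = [[0.6, 0.0], [0.0, 0.6]]
data = simulate_var1((0.0, 0.0), PHI, SIGMA, n=20_000, seed=1)
x = [obs[0] for obs in data]
y = [obs[1] for obs in data]
print("lag-1 autocorrelation of X:", round(lag_correlation(x, x, 1), 3))   # should be near 0.6
print("correlation of X and Y:", round(lag_correlation(x, y, 0), 3))       # should be near 0.7
```

With a diagonal $\boldsymbol{\Phi}$ each variable is a univariate AR(1) process, so its lag-1 autocorrelation is close to the diagonal element, while the contemporaneous correlation of X and Y stays close to the correlation of the innovations.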
If $\boldsymbol{\Phi}$ is a null matrix, equation (2) reduces to equation (1), the classical model for data that are independent over time. Otherwise, the data are dependent over time and the variation structure of the model is represented by the cross-covariance matrix $\boldsymbol{\Gamma}(h)$ (Shumway and Stoffer, 2006). Under the assumption that the process is stationary, $E(\mathbf{X}_t) = \boldsymbol{\mu}$ for all $t$, and the cross-covariance matrix is:
$$\boldsymbol{\Gamma}(h) = E\left[(\mathbf{X}_{t+h} - \boldsymbol{\mu})(\mathbf{X}_t - \boldsymbol{\mu})'\right]$$
Stationarity means that $\boldsymbol{\mu}$ is constant for every $t$ and that the cross-covariance matrix does not depend on $t$ but only on $h$, the time lag between the vectors $\mathbf{X}_{t+h}$ and $\mathbf{X}_t$.
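The elements of $\boldsymbol{\Gamma}(h)$ are given by:
$$\gamma_{ij}(h) = E\left[(X_{t+h,i} - \mu_i)(X_{t,j} - \mu_j)\right], \quad i, j = 1, \ldots, p$$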
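The elements $\gamma_{ij}(h)$ can be estimated from data with the usual moment estimator, which divides by $n$ rather than $n - h$. The sketch below computes that estimate; the function name and the three toy observations are hypothetical. Note that $\boldsymbol{\Gamma}(h)$ is generally not symmetric for $h \neq 0$, since $\boldsymbol{\Gamma}(-h) = \boldsymbol{\Gamma}(h)'$.

```python
def sample_cross_covariance(series, h):
    """Sample estimate of Gamma(h) for a list of p-dimensional observations.

    Element (i, j) is (1/n) * sum_{t=1}^{n-h} (x_{t+h,i} - mean_i)(x_{t,j} - mean_j),
    the usual moment estimator (dividing by n rather than n - h).
    """
    n = len(series)
    if not 0 <= h < n:
        raise ValueError("lag h must satisfy 0 <= h < n")
    p = len(series[0])
    means = [sum(obs[i] for obs in series) / n for i in range(p)]
    gamma = [[0.0] * p for _ in range(p)]
    for t in range(n - h):
        for i in range(p):
            for j in range(p):
                gamma[i][j] += (series[t + h][i] - means[i]) * (series[t][j] - means[j])
    return [[g / n for g in row] for row in gamma]


obs = [(1.0, 2.0), (3.0, 2.0), (5.0, 8.0)]      # hypothetical bivariate observations
print(sample_cross_covariance(obs, 0))          # symmetric: an ordinary covariance matrix
print(sample_cross_covariance(obs, 1))          # generally not symmetric for h > 0
```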
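When $\boldsymbol{\Phi}$ and $\boldsymbol{\Sigma}$ are known, the cross-covariance matrix for $h = 0$, $\boldsymbol{\Gamma}(0)$, can be obtained from the Yule-Walker relation (Lütkepohl, 2005):
$$\boldsymbol{\Gamma}(0) = \boldsymbol{\Phi}\,\boldsymbol{\Gamma}(0)\,\boldsymbol{\Phi}' + \boldsymbol{\Sigma}, \qquad \operatorname{vec}\boldsymbol{\Gamma}(0) = \left(\mathbf{I}_{p^2} - \boldsymbol{\Phi} \otimes \boldsymbol{\Phi}\right)^{-1}\operatorname{vec}\boldsymbol{\Sigma}$$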
According to Kalgonda and Kulkarni (2004) and Kalgonda (2012), if $\mathbf{X}_t$ is a $p$-variate data vector that follows the model described in equation (2) and the process is in statistical control, then $\mathbf{X}_t$ follows a multivariate normal distribution with mean vector $\boldsymbol{\mu}$ and cross-covariance matrix $\boldsymbol{\Gamma}(0)$:
$$\mathbf{X}_t \sim N_p\left(\boldsymbol{\mu}, \boldsymbol{\Gamma}(0)\right)$$
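One way to evaluate the Yule-Walker relation numerically, sketched below under the assumption that $\boldsymbol{\Phi}$ corresponds to a stationary process, is fixed-point iteration: starting from $\boldsymbol{\Sigma}$, each step adds one more term of the series $\sum_k \boldsymbol{\Phi}^k \boldsymbol{\Sigma} (\boldsymbol{\Phi}')^k$. This is a swapped-in numerical technique chosen to avoid external libraries, not the vec/Kronecker formula itself; the function names and example matrices are hypothetical. For $\boldsymbol{\Phi} = \phi\,\mathbf{I}$ the result should equal $\boldsymbol{\Sigma}/(1-\phi^2)$.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def gamma0_yule_walker(phi, sigma, tol=1e-12, max_iter=10_000):
    """Solve Gamma(0) = Phi Gamma(0) Phi' + Sigma by fixed-point iteration.

    Starting from Sigma, iteration k holds the partial sum of Phi^j Sigma (Phi')^j for j <= k,
    which converges when every eigenvalue of Phi lies inside the unit circle (stationary VAR(1)).
    """
    p = len(sigma)
    phi_t = transpose(phi)
    gamma = [row[:] for row in sigma]
    for _ in range(max_iter):
        prod = matmul(matmul(phi, gamma), phi_t)
        new = [[prod[i][j] + sigma[i][j] for j in range(p)] for i in range(p)]
        change = max(abs(new[i][j] - gamma[i][j]) for i in range(p) for j in range(p))
        gamma = new
        if change < tol:
            return gamma
    raise RuntimeError("no convergence; Phi may not correspond to a stationary process")


SIGMA = [[1.0, 0.7], [0.7, 1.0]]                         # assumed innovation covariance
print(gamma0_yule_walker([[0.6, 0.0], [0.0, 0.6]], SIGMA))   # approximately SIGMA / (1 - 0.6**2)
```

For small $p$ the iteration is simple and transparent; solving the $p^2 \times p^2$ linear system of the vec form directly would give the same matrix.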
One solution for monitoring processes with two or more quality characteristics was proposed by Hotelling (1947) through the $T^2$ statistic. Hotelling's $T^2$ chart is a multivariate version of Shewhart's control chart (Shewhart, 1931) and is the most widely used control device for monitoring the process mean vector. The $T^2$ statistic can be calculated from a single observation of each quality characteristic or from the sample means of several simultaneously monitored quality characteristics. From the probability distribution of $T^2$, adequate control limits can be established for Hotelling's chart (Mason and Young, 2002; Bersimis et al., 2007).
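Assuming that the mean vector ($\boldsymbol{\mu}$) and the covariance matrix ($\boldsymbol{\Sigma}$) are known, the $T^2$ control chart uses the statistical distance $T^2$, equation (7), which has a chi-square distribution with $p$ degrees of freedom when the process is in statistical control (Alt, 1985):
$$T^2_t = n\,(\bar{\mathbf{X}}_t - \boldsymbol{\mu})'\,\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{X}}_t - \boldsymbol{\mu}) \tag{7}$$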
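where $n$ is the size of the $t$-th rational subgroup and $\bar{\mathbf{X}}_t$ is the vector of sample means of the $p$ variables for the $t$-th rational subgroup. When $n = 1$, the statistic reduces to:
$$T^2_t = (\mathbf{X}_t - \boldsymbol{\mu})'\,\boldsymbol{\Sigma}^{-1}(\mathbf{X}_t - \boldsymbol{\mu})$$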
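On the $T^2$ chart, while the statistic remains below the upper control limit (UCL), the process is considered to be in statistical control, that is,
$$T^2_t < \text{UCL} = \chi^2_{p,\alpha}$$
where $\chi^2_{p,\alpha}$ is the upper $\alpha$ quantile of the chi-square distribution with $p$ degrees of freedom and $\alpha$ is the false alarm probability.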
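For a bivariate process, equation (7) and the control limit can be sketched as follows. The example assumes known $\boldsymbol{\mu} = \mathbf{0}$ and $\boldsymbol{\Sigma}$ (unit variances, correlation 0.7) and uses hypothetical subgroup data; the function names are illustrative. The closed-form limit $-2\ln\alpha$ holds only for $p = 2$, where the chi-square survival function is $e^{-x/2}$.

```python
import math


def inverse_2x2(m):
    """Inverse of a 2x2 matrix stored as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def hotelling_t2(subgroup, mu, sigma):
    """T^2 = n (xbar - mu)' Sigma^{-1} (xbar - mu), equation (7), for a bivariate subgroup of size n."""
    n = len(subgroup)
    xbar = [sum(obs[i] for obs in subgroup) / n for i in range(2)]
    diff = [xbar[0] - mu[0], xbar[1] - mu[1]]
    inv = inverse_2x2(sigma)
    return n * sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))


ALPHA = 0.005
# For p = 2, P(chi2_2 > x) = exp(-x/2), so the upper-alpha quantile is -2 ln(alpha).
UCL = -2.0 * math.log(ALPHA)

SIGMA = [[1.0, 0.7], [0.7, 1.0]]                                 # assumed known
subgroup = [(0.5, 1.5), (1.5, 0.5), (1.0, 1.0), (1.0, 1.0)]      # hypothetical data, mean (1, 1)
t2 = hotelling_t2(subgroup, (0.0, 0.0), SIGMA)
print(round(t2, 4), "in control" if t2 < UCL else "signal")
```

Passing a single observation as the subgroup gives the $n = 1$ form of the statistic.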
When the mean vector ($\boldsymbol{\mu}$) and the covariance matrix ($\boldsymbol{\Sigma}$) are unknown and must be estimated, the control limits are calculated according to the monitoring phase (Bersimis et al., 2007).
If a special cause acts on the process mean, moving it to a new level, the mean vector can be represented by:
$$\boldsymbol{\mu}_1 = \boldsymbol{\mu}_0 + \boldsymbol{\delta}$$
where $\boldsymbol{\delta}$ indicates the magnitude of the shift in the mean; the $T^2$ statistic then follows a non-central chi-square distribution with $p$ degrees of freedom.
Some studies of multivariate process control schemes use the non-centrality parameter as a measure of the shift in the process mean vector (Alt, 1985; Aparisi, 1996; Aparisi and Haro, 2001; Mason and Young, 2002):
$$\lambda = n\,\boldsymbol{\delta}'\,\boldsymbol{\Sigma}^{-1}\boldsymbol{\delta}$$
The $T^2$ statistic then has a non-central chi-square distribution with $p$ degrees of freedom and non-centrality parameter $\lambda$. The average run length (ARL), that is, the mean number of samples until the chart signals an out-of-control condition, is a function of the non-centrality parameter.
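Under independence, each sample signals with probability $1 - F(\text{UCL}; \lambda)$, where $F$ is the non-central chi-square CDF, so the run length is geometric and ARL $= 1/[1 - F(\text{UCL}; \lambda)]$. The sketch below evaluates this for $p = 2$ using the Poisson-mixture representation of the non-central chi-square and only the standard library; the function names, the shifts and $\boldsymbol{\Sigma}$ are illustrative assumptions. With $\lambda = 0$ it should return an in-control ARL of 200 for $\alpha = 0.005$.

```python
import math


def ncx2_cdf_2df(x, lam, terms=500):
    """CDF of the non-central chi-square with 2 degrees of freedom and non-centrality lam.

    Uses the Poisson mixture sum_j Pois(j; lam/2) * P(chi2_{2+2j} <= x), where the central
    chi-square CDF with 2m (even) degrees of freedom is 1 - exp(-x/2) * sum_{i<m} (x/2)^i / i!.
    Intended for moderate lam (up to a few hundred with the default number of terms).
    """
    half_x, half_lam = x / 2.0, lam / 2.0
    weight = math.exp(-half_lam)          # Poisson probability of j = 0
    term, partial, total = 1.0, 0.0, 0.0
    for j in range(terms):
        partial += term                    # sum_{i=0}^{j} (x/2)^i / i!
        term *= half_x / (j + 1)
        total += weight * (1.0 - math.exp(-half_x) * partial)   # central chi2 with 2(j+1) df
        weight *= half_lam / (j + 1)
    return total


def noncentrality(delta, sigma, n=1):
    """lambda = n delta' Sigma^{-1} delta for a bivariate shift delta."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return n * sum(delta[i] * inv[i][j] * delta[j] for i in range(2) for j in range(2))


def arl_t2(lam, alpha=0.005):
    """ARL of the bivariate T^2 chart with independent observations: 1 / P(T^2 > UCL)."""
    ucl = -2.0 * math.log(alpha)
    return 1.0 / (1.0 - ncx2_cdf_2df(ucl, lam))


SIGMA = [[1.0, 0.7], [0.7, 1.0]]                                   # assumed known
for delta in [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 0.0)]:     # hypothetical shifts
    lam = noncentrality(delta, SIGMA)
    print(delta, round(lam, 3), round(arl_t2(lam), 1))
```

Because $\lambda$ depends on $\boldsymbol{\Sigma}^{-1}$, a shift along the direction of positive correlation, such as (1, 1), yields a smaller $\lambda$ than a shift in one variable alone, such as (1, 0).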
When autocorrelation is present, the $T^2$ statistic no longer has a chi-square distribution with $p$ degrees of freedom, even when the mean vector and the covariance matrix are known, so the usual control limit no longer applies. Similarly, when the mean vector shifts, the $T^2$ statistic no longer has a non-central chi-square distribution.
Hotelling's $T^2$ chart is one of the best-known charts in manufacturing, and the technique has been applied in numerous articles, as can be seen in the multidisciplinary reference database Web of Science, which is integrated into the ISI Web of Knowledge. A search conducted in December 2013 for the keywords Hotelling and chart in the titles of the indexed periodicals returned 28 articles, cited 162 times in other works, evidencing the importance of this technique in the scientific and academic environment. Figure 1 shows the distribution of these articles per year.
Figure 1. Distribution of articles found in the ISI Web of Knowledge database
Source: The authors.
Hotelling's $T^2$ chart was designed for use when the assumption of independence between the observations of one or more quality characteristics holds. Ignoring a violation of this assumption is quite detrimental to the performance of the control chart, so independence must be checked when monitoring a process.
In this paper we consider the distance from the vector $\mathbf{X}$ to the mean vector, called the statistical distance or Mahalanobis distance (Mahalanobis, 1936). This is the same distance used in Hotelling's $T^2$ control chart.
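The relationship between the cross-covariance matrix $\boldsymbol{\Gamma}(0)$ and the elements of the matrices $\boldsymbol{\Phi}$ and $\boldsymbol{\Sigma}$ is obtained from equation (5). Considering the autocorrelation and correlation structure of the VAR(1) model, the Mahalanobis distance is:
$$D^2 = (\mathbf{X} - \boldsymbol{\mu})'\,\boldsymbol{\Gamma}(0)^{-1}(\mathbf{X} - \boldsymbol{\mu})$$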
Without loss of generality, considering the bivariate case in which and , when and the vector the distance is equivalent to:
Equation (16) reveals the influence of the autoregressive parameters on the Mahalanobis distance.
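If $\boldsymbol{\Phi} = \mathbf{0}$, that is, if there is no autocorrelation, the Mahalanobis distance reduces to:
$$D^2 = (\mathbf{X} - \boldsymbol{\mu})'\,\boldsymbol{\Sigma}^{-1}(\mathbf{X} - \boldsymbol{\mu})$$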
When there is no autocorrelation, that is, when the data are independent, the distance has a chi-square distribution with $p$ degrees of freedom, $\chi^2_p$. To evaluate the effect of autocorrelation, the bivariate case $p = 2$ with $\alpha = 0.005$ is considered, for which $\chi^2_{2,0.005} = 10.5966$.
The performance of a control chart can be evaluated by the number of samples the chart needs to detect a shift in the monitored characteristic. When there is no shift, the process is in statistical control, and any signal given by the chart is a false alarm. The value $\chi^2_{2,0.005} = 10.5966$ corresponds, on average, to one false alarm every 200 samples when the Hotelling $T^2$ chart is used (Costa et al., 2005).
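The value 10.5966 and the 200-sample figure can be checked with a short calculation. For an even number of degrees of freedom the chi-square survival function has a closed form, so the sketch below finds the upper-$\alpha$ quantile by bisection without external libraries. The function names and the search bracket are assumptions for illustration, and the in-control ARL equals $1/\alpha$ only when the observations are independent.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi2_df > x) for an even number of degrees of freedom:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total


def chi2_upper_quantile(alpha, df, lo=0.0, hi=1000.0, tol=1e-10):
    """UCL with P(chi2_df > UCL) = alpha, found by bisection (the survival function decreases in x)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha = 0.005
ucl = chi2_upper_quantile(alpha, df=2)
print(round(ucl, 4), "in-control ARL:", round(1 / chi2_sf_even_df(ucl, 2)))
```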
Based on the VAR(1) model, the in-control process mean vector ($\boldsymbol{\mu}_0$) can shift by $\boldsymbol{\delta}$ to a new level $\boldsymbol{\mu}_0 + \boldsymbol{\delta}$, where $\boldsymbol{\delta}$ is a $p \times 1$ vector whose elements represent the magnitude of the shift in the mean of each of the $p$ variables. To show what happens to the process mean after a shift, the VAR(1) model is written here as a function of the error vector ($\boldsymbol{\varepsilon}_t$) and the mean vector ($\boldsymbol{\mu}$).
If the shift in the mean vector of the in-control process occurs at some time $t = T$, the mean of $\mathbf{X}_t$ changes from $\boldsymbol{\mu}_0$ to:
Without loss of generality, considering , the change in the means vector can be represented in three stages:
The chart with the best performance is the one that most quickly detects, from time $t = T$ onward, a change in the mean of the monitored quality characteristics.
In the graphical evaluation of the effect of autocorrelation, the shift is assumed to follow equation (19). For example, in a bivariate process, a special cause displaces the mean vector from $\boldsymbol{\mu}_0$ to a new level $\boldsymbol{\mu}_1$. Sections 4.1 and 4.2 present the graphical variation for the in-control process and for the out-of-control process, respectively.
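In an autocorrelation-free process ($\boldsymbol{\Phi} = \mathbf{0}$) with correlation 0.7 between X and Y, we have $\boldsymbol{\Gamma}(0) = \boldsymbol{\Sigma}$. Figure 2 shows the ellipse representing the level curve of the distribution for a Mahalanobis distance of 10.5966.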
Figure 2. Ellipse: and = 0.7
Source: The authors.
Figure 3. Ellipse: and = 0.7
Source: The authors.
Generalizing to other levels of autocorrelation, Figure 4 shows graphically that the greater the autocorrelation, the larger the elliptical region; that is, autocorrelation increases the variability of the monitored process variables.
Figure 4. Ellipses: and = 0.7
Source: The authors.
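The growth of the elliptical region can be quantified under a simplifying assumption that this article does not necessarily make: $\boldsymbol{\Phi} = \phi\,\mathbf{I}$, for which the Yule-Walker relation gives $\boldsymbol{\Gamma}(0) = \boldsymbol{\Sigma}/(1-\phi^2)$. The region $\{\mathbf{x} : \mathbf{x}'\boldsymbol{\Gamma}(0)^{-1}\mathbf{x} \le c\}$ has area $\pi c \sqrt{\det \boldsymbol{\Gamma}(0)}$, so under this assumption the area grows by the factor $1/(1-\phi^2)$. The function names and the values of $\phi$ and $\boldsymbol{\Sigma}$ are hypothetical.

```python
import math


def gamma0_scalar_phi(phi, sigma):
    """Gamma(0) = Sigma / (1 - phi^2): the Yule-Walker solution when Phi = phi * I (assumed), |phi| < 1."""
    if not -1.0 < phi < 1.0:
        raise ValueError("|phi| must be < 1 for a stationary process")
    scale = 1.0 / (1.0 - phi * phi)
    return [[scale * s for s in row] for row in sigma]


def ellipse_area(cov, c):
    """Area of the region {x : x' cov^{-1} x <= c} for a 2x2 covariance matrix: pi * c * sqrt(det cov)."""
    (a, b), (_, d) = cov
    return math.pi * c * math.sqrt(a * d - b * b)


SIGMA = [[1.0, 0.7], [0.7, 1.0]]   # assumed: unit variances, correlation 0.7
C = 10.5966                        # level of the ellipses in the figures
base = ellipse_area(SIGMA, C)
for phi in (0.0, 0.3, 0.5, 0.7, 0.9):
    area = ellipse_area(gamma0_scalar_phi(phi, SIGMA), C)
    print(phi, round(area, 2), round(area / base, 3))   # ratio should equal 1 / (1 - phi^2)
```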
If the data are normally distributed, each ellipse in Figure 4 contains all points at the same Mahalanobis distance from the origin. All of these points therefore have the same probability density under a multivariate normal distribution centered at (0, 0), since $\boldsymbol{\mu} = \mathbf{0}$. On the Hotelling $T^2$ chart, the control limit UCL $= \chi^2_{2,0.005} = 10.5966$ generates, on average, one false alarm every 200 samples when $\boldsymbol{\Phi} = \mathbf{0}$. The same does not hold when $\boldsymbol{\Phi} \neq \mathbf{0}$: the average false alarm rate no longer corresponds to one alarm every 200 samples, even if the UCL used is 10.5966. In practice, using the Hotelling $T^2$ chart with the UCL taken from the chi-square distribution with $p$ degrees of freedom in the presence of autocorrelation yields a false alarm rate different from the desired one.
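A minimal sketch of how the false alarm rate departs from 1/200, assuming (as an illustration, not as the setup used in Figures 4 to 6) that $\boldsymbol{\Phi} = \phi\,\mathbf{I}$ and that the chart ignores the autocorrelation, computing $T^2$ with the innovation covariance $\boldsymbol{\Sigma}$ and the limit 10.5966. Since $\mathbf{X}_t \sim N_2(\mathbf{0}, \boldsymbol{\Sigma}/(1-\phi^2))$ in that case, $(1-\phi^2)\,T^2$ is chi-square with 2 degrees of freedom and the per-sample false alarm probability becomes $\alpha^{1-\phi^2}$; a seeded simulation is included as a cross-check. Names and parameter values are hypothetical.

```python
import math
import random

ALPHA = 0.005
UCL = -2.0 * math.log(ALPHA)   # chi-square upper 0.005 quantile for p = 2 (about 10.5966)


def analytic_false_alarm_rate(phi, alpha=ALPHA):
    """Per-sample P(T^2 > UCL) when T^2 uses Sigma but X_t ~ N_2(0, Sigma / (1 - phi^2)).

    (1 - phi^2) * T^2 is chi-square with 2 df, whose survival function is exp(-x/2),
    so the probability is exp(-UCL * (1 - phi^2) / 2) = alpha ** (1 - phi^2).
    """
    return alpha ** (1.0 - phi * phi)


def simulated_false_alarm_rate(phi, rho, n, seed=0, burn_in=500):
    """Fraction of in-control samples with T^2 > UCL for a VAR(1) with Phi = phi * I (assumed).

    Innovations have unit variances and correlation rho; T^2 is computed with Sigma,
    that is, the chart ignores the autocorrelation.
    """
    rng = random.Random(seed)
    l22 = math.sqrt(1.0 - rho * rho)          # Cholesky factor of [[1, rho], [rho, 1]]
    x = y = 0.0
    alarms = 0
    for t in range(burn_in + n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = phi * x + z1
        y = phi * y + rho * z1 + l22 * z2
        if t >= burn_in:
            t2 = (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho * rho)   # (x, y) Sigma^{-1} (x, y)'
            alarms += t2 > UCL
    return alarms / n


for phi in (0.0, 0.5, 0.8):
    rate = analytic_false_alarm_rate(phi)
    print(f"phi={phi}: P(false alarm)={rate:.4f}, about one false alarm every {1 / rate:.0f} samples")
print("simulated, phi=0.5:", simulated_false_alarm_rate(0.5, 0.7, n=100_000, seed=1))
```

Under these assumptions the rate of one false alarm per 200 samples holds only for $\phi = 0$; any autocorrelation makes false alarms more frequent.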
Figure 5 shows an autocorrelation-free process ($\boldsymbol{\Phi} = \mathbf{0}$) with correlation 0.7. The dashed ellipse centered at (0, 0) represents the in-control process; its equation sets the Mahalanobis distance equal to 10.5966. The other ellipses represent the occurrence of a special cause that displaces the mean vector to a new level:
Figure 6 shows a process with autocorrelation and correlation 0.7. The dashed ellipse centered at (0, 0) represents the in-control process; its equation sets the statistical distance equal to 10.06. The value 10.06 was chosen to make a fair comparison: in the presence of autocorrelation, it keeps the average false alarm rate at one alarm per 200 samples. The other ellipses represent the occurrence of a special cause that moves the mean vector to a new level:
Figure 5. Ellipses: and = 0.7
Source: The authors.
Figure 6. Ellipses: and = 0.7
Source: The authors.
Figure 5 shows that, in processes without autocorrelation, a shift in the mean vector caused by a special cause moves the ellipses away from the center at (0, 0), which indicates that, in this case, the chart performs better than in the process with autocorrelation. In Figure 6, the ellipses tend to remain closer to the center at (0, 0) when shifts in the mean vector occur, meaning that the chart performs worse when autocorrelation is present.
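The comparison between Figures 5 and 6 can be expressed through the non-centrality parameter under explicitly hypothetical assumptions: $\boldsymbol{\Phi} = \phi\,\mathbf{I}$, a shift that changes the mean immediately ($\mathbf{X}_t = \boldsymbol{\mu}_1 + \mathbf{Y}_t$ with $\mathbf{Y}_t$ a zero-mean VAR(1) process), and a chart built on $\boldsymbol{\Gamma}(0)$ with UCL $= 10.5966$, which keeps the per-sample false alarm probability at 0.005. A shift $\boldsymbol{\delta}$ then yields $\lambda = (1-\phi^2)\,\boldsymbol{\delta}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\delta}$, smaller than under independence, so the per-sample detection probability falls as $\phi$ grows. Because successive $T^2$ values are dependent, $1/P$ only approximates the ARL here. This sketch is not the authors' procedure, equation (19) may model the transition after the shift differently, and all names and values are illustrative.

```python
import math

ALPHA = 0.005
UCL = -2.0 * math.log(ALPHA)            # chi-square upper 0.005 quantile for p = 2


def ncx2_sf_2df(x, lam, terms=500):
    """P(chi2_2(lam) > x) via the Poisson mixture of central chi-squares with even df."""
    half_x, half_lam = x / 2.0, lam / 2.0
    weight, term, partial, sf = math.exp(-half_lam), 1.0, 0.0, 0.0
    for j in range(terms):
        partial += term                                # sum_{i<=j} (x/2)^i / i!
        term *= half_x / (j + 1)
        sf += weight * math.exp(-half_x) * partial      # Poisson weight * P(chi2_{2j+2} > x)
        weight *= half_lam / (j + 1)
    return sf


def quad_form_inv(delta, cov):
    """delta' cov^{-1} delta for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    return (d * delta[0] ** 2 - (b + c) * delta[0] * delta[1] + a * delta[1] ** 2) / det


def signal_probability(delta, phi, sigma):
    """Per-sample P(T^2 > UCL) after an immediate mean shift delta, T^2 built on Gamma(0) = sigma / (1 - phi^2)."""
    scale = 1.0 / (1.0 - phi * phi)
    gamma0 = [[scale * s for s in row] for row in sigma]
    lam = quad_form_inv(delta, gamma0)      # equals (1 - phi^2) * delta' sigma^{-1} delta
    return ncx2_sf_2df(UCL, lam)


SIGMA = [[1.0, 0.7], [0.7, 1.0]]            # assumed innovation covariance
delta = (2.0, 0.0)                          # hypothetical shift, in innovation standard deviations
for phi in (0.0, 0.5, 0.8):
    p = signal_probability(delta, phi, SIGMA)
    print(phi, round(p, 3), "approximate samples to signal:", round(1 / p, 1))
```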
This article evaluated the effect of autocorrelation on the $T^2$ control chart, one of the most popular tools in academia and industry. The Mahalanobis distance, the same statistic used in the $T^2$ chart, was used to represent geometrically the behavior of a process in the presence and absence of special causes that affect the mean of the monitored quality characteristics.
Violation of the independence assumption should be taken seriously and checked before statistical control charts are used, since autocorrelation affects the performance of traditional control charts, reducing their ability to detect shifts in the mean vector.
The ellipses illustrated how process data behave in the presence of autocorrelation, which masks the effect of the shift that occurs when quality characteristics in statistical control move out of statistical control.
Future work could present statistics or techniques that improve the performance of control charts in the presence of autocorrelation.
Alt, F. B. (1985), “Multivariate control charts”, in Kotz, S. and Johnson, N. L. (Eds.), Encyclopedia of Statistical Sciences, Wiley.
Aparisi, F. (1996), “Hotelling’s T² control chart with adaptive sample sizes”, International Journal of Production Research, Vol. 34, pp. 2853-2862.
Aparisi, F. and Haro, C.L. (2001), “Hotelling’s T² control chart with variable sampling intervals”, International Journal of Production Research, Vol. 39, pp. 3127-3140.
Apley, D.W. and Tsung, F. (2002), “The autoregressive T² chart for monitoring univariate autocorrelated processes”, Journal of Quality Technology, Vol. 34, pp. 80-96.
Arkat, J., Niaki, S.T.A. and Abbasi, B. (2007), “Artificial neural networks in applying MCUSUM residuals charts for AR(1) processes”, Applied Mathematics and Computation, Vol. 189, pp. 1889-1901.
Bersimis, S., Psarakis, S. and Panaretos, J. (2007), “Multivariate statistical process control charts: an overview”, Quality and Reliability Engineering International, Vol. 23, pp. 517-543.
Biller, B. and Nelson, B. (2003), “Modeling and generating multivariate time-series input processes using a vector autoregressive technique”, ACM Transactions on Modeling and Computer Simulation, Vol. 13, No. 3, pp. 211-237.
Chen, S. and Nembhard, H.B. (2011), “Multivariate cuscore control charts for monitoring the mean vector in autocorrelated process”, IIE Transactions, Vol. 43, pp. 291-307.
Costa, A. F. B., Epprecht, E.K. and Carpinetti, L.C.R. (2005), Controle Estatístico de Qualidade, 2nd ed., São Paulo: Editora Atlas.
Hotelling, H. (1947), “Multivariate quality control, illustrated by the air testing of sample bombsights”, in Techniques of Statistical Analysis, pp. 111-184, New York: McGraw-Hill.
Hwarng, H.B. and Wang, Y. (2010), “Shift detection and source identification in multivariate autocorrelated process”, International Journal of Production Research, Vol. 48, No. 3, pp. 835-859.
Issam, B.K. and Mohamad, L. (2008), “Support vector regression based residual MCUSUM control chart for autocorrelated process”, Applied Mathematics and Computation, Vol. 201, pp. 565-574.
Jarrett, J.E. and Pan, X. (2007), “The quality control chart for monitoring multivariate autocorrelated processes”, Computational Statistics & Data Analysis, Vol. 51, pp. 3862-3870.
Jiang, W. (2004), “Multivariate control charts for monitoring autocorrelated processes”, Journal of Quality Technology, Vol. 36, pp. 367-379.
Kalgonda, A.A. (2012), “A note on generalization of Z graph”, Journal of Academia and Industrial Research, Vol. 1, No. 6, pp. 286-289.
Kalgonda, A.A. and Kulkarni, S.R. (2004), “Multivariate quality control chart for autocorrelated processes”, Journal of Applied Statistics, Vol. 31, pp. 317-327.
Kim, S.B., Jitpitaklert, W. and Sukchotrat, T. (2010), “One-class classification-based control charts for monitoring autocorrelated multivariate processes”, Communications in Statistics - Simulation and Computation, Vol. 39, No. 3, pp. 461-474.
Lütkepohl, H. (2007), New Introduction to Multiple Time Series Analysis, New York: Springer.
Mahalanobis, P.C. (1936), “On the generalised distance in statistics”, Proceedings of the National Institute of Sciences of India, Vol. 2, No. 1, pp. 49-55.
Mason, R. and Young, J.C. (2002), Multivariate Statistical Process Control with Industrial Applications, Alexandria: Society for Industrial and Applied Mathematics.
Mastrangelo, C.M. and Forrest, D.R. (2002), “Multivariate autocorrelated processes: data and shift generation”, Journal of Quality Technology, Vol. 34, No. 2, pp. 216-220.
Montgomery, D.C. (2004), Introduction to Statistical Quality Control, New York: John Wiley & Sons.
Niaki, S.T.A. and Davoodi, M. (2009), “Designing a multivariate-multistage quality control system using artificial neural networks”, International Journal of Production Research, Vol. 47, pp. 251-271.
Pan, X. and Jarrett, J.E. (2007), “Using vector autoregressive residuals to monitor multivariate processes in the presence of serial correlation”, International Journal of Production Economics, Vol. 106, pp. 204-216.
Pfaff, B. (2008), “VAR, SVAR and SVEC models: implementation within R package vars”, Journal of Statistical Software, Vol. 27, No. 4, pp. 204-216.
Shewhart, W.A. (1931), Economic Control of Quality of Manufactured Product, 1st ed., New York: D. Van Nostrand Company.
Shumway, R. H. and Stoffer, D. S. (2006), Time Series Analysis and Its Applications: With R Examples, 2nd ed., New York: Springer.
Vargas, M., Alfaro, J.L. and Mondéjar, J. (2009), “On the run length of a state-space control chart for multivariate autocorrelated process data”, Communications in Statistics - Simulation and Computation, Vol. 38, pp. 1823-1833.