Testing for Outliers

Authors: Francesca Torti, Marco Riani, Anthony C. Atkinson, Domenico Perrotta, Aldo Corbellini
Affiliations: European Commission, Joint Research Centre (JRC); University of Parma, Italy; London School of Economics, UK; European Commission, Joint Research Centre (JRC); University of Parma, Italy
Pages: 9
4. Testing for Outliers
The test statistic (4) is the $(m+1)$st ordered value of the absolute deletion residuals. We can therefore use
distributional results to obtain envelopes for our plots. The argument parallels that of Riani et al. (2009)
where envelopes were required for the Mahalanobis distances arising in applying the FS to multivariate data.
Let $Y_{[m+1]}$ be the $(m+1)$st order statistic from a sample of size $n$ from a univariate distribution with
c.d.f. $G(y)$. Then the c.d.f. of $Y_{[m+1]}$ is given exactly by
$$P\{Y_{[m+1]} \le y\} = \sum_{j=m+1}^{n} \binom{n}{j} \{G(y)\}^{j} \{1-G(y)\}^{n-j}. \tag{5}$$
See, for example, Lehmann (1991, p. 353). We then apply properties of the beta distribution to the RHS
of (5) to obtain
$$P\{Y_{[m+1]} \le y\} = I_{G(y)}(m+1,\, n-m), \tag{6}$$
where $I_p(A, B)$ is the incomplete beta integral. From the relationship between the $F$ and the beta distribution,
equation (6) becomes
$$P\{Y_{[m+1]} \le y\} = P\left\{F_{2(n-m),\,2(m+1)} > \frac{1-G(y)}{G(y)}\,\frac{m+1}{n-m}\right\}, \tag{7}$$
where $F_{2(n-m),\,2(m+1)}$ is the $F$ distribution with $2(n-m)$ and $2(m+1)$ degrees of freedom (Guenther, 1977).
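The equivalence of Eqs. (5), (6) and (7) can be checked numerically. The following is a minimal sketch, assuming SciPy is available; the function names and the values $n = 20$, $m = 14$, $G(y) = 0.8$ are illustrative choices, not taken from the paper.

```python
from math import comb

from scipy.special import betainc
from scipy.stats import f


def cdf_binomial_sum(g, m, n):
    """Eq. (5): P{Y[m+1] <= y} as a binomial tail sum, with g = G(y)."""
    return sum(comb(n, j) * g**j * (1 - g) ** (n - j) for j in range(m + 1, n + 1))


def cdf_incomplete_beta(g, m, n):
    """Eq. (6): regularized incomplete beta integral I_g(m + 1, n - m)."""
    return betainc(m + 1, n - m, g)


def cdf_f_tail(g, m, n):
    """Eq. (7): upper tail of F on 2(n - m) and 2(m + 1) degrees of freedom."""
    cutoff = (1 - g) / g * (m + 1) / (n - m)
    return f.sf(cutoff, 2 * (n - m), 2 * (m + 1))


# Illustrative values only (hypothetical, not from the paper); requires 0 <= m < n.
n, m, g = 20, 14, 0.8
values = (cdf_binomial_sum(g, m, n), cdf_incomplete_beta(g, m, n), cdf_f_tail(g, m, n))
print(values)  # the three entries should agree up to rounding error
```

The binomial sum becomes slow and less accurate for large $n$, which is one practical reason for working with the beta or $F$ form.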
Thus, the required quantile of order $\gamma$ of the distribution of $Y_{[m+1]}$, say $y_{m+1,n;\gamma}$, is obtained as
$$y_{m+1,n;\gamma} = G^{-1}(q) = G^{-1}\left(\frac{m+1}{m+1+(n-m)\,x_{2(n-m),\,2(m+1);\,1-\gamma}}\right), \tag{8}$$
where $x_{2(n-m),\,2(m+1);\,1-\gamma}$ is the quantile of order $1-\gamma$ of the $F$ distribution with $2(n-m)$ and $2(m+1)$
degrees of freedom.
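As a rough illustration of Eq. (8), the probability level $q$ can be computed from a single $F$ quantile. The sketch below assumes SciPy; the function name `prob_level_q` and the example values are hypothetical. Since Eq. (6) implies that $q$ is also the quantile of order $\gamma$ of a beta distribution with parameters $m+1$ and $n-m$, that route is used as a cross-check.

```python
from scipy.stats import beta, f


def prob_level_q(m, n, gamma):
    """q in Eq. (8): the value of G(y) at the gamma-quantile of Y[m+1]."""
    # Quantile of order 1 - gamma of F on 2(n - m) and 2(m + 1) degrees of freedom.
    x = f.ppf(1 - gamma, 2 * (n - m), 2 * (m + 1))
    return (m + 1) / (m + 1 + (n - m) * x)


# Illustrative values only (hypothetical, not from the paper); requires 0 <= m < n.
n, m, gamma = 100, 50, 0.99
q_from_f = prob_level_q(m, n, gamma)
q_from_beta = beta.ppf(gamma, m + 1, n - m)  # equivalent route via Eq. (6)
print(q_from_f, q_from_beta)
```

For a uniform distribution on $(0, 1)$, $G^{-1}(q) = q$, so `q_from_f` is itself the required quantile in that special case.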
In our case we are considering the absolute values of the deletion residuals. If the c.d.f. of the $t$ distribution
on $\nu$ degrees of freedom is written as $T_\nu(y)$, the absolute value has the c.d.f.
$$G(y) = 2T_\nu(y) - 1, \quad 0 \le y < \infty. \tag{9}$$
The required quantile of $Y_{[m+1]}$ is given by
$$y_{m+1,n;\gamma} = T^{-1}_{m-p}\{0.5(1+q)\},$$
where $q$ is defined in Eq. (8). To obtain the required quantile we call an inverse of the $F$ and then an inverse
of the $t$ distribution.
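The two inversions just described can be sketched as follows. This is a minimal illustration assuming SciPy, with $m - p$ degrees of freedom for the $t$ distribution; the function name `envelope_quantile` and the example values are hypothetical and do not come from the paper.

```python
from scipy.stats import f, t


def envelope_quantile(m, n, p, gamma):
    """Unscaled gamma-quantile of the (m+1)st ordered absolute deletion residual."""
    # Step 1: inverse of the F distribution gives q = G(y), as in Eq. (8).
    x = f.ppf(1 - gamma, 2 * (n - m), 2 * (m + 1))
    q = (m + 1) / (m + 1 + (n - m) * x)
    # Step 2: inverse of the t distribution on m - p degrees of freedom,
    # using G(y) = 2 T(y) - 1 from Eq. (9), i.e. T(y) = 0.5 (1 + q).
    return t.ppf(0.5 * (1 + q), m - p)


# Illustrative values only (hypothetical); requires p < m < n.
n, m, p = 100, 50, 3
y99 = envelope_quantile(m, n, p, 0.99)
y50 = envelope_quantile(m, n, p, 0.50)
print(y50, y99)
```

Substituting the result back into Eqs. (9) and (7) should recover $\gamma$, which gives a simple consistency check on the two steps.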
If we had an unbiased estimator of $\sigma^2$, the envelopes would be given by $y_{m+1,n;\gamma}$ for $m = m_0, \ldots, n-1$.
However, the estimator $s^2(m)$ is based on the central $m$ observations from a normal sample, strictly the
$m$ observations with smallest squared residuals based on the parameter estimates from $S(m-1)$. The
variance of the truncated normal distribution containing the central $m/n$ portion of the full distribution is
$$\sigma_T^2(m) = 1 - \frac{2n}{m}\,\Phi^{-1}\!\left(\frac{n+m}{2n}\right) \phi\!\left\{\Phi^{-1}\!\left(\frac{n+m}{2n}\right)\right\}, \tag{10}$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ are respectively the standard normal density and c.d.f. See, for example, Johnson et al.
(1994, pp. 156-162) and Riani et al. (2009) for a derivation from the general method of Tallis (1963). Since
the outlier tests we are monitoring are divided by an estimate of $\sigma^2$ that is too small, we need to scale up
the values of the order statistics to obtain the envelopes
$$y^{*}_{m+1,n;\gamma} = y_{m+1,n;\gamma} / \sigma_T(m).$$
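The truncated-normal variance in Eq. (10) and the resulting scaling can be sketched with the Python standard library alone. The sketch assumes that "scaling up" means dividing the unscaled quantile by $\sigma_T(m)$, which is less than one; the function names and the example values are hypothetical.

```python
from statistics import NormalDist


def truncated_normal_variance(m, n):
    """Eq. (10): variance of a standard normal truncated to its central m/n portion.

    Requires 0 < m < n, so that (n + m) / (2n) lies strictly inside (0, 1).
    """
    std_normal = NormalDist()
    a = std_normal.inv_cdf((n + m) / (2 * n))  # upper truncation point
    return 1 - (2 * n / m) * a * std_normal.pdf(a)


def scale_envelope(y, m, n):
    """Scale an unscaled quantile y up by 1 / sigma_T(m) (assumed form of the scaling)."""
    return y / truncated_normal_variance(m, n) ** 0.5


# Illustrative values only (hypothetical): central half of the sample.
var_half = truncated_normal_variance(50, 100)
print(var_half)
print(scale_envelope(2.0, 50, 100))
```

The variance approaches one as $m$ approaches $n$, so the correction matters most in the early steps of the search and fades towards the end.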
To be specific, in the case of the 99% envelope, $\gamma = 0.99$ corresponds to a nominal pointwise size
$\alpha = 1 - \gamma$ of 1%. For the particular step $m$ considered, we expect to find
exceedances of the quantile in 1% of the samples under the null normal distribution. However, we
require a samplewise probability of 1% for the false detection of outliers, that is, over all values of $m$ considered
in the search. The algorithm in the next section is accordingly designed to have a size of 1%.