Algebra for the Forward Search
Authors: Francesca Torti (European Commission, Joint Research Centre (JRC)), Marco Riani (University of Parma, Italy), Anthony C. Atkinson (London School of Economics, UK), Domenico Perrotta (European Commission, Joint Research Centre (JRC)), Aldo Corbellini (University of Parma, Italy)
Pages: 8
which is specified in advance. The LTS estimate is intended to minimize the sum of squares of the
residuals of h observations. For LS, h = n. In the generalization of Least Median of Squares (LMS,
Rousseeuw, 1984) that we monitor, the estimate minimizes the median of h squared residuals.
2. Adaptive Hard Trimming. In the Forward Search (FS), the observations are again hard trimmed, but
the value of h is determined by the data, being found adaptively by the search. Data analysis starts
from a very robust fit to a few, carefully selected, observations found by LMS or LTS with the minimum
value of h. The number of observations used in fitting then increases until all are included. (See
Atkinson and Riani, 2000 and Riani et al., 2014c for regression, Atkinson et al., 2010 for a general
survey of the FS, with discussion, and Cerioli et al., 2014 for results on consistency.)
3. Soft trimming (downweighting). M estimation and derived methods. The intention is that observations
near the centre of the distribution retain their value, but the $\rho$ function ensures that increasingly remote
observations have a weight that decreases with distance from the centre.
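As an illustration of soft trimming (a sketch, not the paper's SAS or FSDA code), the Tukey biweight weight function shows how observations near the centre keep weight close to one while remote observations are smoothly downweighted to zero; the tuning constant c = 4.685 is the conventional choice for 95% efficiency, assumed here for concreteness:

```python
import numpy as np

def tukey_biweight_weight(u, c=4.685):
    """Weight w(u) = (1 - (u/c)^2)^2 for |u| <= c, and 0 otherwise.

    Observations with small scaled residual |u| retain near-full weight;
    increasingly remote observations are downweighted, reaching 0 at |u| = c.
    """
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, (1.0 - (u / c) ** 2) ** 2, 0.0)

# Scaled residuals: the central one keeps full weight, remote ones are trimmed.
w = tukey_biweight_weight([0.0, 1.0, 4.685, 6.0])
```

Here `w[0]` is exactly 1, `w[1]` is slightly below 1, and the last two weights are 0, i.e. observations beyond c are fully trimmed.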
We shall consider all three classes of estimator. The FS by its nature provides a series of decreasingly robust
fits, which we monitor for outliers in order to determine how to increment the subset of observations used in
fitting. For LTS and LMS we fit the regression model to subsets of increasing size h. For S estimation, which
we use as our example of soft trimming, we look at fits as the breakdown point varies. Here our focus is on
SAS programs.
3. Algebra for the Forward Search
Examples and a discussion of monitoring using the MATLAB version of FSDA are in Riani et al. (2014a).
Describing the SAS procedures that are the subject of this paper requires fuller details than are given there.
It is convenient to rewrite the regression model Eq. (1) in matrix form as $y = X\beta + \epsilon$, where $y$ is the
$n \times 1$ vector of responses, $X$ is an $n \times p$ full-rank matrix of known constants (with $i$th row $x_i^\top$), and $\beta$ is a
vector of $p$ unknown parameters.
The least squares estimator of $\beta$ is $\hat\beta$. Then the vector of $n$ least squares residuals is $e = y - \hat y = y - X\hat\beta =
(I - H)y$, where $H = X(X^\top X)^{-1}X^\top$ is the 'hat' matrix, with diagonal elements $h_i$ and off-diagonal elements
$h_{ij}$. The residual mean square estimator of $\sigma^2$ is $s^2 = e^\top e/(n - p) = \sum_{i=1}^{n} e_i^2/(n - p)$.
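The least squares quantities just defined can be sketched numerically (a minimal illustration on simulated data, not the paper's SAS implementation; the sample sizes and coefficients below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
# Design matrix X (n x p, full rank) with an intercept column.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -1.0])            # illustrative parameters
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares estimate
H = X @ np.linalg.solve(X.T @ X, X.T)             # 'hat' matrix H = X(X'X)^{-1}X'
e = y - X @ beta_hat                              # residuals e = y - X beta_hat
s2 = e @ e / (n - p)                              # residual mean square s^2
```

Two checks follow directly from the algebra above: `e` equals `(I - H) y`, and the leverages $h_i$ (the diagonal of $H$) sum to $p$, since $\mathrm{tr}(H) = p$ for a full-rank design.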
The forward search fits subsets of observations of size $m$ to the data, with $m_0 \le m \le n$. Let $S^*(m)$ be
the subset of size $m$ found by the forward search, for which the matrix of regressors is $X(m)$. Least squares
on this subset of observations yields parameter estimates $\hat\beta(m)$ and $s^2(m)$, the mean square estimate of $\sigma^2$
on $m - p$ degrees of freedom. Residuals can be calculated for all observations including those not in $S^*(m)$.
The $n$ resulting least squares residuals are
$$e_i(m) = y_i - x_i^\top \hat\beta(m). \qquad (2)$$
The search moves forward with the augmented subset $S^*(m+1)$ consisting of the observations with the
$m+1$ smallest absolute values of $e_i(m)$. In the batch algorithm of §8 we explore the properties of a faster
algorithm in which we move forward by including $k > 1$ observations.
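The forward step just described can be sketched as follows (a simplified illustration, not the paper's SAS procedure: for brevity the toy start is the first p observations rather than the LMS subset the paper prescribes):

```python
import numpy as np

def forward_step(X, y, subset):
    """One forward-search step: fit S*(m), compute e_i(m) for all n
    observations, and return S*(m+1), the m+1 smallest |e_i(m)|."""
    beta_m, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
    e = y - X @ beta_m                           # residuals for ALL observations
    m_plus_1 = len(subset) + 1
    return np.sort(np.argsort(np.abs(e))[:m_plus_1])

rng = np.random.default_rng(1)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(scale=0.3, size=n)

subset = np.arange(p)       # toy start with m0 = p observations
while len(subset) < n:      # grow the subset until all n are included
    subset = forward_step(X, y, subset)
```

Note that successive subsets need not be nested: an observation in $S^*(m)$ can leave at a later step if its residual becomes relatively large, which is why the residuals are recomputed for all $n$ observations at every step.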
To start we take $m_0 = p$ and search over subsets of $p$ observations to find the subset that yields
the LMS estimate of $\beta$. However, this initial estimator is not important, provided masking is broken. Our
computational experience for regression is that randomly selected starting subsets also yield indistinguishable
results over the last one third of the search, unless there is a large number of structured outliers.
To test for outliers the deletion residual is calculated for the $n - m$ observations not in $S^*(m)$. These
residuals, which form the maximum likelihood tests for the outlyingness of individual observations, are
$$r_i(m) = \frac{y_i - x_i^\top \hat\beta(m)}{\sqrt{s^2(m)\{1 + h_i(m)\}}} = \frac{e_i(m)}{\sqrt{s^2(m)\{1 + h_i(m)\}}}, \qquad (3)$$
where the leverage $h_i(m) = x_i^\top \{X(m)^\top X(m)\}^{-1} x_i$. Let the observation nearest to those forming $S^*(m)$ be $i_{\min}$, where
$$i_{\min} = \arg\min_{i \notin S^*(m)} |r_i(m)|.$$
To test whether observation $i_{\min}$ is an outlier we use the absolute value of the minimum deletion residual
$$r_{\min}(m) = \frac{e_{i_{\min}}(m)}{\sqrt{s^2(m)\{1 + h_{i_{\min}}(m)\}}} \qquad (4)$$
as a test statistic. If the absolute value of (4) is too large, the observation $i_{\min}$ is considered to be an outlier,
as well as all other observations not in $S^*(m)$.
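The deletion residuals and the minimum-deletion-residual statistic can be sketched as follows (an assumed illustration, not the authors' code; it requires m > p so that s²(m) is defined, and the data below are arbitrary):

```python
import numpy as np

def min_deletion_residual(X, y, subset):
    """For observations outside S*(m), compute the deletion residuals
    r_i(m) = e_i(m) / sqrt(s^2(m) * (1 + h_i(m))) and return the index
    i_min and value of the minimum absolute deletion residual."""
    Xm, ym = X[subset], y[subset]
    G = np.linalg.inv(Xm.T @ Xm)                  # {X(m)' X(m)}^{-1}
    beta_m = G @ Xm.T @ ym                        # beta_hat(m)
    m, p = Xm.shape
    s2 = np.sum((ym - Xm @ beta_m) ** 2) / (m - p)   # s^2(m), m - p d.o.f.
    outside = np.setdiff1d(np.arange(len(y)), subset)
    e = y[outside] - X[outside] @ beta_m          # e_i(m) for i not in S*(m)
    h = np.einsum('ij,jk,ik->i', X[outside], G, X[outside])  # leverages h_i(m)
    r = e / np.sqrt(s2 * (1.0 + h))               # deletion residuals (3)
    k = np.argmin(np.abs(r))
    return outside[k], np.abs(r[k])               # i_min and |r_min(m)| (4)

rng = np.random.default_rng(2)
n, p = 10, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.2, size=n)
i_min, r_min = min_deletion_residual(X, y, np.arange(5))
```

In the search itself, $|r_{\min}(m)|$ at each step is compared with an envelope of its null distribution; the code above only computes the statistic, not the envelope-based decision rule.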