Timing comparisons

Author: Francesca Torti - Marco Riani - Anthony C. Atkinson - Domenico Perrotta - Aldo Corbellini
Profession: European Commission, Joint Research Centre (JRC) - University of Parma, Italy - London School of Economics, UK - European Commission, Joint Research Centre (JRC) - University of Parma, Italy
Pages: 16-17
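[Figure 6 plot. x-axis tick labels: 0, 1, 5, 10, 15, 20, 40, 60, 80, 100. Top panel y-axis: bias for slope and intercept (ticks -0.02, 0, 0.02). Bottom panel y-axis: % outside whiskers (ticks 0, 5, 10, 15).]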
Figure 6: Top panel: boxplots showing, for different values of k (the fs_step parameter, on the x-axis), the bias
and dispersion of the estimated slopes and intercepts (respectively from left to right for each k). The estimates
are obtained from 500 simulated datasets of 5,150 observations. Bottom panel: percentage of estimated values
lying outside the boxplot whiskers for slope (blue asterisks) and intercept (black circles).
of the box seem even to become smaller for increasing k, may be interpreted as a reduced capacity of the
batch FS to capture the fine-grained structure of the data when k is too large.
The stability of the batch procedure can also be appreciated by looking, in the bottom panel of the
figure, at the percentage of estimated slopes/intercepts outside the boxplot whiskers: up to k = 10 there is no
appreciable increase with respect to the standard FS with k = 1; between k = 10 and k = 20 the increase
is still contained within 5%; beyond that, the number of bad estimates increases rapidly, exceeding 10%. Finally, there
is no evidence of major failure of the batch FS to reject outliers, which would be shown by occasional large
values of bias.
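
The quantity plotted in the bottom panel can be illustrated with a minimal sketch. It assumes the common boxplot convention in which the whiskers extend 1.5 times the interquartile range beyond the quartiles; the text does not state which whisker rule was used, and the function name and the sample values below are hypothetical.

```python
import statistics

def pct_outside_whiskers(estimates, whisker=1.5):
    """Percentage of values lying beyond the boxplot whisker limits.

    Assumption: limits are Q1 - whisker*IQR and Q3 + whisker*IQR.
    """
    q1, _, q3 = statistics.quantiles(estimates, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    outside = sum(1 for v in estimates if v < low or v > high)
    return 100.0 * outside / len(estimates)

# Hypothetical slope-bias values: 18 well-behaved estimates and 2 poor ones.
biases = [0.001 * i for i in range(-9, 9)] + [0.05, -0.06]
print(pct_outside_whiskers(biases))
```

Applied separately to the slope and intercept estimates for each k, a function of this kind would give the two series of points described in the caption.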
9. Timing comparisons
We now describe the results of an assessment of the computational benefit of the new batch Forward Search
approach, available only in SAS, in comparison with the standard SAS and FSDA MATLAB implementations.
We tested the functions on a workstation with two Xeon E5-262v4 CPUs (2.6 GHz, 4 cores), two 32 GB
DDR4 2400 ECC RAM modules and a 512 GB SSD, running MATLAB R2018b and SAS 9.4.
Figure 7 shows the elapsed time needed to analyse simulated datasets of different sizes (from 30
to 100,000 observations), when fitting one explanatory variable. The results are split into three panels for small
(n = 30, ..., 1,000), medium (n = 2,000, ..., 15,000) and large data sizes (n = 20,000, ..., 100,000). The
bottom-right panel gives the ratio between the time required by the MATLAB implementation and the
two SAS ones. For small samples, the FSDA MATLAB implementation (orange squares) is faster than
the standard SAS implementation (blue diamonds), but there is a crossing point at a sample size between
n = 800 and n = 900 where the latter starts to perform better. The advantage of using the SAS function
increases for larger sample sizes. For example, in a sample of 50,000 observations SAS is about 7 times
faster. The batch option in SAS (red circles), with k = 20, is even faster: 12 times faster in a sample of
50,000 observations. Note that in Figure 7 the batch results are reported only for n ≥ 20,000, because the
computational benefit for smaller n values would not “compensate” for the loss in statistical accuracy due to
the approximate batch solution.
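
As a rough illustration of how elapsed-time ratios of this kind can be measured, the sketch below times two stand-in functions over a few sample sizes and reports the ratio of their times. It is not the benchmark used for Figure 7: `impl_a`, `impl_b`, the sample sizes and the best-of-three rule are all hypothetical choices made for the example.

```python
import time

def elapsed(func, n, repeats=3):
    """Best-of-`repeats` wall-clock time (in seconds) of func(n)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(n)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical stand-ins for two implementations, both quadratic in n.
def impl_a(n):
    return sum(i * j for i in range(n) for j in range(n))

def impl_b(n):
    # Does roughly half the inner-loop work of impl_a.
    return sum(i * j for i in range(n) for j in range(0, n, 2))

sizes = [100, 200, 400]
timings = {n: (elapsed(impl_a, n), elapsed(impl_b, n)) for n in sizes}
# Ratio above 1 means impl_b was faster than impl_a at that sample size.
ratios = {n: t_a / t_b for n, (t_a, t_b) in timings.items()}

for n in sizes:
    print(n, round(ratios[n], 2))
```

Taking the minimum over repeated runs is one common way to reduce the influence of background load on the measurement; the measured ratios will vary from machine to machine.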
The bottom-left panel shows that the standard SAS and FSDA MATLAB implementations crash (because
of memory limits) when the sample size exceeds 50,000 observations. Only the SAS batch algorithm seems
to cope with larger datasets (n = 100,000 in the figure), although it requires about 3.5 hours to
terminate.
Finally, by interpolating the time values in the three cases with a quadratic curve – the time complexity for
producing n statistics for n steps is expected to be O(n^2) – we found the following approximate coefficients
for the quadratic terms: 1.23 · 10^-5 for the MATLAB implementation, 1.98 · 10^-6 for the SAS standard
implementation, 7.17 · 10^-7 for the SAS batch implementation (the last one fitted on all n values, not
reported in the Figure). This ranking might be used to extrapolate the computational performances of the
three FSR implementations for n values not considered here, on hardware configurations that can cope with
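
The quadratic fit described in this paragraph can be sketched as follows. This is a minimal least-squares illustration under stated assumptions, not the authors' procedure: the function name `fit_quadratic` and the timing values are hypothetical, and the quoted coefficients are read with negative exponents (1.23e-5, 1.98e-6, 7.17e-7), which is an interpretation of the printed values.

```python
def fit_quadratic(ns, times, scale=1000.0):
    """Least-squares fit of times ~ a*n^2 + b*n + c; returns (a, b, c).

    n is divided by `scale` before solving, to keep the 3x3 normal
    equations reasonably well conditioned.
    """
    basis = [[(n / scale) ** 2, n / scale, 1.0] for n in ns]
    # Augmented normal equations [B'B | B't] for the basis (x^2, x, 1).
    M = [[sum(r[i] * r[j] for r in basis) for j in range(3)]
         + [sum(r[i] * t for r, t in zip(basis, times))] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(M[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (M[i][3] - tail) / M[i][i]
    a, b, c = coef
    return a / scale ** 2, b / scale, c

# Hypothetical timings generated from a known quadratic, to check the fit.
ns = [1000, 5000, 10000, 20000, 30000, 50000]
times = [2e-6 * n * n + 1e-3 * n + 0.5 for n in ns]
a, b, c = fit_quadratic(ns, times)

# Quadratic coefficients quoted in the text (negative exponents assumed).
quoted = {"MATLAB": 1.23e-5, "SAS standard": 1.98e-6, "SAS batch": 7.17e-7}
ratios = {name: quoted["MATLAB"] / value for name, value in quoted.items()}
print(ratios)
```

For large n the quadratic term dominates, so the ratio of two leading coefficients approximates the asymptotic speed ratio between the corresponding implementations.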