Data processing

When survey data collection was completed, a series of steps was undertaken to ensure the quality and accuracy of the data collected. The steps covered validation rules to monitor errors and assess response patterns reflecting inconsistencies, as well as cleaning-up strategies to identify and further handle suspicious entries. Finally, weighting adjustments were applied to the data to correct for potential over- or under-representation of particular respondent groups.
8.1. Data validation
To validate the correctness and to ensure the high quality of the data collected, a series of steps was undertaken (see Sections 8.1.1–8.1.5). This included tests to evaluate the length of time respondents took to answer the questionnaire and checks for consistency/logic and to detect falsification attempts and duplicate responses. Each validation test was applied at respondent level. A decision about each completed questionnaire's validity was made based on the results of the validation tests and on the basis of a set of pre-specified criteria (see Section 8.1.6). Questionnaires that were assessed as erroneous, suspicious or inconsistent were excluded from the cleaned dataset.
The uncleaned dataset of 141 621 responses was validated and edited. This resulted in a cleaned dataset of 139 799 responses. This comprises the final data that were used for analytical purposes.
The following section elaborates on the validation tests and the decision criteria, which were examined in combination to assess each case.
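The per-respondent decision described above can be sketched in a few lines. This is a minimal illustration, not the survey's actual code: the test names and the warning cap are assumptions, since the real pre-specified criteria are those of Section 8.1.6.

```python
# Illustrative sketch: each completed questionnaire carries the outcome of
# several validation tests, and a record is kept only if no test marks it
# as clearly invalid. Test names and the warning cap are assumptions.

def assess_record(flags: dict[str, str]) -> bool:
    """flags maps test name -> 'pass' | 'warning' | 'fail'.
    Keep a record only if no test failed and warnings stay below a cap."""
    if any(outcome == "fail" for outcome in flags.values()):
        return False
    warnings = sum(outcome == "warning" for outcome in flags.values())
    return warnings <= 1  # assumed cap; the real criteria are in Section 8.1.6

record = {"captcha": "pass", "duration": "warning", "consistency": "pass"}
print(assess_record(record))  # True: no failures, a single warning
```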
8.1.1 CAPTCHA score
CAPTCHA is a computer system intended to distinguish human from machine input. An invisible reCAPTCHA was used to detect applications from bots (23). For those users who were eligible to participate in the survey, the CAPTCHA score was recalculated just before questionnaire submission and only entries with scores greater than 0.3 were accepted. Plausible values ranged from 0 (representing almost certainly an automated completion of the survey) to 1 (representing almost certainly an authentic completion of the survey by a human respondent).
8.1.2 Questionnaire duration
This step was meant to detect respondents completing the survey too quickly. Short questionnaire durations raise suspicions of limited accuracy (respondents may not have read or answered the questions with caution). Total time spent on questionnaire completion is the difference between the starting time and the time of submission. Very long times (typically exceeding 100 minutes) can be explained by interrupted completion or by late submission due to technical issues. In contrast, short times are suspicious and were subject to further investigation.
(23) A bot, or web robot, is a software application that runs automated tasks over the internet. The malicious deployment of bots aims to imitate or replace the behaviour of human users. Similar programmes may be used to imitate and reproduce, repetitively and at a high rate, the completion of a questionnaire by a large number of survey respondents in an attempt to falsify a survey and influence its outcomes or annul its validity and scope.

A long way to go for LGBTI equality — Technical report

Respondents were divided into categories created as combinations of respondent categories (lesbian, gay and bisexual; trans; and intersex) and the number of different types of incidents (physical/sexual attack, harassment, discrimination). The number of different types of incidents was expected to affect the questionnaire duration, because incidents experienced increased the number of survey questions that had to be completed and thus led to longer expected completion times.
Respondent category was relevant because of the additional questionnaire sections for trans and intersex respondents. Intersex respondents who also identified as trans were asked to complete both the intersex section and the trans section. Total completion time was studied separately for each combination of respondent category and number of incidents. Cut-off times that defined the minimum duration needed to pass the total duration test were chosen based on expertise from similar surveys, in such a way that the same proportion of respondents was identified as 'speeders' among the LGBTI groups. A respondent was:
identified as aspeede r (i.e. fail) if the questionnaire
duration was less th an or equal to 0.7 percentile of
the duration distribution;
flagged with a warnin g if the question naire dura-
tion was between 0.7 and 1 percenti le;
identified as a non-speed er (i.e. pass) if the ques-
tionnaire durati on was greater than 1 percentile.
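The banding above can be sketched as follows. This is an illustration under stated assumptions: the quantile helper uses a simple rank rule (rank = round(q·n)), which need not match the estimator actually used, and '0.7 percentile' is read as the 0.7th percentile, i.e. the 0.007 quantile.

```python
# Sketch of the percentile-based speeder test: within each group
# (respondent category x number of incidents), the 0.7th and 1st
# percentiles of completion time define fail/warning/pass bands.
# Helper and labels are illustrative, not the survey's code.

def quantile(values: list[float], q: float) -> float:
    """Approximate empirical quantile (rank = round(q * n)); q is a
    fraction, e.g. 0.007 for the 0.7th percentile."""
    ordered = sorted(values)
    idx = max(0, min(len(ordered) - 1, round(q * len(ordered)) - 1))
    return ordered[idx]

def flag_duration(duration: float, group_durations: list[float]) -> str:
    p07 = quantile(group_durations, 0.007)  # 0.7th percentile
    p1 = quantile(group_durations, 0.01)    # 1st percentile
    if duration <= p07:
        return "fail"      # speeder
    if duration <= p1:
        return "warning"
    return "pass"          # non-speeder
```

With, say, 1 000 group durations of 1–1 000 seconds, the cut-offs land at 7 and 10 seconds, so a 5-second completion fails and a 9-second one only draws a warning.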
Due to their limited number, intersex respondents were categorised only between those who do or do not identify as trans people, and the number of incidents was not used in the analysis of their survey completion time. Table 16 shows the selected cut-off points.
Furthermore, the questionnaires were also evaluated on the basis of partial durations in six questionnaire sections. The sections were selected in a way that minimises the effect of routing and includes questions answered by the majority of respondents. Analogously to the approach used with the total questionnaire time, the 0.7 and 1 percentiles were chosen to identify speeders (fail) and non-speeders (pass) and to give intermediate warnings. Each respondent thus received six flags, showing the outcome of the test for each section. The flags were combined into a single flag summarising performance across all sections (Table 17).
Table 16. Cut-off durations in seconds defining speeding: fail if [min, 0.7], warning if (0.7, 1] and pass if (1, max]

Category                    Percentile   Number of incidents
                                         0            1            2            3
Lesbian, gay and bisexual   0.7          368 (6 min)  420 (7 min)  473 (8 min)  545 (10 min)
                            1            382 (6 min)  436 (7 min)  490 (8 min)  567 (10 min)
Trans                       0.7          419 (7 min)  460 (8 min)  533 (9 min)  595 (10 min)
                            1            429 (7 min)  477 (8 min)  552 (9 min)  608 (10 min)
Intersex                    0.7          393 (7 min)
                            1            407 (7 min)
Intersex and trans          0.7          331 (6 min)
                            1            361 (6 min)
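Applied to a single respondent, the cut-offs in Table 16 act as a lookup. The dictionary below copies the fail (0.7 percentile) and pass (1 percentile) thresholds in seconds from the table; the category keys, the function and the treatment of incident counts above 3 are assumptions for illustration.

```python
# Illustrative lookup of Table 16: (fail_at_or_below, pass_above) in
# seconds, keyed by category and number of incidents. For intersex
# respondents the number of incidents was not used, so only key 0 exists.

CUTOFFS = {
    "lgb":                {0: (368, 382), 1: (420, 436), 2: (473, 490), 3: (545, 567)},
    "trans":              {0: (419, 429), 1: (460, 477), 2: (533, 552), 3: (595, 608)},
    "intersex":           {0: (393, 407)},
    "intersex_and_trans": {0: (331, 361)},
}

def duration_flag(category: str, incidents: int, seconds: float) -> str:
    fail_cut, pass_cut = CUTOFFS[category][incidents]
    if seconds <= fail_cut:
        return "fail"       # speeder
    if seconds <= pass_cut:
        return "warning"
    return "pass"           # non-speeder
```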
Table 17. Decision rule for combining partial duration speeder tests in six sections

Final flag    No of section speeders   No of section warnings
Non-speeder   0                        1
Warning       0                        [2, 3, 4]
Speeder       0                        5
Non-speeder   1                        0
Warning       1                        [1, 2, 3]
Speeder       1                        4
Warning       2                        1
Speeder       2                        2
Speeder       3                        0
