New Statistical Developments in Data Science by Unknown

Author:Unknown
Language: eng
Format: epub
ISBN: 9783030211585
Publisher: Springer International Publishing


Keywords

Auxiliary variables, Non-probability sampling, Non-response adjustment, Representativeness, Self-selection bias

1 Introduction

As response rates have declined over the past decades, the statistical benefits of probabilistic sampling have diminished. Assuming that a representative sample is initially selected, low response rates mean that those who ultimately supply the target data might not be representative. Moreover, with recent technological innovations, it is increasingly convenient and cost-effective to collect large numbers of highly non-representative samples via online surveys.

In the literature, there are many different interpretations of the concept of ‘representativeness’; see [6] for a thorough investigation of the statistical literature. Here we relate ‘representativeness’ to the possibility of obtaining, from the sample, results that tell us more or less what we would have found by measuring the whole population from which the sample was selected. This possibility requires that the sampling process be free of unknown selective forces that cause some groups in the population to be over- or under-represented when those groups behave differently with respect to the survey variables. Although this definition is appealing, its validity can never be tested in practice, since results for the whole population are unknown. Moreover, as stated by [3] (p. 286), there are various ways of selecting a sample, but only with random (probability) sampling is it possible to know how representative the sample results are likely to be. A weaker definition, which can be tested in practice whatever the selection process of those who ultimately supply the target data, is that of ‘representativeness with respect to a set of auxiliary variables’. A sample is representative with respect to one or more auxiliary variables if the distribution of these variables in the sample is the same as in the population from which the sample is selected. In this paper, whenever we refer to this last concept of representativeness, we say so explicitly.
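As a minimal sketch of how representativeness with respect to a single auxiliary variable might be checked, the Python snippet below compares the category shares observed among respondents with known population shares. The function names, the age-group categories, and all figures are hypothetical and chosen only for illustration; the maximum absolute gap in shares is one simple summary among many possible ones, not a measure prescribed by the paper.

```python
from collections import Counter


def category_shares(values):
    """Return the share of each category in a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def max_share_gap(sample_values, population_shares):
    """Largest absolute difference between sample and population shares.

    A gap of 0 means the sample reproduces the population distribution
    of the auxiliary variable exactly.
    """
    sample_shares = category_shares(sample_values)
    categories = set(sample_shares) | set(population_shares)
    return max(
        abs(sample_shares.get(c, 0.0) - population_shares.get(c, 0.0))
        for c in categories
    )


# Hypothetical population distribution of an auxiliary variable (age group),
# assumed known from an external source such as a register or census.
population_shares = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

# Hypothetical respondents: 100 units with their age group recorded.
respondents = ["18-34"] * 15 + ["35-64"] * 55 + ["65+"] * 30

gap = max_share_gap(respondents, population_shares)
print(f"Maximum absolute gap in shares: {gap:.2f}")
```

In this made-up example the youngest group is under-represented among respondents, so the sample would not be considered representative with respect to age group. A small gap on the auxiliary variables does not by itself guarantee representativeness in the stronger sense discussed above.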

The main problem caused by non-representative survey data is that estimators of population characteristics must be assumed to be biased unless convincing evidence to the contrary is provided. This problem affects data from a probability sample subject to non-response and data obtained from a convenience sample in the same way. Hence, in both cases, the same quality indicators may be used to evaluate the impact of non-representativeness, and the same post-survey adjustment methods may be used to deal with it.

In the remainder of this paper we consider only non-response, but the points made also apply in general to all processes that generate non-representative survey data.

It is well known that non-response bias is the product of the non-response rate and the difference between respondents and non-respondents on the statistic of interest. Of course, prior to the survey the statistic of interest is unknown, and when non-response occurs its value can be estimated only for respondents. Therefore the non-response bias cannot be assessed except through indirect measures based on more or less reasonable assumptions and on the use of data external to the survey.
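The product form of the non-response bias can be illustrated with a small numerical sketch in Python. It assumes the statistic of interest is a mean and uses the standard deterministic decomposition, in which the bias of the respondent mean equals the non-response rate times the difference between the respondent and non-respondent means. The function name and all figures are hypothetical; in a real survey the non-respondent mean is not observed, which is exactly why the bias cannot be computed directly.

```python
def nonresponse_bias(n_resp, n_nonresp, mean_resp, mean_nonresp):
    """Bias of the respondent mean relative to the full-sample mean.

    bias = (non-response rate) * (respondent mean - non-respondent mean)
    """
    nonresponse_rate = n_nonresp / (n_resp + n_nonresp)
    return nonresponse_rate * (mean_resp - mean_nonresp)


# Hypothetical figures: 600 respondents and 400 non-respondents.
# mean_nonresp is assumed known here for illustration only.
n_resp, n_nonresp = 600, 400
mean_resp, mean_nonresp = 0.50, 0.40

bias = nonresponse_bias(n_resp, n_nonresp, mean_resp, mean_nonresp)

# Cross-check against the direct definition: respondent mean minus the
# mean over the full sample (respondents and non-respondents together).
full_mean = (n_resp * mean_resp + n_nonresp * mean_nonresp) / (n_resp + n_nonresp)
direct_bias = mean_resp - full_mean

print(f"Bias from the product form: {bias:.3f}")
print(f"Bias from the direct definition: {direct_bias:.3f}")
```

The two quantities agree, and the product form makes clear that the bias vanishes either when there is no non-response or when respondents and non-respondents do not differ on the statistic of interest.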


