Ideally, if one wants to answer a research question such as to what extent people’s buying behavior is influenced by an advertisement, one would need to access everyone who viewed that advertisement.
Accessing everyone concerned, however, is well-nigh impossible; therefore, when conducting research, one accesses a sample or portion of those who viewed the advertisement.
The question that then arises is how representative that sample is, because if it does not represent the population who saw the advertisement, the results of the research can be generalized only to a limited extent. That brings us to the various types of samples and their limitations for generalizing research results.
There are basically two types of sampling: probability sampling and nonprobability sampling. For the most part, quantitative methods, or methods that crunch numbers, require a probability sample.
Four kinds of probability sampling exist (Shin, 2020): simple random, systematic, stratified, and cluster sampling. All of them require a sampling frame, in other words, a database of the elements that make up the population on which the research is focused and from whom (or which, if you are doing research in the life sciences) you will draw a sample.
In probability sampling, the size of the sample counts. As a general rule, the smaller the sampling frame, the higher the percentage of it that must be sampled, because many statistical tests require a minimum number of responses (frequencies, f) in each category before they can be applied. For example, a chi-square test requires an expected frequency of at least 5 in every cell of a cross-tabulation, so larger samples are essential (Gravetter & Wallnau, 2005).
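To make the chi-square rule concrete, here is a minimal Python sketch that computes the expected frequency of each cell (row total × column total ÷ grand total) and checks whether every cell reaches 5. The two cross-tabulations are hypothetical; the larger one has exactly the same proportions as the smaller one, only five times as many respondents.

```python
def expected_frequencies(table):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_assumption_met(table, minimum=5):
    """True if every expected cell count is at least `minimum`."""
    return all(e >= minimum for row in expected_frequencies(table) for e in row)


# Hypothetical cross-tabulation: rows = saw the advertisement (yes/no),
# columns = bought the product (yes/no)
small_sample = [[6, 2],
                [3, 9]]
larger_sample = [[30, 10],
                 [15, 45]]

print(chi_square_assumption_met(small_sample))   # False (two cells expect < 5)
print(chi_square_assumption_met(larger_sample))  # True
```

With 20 respondents, two cells expect only 3.6 and 4.4 responses, so the test should not be run; with 100 respondents in the same proportions, every cell expects at least 18.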
The most representative sample would be a simple random sample, in which every person in the population you are interested in has an equal chance of being chosen to participate in the research. For example, if you were interested in the needs of homeowners, you might access a list of all ratepayers in a city; the ratepayers in your city would be the sampling frame. Likewise, if I were researching the extent to which patients are satisfied with their treatment at a specific hospital, I would access the database of everyone who has visited that hospital over the last three years (the sampling frame), assign a number to each of those patients, and then choose a sample using a random number table or generator. That ensures that every patient has an equal chance of being chosen.
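The numbering-and-generator step can be sketched in a few lines of Python. This is an illustration only: the patient IDs 1 to 5,000 and the sample size of 200 are hypothetical, and Python's `random` module stands in for whatever random number generator a researcher actually uses.

```python
import random


def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n elements without replacement; every element has the same chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible for the audit trail
    return rng.sample(sampling_frame, n)


# Hypothetical sampling frame: each patient in the hospital database numbered 1..5000
patient_ids = list(range(1, 5001))
sample = simple_random_sample(patient_ids, 200, seed=42)
print(len(sample), len(set(sample)))  # 200 200 (no patient is drawn twice)
```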
Even simple random sampling, however, may not be as random as one thinks. Some patients may have died over the last three years, and others may have moved without providing a forwarding address. In other instances, the sampling frame may not be ideal. For example, if one is trying to establish the market for fridges in a particular area and the only sampling frame available is the list of ratepayers, the results will be skewed: ratepayers may have a different demographic profile from people who do not pay rates because they cannot afford to buy a home. In the end, the data would represent only the market for fridges among homeowners. For a more accurate view, one must find a different sampling frame, for example, electricity customers, because that would include both home and apartment dwellers.
A second option is systematic random sampling, in which one samples every kth person on a list, for example, every 5th or 10th. I might, for instance, use a data provider’s list of smartphone numbers and call every 10th number, or begin with a map of the suburbs and visit every fifth house on every fifth block to establish whether the residents have seen the advertisement and, if so, to what extent it influenced their buying behavior. Arguably, approaching every fifth person entering a mall, shopping center, or even a store would also count as systematic random sampling.
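A minimal Python sketch of the every-kth rule follows. It assumes a common convention not spelled out above: the starting point is chosen at random among the first k entries, so the first entry on the list is not always included. The phone list is hypothetical.

```python
import random


def systematic_sample(sampling_frame, k, seed=None):
    """Select every k-th element, starting at a random position among the first k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    start = random.Random(seed).randrange(k)  # random start in 0..k-1
    return sampling_frame[start::k]


# Hypothetical list of 1,000 smartphone numbers from a data provider
phone_list = [f"number-{i:04d}" for i in range(1000)]
called = systematic_sample(phone_list, k=10, seed=7)
print(len(called))  # 100
```

Whatever the random start, a list of 1,000 numbers sampled at k = 10 yields 100 numbers to call.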
The randomness of a systematic random sample may be compromised when people choose not to answer a call from an unknown number, hang up, or slam the door in one’s face. Nor is there any guarantee that people will respond to emails requesting their participation. So, very often a systematic random sample is not random at all but a volunteer sample, in other words, a sample of people willing to participate in the research.
So, given that simple and systematic random samples may not be possible because a suitable sampling frame does not exist and/or those chosen decline to participate, the next best bet is a stratified random sample. Here one divides the population into groups with similar attributes (Health Knowledge, n.d.), for example, people living in standalone homes and people living in apartments, and randomly samples each group. Or, if I am exploring the effects of an advertisement for a particular fridge, I might access an electricity company’s customer database and randomly sample only those who use a certain number of units of electricity, because it takes a certain number of units to run a fridge in addition to other electrical appliances. Stratified random sampling is also useful if one wants to make comparisons. For example, in a study of the health outcomes of nursing staff in a country with seven hospitals, each with a different number of nursing staff, it would be appropriate to sample each hospital proportionally, so that hospitals with more nursing staff make up a larger proportion of the sample. And if I am going to use chi-square, I had better ensure that the samples from the smaller hospitals are large enough that the expected frequency in every cell of any cross-tabulation is at least 5.
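The seven-hospital example can be sketched in Python as proportional allocation followed by a simple random sample within each hospital. The staff counts are hypothetical, and the largest-remainder rounding is one reasonable way (an assumption here, not the only one) to make the per-hospital numbers add up to the intended total.

```python
import math
import random


def proportional_allocation(stratum_sizes, total_n):
    """Split total_n across strata in proportion to stratum size.

    Uses largest-remainder rounding so the allocations add up to total_n exactly.
    """
    population = sum(stratum_sizes.values())
    exact = {s: total_n * size / population for s, size in stratum_sizes.items()}
    alloc = {s: math.floor(x) for s, x in exact.items()}
    shortfall = total_n - sum(alloc.values())
    # Hand any leftover places to the strata with the largest fractional parts
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


def stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample within each stratum, sized proportionally."""
    rng = random.Random(seed)
    alloc = proportional_allocation({s: len(m) for s, m in strata.items()}, total_n)
    return {s: rng.sample(members, alloc[s]) for s, members in strata.items()}


# Hypothetical nursing staff counts for seven hospitals (1,000 nurses in total)
staff_counts = {"H1": 400, "H2": 250, "H3": 150, "H4": 100, "H5": 50, "H6": 30, "H7": 20}
strata = {h: [f"{h}-nurse-{i}" for i in range(n)] for h, n in staff_counts.items()}

sample = stratified_sample(strata, total_n=200, seed=1)
print({h: len(nurses) for h, nurses in sample.items()})
# {'H1': 80, 'H2': 50, 'H3': 30, 'H4': 20, 'H5': 10, 'H6': 6, 'H7': 4}
```

The output illustrates the chi-square caveat: strictly proportional allocation leaves the two smallest hospitals with only 6 and 4 nurses, so a researcher planning hospital-level cross-tabulations would probably need to sample those hospitals more heavily than their share alone suggests.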
A final probability strategy is cluster sampling. For example, if I am conducting educational research on the efficacy of a particular math module, there may be five classes using that module at one school and seven at another. I might randomly choose just one class from each school, on the assumption that the classes not chosen would demonstrate the same dynamics as the classes in my sample.
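A minimal Python sketch of the school example, assuming the class is chosen at random (which is what keeps cluster sampling a probability method) and that every pupil in the chosen class then takes part. The class labels are hypothetical.

```python
import random


def one_cluster_per_school(classes_by_school, seed=None):
    """Randomly choose one class (cluster) at each school; the whole class is studied."""
    rng = random.Random(seed)
    return {school: rng.choice(classes) for school, classes in classes_by_school.items()}


# Hypothetical labels: five classes use the module at one school, seven at the other
classes_by_school = {
    "School A": [f"A{i}" for i in range(1, 6)],
    "School B": [f"B{i}" for i in range(1, 8)],
}
chosen = one_cluster_per_school(classes_by_school, seed=11)
print(chosen)  # one class label per school
```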
Nonprobability sampling includes quota sampling, convenience sampling, purposive sampling, and snowball sampling.
Quota sampling is most often used by market researchers, who are given a quota of specific types of people to recruit. For example, if the research question is who is buying nonfiction books, interviewers might be asked to recruit a certain number of adolescents, young adults, and adults over the age of 40 based on the proportion of each of those categories in the general population. Ideally, the sample would then reflect the proportions of those age groups in the population.
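The quota logic can be sketched in Python in two steps: turn population shares into interview targets, then accept people in the order they are approached until each target is met. The population shares below are hypothetical, and simple rounding is an assumption that can leave the total off by one for other shares.

```python
def quotas(population_shares, total_interviews):
    """Interviews per group, in proportion to each group's share of the population.

    Plain rounding is used, so for some shares the total may be off by one.
    """
    return {g: round(total_interviews * share) for g, share in population_shares.items()}


def fill_quotas(respondents, quota):
    """Accept respondents in the order they are approached until each quota is full.

    Who gets in depends on who happens to be approached first, which is why
    quota sampling is a nonprobability method.
    """
    counts = {g: 0 for g in quota}
    accepted = []
    for person, group in respondents:
        if group in counts and counts[group] < quota[group]:
            counts[group] += 1
            accepted.append(person)
    return accepted, counts


# Hypothetical population shares for the three age groups
shares = {"adolescents": 0.15, "young adults": 0.35, "adults over 40": 0.50}
print(quotas(shares, 200))
# {'adolescents': 30, 'young adults': 70, 'adults over 40': 100}
```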
Convenience sampling is often used in the social sciences and humanities because (a) participants are protected by the ethic of informed consent and (b) participants have the right to withdraw from participation at any point and without prejudice. Since researchers can therefore include only people who are available and willing, convenience sampling is the easiest way to recruit them.
With convenience samples, representativeness is severely compromised because those who volunteer to participate may have very different profiles from those who do not. For example, social media has become a popular means of distributing surveys, but one has to bear in mind that not everyone uses Facebook and/or Instagram and that those who volunteer may be people with time on their hands rather than busy professionals. That may skew the sample towards people who are unemployed or underemployed, and the less random the sample, the less reliable the statistical analyses. However, while convenience samples may not be appropriate for answering how many and to what extent, they can answer what, how, and why questions, or assist with describing components and processes and explaining their connections.
A third nonprobability strategy is purposive sampling, which means earmarking specific individuals who would make suitable participants and inviting them to participate. This strategy is most often used in qualitative research, the assumption being that the people chosen can comment on the focus of the research. For example, if one is exploring the meaning of boredom, it would be pointless to include people who claim that they are never bored. Purposive sampling is the most time- and cost-effective strategy for recruiting participants, but it is the least representative and generalizable, and the qualitative data it yields is the most time-consuming to process. Software can help with that processing, but it only helps: it does not distill the understanding for one.
Snowball sampling is generally used in the social sciences to access groups that are difficult to reach. For example, before it became trendy to be part of the LGBT+ community, one would ask an interviewee to nominate two members of the community who would be willing to share their experiences or opinions; those two would in turn nominate two more each, and thus the sample would grow. The danger of such samples is that one ends up examining a subculture of the culture on which one is focused, because each nominee comes from the personal network of the person who nominated them.
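Snowball recruitment behaves like a breadth-first walk through a referral network, which a short Python sketch can show. The nomination network is entirely hypothetical; note how P5 and P8 are each nominated by two different people, a small-scale version of the overlapping-network problem described above.

```python
from collections import deque


def snowball_sample(seeds, nominations, max_size):
    """Grow a sample by following nominations breadth-first from the seed interviewees."""
    sample, queue, seen = [], deque(seeds), set(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)  # interview this person
        for nominee in nominations.get(person, []):
            if nominee not in seen:  # skip people already nominated by someone else
                seen.add(nominee)
                queue.append(nominee)
    return sample


# Hypothetical network: each interviewee nominates two others
nominations = {
    "P1": ["P2", "P3"],
    "P2": ["P4", "P5"],
    "P3": ["P5", "P6"],  # P5 is nominated twice: overlapping networks
    "P4": ["P7", "P8"],
    "P5": ["P2", "P8"],
    "P6": ["P9", "P10"],
}
print(snowball_sample(["P1"], nominations, max_size=8))
# ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8']
```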
Defining the Inclusion and Exclusion Criteria
When writing about the sampling strategy chosen, it is critically important to define both the inclusion and exclusion criteria for your sample. For example, if one is going to explore the effects of secondary trauma among neighborhood watch volunteers, it is critical that those participating (a) have experienced secondary trauma within a particular timeframe, (b) are active members of a neighborhood watch, and (c) are volunteers and not paid security personnel. Being paid security personnel would be a reason to exclude someone from the sample.
Likewise, if one is exploring the impact of being terminated from one’s employment for not having had the corporate-mandated jab, one would not interview people who do not work for a corporation that mandated the jab, people who complied with the mandate, or people who refused the jab but were not terminated. None of those potential participants could speak about the experience of being terminated for that particular reason. Similarly, it would be pointless to ask people who have not viewed an advertisement how it affected them. So, if one were using a survey method, the first filter question might be, “Have you viewed this advertisement?” If the answer is no, the survey would end for that respondent. Of course, knowing how many people did and did not view the advertisement would be useful, but those who did not could not offer an opinion about an advertisement they have not seen.
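Inclusion and exclusion criteria translate naturally into a screening function. The Python sketch below encodes the neighborhood watch criteria (a) to (c); the 12-month timeframe, the field names, and the candidates are hypothetical, since the text leaves the timeframe open. A filter question such as the advertisement screener works the same way, and keeping the screened-out records lets one still report how many people were excluded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    months_since_secondary_trauma: Optional[int]  # None = no secondary trauma reported
    active_watch_member: bool
    paid_security: bool


def eligible(c: Candidate, timeframe_months: int = 12) -> bool:
    """Inclusion criteria (a)-(c); paid security personnel are excluded.

    The 12-month default timeframe is a hypothetical choice.
    """
    experienced_trauma = (c.months_since_secondary_trauma is not None
                          and c.months_since_secondary_trauma <= timeframe_months)
    return experienced_trauma and c.active_watch_member and not c.paid_security


candidates = [
    Candidate("R1", 3, True, False),     # included
    Candidate("R2", None, True, False),  # excluded: no secondary trauma reported
    Candidate("R3", 30, True, False),    # excluded: outside the timeframe
    Candidate("R4", 2, True, True),      # excluded: paid security personnel
    Candidate("R5", 6, False, False),    # excluded: not an active watch member
]
included = [c.name for c in candidates if eligible(c)]
screened_out = [c.name for c in candidates if not eligible(c)]
print(included, len(screened_out))  # ['R1'] 4
```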
Samples, and the strategies used to choose them, are important because applying statistics, making valid claims about what the data says, and generalizing the findings of the research all depend on the sample accurately representing the population in which you are interested and about which you are making claims. At the same time, samples in the humanities and social sciences are subject to sampling bias that makes their representativeness questionable, because, ethically, a researcher has little control over who chooses to participate and/or drop out. Moreover, a sample may not be as random as assumed, because the response rate to a questionnaire, whether sent by post with a self-addressed, stamped envelope or by email, might be skewed towards those who have the time and motivation to complete it.
There are ways of evaluating, after the fact, to what extent a sample is representative. One way is to compare the demographic data of the sample (age, education, income, gender, etc.) with the demographics of the population you intend to generalize to, if such data are available. That not only underlines the importance of collecting at least some basic demographic data, but also allows one to understand which categories of the population may skew the results. Knowing, for example, that people over the age of 60 are over-represented in a sample allows a researcher to temper the interpretation of the processed data. On the other hand, if one can show that the demographics of the sample match those of the population on which one is focusing, that strengthens one’s ability to generalize the results to that population. So, collecting the relevant demographic facts about the sample is not just about being able to introduce the sample; it also allows one to assess the degree to which the sample represents the population in which one is interested.
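Here is a minimal Python sketch of such an after-the-fact check. It compares each age group's share of the sample with its share of the population in percentage points, then computes a chi-square goodness-of-fit statistic as one formal summary (a swapped-in technique, not one the text prescribes). The sample counts, population shares, and 5-point tolerance are all hypothetical.

```python
def representation_report(sample_counts, population_shares, tolerance_pts=5.0):
    """Compare each group's share of the sample with its share of the population.

    Returns the percentage-point gap per group and the groups that are over- or
    under-represented by more than `tolerance_pts` points.
    """
    n = sum(sample_counts.values())
    gaps = {g: 100 * (sample_counts.get(g, 0) / n - share)
            for g, share in population_shares.items()}
    over = [g for g, gap in gaps.items() if gap > tolerance_pts]
    under = [g for g, gap in gaps.items() if gap < -tolerance_pts]
    return gaps, over, under


def chi_square_goodness_of_fit(sample_counts, population_shares):
    """Pearson statistic: sum over groups of (observed - expected)^2 / expected."""
    n = sum(sample_counts.values())
    return sum((sample_counts.get(g, 0) - n * share) ** 2 / (n * share)
               for g, share in population_shares.items())


# Hypothetical age profile of a 200-person sample and of the target population
sample_counts = {"18-39": 60, "40-59": 70, "60+": 70}
population_shares = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}

gaps, over, under = representation_report(sample_counts, population_shares)
print(over, under)  # ['60+'] ['18-39']
stat = chi_square_goodness_of_fit(sample_counts, population_shares)
# With 3 groups (2 degrees of freedom), the 5% critical value is about 5.99
print(round(stat, 2), stat > 5.99)  # 13.0 True
```

In this made-up case, people aged 60 and over make up 35% of the sample but 25% of the population, which is exactly the kind of imbalance that should temper the interpretation of the results.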
Finally, it would be well to remember that sampling and the associated statistics, even for a probability sample, are based on probabilities, not certainties. So, even when people insist that one attend to the science, as if science offers truth, bear in mind that even hard science is not about proof or truth but about what is most probably true, all things considered.
Gravetter, F. J., & Wallnau, L. B. (2005). Essentials of statistics for the behavioral sciences (5th ed.). Wadsworth.
Health Knowledge. (n.d.). Methods of sampling from a population. https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1a-epidemiology/methods-of-sampling-population
Shin, T. (2020, October 25). Four types of random sampling techniques explained with visuals. Towards Data Science. https://towardsdatascience.com/four-types-of-random-sampling-techniques-explained-with-visuals-d8c7bcba072a