Internal and External Validity
In the previous blog, I attended to specific types of validity that need to be addressed when applying a quantitative research design, including content validity, face validity, criterion-related validity, and construct validity.
As if that were not enough, one also has to design a methodology around internal and external validity. Internal validity is the extent to which conclusions about relationships between variables are likely to be true, given the measures used, the research setting, and the research design as a whole. External validity is the extent to which one may generalize from the sample studied to the defined target population, as well as to other populations in time and space.
Experimental techniques involve measuring the effect of an independent variable on a dependent variable under highly controlled conditions. For example, one might measure how stressed a participant is before and after an intervention. Such designs usually allow for high degrees of internal validity. There are a number of extraneous factors, however, that may threaten the internal validity of even an experimental design:
- History factors pertain to specific events that occur between first and second measurements in addition to the experimental variables. For example, when seeking to measure the effectiveness of a post-traumatic stress intervention, a traumatic event between the pre-intervention and post-intervention tests may affect the degree to which the intervention offered can be said to be effective.
- Maturation factors pertain to processes that occur within participants due to the passage of time, as opposed to specific events. For example, participants becoming hungry and tired between a pre-test and a post-test on the same day may well affect the results. If one is measuring the effectiveness of meditation for lowering stress levels, for instance, by the time the post-test is administered the participants may be feeling exhausted or bored, or be worrying about what is going on at home during their prolonged absence, and that may affect their responses.
- Testing factors pertain to the effects of taking a test upon the scores of a second test, particularly if the same test is used for both the pre-test and the post-test. In language proficiency testing, for example, one might give a test, give a lesson, and then use the same test to assess the change in participants’ proficiency, in which case some of the improvement may simply reflect familiarity with the test. In this instance, it might be preferable to use and compare the results of two different tests that have been shown to have high convergent validity.
- Instrumentation factors pertain to changes in the calibration of a measurement tool. For example, when using a peak flow meter to measure the force of the breath of a person suffering from asthma, it is advised to use the same brand of peak flow meter throughout because different brands have different calibrations. In other words, if one were measuring the effectiveness of an asthma medication using one brand of peak flow meter before the medication and a different brand afterward, the pre- and post-medication measures obtained would likely be misleading. Likewise, an observer may have read more about a topic between pre- and post-intervention observations and note aspects in the post-treatment phase that he or she would not have thought to note in the pre-treatment phase. There may thus be changes noted that are not a product of the intervention or treatment but a product of the observer’s increased knowledge.
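- Statistical regression factors occur where participants are selected for, or included in, the analysis on the basis of extreme scores. Extreme scores tend to move closer to the mean on a second measurement regardless of any intervention, so an apparent improvement may simply be regression toward the mean. In many statistical analyses, especially a Pearson product-moment correlation, outliers, or participants with extreme scores, would be excluded from the analysis.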
- Selection biases occur due to the differential selection of participants, which is why most quantitative research designs ideally use random samples and/or discuss the criteria for selection in detail upfront. For example, recruiting volunteers through social media may skew the sample toward a particular demographic because not everyone participates on social media platforms: one may reach Generation X rather than Baby Boomers, and/or people who are unemployed or underemployed and have the time to complete questionnaires. Moreover, the characteristics of those who volunteer may differ from those who do not. Selection bias threatens the generalizability of the data unless one is looking at a construct that is peculiar to that demographic.
- Experimental mortality pertains to the differential loss of respondents from the comparison groups. For example, if one is examining adolescent development in a longitudinal research design, the chances are that over the five years’ duration of the research, some adolescents who participated in the first stages of the study may move away or lose interest.
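The statistical regression threat in the list above is often described as regression to the mean, and a small simulation can make it concrete. The sketch below is illustrative only: the score distribution, the noise level, the sample size, and the function name `simulate_regression_to_mean` are all assumptions rather than anything taken from a real study. It gives every simulated participant a stable "true" stress level plus independent measurement noise at pre-test and post-test, applies no intervention at all, and then looks only at the participants with the highest pre-test scores.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, top_fraction=0.1, seed=42):
    """Return (pre_mean, post_mean) for participants selected on extreme pre-test scores.

    No intervention is simulated: any drop from pre-test to post-test
    is produced purely by measurement noise and selection.
    """
    rng = random.Random(seed)

    # Hypothetical stable "true" stress level for each participant.
    true_scores = [rng.gauss(50, 10) for _ in range(n)]

    # Each observed score is the true score plus independent noise.
    pre = [t + rng.gauss(0, 10) for t in true_scores]
    post = [t + rng.gauss(0, 10) for t in true_scores]

    # Select only the participants whose pre-test scores are most extreme.
    cutoff = sorted(pre)[int(n * (1 - top_fraction))]
    selected = [i for i in range(n) if pre[i] >= cutoff]

    pre_mean = statistics.mean(pre[i] for i in selected)
    post_mean = statistics.mean(post[i] for i in selected)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"Selected group, pre-test mean:  {pre_mean:.1f}")
print(f"Selected group, post-test mean: {post_mean:.1f}")
```

Under these assumptions the selected group's post-test mean falls back toward the overall mean even though nothing was done to the participants, which is why a comparison group selected in the same way is needed before a drop in scores is credited to the intervention.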
Four factors might jeopardize external validity or representativeness of one’s research findings:
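- The reactive or interaction effect of testing arises when a pretest changes how participants respond afterward, for example by raising post-test scores through practice or by sensitizing participants to what is being studied, so that the findings may not hold for people who were never pretested. This threat may be overcome, at least to some degree, by comparing the pretest and post-test means for the sample, or by using an equivalent test with high convergent validity.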
- Reactive effects of experimental arrangements may also affect the external validity of one’s findings. Experimental settings are often artificial, and one cannot ignore the Hawthorne effect, i.e., when people know they are being observed, contributing to research data, or having their personality assessed, their behavior may change. Moreover, we may ask participants to answer the questions as honestly as possible, but that does not guarantee that they will. It may also be the case that questions are interpreted differently by different people. Take a question like, “Do you often feel angry?” How often is often? My often may not be your often. Even for the statement, “I feel angry most of the time,” what does most mean? Ultimately, the results of any experimental design, even in hard science, can be questioned on the grounds that the experimental situation can only ever approximate reality.
- Multiple-treatment interference occurs when the effects of earlier treatments are not erasable. Moreover, a participant may be participating in treatments other than the one being tested. One might be testing the merits of meditation as a stress reliever, but the participant may also be in therapy and practicing a host of other means of alleviating stress. One cannot then be sure that it was the meditation that reduced the participant's stress level, or indeed whether the meditation interfered with the efficacy of the therapy, because therapy often only works if there is a certain degree of anxiety present on the part of the patient.
- The interaction effects of selection biases and the experimental variable may also threaten the validity of a study. Clearly, selection biases may negatively affect both internal and external validity, so it is critical that you think carefully about how you will select participants and to what extent those selection criteria will limit the generalizability of the research findings. For example, it may be true that students are more likely to embrace remote work, but if the only participants selected are students, one cannot generalize the findings to people on the edge of retirement. In most instances, perfectly random selection is not possible because it would require a list of everyone in the population of interest from which to draw the random sample.
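The last point in the list above, that random selection presupposes a list of everyone in the population, can be illustrated with a short sketch. Everything here is hypothetical: the sampling frame of 2,000 made-up identifiers, the sample size, and the helper name `draw_simple_random_sample` are assumptions for illustration only. The sketch shows that a simple random sample can only be drawn from the people who appear in the frame, so anyone missing from the list has no chance of being selected.

```python
import random


def draw_simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample, without replacement, from a sampling frame."""
    if sample_size > len(sampling_frame):
        raise ValueError("sample_size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)
    # Every member of the frame has the same chance of being chosen.
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: a complete list of the population of interest.
sampling_frame = [f"person_{i:04d}" for i in range(1, 2001)]

sample = draw_simple_random_sample(sampling_frame, sample_size=100, seed=7)
print(len(sample), "participants selected, for example:", sample[:3])
```

In practice the frame is rarely complete, which is why the selection criteria, and the people they leave out, need to be described alongside the findings.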
Finally, there is ecological validity, or the extent to which the results of the research can be applied to real-life situations. For example, an actual driving test would have more ecological validity than would a simulated driving test.
In most instances, a methodology chapter would include a section devoted to discussing the particular threats to the validity of a research design and the extent to which the findings may be limited by the selection of participants, the methods of measurement chosen, and the context in which the research will be or was undertaken. The conclusion to the research would remind the reader about the threats to the validity of the findings so that the reader can take those limitations into account when generalizing the findings.
The most important point to remember when discussing validity is to discuss only those issues that pertain to your research. For example, a once-off cross-sectional design would not be subject to maturation factors or participants dropping out, whereas discussion of these issues is critical in a longitudinal design. So be clear about which types of validity apply to your research, and focus on the threats to the validity of your particular research design and selection criteria, so that the issues, and how you will deal with them, are explicit.
Remember, too, that there is no perfect research design, so it is a question of being aware of what kinds of threats to the validity of your findings exist, making your reader aware of those threats, developing strategies to minimize those threats, and then being honest about the extent to which your findings can be depended upon for making decisions.