The Hawthorne effect refers to people’s tendency to behave differently when they become aware that they are being observed. As a result, what is observed may not represent ‘normal’ behaviour, threatening the internal and external validity of your research.
The Hawthorne effect is also known as the observer effect and is closely linked with observer bias.
Example: Hawthorne effect
You are researching the smoking rates among bank employees as part of a smoking-cessation programme. You collect your data by watching the employees during their work breaks.
If employees are aware that you are observing them, this can affect your study’s results. For example, you may record higher or lower smoking rates than is genuinely representative of the population under study.
Like other types of research bias, the Hawthorne effect often occurs in observational and experimental study designs in fields like medicine, organisational psychology, and education.
Inclusion and exclusion criteria determine which members of the target population can or can’t participate in a research study. Collectively, they’re known as eligibility criteria, and establishing them is critical when seeking study participants for clinical trials.
Eligibility criteria allow researchers to study the needs of a relatively homogeneous group (e.g., people with liver disease) with precision. Examples of common inclusion and exclusion criteria are:
Study-specific variables: Type and stage of disease, previous treatment history, presence of chronic conditions, ability to attend follow-up study appointments, technological requirements (e.g., internet access)
Control variables: Fitness level, tobacco use, medications used
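Eligibility screening of this kind can be sketched in a few lines of Python. This is a minimal illustration only: the `Candidate` fields and the specific criteria (liver disease and internet access as inclusion criteria, tobacco use as an exclusion criterion) are hypothetical choices for the example, not a recommended protocol.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical fields, chosen for illustration only.
    name: str
    has_liver_disease: bool
    has_internet_access: bool
    uses_tobacco: bool


def is_eligible(candidate: Candidate) -> bool:
    """Return True if the candidate meets every inclusion criterion
    and none of the exclusion criteria (both sets are assumed here)."""
    meets_inclusion = candidate.has_liver_disease and candidate.has_internet_access
    meets_exclusion = candidate.uses_tobacco
    return meets_inclusion and not meets_exclusion


candidates = [
    Candidate("A", True, True, False),
    Candidate("B", True, False, False),  # fails an inclusion criterion: no internet access
    Candidate("C", True, True, True),    # meets an exclusion criterion: tobacco use
]

eligible = [c.name for c in candidates if is_eligible(c)]
print(eligible)
```

Keeping inclusion and exclusion checks as separate, named conditions makes it easier to report why each person was screened out.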
Confirmation bias is the tendency to seek out and prefer information that supports our preexisting beliefs. As a result, we tend to ignore any information that contradicts those beliefs. Confirmation bias is often unintentional but can still lead to poor decision-making in research, particularly in psychology, as well as in legal and everyday contexts.
Example: Confirmation bias
During elections, people tend to seek information that paints the candidate they support in a positive light, while dismissing any information that paints them in a negative light.
This type of research bias is more likely to occur while processing information related to emotionally charged topics, values, or deeply held beliefs.
Predictive validity refers to the ability of a test or other measurement to predict a future outcome. Here, an outcome can be a behaviour, performance, or even disease that occurs at some point in the future.
Example: Predictive validity
A pre-employment test has predictive validity when it can accurately identify the applicants who will perform well after a given amount of time, such as one year on the job.
Predictive validity is a subtype of criterion validity. It is often used in education, psychology, and employee selection.
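One simple way to look at the pre-employment example is to check how often a pass/fail decision at hiring matched later performance. The sketch below assumes hypothetical data and an arbitrary pass mark of 70; in practice, predictive validity is usually summarised with a correlation or regression rather than a single hit rate.

```python
# Hypothetical records: (test score at hiring, rated a good performer after one year)
applicants = [
    (82, True), (75, True), (91, True), (60, False),
    (55, False), (78, False), (88, True), (49, False),
]
CUTOFF = 70  # assumed pass mark, for illustration only


def hit_rate(records, cutoff):
    """Share of applicants whose pass/fail result matched their later performance."""
    hits = sum((score >= cutoff) == performed_well
               for score, performed_well in records)
    return hits / len(records)


rate = hit_rate(applicants, CUTOFF)
print(rate)
```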
Ecological validity measures how generalisable experimental findings are to the real world, such as situations or settings typical of everyday life. It is a subtype of external validity.
If a study has high ecological validity, its findings can be generalised to real-life situations, while the findings of a study with low ecological validity cannot.
Example: Ecological validity
You are researching whether airline passengers pay attention to in-flight safety videos. You are interested in whether or not they can recall specific information from them. You recruit a sample of 100 people and send them a safety video, asking them to watch it on their own time. Afterwards, you send them a questionnaire to find out what they can recall from the video.
Using this approach, your findings would have low ecological validity. The experience of watching the video at home is vastly different from watching it on a plane.
To achieve high ecological validity, the best approach would be to conduct the experiment on an actual flight.
Ecological validity is often applied in experimental studies of human behaviour and cognition, such as in psychology and related fields.
Concurrent validity shows you the extent of the agreement between two measures or assessments taken at the same time. It compares a new assessment with one that has already been tested and proven to be valid.
Concurrent validity is a subtype of criterion validity. It is called ‘concurrent’ because the scores of the new test and the criterion variables are obtained at the same time.
Example: Concurrent validity
You want to assess the concurrent validity of a new survey measuring employee commitment. To do so, you can either:
Ask the same sample of employees to fill in both an existing (validated) survey and your new survey. Then compare the results.
Ask a sample of employees to fill in your new survey. Then, compare their responses to the results of a common measure of employee performance, such as a performance review.
If the results of the two measurement procedures are similar, you can conclude that they are measuring the same thing (i.e., employee commitment). This demonstrates concurrent validity.
Establishing concurrent validity is particularly important when a new measure is created that claims to be better in some way than existing measures: more objective, faster, cheaper, etc.
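The first option in the example above (comparing a new survey with an existing, validated one) is commonly quantified with a correlation coefficient. The following is a minimal sketch using Pearson's r on hypothetical scores; the data and the helper name `pearson_r` are assumptions for illustration, and what counts as a "similar enough" result depends on the field.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical commitment scores for the same six employees,
# collected with both surveys at the same time.
existing_survey = [12, 15, 9, 20, 17, 11]
new_survey = [13, 14, 10, 19, 18, 10]

r = pearson_r(existing_survey, new_survey)
print(round(r, 2))
```

A coefficient close to 1 would suggest that the new survey ranks employees much as the validated survey does.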
Criterion validity (or criterion-related validity) evaluates how accurately a test measures the outcome it was designed to measure. An outcome can be a disease, behaviour, or performance. It has two subtypes: concurrent validity compares test scores with criterion variables measured at the same time, while predictive validity compares them with criterion variables measured in the future.
To establish criterion validity, you need to compare your test results to criterion variables. Criterion variables are often referred to as a “gold standard” measurement. They comprise other tests that are widely accepted as valid measures of a construct.
Example: Criterion validity
A researcher wants to know whether a college entrance exam is able to predict future academic performance. First-semester average grades can serve as the criterion variable, as they are an accepted measure of academic performance.
The researcher can then compare the college entrance exam scores of 100 students to their average grades after one semester in college. If the exam scores are strongly correlated with the grades, then the college entrance exam has criterion validity.
When your test agrees with the criterion variable, it has high criterion validity. However, criterion variables can be difficult to find.
Discriminant validity refers to the extent to which a test is not related to other tests that measure different constructs. Here, a construct is a behaviour, attitude, or concept, particularly one that is not directly observable.
The expectation is that two tests that reflect different constructs should not be highly related to each other. If they are, then you cannot say with certainty that they are not measuring the same construct. Thus, discriminant validity is an indication of the extent of the difference between constructs.
Discriminant validity is assessed in combination with convergent validity. In some fields, discriminant validity is also known as divergent validity.
Example: Discriminant validity (divergent validity)
You are researching extroversion as a personality trait among marketing students. To establish discriminant validity, you must also measure an unrelated construct, such as intelligence.
You have developed a questionnaire to measure extroversion, but you also ask your respondents to fill in a second questionnaire measuring intelligence in order to test the discriminant validity of your questionnaire.
Since the two constructs are unrelated, there should be no significant relationship between the scores of the two tests.
If the scores are substantially correlated, then you may be measuring the same construct in both tests. This is an indication of poor discriminant validity.
Content validity evaluates how well an instrument (like a test) covers all relevant parts of the construct it aims to measure. Here, a construct is a theoretical concept, theme, or idea – in particular, one that cannot usually be measured directly.
Content validity is one of the four types of measurement validity. The other three are:
Face validity: Does the content of the test appear to be suitable for its aims?
Criterion validity: Do the results accurately measure the concrete outcome they are designed to measure?
Construct validity: Does the test measure the concept that it’s intended to measure?
Example: Content validity in exams
A written exam tests whether individuals have enough theoretical knowledge to acquire a driver’s license. The exam would have high content validity if the questions asked cover every possible topic in the course related to traffic rules. At the same time, it should also exclude all other questions that aren’t relevant for the driver’s license.
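The two requirements in this example (cover every relevant topic, exclude irrelevant ones) can be checked mechanically once each question is mapped to a topic. The sketch below uses a hypothetical syllabus and question mapping; in practice, judging relevance usually relies on subject-matter experts rather than a lookup table.

```python
# Hypothetical syllabus topics and the topic each exam question targets.
syllabus = {"right of way", "speed limits", "road signs", "parking"}
exam_questions = {
    "Q1": "right of way",
    "Q2": "speed limits",
    "Q3": "road signs",
    "Q4": "engine repair",  # not part of the traffic-rules syllabus
}

covered = set(exam_questions.values())
missing_topics = syllabus - covered     # relevant topics with no question
irrelevant_topics = covered - syllabus  # questions outside the construct

print(sorted(missing_topics))
print(sorted(irrelevant_topics))
```

Here the exam would fall short on both counts: one syllabus topic has no question, and one question falls outside the syllabus.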
Convergent validity refers to how closely a test is related to other tests that measure the same (or similar) constructs. Here, a construct is a behaviour, attitude, or concept, particularly one that is not directly observable.
Ideally, two tests measuring the same construct, such as stress, should have a moderate to high correlation. High correlation is evidence of convergent validity, which, in turn, is an indication of construct validity.
Example: Convergent validity
Suppose you use two different methods to collect data about anger: observation and a self-report questionnaire. If the scores of the two methods are similar, this suggests that they indeed measure the same construct. A high correlation between the two test scores suggests convergent validity.
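Because convergent and discriminant validity are assessed together, it can help to compute both correlations side by side. The sketch below uses hypothetical scores for two anger measures and one unrelated measure (a vocabulary test, chosen only for illustration). There is no universal cutoff for "high" or "low" correlation, so the numbers are meant to show the expected pattern, not a rule.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mean_x) ** 2 for a in x)
                      * sum((b - mean_y) ** 2 for b in y))


# Hypothetical scores for the same six participants.
anger_observed = [3, 7, 5, 9, 2, 4]
anger_self_report = [4, 7, 6, 8, 2, 3]
vocabulary_test = [6, 4, 7, 6, 5, 2]  # assumed to be an unrelated construct

r_convergent = pearson_r(anger_observed, anger_self_report)
r_discriminant = pearson_r(anger_observed, vocabulary_test)

print(round(r_convergent, 2), round(r_discriminant, 2))
```

The expected pattern is a high correlation between the two anger measures and a correlation near zero with the unrelated measure.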