What Is Criterion Validity? | Definition & Examples
Criterion validity (or criterion-related validity) evaluates how accurately a test measures the outcome it was designed to measure. An outcome can be a disease, behaviour, or performance. Concurrent validity compares a test with a criterion variable measured at the same time, while predictive validity compares it with a criterion measured in the future.
To establish criterion validity, you need to compare your test results to criterion variables. Criterion variables are often referred to as a “gold standard” measurement. They comprise other tests that are widely accepted as valid measures of a construct.
For example, a researcher can compare the college entry exam scores of 100 students to their average grade after one semester in college. If the two sets of scores correlate strongly, the college entry exam has criterion validity.
When your test agrees with the criterion variable, it has high criterion validity. However, criterion variables can be difficult to find.
What is criterion validity?
Criterion validity shows you how well a test correlates with an established standard of comparison called a criterion.
A measurement instrument, like a questionnaire, has criterion validity if its results converge with those of some other, accepted instrument, commonly called a “gold standard.”
A gold standard (or criterion variable) measures:
- The same construct
- Conceptually relevant constructs
- Conceptually relevant behaviour or performance
When a gold standard exists, evaluating criterion validity is a straightforward process. For example, you can compare a new questionnaire with an established one. In medical research, you can compare test scores with clinical assessments.
However, in many cases, there is no existing gold standard. If you want to measure pain, for example, there is no objective standard to do so. You must rely on what respondents tell you. In such cases, you can’t achieve criterion validity.
It’s important to keep in mind that criterion validity is only as good as the validity of the gold standard or reference measure. If the reference measure is biased, it can make an otherwise valid measure appear invalid. In other words, a valid measure tested against a biased gold standard may fail to achieve criterion validity.
Similarly, two measures that share the same bias can confirm one another. Thus, criterion validity is no guarantee that a measure is in fact valid. It’s best used in tandem with the other types of validity.
Types of criterion validity
There are two types of criterion validity. Which type you use depends on the time at which the two measures (the criterion and your test) are obtained.
- Concurrent validity is used when the scores of a test and the criterion variables are obtained at the same time.
- Predictive validity is used when the criterion variables are measured after the scores of the test.
Concurrent validity
Concurrent validity is demonstrated when a new test correlates with another test that is already considered valid, called the criterion test. A high correlation between the new test and the criterion indicates concurrent validity.
Establishing concurrent validity is particularly important when a new measure is created that claims to be better in some way than its predecessors: more objective, faster, cheaper, etc.
Remember that this form of validity can only be used if another criterion or validated instrument already exists.
Predictive validity
Predictive validity is demonstrated when a test can predict future performance. In other words, the test must correlate with a variable that can only be assessed at some point in the future, after the test has been administered.
For predictive criterion validity, researchers often examine how the results of a test predict a relevant future outcome. For example, the results of an IQ test can be used to predict future educational achievement. The outcome is, by design, assessed at some point in the future.
For example, suppose you want to assess whether a math test predicts success in an engineering program. A student’s average grades are a widely accepted marker of academic performance and can be used as a criterion variable. To assess the predictive validity of the math test, you compare how students scored on that test to their final results after the first semester in the engineering program. If high test scores were associated with students who later performed well in their studies and achieved a high average grade, then the math test would have strong predictive validity.
A high correlation provides evidence of predictive validity. It indicates that a test can correctly predict something that you hypothesise it should.
Criterion validity example
Criterion validity is often used when a researcher wishes to replace an established test with a different version of the same test, particularly one that is more objective, shorter, or cheaper.
For example, suppose a psychologist develops a shorter version of an established procrastination test. Although the original test is widely accepted as a valid measure of procrastination, it is very long and takes a lot of time to complete. As a result, many students fill it in without carefully considering their answers.
To evaluate how well the new, shorter test assesses procrastination, the psychologist asks the same group of students to take both the new and the original test. If the results between the two tests are similar, the new test has high criterion validity. The psychologist can be confident that the new test will measure procrastination as accurately as the original.
How to measure criterion validity
Criterion validity is assessed in two ways:
- By statistically testing a new measurement technique against an independent criterion or standard to establish concurrent validity
- By statistically testing a new measurement technique against future performance to establish predictive validity
The measure to be validated, such as a test, should be correlated with a measure considered to be a well-established indication of the construct under study. This is your criterion variable.
Correlations between the scores on the test and the criterion variable are calculated using a correlation coefficient, such as Pearson’s r. A correlation coefficient expresses the strength of the relationship between two variables in a single value between −1 and +1.
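For n paired scores (x₁, y₁), …, (xₙ, yₙ), Pearson’s r is given by the standard formula:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

where $\bar{x}$ and $\bar{y}$ are the mean scores on the test and on the criterion variable.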
Correlation coefficient values can be interpreted as follows:
- r = 1: There is a perfect positive correlation.
- r = 0: There is no correlation at all.
- r = −1: There is a perfect negative correlation.
You can automatically calculate Pearson’s r in Excel, R, SPSS or other statistical software.
A strong positive correlation between a test and the criterion variable provides evidence that the test is valid. No correlation, or a negative correlation, indicates that the test and the criterion variable do not measure the same concept.
For example, suppose you develop a new scale and want to validate it against an established, validated scale. You give the two scales to the same sample of respondents. The extent of agreement between the results of the two scales is expressed through a correlation coefficient.
You calculate the correlation coefficient between the results of the two tests and find that your scale correlates with the existing scale (r = 0.80). This value shows that there is a strong positive correlation between the two scales.
In other words, your scale accurately measures the same construct operationalised in the validated scale.
Cite this Scribbr article
Nikolopoulou, K. (2022, September 02). What Is Criterion Validity? | Definition & Examples. Scribbr. Retrieved 10 March 2025, from https://www.scribbr.co.uk/research-methods/criterion-validity-explained/