How IQ Tests Work: Norming, Scaling and Validity
- Written by: IQCognify Editorial Team
- Reviewed for accuracy: IQCognify Research Review Process
Quick answer
An IQ test doesn't measure intelligence directly the way a ruler measures length. It samples your performance on carefully chosen reasoning tasks and converts it into a score by comparing you to thousands of other people. This guide explains the machinery behind that conversion — standardization, the 15-point scale, item difficulty, reliability, validity and the adaptive tests now common in 2026.
Standardization and norming
When test publishers build an IQ test, they first administer it to a large, carefully selected 'standardization sample' chosen to mirror the wider population on characteristics such as age, sex, education and region. The pattern of scores from that sample becomes the yardstick — the 'norms' — against which every future test-taker is measured.
This is why an IQ score is always a comparison, never an absolute quantity. Answering 40 questions correctly means nothing on its own; it acquires meaning only once it is placed against how a representative group of people of the same age performed. Norms also age: as a population's performance shifts over the years, older norms drift out of calibration, which is why publishers periodically re-standardize their tests.
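As a minimal sketch of "placing a score against a group", the Python below computes a percentile rank for one raw score within a standardization sample. The ten sample scores and the function name `percentile_rank` are hypothetical, and all ten people are assumed to be the same age as the test-taker; real norming uses thousands of stratified cases per age band.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(raw_score: float, sample: list[float]) -> float:
    """Percentage of the sample scoring below raw_score, counting ties as half."""
    ordered = sorted(sample)
    below = bisect_left(ordered, raw_score)            # scores strictly lower
    ties = bisect_right(ordered, raw_score) - below    # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical raw scores (number correct) from a tiny, same-age norm sample.
norm_sample = [22, 25, 28, 30, 31, 33, 35, 38, 40, 44]

# 40 correct answers only gains meaning relative to this group.
print(percentile_rank(40, norm_sample))  # 85.0
```

The same raw score would earn a different rank against a different sample, which is the point the paragraph above makes about norms.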
Why age-matching matters
A 7-year-old and a 40-year-old are never compared to each other. Each person's raw performance is scored against the norm group for their own age band, then placed on the same 100-centered scale.
The 100-mean, 15-SD scale
Modern tests use a 'deviation IQ'. The raw score is transformed so that the average of the norm group is set to exactly 100 and the standard deviation — a measure of typical spread — is set to 15. Your IQ then describes how far above or below average you fall, measured in these standard-deviation units.
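A minimal Python sketch of the age-matched conversion described in this section. The norm values and the name `deviation_iq` are hypothetical, and the linear z-score transformation is a simplification: published tests usually read scores from norm tables that also normalize the shape of the distribution.

```python
# Hypothetical raw-score norms (mean, standard deviation) for two age bands.
# Published tests derive these from large, stratified standardization samples.
NORMS_BY_AGE_BAND = {
    "age 7": (18.0, 5.0),
    "age 40-44": (34.0, 6.0),
}


def deviation_iq(raw_score: float, age_band: str,
                 mean_iq: float = 100.0, sd_iq: float = 15.0) -> float:
    """Convert a raw score to a deviation IQ using only the test-taker's age band."""
    band_mean, band_sd = NORMS_BY_AGE_BAND[age_band]
    z = (raw_score - band_mean) / band_sd   # distance from the band average, in SD units
    return mean_iq + sd_iq * z              # rescale onto the shared 100-centered, 15-SD scale


# The same raw score of 28 lands very differently depending on age.
print(deviation_iq(28, "age 7"))      # 130.0
print(deviation_iq(28, "age 40-44"))  # 85.0
```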
Because scores follow an approximately normal (bell-shaped) distribution, the scale has predictable properties. About 68% of people fall within one standard deviation of the mean (85–115), about 95% within two (70–130), and only about 2% score above 130, with a similar share below 70.
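Those percentages follow directly from the normal distribution. A short check using Python's standard-library `statistics.NormalDist`, assuming scores are exactly normal with mean 100 and SD 15 (real score distributions only approximate this, especially in the tails):

```python
from statistics import NormalDist

# Idealized IQ scale: normal distribution with mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)

within_1_sd = iq.cdf(115) - iq.cdf(85)   # share scoring 85-115
within_2_sd = iq.cdf(130) - iq.cdf(70)   # share scoring 70-130
above_130 = 1 - iq.cdf(130)              # upper tail
below_70 = iq.cdf(70)                    # lower tail, same size by symmetry

print(f"85-115: {within_1_sd:.1%}")      # about 68%
print(f"70-130: {within_2_sd:.1%}")      # about 95%
print(f"above 130: {above_130:.1%}, below 70: {below_70:.1%}")  # about 2% each
```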
| IQ Range | Classification | % of People | What it means |
|---|---|---|---|
| ≤69 | Extremely Low | ~2.2% | Well below average. On clinical tests this range may warrant professional assessment. |
| 70–79 | Borderline | ~6.7% | Below average reasoning on this scale. |
| 80–89 | Low Average | ~16.1% | Slightly below the population average. |
| 90–109 | Average | ~50% | The middle of the distribution — where most people score. |
| 110–119 | High Average | ~16.1% | Above average reasoning ability. |
| 120–129 | Superior | ~6.7% | Notably above average — roughly the top 10%. |
| 130–144 | Gifted | ~2.1% | At or above the conventional 'gifted' threshold of 130, roughly the top 2%. Mensa's entry requirement (a score in the top 2% on an approved test) falls in this band. |
| 145+ | Highly Gifted / Genius | ~0.1% | Exceptionally rare — the far right tail of the distribution. |
A practical consequence: the same number of points covers very different shares of people near the middle than in the tails. About 34% of people score between 100 and 115, while only around 0.1% score between 145 and 160.
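The same `statistics.NormalDist` calculation shows why equal point gaps cover such different shares of people. This assumes an idealized normal distribution with mean 100 and SD 15; `share_between` is a hypothetical helper name.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)


def share_between(low: float, high: float) -> float:
    """Expected proportion of people scoring between two IQ values."""
    return iq.cdf(high) - iq.cdf(low)


print(f"100-115: {share_between(100, 115):.1%}")   # about 34%
print(f"145-160: {share_between(145, 160):.2%}")   # about 0.13%
```

Both gaps are 15 points, yet the first holds a few hundred times as many people as the second.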
Item difficulty and discrimination
Individual questions ('items') are not chosen at random. Each is studied for two properties. Difficulty is the proportion of people who answer it correctly, so a higher value means an easier item; good tests span a wide range, from items most people solve to items only a few do. Discrimination is how well an item separates higher-ability from lower-ability test-takers; an item that strong and weak performers answer correctly at the same rate adds noise rather than signal.
Items are typically arranged in ascending difficulty, so a test-taker works through them until the problems become too hard (individually administered tests often stop after a set number of consecutive errors). Many modern tests are analyzed with Item Response Theory (IRT), which models the probability of a correct answer as a function of both the person's ability and the item's properties. That model is the mathematical foundation that makes adaptive testing possible.
- Difficulty — the share of the norm group who solve the item correctly.
- Discrimination — how sharply the item distinguishes higher- from lower-ability test-takers.
- Bias screening — checking that items don't behave differently for groups matched on ability.
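The two classical item statistics can be sketched in a few lines of Python. The response matrix is made up, and the discrimination measure here is a simple upper-minus-lower-group index (one of several in use; point-biserial correlations are another common choice). For simplicity, each person's total score includes the item being analyzed.

```python
# Hypothetical 0/1 responses: one row per test-taker, one column per item.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
]


def item_difficulty(item: int) -> float:
    """Classical difficulty (p-value): share of test-takers answering correctly."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(item: int) -> float:
    """Upper-lower index: proportion correct in the top half minus the bottom
    half, where the halves are formed by total score on the whole test."""
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    p_upper = sum(row[item] for row in upper) / half
    p_lower = sum(row[item] for row in lower) / half
    return p_upper - p_lower


for item in range(len(responses[0])):
    print(f"item {item}: difficulty={item_difficulty(item):.2f}, "
          f"discrimination={item_discrimination(item):+.2f}")
```

Item 0 is solved by everyone, so its discrimination is zero: it cannot separate stronger from weaker test-takers, which is the "noise rather than signal" case described above.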
Reliability and validity
Two properties decide whether a test is worth trusting. Reliability is consistency: would you get a similar score if you took the test again, or took a parallel form? Well-constructed, individually administered IQ tests are highly reliable (full-scale reliability coefficients above 0.90 are common), though no measurement is perfect. Every score carries a margin of error, which is why results are best read as a range rather than a single exact point.
Validity asks the harder question: does the test actually measure reasoning ability, rather than test-taking practice, cultural familiarity or motivation? Evidence for validity comes from how a score relates to other measures and to real-world outcomes it should predict. A test can be highly reliable yet still not valid for a given purpose, so the two must be evaluated together.
Scores are ranges, not pinpoints
Because of measurement error, a reported IQ of 120 is better understood as 'most likely somewhere around the mid-110s to mid-120s'. Treating a single number as exact overstates what any test can deliver.
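Such a band usually comes from the standard error of measurement, SEM = SD × √(1 − reliability). A minimal sketch, assuming a reliability of 0.95 and a band centered on the observed score (some publishers center it on an estimated true score instead, which shifts it slightly toward 100); `confidence_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def confidence_interval(observed_iq: float, reliability: float,
                        level: float = 0.95, sd: float = 15.0) -> tuple[float, float]:
    """Band around an observed score based on the standard error of measurement."""
    sem = sd * math.sqrt(1 - reliability)          # typical size of measurement error
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for a 95% band
    return observed_iq - z * sem, observed_iq + z * sem


low, high = confidence_interval(120, reliability=0.95)
print(f"{low:.0f}-{high:.0f}")  # 113-127
```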
Computer-adaptive testing
Traditional fixed tests give everyone the same questions. A computer-adaptive test (CAT) instead chooses each next item based on how you have answered so far: get one right and the next is harder; get one wrong and it eases off. The test converges on the difficulty level that best pins down your ability.
The payoff is efficiency and precision. A CAT can reach a confident estimate in fewer items than a fixed form because it skips questions far below or far above your level, which would add little information. The trade-off is that it depends on a well-calibrated item bank and on the IRT models behind it fitting the population being tested.
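A compact sketch of the adaptive loop, assuming a two-parameter logistic (2PL) IRT model, P(correct) = 1 / (1 + e^(−a(θ − b))), where θ is ability, b is item difficulty and a is discrimination. The item bank, the expected a posteriori (EAP) scoring with a standard normal prior, the fixed length of 10 items and the simulated test-taker are all hypothetical choices, not how any particular published test works. Each step picks the unused item with the most Fisher information at the current ability estimate, records the answer, and re-estimates ability.

```python
import math

# Hypothetical calibrated bank: (discrimination a, difficulty b) for 25 items,
# with difficulties from -3.0 to +3.0 on the ability (theta) scale.
ITEM_BANK = [(1.2, k / 4) for k in range(-12, 13)]

# Grid of candidate abilities used for numerical scoring, from -4 to +4.
THETA_GRID = [g / 20 for g in range(-80, 81)]


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL model: probability that a person at ability theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta; it peaks where theta == b."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(answers: list[tuple[tuple[float, float], bool]]) -> float:
    """Expected a posteriori ability, using a standard normal prior on the grid."""
    weights = []
    for theta in THETA_GRID:
        w = math.exp(-0.5 * theta * theta)        # unnormalized N(0, 1) prior
        for (a, b), correct in answers:
            p = p_correct(theta, a, b)
            w *= p if correct else 1.0 - p        # likelihood of each response
        weights.append(w)
    return sum(t * w for t, w in zip(THETA_GRID, weights)) / sum(weights)


def run_cat(answer_item, n_items: int = 10):
    """Administer n_items adaptively; answer_item(a, b) returns True if correct."""
    unused = list(ITEM_BANK)
    answers = []
    theta = 0.0                                   # start at the population average
    for _ in range(n_items):
        # Choose the unused item that is most informative at the current estimate.
        item = max(unused, key=lambda ab: item_information(theta, *ab))
        unused.remove(item)
        answers.append((item, answer_item(*item)))
        theta = eap_estimate(answers)             # re-estimate after every response
    return theta, answers


# Hypothetical noiseless test-taker who solves every item easier than theta = 1.0.
estimate, administered = run_cat(lambda a, b: b < 1.0)
print(f"estimated theta = {estimate:.2f}, about IQ {100 + 15 * estimate:.0f}")
```

Because EAP scoring includes a prior centered on the population average, a short test pulls the estimate somewhat toward the mean; many operational CATs stop once the estimate reaches a target precision rather than after a fixed number of items.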
Whether fixed or adaptive, every legitimate IQ test rests on the same chain: a normed sample, a standardized scale, vetted items, and demonstrated reliability and validity. A test that skips those steps may produce a number, but not a meaningful one.
Frequently asked questions
How is an IQ score calculated?
Your raw performance on the test items is compared to a norm group of people your own age, then converted to a standardized scale with a mean of 100 and a standard deviation of 15. The score describes how far you fall above or below the average of that group, not a count of correct answers.
What is a good standard deviation for an IQ test?
Most modern tests, including the Wechsler scales, use a standard deviation of 15. Some older or specialized tests used 16 or 24, so the same percentile maps to a different number on each scale; always check which scale a score is reported on.
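Converting between these conventions goes through the z-score. A minimal sketch, assuming both scales share a mean of 100 and comparable norm groups; `convert_iq` is a hypothetical helper.

```python
def convert_iq(score: float, from_sd: float, to_sd: float, mean: float = 100.0) -> float:
    """Re-express a score from one SD convention on another via its z-score."""
    z = (score - mean) / from_sd
    return mean + z * to_sd


# A score two standard deviations above the mean on each convention:
print(convert_iq(130, from_sd=15, to_sd=16))  # 132.0
print(convert_iq(130, from_sd=15, to_sd=24))  # 148.0
```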
Are adaptive IQ tests more accurate?
Adaptive tests can reach a precise estimate from fewer questions by tailoring item difficulty to the test-taker, which reduces fatigue and avoids irrelevant items. Their accuracy depends on a well-calibrated item bank; a poorly built adaptive test is not automatically better than a good fixed one.
Why do IQ scores come with a margin of error?
No psychological measurement is perfect, so every score reflects some measurement error. Reputable tests report a confidence interval — for example, a band of several points around the score — to show the range your 'true' ability most likely falls within.
What makes an IQ test valid?
Validity means the test actually measures reasoning ability rather than practice, cultural knowledge or motivation. It is established by showing that scores relate to other established measures and predict relevant outcomes, and it must be evaluated alongside reliability, which measures consistency.
Sources
This guide draws on standard psychometric references and peer-reviewed research:
1. Pearson. Wechsler Adult Intelligence Scale (WAIS) and Wechsler Intelligence Scale for Children (WISC).
2. McGrew, K. S. (2009). “CHC theory and the human cognitive abilities project.” Intelligence, 37(1).
3. Deary, I. J. (2020). Intelligence: A Very Short Introduction (2nd ed.). Oxford University Press.
4. American Psychological Association (APA).
Sources are provided for further reading.