Readability Formulas
What you'll learn
- How readability formulas estimate the difficulty of a text using simple surface statistics
- The difference between syllable-based, character-based, and complex-word-based approaches
- How to interpret scores as grade levels and choose the right formula for your use case
- How to compute all six formulas by hand and interactively
Introduction
How hard is a piece of text to read? Readability formulas answer this question using nothing more than word counts, sentence lengths, syllable counts, and character counts. They were originally developed in the 1940s–1970s to help educators match texts to students and to ensure government and health documents could be understood by a broad audience.
Despite their simplicity, these formulas remain surprisingly useful. They power readability checkers in word processors, guide plain-language legislation, and serve as baseline features in modern NLP systems. Each formula captures a slightly different aspect of difficulty, and understanding their differences will help you choose the right one for your task.
All six formulas share two core assumptions: longer sentences are harder (more working memory required) and longer words are harder (whether measured by syllables, characters, or a polysyllabic threshold). What varies is how they combine these signals and what scale they report on.
Flesch Reading Ease (FRE)
Think of it this way: The Flesch Reading Ease score tells you how comfortable a text is to read on a scale that nominally runs from 0 to 100 (extreme texts can fall outside that range). Higher is easier. A score of 70 means most adults can read it without difficulty; a score below 30 usually calls for college-graduate reading skills. The formula penalizes long sentences and words with many syllables.
Worked Example
Given a passage with 200 words, 10 sentences, and 280 total syllables:
- Calculate averages: \(\text{W/S} = 200/10 = 20\), \(\text{Sy/W} = 280/200 = 1.4\)
- Plug into the formula: \(\text{FRE} = 206.835 - 1.015 \times 20 - 84.6 \times 1.4\)
- Compute: \(\text{FRE} = 206.835 - 20.3 - 118.44 \approx 68.1\) (standard / fairly easy)
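The same arithmetic can be sketched in Python. The function name `flesch_reading_ease` and its parameters are illustrative choices, not part of any particular library; the coefficients are the ones used in the worked example above.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts (higher means easier)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences   # W/S
    syllables_per_word = syllables / words   # Sy/W
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# The passage from the worked example: 200 words, 10 sentences, 280 syllables.
score = flesch_reading_ease(words=200, sentences=10, syllables=280)
print(round(score, 1))  # 68.1
```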
Flesch-Kincaid Grade Level (FKGL)
Think of it this way: This is the grade-level counterpart of the Flesch Reading Ease. It uses the same two inputs (sentence length and syllable density) but recalibrates the coefficients and flips their signs, so the output is a U.S. school grade level and higher means harder. A score of 8.0 means an 8th-grader should be able to understand the text.
Worked Example
Using the same passage: 200 words, 10 sentences, 280 syllables:
- Same averages: \(\text{W/S} = 20\), \(\text{Sy/W} = 1.4\)
- Plug in: \(\text{FKGL} = 0.39 \times 20 + 11.8 \times 1.4 - 15.59\)
- Compute: \(\text{FKGL} = 7.8 + 16.52 - 15.59 \approx 8.7\) (about 9th grade)
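A minimal Python sketch of this calculation follows. The name `flesch_kincaid_grade` is a hypothetical helper chosen for illustration; it takes the same three counts as the worked example.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (higher means harder)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences   # W/S
    syllables_per_word = syllables / words   # Sy/W
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Same passage as before: 200 words, 10 sentences, 280 syllables.
grade = flesch_kincaid_grade(words=200, sentences=10, syllables=280)
print(round(grade, 1))  # 8.7
```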
Gunning Fog Index
Think of it this way: The Fog Index estimates the years of formal education needed to understand a text on first reading. Instead of counting all syllables, it focuses specifically on "complex words" — those with three or more syllables. The idea is that complex words are the primary barrier to comprehension, not the occasional two-syllable word.
Worked Example
Given 200 words, 10 sentences, and 30 complex words (3+ syllables):
- Calculate components: \(\text{W/S} = 200/10 = 20\), \(C_w/W = 30/200 = 0.15\)
- Plug in: \(\text{Fog} = 0.4 \times (20 + 100 \times 0.15)\)
- Compute: \(\text{Fog} = 0.4 \times (20 + 15) = 0.4 \times 35 = 14.0\) (college sophomore)
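As a rough sketch, the Fog calculation can be written as a small Python function. The name `gunning_fog` is illustrative, and the function assumes the complex-word count (3+ syllables) has already been obtained by some other means.

```python
def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index: estimated years of formal education needed."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences            # W/S
    percent_complex = 100 * complex_words / words     # 100 * C_w / W
    return 0.4 * (words_per_sentence + percent_complex)


# Worked example: 200 words, 10 sentences, 30 complex words.
fog = gunning_fog(words=200, sentences=10, complex_words=30)
print(round(fog, 1))  # 14.0
```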
Coleman-Liau Index (CLI)
Think of it this way: Instead of counting syllables (which requires language-specific rules), Coleman-Liau just counts letters. Longer words tend to be harder regardless of syllable structure. By using average letters per 100 words and average sentences per 100 words, it avoids the syllable-counting problem entirely — making it faster and more language-portable.
Worked Example
Given 200 words, 10 sentences, and 900 letters:
- Scale to per-100-words: \(L = 900 \times 100/200 = 450\), \(S = 10 \times 100/200 = 5\)
- Plug in: \(\text{CLI} = 0.0588 \times 450 - 0.296 \times 5 - 15.8\)
- Compute: \(\text{CLI} = 26.46 - 1.48 - 15.8 \approx 9.2\) (about 9th grade)
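The per-100-words scaling is easy to get wrong by hand, so a short Python sketch may help. The function name `coleman_liau` is a hypothetical choice; the inputs are the raw counts from the worked example.

```python
def coleman_liau(letters: int, words: int, sentences: int) -> float:
    """Coleman-Liau Index from raw letter, word, and sentence counts."""
    if words <= 0:
        raise ValueError("words must be positive")
    letters_per_100 = letters * 100 / words      # L
    sentences_per_100 = sentences * 100 / words  # S
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


# Worked example: 900 letters, 200 words, 10 sentences.
cli = coleman_liau(letters=900, words=200, sentences=10)
print(round(cli, 1))  # 9.2
```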
SMOG Index
Think of it this way: SMOG stands for "Simple Measure of Gobbledygook." It counts only polysyllabic words (3+ syllables), normalized to a 30-sentence sample, on the premise that these words are the strongest predictor of text difficulty in health and consumer materials. The square root compresses large counts, so a few extra polysyllabic words in an already dense text move the score only slightly. The resulting grade-level estimate tends to run 1–2 grades higher than other formulas.
Worked Example
Given a passage with 30 sentences and 72 polysyllabic words:
- Normalize to 30 sentences: \(P \times 30/S = 72 \times 30/30 = 72\)
- Take the square root: \(\sqrt{72} \approx 8.485\)
- Plug in: \(\text{SMOG} = 1.0430 \times 8.485 + 3.1291 \approx 11.98\) (about 12th grade)
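The steps above can be sketched in Python as follows. The name `smog_index` is illustrative, and the function assumes the polysyllabic-word count is supplied by the caller.

```python
import math


def smog_index(polysyllables: int, sentences: int) -> float:
    """SMOG grade from a polysyllabic-word count and a sentence count."""
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    normalized = polysyllables * 30 / sentences  # scale to a 30-sentence sample
    return 1.0430 * math.sqrt(normalized) + 3.1291


# Worked example: 72 polysyllabic words in 30 sentences.
smog = smog_index(polysyllables=72, sentences=30)
print(round(smog, 2))  # 11.98
```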
Automated Readability Index (ARI)
Think of it this way: ARI was designed for real-time readability monitoring. Like Coleman-Liau, it avoids syllable counting entirely — using raw character counts instead. It was originally developed for typewriter-era automation where counting characters was trivial but counting syllables required manual effort. It outputs an approximate U.S. grade level.
Worked Example
Given 200 words, 10 sentences, and 900 characters:
- Calculate averages: \(C/W = 900/200 = 4.5\), \(W/S = 200/10 = 20\)
- Plug in: \(\text{ARI} = 4.71 \times 4.5 + 0.5 \times 20 - 21.43\)
- Compute: \(\text{ARI} = 21.195 + 10.0 - 21.43 \approx 9.8\) (about 10th grade)
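A minimal Python sketch of the ARI calculation is shown below. The function name `automated_readability_index` is a hypothetical helper, and it leaves open exactly which characters are counted; the caller passes the total.

```python
def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """ARI from raw character, word, and sentence counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    chars_per_word = characters / words      # C/W
    words_per_sentence = words / sentences   # W/S
    return 4.71 * chars_per_word + 0.5 * words_per_sentence - 21.43


# Worked example: 900 characters, 200 words, 10 sentences.
ari = automated_readability_index(characters=900, words=200, sentences=10)
print(round(ari, 1))  # 9.8
```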
Interactive: Readability Report Card
Paste any text below and get an instant readability assessment across all six formulas. The report card shows each score, its grade-level interpretation, and the raw text statistics used in the calculations.
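Outside the interactive page, a rough equivalent of the report card can be sketched in Python. Everything here is an assumption made for illustration: the helper names (`count_syllables`, `text_stats`, `report_card`), the naive sentence and word splitting, and especially the vowel-group syllable heuristic, which will miscount some English words. Letters are used as the character count for both Coleman-Liau and ARI.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Heuristic (assumption): one syllable per vowel group, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Raw counts used by the formulas, with naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text needs at least one word and one sentence")
    syllables = [count_syllables(w) for w in words]
    return {
        "words": len(words),
        "sentences": len(sentences),
        "letters": sum(ch.isalpha() for w in words for ch in w),
        "syllables": sum(syllables),
        "complex_words": sum(n >= 3 for n in syllables),
    }


def report_card(text: str) -> dict:
    """Compute all six scores from the text statistics, rounded to one decimal."""
    s = text_stats(text)
    w, n, letters = s["words"], s["sentences"], s["letters"]
    wps = w / n                   # words per sentence
    spw = s["syllables"] / w      # syllables per word
    scores = {
        "Flesch Reading Ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "Flesch-Kincaid": 0.39 * wps + 11.8 * spw - 15.59,
        "Gunning Fog": 0.4 * (wps + 100 * s["complex_words"] / w),
        "Coleman-Liau": 0.0588 * (100 * letters / w) - 0.296 * (100 * n / w) - 15.8,
        "SMOG": 1.0430 * math.sqrt(s["complex_words"] * 30 / n) + 3.1291,
        "ARI": 4.71 * (letters / w) + 0.5 * wps - 21.43,
    }
    return {"stats": s, "scores": {k: round(v, 1) for k, v in scores.items()}}


card = report_card("The cat sat on the mat. The dog ran.")
for name, value in card["scores"].items():
    print(f"{name}: {value}")
```

Very short or very simple inputs, like the sample sentence here, can produce scores outside the usual ranges (a Reading Ease above 100 or a negative grade level), which is expected behavior for these formulas rather than a bug in the sketch.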
Summary: Comparing the Six Formulas
| Formula | Output | Uses Syllables | Uses Characters | Uses Complex Words | Best For |
|---|---|---|---|---|---|
| Flesch Reading Ease | 0–100 score | Yes | No | No | General readability assessment, legal requirements |
| Flesch-Kincaid | Grade level | Yes | No | No | Education, government, military documentation |
| Gunning Fog | Years of education | Indirectly (3+ threshold) | No | Yes | Business writing, journalism |
| Coleman-Liau | Grade level | No | Yes | No | Large-scale automated analysis |
| SMOG | Grade level | Indirectly (3+ threshold) | No | Yes | Health materials, conservative estimates |
| ARI | Grade level | No | Yes | No | Real-time feedback, streaming analysis |
All six formulas measure the same underlying idea: short sentences with short words are easier to read. They differ in which proxy they use for "word difficulty" (syllables, characters, or a 3+ syllable threshold) and what scale they report on. For the most robust assessment, compute several and look at the consensus rather than relying on any single formula.
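One simple way to read a "consensus" is to take the median of the grade-level scores and report how far apart they are. This is only a sketch under that assumption; the text does not prescribe a specific method, and the name `consensus_grade` is hypothetical. Flesch Reading Ease is left out because it is not on a grade-level scale. The input values are the results of the worked examples, used purely for illustration (they do not all describe the same passage).

```python
from statistics import median


def consensus_grade(grade_scores: dict) -> tuple:
    """Return (median grade, spread between highest and lowest score)."""
    values = list(grade_scores.values())
    if not values:
        raise ValueError("at least one score is required")
    return median(values), max(values) - min(values)


# Illustrative grade-level scores taken from the worked examples.
grade_scores = {
    "Flesch-Kincaid": 8.7,
    "Gunning Fog": 14.0,
    "Coleman-Liau": 9.2,
    "SMOG": 11.98,
    "ARI": 9.8,
}

mid, spread = consensus_grade(grade_scores)
print(f"median grade: {mid}, spread: {spread:.1f}")
```

A large spread is a hint that the formulas disagree about the text, in which case the individual scores deserve a closer look than the median alone.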