Chapter 08

Readability Formulas

6 formulas Beginner

What you'll learn

How readability formulas estimate the difficulty of a text using simple surface statistics
The difference between syllable-based, character-based, and complex-word-based approaches
How to interpret scores as grade levels and choose the right formula for your use case
How to compute all six formulas by hand and interactively

Introduction

How hard is a piece of text to read? Readability formulas answer this question using nothing more than word counts, sentence lengths, syllable counts, and character counts. They were originally developed in the 1940s–1970s to help educators match texts to students and to ensure government and health documents could be understood by a broad audience.

Despite their simplicity, these formulas remain surprisingly useful. They power readability checkers in word processors, guide plain-language legislation, and serve as baseline features in modern NLP systems. Each formula captures a slightly different aspect of difficulty, and understanding their differences will help you choose the right one for your task.

All six formulas share two core assumptions: longer sentences are harder (more working memory required) and longer words are harder (whether measured by syllables, characters, or a polysyllabic threshold). What varies is how they combine these signals and what scale they report on.

Flesch Reading Ease (FRE)

Think of it this way: The Flesch Reading Ease score tells you how comfortable a text is to read on a scale that nominally runs from 0 to 100 (extremely simple or extremely dense text can fall outside that range). Higher is easier. A score of 70 means most adults can read it without difficulty; a score below 30 typically calls for college-graduate reading skills. It penalizes long sentences and words with many syllables.

Flesch Reading Ease

$$\text{FRE} = 206.835 - 1.015 \cdot \textcolor{#e11d48}{\frac{W}{S}} - 84.6 \cdot \textcolor{#2563eb}{\frac{Sy}{W}}$$

W/S Average number of words per sentence (sentence length)

Sy/W Average number of syllables per word (word complexity)

Worked Example

Given a passage with 200 words, 10 sentences, and 280 total syllables:

Calculate averages:
$\text{W/S} = 200/10 = 20$, $\text{Sy/W} = 280/200 = 1.4$
Plug into the formula:
$\text{FRE} = 206.835 - 1.015 \times 20 - 84.6 \times 1.4$
Compute:
$= 206.835 - 20.3 - 118.44$ = 68.1 (standard, near the top of the 60–70 band)

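Score interpretation: 90–100 = 5th grade (very easy); 80–90 = 6th grade (easy); 70–80 = 7th grade (fairly easy); 60–70 = 8th–9th grade (standard); 50–60 = 10th–12th grade (fairly difficult); 30–50 = college level (difficult); 0–30 = college graduate (very difficult). Most consumer-facing text targets 60–70.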
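The worked example can be checked with a few lines of Python. This is a minimal sketch rather than a reference implementation: the function names `flesch_reading_ease` and `fre_band` are illustrative, and the band lookup follows the commonly cited Flesch table, including the intermediate 70–90 and 50–60 bands.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts (higher = easier)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Lower bound of each band, checked from easiest to hardest.
# Labels follow the commonly cited Flesch table (an assumption of this sketch).
FRE_BANDS = [
    (90.0, "very easy (about 5th grade)"),
    (80.0, "easy (about 6th grade)"),
    (70.0, "fairly easy (about 7th grade)"),
    (60.0, "standard (8th-9th grade)"),
    (50.0, "fairly difficult (10th-12th grade)"),
    (30.0, "difficult (college)"),
]


def fre_band(score: float) -> str:
    """Return the first band whose lower bound the score reaches."""
    for lower, label in FRE_BANDS:
        if score >= lower:
            return label
    return "very difficult (college graduate)"


score = flesch_reading_ease(words=200, sentences=10, syllables=280)
print(round(score, 1), fre_band(score))
```

Working from raw counts keeps the formula separate from tokenization and syllable counting, which is where implementations tend to disagree.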

Flesch-Kincaid Grade Level (FKGL)

Think of it this way: This is the "inverse" of the Flesch Reading Ease — it uses the same two inputs (sentence length and syllable density) but recalibrates the coefficients to output a U.S. school grade level directly. A score of 8.0 means an 8th-grader should be able to understand the text.

Flesch-Kincaid Grade Level

$$\text{FKGL} = 0.39 \cdot \textcolor{#e11d48}{\frac{W}{S}} + 11.8 \cdot \textcolor{#2563eb}{\frac{Sy}{W}} - 15.59$$

W/S Average words per sentence

Sy/W Average syllables per word

Worked Example

Using the same passage: 200 words, 10 sentences, 280 syllables:

Same averages: $\text{W/S} = 20$, $\text{Sy/W} = 1.4$
Plug in:
$\text{FKGL} = 0.39 \times 20 + 11.8 \times 1.4 - 15.59$
Compute:
$= 7.8 + 16.52 - 15.59$ = 8.7 (about 9th grade)

vs. Flesch Reading Ease: FRE gives a 0–100 comfort score (higher = easier). FKGL gives a grade level (higher = harder). They use the same inputs with different coefficients. FKGL is one of the most widely used readability metrics in U.S. education and government.
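To see the two scales move in opposite directions, the sketch below evaluates both formulas on the worked passage and on a hypothetical denser variant. The second set of counts (340 syllables) is invented for illustration and does not come from the text.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


passages = {
    "worked example": (200, 10, 280),   # counts from the text
    "denser variant": (200, 10, 340),   # hypothetical: same length, more syllables
}

for label, counts in passages.items():
    fre = flesch_reading_ease(*counts)
    fkgl = flesch_kincaid_grade(*counts)
    print(f"{label}: FRE={fre:.1f}  FKGL={fkgl:.1f}")
```

Adding syllables while holding sentence length fixed lowers FRE and raises FKGL, which is the sense in which one is the "inverse" of the other.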

Gunning Fog Index

Think of it this way: The Fog Index estimates the years of formal education needed to understand a text on first reading. Instead of counting all syllables, it focuses specifically on "complex words": those with three or more syllables. Gunning's original definition also excludes proper nouns, familiar jargon, and compound words, and does not count common suffixes such as -es, -ed, or -ing as a syllable. The idea is that complex words are the primary barrier to comprehension, not the occasional two-syllable word.

Gunning Fog Index

$$\text{Fog} = 0.4 \times \left(\textcolor{#e11d48}{\frac{W}{S}} + 100 \cdot \textcolor{#059669}{\frac{C_w}{W}}\right)$$

W/S Average words per sentence

$C_w/W$ Proportion of complex words (3+ syllables) to total words

Worked Example

Given 200 words, 10 sentences, and 30 complex words (3+ syllables):

Calculate components:
$\text{W/S} = 200/10 = 20$, $C_w/W = 30/200 = 0.15$
Plug in:
$\text{Fog} = 0.4 \times (20 + 100 \times 0.15)$
Compute:
$= 0.4 \times (20 + 15) = 0.4 \times 35$ = 14.0 (college sophomore)

Interpretation: A Fog Index of 12 means high school senior level. Ideal for wide audiences: aim for 7–8. Major newspapers like the Wall Street Journal typically score around 11. Scores above 17 are considered extremely difficult.
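The formula can also be inverted to ask how many complex words a target score leaves room for. The following is a rough sketch; `max_complex_share` is a hypothetical helper introduced here, not part of the Fog Index itself.

```python
def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index from raw counts."""
    return 0.4 * (words / sentences + 100 * complex_words / words)


def max_complex_share(target_fog: float, words_per_sentence: float) -> float:
    """Largest proportion of complex words that still meets target_fog.

    Solves 0.4 * (wps + 100 * p) = target for p, clamped at zero.
    """
    return max(0.0, (target_fog / 0.4 - words_per_sentence) / 100)


print(gunning_fog(words=200, sentences=10, complex_words=30))
print(max_complex_share(target_fog=8, words_per_sentence=12))
print(max_complex_share(target_fog=8, words_per_sentence=20))
```

Under these assumptions, a target of 8 at 20 words per sentence leaves no room for complex words at all, while 12-word sentences allow roughly 8% of words to be complex. Sentence length and vocabulary trade off directly against each other.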

Coleman-Liau Index (CLI)

Think of it this way: Instead of counting syllables (which requires language-specific rules), Coleman-Liau just counts letters. Longer words tend to be harder regardless of syllable structure. By using average letters per 100 words and average sentences per 100 words, it avoids the syllable-counting problem entirely — making it faster and more language-portable.

Coleman-Liau Index

$$\text{CLI} = 0.0588 \cdot \textcolor{#d97706}{L} - 0.296 \cdot \textcolor{#7c3aed}{S} - 15.8$$

L Average number of letters per 100 words

S Average number of sentences per 100 words

Worked Example

Given 200 words, 10 sentences, and 900 letters:

Scale to per-100-words:
$L = 900 \times 100/200 = 450$, $S = 10 \times 100/200 = 5$
Plug in:
$\text{CLI} = 0.0588 \times 450 - 0.296 \times 5 - 15.8$
Compute:
$= 26.46 - 1.48 - 15.8$ = 9.2 (about 9th grade)

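vs. Flesch-Kincaid: Both output grade levels, but CLI uses characters instead of syllables. This makes CLI easier to compute (no syllable heuristics needed) and more consistent across implementations, since letter counts are unambiguous where syllable heuristics can disagree. The trade-off: it cannot distinguish long monosyllabic words from short polysyllabic ones. For example, "strengths" (nine letters, one syllable) counts as harder than "idea" (four letters, three syllables).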
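A short sketch of the per-100-words scaling, with a letter counter that relies only on `str.isalpha`. The helper name `count_letters` is illustrative.

```python
def count_letters(text: str) -> int:
    """Count alphabetic characters only; spaces, digits, and punctuation are ignored."""
    return sum(ch.isalpha() for ch in text)


def coleman_liau(letters: int, words: int, sentences: int) -> float:
    """Coleman-Liau Index from raw counts."""
    letters_per_100 = letters / words * 100      # L in the formula
    sentences_per_100 = sentences / words * 100  # S in the formula
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


print(round(coleman_liau(letters=900, words=200, sentences=10), 2))
print(count_letters("strengths"), count_letters("idea"))
```

Because the only linguistic decision is what counts as a letter, two implementations are more likely to agree on CLI than on a syllable-based score.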

SMOG Index

Think of it this way: SMOG stands for "Simple Measure of Gobbledygook." It focuses exclusively on polysyllabic words (3+ syllables), treating their frequency as the strongest predictor of text difficulty in health and consumer materials. Because the count enters through a square root, each additional polysyllabic word raises the score less than the one before it. The resulting grade-level estimate tends to run 1–2 grades higher than other formulas.

SMOG Index

$$\text{SMOG} = 1.0430 \times \sqrt{\textcolor{#059669}{P} \times \frac{30}{\textcolor{#7c3aed}{S}}} + 3.1291$$

P Number of polysyllabic words (3+ syllables)

S Number of sentences

Worked Example

Given a passage with 30 sentences and 72 polysyllabic words:

Normalize to 30 sentences:
$P \times 30/S = 72 \times 30/30 = 72$
Take the square root:
$\sqrt{72} = 8.485$
Plug in:
$\text{SMOG} = 1.0430 \times 8.485 + 3.1291$ = 11.98 (about 12th grade)

Conservative by design: SMOG estimates the grade level needed for 100% comprehension, while Flesch-Kincaid estimates 50–75% comprehension. This is why SMOG scores typically run higher. For health literacy assessment, SMOG is widely treated as a preferred measure and is recommended by the National Institutes of Health.
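The 30/S factor rescales the polysyllable count to a 30-sentence sample, so passages of different lengths with the same density should score the same. A minimal sketch, with the 10-sentence passage made up for illustration:

```python
import math


def smog(polysyllables: int, sentences: int) -> float:
    """SMOG grade from a polysyllable count, normalized to 30 sentences."""
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


full_sample = smog(polysyllables=72, sentences=30)   # worked example from the text
short_sample = smog(polysyllables=24, sentences=10)  # hypothetical: same density, one third the length

print(round(full_sample, 2), round(short_sample, 2))
```

The normalization makes short samples computable, but a short sample is also a noisier estimate of the polysyllable density, so the score is less stable.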

Automated Readability Index (ARI)

Think of it this way: ARI was designed for real-time readability monitoring. Like Coleman-Liau, it avoids syllable counting entirely — using raw character counts instead. It was originally developed for typewriter-era automation where counting characters was trivial but counting syllables required manual effort. It outputs an approximate U.S. grade level.

Automated Readability Index

$$\text{ARI} = 4.71 \cdot \textcolor{#0891b2}{\frac{C}{W}} + 0.5 \cdot \textcolor{#e11d48}{\frac{W}{S}} - 21.43$$

C/W Average characters per word

W/S Average words per sentence

Worked Example

Given 200 words, 10 sentences, and 900 characters:

Calculate averages:
$C/W = 900/200 = 4.5$, $W/S = 200/10 = 20$
Plug in:
$\text{ARI} = 4.71 \times 4.5 + 0.5 \times 20 - 21.43$
Compute:
$= 21.195 + 10.0 - 21.43$ = 9.8 (about 10th grade)

Speed advantage: ARI is among the cheapest readability formulas to compute. Like Coleman-Liau, it requires only three counts (characters, words, sentences) and no linguistic analysis. This makes it well suited to real-time feedback in text editors and streaming text analysis.
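Since ARI needs only three running totals, it can be updated incrementally as text arrives. The class below is a hypothetical sketch of that idea, not a standard API. It assumes chunks end on whitespace boundaries, counts letters and digits as characters, and treats any token ending in ".", "!", or "?" as a sentence end.

```python
class RunningARI:
    """Keeps running character, word, and sentence counts for ARI."""

    def __init__(self) -> None:
        self.chars = 0
        self.words = 0
        self.sentences = 0

    def feed(self, chunk: str) -> None:
        """Update the counts with a chunk that ends on a word boundary."""
        for token in chunk.split():
            self.words += 1
            self.chars += sum(ch.isalnum() for ch in token)
            if token[-1] in ".!?":  # naive sentence-end detection
                self.sentences += 1

    def score(self) -> float:
        if self.words == 0 or self.sentences == 0:
            raise ValueError("need at least one complete sentence")
        return (4.71 * (self.chars / self.words)
                + 0.5 * (self.words / self.sentences)
                - 21.43)


ari = RunningARI()
ari.feed("The cat sat. ")
ari.feed("It purred loudly!")
print(ari.words, ari.chars, ari.sentences, round(ari.score(), 3))
```

Very simple text can score below grade 1, or even below zero as here. That is a property of the linear formula rather than a bug in the counting.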

Interactive: Readability Report Card

Paste any text below and get an instant readability assessment across all six formulas. The report card shows each score, its grade-level interpretation, and the raw text statistics used in the calculations.
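A rough offline equivalent of such a report card can be sketched in Python. Everything below rests on stated assumptions: the syllable counter is a crude vowel-group heuristic (real tools use dictionaries or more careful rules), sentences are split on ".", "!", and "?", only alphabetic words are counted, and ARI uses letter counts as its character count. The function names are illustrative.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, drop a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_stats(text: str) -> dict:
    """Raw counts shared by all six formulas."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    syllables = [count_syllables(w) for w in words]
    return {
        "words": len(words),
        "sentences": len(sentences),
        "letters": sum(ch.isalpha() for w in words for ch in w),
        "syllables": sum(syllables),
        "polysyllables": sum(1 for s in syllables if s >= 3),
    }


def report_card(text: str) -> dict:
    """Score a text with all six formulas from this chapter."""
    s = text_stats(text)
    w, sent = s["words"], s["sentences"]
    wps = w / sent                 # words per sentence
    spw = s["syllables"] / w       # syllables per word
    return {
        "FRE": 206.835 - 1.015 * wps - 84.6 * spw,
        "FKGL": 0.39 * wps + 11.8 * spw - 15.59,
        "Fog": 0.4 * (wps + 100 * s["polysyllables"] / w),
        "CLI": 0.0588 * (s["letters"] / w * 100) - 0.296 * (sent / w * 100) - 15.8,
        "SMOG": 1.0430 * math.sqrt(s["polysyllables"] * 30 / sent) + 3.1291,
        "ARI": 4.71 * (s["letters"] / w) + 0.5 * wps - 21.43,
    }


sample = "The cat sat on the mat. It was happy."
print(text_stats(sample))
for name, value in report_card(sample).items():
    print(f"{name}: {value:.2f}")
```

On a two-sentence sample like this one, FRE exceeds 100 and several grade levels come out negative. The formulas were calibrated on longer passages, so scores for very short inputs are best read as rough indications.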

Summary: Comparing the Six Formulas

Formula	Output	Uses Syllables	Uses Characters	Uses Complex Words	Best For
Flesch Reading Ease	0–100 score	Yes	No	No	General readability assessment, legal requirements
Flesch-Kincaid	Grade level	Yes	No	No	Education, government, military documentation
Gunning Fog	Years of education	Indirectly (3+ threshold)	No	Yes	Business writing, journalism
Coleman-Liau	Grade level	No	Yes	No	Large-scale automated analysis
SMOG	Grade level	Indirectly (3+ threshold)	No	Yes	Health materials, conservative estimates
ARI	Grade level	No	Yes	No	Real-time feedback, streaming analysis

Key Takeaway

All six formulas measure the same underlying idea: short sentences with short words are easier to read. They differ in which proxy they use for "word difficulty" (syllables, characters, or a 3+ syllable threshold) and what scale they report on. For the most robust assessment, compute several and look at the consensus rather than relying on any single formula.
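One simple way to read off a consensus is to take the median of the grade-level scores and report their spread. The sketch below reuses the values from this chapter's worked examples, assuming they all describe the same 200-word passage. FRE is left out because it is not a grade scale, and SMOG because its example uses a different passage. The helper name `consensus_grade` is hypothetical.

```python
from statistics import median


def consensus_grade(scores: dict) -> tuple:
    """Return (median grade, spread between highest and lowest score)."""
    values = list(scores.values())
    return median(values), max(values) - min(values)


# Grade-level results from the worked examples (200 words, 10 sentences).
grade_scores = {"FKGL": 8.73, "Fog": 14.0, "CLI": 9.18, "ARI": 9.765}

mid, spread = consensus_grade(grade_scores)
print(f"consensus grade ~ {mid:.1f}, spread = {spread:.1f}")
```

The median is less sensitive than the mean to a single outlying formula (here the Fog Index), and a large spread is itself a signal that the formulas disagree about the text.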