PMIPlus

An interactive manual for text analysis formulas. From word association to transformers — understand the math that powers NLP.

53+ Formulas

12 Chapters

20+ Interactive Demos

PMI and Its Variants

Start here. Learn how Pointwise Mutual Information reveals hidden word associations, and explore its many variants.

PMI, PPMI, NPMI, PMI², PMI^k, Shifted PMI
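
As a quick taste of this chapter, here is a minimal sketch of PMI, PPMI and NPMI computed from raw co-occurrence counts. The function and parameter names are illustrative assumptions, not taken from the manual, and the base-2 logarithm is one common convention.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI in bits: log2( p(x, y) / (p(x) * p(y)) ). Assumes all counts > 0."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

def ppmi(count_xy, count_x, count_y, total):
    """Positive PMI: negative associations are clipped to zero."""
    return max(0.0, pmi(count_xy, count_x, count_y, total))

def npmi(count_xy, count_x, count_y, total):
    """Normalized PMI in [-1, 1]. Undefined when count_xy == total."""
    return pmi(count_xy, count_x, count_y, total) / -math.log2(count_xy / total)
```

With hypothetical counts where two words each occur 16 times in 64 windows and always together, pmi(16, 16, 16, 64) is 2 bits and npmi reaches its maximum of 1.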

Co-occurrence & Association

Beyond PMI: statistical tests and measures that quantify how strongly words attract each other.

Log-Likelihood, Chi², Dice, Jaccard, t-score, z-score, Log-Dice, MI, Fisher's
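
Two of the simpler measures in this chapter, Dice and Jaccard, can be sketched directly from frequency counts. The names f_xy, f_x and f_y are assumptions for illustration: the co-occurrence frequency and the two individual word frequencies.

```python
def dice(f_xy, f_x, f_y):
    """Dice coefficient: 2 * f(x, y) / (f(x) + f(y))."""
    return 2 * f_xy / (f_x + f_y)

def jaccard(f_xy, f_x, f_y):
    """Jaccard index: f(x, y) / (f(x) + f(y) - f(x, y))."""
    return f_xy / (f_x + f_y - f_xy)
```

Both scores range from 0 (the words never co-occur) to 1 (they only ever appear together).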

How important is a word to a document? TF-IDF and BM25 answer this question, and variants of them underpin many search engines.

TF-IDF, BM25, TF-ICF
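
TF-IDF has many variants. The sketch below assumes one simple form: relative term frequency multiplied by the natural log of (number of documents / document frequency). The function name and the list-of-token-lists corpus layout are illustrative assumptions.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """TF-IDF of `term` in `doc` (a list of tokens), given `corpus` (a list of docs)."""
    tf = Counter(doc)[term] / len(doc)          # relative term frequency
    df = sum(1 for d in corpus if term in d)    # number of docs containing the term
    if df == 0:
        return 0.0                              # term never seen in the corpus
    return tf * math.log(len(corpus) / df)
```

Under this variant a word that appears in every document scores exactly zero, which is how TF-IDF discounts words like "the".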

Information Theory

Shannon's gift to NLP: entropy, cross-entropy, and divergence — the language of uncertainty.

Entropy, Cross-Entropy, KL Divergence, JS Divergence, Perplexity
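
The quantities in this chapter are closely related, which a short sketch makes visible. It assumes distributions given as lists of probabilities, base-2 logarithms, and that q is nonzero wherever p is nonzero; the function names are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits. Assumes q[i] > 0 wherever p[i] > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = H(p, q) - H(p)."""
    return cross_entropy(p, q) - entropy(p)

def perplexity(p):
    """Perplexity of a distribution: 2 raised to its entropy."""
    return 2 ** entropy(p)
```

A fair coin has 1 bit of entropy, and a uniform distribution over four outcomes has a perplexity of 4.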

Similarity & Distance

How close are two texts? Four ways to measure it — from the angle between vectors to city-block distance.

Cosine Similarity, Euclidean, Manhattan, Minkowski
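
The four measures reduce to two small functions, since Manhattan and Euclidean distance are the p = 1 and p = 2 cases of Minkowski distance. This is a minimal sketch on plain Python lists; it assumes equal-length, nonzero vectors, and the function names are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (both assumed nonzero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def minkowski(a, b, p):
    """Minkowski distance of order p: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)
```

Between the points (0, 0) and (3, 4), the Manhattan distance is 7 and the Euclidean distance is 5.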

Language Models

Predicting the next word: n-gram models and the smoothing tricks that make them work.

N-gram Prob, Laplace, Good-Turing, Kneser-Ney, Katz Back-off
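
Laplace (add-one) smoothing is the simplest of these tricks. The sketch below assumes a bigram model over a token list, with the vocabulary taken from the tokens themselves; the function name is hypothetical. The denominator counts how often `prev` occurs as a bigram context, so the probabilities for a given context sum to 1.

```python
from collections import Counter

def laplace_bigram_prob(tokens, prev, word):
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(set(tokens))
    bigrams = Counter(zip(tokens, tokens[1:]))
    # How many times `prev` is followed by any word.
    context_count = sum(c for (first, _), c in bigrams.items() if first == prev)
    return (bigrams[(prev, word)] + 1) / (context_count + vocab_size)
```

For the tokens of "the cat sat on the mat", P(cat | the) is (1 + 1) / (2 + 5) = 2/7, while the unseen pair "the sat" still gets 1/7 rather than zero.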

Word Embeddings

Words as vectors: how Word2Vec and GloVe learn embeddings in which "king - man + woman" lands close to "queen".

Word2Vec CBOW, Skip-gram, GloVe

Readability Formulas

Is your text too complex? Six classic formulas that grade reading difficulty.

Flesch Reading Ease, Flesch-Kincaid, Gunning Fog, Coleman-Liau, SMOG, ARI
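
The two Flesch formulas use the same inputs with different coefficients. This sketch takes word, sentence and syllable counts as arguments, which sidesteps syllable counting (the hard part in practice); the function names are illustrative.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

For a hypothetical passage of 100 words, 5 sentences and 150 syllables, the reading ease is about 59.6 and the grade level about 9.9.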

Lexical Diversity

How rich is a vocabulary? Measures from the simple type-token ratio to sophisticated statistical indices.

TTR, Hapax, Yule's K, Simpson's D, MTLD, MATTR, vocd-D

Sentiment Analysis

Is this text positive or negative? Lexicon-based approaches that score sentiment from word lists and rules, without training a classifier.

VADER, SentiWordNet

IR Evaluation Metrics

How good is your search? Precision, recall, and ranking metrics that judge retrieval quality.

Precision@K, Recall@K, F1, MAP, NDCG
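
Three of these metrics can be sketched in a few lines. The code assumes binary relevance, a ranked list of document IDs and a non-empty set of relevant IDs; the names are illustrative. Average precision is normalized here by the total number of relevant documents, which is one common convention; MAP is its mean over queries.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top-k results."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of precision@rank taken at each rank where a relevant doc appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)
```

With the ranking d1, d2, d3, d4 and relevant set {d1, d3, d5}, precision@2 is 1/2, recall@2 is 1/3, and average precision is (1 + 2/3) / 3.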

The Road to Transformers

How everything connects: from PMI to attention. The conceptual bridge to modern deep learning.

Softmax, Dot-Product Attention, Self-Attention, Positional Encoding, Multi-Head Attention
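
To make the bridge concrete, here is a minimal pure-Python sketch of softmax and dot-product attention for a single query. It assumes the scaled variant, where scores are divided by the square root of the vector dimension; the function names and list-based vectors are illustrative.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    peak = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted average of the value vectors.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

When every key matches the query equally well, the weights are uniform and the output is simply the mean of the value vectors.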