PMIPlus

An interactive manual for text analysis formulas. From word association to transformers — understand the math that powers NLP.

53+ Formulas
12 Chapters
20+ Interactive Demos
Chapter 01
PMI and Its Variants
Start here. Learn how Pointwise Mutual Information reveals hidden word associations, and explore its many variants.
PMI, PPMI, NPMI, PMI², PMI^k, Shifted PMI
Beginner
Chapter 02
Co-occurrence & Association
Beyond PMI: statistical tests and measures that quantify how strongly words attract each other.
Log-Likelihood, Chi², Dice, Jaccard, t-score, z-score, Log-Dice, MI, Fisher's Exact Test
Beginner
Chapter 03
TF-IDF Family
How important is a word to a document? TF-IDF and BM25 answer this, and they underpin the ranking in many search engines.
TF-IDF, BM25, TF-ICF
Beginner
Chapter 04
Information Theory
Shannon's gift to NLP: entropy, cross-entropy, and divergence — the language of uncertainty.
Entropy, Cross-Entropy, KL Divergence, JS Divergence, Perplexity
Intermediate
Chapter 05
Similarity & Distance
How close are two texts? Four ways to measure it — from the angle between vectors to city-block distance.
Cosine Similarity, Euclidean, Manhattan, Minkowski
Intermediate
Chapter 06
Language Models
Predicting the next word: n-gram models and the smoothing tricks that make them work.
N-gram Prob, Laplace, Good-Turing, Kneser-Ney, Katz Back-off
Intermediate
Chapter 07
Word Embeddings
Words as vectors: how Word2Vec and GloVe learn that "king - man + woman ≈ queen".
Word2Vec CBOW, Skip-gram, GloVe
Intermediate
Chapter 08
Readability Formulas
Is your text too complex? Six classic formulas that grade reading difficulty.
Flesch Reading Ease, Flesch-Kincaid, Gunning Fog, Coleman-Liau, SMOG, ARI
Beginner
Chapter 09
Lexical Diversity
How rich is a vocabulary? Measures from the simple type-token ratio to sophisticated statistical indices.
TTR, Hapax, Yule's K, Simpson's D, MTLD, MATTR, vocd-D
Intermediate
Chapter 10
Sentiment Analysis
Is this text positive or negative? Lexicon-based approaches that score sentiment without machine learning.
VADER, SentiWordNet
Intermediate
Chapter 11
IR Evaluation Metrics
How good is your search? Precision, recall, and ranking metrics that judge retrieval quality.
Precision@K, Recall@K, F1, MAP, NDCG
Intermediate
Appendix
The Road to Transformers
How everything connects: from PMI to attention. The conceptual bridge to modern deep learning.
Softmax, Dot-Product Attention, Self-Attention, Positional Encoding, Multi-Head Attention
Advanced