Model Selection and Multimodel Inference---A Practical Information-Theoretic Approach (2002) by Kenneth P. Burnham, David R. Anderson

@book{burnham-and-anderson-2002, address = {New York, Berlin, Heidelberg}, author = {Burnham, Kenneth P. and Anderson, David R.}, edition = {Second}, isbn = {0-387-95364-7}, keywords = {model-selection, statistics}, publisher = {Springer}, title = {Model Selection and Multimodel {Inference---A} Practical {Information-Theoretic} Approach}, year = {2002} }


Kullback-Leibler divergence can be used **heuristically** as a distance between probability distributions.

Kullback-Leibler divergence $D_{KL}(P,Q)$ between probability distributions $P$ and $Q$ can be interpreted as the information lost when approximating $P$ by $Q$.

For discrete probability distributions $P,Q$ with the set of outcomes $E$, Kullback-Leibler divergence is defined as $$D_{KL}(P,Q)=\sum_{e\in E} P(e)\log\left(\frac{P(e)}{Q(e)}\right).$$
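The sum above translates directly into code. Below is a minimal sketch in Python (the function name `kl_divergence` and the dict-based representation of distributions are illustrative choices, not from the source); it uses the natural logarithm, so the result is in nats, and it follows the convention that terms with $P(e)=0$ contribute zero.

```python
import math

def kl_divergence(p, q):
    """D_KL(P, Q) for discrete distributions given as dicts
    mapping each outcome e to its probability.

    Terms with p[e] == 0 are skipped (0 * log(0/q) is taken as 0).
    Assumes q[e] > 0 wherever p[e] > 0, otherwise the divergence
    is infinite.
    """
    total = 0.0
    for e, pe in p.items():
        if pe > 0:
            total += pe * math.log(pe / q[e])
    return total

# D_KL is zero when the distributions coincide, positive otherwise,
# and asymmetric: D_KL(P, Q) != D_KL(Q, P) in general.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}
print(kl_divergence(p, p))  # 0.0
print(kl_divergence(p, q))
print(kl_divergence(q, p))
```

Note that this illustrates why KL divergence is only *heuristically* a distance: it is non-negative and vanishes exactly when $P=Q$, but it is not symmetric and does not satisfy the triangle inequality.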