# Source coding theorem

## Statements

### Source coding theorem

As N → ∞, N independent and identically distributed random variables, each with entropy H(X), can be compressed into slightly more than N·H(X) bits with negligible risk of information loss; conversely, if they are compressed into fewer than N·H(X) bits, it is virtually certain that information will be lost.
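As a quick numerical illustration of the compression limit N·H(X) (the biased-coin distribution below is an arbitrary example, not taken from the text):

```python
from math import log2

def entropy(probs):
    """Shannon entropy H(X) in bits per symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A biased coin: P(heads) = 0.9, P(tails) = 0.1.
H = entropy([0.9, 0.1])  # ≈ 0.469 bits per symbol
N = 1_000_000

# The theorem says ~N * H(X) bits suffice asymptotically,
# far fewer than the N bits of the raw sequence.
print(f"H(X) = {H:.3f} bits/symbol")
print(f"Compression limit for N = {N}: about {N * H:,.0f} bits")
```

A fair coin gives H(X) = 1 bit per symbol, in which case no compression below N bits is possible.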

### Source coding theorem for symbol codes

Let Σ1 and Σ2 denote two finite alphabets, and let Σ1∗ and Σ2∗ denote the sets of all finite words over those alphabets, respectively.

Let X be a random variable taking values in Σ1, and let f be a uniquely decodable code from Σ1∗ to Σ2∗, where |Σ2| = a. Let S denote the random variable given by the length of the codeword f(X). If f is optimal in the sense that it minimizes the expected word length for X, then:
${\displaystyle {\frac {H(X)}{\log _{2}a}}\leq \mathbb {E} S<{\frac {H(X)}{\log _{2}a}}+1}$
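The bound can be checked numerically for an optimal binary code (a = 2, so log₂ a = 1). The sketch below builds a Huffman code, which is optimal in expected length; the distribution is an arbitrary example, not from the text:

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given
    probability list, built by the standard greedy merge."""
    # Heap items: (probability, tie-breaker, symbol indices in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # each merge adds one bit to every symbol below it
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]         # a dyadic example distribution
H = -sum(p * log2(p) for p in probs)       # H(X) in bits
lengths = huffman_lengths(probs)
ES = sum(p * s for p, s in zip(probs, lengths))  # expected length E S

assert H <= ES < H + 1   # the theorem's bound with a = 2
```

For this dyadic distribution the code lengths are exactly −log₂ pᵢ, so E S equals H(X) and the lower bound is attained with equality.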

## Proof: source coding theorem for symbol codes

For 1 ≤ i ≤ n, let sᵢ denote the word length of the codeword assigned to each possible xᵢ. Define ${\displaystyle q_{i}=a^{-s_{i}}/C}$, where

${\displaystyle C=\sum _{i=1}^{n}a^{-s_{i}}\leq 1}$

by Kraft's inequality, so that the qᵢ form a probability distribution. Then:

${\displaystyle {\begin{aligned}H(X)&=-\sum _{i=1}^{n}p_{i}\log _{2}p_{i}\\&\leq -\sum _{i=1}^{n}p_{i}\log _{2}q_{i}\\&=-\sum _{i=1}^{n}p_{i}\log _{2}a^{-s_{i}}+\sum _{i=1}^{n}p_{i}\log _{2}C\\&=-\sum _{i=1}^{n}p_{i}\log _{2}a^{-s_{i}}+\log _{2}C\\&\leq -\sum _{i=1}^{n}p_{i}\log _{2}a^{-s_{i}}\\&=\sum _{i=1}^{n}s_{i}p_{i}\log _{2}a\\&=\mathbb {E} S\log _{2}a\end{aligned}}}$

where the second line follows from Gibbs' inequality and the fifth line follows from ${\displaystyle \log _{2}C\leq 0}$, since C ≤ 1. This establishes the lower bound ${\displaystyle H(X)/\log _{2}a\leq \mathbb {E} S}$.

For the upper bound, choose the codeword lengths

${\displaystyle s_{i}=\lceil -\log _{a}p_{i}\rceil }$

so that

${\displaystyle -\log _{a}p_{i}\leq s_{i}<-\log _{a}p_{i}+1}$

The first inequality gives ${\displaystyle a^{-s_{i}}\leq p_{i}}$, and therefore

${\displaystyle \sum _{i=1}^{n}a^{-s_{i}}\leq \sum _{i=1}^{n}p_{i}=1}$

so by Kraft's inequality a prefix-free code with these word lengths exists. The optimal code can do no worse, so its expected length satisfies

${\displaystyle {\begin{aligned}\mathbb {E} S&=\sum _{i=1}^{n}p_{i}s_{i}\\&<\sum _{i=1}^{n}p_{i}\left(-\log _{a}p_{i}+1\right)\\&=-\sum _{i=1}^{n}p_{i}{\frac {\log _{2}p_{i}}{\log _{2}a}}+1\\&={\frac {H(X)}{\log _{2}a}}+1\end{aligned}}}$
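The upper-bound construction can be checked directly: pick the lengths sᵢ = ⌈−log_a pᵢ⌉, confirm Kraft's inequality, and confirm E S < H(X)/log₂ a + 1. A minimal sketch for a = 2 (the distribution is an arbitrary example):

```python
from math import ceil, log2

probs = [0.4, 0.3, 0.2, 0.1]
a = 2  # binary code alphabet

# Shannon code lengths s_i = ceil(-log_a p_i)
lengths = [ceil(-log2(p)) for p in probs]

# Kraft's inequality holds, so a prefix-free code with these lengths exists.
kraft = sum(a ** -s for s in lengths)
assert kraft <= 1

H = -sum(p * log2(p) for p in probs)
ES = sum(p * s for p, s in zip(probs, lengths))
assert ES < H / log2(a) + 1  # the upper bound from the proof
```

Note that these lengths are generally not optimal (a Huffman code may do better), but they always satisfy the bound, which is all the proof requires.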

## References

1. C. E. Shannon, "A Mathematical Theory of Communication", Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July, October 1948.
2. David J. C. MacKay. *Information Theory, Inference, and Learning Algorithms*. Cambridge: Cambridge University Press, 2003. ISBN 0-521-64298-1.
3. Cover, Thomas M. "Chapter 5: Data Compression". *Elements of Information Theory*. John Wiley & Sons, 2006. ISBN 0-471-24195-4.