# Language model

## Model types

### Unigram

| Term | Probability |
| --- | --- |
| a | 0.1 |
| world | 0.2 |
| likes | 0.05 |
| we | 0.05 |
| share | 0.3 |
| ... | ... |

${\displaystyle \sum _{\text{term in doc}}P({\text{term}})=1}$

${\displaystyle P({\text{query}})=\prod _{\text{term in query}}P({\text{term}})}$
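As a rough illustration of the product formula above, the following Python sketch scores a query under a unigram model. It uses only the five probabilities listed above; the elided rows are left out, so the dictionary is deliberately incomplete and does not sum to 1. The function name `query_probability` and the choice to give an unlisted term probability 0.0 are assumptions made for this example.

```python
import math

# Unigram probabilities for the terms listed above (elided rows omitted).
unigram = {"a": 0.1, "world": 0.2, "likes": 0.05, "we": 0.05, "share": 0.3}


def query_probability(query, model):
    """Multiply the unigram probabilities of the whitespace-separated query terms."""
    # Assumption: a term missing from the model gets probability 0.0.
    return math.prod(model.get(term, 0.0) for term in query.split())


p = query_probability("we share", unigram)
print(p)  # product of P(we) = 0.05 and P(share) = 0.3
```

Because the model ignores word order, `"we share"` and `"share we"` receive the same score.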

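| Term | Probability in document 1 | Probability in document 2 |
| --- | --- | --- |
| a | 0.1 | 0.3 |
| world | 0.2 | 0.1 |
| likes | 0.05 | 0.03 |
| we | 0.05 | 0.02 |
| share | 0.3 | 0.2 |
| ... | ... | ... |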
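One plausible use of two such models is to ask which one is more likely to have generated a given query. The sketch below assumes the two probability columns above belong to two separate documents, labeled `doc1` and `doc2` here for convenience, and compares them by summed log-probabilities, which orders them the same way as the product formula while avoiding underflow on long queries. The function names are hypothetical.

```python
import math

# Per-document unigram models, copied from the two probability columns above.
# The labels "doc1" and "doc2" are assumed names for the two columns.
models = {
    "doc1": {"a": 0.1, "world": 0.2, "likes": 0.05, "we": 0.05, "share": 0.3},
    "doc2": {"a": 0.3, "world": 0.1, "likes": 0.03, "we": 0.02, "share": 0.2},
}


def log_likelihood(query, model):
    """Sum of log-probabilities of the query terms; -inf if any term is unlisted."""
    total = 0.0
    for term in query.split():
        p = model.get(term, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def rank(query, models):
    """Return model names sorted from most to least likely to generate the query."""
    return sorted(
        models,
        key=lambda name: log_likelihood(query, models[name]),
        reverse=True,
    )


print(rank("we share", models))  # ['doc1', 'doc2']
```

For the query `"we share"` the first model scores 0.05 × 0.3 and the second 0.02 × 0.2, so the first ranks higher; for the single-term query `"a"` the order flips.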

### n-gram

${\displaystyle P(w_{1},\ldots ,w_{m})=\prod _{i=1}^{m}P(w_{i}\mid w_{1},\ldots ,w_{i-1})\approx \prod _{i=1}^{m}P(w_{i}\mid w_{i-(n-1)},\ldots ,w_{i-1})}$

${\displaystyle P(w_{i}\mid w_{i-(n-1)},\ldots ,w_{i-1})={\frac {\mathrm {count} (w_{i-(n-1)},\ldots ,w_{i-1},w_{i})}{\mathrm {count} (w_{i-(n-1)},\ldots ,w_{i-1})}}}$
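The count ratio above can be sketched in a few lines of Python. The toy corpus is invented for this example, and the padding convention (n−1 start symbols `<s>` and one end symbol `</s>` per sentence) is an assumption; the helper names `ngram_counts` and `conditional_probability` are likewise hypothetical. No smoothing is applied, so unseen n-grams get probability 0.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-grams and their (n-1)-word contexts over tokenized sentences."""
    grams, contexts = Counter(), Counter()
    for tokens in sentences:
        # Assumption: pad with n-1 start symbols and one end symbol.
        padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - (n - 1):i])
            grams[context + (padded[i],)] += 1
            contexts[context] += 1
    return grams, contexts


def conditional_probability(word, context, grams, contexts):
    """count(context, word) / count(context), or 0.0 if the context is unseen."""
    context = tuple(context)
    if contexts[context] == 0:
        return 0.0
    return grams[context + (word,)] / contexts[context]


# Toy corpus invented for this example.
corpus = [
    "I saw the red house".split(),
    "I saw the dog".split(),
    "the red house is old".split(),
]
grams, contexts = ngram_counts(corpus, n=2)
print(conditional_probability("red", ["the"], grams, contexts))
```

In this corpus "the" occurs three times and is followed by "red" twice, so the estimate of P(red | the) is 2/3.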

#### Example

${\displaystyle {\begin{aligned}&P({\text{I, saw, the, red, house}})\\\approx {}&P({\text{I}}\mid \langle s\rangle )P({\text{saw}}\mid {\text{I}})P({\text{the}}\mid {\text{saw}})P({\text{red}}\mid {\text{the}})P({\text{house}}\mid {\text{red}})P(\langle /s\rangle \mid {\text{house}})\end{aligned}}}$

${\displaystyle {\begin{aligned}&P({\text{I, saw, the, red, house}})\\\approx {}&P({\text{I}}\mid \langle s\rangle ,\langle s\rangle )P({\text{saw}}\mid \langle s\rangle ,{\text{I}})P({\text{the}}\mid {\text{I, saw}})P({\text{red}}\mid {\text{saw, the}})P({\text{house}}\mid {\text{the, red}})P(\langle /s\rangle \mid {\text{red, house}})\end{aligned}}}$
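The two factorizations above differ only in how much context each factor keeps: one preceding word (n = 2) in the first, two (n = 3) in the second. A small Python sketch can generate the same list of factors for any n; the helper name `ngram_factors` is hypothetical, and the padding follows the convention visible in the equations (n−1 start symbols and one end symbol).

```python
def ngram_factors(tokens, n):
    """Return (word, context) pairs for the n-gram factorization of a sentence."""
    # Pad with n-1 start symbols and one end symbol, as in the equations above.
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    return [
        (padded[i], tuple(padded[i - (n - 1):i]))
        for i in range(n - 1, len(padded))
    ]


sentence = "I saw the red house".split()
for n in (2, 3):
    terms = [f"P({w} | {', '.join(ctx)})" for w, ctx in ngram_factors(sentence, n)]
    print(f"n={n}: " + " ".join(terms))
```

Each sentence of five words yields six factors, because the end symbol `</s>` is predicted as well.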

### Exponential

${\displaystyle P(w_{m}\mid w_{1},\ldots ,w_{m-1})={\frac {1}{Z(w_{1},\ldots ,w_{m-1})}}\exp(a^{T}f(w_{1},\ldots ,w_{m}))}$
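To make the roles of the parameter vector a, the feature function f, and the partition function Z concrete, here is a minimal Python sketch. The two features, the three-word vocabulary, and the parameter values are all invented for illustration; the only part taken from the formula is the structure: exponentiate the dot product of a and f, then normalize over candidate next words.

```python
import math


def features(history, word):
    """Hypothetical feature vector f(w_1, ..., w_m) for this sketch."""
    previous = history[-1] if history else "<s>"
    return [
        1.0 if (previous, word) == ("red", "house") else 0.0,  # bigram indicator
        1.0 if word.endswith("s") else 0.0,                    # crude suffix feature
    ]


def next_word_distribution(history, vocabulary, a):
    """P(w | history) = exp(a . f(history, w)) / Z(history)."""
    scores = {
        w: math.exp(sum(ai * fi for ai, fi in zip(a, features(history, w))))
        for w in vocabulary
    }
    z = sum(scores.values())  # partition function Z(history)
    return {w: s / z for w, s in scores.items()}


vocabulary = ["house", "dog", "likes"]  # toy vocabulary
a = [2.0, 0.5]  # parameter vector, chosen arbitrarily
dist = next_word_distribution(["the", "red"], vocabulary, a)
print(dist)
```

Dividing by Z is what turns the unnormalized scores into a distribution that sums to 1 over the vocabulary; in practice a would be learned from data rather than set by hand.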

