# Cosine similarity

## Definition

${\displaystyle \mathbf {a} \cdot \mathbf {b} =\left\|\mathbf {a} \right\|\left\|\mathbf {b} \right\|\cos \theta }$

${\displaystyle {\text{similarity}}=\cos(\theta )={A\cdot B \over \|A\|\|B\|}={\frac {\sum \limits _{i=1}^{n}{A_{i}\times B_{i}}}{{\sqrt {\sum \limits _{i=1}^{n}{(A_{i})^{2}}}}\times {\sqrt {\sum \limits _{i=1}^{n}{(B_{i})^{2}}}}}}}$, where ${\displaystyle A_{i}}$ and ${\displaystyle B_{i}}$ are the components of the vectors ${\displaystyle A}$ and ${\displaystyle B}$, respectively.
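The formula above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `cosine_similarity` and the choice to raise an error for zero vectors (where the quotient is undefined) are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Numerator: the dot product, sum of A_i * B_i.
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: the product of the two Euclidean norms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: reject zero vectors instead of returning 0.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity([1, 0], [0, 1])` is 0 for orthogonal vectors, while parallel vectors such as `[1, 2, 3]` and `[2, 4, 6]` give a value of 1 up to floating-point rounding.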

### Angular similarity

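The term "cosine similarity" is sometimes also used for a different coefficient, although the definition above is by far the most common. Building on the same computation, the normalized angle between two vectors can serve as a bounded similarity function with range [0, 1]. It is computed from the similarity defined above as follows: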

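${\displaystyle 1-\left({\frac {\cos ^{-1}({\text{similarity}})}{\pi }}\right)}$

This form applies when the vector components may be positive or negative. When the components are always non-negative, the angle is at most ${\displaystyle \pi /2}$, and the following form is used instead:

${\displaystyle 1-\left({\frac {2\cdot \cos ^{-1}({\text{similarity}})}{\pi }}\right)}$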
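Both variants can be sketched as a single Python function that takes an already computed cosine similarity. The function name `angular_similarity` and the `nonnegative` flag are hypothetical names chosen for this illustration. Clamping the input to [-1, 1] is an added safeguard against floating-point rounding and is not part of the formulas themselves.

```python
import math


def angular_similarity(similarity, nonnegative=False):
    """Map a cosine similarity to a normalized angular similarity."""
    # Rounding can push a computed cosine slightly outside [-1, 1],
    # which would make math.acos raise ValueError, so clamp it first.
    s = max(-1.0, min(1.0, similarity))
    angle = math.acos(s)  # angle in radians, between 0 and pi
    if nonnegative:
        # Variant with the factor 2: stays in [0, 1] only if the angle
        # is at most pi/2, i.e. the cosine similarity is non-negative.
        return 1.0 - 2.0 * angle / math.pi
    # General variant: covers the full range of angles from 0 to pi.
    return 1.0 - angle / math.pi
```

With the general variant, a cosine similarity of 0 (orthogonal vectors) maps to 0.5. With `nonnegative=True` the same input maps to 0, the lowest value reachable for vectors without negative components.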

### Confusion with the "Tanimoto" coefficient

${\displaystyle T(A,B)={A\cdot B \over \|A\|^{2}+\|B\|^{2}-A\cdot B}}$
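To make the difference from cosine similarity concrete, the formula for ${\displaystyle T(A,B)}$ can be sketched in Python as follows. The function name `tanimoto` is an assumption for this example, as is the decision to raise an error when both vectors are zero (the only case in which the denominator vanishes).

```python
def tanimoto(a, b):
    """Tanimoto coefficient T(A, B) = A.B / (|A|^2 + |B|^2 - A.B)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    sq_a = sum(x * x for x in a)  # squared norm of A
    sq_b = sum(y * y for y in b)  # squared norm of B
    denom = sq_a + sq_b - dot
    if denom == 0:
        # Assumption for this sketch: the all-zero case is rejected.
        raise ValueError("Tanimoto coefficient is undefined for two zero vectors")
    return dot / denom
```

Unlike cosine similarity, this coefficient depends on vector length: `tanimoto([1, 0], [2, 0])` evaluates to 2/3, even though the two vectors point in the same direction and therefore have a cosine similarity of 1.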

### Ochiai coefficient

${\displaystyle K={\frac {n(A\cap B)}{\sqrt {n(A)\times n(B)}}}}$
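Reading ${\displaystyle n(A)}$ as the number of elements in the set ${\displaystyle A}$, the coefficient can be sketched with Python sets. The function name `ochiai` and the handling of empty sets are assumptions made for this illustration.

```python
import math


def ochiai(set_a, set_b):
    """Ochiai coefficient K = n(A & B) / sqrt(n(A) * n(B)) for two sets."""
    a, b = set(set_a), set(set_b)
    if not a or not b:
        # Assumption for this sketch: empty sets are rejected,
        # because the denominator would be zero.
        raise ValueError("Ochiai coefficient is undefined for an empty set")
    return len(a & b) / math.sqrt(len(a) * len(b))
```

For instance, `ochiai({1, 2, 3}, {2, 3, 4})` shares two of three elements on each side and evaluates to 2/3. This is the same value the cosine similarity formula gives for the corresponding 0/1 indicator vectors.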

## References

1. P.-N. Tan, M. Steinbach & V. Kumar, "Introduction to Data Mining", Addison-Wesley (2005), ISBN 0-321-32136-7, chapter 8; page 500.
2. Ochiai A. Zoogeographical studies on the soleoid fishes found in Japan and its neighboring regions. II // Bull. Jap. Soc. sci. Fish. 1957. V. 22. № 9. P. 526-530.
3. Barkman J.J. Phytosociology and ecology of cryptogamic epiphytes, including a taxonomic survey and description of their vegetation units in Europe. – Assen. Van Gorcum. 1958. 628 p.