# Regularization (mathematics)

## Overview

Without regularization, learning seeks the model ${\displaystyle F^{*}}$ that minimizes a loss functional ${\displaystyle L}$:

${\displaystyle F^{*}:=\mathop {\text{arg min}} _{F}L(F).}$

Regularization adds a penalty term ${\displaystyle \Omega (F)}$, weighted by a coefficient ${\displaystyle \gamma >0}$, to the objective:

${\displaystyle F^{*}:=\mathop {\text{arg min}} _{F}{\text{Obj}}(F)=\mathop {\text{arg min}} _{F}{\bigl (}L(F)+\gamma \Omega (F){\bigr )},\qquad \gamma >0.}$
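As a minimal sketch of this objective (Python; the mean-squared-error loss and the choice of the squared ${\displaystyle L_{2}}$ norm for ${\displaystyle \Omega }$ are assumptions made here for illustration):

```python
import numpy as np

def loss(w, X, y):
    """Empirical loss L(F): mean squared error of the linear model F(x; w) = w.T @ x."""
    return np.mean((X @ w - y) ** 2)

def objective(w, X, y, gamma=0.1):
    """Obj(F) = L(F) + gamma * Omega(F), here with Omega(F) = ||w||_2^2 (an assumed choice)."""
    return loss(w, X, y) + gamma * np.sum(w ** 2)
```

For ${\displaystyle \gamma >0}$ the penalty is nonnegative, so the regularized objective never lies below the bare loss; it only equals it when the penalty vanishes.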

## ${\displaystyle L_{p}}$ regularization terms

### ${\displaystyle L_{0}}$ and ${\displaystyle L_{1}}$ regularization terms

For a linear model with weight vector ${\displaystyle {\vec {\omega }}}$ over an ${\displaystyle n}$-dimensional input ${\displaystyle {\vec {x}}}$,

${\displaystyle F({\vec {x}};{\vec {\omega }}):={\vec {\omega }}^{\intercal }\cdot {\vec {x}}=\sum _{i=1}^{n}\omega _{i}\cdot x_{i},}$

the ${\displaystyle L_{0}}$ regularization term penalizes the number of nonzero weights:

${\displaystyle \Omega {\bigl (}F({\vec {x}};{\vec {\omega }}){\bigr )}:=\gamma _{0}{\frac {\lVert {\vec {\omega }}\rVert _{0}}{n}},\;\gamma _{0}>0,}$

and the ${\displaystyle L_{1}}$ regularization term penalizes the sum of their absolute values:

${\displaystyle \Omega {\bigl (}F({\vec {x}};{\vec {\omega }}){\bigr )}:=\gamma _{1}{\frac {\lVert {\vec {\omega }}\rVert _{1}}{n}},\;\gamma _{1}>0.}$

The ${\displaystyle L_{1}}$ regularization term is also known as the LASSO regularization term.[5][6]
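A small illustration of the two penalties (Python; the helper names and the example weight vector are ours):

```python
import numpy as np

def l0_penalty(w, gamma0=1.0):
    """gamma_0 * ||w||_0 / n: the (scaled) fraction of nonzero weights."""
    return gamma0 * np.count_nonzero(w) / len(w)

def l1_penalty(w, gamma1=1.0):
    """gamma_1 * ||w||_1 / n: the (scaled) mean absolute weight."""
    return gamma1 * np.sum(np.abs(w)) / len(w)

# Example: 2 of the n = 4 weights are nonzero, and |0.5| + |-1.5| = 2.0.
w = np.array([0.0, 0.5, -1.5, 0.0])
```

Both penalties favor sparse weight vectors, but the ${\displaystyle L_{1}}$ penalty is convex and therefore far easier to optimize than the combinatorial ${\displaystyle L_{0}}$ count.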

### ${\displaystyle L_{2}}$ regularization term

The ${\displaystyle L_{2}}$ regularization term is defined as

${\displaystyle \Omega {\bigl (}F({\vec {x}};{\vec {\omega }}){\bigr )}:=\gamma _{2}{\frac {\lVert {\vec {\omega }}\rVert _{2}^{2}}{2n}},\;\gamma _{2}>0,}$

so the objective becomes

${\displaystyle {\text{Obj}}(F)=L(F)+\gamma _{2}{\frac {\lVert {\vec {\omega }}\rVert _{2}^{2}}{2n}},}$

whose partial derivative with respect to each weight is

${\displaystyle {\frac {\partial {\text{Obj}}}{\partial \omega _{i}}}={\frac {\partial L}{\partial \omega _{i}}}+{\frac {\gamma _{2}}{n}}\omega _{i}.}$

The gradient-descent update with learning rate ${\displaystyle \eta }$ is therefore

${\displaystyle \omega '_{i}\gets \omega _{i}-\eta {\frac {\partial L}{\partial \omega _{i}}}-\eta {\frac {\gamma _{2}}{n}}\omega _{i}={\Bigl (}1-\eta {\frac {\gamma _{2}}{n}}{\Bigr )}\omega _{i}-\eta {\frac {\partial L}{\partial \omega _{i}}}.}$

When ${\displaystyle 0<\eta \gamma _{2}/n<1}$,[7] the factor ${\displaystyle (1-\eta \gamma _{2}/n)}$ shrinks every weight toward zero at each step.

The ${\displaystyle L_{2}}$ regularization term is also known as Tikhonov regularization or ridge regularization.
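The update rule above can be sketched as follows (Python; the function name and the example values are ours):

```python
import numpy as np

def gd_step_l2(w, grad_L, eta, gamma2, n):
    """One gradient-descent step under L2 regularization:
    w' = (1 - eta * gamma2 / n) * w - eta * dL/dw,
    i.e. a multiplicative shrinkage of w followed by the usual loss-gradient step."""
    return (1.0 - eta * gamma2 / n) * w - eta * grad_L

# Example step: decay factor 1 - 0.1 * 1.0 / 10 = 0.99.
w = np.array([1.0, -2.0])
w_new = gd_step_l2(w, grad_L=np.array([0.5, 0.5]), eta=0.1, gamma2=1.0, n=10)
```

This multiplicative shrinkage is why ${\displaystyle L_{2}}$ regularization is often called "weight decay" in the neural-network literature.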

## References

1. ^ Bühlmann, Peter; Van De Geer, Sara. Statistics for High-Dimensional Data. Springer Series in Statistics: 9. 2011. ISBN 978-3-642-20191-2. doi:10.1007/978-3-642-20192-9. If p > n, the ordinary least squares estimator is not unique and will heavily overfit the data. Thus, a form of complexity regularization will be necessary.
2. ^ Ron Kohavi; Foster Provost. Glossary of terms. Machine Learning. 1998, 30: 271–274. Archived from the original on 2019-11-11. Retrieved 2019-12-10.
3. ^ Bishop, Christopher M. Pattern Recognition and Machine Learning. Corrected printing. New York: Springer. 2007. ISBN 978-0387310732.
4. ^ The non-negativity of a norm guarantees that it is bounded below. Setting ${\displaystyle c}$ to zero in the homogeneity identity ${\displaystyle \lVert c\cdot {\vec {x}}\rVert =|c|\cdot \lVert {\vec {x}}\rVert }$ shows that the norm of the zero vector is zero, which guarantees that the norm has an infimum.
5. ^ Santosa, Fadil; Symes, William W. Linear inversion of band-limited reflection seismograms. SIAM Journal on Scientific and Statistical Computing (SIAM). 1986, 7 (4): 1307–1330. doi:10.1137/0907087.
6. ^ Tibshirani, Robert. Regression Shrinkage and Selection via the lasso. Journal of the Royal Statistical Society. Series B (methodological) (Wiley). 1996, 58 (1): 267–88. JSTOR 2346178.
7. ^ This can be ensured by appropriately tuning the learning rate ${\displaystyle \eta }$ and the regularization coefficient ${\displaystyle \gamma _{2}}$.