Differences between normalization, standardization and regularization
The terms normalization, standardization and regularization appear frequently in machine learning. Here is a short introduction to help distinguish them.
Normalization
Normalization usually rescales features to $[0, 1]$.^{1} That is,

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
It is useful when we are confident that there are no anomalies (i.e. outliers) with extremely large or small values. For example, in a recommender system, the ratings given by users are limited to a small finite set like $\{1, 2, 3, 4, 5\}$.
In some situations, we may prefer to map the data to a range like $[-1, 1]$ with zero mean.^{2} Then we should choose mean normalization:^{3}

$$x' = \frac{x - \text{mean}(x)}{\max(x) - \min(x)}$$
In this way, it will be more convenient for us to use other techniques like matrix factorization.
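As a minimal sketch of the two rescalings above (using NumPy; the function names and the toy ratings vector are my own):

```python
import numpy as np

def min_max_normalize(x):
    """Rescale features to [0, 1]: x' = (x - min) / (max - min)."""
    return (x - x.min()) / (x.max() - x.min())

def mean_normalize(x):
    """Rescale features to zero mean within roughly [-1, 1]:
    x' = (x - mean) / (max - min)."""
    return (x - x.mean()) / (x.max() - x.min())

ratings = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(min_max_normalize(ratings))  # values in [0, 1]
print(mean_normalize(ratings))     # zero-mean values
```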
Standardization
Standardization is widely used as a preprocessing step in many learning algorithms to rescale features to zero mean and unit variance:^{3}

$$x' = \frac{x - \mu}{\sigma}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the feature.
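A minimal sketch of this z-score rescaling (using NumPy; the toy data are my own):

```python
import numpy as np

def standardize(x):
    """Rescale to zero mean and unit variance: x' = (x - mu) / sigma."""
    return (x - x.mean()) / x.std()

x = np.array([2.0, 4.0, 6.0, 8.0])
z = standardize(x)
print(z.mean(), z.std())  # approximately 0.0 and 1.0
```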
Regularization
Unlike the feature-scaling techniques above, regularization is intended to address the overfitting problem. By adding a penalty term to the loss function, the parameters of the learning algorithm are encouraged to converge to smaller values, which can significantly reduce overfitting.
There are two basic types of regularization: the L1-norm penalty (lasso) and the L2-norm penalty (ridge regression).^{4}
L1-norm^{5}
The original loss function is denoted by $L(\theta)$, and the new one is

$$L'(\theta) = L(\theta) + \lambda \|\theta\|_1$$

where

$$\|\theta\|_1 = \sum_i |\theta_i|$$

and $\lambda$ is a hyperparameter that controls the strength of the regularization.
L1 regularization is the better choice when we want to train a sparse model: because the absolute value function is not differentiable at 0, the penalty tends to push many parameters to exactly 0.
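This sparsity effect can be demonstrated with a small proximal-gradient (soft-thresholding) lasso solver; the toy data, the solver, and the choice of $\lambda$ here are my own illustration, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 10
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:3] = [2.0, -3.0, 1.5]          # only 3 of 10 features are informative
y = X @ true_w + 0.1 * rng.normal(size=n)

def lasso_ista(X, y, lam, steps=2000):
    """Minimize (1/2n)||Xw - y||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    n, d = X.shape
    w = np.zeros(d)
    lr = n / np.linalg.norm(X, 2) ** 2   # step size from the Lipschitz constant
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n     # gradient of the smooth squared-error part
        w = w - lr * grad
        # soft-thresholding: the prox of the L1 penalty, sets small weights to 0
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

w = lasso_ista(X, y, lam=0.1)
print(np.round(w, 3))  # weights of the uninformative features are exactly 0
```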
L2-norm^{5}^{6}

Here the penalty is the squared L2-norm of the parameters:

$$L'(\theta) = L(\theta) + \lambda \|\theta\|_2^2, \quad \|\theta\|_2^2 = \sum_i \theta_i^2$$

L2 regularization is preferred in ill-posed problems and for smoothing.
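For the squared-error loss, the L2-penalized problem has a closed-form solution, $w = (X^T X + \lambda I)^{-1} X^T y$. A minimal sketch (NumPy; the toy data are my own assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=50)

def ridge(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_ols = ridge(X, y, lam=0.0)    # ordinary least squares (no penalty)
w_reg = ridge(X, y, lam=10.0)   # coefficients shrunk toward zero
print(np.linalg.norm(w_ols), np.linalg.norm(w_reg))
```

Note that, unlike the L1 penalty, the L2 penalty shrinks the weights toward zero but does not set them exactly to zero.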
A side-by-side comparison between L1 and L2 regularization can be found at https://en.wikipedia.org/wiki/Regularization_(mathematics).
References

1. https://stats.stackexchange.com/a/10298 ↩
2. https://www.quora.com/What-is-the-difference-between-normalization-standardization-and-regularization-for-data/answer/Enzo-Tagliazucchi?share=c48b6752&srid=51VPj ↩
3. https://en.wikipedia.org/wiki/Regularization_%28mathematics%29 ↩
4. https://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization-How-does-it-solve-the-problem-of-overfitting-Which-regularizer-to-use-and-when/answer/Kenneth-Tran?share=400c336d&srid=51VPj ↩ ↩^{2}
5. https://en.wikipedia.org/wiki/Ridge_regression ↩