Supervised quantile normalization for low rank matrix factorization

M Cuturi, O Teboul, J Niles-Weed… - … on Machine Learning, 2020 - proceedings.mlr.press
International Conference on Machine Learning, 2020
Abstract
Low rank matrix factorization is a fundamental building block in machine learning, used for instance to summarize gene expression profile data or word-document counts. To be robust to outliers and differences in scale across features, a matrix factorization step is usually preceded by ad-hoc feature normalization steps, such as tf-idf scaling or data whitening. We propose in this work to learn these normalization operators jointly with the factorization itself. More precisely, given a $d\times n$ matrix $X$ of $d$ features measured on $n$ individuals, we propose to learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization to improve the quality of the low-rank representation of $X$ itself. This optimization is facilitated by the introduction of a new differentiable quantile normalization operator built using optimal transport, providing new results on top of existing work by Cuturi et al. (2019). We demonstrate the applicability of these techniques on synthetic and genomics datasets.
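
For orientation, the following is a minimal sketch of the conventional two-step pipeline the abstract contrasts against: a fixed, rank-based quantile normalization applied row-wise to a features-by-individuals matrix, followed by a truncated SVD. It is not the method of the paper: the normalization here is hard (non-differentiable) and its target quantiles are fixed rather than learned, and the function names `quantile_normalize_rows` and `low_rank_approx`, the synthetic data, and the choice of a Gaussian target are all assumptions made for illustration.

```python
import numpy as np


def quantile_normalize_rows(X, target):
    """Hard quantile normalization: every row of X gets the values of `target`.

    Assumes len(target) == X.shape[1]. The smallest entry of each row is
    replaced by the smallest target value, the second smallest by the second
    smallest target value, and so on. This is a rank-based, non-differentiable
    operator, not the optimal-transport operator proposed in the paper.
    """
    target_sorted = np.sort(np.asarray(target))
    # Double argsort gives the rank of each entry within its row (0 = smallest).
    ranks = np.argsort(np.argsort(X, axis=1), axis=1)
    return target_sorted[ranks]


def low_rank_approx(X, k):
    """Best rank-k approximation of X in Frobenius norm via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]


# Synthetic d x n matrix (hypothetical data): heavy-tailed rows on very
# different scales, which is the situation normalization is meant to handle.
rng = np.random.default_rng(0)
d, n, k = 5, 200, 2
X = rng.lognormal(size=(d, n)) * rng.uniform(1.0, 100.0, size=(d, 1))

# Fixed target quantiles (here: a sorted Gaussian sample), chosen by hand.
target = rng.normal(size=n)

X_norm = quantile_normalize_rows(X, target)
X_hat = low_rank_approx(X_norm, k)
```

In this sketch the target quantiles are chosen once and never revisited; the abstract's proposal is instead to treat such normalization parameters as learnable and to optimize them together with the factorization, which requires a differentiable replacement for the ranking step used above.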