Derivation of least squares containing vectors

In summary, the thread discusses how to derive a result from a paper: the closed-form update that minimizes a cost function in alternating least squares. The result is obtained by taking the derivative of the cost function with respect to the factor vector of one specific user, setting it to zero, and rearranging to solve for the vector x_u. The final solution is given in Equation \eqref{eq:xu-diff}.
  • #1
nobben
Hi,

I'm trying to derive a result from this paper:

http://www.research.yahoo.net/files/HuKorenVolinsky-ICDM08.pdf

given cost function

[tex]
\min_{x_* , y_* } \sum_{u, i} c_{ui} (p_{ui} - x^T_u y_i)^2 + \lambda \left( \sum_u ||x_u||^2 + \sum_i ||y_i||^2 \right)
[/tex]

Where both x_u and y_i are factor vectors in ℝ^f.

I want to find the minimum using alternating least squares. Therefore I fix the y_i and take the derivative with respect to x_u.

c_ui, p_ui and λ are constants.
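To make the setup concrete, here is a minimal Python/NumPy sketch of this cost function as I read it; the array names (X, Y, C, P), the small dimensions and the random data are made up purely for illustration.

[code]
import numpy as np

def implicit_als_cost(X, Y, C, P, lam):
    """Cost from the paper: sum_{u,i} c_ui (p_ui - x_u^T y_i)^2
    plus lambda * (sum_u ||x_u||^2 + sum_i ||y_i||^2).
    X: (m, f) user factors, Y: (n, f) item factors,
    C: (m, n) confidences, P: (m, n) preferences."""
    residual = P - X @ Y.T                      # (m, n) matrix of p_ui - x_u^T y_i
    fit = np.sum(C * residual**2)               # confidence-weighted squared errors
    reg = lam * (np.sum(X**2) + np.sum(Y**2))   # L2 regularization of both factor sets
    return fit + reg

# Tiny example with made-up dimensions.
rng = np.random.default_rng(0)
m, n, f = 4, 5, 3
X = rng.normal(size=(m, f))
Y = rng.normal(size=(n, f))
C = 1.0 + rng.random(size=(m, n))                   # confidences c_ui >= 1
P = (rng.random(size=(m, n)) > 0.5).astype(float)   # binary preferences p_ui
print(implicit_als_cost(X, Y, C, P, lam=0.1))
[/code]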

They derive the following:

[tex]x_u = \left( Y^TC^uY + \lambda I \right) ^{-1} Y^T C^u p \left( u \right)[/tex]

I'm unsure how to reproduce this result...

I don't know whether I should differentiate with respect to the whole vector x_u, or with respect to a single entry x_uk and then try to map the result back to an expression for the whole vector.

Any pointers are very much appreciated.
 
  • #2
I think I solved it. Including solution below if anyone is interested!




First we define a variable S_u below, which is the cost function from the first post but with u fixed, so that we are calculating the prediction errors for one specific user.

\begin{equation}
\label{eq:cost-u}
S_u = \sum_{i} c_{ui} (p_{ui} - x^T_u y_i)^2 + \lambda \left( || x_u ||^2 + \sum_i || y_i||^2 \right)
\end{equation}

We now find the derivative of S_u with respect to x_u.
To help clarify the process we introduce a variable \epsilon_{i} below, which is the prediction error for item i with respect to user u.
\begin{equation}
\label{eq:pred-error}
\epsilon_{i} = p_{i} - \sum_{j=1}^{f} x_{uj} y_{ij}
\end{equation}
We also calculate the derivative of \epsilon_{i} with respect to entry j of the user factor, x_{uj}.
\begin{equation}
\label{eq:error-deriv}
\frac{d \epsilon_{i}}{d x_{uj}} = - y_{ij}
\end{equation}
We now rewrite using the \epsilon_i variable.
There are m users and n items, and the factor vectors have length f.
\begin{equation}
S_u = \sum_{i=1}^{n} \left( c_i \epsilon_i^2 + \lambda \sum_{k=1}^f y_{ik}^2 \right) + \lambda \sum_{p=1}^f x_{up}^2
\end{equation}
Then we find the derivative.
\begin{equation}
\forall[1 \leq j \leq f] : \frac{d S_u}{d x_{uj}} = 2 \sum_{i=1}^n \left( c_i \epsilon_i \frac{d \epsilon_i}{d x_{uj}} \right) + 2 \lambda x_{uj}
\end{equation}
Substitute \epsilon_i and \frac{d \epsilon_i}{d x_{uj}} from Equations \eqref{eq:pred-error} and \eqref{eq:error-deriv}.
\begin{equation}
\forall[1 \leq j \leq f] : \frac{d S_u}{d x_{uj}} = 2 \sum_{i=1}^n \left( c_i \left( p_{i} - \sum_{k=1}^{f} x_{uk} y_{ik} \right) \left( - y_{ij} \right) \right) + 2 \lambda x_{uj}
\end{equation}
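As a quick sanity check on this expression, here is a small Python/NumPy sketch that compares it against a central finite-difference approximation of S_u for one fixed user; the names (c, p, Y, x) and dimensions are made up, and the constant \lambda \sum_i ||y_i||^2 term is dropped since it does not affect the derivative.

[code]
import numpy as np

rng = np.random.default_rng(1)
n, f = 6, 3                      # made-up numbers of items and factors
c = 1.0 + rng.random(n)          # confidences c_i for the fixed user u
p = rng.random(n)                # preferences p_i for the fixed user u
Y = rng.normal(size=(n, f))      # item factors y_i as rows
x = rng.normal(size=f)           # user factor x_u
lam = 0.1

def S_u(x):
    # Cost for the fixed user u (without the constant y-regularization term).
    eps = p - Y @ x
    return np.sum(c * eps**2) + lam * np.sum(x**2)

# Analytic gradient: entry j is 2 * sum_i c_i (p_i - sum_k x_uk y_ik)(-y_ij) + 2*lam*x_uj.
grad_analytic = -2 * Y.T @ (c * (p - Y @ x)) + 2 * lam * x

# Central finite differences for comparison.
h = 1e-6
grad_numeric = np.array([
    (S_u(x + h * e) - S_u(x - h * e)) / (2 * h)
    for e in np.eye(f)
])
print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # expected: True
[/code]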
Set the derivative to 0.
\begin{equation}
\forall[1 \leq j \leq f] : 0 = 2 \sum_{i=1}^n \left( c_i \left( p_{i} - \sum_{k=1}^{f} x_{uk} y_{ik} \right) \left( - y_{ij} \right) \right) + 2 \lambda x_{uj}
\end{equation}
Divide by 2 and rearrange.
\begin{equation}
\forall[1 \leq j \leq f] : \sum_{i=1}^n \sum_{k=1}^{f} y_{ij} c_i y_{ik} x_{uk} + \lambda x_{uj} = \sum_{i=1}^n c_i p_{i} y_{ij}
\end{equation}
Rewrite using matrix notation.
\begin{eqnarray}
(Y^TC^uY + \lambda I) x_u & = & Y^T C^u p(u) \\
\label{eq:xu-diff}
x_u & = & (Y^TC^uY + \lambda I)^{-1}Y^T C^u p(u)
\end{eqnarray}
Y is the $n \times f$ matrix whose rows are the item factors $y_i^T$.
$C^u$ is the diagonal matrix containing the confidence values where $C^u_{ii} = c_{ui}$.
The vector $p(u)$ contains all the preference values $p_{ui}$ by user $u$.
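For completeness, here is a small Python/NumPy sketch of this final step: it builds Y, C^u and p(u) with made-up values, solves the linear system from Equation \eqref{eq:xu-diff}, and checks that the gradient of S_u vanishes at the solution. The variable names and data are only illustrative.

[code]
import numpy as np

rng = np.random.default_rng(2)
n, f = 6, 3                                 # made-up numbers of items and factors
Y = rng.normal(size=(n, f))                 # item-factor matrix Y (rows are y_i^T)
c_u = 1.0 + rng.random(n)                   # confidences c_ui for this user
p_u = (rng.random(n) > 0.5).astype(float)   # preferences p_ui for this user
lam = 0.1

Cu = np.diag(c_u)                           # C^u, diagonal confidence matrix
A = Y.T @ Cu @ Y + lam * np.eye(f)
b = Y.T @ Cu @ p_u
x_u = np.linalg.solve(A, b)                 # x_u = (Y^T C^u Y + lam I)^{-1} Y^T C^u p(u)

# The gradient of S_u should vanish at x_u (up to floating-point error):
grad = -2 * Y.T @ (c_u * (p_u - Y @ x_u)) + 2 * lam * x_u
print(np.allclose(grad, 0.0))               # expected: True
[/code]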
 

Related to Derivation of least squares containing vectors

What is the derivation of least squares containing vectors?

The derivation of least squares containing vectors is a mathematical method used to find the best fit line or plane for a set of data points. It minimizes the sum of the squared distances between the data points and the line or plane, providing a way to quantify the accuracy of the fit.

Why is least squares used for regression analysis?

Least squares is used for regression analysis because it provides a way to find the line or plane that best fits a set of data points. This line or plane can then be used to make predictions about future data points. Additionally, the sum of the squared distances is a common measure of error, making it easy to compare different models.

What are the assumptions made in the derivation of least squares?

The derivation of least squares makes several assumptions, including: the data follows a linear trend, the errors are normally distributed, the errors have a mean of zero, and the errors are independent of each other. Violations of these assumptions can affect the accuracy and reliability of the least squares method.

What is the difference between simple and multiple linear regression?

Simple linear regression involves finding the best fit line for a set of data points with only one independent variable. Multiple linear regression, on the other hand, involves finding the best fit plane for a set of data points with multiple independent variables. Multiple linear regression is more complex and can provide a more accurate model, but it also requires additional assumptions and calculations.

How are the coefficients calculated in the derivation of least squares?

The coefficients in the derivation of least squares are calculated using the method of ordinary least squares, which involves minimizing the sum of the squared distances between the data points and the line or plane. This is typically done using calculus and linear algebra. The resulting coefficients represent the slope and intercept of the line, or the coefficients and intercept of the plane.
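As a concrete illustration of this last point, here is a minimal Python/NumPy sketch of ordinary least squares for a line y ≈ slope·x + intercept; the data and variable names are invented for the example.

[code]
import numpy as np

# Made-up data roughly following y = 2x + 1 plus noise.
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Design matrix with a column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: minimize ||A @ coeffs - y||^2.
coeffs, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coeffs
print(slope, intercept)   # should come out close to 2 and 1
[/code]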
