venki1130 said:
Can anyone explain Why l1 Norm is non-differentiable in terms of matrix calculus ?
algebrat said: I believe venki1130 may have answered your question, but I am personally not sure. When you say l1 norm, do you mean the norm of ##(x_1,\dots,x_n)## is ##|x_1|+\cdots+|x_n|##? That is the first definition I found on Wikipedia. I believe this is also called the taxicab metric.
If I try to recall my education, ##\ell^1## and ##L^1## are different; the first one is called little ell one. The second, I believe, is the integral version: ##\|f\|_1=\int|f(x)|\,dx##. Compare to ##L^2##: ##\|f\|_2=\left(\int|f(x)|^2\,dx\right)^{1/2}##. Little ell two is ##\|(x_1,\dots,x_n)\|_2=\sqrt{x_1^2+\cdots+x_n^2}##. This is the distance as the crow flies, as opposed to how a taxi drives.
I believe the ##\ell^2##-norm has a familiar representation in matrix terms, ##\|x\|_2^2=x^\mathsf{T}x##, so that is what is confusing me. You asked about the ##\ell^1##-norm in terms of matrix calculus, when I only know of such a representation for the ##\ell^2##-norm.
Further, I could not tell you quickly how to use the matrix representation to show the norm is not differentiable. I would guess that venki1130 pointed you in the right direction. In general, you could show it is not differentiable along any ##x_i=0## face. It would be easiest to check with ##x_2=\cdots=x_n=0## and ##x_1## near 0; in other words, show ##|x_1|## is not differentiable at zero. Simply compare the slopes from the left and right of 0.
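To spell out that last step, the two one-sided difference quotients of ##|x_1|## at 0 are
$$\lim_{h\to 0^+}\frac{|h|-|0|}{h}=1, \qquad \lim_{h\to 0^-}\frac{|h|-|0|}{h}=-1.$$
Since the one-sided limits disagree, ##|x_1|## has no derivative at 0, and therefore the ##\ell^1## norm is not differentiable at any point where some coordinate vanishes.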
The l1 norm is non-differentiable because it is not a smooth function: its graph has sharp edges and corners wherever a coordinate equals zero, and at such points there is no unique slope (derivative).
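In the language of convex analysis, the missing unique slope shows up in the subdifferential of the one-variable function ##|x|##:
$$\partial|x| = \begin{cases} \{+1\}, & x>0,\\ [-1,\,1], & x=0,\\ \{-1\}, & x<0. \end{cases}$$
At ##x=0## the subdifferential is an entire interval of slopes rather than a single number, which is exactly what non-differentiability means for a convex function.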
The l1 norm can be handled with differentiable or proximal tools. A smooth surrogate such as ##\sqrt{x^2+\epsilon}## approximates ##|x|##, though the approximation is not exact and introduces some error. Separately, the soft-thresholding function, which is the proximal operator of the l1 norm, is what algorithms in compressive sensing and signal processing commonly use to handle the norm without needing a gradient.
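A minimal sketch of both ideas in Python (the function names here are my own; NumPy only):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t*||.||_1: shrinks each entry toward 0 by t,
    setting entries with |x_i| <= t exactly to 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def smooth_abs(x, eps=1e-6):
    """Differentiable surrogate sqrt(x^2 + eps) for |x|: close to |x|
    away from 0, but rounds off the corner at 0."""
    return np.sqrt(x**2 + eps)

x = np.array([-2.0, -0.3, 0.0, 0.5, 1.5])
print(soft_threshold(x, 0.5))  # [-1.5 -0.   0.   0.   1. ]
print(smooth_abs(x).sum())     # ~4.301, close to ||x||_1 = 4.3
```

Note the trade-off: soft-thresholding produces exact zeros, while the smooth surrogate never does.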
The non-differentiability of the l1 norm can make it more challenging to use in optimization. Plain gradient-based methods assume a gradient exists everywhere, which fails for the l1 norm at any point with a zero coordinate, so specialized algorithms such as subgradient descent or proximal-gradient methods must be used instead.
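As a toy illustration (a rough sketch with an assumed diminishing step size, not a tuned solver), subgradient descent on the lasso objective ##\tfrac{1}{2}\|Ax-b\|_2^2+\lambda\|x\|_1## could look like:

```python
import numpy as np

def lasso_subgradient(A, b, lam, steps=5000):
    """Minimize 0.5*||Ax - b||_2^2 + lam*||x||_1 by subgradient descent.
    np.sign(0) = 0 picks a valid element of the subdifferential at kinks."""
    L = np.linalg.norm(A, ord=2) ** 2              # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    for k in range(1, steps + 1):
        g = A.T @ (A @ x - b) + lam * np.sign(x)   # a subgradient of the objective
        x -= g / (L * np.sqrt(k))                  # diminishing step size
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[:3] = [3.0, -2.0, 1.5]                      # sparse ground truth
b = A @ x_true
print(np.round(lasso_subgradient(A, b, lam=0.1), 2))
```

Subgradient iterates only hover near the sparse minimizer; proximal methods built on soft-thresholding land on exact zeros, which is one reason they are preferred in practice.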
There are real advantages to using a non-differentiable function like the l1 norm. The main one is that it promotes sparsity: the corner at zero pushes small coefficients exactly to zero, whereas a smooth penalty like the squared l2 norm only shrinks them. This can be useful in situations where the data is sparse or when interpretability is important.
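A quick way to see the contrast (a hypothetical one-coefficient comparison; the closed forms below are standard):

```python
import numpy as np

# One-coefficient comparison of penalties (t is the penalty weight):
#   l1: argmin_x 0.5*(x - w)^2 + t*|x|      -> soft-thresholding, exact zeros
#   l2: argmin_x 0.5*(x - w)^2 + (t/2)*x^2  -> w / (1 + t), shrunk but never zero
w = np.array([2.0, 0.4, -0.3, 0.05])
t = 0.5
print(np.sign(w) * np.maximum(np.abs(w) - t, 0.0))  # [ 1.5  0.  -0.   0. ]
print(w / (1.0 + t))                                # ~[ 1.333  0.267 -0.2  0.033]
```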
There are many practical applications of the l1 norm's non-differentiability, including feature selection (e.g., the lasso), compressed sensing, and robust regression. In feature selection and compressed sensing, the kink at zero is what yields exactly sparse solutions; in robust regression, the l1 loss grows only linearly in the residual, making it far less sensitive to outliers than the squared loss. These are effects that smooth, differentiable penalties cannot reproduce.