Why is the l1 norm non-differentiable?

  • Thread starter venki1130
  • Start date
  • Tags
    Norm
  • #1
venki1130
Can anyone explain why the l1 norm is non-differentiable, in terms of matrix calculus?
 
  • #2
Because the L1 norm is based on the absolute value of the difference, and the absolute value |x| has a kink at x = 0. It is not differentiable at the kink.
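To make the kink concrete, here is the standard one-variable check (an illustration added here, not part of the original reply). The one-sided difference quotients of ##|x|## at 0 are

$$\lim_{h\to 0^+}\frac{|0+h|-|0|}{h}=\lim_{h\to 0^+}\frac{h}{h}=1,
\qquad
\lim_{h\to 0^-}\frac{|0+h|-|0|}{h}=\lim_{h\to 0^-}\frac{-h}{h}=-1,$$

and since the two limits disagree, ##|x|## has no derivative at ##x=0##.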
 
  • #3
venki1130 said:
Can anyone explain why the l1 norm is non-differentiable, in terms of matrix calculus?

I believe the previous reply may have answered your question, but I am personally not sure. When you say l1 norm, do you mean the norm of ##(x_1,\dots,x_n)## is ##|x_1|+\cdots+|x_n|##? That is the first definition I found on Wikipedia. I believe this is also called the taxicab metric.

If I try to recall my education, ##\ell^1## and ##L^1## are different; the first one is called little ell one. The second I believe is the integral version, ##\|f\|_1=\int|f(x)|\,dx##. Compare to ##L^2##, ##\|f\|_2=\left(\int|f(x)|^2\,dx\right)^{1/2}##. Little ell two is ##\|(x_1,\dots,x_n)\|_2=\sqrt{x_1^2+\cdots+x_n^2}##. This is a distance as the crow flies, as opposed to how a taxi drives.

I believe the ##\ell^2##-norm has a familiar representation as a matrix, so that is what is confusing me. You asked for a matrix definition of the ##\ell^1##-norm, when I only know of one for the ##\ell^2##-norm.

Further, I could not tell you quickly how to use the matrix representation to show the norm is not differentiable. I would guess that the reply above points you in the right direction. In general, you could show it is not differentiable along any ##x_i=0## face. It would be easiest to check ##x_2=\cdots=x_n=0## with ##x_1## near 0; in other words, show ##|x_1|## is not differentiable at zero. Simply compare the slopes from the left and right of 0.
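Filling in that last step (a sketch, not quoted from the thread): away from the coordinate hyperplanes the ##\ell^1## norm is smooth, with

$$\frac{\partial}{\partial x_i}\|x\|_1=\operatorname{sign}(x_i)\qquad(x_i\neq 0),$$

but restricted to the line ##(t,0,\dots,0)## the norm equals ##|t|##, whose slopes from the left and right of ##t=0## are ##-1## and ##+1##. Since they disagree, no gradient exists on the faces ##x_i=0##.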
 
  • #4
algebrat said:
You asked for a matrix definition of the ##\ell^1##-norm, when I only know of one for the ##\ell^2##-norm.

I think he means this: http://en.wikipedia.org/wiki/Matrix_norm
 
  • #5
Thank you very much micromass, algebrat and Chogg for your responses. I am using the L1 norm in an optimization problem.

For example: for least-squares optimization using the L2 norm for regularization, the equation I am using is

$$\min_x\;\|Ax-b\|_2^2+\lambda\|x\|_2^2$$

Calculating the first derivative (using matrix calculus) and setting it equal to zero gives

$$x=(A^TA+\lambda I)^{-1}A^Tb$$
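For reference, the matrix-calculus step behind that solution (a standard computation sketched here, not part of the original post) is

$$\nabla_x\left(\|Ax-b\|_2^2+\lambda\|x\|_2^2\right)=2A^T(Ax-b)+2\lambda x=0
\;\;\Longrightarrow\;\;
(A^TA+\lambda I)\,x=A^Tb.$$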

Similarly, for the L1 norm:

$$\min_x\;\|Ax-b\|_2^2+\lambda\|x\|_1$$

But people always say it is non-differentiable. I understand the concept intuitively (the ##\ell^1## unit ball has sharp corners, and at a corner there is no unique slope, hence no derivative), but I want to learn it step by step using matrix derivatives (see the sketch after this post).

Again I thank you very much for your help.

Venki
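Regarding the step-by-step matrix-derivative question above, here is a sketch of where the calculation breaks (not taken from the thread itself). Repeating the L2 computation for the ##\ell^1##-regularized problem would require

$$\nabla_x\,\lambda\|x\|_1=\lambda\,\operatorname{sign}(x),$$

which is only valid when every component ##x_i\neq 0##. At a point with some ##x_i=0##, the one-sided derivatives of ##\lambda|x_i|## are ##-\lambda## and ##+\lambda##, so there is no single gradient to set equal to zero, and one works with the subdifferential of ##\|x\|_1## (or the subgradient and proximal methods mentioned in the FAQ below) instead.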
 

FAQ: Why is the l1 norm non-differentiable?

Why is the l1 norm non-differentiable?

The l1 norm is non-differentiable because it is built from absolute values, which are not smooth: wherever a component is zero, the graph has a sharp corner (kink), and at such points there is no unique slope, so no derivative can be defined.

Can the l1 norm be approximated with a differentiable function?

Yes, the l1 norm can be approximated by a differentiable function, for example a smoothed absolute value such as the Huber function or ##\sqrt{x^2+\varepsilon}##. In compressive sensing and signal processing it is also common to keep the exact l1 term and handle it through its proximal operator, the soft-thresholding (shrinkage) function, rather than through a gradient. A smooth approximation is not exact and may introduce some error.
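As an illustration of those two routes (a minimal sketch added here, not part of the original FAQ; the function names are our own):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal (shrinkage) operator of t*||x||_1: move each entry toward 0 by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def smooth_abs(x, eps=1e-6):
    """Differentiable surrogate of |x|: sqrt(x^2 + eps), accurate when |x| >> sqrt(eps)."""
    return np.sqrt(x**2 + eps)
```

Summing `smooth_abs` over the components gives a differentiable stand-in for ##\|x\|_1##, while `soft_threshold` is what proximal algorithms apply to handle the exact l1 term.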

How does the non-differentiability of the l1 norm affect optimization algorithms?

The non-differentiability of the l1 norm makes it more challenging to use in optimization algorithms, because traditional gradient-based methods assume the objective has a gradient everywhere, which fails wherever a component of x is zero. Instead, specialized algorithms must be used, such as subgradient descent or proximal-gradient methods.
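For concreteness, a minimal subgradient-descent sketch for the problem quoted earlier, ##\min_x\|Ax-b\|_2^2+\lambda\|x\|_1## (the step size and iteration count are arbitrary illustrative choices):

```python
import numpy as np

def lasso_subgradient_descent(A, b, lam, steps=1000, lr=1e-3):
    """Subgradient descent on ||Ax - b||_2^2 + lam * ||x||_1.

    np.sign(0) = 0 is a valid choice of subgradient of |x_i| at 0.
    """
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        g = 2 * A.T @ (A @ x - b) + lam * np.sign(x)  # a subgradient of the objective
        x = x - lr * g
    return x
```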

Are there any advantages to using a non-differentiable function like the l1 norm?

Yes. The main advantage is that it promotes sparsity: it encourages solutions with many exactly zero entries, precisely because minimizers tend to land on the corners and edges of the ##\ell^1## ball, where coordinates vanish. This is useful when the underlying data or signal is sparse, or when interpretability is important.

Are there any practical applications of the l1 norm's non-differentiability?

Yes, there are many practical applications. Examples include feature selection (e.g. the lasso), compressed sensing, and robust regression. In these applications the corners of the l1 norm are exploited to produce sparse solutions that smooth penalties, such as the squared l2 norm, do not.
