Distances in Knn: Are they metric functions in the Mathematical sense?

In summary, the distances used in Knn (K nearest neighbors) in machine learning are not required to be metrics in the mathematical sense.
  • #1
WWGD
Science Advisor
Gold Member
TL;DR Summary
The term 'distance' is used in Knn. Are these distance functions required to satisfy the properties of a metric?
Hi,
Just curious as to whether the distances 'd' used in Knn (K nearest neighbors) in machine learning are required to be metrics in the mathematical sense, i.e., whether, on a space A, they are required to satisfy:
##d: A \times A \rightarrow \mathbb R^{+} \cup \{0\} ;
d(a,a)=0 ;
d(a,b)=d(b,a) ; d(a,c) < d(a,b)+d(b,c) ##
?
 
  • #2
Suggestion: Expand on what you are talking about.
 
  • #3
mathman said:
Suggestion: Expand on what you are talking about.
Sure. We're trying to do a binary classification (0/1) of test subjects with the following method.
I am simplifying somewhat.

Basically an object belongs to the class of objects it is closest to, by some fixed criterion d.

0) Choose a positive integer k
1) We have objects known to belong to both class 0 and class 1
2) We have a collection of objects to test on whether they belong to class 0 or class 1.
3) We have a real-valued, non-negative binary function ##d: A \times B \rightarrow \mathbb R##; A is the set in 2), B is the set in 1).
4) We find the values d(a,B), i.e., the set { d(a,b) : b in B } for fixed a.
5) a is assigned to the class of the object b for which d(a,b) is the smallest.
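The steps above can be sketched in Python. This is a minimal illustration of the k = 1 case described in step 5) (the function and variable names are my own, and the data is made up):

```python
import math

def euclidean(p, q):
    # Standard Euclidean distance between two equal-length point tuples.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def classify(a, labeled, d=euclidean):
    # labeled: list of (point, class) pairs -- the known objects of step 1).
    # Step 4): evaluate d(a, b) for every labeled b; step 5): take the argmin.
    nearest_point, nearest_class = min(labeled, key=lambda pc: d(a, pc[0]))
    return nearest_class

labeled = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((5.0, 5.0), 1), ((6.0, 5.0), 1)]
print(classify((0.5, 0.5), labeled))  # -> 0
print(classify((5.5, 5.5), labeled))  # -> 1
```

Note that nothing here requires d to be a metric; any function for which "smaller means more alike" makes sense will run.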

I suspect @pbuk, @Office_Shredder or @Dale may know.
 
  • #4
WWGD said:
... ##d(a,c) < d(a,b)+d(b,c) ##
I think you mean ## d(a,c) \le d(a,b)+d(b,c) ##.

I am not sure we can say that the distance functions are required to satisfy any conditions. However, we are only interested in functions that perform well, and it may be possible to show that certain conditions must be satisfied for that to be the case.

Many metrics in ML do satisfy the triangle inequality; however, I don't think this is true for the cosine distance, and I have a hunch it also fails for some of the distances on boolean spaces, e.g. the Kulsinski distance.

Taking the other contemplated 'requirements':
  • ## d: A \times A \rightarrow \mathbb R^{+} \cup \{0\} ; ## clearly not the case for boolean metrics.
  • ## d(a,a)=0 ; ## I don't see this as necessary: Euclidean distance + 1 would work almost as well as Euclidean distance in most cases.
  • ## d(a,b)=d(b,a)## I'll leave a counterexample for this to you :-p
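The "Euclidean distance + 1" point can be checked directly: adding a constant is order-preserving, so the nearest neighbour (and hence the classification) is unchanged, even though d(a,a) = 1 ≠ 0. A quick sketch with made-up points:

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def euclidean_plus_one(p, q):
    # Not a metric: d(a, a) = 1, violating the identity axiom.
    return euclidean(p, q) + 1

points = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0), (4.0, 4.0)]
a = (2.0, 2.0)

# Both functions pick the same nearest neighbour, because x -> x + 1
# is strictly increasing and therefore preserves the ordering of distances.
nn1 = min(points, key=lambda b: euclidean(a, b))
nn2 = min(points, key=lambda b: euclidean_plus_one(a, b))
print(nn1 == nn2)  # -> True
```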
 
  • #7
I agree with the clever observation that Euclidean distance + 1 results in exactly the same algorithm; note that it is still not a metric, since d(a,a) = 1 ≠ 0 (adding a constant actually preserves the triangle inequality, so it is the identity axiom that fails).

Cosine distance does not satisfy the triangle inequality either, but it does have the property that if A and B are similar, then the distance from A to C is about the same as the distance from B to C. Without something like that you couldn't really get clusters of points that are similar to each other and dissimilar from the other clusters.
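The triangle-inequality failure for cosine distance (1 minus cosine similarity) can be seen with three small vectors; the example is my own:

```python
import math

def cosine_distance(p, q):
    # 1 - cosine similarity; assumes neither vector is zero.
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm = math.sqrt(sum(pi * pi for pi in p)) * math.sqrt(sum(qi * qi for qi in q))
    return 1.0 - dot / norm

A, B, C = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)
dAC = cosine_distance(A, C)   # 1.0 (orthogonal vectors)
dAB = cosine_distance(A, B)   # 1 - 1/sqrt(2), about 0.293
dBC = cosine_distance(B, C)   # about 0.293
print(dAC > dAB + dBC)        # -> True: the triangle inequality fails
```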
 

FAQ: Distances in Knn: Are they metric functions in the Mathematical sense?

What is a metric function in the mathematical sense?

A metric function, also known as a distance function, measures the distance between two points in a given space. Mathematically it must be non-negative, zero exactly when the two points coincide, symmetric, and satisfy the triangle inequality. In data analysis it is used to quantify the similarity or dissimilarity between objects or data points.

How are distances in Knn calculated?

Distances in Knn (K-nearest neighbors) are typically calculated using a distance function such as Euclidean distance, Manhattan distance, or cosine distance (the last of which is not a true metric). These functions use the numerical values of the features in the data set to determine the distance between two data points.
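For concreteness, the two most common choices can be computed as follows (a minimal sketch using only the standard library):

```python
import math

def euclidean(p, q):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    # Sum of absolute coordinate differences.
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))  # -> 5.0
print(manhattan(p, q))  # -> 7.0
```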

Why are metric functions important in Knn?

Metric functions are important in Knn because they are used to determine the nearest neighbors of a given data point. The distance between data points is a crucial factor in the Knn algorithm, as it helps to identify the most similar data points and make accurate predictions.

Are all metric functions suitable for use in Knn?

No, not all metric functions are suitable for use in Knn. The chosen metric function should be appropriate for the type of data being analyzed and should satisfy certain mathematical properties, such as the triangle inequality and non-negativity.

Can the choice of metric function affect the performance of Knn?

Yes, the choice of metric function can significantly affect the performance of Knn. Different metric functions may produce different results, and the most suitable one depends on the specific data set and the problem being solved. It is important to experiment with different metric functions to find the one that yields the best results.
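Such an experiment can be sketched in pure Python for the k = 1 case (the toy data and helper names are made up for illustration; on real data the two metrics can rank neighbours differently and so give different accuracies):

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def accuracy(d, train, test):
    # Fraction of test points whose nearest training point has the right class.
    hits = sum(
        min(train, key=lambda pc: d(x, pc[0]))[1] == y
        for x, y in test
    )
    return hits / len(test)

train = [((0.0, 0.0), 0), ((1.0, 0.0), 0), ((5.0, 5.0), 1), ((6.0, 5.0), 1)]
test = [((0.5, 0.2), 0), ((5.5, 4.8), 1)]

for name, d in [("euclidean", euclidean), ("manhattan", manhattan)]:
    # Both score 1.0 on this easy toy data; real data sets separate them.
    print(name, accuracy(d, train, test))
```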
