Does Average Linkage satisfy the properties of metric space?

In summary, the Average Linkage method satisfies the first and second properties of a dissimilarity measure, but it does not satisfy the third property of metric space. A counterexample for this is when A and B contain multiple data points and the average distance between A and B is less than the sum of the average distances within A and within B.
  • #1
getUsername
3
0

Homework Statement



A dissimilarity measure d(x, y) for two data points x and y typically satisfy the following three properties:

1. [itex]d(x, y) ≥ 0[/itex] and [itex]d(x, y) = 0[/itex] if and only if [itex]x = y[/itex]
2. [itex]d(x, y) = d(y, x)[/itex]
3. [itex]d(x,z) ≤ d(x, y) + d(y,z)[/itex]The following method has been proposed for measuring the dissimilarity[read: distance] between two sets of data points A = {xa1, xa2, . . . , xam} and B = {xb1, xb2, . . . , xbn}Average Linkage(UPGMA):

[tex]d(A,B) = {\frac{Ʃ_{x∈A} Ʃ_{y∈B} d(x, y)}{|A||B|}} [/tex]Show that the linkage method satisfies each of the properties or provide a counter example (a visual/diagram representation of any counter example is sufficient if appropriate).
2. Specific Problem
I am having difficulty proving that the Average Linkage method does/does not satisfy the third property of metric space, the triangle inequality.

The Euclidean distance formula, the Absolute distance formula(Manhattan) or any other distance formula may be used to show this.
 
Physics news on Phys.org
  • #2
getUsername said:

Homework Statement



A dissimilarity measure d(x, y) for two data points x and y typically satisfy the following three properties:

1. [itex]d(x, y) ≥ 0[/itex] and [itex]d(x, y) = 0[/itex] if and only if [itex]x = y[/itex]
2. [itex]d(x, y) = d(y, x)[/itex]
3. [itex]d(x,z) ≤ d(x, y) + d(y,z)[/itex]The following method has been proposed for measuring the dissimilarity[read: distance] between two sets of data points A = {xa1, xa2, . . . , xam} and B = {xb1, xb2, . . . , xbn}Average Linkage(UPGMA):

[tex]d(A,B) = {\frac{Ʃ_{x∈A} Ʃ_{y∈B} d(x, y)}{|A||B|}} [/tex]Show that the linkage method satisfies each of the properties or provide a counter example (a visual/diagram representation of any counter example is sufficient if appropriate).
2. Specific Problem
I am having difficulty proving that the Average Linkage method does/does not satisfy the third property of metric space, the triangle inequality.

The Euclidean distance formula, the Absolute distance formula(Manhattan) or any other distance formula may be used to show this.

I don't think it even satisfies the first property that d(A,A)=0 if A has more than one different data point, does it?
 
Last edited:
  • #3
Yes, I think you might be correct about property 1. I sort of glazed over it as I had already proven that property for Single Linkage and Complete Linkage. I didn't consider the case of d(A,A)=0 where A has more than one data point.
 
  • #4
I think property one holds for average linkage.

e.g. A = { (0,1) (0,0) }

average d(A,A) = 0.5

Obviously this is greater than 0 in the case where x = y.

However the property states that d(x,y) can be 0 if and only if x = y.

This is still true because d(x,y) will only equal to 0 when x = y with the added constraint that x will only contain a single data point.

e.g. B = {(0,0)}

average d(B,B) = 0
 

FAQ: Does Average Linkage satisfy the properties of metric space?

1. What is Average Linkage?

Average Linkage is a hierarchical clustering method used to group data points based on their proximity to each other. It calculates the average distance between all pairs of data points in different clusters and then merges the clusters with the shortest average distance.

2. How is Average Linkage related to metric space?

Average Linkage is a method that is commonly used in metric space, which is a mathematical concept that defines the distance between two points. Average Linkage satisfies the properties of metric space, making it a valid method for clustering data points.

3. What are the properties of metric space?

The properties of metric space include symmetry, non-negativity, and the triangle inequality. Symmetry means that the distance between two points is the same regardless of the order in which they are compared. Non-negativity means that the distance between two points is always greater than or equal to zero. The triangle inequality states that the distance between two points is always less than or equal to the sum of the distances between those points and a third point.

4. How does Average Linkage satisfy the properties of metric space?

Average Linkage satisfies the properties of metric space by calculating the average distance between all pairs of data points and using this as the measure of proximity between clusters. This satisfies the symmetry property because the average distance between two clusters is the same regardless of which cluster is compared first. It also satisfies non-negativity because the average distance between clusters is always greater than or equal to zero. Additionally, the triangle inequality is satisfied because the average distance between two clusters will always be less than or equal to the sum of the average distances between each cluster and a third cluster.

5. Why is it important for Average Linkage to satisfy the properties of metric space?

It is important for Average Linkage to satisfy the properties of metric space because it ensures that the clustering method is mathematically valid and produces accurate results. If Average Linkage did not satisfy these properties, it could lead to incorrect clustering and misinterpretation of the data. Additionally, the properties of metric space allow for the use of other mathematical concepts and algorithms in conjunction with Average Linkage, making it a versatile and reliable method for data analysis.

Similar threads

Replies
15
Views
2K
Replies
10
Views
2K
Replies
3
Views
1K
Replies
6
Views
2K
Replies
2
Views
1K
Replies
1
Views
3K
Back
Top