Decision Theory: Discriminant function for 2D Gaussians

In summary, decision theory is a branch of mathematics and statistics concerned with choosing among options under uncertainty. A discriminant function is a mathematical function used in decision theory to classify data points into groups based on their position in a 2D space. In the context of 2D Gaussians, it uses the properties of a 2D Gaussian distribution to compute the probability of a data point belonging to each group, and assigns the point to the group with the higher probability. The assumptions behind a discriminant function for 2D Gaussians are that the data points in each group are normally distributed, with the groups differing in their means (and, in general, their covariances). The method has applications in pattern recognition, machine learning, and data classification, as well as fields such as finance, economics, and engineering.
  • #1
Master1022
Homework Statement
Two classes ## C_1 ## and ## C_2 ## have equal priors. The likelihoods of ## x ## belonging to each class are given by 2D normal distributions with different means, but the same covariance: [tex] p(x|C_1) = N(\mu_x, \Sigma) \text{ and } p(x|C_2) = N(\mu_y, \Sigma) [/tex]
where we know the relationship between ## \mu_x ## and ## \mu_y ##
Determine the shape of the discriminant curve.
Relevant Equations
[tex] g(x) = \ln \left( \frac{p(C_1 | x)}{p(C_2 | x)} \right) [/tex]
Hi,

I was working on the following problem:
Two classes ## C_1 ## and ## C_2 ## have equal priors. The likelihoods of ## x ## belonging to each class are given by 2D normal distributions with different means, but the same covariance: [tex] p(x|C_1) = N(\mu_x, \Sigma) \text{ and } p(x|C_2) = N(\mu_y, \Sigma) [/tex]
where we know the relationship between ## \mu_x ## and ## \mu_y ##
Determine the shape of the discriminant curve.


In a previous part of the question, we are told that ## y = Ax + t ## and thus we know that ## \mu_y = A \mu_x + t ## and ## \Sigma_y = A \Sigma_x A^T ##

Attempt:
From some online lecture notes, I gather that the shape of this discriminant curve should be a hyperplane, but I now want to verify that this is the case.

We can define ## g(x) ## as follows:

[tex] g(x) = \ln \left( \frac{p(C_1 | x)}{p(C_2 | x)} \right) = \ln \left( \frac{p(x | C_1)}{p(x | C_2)} \right) + \ln \left( \frac{p(C_1)}{p(C_2)} \right) [/tex]
Since the priors are equal, the second term is zero and we are left with the first term, whose components are the class-conditional densities:

[tex] p(x | C_i ) = \frac{1}{(2 \pi) |\Sigma_i|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right) [/tex]
Since ## \Sigma_x = \Sigma_y = \Sigma ##, the normalizing constants cancel when we separate the logarithms, leaving only the exponents. Dropping the common factor of ## \frac{1}{2} ## (which does not change where ## g(x) = 0 ##):
[tex] g(x) = -(x - \mu_x)^T \Sigma^{-1} (x - \mu_x) + (x - \mu_y)^T \Sigma^{-1} (x - \mu_y) [/tex]
The discriminant curve should be where the classes are equiprobable, and thus ## g(x) = \ln \left( \frac{p(C_1 | x)}{p(C_2 | x)} \right) = 0 ##:
[tex] 0 = -(x - \mu_x)^T \Sigma^{-1} (x - \mu_x) + (x - \mu_y)^T \Sigma^{-1} (x - \mu_y) [/tex]

Now I suppose there are two ways to proceed:
1. Algebra
2. There is a hint about transforming ## \Sigma ## to the identity matrix, but I am not sure (a) how to properly do that, and (b) how that can help us. How could I do this second method?

Since I don't quite understand how to transform the covariance matrix, I will continue with the algebra:
[tex] 0 = - ( x^T \Sigma^{-1} x - x^T \Sigma^{-1} \mu_x - \mu_x ^T \Sigma^{-1} x + \mu_x ^T \Sigma^{-1} \mu_x) + ( x^T \Sigma^{-1} x - x^T \Sigma^{-1} \mu_y - \mu_y ^T \Sigma^{-1} x + \mu_y ^T \Sigma^{-1} \mu_y) [/tex]
Using the symmetry of ## \Sigma^{-1} ## to combine ## x^T \Sigma^{-1} \mu ## with ## \mu^T \Sigma^{-1} x ##:
[tex] 0 = 2 x^T \Sigma^{-1} \mu_x - \mu_x^T \Sigma^{-1} \mu_x - 2 x^T \Sigma^{-1} \mu_y + \mu_y^T \Sigma^{-1} \mu_y [/tex]
Grouping the terms that are linear in ## x ## and those that are constant:
[tex] 0 = 2 x^T \Sigma^{-1} (\mu_x - \mu_y) - (\mu_x + \mu_y)^T \Sigma^{-1} (\mu_x - \mu_y) [/tex]
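
As a numerical sanity check of this algebra, here is a minimal Python sketch (the particular values of ## \mu_x ##, ## \mu_y ##, and ## \Sigma ## are assumptions chosen purely for illustration):
[code]
import numpy as np
from scipy.stats import multivariate_normal

# assumed example parameters, not from the problem
mu_x = np.array([1.0, 2.0])
mu_y = np.array([3.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def g_direct(x):
    # log posterior ratio; with equal priors it equals the log likelihood ratio
    return (multivariate_normal.logpdf(x, mean=mu_x, cov=Sigma)
            - multivariate_normal.logpdf(x, mean=mu_y, cov=Sigma))

def g_linear(x):
    # the grouped form above, restoring the common factor of 1/2
    return 0.5 * ((2 * x - mu_x - mu_y) @ Sigma_inv @ (mu_x - mu_y))

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.normal(size=2)
    assert np.isclose(g_direct(x), g_linear(x))
print("g(x) is affine in x, so the boundary g(x) = 0 is a line in 2D")
[/code]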

We know that ## \mu_x - \mu_y = (I - A) \mu_x - t ##, but I am not really sure how to proceed from here. I see that ## \Sigma^{-1} (\mu_x - \mu_y) ## is common to both terms, but I am not sure how this brings us closer to seeing that this is a hyperplane.

Any help would be greatly appreciated.
 
  • #2


Hi there,

Thanks for your post! It seems like you have a good understanding of the problem and are on the right track. The hint about transforming the covariance matrix refers to a technique called "whitening" or "decorrelation", which is commonly used in machine learning and pattern recognition to simplify data. Essentially, we apply a linear change of coordinates to the data so that its covariance matrix becomes the identity; distances in the new coordinates are then ordinary Euclidean distances, which makes the geometry of the classification problem much easier to see.

To do this, we can use the transformation ## z = \Sigma^{-1/2} x ##, under which the covariance becomes
[tex] \Sigma_w = \Sigma^{-1/2} \Sigma \Sigma^{-1/2} = I [/tex]
where ## \Sigma_w ## is the whitened covariance matrix and ## \Sigma^{-1/2} ## is the inverse square root of the original covariance matrix, computable from the eigendecomposition of ## \Sigma ## since ## \Sigma ## is symmetric and positive definite.
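
To make this concrete, here is a minimal numerical sketch of whitening (again, the particular ## \Sigma ## and means are assumptions for illustration, not from the problem):
[code]
import numpy as np

# assumed example parameters
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
mu_x = np.array([1.0, 2.0])
mu_y = np.array([3.0, -1.0])

# inverse square root of Sigma via its eigendecomposition
# (valid because Sigma is symmetric positive definite)
vals, vecs = np.linalg.eigh(Sigma)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # W = Sigma^{-1/2}

# after the change of variables z = W x, the covariance is the identity
assert np.allclose(W @ Sigma @ W.T, np.eye(2))

# in whitened coordinates the boundary is the perpendicular bisector
# of the segment joining the transformed means
nu_x, nu_y = W @ mu_x, W @ mu_y
normal = nu_x - nu_y              # normal vector of the boundary line
midpoint = 0.5 * (nu_x + nu_y)    # a point on the boundary line
print("boundary in whitened space: (z - midpoint) . normal = 0")
[/code]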

Once the data are whitened, the Mahalanobis distance reduces to the ordinary Euclidean distance, so the two classes become spherical Gaussians with equal priors and the discriminant curve is the perpendicular bisector of the segment joining the two transformed means: a straight line. Since whitening is an invertible linear map, the boundary in the original coordinates is also a line, i.e. a hyperplane in 2D. I hope this helps! Let me know if you have any other questions or if you need further clarification. Good luck with your problem!
 

FAQ: Decision Theory: Discriminant function for 2D Gaussians

What is decision theory?

Decision theory is a branch of mathematics and statistics that focuses on making optimal decisions in situations where there is uncertainty or risk involved. It involves using mathematical models and statistical techniques to analyze different decision-making scenarios and determine the best course of action.

What is a discriminant function?

A discriminant function is a mathematical function that is used in decision theory to classify data into different groups or categories. It takes in a set of input variables and assigns them to one of several predefined classes based on their characteristics.

What are 2D Gaussians?

2D Gaussians, also known as bivariate Gaussian distributions, are probability distributions that describe the likelihood of a set of data points falling within a two-dimensional space. They are often used in decision theory to model data that has two continuous variables.

How is a discriminant function used for 2D Gaussians?

In the context of decision theory, a discriminant function for 2D Gaussians is used to determine the decision boundary between two classes of data points. Given the mean and covariance matrix of each class, it assigns each point to the class with the higher posterior probability; when the two classes share a covariance matrix, the resulting boundary is linear.
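
For instance, here is a minimal sketch of fitting such a discriminant with scikit-learn (the sample data below is synthetic and purely illustrative):
[code]
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# draw synthetic samples from two 2D Gaussians with a shared covariance
rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X1 = rng.multivariate_normal([1.0, 2.0], Sigma, size=200)
X2 = rng.multivariate_normal([3.0, -1.0], Sigma, size=200)
X = np.vstack([X1, X2])
y = np.array([0] * 200 + [1] * 200)

# with a shared covariance, the fitted boundary w . x + b = 0 is linear
lda = LinearDiscriminantAnalysis().fit(X, y)
print("w =", lda.coef_, " b =", lda.intercept_)
[/code]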

What are some applications of decision theory and discriminant functions?

Decision theory and discriminant functions have a wide range of applications in various fields, such as finance, economics, psychology, and engineering. They are commonly used in market analysis, risk management, medical diagnosis, and pattern recognition, among others.
