Why doesn't it sum to one in this simple naive Bayes classification?

In summary: With only three classes (Banana, Orange and Other) and P(Orange|Long, Sweet, Yellow) = 0, the poster computes P(Other|Long, Sweet, Yellow) + P(Banana|Long, Sweet, Yellow) = 0.01/0.27 + 0.25/0.27 = 0.26/0.27 < 1 and asks why the two remaining probabilities do not sum to one.
  • #1
Karagoz
Summary: When we have only three classes (Orange, Banana and Other) and three features (Long, Sweet and Yellow), why is P(Other|Long, Sweet, Yellow) + P(Banana|Long, Sweet, Yellow) not equal to 1 when P(Orange|Long, Sweet, Yellow) = 0?

In this example:
https://towardsdatascience.com/all-about-naive-bayes-8e13cef044cf

The article gives example data on fruits with different features and shows how to predict the probability of each fruit class given some features. There are similar guides online using similar examples.

There are only 3 classes of fruit: Banana, Orange and Other.
And we have only 3 features: long, sweet and yellow.

P(Orange|Long,Sweet,Yellow) = 0
The probability that the fruit is an Orange is zero because the probability of Orange, given that the fruit is long, is zero.

P(Banana|Long, Sweet, Yellow) = 0.25 / 0.27

P(Other|Long, Sweet, Yellow) = 0.01 / 0.27

But: P(Other|Long, Sweet, Yellow) + P(Banana|Long, Sweet, Yellow) = 0.25/0.27 + 0.01/0.27 = 0.26/0.27 < 1

If the features are given as "long, sweet and yellow", the fruit cannot be an orange, so it must be either a banana or "other".

So why is P(Other|Long, Sweet, Yellow) + P(Banana|Long, Sweet, Yellow) not equal to 1?
Shouldn't P(Other|Long, Sweet, Yellow) + P(Banana|Long, Sweet, Yellow) = 1?

[Moderator's note: moved from a technical forum.]
 
  • #2
Karagoz said:
Summary: When we have only three classes (Orange, Banana and Other) and three features (Long, Sweet and Yellow), why is P(Other|Long, Sweet, Yellow) + P(Banana|Long, Sweet, Yellow) not equal to 1 when P(Orange|Long, Sweet, Yellow) = 0?

In this example:
https://towardsdatascience.com/all-about-naive-bayes-8e13cef044cf
First, do not use undefined commas in a probability such as P(Orange|Long, Sweet, Yellow). Do you mean "and" or "or"?
Second, P(Orange|Long or Sweet or Yellow) does not equal 0.
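For reference, if the commas are read as "and" (an assumption here, since the thread never states the intended reading, although it is the usual one in naive Bayes examples), the quantity in question would be written roughly as follows, with C standing for any one of the three classes:

```latex
P(C \mid \text{Long} \cap \text{Sweet} \cap \text{Yellow})
  \;\propto\;
  P(C)\, P(\text{Long} \mid C)\, P(\text{Sweet} \mid C)\, P(\text{Yellow} \mid C)
```

Under this reading a single zero factor, such as P(Long|Orange) = 0, makes the whole product zero. Under the "or" reading that is not the case, because an orange can still be sweet or yellow.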
 
  • #3
That's just a rounding error.
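The rounding explanation can be checked with a minimal sketch. The counts below are an assumption: they are the figures commonly used with this fruit example and are not given anywhere in the thread. The helper names `score` and `truncate2` are made up for illustration. With these counts the "Other" score is 0.01875, so cutting it to 0.01 would produce exactly the 0.26/0.27 from the question.

```python
# Assumed illustrative counts (not given in the thread): 1000 fruits in total.
counts = {
    "Banana": {"total": 500, "long": 400, "sweet": 350, "yellow": 450},
    "Orange": {"total": 300, "long": 0, "sweet": 150, "yellow": 300},
    "Other": {"total": 200, "long": 100, "sweet": 150, "yellow": 50},
}
n_fruits = sum(c["total"] for c in counts.values())


def score(c):
    """Unnormalized naive Bayes score: P(class) * product of P(feature | class)."""
    prior = c["total"] / n_fruits
    likelihood = 1.0
    for feature in ("long", "sweet", "yellow"):
        likelihood *= c[feature] / c["total"]
    return prior * likelihood


def truncate2(x):
    """Cut a non-negative number to two decimals without rounding up."""
    return int(x * 100) / 100


scores = {name: score(c) for name, c in counts.items()}
evidence = sum(scores.values())

# Exact posteriors: dividing by the sum of the scores forces a total of one.
posteriors = {name: s / evidence for name, s in scores.items()}
exact_total = sum(posteriors.values())

# Cutting every value to two decimals first loses most of the "Other" score.
truncated_total = sum(truncate2(s) for s in scores.values()) / truncate2(evidence)

print(scores)
print(exact_total, truncated_total)
```

If the intermediate values are kept at full precision, the two non-zero posteriors add up to one, as the original poster expected.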
 

FAQ: Why doesn't it sum to one in this simple naive Bayes classification?

Why is the sum of probabilities not equal to one in naive Bayes classification?

Properly normalized naive Bayes posteriors always sum to one, because each class score P(class) × P(features|class) is divided by the sum of those scores over all classes. A total other than one comes from the calculation, not from the model. Typically intermediate values were rounded, as in this thread, or the scores were left unnormalized or divided by the product of the marginal feature probabilities instead of by their sum over classes.

How does the naive assumption affect the sum of probabilities in naive Bayes classification?

The naive assumption treats the features as conditionally independent given the class, so P(features|class) is computed as a product of per-feature probabilities instead of from the joint distribution. When the features are in fact correlated, this can distort the individual posteriors, often pushing them toward 0 or 1. It does not change their sum: after normalization they still add up to one.
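The normalization step can be written out as a short derivation. The symbols are introduced here for illustration only: s_c is the unnormalized score of class c, x_i are the observed features, and at least one score is assumed to be non-zero.

```latex
s_c = P(c) \prod_{i} P(x_i \mid c), \qquad
P(c \mid x) = \frac{s_c}{\sum_{c'} s_{c'}}, \qquad
\sum_{c} P(c \mid x) = \frac{\sum_{c} s_c}{\sum_{c'} s_{c'}} = 1
```

The independence assumption only affects how each s_c is computed. The final equality holds whatever the scores are.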

Can the sum of probabilities be greater than one in naive Bayes classification?

With correct normalization the sum is exactly one, never more and never less. If the class scores are instead divided by the product of the marginal feature probabilities, which equals the true evidence only when the features are also unconditionally independent, the total can come out either above or below one.
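As a check on this question, here is a small sketch that compares two denominators. The counts are assumed for illustration (they are the figures commonly used with this fruit example and do not appear in the thread), and all variable names are hypothetical. With these numbers, dividing by the product of the marginal feature probabilities gives a total slightly above one, while dividing by the sum of the class scores gives exactly one.

```python
# Assumed illustrative counts (hypothetical, not from the thread): 1000 fruits.
totals = {"Banana": 500, "Orange": 300, "Other": 200}
feature_counts = {
    "long": {"Banana": 400, "Orange": 0, "Other": 100},
    "sweet": {"Banana": 350, "Orange": 150, "Other": 150},
    "yellow": {"Banana": 450, "Orange": 300, "Other": 50},
}
n = sum(totals.values())

# Unnormalized scores: P(class) * P(long|class) * P(sweet|class) * P(yellow|class).
scores = {}
for cls, total in totals.items():
    s = total / n
    for per_class in feature_counts.values():
        s *= per_class[cls] / total
    scores[cls] = s

# Shortcut denominator: P(long) * P(sweet) * P(yellow), the product of marginals.
marginal_product = 1.0
for per_class in feature_counts.values():
    marginal_product *= sum(per_class.values()) / n

# Proper denominator: the sum of the scores over all classes.
evidence = sum(scores.values())

sum_with_marginals = sum(s / marginal_product for s in scores.values())
sum_normalized = sum(s / evidence for s in scores.values())

print(sum_with_marginals, sum_normalized)
```

The two denominators differ here (0.26 versus about 0.27) because the features are not independent in the pooled data, even though the model treats them as independent within each class.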

How does the sum of probabilities affect the accuracy of naive Bayes classification?

It does not. Normalized posteriors sum to one whether or not the independence assumption holds, so the sum says nothing about how accurate the model is. A total that deviates from one points to a calculation problem, such as rounding or a missing normalization step, not to a poorly fitting model.

Can the sum of probabilities be used to evaluate the performance of naive Bayes classification?

No. Performance should be judged with metrics such as accuracy, precision, and recall, measured on data the model was not trained on. Checking that the posteriors sum to one is only a sanity check on the arithmetic; it gives no information about whether the naive assumption is valid or whether the predictions are correct.
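As a rough sketch of how those metrics are computed, the snippet below uses hypothetical true and predicted labels invented purely for illustration. They are not results from the article or the thread, and the function names are also made up.

```python
# Hypothetical true and predicted labels, invented only for illustration.
y_true = ["Banana", "Banana", "Other", "Orange", "Banana", "Other"]
y_pred = ["Banana", "Other", "Other", "Orange", "Banana", "Banana"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


acc = accuracy(y_true, y_pred)
banana_precision, banana_recall = precision_recall(y_true, y_pred, "Banana")

print(acc, banana_precision, banana_recall)
```

None of these numbers depends on whether the posteriors were printed with enough decimals to sum to one; they depend only on which class ends up with the highest posterior.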
