A decision tree has an expected depth of at least log n

In summary, the thread works through the proof of a classical lower bound for comparison-based sorting: on the assumption that all $n!$ permutations of the input are equally likely, any decision tree (comparison tree) that sorts $n$ elements has an expected depth of at least $\log n!$. Because the depth of the leaf reached by an input equals the number of comparisons made on that input, and $\log n! = \Theta(n \log n)$, this means that no comparison-based sorting algorithm can make fewer than roughly $n \log n$ comparisons on average.
  • #1
mathmari
Hey! :eek:

I am looking at the proof of the following theorem and I have some questions.

The theorem is the following:

On the assumption that all permutations of a sequence of $n$ elements are equally likely to appear as input, any decision tree that sorts $n$ elements has an expected depth of at least $\log n!$.
The proof is the following:

Let $D(T)$ be the sum of the depths of the leaves of a binary tree $T$.

Let $D(m)$ be the smallest value of $D(T)$ taken over all binary trees $T$ with $m$ leaves.

We shall show, by induction on $m$, that $D(m) \geq m \log m$.

The basis, $m=1$, is trivial.

Now, assume the inductive hypothesis is true for all values of $m$ less than $k$.

Consider a decision tree $T$ with $k$ leaves.

$T$ consists of a root having a left subtree $T_{i}$ with $i$ leaves and a right subtree $T_{k-i}$ with $k-i$ leaves for some $i$, $1 \leq i \leq k$.

Clearly, $$D(T)=i+D(T_{i})+(k-i)+D(T_{k-i})$$

Therefore, the minimum sum is given by $$D(k)=MIN_{1 \leq i \leq k} [k+D(i)+D(k-i)] \ \ \ \ (*)$$

Invoking the inductive hypothesis, we obtain from $(*)$ $$D(k) \geq k+MIN_{1 \leq i \leq k} [i \log i+(k-i) \log (k-i)]$$

It is easy to show that the minimum occurs at $i=k/2$. Thus $$D(k) \geq k+k \log \frac{k}{2}=k \log k$$

We conclude that $D(m) \geq m \log m$ for all $m \geq 1$.

Now we claim that a decision tree $T$ sorting $n$ random elements has at least $n!$ leaves.

Moreover, exactly $n!$ leaves will have probability $1/n!$ each, and the remaining leaves will have probability zero.

We may remove from $T$ all vertices that are ancestors only of leaves of probability zero, without changing the expected depth of $T$.

We are thus left with a tree $T'$ having $n!$ leaves each of probability $1/n!$.

Since $D(T') \geq n! \log n!$, the expected depth of $T'$ (and hence of $T$) is at least $(1/n!)\,n! \log n!=\log n!$.

My questions are the following:

  1. Why do we want to show, by induction, that $D(m) \geq m \log m$ ??
  2. When we consider a decision tree $T$ with $k$ leaves where $T_{i}$ is the left subtree with $i$ leaves and $T_{k-i}$ is the right subtree with $k-i$ leaves, why does it hold that $$D(T)=i+D(T_{i})+(k-i)+D(T_{k-i})$$ ??
  3. How do we conclude that the minimum sum is given by $$D(k)=MIN_{1 \leq i \leq k} [k+D(i)+D(k-i)]$$ ??
  4. How do we obtain, from the inductive hypothesis, that $$D(k) \geq k+MIN_{1 \leq i \leq k} [i \log i +(k-i) \log (k-i) ]$$ ??
  5. Could you explain to me the part after "Now, assume the inductive hypothesis is true for all values of $m$ less than $k$." ??

(Wondering)
 
  • #2
The goal of the proof is to show that any decision tree that sorts $n$ elements has an expected depth of at least $\log n!$. To do this, we first use induction to prove that $D(m) \geq m\log m$; this result is then applied to a tree with $n!$ equally likely leaves.

To prove that $D(m) \geq m\log m$, we consider a decision tree $T$ with $k$ leaves. $T$ consists of a root with a left subtree $T_i$ with $i$ leaves and a right subtree $T_{k-i}$ with $k-i$ leaves for some $i$, $1 \leq i \leq k$. Then $$D(T)=i+D(T_i)+(k-i)+D(T_{k-i}),$$ because every leaf of $T_i$ lies one level deeper in $T$ than it does in $T_i$ (which contributes the extra $i$), and likewise each of the $k-i$ leaves of $T_{k-i}$ gains one level. Minimizing over all possible splits gives $$D(k)=MIN_{1 \leq i \leq k} [k+D(i)+D(k-i)] \ \ \ \ (*).$$

Invoking the inductive hypothesis, we obtain from $(*)$ $$D(k) \geq k+MIN_{1 \leq i \leq k} [i \log i+(k-i) \log (k-i)].$$ It is easy to show that the minimum occurs at $i=k/2$. Therefore, $$D(k) \geq k+k \log \frac{k}{2}=k \log k,$$ and we conclude that $D(m) \geq m \log m$ for all $m \geq 1$.

Now we claim that a decision tree $T$ sorting $n$ random elements has at least $n!$ leaves. Moreover, exactly $n!$ of its leaves have probability $1/n!$ each, and the remaining leaves have probability zero. We may remove from $T$ all vertices that are ancestors only of leaves of probability zero, without changing the expected depth of $T$. This leaves us with a tree $T'$ with $n!$ leaves, each of probability $1/n!$. Since $D(T') \geq n! \log n!$, the expected depth of $T'$ (and hence of $T$) is at least $(1/n!)\,n! \log n! = \log n!$.
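To see the induction in action, here is a minimal Python sketch (an illustration of mine, not part of the quoted proof) that evaluates the recurrence $D(k)=MIN_i\,[k+D(i)+D(k-i)]$ directly, restricting the split so that both subtrees are non-empty, and checks that $D(m) \geq m \log_2 m$ for small $m$:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def D(m):
    """Smallest possible sum of leaf depths over all binary trees with m leaves."""
    if m == 1:
        return 0  # a single leaf sits at depth 0
    # Split the m leaves into i on the left and m - i on the right (both non-empty).
    # Every leaf then ends up one level deeper, which adds m to the depth sum.
    return min(m + D(i) + D(m - i) for i in range(1, m))

for m in range(1, 33):
    assert D(m) >= m * math.log2(m) - 1e-9
    print(f"D({m}) = {D(m):4d}   m*log2(m) = {m * math.log2(m):7.2f}")
```

For $m$ a power of two the sketch gives $D(m) = m \log_2 m$ exactly (the perfectly balanced tree), so the bound in the induction is tight.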
 

FAQ: A decision tree has an expected depth of at least log n

What is a decision tree?

In this context, a decision tree is a binary tree that models a comparison-based sorting algorithm: each internal node represents a comparison between two input elements, and each leaf corresponds to one possible ordering (permutation) of the input.

What is the expected depth of a decision tree?

The expected depth of a decision tree is the average depth of the leaf reached by a random input, i.e. the sum over the leaves of (probability of reaching that leaf) × (depth of that leaf). Since the depth of the leaf reached equals the number of comparisons performed on that input, the expected depth is the expected number of comparisons the algorithm makes.
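As a concrete illustration (an example constructed for this FAQ, not taken from the thread): the usual comparison tree for sorting $n=3$ elements has $3!=6$ reachable leaves, two at depth 2 and four at depth 3, and its expected depth can be computed in a couple of lines of Python:

```python
# Expected depth = sum over reachable leaves of P(leaf) * depth(leaf).
# Leaves of the usual comparison tree for sorting 3 elements: 3! = 6 leaves,
# two at depth 2 and four at depth 3, each reached with probability 1/6.
leaves = [(2, 1/6), (2, 1/6), (3, 1/6), (3, 1/6), (3, 1/6), (3, 1/6)]
expected_depth = sum(p * d for d, p in leaves)
print(expected_depth)  # 2.666..., comfortably above the bound log2(3!) ≈ 2.585
```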

How is the expected depth of a decision tree calculated?

Depths are measured along root-to-leaf paths, and the logarithm in the bound is taken to base 2 because each comparison has two possible outcomes: a binary tree of depth $d$ has at most $2^d$ leaves, so a tree with $m$ reachable leaves must have depth at least $\log_2 m$.

What does it mean if a decision tree has an expected depth of at least log n?

The precise statement proved in the thread is that the expected depth is at least $\log n!$ when all $n!$ input permutations are equally likely: the tree needs at least $n!$ reachable leaves, one for each possible ordering, and the argument above then forces an average leaf depth of at least $\log n!$. This is a lower bound on cost, not an efficiency guarantee; since $\log n! = \Theta(n \log n)$, no comparison-based sorting algorithm can make fewer than about $n \log n$ comparisons on average.
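To see the bound at work, here is a small sketch (constructed for this FAQ; the choice of $n = 4$ and of merge sort as the algorithm is arbitrary) that counts the comparisons an actual comparison sort performs on every permutation and compares the average with $\log_2 n!$:

```python
import math
from itertools import permutations

def merge_sort(a, counter):
    """Merge sort that counts element comparisons in counter[0]."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid], counter), merge_sort(a[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1                      # one comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

n = 4
total = 0
for perm in permutations(range(n)):          # all n! inputs, equally likely
    counter = [0]
    merge_sort(list(perm), counter)
    total += counter[0]

print(total / math.factorial(n))             # average comparisons, about 4.67
print(math.log2(math.factorial(n)))          # lower bound log2(4!), about 4.58
```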

What are the benefits of having an expected depth of at least log n in a decision tree?

The benefit of the bound is that it tells us how good a comparison-based sorting algorithm can possibly be: algorithms such as merge sort and heapsort, which use $O(n \log n)$ comparisons, are optimal up to a constant factor, because no comparison-based method can get below $\log n!$ comparisons on average.
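Since $\log_2 n!$ grows like $n \log_2 n$ (Stirling's approximation gives $\log_2 n! = n\log_2 n - n/\ln 2 + O(\log n)$), a quick numerical check (the values of $n$ below are arbitrary) makes this concrete:

```python
import math

# Compare the lower bound log2(n!) with n*log2(n): both grow at the same
# Theta(n log n) rate, differing by a lower-order linear term.
for n in (8, 64, 512, 4096):
    print(f"n = {n:5d}   log2(n!) = {math.log2(math.factorial(n)):9.1f}   "
          f"n*log2(n) = {n * math.log2(n):9.1f}")
```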
