Find Smallest Subtree of G w/ Nodes from S

In summary, according to the summarizer, the problem is NP complete and requires an ideal solution. The cost of non-ideal solutions may be high, and the range of node set sizes the algorithm can work with is unknown. The summarizer suggests starting by identifying all "unavoidable" paths and then trying to find "independent" target nodes or groups of target nodes.
  • #1
Sneaky6666
14
0
I have a unique problem and looking for a known fast algorithm for it. I have an unweighted but directed graph `G`. I then have a subset collection of nodes `S` existing in `G`. I want to find the minimum sub tree in `G` such that it contains all the nodes in `S`.

I so far found Chu-Liu/Edmonds algorithm but it is for finding the MST for all nodes in the graph.

Does anyone know how to do this for a subset of nodes?

Example
Untitled.png

If the picture is G. And S is the node set {1, 6}. Then the solution (smallest sub tree) is the subset graph of {1, 3, 4, 6} since it includes the nodes of S. It wouldn't be {1, 3, 2, 5, 6} (even though it includes the nodes of S) because that is larger.

Thanks
 
Last edited:
Mathematics news on Phys.org
  • #2
The ideal solution to this would be very similar to the traveling salesman problem - which is NP complete.
I very much suspect this is also NP complete.

My exact attack on this problem would depend very much on other specifics such as:
Is only an ideal solution good enough?
What is the cost of non-ideal solutions?
What is the range of node set sizes you will be working with?
What kind of processor power is available and how quickly do you need an answer?
 
  • Like
Likes Sneaky6666
  • #3
I am not familiar with Chu-Liu/Edmonds, but if you start with the nodes M=S is it then not enough to "simply" find shortest path (Dijkstra's algorithm) between each node in M that ends with a different node in M (possibly handling cycles if S is not a tree) and then add those to M?

1 min later: well, I guess its possible to make an S that such that the set M as constructed above will not be the minimal solution. Right, disregard my question please.
 
  • #4
I was thinking about this, and what I had in mind is first we pick a random node in S, and do a BFS traversal from that node, basically storing 2 pieces of data in every node. First is the smallest distance from the current node to the starting node, and a reference of the previous node to the current node (for the shortest distance path).
So while doing the BFS, if it checks a node and sees the distance for it is smaller than a previous calculated distance, then it overwrites the data stored on that node with the better one.

In the end, every node knows its shortest distance, and a reference to the previous node, which allows a path to form from the current node to the starting node. If we take all the paths and combine them, we can construct a sub tree.

What do you all think of this?
 
  • #5
.Scott said:
The ideal solution to this would be very similar to the traveling salesman problem - which is NP complete.
I very much suspect this is also NP complete.

My exact attack on this problem would depend very much on other specifics such as:
Is only an ideal solution good enough?
What is the cost of non-ideal solutions?
What is the range of node set sizes you will be working with?
What kind of processor power is available and how quickly do you need an answer?
I think only ideal, exact solutions are enough. Range of S, could be anything, from 1 node to all nodes in G. We can take our time doing any calculations, even if its 5 seconds and have a very strong processor.
 
  • #6
I would start by identifying all "unavoidable" paths - the ones that, if eliminated, would result in some of the target nodes being isolated from the home node.
Then, each of those unavoidable paths becomes a separate problem (think recursion) and all the target nodes from that unavoidable path can be treated in the parent problem as a single "target node".

Once you are left with a problem (either the original problem or a child problem from the recursion) with no "unavoidable paths", then you need to attempt to find "independent" target nodes or groups of target nodes, node groups which cannot share a path home with any node outside of the group. Those independent groups can be identified by listing all node path segments then following all paths home for each target node (think tree search) and adding the target node id to the list of nodes for each path segment used by that node. So for every node path segment (the line between adjacent nodes), you have a list of target nodes that might find that path segment useful. Then, starting at the home position, work out through the network and look for nodes that have down-tree path segments (DTPSs) (or sets of DTPSs ) with no target node overlaps. The most straight forward way of finding such groups is brute force. If there are 7 DTPSs, go through values 1 to 126 expressed in binary - each of the 7 bits representing whether the corresponding DTPS is part of candidate group 0 or candidate group 1. Then tally the target node coverage for each DTPS group - if there is clean split of the target nodes, then they are independent and can be treated as separate problems (recursion). Once a group split has occurred, each of the two smaller groups should also be tested to see if there aren't three or more independent DTPS groups.

That DTSP group processing can be done in concert with the "unavoidable path" checking. Depending of the characteristics of you problem set, identifying the "unavoidable" before the "independents" may be more or less optimal.

Once you have a problem with no "unavoidables" or "independents", you're probably at the NP point. The most brute force method would simply be to do elimination trials. Pick a path to eliminate then attempt to solve the problem without it. Since all "unavoidables" have already been sidelined, at least one solution will be found. Note its score, restore it, and disable the next one. This scoring process will also be implemented recursively. It will be be similar to tree search where you are looking for the best score.

So this will give you only ideal solutions.
Since you have placed no size limits on S or G and no topology limits on G, there is no guarentee that your "strong processor" will complete this task within 5 minutes or 5 centuries.
 

FAQ: Find Smallest Subtree of G w/ Nodes from S

What is the problem of finding the smallest subtree of G with nodes from S?

The problem involves identifying the smallest subtree within a given graph G that includes all the nodes specified in a set S. The subtree must be connected and contain every node from S while minimizing the total number of nodes or edges in the subtree.

How is the smallest subtree of G with nodes from S defined?

The smallest subtree is typically defined as the minimal connected subgraph that contains all the nodes in the set S. This means it must include all the required nodes while having the least number of additional nodes or edges to maintain connectivity.

What algorithms can be used to solve this problem?

Common algorithms for solving this problem include Depth-First Search (DFS), Breadth-First Search (BFS), and Minimum Spanning Tree (MST) algorithms like Prim's or Kruskal's. Additionally, techniques such as dynamic programming and tree decomposition can be employed to optimize the search for the smallest subtree.

What are the challenges in finding the smallest subtree?

Challenges include handling large graphs where the number of nodes and edges can be substantial, ensuring that the subtree remains connected while minimizing size, and efficiently managing cases where there are multiple potential subtrees that satisfy the conditions. The problem can also be computationally intensive, particularly for dense graphs.

Are there any applications for finding the smallest subtree of G with nodes from S?

Yes, this problem has various applications in network design, such as optimizing communication networks, resource allocation in distributed systems, and constructing efficient data structures in computer science. It is also relevant in biology for phylogenetic tree construction and in social network analysis for identifying key connections.

Similar threads

Back
Top