Complexity for generating strings of length k from CNF grammar

In summary, the code discussed in this thread is a recursive variant of the CYK algorithm for generating all strings of length ##k## derivable from a CNF grammar, using memoization to avoid recomputing results. Counting recursive calls, the original poster argues the time complexity is polynomial, ##O(k^2)##; the space complexity is harder to determine, but the thread argues it is also polynomial.
  • #1
CGandC
Homework Statement
For the algorithm below, explain whether the time and space complexities are Linear, Polynomial or Exponential ( find a tight upper-bound ) in ##k##. Assume the size of the rule set is constant.
Relevant Equations
Big-Oh notation, context-free languages
Here's the code I've written for generating all strings of length ##k## from a given CNF ( Chomsky Normal Form ) grammar. It is a recursive variant of the CYK algorithm and uses memoization.

Python:
def generate_language(rule_dict, start_var, k):
    mem = dict()
    return generate_language_rec(rule_dict, start_var, k, mem)


def generate_language_rec(rule_dict, var, k, mem):
    # mem caches, for each (variable, length) pair, the set of derivable strings
    if (var, k) in mem:
        return mem[(var, k)]

    s = set()
    if k == 0:
        s.add("")
        mem[(var, k)] = s
        return s

    if k == 1:
        # only terminal rules ( a right-hand side that is a single terminal )
        # can derive a string of length 1
        for x in rule_dict[var]:
            if len(x) == 1:
                s.add(x[0])
        mem[(var, k)] = s
        return s

    # rules of the form var -> X Y: split the target length k into j + (k-j)
    for var_rule in rule_dict[var]:
        if len(var_rule) == 2:
            X, Y = var_rule[0], var_rule[1]
            for j in range(1, k):
                s1 = generate_language_rec(rule_dict, X, j, mem)
                s2 = generate_language_rec(rule_dict, Y, k-j, mem)
                # a derivation of var -> X Y is an X-string followed by a
                # Y-string, so concatenate only in that order
                s.update({str1 + str2 for str1 in s1 for str2 in s2})
    mem[(var, k)] = s
    return s

The algorithm works fine. Here's an input example:
Python:
rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
res = generate_language(rule_dict, "S", 5)

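As a quick sanity check ( without reproducing the full output here ), every generated string should have length 5 and consist only of the grammar's terminals:

Python:
assert all(len(s) == 5 for s in res)           # every string has length k = 5
assert all(set(s) <= {"a", "b"} for s in res)  # only terminal symbols appear
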
Next, I was asked to analyze both the time complexity of the algorithm and its space complexity ( excluding the arguments of the function ), under the assumption that the rule set ( rule_dict ) of the grammar is of constant size, and to explain whether the complexities are Linear, Polynomial or Exponential ( find a tight upper-bound ) in ##k##.

My answer was that both the time and space complexities are polynomial ( though I couldn't justify the space complexity ). Here's my explanation:

For Time-complexity:

For each recursive call with input ##k##, we pass over all rules whose right-hand side consists of two variables ##X## and ##Y##:
For variable ##X## we check whether it can produce a set of strings of lengths ##1, \ldots, k-1##.
For variable ##Y## we check whether it can produce a set of strings of lengths ##1, \ldots, k-1##.

Note that, because of the memoization, each recursive call with input ##k## issues at most ##2(k-1) \in O(k)## further recursive calls.
Hence,
For variable ##X##, we'll have ## \sum_{i=1}^{k-1} O(2i) = O(k^2) ## recursive calls in total.
For variable ##Y##, we'll have ## \sum_{i=1}^{k-1} O(2i) = O(k^2) ## recursive calls in total.

Since we assumed the size of the rule set to be constant, the total time complexity of the algorithm is ## O(k^2) + O(k^2) = O(k^2) ##,
so the time-complexity is polynomial.
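Another way to see this: because the number of variables is constant, there are only ##O(k)## distinct memoized states ##(var, i)## with ##0 \le i \le k##. Each state is computed exactly once, computing it issues at most ##2(k-1) \in O(k)## recursive calls ( two per split point ##j##, per two-variable rule ), and any repeated call is answered from mem in ##O(1)##. So the total number of calls is ##O(k) \cdot O(k) = O(k^2)## ( note this counts recursive calls only and treats every set operation as ##O(1)## ).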

For Space-complexity:

I couldn't figure this one out. Maybe the space complexity is also ## O(k^2) ##? I think the answer lies in these lines:
Python:
                s1 = generate_language_rec(rule_dict, X, j, mem)
                s2 = generate_language_rec(rule_dict, Y, k-j, mem)
Questions:
What do you think about the time-complexity of the above algorithm? Am I correct in my analysis that it is polynomial?
How can I proceed to analyze the space-complexity? I feel lost.

Thanks in advance for any help!
 
  • #2
CGandC said:
What do you think about the time-complexity of the above algorithm? Am I correct in my analysis that it is polynomial?
It looks credible, but I haven't checked that the analysis matches the code. Why don't you check it experimentally?

CGandC said:
How can I proceed to analyze the space-complexity? I feel lost.
Add up the size of all the memoized items?
 
  • #3
1. I've checked the running time of the code experimentally for different inputs; here's the timing code and the results:

Python:
import time

rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
t_0 = time.perf_counter()
strings = generate_language(rule_dict, "S", 10)   # k was varied from 5 to 10 across runs
t_1 = time.perf_counter()
print(" DELTA ", t_1 - t_0)
For k=5: DELTA 0.0005285000000001538
For k=6: DELTA 0.001191899999999968
For k=7: DELTA 0.0022890999999999884
For k=8: DELTA 0.004781599999999997
For k=9: DELTA 0.011498800000000031
For k=10: DELTA 0.022812100000000002

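To see the growth pattern, here are the ratios of consecutive timings ( measured values truncated slightly ):

Python:
times = [0.0005285, 0.0011919, 0.0022891, 0.0047816, 0.0114988, 0.0228121]  # k = 5 .. 10
for k, (t_a, t_b) in enumerate(zip(times, times[1:]), start=5):
    print(f"t({k + 1}) / t({k}) = {t_b / t_a:.2f}")
# each ratio is roughly 2, i.e. the time doubles with each unit increase in k
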
Looking at the ratios, the running time roughly doubles every time ##k## increases by 1, i.e. ##t(k+1) \approx 2\,t(k)##, which is growth of the form ##y = c \cdot 2^k## rather than linear or polynomial. Maybe I was wrong in my analysis that the time-complexity is polynomial?

2. I think the Space-complexity will be linear. Explanation:
The depth of the recursion tree is ## O(k) ##.
The function generate_language_rec has a space complexity of ## O(1) ## if we disregard recursive calls and input sizes.
Since we assumed the size of the rule set to be constant, only a constant number of variables ever appear as keys in the memoization table. For input ## k ##, each variable ##X## or ##Y## is called with lengths ## 1, \ldots, k-1 ##, so each variable contributes ## O(2(k-1)) \in O(k) ## entries to the table ( and this is also what bounds the recursion depth ). Hence the total space complexity is ## O(k) \cdot O(1) \in O(k) ##.

( Maybe the explanation is a little hard to follow, but I could write it better if I spent more time on it. )
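One direct way to check the entry count would be to create mem outside, pass it to generate_language_rec, and inspect it afterwards:

Python:
from collections import Counter

mem = {}
generate_language_rec(rule_dict, "S", 10, mem)
print(len(mem))                               # total number of memoized (variable, length) states
print(Counter(var for (var, length) in mem))  # entries per variable; should be O(k) for each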
 
  • #4
CGandC said:
I've checked the running time of the code experimentally for different inputs
For a number of reasons, measuring execution time is not a good way of experimentally estimating time complexity, particularly with small values of ##k##. Instead, set a counter inside the tightest loop and consider the theoretical time complexity of the slowest action in that loop.
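For example, you could route every call through a counting wrapper without touching the function body ( a sketch: call_count and counting_rec are new names, and this assumes the code from post #1 is in scope ):

Python:
call_count = 0
_original_rec = generate_language_rec

def counting_rec(rule_dict, var, k, mem):
    global call_count
    call_count += 1
    return _original_rec(rule_dict, var, k, mem)

# recursive calls inside the original function look up the global name at call
# time, so rebinding it routes every recursive call through the counter
generate_language_rec = counting_rec

for k in range(5, 11):
    call_count = 0
    generate_language(rule_dict, "S", k)
    print(k, call_count)  # should grow roughly quadratically if the O(k^2) call count is right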
CGandC said:
I think the Space-complexity will be linear
This is more reliable experimentally: in this case just print sum(len(s) for s in mem.values()) at the end.
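Note that mem is local to generate_language, so one option ( a hypothetical variant, not part of the original code ) is a version that returns the memo table as well:

Python:
def generate_language_with_mem(rule_dict, start_var, k):
    # same as generate_language, but also exposes the memo table
    mem = dict()
    result = generate_language_rec(rule_dict, start_var, k, mem)
    return result, mem

for k in range(5, 11):
    res, mem = generate_language_with_mem(rule_dict, "S", k)
    # prints: k, number of memoized states, total number of memoized strings
    print(k, len(mem), sum(len(strings) for strings in mem.values()))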
 

FAQ: Complexity for generating strings of length k from CNF grammar

1. How does complexity for generating strings of length k from CNF grammar differ from other grammars?

Generating strings of length k from a CNF ( Chomsky Normal Form ) grammar is generally simpler than from grammars in other forms, because CNF restricts every production to either two variables or a single terminal. This structure means every generated string of length k ≥ 2 splits into two shorter pieces, which is exactly what makes length-based recursion and memoization ( as in the algorithm above ) work.

2. What is the time complexity for generating strings of length k from CNF grammar?

The classic CYK algorithm parses a single string of length k in O(k^3) time ( times a factor for the grammar size ). For generating all strings of length k, memoization keeps the number of recursive calls polynomial in k, but the total work can still grow exponentially with k, because the language may contain exponentially many strings of that length.

3. Can CNF grammar generate all possible strings of length k?

A given CNF grammar generates exactly the strings of length k that belong to its language, which is in general only a subset of all possible strings of that length. This is not a limitation of CNF itself: every context-free language not containing the empty string has a CNF grammar.

4. How does the size of the grammar affect the complexity for generating strings of length k from CNF grammar?

The size of the grammar directly affects the complexity: with more variables and productions, each recursive step has more rules to try, so the running time grows with the grammar size. This is why the analysis in this thread assumes the rule set has constant size, so that the bounds can be stated in terms of k alone.

5. Are there any practical applications for CNF grammar in generating strings of length k?

Yes, CNF grammar has practical applications in various fields such as natural language processing, computer science, and linguistics. It is commonly used for parsing and generating sentences in computer programs, as well as in the development of artificial intelligence and language models.
