Complexity for generating strings of length k from CNF grammar

  • #1
CGandC
Homework Statement
For the algorithm below, explain whether the time and space complexities are Linear, Polynomial or Exponential ( find a tight upper-bound ) in ##k##. Assume the size of the rules set is constant.
Relevant Equations
Big-Oh notation, context-free languages
Here's the code I've written for generating all strings of length ##k## from a given CNF ( Chomsky Normal Form ) grammar. The algorithm is a recursive variant of the CYK algorithm and uses memoization.

Python:
def generate_language(rule_dict, start_var, k):
    mem = dict()
    return generate_language_rec(rule_dict, start_var, k, mem)

def generate_language_rec(rule_dict, var, k, mem):
    if (var, k) in mem:
        return mem[(var, k)]

    s = set()
    if k == 0:
        s.add("")
        mem[(var, k)] = s
        return s

    if k == 1:
        for x in rule_dict[var]:
            if len(x) == 1:
                s.add(x)  # x is a single terminal symbol such as "a"
        mem[(var, k)] = s
        return s

    for var_rule in rule_dict[var]:
        if len(var_rule) == 2:
            X, Y = var_rule[0], var_rule[1]
            for j in range(1, k):
                s1 = generate_language_rec(rule_dict, X, j, mem)
                s2 = generate_language_rec(rule_dict, Y, k - j, mem)
                # concatenate the strings of s1 and s2 in both orders, i.e. strings
                # from s1 followed by strings from s2, and vice versa
                s.update({ str1 + str2 for str1 in s1 for str2 in s2 })
                s.update({ str2 + str1 for str1 in s1 for str2 in s2 })
    mem[(var, k)] = s
    return s

The algorithm works fine; here's an input example:
Python:
rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
res = generate_language(rule_dict, "S", 5)
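
As a quick sanity check on a smaller length ( the expected output here is my own hand-computation from the grammar above, so treat it as a sketch ):

Python:
# for k = 2 only the terminal rules A -> a, B -> b, C -> a feed the concatenations,
# so with the both-ways concatenation this should print {'ab', 'ba'}
print(generate_language(rule_dict, "S", 2))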

Next, I was asked to analyze both the time-complexity of the algorithm and its space-complexity ( excluding the space taken by the function's arguments ), under the assumption that the rules set ( rule_dict ) of the grammar is of constant size. I was asked to explain whether the complexities are Linear, Polynomial or Exponential ( find a tight upper-bound ) in ##k##.

My answer was that both the time and space complexities are Polynomial ( though I couldn't justify the space-complexity ). Here's my explanation:

For Time-complexity:

For each recursive call with input ##(var, k)##, we pass over all rules whose derivation leads to two variables X and Y.
For variable X we check if it can be used to produce a set of strings of lengths ##1, 2, \dots, k-1##.
For variable Y we check if it can be used to produce a set of strings of lengths ##k-1, k-2, \dots, 1##.

Note, because of the memoization, each recursive call with input ##(var, k)## spawns at most ##O(k)## recursive calls.
Hence,
For variable X, we'll have ##k-1## recursive calls in total.
For variable Y, we'll have ##k-1## recursive calls in total.

Since we assumed the size of the set of rules to be constant, there are only ##O(k)## distinct memoized inputs, each doing ##O(k)## work, so the total time-complexity of the algorithm is ##O(k^2)##,
so the time-complexity is polynomial.
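
As a rough cross-check of this counting argument ( my own instrumentation, not part of the assignment ), one can call the recursive helper directly with an explicit memo table and count how many ( variable, length ) pairs actually get memoized:

Python:
# each memo key is a (variable, length) pair; with a constant number of variables
# there can be at most O(k) of them, which is what the argument above relies on
mem = dict()
generate_language_rec(rule_dict, "S", 6, mem)
print(len(mem))  # bounded by (number of variables) * k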

For Space-complexity:

I couldn't figure it out; maybe the space complexity is also polynomial? I couldn't explain it, but I think the answer lies in the lines:
Python:
                s1 = generate_language_rec(rule_dict, X, j, mem)
                s2 = generate_language_rec(rule_dict, Y, k - j, mem)
Questions:
What do you think about the time-complexity of the above algorithm? Am I correct in my analysis that it is polynomial?
How can I proceed to analyze the space-complexity? I feel lost.

Thanks in advance for any help!
 
  • #2
CGandC said:
What do you think about the time-complexity of the above algorithm? Am I correct in my analysis that it is polynomial?
It looks credible, but I haven't checked that the analysis matches the code. Why don't you check it experimentally?

CGandC said:
How can I proceed to analyze the space-complexity? I feel lost.
Add up the size of all the memoized items?
 
  • #3
1. I've checked the running time of the code experimentally for different inputs; the relevant part of the code is:

Python:
rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
t_0 = time.perf_counter()
strings = generate_language(rule_dict, "S", 10)
t_1 = time.perf_counter()
print(" DELTA ",t_1-t_0)
For k=5: DELTA 0.0005285000000001538
For k=6: DELTA 0.001191899999999968
For k=7: DELTA 0.0022890999999999884
For k=8: DELTA 0.004781599999999997
For k=9: DELTA 0.011498800000000031
For k=10: DELTA 0.022812100000000002

It seems the running time roughly doubles each time ##k## increases by 1, i.e. growth of the form ##y = 2^x##, which is exponential rather than polynomial. Maybe I was wrong in my analysis that the time-complexity is polynomial?

2. I think the space-complexity will be linear. Explanation:
The depth of the recursion tree is ##O(k)##.
The function 'generate_language_rec' has a space-complexity of ##O(1)## if we disregard recursive calls and input sizes.
We store the results of all recursive calls for a constant number of variables ( since we assumed the size of the set of rules to be constant ) in a memoization table.
For input ##(var, k)## and a rule deriving two variables X and Y, each of them produces ##k-1## recursive calls ( the calls will be with inputs ##1, \dots, k-1## for X and ##k-1, \dots, 1## for Y ), and each such call is recorded in the memoization table. So each variable causes ##O(k)## calls in total ( which is also what determines the depth of the recursion ) because of the memoization; hence, since the size of the set of rules is constant, the total space complexity is ##O(k)##.

( Maybe the explanation is a little hard to understand, but I could write it better if I spent more time on it )
 
  • #4
CGandC said:
I've checked the running time of the code experimentally for different inputs
For a number of reasons, measuring execution time is not a good way of experimentally estimating time complexity, particularly with small values of ##k##. Instead, set a counter inside the tightest loop and consider the theoretical time complexity of the slowest action in that loop.
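For example, something along these lines ( a quick sketch; count_calls and call_count are names I made up, and it counts entries to the helper rather than iterations of the innermost loop, which grows at the same rate up to a constant factor ):

Python:
import functools

call_count = 0

def count_calls(fn):
    # wrap the recursive helper so every entry, memo hits included, is counted
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        global call_count
        call_count += 1
        return fn(*args, **kwargs)
    return wrapper

generate_language_rec = count_calls(generate_language_rec)

for k in range(2, 11):
    call_count = 0
    generate_language(rule_dict, "S", k)
    print(k, call_count)  # compare the growth of this count against k**2

Remember though that each call also pays for building the concatenated string sets, and the number of strings of length k can itself grow exponentially, so the call count alone is not the whole cost.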
CGandC said:
I think the Space-complexity will be linear
This is more reliable experimentally: in this case just print sum(map(len, mem.values())) at the end.
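Since generate_language builds its memo table locally, one way to get at it ( a sketch that reuses the helper's existing mem parameter ) is to pass in your own table and inspect it afterwards:

Python:
for k in range(2, 11):
    mem = dict()
    generate_language_rec(rule_dict, "S", k, mem)
    n_strings = sum(map(len, mem.values()))                  # total number of memoized strings
    n_chars = sum(len(s) for v in mem.values() for s in v)   # total characters stored
    print(k, len(mem), n_strings, n_chars)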
 
