- #1
Avarus
Hi all, I'm not quite sure if this is the right place to post my question, so forgive me if it's not...
I've written a program in Python that analyses data that I got from a compression experiment (mechanical testing of rocks and such), and I've written a piece of code that estimates the gradient of a given quantity. It seems to work fine, but it is very slow. Whereas the rest of the program does its job in about 10 seconds, this little piece of code takes about 4 minutes to execute.
Code:
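# written in Python
# excerpt: runs once per datapoint n; i and done are reset before the loop
while not done:
    i += 10                      # grow the window by 10 points on each side
    start = n-i
    end = n+i
    if n-i < 0:                  # window would start before the data
        start = 0
        end = n+2*i
    if n+i > len(data)-1:        # window would end after the data
        end = len(data)-1
        start = n-i
        if start < 0:
            start = 0
    # stop once the end values differ by at least the threshold
    if absolute(data[start] - data[end]) >= std/0.05:
        done = True
windows[n] = end - start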
Note that this is an excerpt that takes up almost 100% of the function's computation time, so I left out the rest. What does this piece do? It starts with a window of size 10, then checks whether the values of the first and last datapoints in this window differ by more than a given value (so that the difference is significant). If they do, the window size is accepted; if they don't, the window grows by 20 and the check is repeated, and so on. The if statements are there to make sure the window does not fall outside the data.
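For anyone who wants to run it, here is a minimal self-contained sketch of the procedure described above, wrapped so that it runs over every datapoint. The names `window_sizes`, `threshold` and `step` are made up for this sketch and are not in my program, and it uses plain Python lists instead of NumPy. The two lines marked "guard" are also not in my excerpt; they are assumptions added so the sketch cannot index past the data or loop forever.

```python
def window_sizes(data, threshold, step=10):
    """For each point n, grow a window until its end values differ by >= threshold."""
    last = len(data) - 1
    windows = [0] * len(data)
    for n in range(len(data)):
        i = 0
        while True:
            i += step                      # widen by `step` points on each side
            start, end = n - i, n + i
            if start < 0:                  # left edge reached: extend to the right
                start, end = 0, n + 2 * i
            if n + i > last:               # right edge reached: clamp the window
                start, end = max(n - i, 0), last
            end = min(end, last)           # guard (assumption): n + 2*i may overshoot
            if abs(data[start] - data[end]) >= threshold:
                break                      # difference is significant
            if start == 0 and end == last:
                break                      # guard (assumption): window spans all data
        windows[n] = end - start
    return windows


# Small example on a linear ramp; in my program the threshold would be std/0.05.
ramp = [float(x) for x in range(100)]
print(window_sizes(ramp, 20.0)[50])
```

The work per point is proportional to the final window width divided by the step, which is where the time goes.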
So I know this method works, but I realize it is horribly inefficient, since most windows end up about 400 points wide, some even up to 2000. With about 500,000 datapoints to process, the loop runs about 10,000,000 times for the entire dataset. My mathematical background for this kind of thing is not too good, but do you guys perhaps know a more elegant method? Or do you have any other optimization tips?
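A rough check of where that estimate comes from, using the approximate figures above (the variable names are only for this illustration):

```python
n_points = 500_000       # approximate number of datapoints
typical_width = 400      # typical final window width
growth_per_pass = 20     # the window widens by 20 on every pass of the loop

passes_per_point = typical_width // growth_per_pass
total_passes = n_points * passes_per_point
print(passes_per_point, total_passes)   # 20 10000000
```

The windows that reach 2000 need 100 passes each, so the real total is probably somewhat higher.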
Thanks