Algorithm running time discrepancies

In summary, the first one or two timed calls of each algorithm take much longer than expected; only thereafter do the measured running times follow the predicted trend.
  • #1
K29
Twice I have had to measure the speed of algorithms as a function of input size. The first time I was working in MATLAB (which is built on C) and was timing matrix operations on the identity matrix, in standard storage and in sparse storage, as I grew the matrix size.

More recently I have been testing the worst-case running time of the bisection search and the linear search, for increasing array input sizes.

In all the above experiments I get the expected results: for linear search the running time increases linearly with array size, and for bisection search I get a graph that looks like log(n). For matrix multiplication under standard storage the operation time increased rapidly (polynomially) with matrix size, and under sparse storage something like a log(n) curve was observed.

HOWEVER, in all the above experiments the input size was increased automatically with a loop, and the program was left for a few hours to generate all the data. For example, for the linear search the loop would grow the input by 50,000 elements each iteration and always look for the element in the last position (the worst case).

The first call of the searching function/matrix operation always takes much longer than the rest. Only thereafter do I observe the expected trend. In the case of the linear search it is always the first two calls of the function.

In the case of the MATLAB matrix operations the algorithms were most definitely correct; the test simply consisted of timing A*B with
Code:
tic
C = A*B;
toc

In the case of the search algorithms I've verified that I've coded them correctly. I am using
Code:
System.nanoTime()
to time the algorithms.

Why do the first one (or two) runs take so long, with the expected trend only appearing thereafter?

I am not running any other software at the same time, nor doing anything other than letting the algorithm run. I do not have a screensaver; Ubuntu just switches off the screen after a while. I have an Intel Celeron at 2.13 GHz and 2 GB of DDR2 RAM.

For the Java bisection search and linear search I am using an ArrayList data structure. Note that the get() operation on the ArrayList is O(1), and this is the only operation I perform on it during the linear search.
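Roughly, the linear search is nothing more than the following (a simplified sketch of the idea, not my exact code):
Code:
// Simplified sketch of a worst-case linear search over an ArrayList<Integer>:
// get() is the only operation performed on the list.
static int linearSearch(java.util.ArrayList<Integer> list, int query) {
    for (int i = 0; i < list.size(); i++) {
        if (list.get(i) == query) {   // get() on an ArrayList is O(1)
            return i;                 // worst case: this happens at the last index
        }
    }
    return -1;                        // not found
}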

Also note that the input size seems to be sufficiently large: it takes a good 15 minutes to sort lists of this size into ascending order before running and timing the bisection search algorithm.

I have attached an image of the linear search results. You can see that after the first two loops it starts following the O(n) trend (for the worst-case scenario).
 

Attachments
  • Screenshot from 2012-09-21 17:36:58.png (linear search timing results, 5 KB)
  • #2
The first call might be getting impacted by cache loading or dynamic memory allocation when the list is being created, but I'm not sure why the second call would be slow. You could try running the first call twice in a row to see if the second instance of the first call runs faster.
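Something along these lines, for example (a rough fragment in Java, since you mention System.nanoTime; linearSearch, list and query stand in for whatever your code already uses):
Code:
// Time the same call twice in a row on the same freshly created list
// and compare the first ("cold") and second ("warm") measurements.
long t1 = System.nanoTime();
linearSearch(list, query);                 // first call on this list
long cold = System.nanoTime() - t1;

long t2 = System.nanoTime();
linearSearch(list, query);                 // identical second call
long warm = System.nanoTime() - t2;

System.out.println("cold = " + cold + " ns, warm = " + warm + " ns");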

Another, less likely, issue is that your timer could be wrapping around on the longer calls. How long can the timer run before it wraps around? Do you know which timer is being used to generate these timings?

15 minutes to sort a list? How large is this list, in terms of element size and number of elements?
 
  • #3
rcgldr said:
The first call might be getting impacted by cache loading or dynamic memory allocation when the list is being created, but I'm not sure why the second call would be slow. You could try running the first call twice in a row to see if the second instance of the first call runs faster.

A fresh list is automatically created inside a loop, for different array sizes. The code follows this pseudocode:
Code:
loop {
    list of size 50 000 created
    linear search performed and timed on this list
    list of size 100 000 created
    linear search performed and timed on this list
    list of size 150 000 created
    linear search performed and timed on this list
    etc...
} until the user-specified number of list-size increments is done

Does your above statement still hold?
I will still try running a call for size 50000 twice in a row later and let you know...

rcgldr said:
Another, less likely, issue is that your timer could be wrapping around on the longer calls. How long can the timer run before it wraps around? Do you know which timer is being used to generate these timings?
The timer takes System.nanoTime() before and after executing the algorithm to determine the duration, which is stored in a long. The time to do a linear search is less than a second (observed), and the resulting duration reported by the program (a few million nanoseconds) corresponds to this. I am not entirely confident it is not wrapping around, though. I will look into this as well and post back...
Code:
if (SearchType.contentEquals("LinearSearch")) {
    System.out.println("Starting linear search...");

    long startTime = System.nanoTime();   // reading taken just before the search
    int found_index = linearSearch(ready_array, query);
    long endTime = System.nanoTime();     // reading taken just after the search
    duration = endTime - startTime;
}

return duration;

duration is of type long. The durations are later stored in an ArrayList<Long> for graph plotting.
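As a quick back-of-the-envelope check on the wrap-around worry (just arithmetic on the range of a long):
Code:
// How long an interval can a long hold in nanoseconds before overflowing?
long maxNanos = Long.MAX_VALUE;                        // 9,223,372,036,854,775,807 ns
double years  = maxNanos / 1e9 / 3600 / 24 / 365.25;   // ns -> seconds -> hours -> days -> years
System.out.println(years);                             // roughly 292 years
so a duration of a few million nanoseconds is nowhere near the wrap-around limit.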

rcgldr said:
15 minutes to sort a list? How large is this list, in terms of element size and number of elements?

It is an ArrayList&lt;Integer&gt;, bubble-sorted for the bisection search. It takes 10-15 minutes to sort when the list size is > 100 000. It consists of random Integers ranging from -2,147,483,648 to 2,147,483,647.
 
  • #4
K29 said:
It is an ArrayList<Integer>.
I'm not familiar with MATLAB. For the C++ STL (Standard Template Library), the list-type classes are implemented as doubly linked lists, while the vector-type classes are implemented as arrays, which are much quicker for operations like search or sort. I don't know if the same applies to MATLAB. Some example times using the faster sort methods with the C++ STL vector class (the Microsoft sorts are part of its STL; the merge sort is one I wrote):

Code:
Time to sort 4,194,304 64-bit elements of pseudo-random data:

microsoft sort             461 ms  (quick sort)
merge sort                 539 ms  (merge sort)
microsoft stable_sort      554 ms  (insertion (32 element) / merge sort)

64 bit mode, Windows XP64, Visual Studio 2005 compiler.
Intel core 2 X6800 CPU, 2GB 4-4-4-10 DDR2 ram, Intel D975XBX motherboard.
 
  • #5
rcgldr said:
I'm not familiar with MATLAB. For the C++ STL (Standard Template Library), the list-type classes are implemented as doubly linked lists, while the vector-type classes are implemented as arrays, which are much quicker for operations like search or sort. I don't know if the same applies to MATLAB. Some example times using the faster sort methods with the C++ STL vector class (the Microsoft sorts are part of its STL; the merge sort is one I wrote):

Code:
Time to sort 4,194,304 64-bit elements of pseudo-random data:

microsoft sort             461 ms  (quick sort)
merge sort                 539 ms  (merge sort)
microsoft stable_sort      554 ms  (insertion (32 element) / merge sort)

64 bit mode, Windows XP64, Visual Studio 2005 compiler.
Intel core 2 X6800 CPU, 2GB 4-4-4-10 DDR2 ram, Intel D975XBX motherboard.
My apologies, I forgot to state that I'm doing the searching and sorting in Java. :)

The MATLAB project was for timing matrix operations.
 
  • #6
I am thinking my sorting is taking so long (I have checked: a good hour on lists of size 1 000 000) because ArrayList.add(index, item) runs in linear time, O(n).
Inside the bubble sort this essentially makes the sorting algorithm O(n^3) instead of O(n^2), since I am doing two O(n) adds inside every iteration of the O(n^2) comparison loop.

I may have to leave it like this, because the searching uses get(), and a LinkedList's get() takes O(n) time while the ArrayList's get() takes O(1); the purpose is to time the searching, not the sorting.

Unless I sort with a LinkedList and then convert it to an ArrayList; I'm not sure how long that conversion would take.

Edit: Actually, it seems like my best bet is to use a standard array. Every operation I need will then be O(1).

I will post back whether this fixes the timing discrepancies. I still need to check for overflow as well.
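Here is roughly what I have in mind for the plain-array version (just a sketch of the linear-search case; the sizes and names are placeholders, not my final code):
Code:
import java.util.Random;

// Sketch: fill a plain int[] with random values, then time a worst-case
// linear search for increasing array sizes, as in the original experiment.
public class ArraySearchTiming {
    public static void main(String[] args) {
        Random rng = new Random();
        for (int size = 50_000; size <= 500_000; size += 50_000) {
            int[] data = new int[size];
            for (int i = 0; i < size; i++) {
                data[i] = rng.nextInt();      // random int over the full int range
            }
            int query = data[size - 1];       // query the last element (a duplicate
                                              // earlier in the array could, rarely,
                                              // be found sooner)

            long start = System.nanoTime();
            int found = -1;
            for (int i = 0; i < size; i++) {  // plain array indexing is O(1)
                if (data[i] == query) { found = i; break; }
            }
            long duration = System.nanoTime() - start;

            System.out.println(size + "\t" + duration + " ns (found at index " + found + ")");
        }
    }
}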
 
  • #7
If the number of elements, or the maximum number of elements, is known, you could use ensureCapacity to preallocate memory for the ArrayList. This should speed up the adds. As you mentioned, a standard array may be faster. The C++ STL vector isn't much different from a standard array except that it can change in size (implemented by allocating a new block of memory, copying the old vector into it, then releasing the old block; usually some extra memory is allocated so that the allocate-copy-release cycle doesn't occur on small increases in size), and there are some methods included for vectors, such as sort or stable_sort.
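For example (a minimal sketch; the element count is arbitrary):
Code:
import java.util.ArrayList;

// Minimal sketch: preallocate the ArrayList's backing array before a long run of appends.
public class PreallocSketch {
    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<Integer>();
        list.ensureCapacity(1_000_000);   // one allocation up front instead of repeated grow-and-copy
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);                  // appends no longer trigger reallocation
        }
        System.out.println(list.size());
    }
}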
 

FAQ: Algorithm running time discrepancies

What is algorithm running time?

Algorithm running time refers to the amount of time it takes for a computer program to execute a particular algorithm. It is often measured in terms of the number of operations or steps the algorithm takes to solve a problem.

Why do algorithm running times sometimes differ?

Algorithm running times may differ due to a variety of factors such as the input size, the specific implementation of the algorithm, and the efficiency of the computer or programming language being used.

What is the difference between best case, worst case, and average case running time?

Best case running time refers to the minimum amount of time it takes for an algorithm to run, typically when the input is already in the most optimal form. Worst case running time is the maximum amount of time an algorithm takes to run, often when the input is in the most challenging form. Average case running time is the expected time for an algorithm to run with a typical input.

How can algorithm running time be measured?

Algorithm running time can be measured through various methods such as counting the number of operations, recording the amount of time it takes for an algorithm to run on a specific input, or using specific tools and techniques designed for performance analysis.

Why is it important to analyze algorithm running time?

Analyzing algorithm running time is crucial for determining the efficiency of a program and identifying potential bottlenecks or areas for improvement. It can also help in selecting the most appropriate algorithm for a given problem and predicting how a program will perform on different inputs or in different environments.
