In this exercise we are going to do some algorithms engineering. You will have to evaluate the performance of the sorting algorithms we have discussed on various inputs.Though you will have to do some programming, the main focus here is not on the programming part, but on the analysis of your experimental results.
We are trying to figure out which algorithms run faster than others by performing experiments. To be fairly confident of the results we must collect a large amount of data, for a variety of inputs as large as possible. Your eventual goal is to produce a spreadsheet that looks like this:
Input size | Insertion sort | Selection sort | Quicksort |
---|---|---|---|
1000 | 23 | 52 | 28 |
2000 | 43 | 72 | 38 |
3000 | 63 | 82 | 48 |
4000 | 83 | 92 | 58 |
... | .. | .. | .. |
What this means is that for random arrays of size 1000 you ran insertion sort a number of times and on average it took 23 seconds for it to sort the input on your computer. The numbers here are completely made up of course, while your numbers should come from actual experiments. Some important details:
There are two important sub-topics that we need to expand on: what data will you use and how you will measure performance
Inputs: I would like you to consider one kind of input: completely random arrays. You should generate an array of n elements simply by producing n random numbers in the range 1..n. Use the rand() function for this (see here for an example).
Measuring time: The most straightforward way to measure time in C++ is using the time() function. It is contained in the header file <time.h>. time() returns the number of seconds elapsed since 1/1/1970. To measure the time it takes for a function to execute you call time() before the function, then call the function and then call time() again. Subtracting the two values gives us the desired time. See an example here. An important point to keep in mind is that since we are measuring time in seconds data points of just a few seconds are likely to be affected by rounding errors. In other words, if something takes 2.5 seconds to execute the above technique will estimate the answer as 2 or 3 seconds, both of which are 20% off. If it takes 100.5 seconds though, the rounding error is at most 1%.
Goal: After you have run all the experiments you should gather the data in one spreadsheet. Then try to summarize the results graphically using charts (Excel skills will come in handy here, but if you don't like Excel I can suggest other freely available tools for the job). Finally, provide a brief evaluation of the results. Do your experiments agree with the theory we discussed in class? Why? Your answer must be supported by your data!
What to submit: Hand in a technical report including a brief description of your methodology, your charts and your conclusions. As a supplemental technical appendix, also submit the one spreadsheet and any programs you wrote. The report and other materials should be handed both electronically (in pdf format only through blackboard) and in person in class.