Emmanuel García: AGREP (Aproximate GREP) Performance Analysis

jueves, 28 de febrero de 2013

AGREP (Aproximate GREP) Performance Analysis

For extra points, I took the task of testing AGREP with different length texts (T) searching for a certain pattern (P).

For this I used the same function that generates a certain length T, and then searched for a pattern in there. Initially this was very random, because with lengths of 100-1000 or so, the time variation was pretty much random depending on how busy the CPU was. This way a word with length 100 could had been searched slower than a word with length 1000.

To fix this, the T samples were generated from 1000 to 100000 characters each. Also, because 1 character or 20 of difference still doesn't make a difference, the samples are generated jumping 100 characters one T to the next. So, we generate T's with 1000, 1100, 1200, 1300, etc. length.

This way we can plot the T's length along with the time it took to search for P in there. The result of plotting that data is:

As we can see with a 80, 000 character T, the times are still pretty good, aproximating 0.06 seconds in the search of a pattern.

Now, comparing with the Knutt-Morris-Pratt and the Boyer Moore algorithms:

KMP behaves the worst of the three, with BM placing second.

In conclusion, AGREP is a pretty good string searching tool, which has great computing times, compared to my other implemented string searching algorithms.

Code used to generate the computing times:

Code used to graph the data (adjusting sizes and stuff for each algorithm):