Generating Prime Numbers in a Range (Was: LINQ Challenge), Part 1 – Introduction
Justin Etheredge posted a nice LINQ challenge – write a LINQ query, using whatever chained lambdas you want, but no custom LINQ methods, to generate the prime numbers in a given range.
Over at his blog, Justin analyzes a couple of possible implementations with real LINQ queries (although the more efficient of them will have fairly complex lambdas in the query). I, on the other hand, would like to focus on the lessons you can learn from this kind of simple exercise, without using LINQ queries for now.
Here are the rules of engagement for this mini-series of posts: We want to implement a method that takes an IEnumerable<int> which is the range of numbers and returns an int[] which is an array of all prime numbers within that range. [It would be unfair to some algorithms if we require that an enumerable is returned, as it would be unfair to return a sieve that specifies which numbers are prime and which aren’t.]
Primality-wise, 1 is not a prime, but it doesn’t matter because no one will pass a range that starts with 1; 2 is a prime, so it 3, and so on.
Without further ado, here’s the most naive thing conceivable:
static int[] Full(IEnumerable<int> range)
{
return range.Where(n =>
{
if (n == 2) return true;
for (int i = 2; i < n; ++i)
if (n % i == 0)
return false;
return true;
}).ToArray();
}
This is rather horrible; we iterate over the entire sequence and for each element, check whether it’s prime by going through all possible candidates for a composite divisor and checking them individually. The ToArray() call at the end materializes the query.
Let’s attend to the complexity of this method. It does not use any additional memory (more accurately, it uses only a constant amount of memory that is not a function of the range size), which is a positive characteristic. However, its runtime complexity leaves much to be desired. Here’s a simple upper bound on its runtime, assuming that the range is {1…N} for some N:
For each number k in {1…N}, the algorithm performs k iterations of the loop if the number is prime, and less than k iterations if the number is composite. Therefore, the total number of steps performed by the algorithm is at most N(N+1)/2, or O(N^2). This is not a tight upper bound, because not all numbers in the range are prime (in fact, 50% of the numbers require only one step of the loop; another 33% of the numbers require only two steps; etc.).
Another approach for estimating the runtime complexity is evaluating the number of primes in the range {1…N}, because these are the numbers that are expensive to test. A known result for the number of primes less than X is roughly 2Sqrt[x]/Log[X]. For each of these numbers k we perform roughly k iterations inside the loop; the biggest possible one is N and therefore we perform at most N(2Sqrt[N]/Log[N]) operations, or O(N^(3/2)/Log[N])). This is much better than O(N^2) and is a tighter bound on the runtime complexity of our algorithm. It really grows much slower – here’s a plot of the two functions:
This doesn’t change the fact that it’s really horribly slow. Here are some results for relatively small values of N, times shown in milliseconds: [measured on my machine, of course]
| N | Runtime (ms) |
| 9998 | 52 |
| 12998 | 81 |
| 16898 | 139 |
| 21968 | 277 |
| 28559 | 371 |
| 37127 | 603 |
| 48265 | 954 |
| 62745 | 1554 |
| 81569 | 3051 |
There are some significant improvements that can be made to this. The most trivial of all is known to any high school student: If you’re checking whether K is prime, you don’t have to test all the factors {2…K} – it’s enough to test all the factors {2…Floor[Sqrt[K]]} and you’re good. We’ll attend to that in the next installment.