Generating Prime Numbers in a Range (Was: LINQ Challenge), Part 3 – Sieve Me One
It’s time for the big guns. We’ve already seen simple optimizations to the problem of generating the prime numbers in a range.
A completely different approach, the Sieve of Eratosthenes, is based on a very simple idea. Write down all the numbers in the range {1…N}. Mark 2 as prime. Now strike out every multiple of 2 as composite. Move on to the next number that is not marked as composite yet, and strike out every multiple of it as composite. (The next number, incidentally, is 3; then 5; then 7; and so on.)
After Sqrt[N] steps you have a list of all prime numbers; these are the numbers you did not strike out as composite. An implementation:
static int[] Sieve(IEnumerable<int> range)
{
int highBound = range.Last();
bool[] sieve = new bool[highBound];
int sqrt = (int)Math.Sqrt(highBound);
for (int i = 2; i <= sqrt; ++i)
{
if (sieve[i - 1])//this was already marked as composite
continue;
for (int j = 2 * i; j <= sieve.Length; j += i)
sieve[j - 1] = true;//composite
}
sieve[1 - 1] = true;//1 is not prime
List<int> numbers = new List<int>();
for (int i = 1; i <= sieve.Length; ++i)
if (!sieve[i - 1])
numbers.Add(i);
return numbers.ToArray();
}
The biggest drawback of using a sieve is its memory utilization. We’re using O(N) bytes of memory to store Boolean variables representing whether the number is prime or composite. [To save initialization costs, I assume that sieve[i] is false if the number is prime; otherwise, we’d have to initialize the sieve to true, which is a costly operation. BTW, it’s possible to use a single bit per integer instead of a whole bool. This is left as an exercise for the reader.]
As for the runtime complexity, well, we’re doing at most Sqrt[N] iterations through the sieve with values {2…Sqrt[N]}. For each value k in this range, we perform (N/k)-1 operations, marking the elements in the sieve as composite. One attempt to arrive to an upper bound is by saying that we have at most Sqrt[N] iterations and each iteration performs at most N/2 operations, hence we have ourselves an O(N^(3/2)) algorithm. However, this upper bound is very far from being tight, and the real upper bound is somewhere between O(N) and O(N(Log[Log[N]])).
Fortunately, the sieve fares very well in comparison to the other alternatives, because the whole purpose of our little exercise is to generate prime numbers in a big range. If we were concerned with primality testing of a small set of numbers, a sieve wouldn’t fit our purposes*; here, however, it scales much better than the iterative alternatives.
Here are some results: [run times in ms, “Full”, “Sqrt” and “SqrtN2” refer to the algorithms presented in the previous posts]
| N | Full | Sqrt | SqrtN2 | Sieve |
| 9998 | 52 | 2 | 1 | 0 |
| 12998 | 81 | 3 | 1 | 0 |
| 16898 | 139 | 4 | 2 | 0 |
| 21968 | 277 | 7 | 3 | 0 |
| 28559 | 371 | 8 | 4 | 0 |
| 37127 | 603 | 13 | 7 | 1 |
| 48265 | 954 | 17 | 9 | 1 |
| 62745 | 1554 | 24 | 13 | 2 |
| 81569 | 3051 | 38 | 21 | 3 |
| 106040 | | 55 | 30 | 4 |
| 137852 | | 81 | 43 | 5 |
| 179208 | | 117 | 62 | 7 |
| 232971 | | 173 | 89 | 9 |
| 302862 | | 237 | 132 | 12 |
| 393721 | | 346 | 182 | 17 |
| 511837 | | 482 | 250 | 23 |
| 665388 | | 680 | 352 | 31 |
| 865005 | | 963 | 506 | 41 |
| 1124507 | | 1452 | 775 | 74 |
| 1461859 | | 1943 | 999 | 76 |
| 1900417 | | 2743 | 1493 | 133 |
| 2470542 | | | 2008 | 134 |
| 3211705 | | | | 187 |
| 4175217 | | | | 349 |
| 5427782 | | | | 651 |
| 7056117 | | | | 1048 |
| 9172952 | | | | 1680 |
| 11924838 | | | | 2224 |
Note that the sieve beats the iterative algorithms by a factor of 10 and sometimes more, making it a much more likely option for generating prime numbers in a large range (more than 11,000,000 numbers scanned in just a little more than 2 seconds). In our final installment, we’ll examine the Sieve of Atkin, a final refinement to the idea of generating primes using a sieve.
* The whole idea of a sieve is feasible if you’re testing a very large range of numbers. If the method was called to calculate prime numbers in a range {K…K+100} where K is very large, creating a sieve up to K+100 would be prohibitive. Hence, for the measurements I used 1-based ranges.