DCSIMG
Generating Prime Numbers in a Range (Was: LINQ Challenge), Part 4 – More on Sieves - All Your Base Are Belong To Us

All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

Generating Prime Numbers in a Range (Was: LINQ Challenge), Part 4 – More on Sieves

As it is often the case with algorithms, there’s a better one yet for the problem at hand. The Sieve of Eratosthenes that we employed in the previous step is significantly faster than naively testing every prime number in the range, but there are additional improvements on this approach.

The Sieve of Atkin is a much smarter algorithm that relies on certain properties of modulo-sixty remainders of prime numbers. The algorithm is based on the following facts (and do feel free to check the Wikipedia entry for more details):

  • All numbers with a modulo-sixty remainder that is divisible by 2, 3, or 5 are not prime because they are divisible by 2, 3, or 5, respectively. (This leaves only a handful of modulo-sixty remainders to consider.)
  • All numbers with a modulo-sixty remainder 1, 13, 17, 29, 37, 41, 49, or 53 have a modulo-four remainder of 1 (because these remainders themselves have a modulo-four remainder of 1). These numbers are prime iff the number of solutions to 4x^2 + y^2 = n is odd and the number is not a square of another integer. (The proof to this and the following two claims is here, behind a paywall.)
  • All numbers with a modulo-sixty remainder 7, 19, 31, or 43 have a modulo-six remainder of 1. These numbers are prime iff the number of solutions to 3x^2 + y^2 = n is odd and the number is not a square of another integer.
  • All numbers with a modulo-sixty remainder 11, 23, 47, or 59 have a modulo-twelve remainder of 11. These numbers are prime iff the number solutions to 3x^2 - y^2 = n is odd and the number is not a square of another integer.

Hence, a relatively straightforward algorithm can be used to enumerate all solutions to the above quadratic equations in our range {2…N}. Another iteration of refinements will remove all numbers divisible by 2, 3, or 5 – and we have ourselves a sieve, the Sieve of Atkin.

Efficiently implementing the Sieve of Atkin is not trivial – the straightforward algorithm given on the Wikipedia page is in fact slower than the Sieve of Eratosthenes. Daniel J. Bernstein’s primegen is one of the most efficient implementations known, but it’s written in C and comparing its performance with that of a C# Sieve of Eratosthenes is not fair.

Therefore, I embarked on some refinements to the Wikipedia-provided algorithm and obtained the following results: [“Full” refers to the most naive algorithm, “Sqrt” and “SqrtN2” to its optimizations, “Sieve” to the Sieve of Eratosthenes and “Atkin” to the Sieve of Atkin]

N Full Sqrt SqrtN2 Sieve Atkin
9998 52 2 1 0 0
12998 81 3 1 0 0
16898 139 4 2 0 0
21968 277 7 3 0 0
28559 371 8 4 0 0
37127 603 13 7 1 1
48265 954 17 9 1 1
62745 1554 24 13 2 2
81569 3051 38 21 3 2
106040   55 30 4 3
137852   81 43 5 4
179208   117 62 7 6
232971   173 89 9 7
302862   237 132 12 10
393721   346 182 17 13
511837   482 250 23 17
665388   680 352 31 23
865005   963 506 41 30
1124507   1452 775 74 43
1461859   1943 999 76 52
1900417   2743 1493 133 116
2470542     2008 134 106
3211705       187 124
4175217       349 167
5427782       651 247
7056117       1048 329
9172952       1680 515
11924838       2224 643
15502290         841

In other words, the Sieve of Atkin is a significant improvement over the previous result, especially for very large ranges. A visualization is in place to show just how much better we got with these improvements:

image

This graph is rather hard to read because the naive algorithm explodes very quickly and looks like a vertical asymptote; there’s also an easily observable difference between the two sieves and the “sqrt”-optimizations.

The same on a log-log scale:

image

We can learn a few things from this graph. Recall that after plotting a function f(x) in a log-log scale, if we get a straight line with slope M, it means log x = M log y, or equivalently log x = log (y^M), so we get x = y^M. Therefore:

  1. The slopes of the two “sqrt”-optimizations are roughly the same, and therefore there’s no big-O* difference between the two algorithms.
  2. The slope of the naive algorithm is greater than 1, so the naive algorithm is “worse” than O(n).
  3. The slope of the Sieve of Eratosthenes goes up at ~2,000,000 while the slope of the Sieve of Atkin remains constant. Both have a slope < 1 for small ranges, so both are sub-linear.

In the next (and final) post I will summarize some of our findings and point out a couple of other ideas for implementing primality tests in ranges.


* Recall that a function f(n) is big-O of g(n), i.e. f(n) = O(g(n)) if there exists a constant c such that f(n) = c g(n) for all n. (The Wikipedia definition requires this criterion to hold for all n > some constant, but it’s really irrelevant in the big picture.) Therefore, if the slope of two functions f, g is the same on a log-log scale, but their y-intercept is different, then we have: y1 + log f(x) = y2 + log g(x). From this, log f(x) = (y1 – y2) + log g(x) and therefore f(x) = exp (y1 – y2) g(x), but exp (y1 – y2) is a constant and the definition of big-O holds.

Comments

Dew Drop – January 16, 2010 | Alvin Ashcraft's Morning Dew said:

Pingback from  Dew Drop &#8211; January 16, 2010 | Alvin Ashcraft&#039;s Morning Dew

# January 16, 2010 4:27 PM
Leave a Comment

(required) 

(required) 

(optional)

(required) 


Enter the numbers above: