Greatest common divisor

From PEGWiki

The greatest common divisor (GCD) of a set of integers is the greatest integer that divides all the integers in the set, unless the set contains only zeroes, in which case the GCD is defined to be zero. For example, the GCD of 12 and 20 is 4, because 4 divides both 12 and 20, and no integer larger than 4 divides both 12 and 20. The GCD is also sometimes called the greatest common factor (GCF) (perhaps because it avoids confusing schoolchildren with two different meanings of the word divisor).

Properties

  • \gcd(x,y) = \gcd(y,x) (commutativity)
  • \gcd(0,x) = |x|
  • \gcd(1,x) = 1
  • \gcd(x,x) = |x|
  • All common divisors of a set of integers divide the GCD of that set. For example, the common divisors of 24 and 42 are -6, -3, -2, -1, 1, 2, 3, and 6, and the GCD is 6. Notice that all the common divisors are also divisors of 6.
  • The GCD is never negative, because if -x is a common divisor, then so is +x. Also, \gcd(x,y) = \gcd(|x|,|y|).
  • If S and T are sets, then \gcd(S \cup T) = \gcd(\gcd(S), \gcd(T)). For example, \gcd(x,y,z) = \gcd(x,\gcd(y,z)) = \gcd(\gcd(x,y),z) (associativity).
  • \gcd(x,y)\,\operatorname{lcm}(x,y) = |xy|.
  • \gcd(x,\operatorname{lcm}(x,y)) = \operatorname{lcm}(x,\gcd(x,y)). (This property is the absorption law of the lattice over the positive integers with GCD as meet and LCM as join.)
  • If x has prime factorization (-1)^a p_1^{k_1} p_2^{k_2} ... p_m^{k_m} and y has prime factorization (-1)^b p_1^{l_1} p_2^{l_2} ... p_m^{l_m}, then \gcd(x,y) = p_1^{\min(k_1,l_1)} p_2^{\min(k_2,l_2)} ... p_m^{\min(k_m,l_m)}. Therefore, it is possible to find the GCD of two integers by factorizing them; but much more efficient methods exist.
  • \gcd(cx, cy) = |c|\gcd(x,y) whenever c \in \mathbb{Z}.
  • \gcd(x, y) = \gcd(x+ky, y) for any k \in \mathbb{Z}.
  • There exist integers a, b such that ax + by = 1 if and only if \gcd(x,y) = 1. (This is called Bézout's identity. We will not prove it directly, but we will exhibit an algorithm for determining a and b.)
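Several of these identities are easy to spot-check numerically. The following C sketch does so using a simple recursive GCD; the helper names gcd and lcm are just illustrative, and lcm relies on the gcd-lcm identity listed above:

```c
#include <stdlib.h>

/* A simple recursive GCD, used only to spot-check the identities above. */
int gcd(int x, int y)
{
    x = abs(x);
    y = abs(y);
    return x == 0 ? y : gcd(y % x, x);
}

/* LCM via the identity gcd(x,y)*lcm(x,y) = |xy|; lcm(0,0) is taken as 0. */
int lcm(int x, int y)
{
    return (x == 0 && y == 0) ? 0 : abs(x * y) / gcd(x, y);
}
```

For instance, gcd(24, 42) and gcd(42, 24) both evaluate to 6 (commutativity), gcd(0, -7) is 7, and gcd(24, 42) * lcm(24, 42) equals 24 * 42 = 1008.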

Euclidean algorithm

Means of computing the GCD of two integers efficiently have been known since antiquity. The well-known Euclidean algorithm uses the property that \gcd(x,y) = \gcd(x-y,y) for any x, y. When given two positive integers, we repeatedly subtract the smaller from the larger, thus reducing the size of the larger integer; and we repeat this until the two integers become identical. For example, \gcd(24, 42) = \gcd(24, 18) = \gcd(6, 18) = \gcd(6, 12) = \gcd(6, 6) = 6. The following is a naive implementation:

int gcd(int x, int y)
{
    if (x < 0) x = -x;
    if (y < 0) y = -y;
    if (x == 0) return y;
    if (y == 0) return x;
    if (x == y) return x;
    if (x > y) return gcd(x - y, y);
    else return gcd(x, y - x);
}

In practice, negative numbers are rarely encountered. Also, when we obtain two identical integers, if we perform one additional subtraction, we will get zero, which we have to handle in any case, so:

int gcd(int x, int y)
{
    if (x == 0 || y == 0) return x + y;
    if (x >= y) return gcd(x - y, y);
    else return gcd(x, y - x);
}

To speed this up, we note that if we start out with x \gg y, then the final result of subtracting y from x over and over again will be the least non-negative residue of x modulo y. Hence, instead of performing repeated subtraction, we use the mod operator. Finally, with some cleverness, we obtain this common form:

int gcd(int x, int y)
{
    if (x == 0) return y;
    else return gcd(y%x, x);
}

Although we have written this algorithm recursively, we can also write it iteratively, since it is tail-recursive:

while (x > 0)
{
    y %= x;
    swap(x, y);
}
/* y now contains the GCD of original (x, y) */
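For completeness, the loop above can be wrapped into a full function (gcd_iterative is just an illustrative name; the loop body is the same as the fragment, with the swap written out in plain C):

```c
/* Iterative Euclidean algorithm; assumes x and y are nonnegative. */
int gcd_iterative(int x, int y)
{
    while (x > 0)
    {
        y %= x;
        /* swap x and y */
        int t = x; x = y; y = t;
    }
    return y; /* y now contains the GCD of the original (x, y) */
}
```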

Now, regarding performance:

Theorem: The GCD of two positive integers a and b can be computed using the Euclidean algorithm in O(\log\max(a,b)) mod operations.

Sketch Proof: The idea is to use induction on \max(a,b). Without loss of generality, we assume a \geq b. Then, there are two possibilities. If b \leq \frac{a}{2}, then \max(a,b) will be cut at least in half after the first mod operation. But if b > \frac{a}{2}, then b goes into a only once, and we obtain a-b < \frac{a}{2} and b after the first mod operation; a second will then take b below \frac{a}{2}. So in either case, after at most two mod operations, \max(a,b) is reduced by at least a factor of 2.

Note: A more careful analysis shows that the worst-case number of mod operations required to find \gcd(a,b) using the Euclidean algorithm goes approximately as \log_{\varphi}(\max(a,b)), where \varphi = \frac{1+\sqrt{5}}{2}.
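This worst case is attained by consecutive Fibonacci numbers (a result known as Lamé's theorem), where every quotient in the algorithm is 1, so each mod operation makes the least possible progress. A small sketch that counts mod operations, using the hypothetical helper gcd_count, illustrates this:

```c
/* Euclidean algorithm that also counts the mod operations performed. */
int gcd_count(int x, int y, int *ops)
{
    while (x > 0)
    {
        ++*ops;
        y %= x;
        int t = x; x = y; y = t;
    }
    return y;
}
```

For the consecutive Fibonacci numbers 89 and 144, gcd_count returns 1 after 10 mod operations, close to \log_{\varphi} 144 \approx 10.3; typical non-Fibonacci inputs of the same size need fewer.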

The bit complexity of the Euclidean algorithm is O(uv), so that finding the GCD of two large numbers with u and v bits, respectively, requires O(uv) time.[1] For two numbers of approximately equal length n, this is O(n^2). By comparison, the straightforward algorithm based on factorizing the input integers does not run in polynomial time, and hence will perform far worse for large n.

Extended Euclidean algorithm

The extended Euclidean algorithm is the name given to the well-known procedure used to find integer solutions to the equation ax + by = 1, where a and b are given with \gcd(a,b) = 1. For example, we can use it to solve the equation 13x + 17y = 1 for x and y (one possible solution is x = 4, y = -3).

This equation only has solutions when \gcd(a,b) = 1, because if \gcd(a,b) = d > 1, then ax + by will always be divisible by d, and hence can never be 1. However, in this case \gcd(a/d, b/d) = 1, so we can find solutions to the equation (a/d)x + (b/d)y = 1, that is, ax + by = d. So in general we can always solve ax + by = \gcd(a,b) in integers.

The reason why this is known as the extended Euclidean algorithm is that it is based on reducing a and b as in the Euclidean algorithm, recursively, until they become small enough so that the equation is easy to solve; then reconstructing the solution to the original equation using the results.

In particular, when we take the remainder of b modulo a and obtain r, this tells us that b = ka + r for some integer k. So each step of the Euclidean algorithm expresses the fact that \gcd(a, ka+r) = \gcd(a, r) = \gcd(r,a).

Now suppose we can find integers x', y' such that x'r + y'a = d. Then, recall that r = b - ka; therefore, x'(b - ka) + y'a = d. Expanding, we obtain (y' - x'k)a + x'b = d. So set x = y' - x'k, y = x', and we now have a solution to ax + by = d.

This suggests that, in general, to solve ax + by = d, we first compute quotient k and remainder r when b is divided by a; and then (recursively) find some solution to rx' + ay' = d; and finally obtain x = y' - kx', y = x'.

This gives the following implementation, where we assume that a and b are positive:

/* Returns the GCD. */
int gcd_extended(int a, int b, int* x, int* y)
{
    if (a == 0)
    {
        *x = 0, *y = 1; /* 0(0) + 1(b) = b = gcd(a,b) */
        return b;
    }
    else
    {
        int k = b/a;
        int r = b%a;
        int _x, _y;
        int d = gcd_extended(r, a, &_x, &_y);
        *x = _y - k*_x;
        *y = _x;
        return d;
    }
}

One specific application of the extended Euclidean algorithm is that of computing modular inverses. That is, given some integer p and some modulus m, we wish to find another integer q such that pq \equiv 1 \pmod{m}. For example, if p = 4 and m = 21, then q = -5 works, since pq = -20 \equiv 1 \pmod{21}.

All we need to do to solve this is to note that the statement that pq \equiv 1 \pmod{m} is equivalent to pq = km + 1 for some integer k, or pq + (-m)k = 1, to be solved for integers q, k (even though we might not care what k is). We then use the extended Euclidean algorithm. Note that this suggests that a solution exists only when \gcd(p,m) = 1, which is indeed the case.
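Putting this together, a modular-inverse routine can be sketched on top of gcd_extended from the previous section; mod_inverse is an illustrative name, and here the result is reduced into the range [0, m):

```c
/* Extended Euclidean algorithm (as in the previous section):
   solves a*x + b*y = gcd(a,b) and returns the GCD. */
int gcd_extended(int a, int b, int *x, int *y)
{
    if (a == 0)
    {
        *x = 0; *y = 1; /* 0(0) + 1(b) = b = gcd(a,b) */
        return b;
    }
    int _x, _y;
    int d = gcd_extended(b % a, a, &_x, &_y);
    *x = _y - (b / a) * _x;
    *y = _x;
    return d;
}

/* Returns q with p*q = 1 (mod m), reduced into [0, m),
   or -1 when gcd(p,m) != 1 and no inverse exists. */
int mod_inverse(int p, int m)
{
    int q, k;
    /* Solve p*q + m*k = 1; we discard k. */
    if (gcd_extended(p, m, &q, &k) != 1) return -1;
    return ((q % m) + m) % m;
}
```

With p = 4 and m = 21 as above, this returns 16, which is congruent to the solution -5 modulo 21.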

Binary GCD algorithm

This is another popular GCD algorithm, also recursive.

  • If x and y are both even, then \gcd(x,y) = 2\gcd\left(\frac{x}{2},\frac{y}{2}\right).
  • If x is even, but y is odd, then \gcd(x,y) = \gcd\left(\frac{x}{2},y\right). This is because every common divisor of x and y is odd, and hence will also divide \frac{x}{2}.
  • If x and y are both odd, then \gcd(x,y) = \gcd(x-y, y), as always.

This suggests the following naive implementation, assuming that x and y are both nonnegative:

int gcd(int x, int y)
{
    if (x == 0) return y;
    if (x < y) return gcd(y, x);
    if (x%2 == 1)
    {
        if (y%2 == 1)
            return gcd(x - y, y);
        else
            return gcd(x, y/2);
    }
    else
    {
        if (y%2 == 1)
            return gcd(x/2, y);
        else
            return gcd(x/2, y/2)*2;
    }
}

Computing the GCD of a and b using this algorithm will take O(\log\max(a,b)) steps, just as the Euclidean algorithm. This becomes clear when we write out the binary representations of a and b:

  • If either a or b is even, then it will be divided by two, which reduces its length by one.
  • If both a and b are odd, then one of them will become even after a subtraction, and then it will be divided by two, which again reduces its length by one. In effect \gcd(a,b) = \gcd((a-b)/2, b) (when a and b are both odd and a > b).

Therefore, we obtain a bound of 2(\log_2 a + \log_2 b) \leq 4\log_2\max(a,b) iterations, since after every two iterations it is guaranteed that the binary length of at least one of the two numbers is reduced.

Furthermore, each operation (either halving an even integer, or performing a subtraction) has bit complexity proportional to the length of the integers involved. If we halve a, then O(\log a) time is required for this step; if we subtract b from a, then O(\log a) time is required for this step, and in the next step a will be halved, also requiring O(\log a) time. So at most O(\log a) operations of O(\log a) bit complexity each can be performed on a, and at most O(\log b) operations of O(\log b) bit complexity each; so the overall bit complexity is O(u^2 + v^2), where u = \log a and v = \log b. This will often be expressed as O((u+v)^2) (quadratic in total input size) or O(\max(u,v)^2), matching the bound on the Euclidean algorithm.

The binary GCD algorithm can be considered to be motivated by the fact that reduction modulo 2, division by 2, and subtraction admit simpler bit complexities than the general modulo operation that the Euclidean algorithm uses, which suggests that it should be faster in practice. However, the naive implementation shown above is unlikely to outperform the optimized Euclidean algorithm in the previous section; optimization is required in order for the binary GCD algorithm to be competitive.
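One common optimization replaces the one-bit-at-a-time halving with a count-trailing-zeros intrinsic, so that all factors of two are stripped in a single step. The sketch below assumes a GCC or Clang compiler (for __builtin_ctz); gcd_binary is an illustrative name, not a definitive tuned implementation:

```c
/* Iterative binary GCD using __builtin_ctz (GCC/Clang intrinsic)
   to strip all trailing factors of two at once. */
unsigned gcd_binary(unsigned x, unsigned y)
{
    if (x == 0) return y;
    if (y == 0) return x;
    int shift = __builtin_ctz(x | y); /* common factors of 2 in the GCD */
    x >>= __builtin_ctz(x);           /* x is now odd */
    do
    {
        y >>= __builtin_ctz(y);       /* y is now odd */
        if (x > y) { unsigned t = x; x = y; y = t; } /* keep x <= y */
        y -= x;                       /* both odd, so y - x is even */
    } while (y != 0);
    return x << shift;
}
```

On the running example, gcd_binary(24, 42) evaluates to 6.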

References

  1. Jeffrey Shallit. The Greatest Common Divisor, p. 22. CMI Introductory Workshop: Algorithmic Number Theory. Retrieved from www.cs.uwaterloo.ca/~shallit/Talks/gcd.ps