===Little O notation===
Whereas the big O notation <math>f(n) \in O(g(n))</math> expresses that <math>g</math> grows ''at least as quickly'' as <math>f</math> in an asymptotic, constant-factor-oblivious sense, little O notation, as in <math>f(n) \in o(g(n))</math>, expresses that <math>g</math> grows ''strictly more quickly'' than <math>f</math> in the limit of infinite <math>n</math>. It means that <math>\lim_{n\to\infty} \frac{f(n)}{g(n)} = 0</math>. This notation is not used very often in computer science, simply because it is rarely useful. Note also that such limit-based characterizations are not entirely equivalent to big O notation: <math>f(n) \in O(g(n))</math> does not require the limit of <math>\frac{f(n)}{g(n)}</math> to exist at all, since the ratio may oscillate while remaining bounded.
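
For example, <math>n \in o(n^2)</math>, since <math>\lim_{n\to\infty} \frac{n}{n^2} = \lim_{n\to\infty} \frac{1}{n} = 0</math>; on the other hand, <math>2n^2 + n \in O(n^2)</math> but <math>2n^2 + n \notin o(n^2)</math>, since the ratio approaches 2 rather than 0.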
  
 
===Little omega notation===
* A cubic-time algorithm will suffice for <math>n</math> up to about 300.
* A quadratic-time algorithm will suffice for <math>n</math> up to about 5000.
* An <math>O(n \log n)</math> time algorithm will suffice for <math>n</math> up to about 200000. The number 100000 is a very common bound on input size in olympiad problems, and almost always suggests that a contestant should aim to invent an algorithm with time complexity of around <math>O(n \log n)</math>. (This aspect of olympiad contests has been criticized.)<ref>Ribeiro, P. and Guerreiro, P. ''Improving the Automatic Evaluation of Problem Solutions in Programming Contests''. Olympiads in Informatics, 2009, Vol. 3, 132–143. Retrieved from http://www.mii.lt/olympiads_in_informatics/pdf/INFOL048.pdf</ref> <math>O(n^2)</math> is almost always too slow in this case.
* A linear-time algorithm is typically required if the input can be even larger than this, usually 300000 or more lines. (A rough operation-count sketch follows this list.)
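
These cutoffs are the result of back-of-the-envelope arithmetic. As a minimal sketch, assuming a budget of very roughly <math>10^8</math> elementary operations within a typical time limit (this budget is an assumption; the real figure depends on the judge, the constant factors, and the time limit), the following program prints the approximate operation counts implied by the input sizes above:

<syntaxhighlight lang="cpp">
#include <cmath>
#include <cstdio>

int main() {
    // Assumed budget: very roughly 1e8 elementary operations within a typical
    // time limit (an assumption; real judges and constant factors vary a lot).
    const double budget = 1e8;

    std::printf("n^3       at n = 300:    %.1e operations\n", std::pow(300.0, 3));
    std::printf("n^2       at n = 5000:   %.1e operations\n", std::pow(5000.0, 2));
    std::printf("n log2 n  at n = 200000: %.1e operations\n", 200000.0 * std::log2(200000.0));
    std::printf("assumed budget:          %.1e operations\n", budget);
    return 0;
}
</syntaxhighlight>

All three counts come in comfortably under the assumed budget, whereas <math>n^2</math> at <math>n = 200000</math> would already be <math>4 \times 10^{10}</math> operations, far beyond it.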
  
===Recursive functions===
Many algorithms work by breaking the input down into smaller pieces, [[recursively]] solving each smaller piece as though it were an independent problem, and then combining the results to obtain a solution to the original instance. This algorithmic paradigm is known as ''divide and conquer'', and its running time is usually analyzed slightly differently from that of the simpler algorithms from the previous section.
 
 
Examples:
 
* [[Merge sort]], which sorts a list by dividing it into two sub-lists, sorting each sub-list, and then merging the two sorted sub-lists to obtain the original list, sorted. The merge step takes <math>O(n)</math> time if the original list had <math>n</math> elements. The overall running time is <math>O(n \log n)</math>. (A sketch of this recursion appears after this list.)
 
* [[Karatsuba multiplication]], which multiplies two <math>n</math>-digit numbers (see [[Big numbers]]) by dividing each number into two halves and recursively multiplying three pairs of halves, along with some additions and subtractions which take <math>O(n)</math> time in total. The overall running time is <math>O(n^{\log_2 3})</math>.
 
* One solution to the [[closest pair of points]] problem finds the closest pair by dividing the point set with a half plane to obtain two sets of points of approximately half the size, finding the closest pair of points within each set, and then sorting points close to the dividing line and sweeping to find the closest pair across the division, which takes <math>O(n \log n)</math> time (for the sorting). The overall running time is then <math>O(n \log^2 n)</math>. (NB: This can be improved to <math>O(n \log n)</math> if we concomitantly run merge sort, rather than re-sorting along the dividing line in each step.)
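
To make the first example concrete, here is a minimal merge sort sketch (written for this article as an illustration of the recursion structure, not a tuned implementation): each call splits its range in half, recurses on both halves, and then does a linear-time merge.

<syntaxhighlight lang="cpp">
#include <algorithm>
#include <cstdio>
#include <vector>

// Sort v[lo, hi) by recursively sorting two halves and merging them.
// The merge step is linear in the length of the range, so the running
// time satisfies T(n) = 2T(n/2) + O(n), which works out to O(n log n).
void merge_sort(std::vector<int>& v, int lo, int hi) {
    if (hi - lo <= 1) return;                 // base case: 0 or 1 elements
    int mid = lo + (hi - lo) / 2;
    merge_sort(v, lo, mid);                   // sort left half
    merge_sort(v, mid, hi);                   // sort right half
    std::vector<int> merged;                  // linear-time merge
    merged.reserve(hi - lo);
    int i = lo, j = mid;
    while (i < mid || j < hi) {
        if (j >= hi || (i < mid && v[i] <= v[j])) merged.push_back(v[i++]);
        else                                      merged.push_back(v[j++]);
    }
    std::copy(merged.begin(), merged.end(), v.begin() + lo);
}

int main() {
    std::vector<int> v = {5, 2, 4, 6, 1, 3};
    merge_sort(v, 0, (int)v.size());
    for (int x : v) std::printf("%d ", x);    // prints: 1 2 3 4 5 6
    std::printf("\n");
    return 0;
}
</syntaxhighlight>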
 
 
How do we analyze the running time of such algorithms? We can reason as follows:
 
* In the case of merge sort, we start out with one instance (the original list of size <math>n</math>), which gives rise to two subinstances of size <math>n/2</math>; these in turn give rise to a total of four subinstances of size <math>n/4</math>, and so on down to subinstances of size 1 or 2. There are about <math>\log_2 n</math> "levels" to this scheme, and at each level, we have about <math>2^k</math> subinstances of size <math>n/2^k</math>, and a linear amount of work is required in the "conquer" step of each; so that is about <math>cn</math> steps at each level, or about <math>cn \log n</math> steps in total, giving <math>O(n \log n)</math>.
 
* In Karatsuba multiplication, there will again be about <math>\log_2 n</math> levels. On the topmost level, we have one instance with size <math>n</math>. On the next level, we have three instances with size <math>n/2</math>, and then nine instances with size <math>n/4</math>, and so on down. A linear amount of additional work is required for the "conquer" step for each instance, so that is about <math>cn</math> work on the topmost level, about <math>\frac{3}{2}cn</math> on the next level, and so on down. So the total running time will be about <math>cn \sum_{k=0}^{\log_2 n} \left(\frac{3}{2}\right)^k = cn\cdot\frac{\left(\frac{3}{2}\right)^{\log_2 n + 1}-1}{\frac{3}{2}-1} = O\left(n\left(\frac{3}{2}\right)^{\log_2 n}\right) = O\left(n \cdot n^{\log_2 3 - 1}\right) = O(n^{\log_2 3})</math>, using the identity <math>\left(\frac{3}{2}\right)^{\log_2 n} = n^{\log_2 \frac{3}{2}} = n^{\log_2 3 - 1}</math>.
 
* In the third algorithm listed above, there are again about <math>\log_2 n</math> levels; on the topmost level, the work done is <math>cn \log n</math>; then <math>2c\left(\frac{n}{2}\log\left(\frac{n}{2}\right)\right)</math> on the next level, since we will have two subinstances of size <math>\frac{n}{2}</math>, which will each require <math>O\left(\frac{n}{2}\log\left(\frac{n}{2}\right)\right)</math> work; then on the next level, <math>4c\left(\frac{n}{4}\log\left(\frac{n}{4}\right)\right)</math>, and so on down; the total work done is about <math>O\left(\sum_{k=0}^{\log_2 n} 2^k \frac{n}{2^k}\log\left(\frac{n}{2^k}\right)\right) = n\cdot O\left(\sum_{k=0}^{\log_2 n}\log\left(\frac{n}{2^k}\right)\right)</math>. This gives us an arithmetic series, with first term <math>\log n</math> (here <math>k = 0</math>) and last term about 0 (for when <math>\frac{n}{2^k} \approx \frac{n}{n} = 1</math> since <math>k \approx \log_2 n</math>) and a total of about <math>\log_2 n</math> terms; its sum is then about <math>\log n \log_2 n</math> giving <math>O(n \log^2 n)</math> overall.
 
We would like to have some way to tackle the general case, without so much messy mathematical reasoning in each individual case. By ''general case'', we mean that we assume that to solve an instance of size <math>n</math> using some algorithm, we divide it into chunks of size about <math>\frac{n}{b}</math>, and solve <math>a</math> subproblems using those chunks, along with extra work in the amount of <math>f(n)</math>, unless <math>n < c</math> for some constant <math>c</math>, which represents a base case that can be solved in <math>O(1)</math> time; that is, the running time satisfies the recurrence <math>T(n) = aT\left(\frac{n}{b}\right) + f(n)</math> for <math>n \geq c</math>. For example, in the first of the three examples above, we have <math>a = 2, b = 2, f(n) = n</math>; in the second, <math>a = 3, b = 2, f(n) = n</math>; and in the third, <math>a = 2, b = 2, f(n) = n \log n</math>.
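
As a quick numerical illustration of this setup (a sketch written for this article; the helper names <code>T</code>, <code>lin</code>, and <code>nlogn</code> are not from any library), the following program evaluates <math>T(n) = aT\left(\lceil n/b \rceil\right) + f(n)</math> for the three parameter choices above and prints the ratio of <math>T(n)</math> to the claimed asymptotic bound:

<syntaxhighlight lang="cpp">
#include <cmath>
#include <cstdio>

// Evaluate T(n) = a * T(ceil(n / b)) + f(n), with T(n) = 1 for n < 2.
// This only measures the growth of the recurrence; it is not an algorithm.
double T(long long n, int a, int b, double (*f)(long long)) {
    if (n < 2) return 1.0;
    return a * T((n + b - 1) / b, a, b, f) + f(n);
}

double lin(long long n)   { return (double)n; }                     // f(n) = n
double nlogn(long long n) { return n * std::log2((double)n); }      // f(n) = n log n

int main() {
    std::printf("%10s %12s %12s %12s\n", "n", "mergesort", "karatsuba", "closest");
    for (long long n = 1 << 10; n <= 1 << 22; n <<= 4) {
        double ms = T(n, 2, 2, lin);    // expect Theta(n log n)
        double ka = T(n, 3, 2, lin);    // expect Theta(n^(log2 3))
        double cp = T(n, 2, 2, nlogn);  // expect Theta(n log^2 n)
        double lg = std::log2((double)n);
        std::printf("%10lld %12.3f %12.3f %12.3f\n", n,
                    ms / (n * lg),
                    ka / std::pow((double)n, std::log2(3.0)),
                    cp / (n * lg * lg));
    }
    return 0;
}
</syntaxhighlight>

The ratios approach constants as <math>n</math> grows (roughly 1, 3, and 1/2 respectively with this particular base case), which is exactly what the master theorem below predicts: the hidden constant factors settle down, while the stated asymptotic form captures the growth.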
 
 
====The master theorem====
 
The authoritative statement of the '''master theorem''' is from CLRS<ref name="CLRS">Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. ''Introduction to Algorithms'', Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Sections 4.3 (The master method) and 4.4 (Proof of the master theorem), pp. 73–90.</ref>. Suppose that <math>a \geq 1</math> and <math>b > 1</math> above, and that <math>f</math> is asymptotically positive. Then, there are three cases:
 
# If <math>f(n) \in O(n^{\log_b a - \epsilon})</math> for some <math>\epsilon > 0</math>, then the overall running time is <math>\Theta(n^{\log_b a})</math>.
 
# If <math>f(n) \in \Theta(n^{\log_b a} \log^k n)</math>, for some <math>k \geq 0</math>, then the overall running time is <math>\Theta(n^{\log_b a} \log^{k+1} n)</math>.
 
# If <math>f(n) \in \Omega(n^{\log_b a + \epsilon})</math> for some <math>\epsilon > 0</math>, and, furthermore, <math>af\left(\frac{n}{b}\right) \leq cf(n)</math> for some <math>c < 1</math> and all sufficiently large <math>n</math>, then the overall running time is <math>\Theta(f(n))</math>.
 
 
For example, Karatsuba multiplication falls into the first case, since <math>f(n) = n</math> and <math>\log_2 3 \approx 1.58</math>. The theorem then tells us that the overall running time is <math>O(n^{\log_2 3})</math>. What is happening here is that as we proceed from one level to the next, the amount of work done on the current level increases exponentially or faster. In this particular case, each level entails 1.5 times as much work as the previous level. Intuitively, this means most of the work is done on the lowest level, on which we have <math>n</math> subproblems, each with size 1. The earlier analysis then applies. The reason why we need <math>f(n) \in O(n^{\log_b a - \epsilon})</math> is precisely to ensure this exponential growth. If <math>f(n)</math> is exactly <math>n^{\log_b a}</math>, then we will have <math>af\left(\frac{n}{b}\right) = a\frac{n^{\log_b a}}{b^{\log_b a}} = n^{\log_b a} = f(n)</math>, that is, the amount of work done on each level will be the same. But if <math>f(n)</math> grows more slowly than this, then <math>f(n)</math> will be less than <math>af\left(\frac{n}{b}\right)</math>. If it grows polynomially more slowly (that is, by a factor of <math>n^\epsilon</math> or more), then the desired exponential growth will be attained.
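
Concretely, if <math>f(n) = n^{\log_b a - \epsilon}</math>, then <math>af\left(\frac{n}{b}\right) = a \cdot \frac{n^{\log_b a - \epsilon}}{b^{\log_b a - \epsilon}} = b^{\epsilon}\, n^{\log_b a - \epsilon} = b^{\epsilon} f(n)</math>, so each level involves <math>b^{\epsilon} > 1</math> times as much work as the one above it (in Karatsuba's case, <math>b^{\epsilon} = 2^{\log_2 3 - 1} = \frac{3}{2}</math>).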
 
 
Merge sort and the closest pair of points algorithm are covered by case 2. For merge sort, we have <math>k = 0</math> (there is no log factor at all). Here the time spent on each level is the same, since <math>f(n) = af\left(\frac{n}{b}\right)</math> if <math>f(n) = n^{\log_b a}</math>, so we simply multiply <math>f(n)</math> by the number of levels, that is, <math>\log_b n \in \Theta(\log n)</math>. For the closest pair of points algorithm, we have <math>k = 1</math>. When <math>k > 0</math>, there is a decrease in the amount of work done as we go further down, but it is slower than an exponential decay. In each case the average amount of work done per level is on the order of the amount of work done on the topmost level, so in each case we obtain <math>\Theta(f(n) \log n)</math> time.
 
 
Case 3 is not often encountered in practice. When <math>f(n) \geq \frac{1}{c}af\left(\frac{n}{b}\right)</math> (with <math>\frac{1}{c} > 1</math>), there is an exponential (or faster) decay as we proceed from the top to the lower levels, so most of the work is done at the top, and hence the <math>\Theta(f(n))</math> behaviour is obtained. The condition that <math>f(n) \in \Omega(n^{\log_b a + \epsilon})</math> is not actually needed in the statement of Case 3, since it is implied by the condition that <math>af\left(\frac{n}{b}\right) \leq cf(n)</math>. The theorem is nevertheless stated as it is above because, if <math>f(n)</math> is of the form <math>dn^{\log_b a + \epsilon}\log^k n</math> (which covers most cases in practice), then the regularity condition <math>af\left(\frac{n}{b}\right) \leq cf(n)</math> is automatically satisfied, so we do not need to check it explicitly but can instead just look at the form of <math>f(n)</math>. However, there are some possibilities for <math>f(n)</math> that are not "well-behaved", and will not satisfy the condition <math>af\left(\frac{n}{b}\right) \leq cf(n)</math>.
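
For driving functions of the well-behaved polynomial-logarithmic form, deciding which case applies is purely mechanical: compare the exponent of <math>n</math> in <math>f(n)</math> with <math>\log_b a</math>. The following sketch (an illustration written for this article; it assumes <math>f(n) = n^p \log^k n</math> with <math>k \geq 0</math> and handles nothing else, and the <code>classify</code> helper is invented for the example) does exactly that for the algorithms discussed in this section:

<syntaxhighlight lang="cpp">
#include <cmath>
#include <cstdio>

// Which master-theorem case applies to T(n) = a*T(n/b) + f(n),
// assuming the "well-behaved" form f(n) = n^p * (log n)^k with k >= 0?
void classify(int a, int b, double p, int k) {
    const double EPS = 1e-9;
    double crit = std::log((double)a) / std::log((double)b);   // log_b a
    std::printf("a=%d, b=%d, f(n)=n^%.2f log^%d n: ", a, b, p, k);
    if (p < crit - EPS) {
        std::printf("case 1, Theta(n^%.3f)\n", crit);
    } else if (std::fabs(p - crit) < EPS) {
        std::printf("case 2, Theta(n^%.3f log^%d n)\n", crit, k + 1);
    } else {
        // For this form of f, the regularity condition holds automatically.
        std::printf("case 3, Theta(n^%.2f log^%d n)\n", p, k);
    }
}

int main() {
    classify(2, 2, 1.0, 0); // merge sort:   case 2, Theta(n log n)
    classify(3, 2, 1.0, 0); // Karatsuba:    case 1, Theta(n^1.585)
    classify(2, 2, 1.0, 1); // closest pair: case 2, Theta(n log^2 n)
    classify(1, 2, 1.0, 0); // a = 1, b = 2, f(n) = n: case 3, Theta(n)
    return 0;
}
</syntaxhighlight>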
 
 
An actual proof of the master theorem is given in CLRS.
 
 
====Akra&ndash;Bazzi theorem====
 
 
TODO
  
 
===Amortized analysis===
 
TODO
 
==References==
 
<references/>
 
 
[[Category:Pages needing diagrams]]
[[Category:Incomplete]]
 
