Editing Longest palindromic substring

Jump to: navigation, search

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision Your text
Line 2: Line 2:
  
 
The '''longest palindromic substring''' problem is exactly as it sounds: the problem of finding a maximal-length [[substring]] of a given string that is also a palindrome. For example, the longest palindromic substring of '''bananas''' is '''anana'''. The longest palindromic substring is not guaranteed to be unique; for example, in the string '''abracadabra''', there is no palindromic substring with length greater than three, but there are two palindromic substrings with length three, namely, '''aca''' and '''ada'''. In some applications it may be necessary to return ''all'' maximal-length palindromic substrings, in some, only one, and in some, only the maximum length itself. This article discusses algorithms for solving these problems.
 
The '''longest palindromic substring''' problem is exactly as it sounds: the problem of finding a maximal-length [[substring]] of a given string that is also a palindrome. For example, the longest palindromic substring of '''bananas''' is '''anana'''. The longest palindromic substring is not guaranteed to be unique; for example, in the string '''abracadabra''', there is no palindromic substring with length greater than three, but there are two palindromic substrings with length three, namely, '''aca''' and '''ada'''. In some applications it may be necessary to return ''all'' maximal-length palindromic substrings, in some, only one, and in some, only the maximum length itself. This article discusses algorithms for solving these problems.
 +
 +
==Theoretical discussion==
 +
===Lower bound===
 +
The worst-case running time of any correct algorithm must be at least <math>O(N)</math>, where <math>N</math> is the length of the string. This is because the algorithm cannot be correct unless it can verify whether the entire string is a palindrome (because in that case it is itself the longest palindromic substring), and it is not possible to conclude that a string is palindromic unless every character is examined.
 +
 +
One might ask whether the size of the output gives a better lower bound on the worst-case running time in the case in which it is necessary to return all maximal-length palindromic substrings (rather than simply reporting the maximum length). It turns out that it does not, because there can only be up to linearly many substrings of any given length. Specifically, there cannot be more than <math>N</math> different palindromic substrings of maximal length, because there cannot be more than <math>N</math> substrings of any given length in the first place.
 +
 +
===Upper bound===
 +
Given a substring of length <math>l</math>, we can determine whether it is a palindrome in <math>O(l)</math> time by iterating through it and checking whether the character at zero-based offset <math>i</math> always matches that at <math>l-i-1</math>. Furthermore, our original string, of length <math>N</math>, has <math>\Theta(N^2)</math> substrings each of length <math>O(N)</math>. Therefore, we conclude, by enumerating all substrings of our original string, checking whether each is a palindrome, and finding the longest one which is, we obtain an <math>O(N^3)</math> solution. Thus <math>O(N^3)</math> is an upper bound on the efficiency of the best algorithm, and we have established that the longest palindromic substring problem can be solved in polynomial time.
 +
 +
==Observation==
 +
Every palindromic substring has a ''centre''. For an odd-length substring, this is its middle character; for an even-length substring, this is the imaginary space between its two middle characters. Call each character and each space in the original string a ''position''. Then there are <math>2N+1</math> positions in any string of length <math>N</math> (to simplify things later on, we assume there is a position just before the first character and one just after the last), and any longest palindromic substring must have one of these as its centre.
 +
 +
Therefore, if we iterate through all positions of the string, and for each one determine the longest palindromic substring ''centered on that position'', then one of these is the longest palindromic substring of the whole string. For example, in the string '''opposes''', the longest palindromic substrings centered around each position are ['''o''', (empty), '''p''', '''oppo''', '''p''', (empty), '''o''', (empty), '''s''', (empty), '''ses''', (empty), '''s'''], and one of these --- '''oppo''' --- is the longest palindromic substring of the original string.
 +
 +
Being a bit clever in determining the longest palindromic substring centered about a given string improves the running time over the naive solution dramatically. We begin by noting that there definitely is a palindromic substring of length zero (the empty string) centered around the current center, if it is a space, or a palindromic substring of length one (the character itself) centered around the current center, if it is a character. Then, we repeatedly attempt to make a longer palindromic substring centered at our current location by adding the characters just before and just after the current longest palindromic substring. We can do this if and only if these characters match. If they do not match, there is no hope of any longer palindromic substring centered here existing, so we are done.
 +
 +
For example, suppose our current center is between the two '''b''''s in the word '''scabbards'''. We already know there exists a palindromic substring of length zero (the empty string) centered around our current location. To determine whether we can make it longer, we check the first character on either side, that is, the two '''b''''s. They match, so we know that '''bb''' is a palindromic substring centered at our current location. Then we try to extend it again, by checking the first character on either side. Now we find two '''a''''s, so we now know we have '''abba''', a palindromic substring of length 4, centered around our current location. Again we try to extend it, by checking one character on each side. We find the '''c''' and the '''r''', which do ''not'' match. Therefore we know that '''abba''' is the longest palindromic substring centered our current location --- substrings centered at our current location that are longer than '''cabbar''', such as '''scabbard''', have no hope of being palindromic.
 +
 +
Notice that once we have fixed the centre, we will not examine any character of the string more than once in determining the longest palindromic substring centered there. That is, determining the longest palindromic substring centered at any given location takes only <math>O(N)</math> time, and by iterating over all possible centres, an <math>O(N^2)</math> algorithm is obtained. (Note that this algorithm also gives ''all'' longest palindromic substrings: if the length of the palindrome centered around a given position is maximal, then we output the longest palindromic substring centered at this position.)
  
 
==Toward a better bound==
 
==Toward a better bound==
Line 38: Line 58:
 
However, there exists a simpler, shorter, faster, less memory-hungry algorithm<ref name="akalin"/> which accesses the string only by comparing its characters for equality (and hence has running time independent of alphabet size). This is known as ''Manacher's algorithm''.<ref name="manacher"/><ref name="galil"/> Again we will compute the longest palindromic substring centered around each possible position (so that returning all longest palindromic substrings is as easy as simply finding the maximum possible length for a palindromic substring). Our algorithm will return a ''longest palindrome array'': an array in which the <math>i</math><sup>th</sup> element contains the length of the longest palindrome centered around the <math>i</math><sup>th</sup> position from the left. (Note that the parities of elements in the longest palindrome array alternate.)
 
However, there exists a simpler, shorter, faster, less memory-hungry algorithm<ref name="akalin"/> which accesses the string only by comparing its characters for equality (and hence has running time independent of alphabet size). This is known as ''Manacher's algorithm''.<ref name="manacher"/><ref name="galil"/> Again we will compute the longest palindromic substring centered around each possible position (so that returning all longest palindromic substrings is as easy as simply finding the maximum possible length for a palindromic substring). Our algorithm will return a ''longest palindrome array'': an array in which the <math>i</math><sup>th</sup> element contains the length of the longest palindrome centered around the <math>i</math><sup>th</sup> position from the left. (Note that the parities of elements in the longest palindrome array alternate.)
  
The algorithm processes the string from left to right, and fills in the longest palindrome array in the same order, so that when any given position is being considered, the longest palindromes centered around all positions to the left are already known. The algorithm distinguishes two kinds of positions: "interesting" positions and "uninteresting" positions. The algorithm "hops" from each interesting position to the one immediately to the right. It spends amortized constant time on each interesting position. Interesting positions are so named because any maximal-length palindromic substring overall must be centered at one of them. Uninteresting positions' entries in the longest palindrome array can be computed in constant time "in passing" from one interesting position to the next.
+
The algorithm processes the string from left to right, and fills in the longest palindrome array in the same order, so that when any given position is being considered, the longest palindromes centered around all positions to the left are already known. The algorithm distinguishes two kinds of positions: "interesting" positions and "uninteresting" positions. The algorithm "hops" from each interesting position to the one immediately to the right. It spends amortized constant time on each interesting position. Interesting positions are so named because any maximal-length palindromic substring overall must be centered at one of them. Uninteresting positions' entries in the longest palindrome array can be computed in in constant time "in passing" from one interesting position to the next.
  
 
We will proceed through a series of observations to derive the algorithm. First, the designation of interesting and uninteresting positions is not unique; in particular, any uninteresting position can be relabelled interesting without affecting correctness or consistency. That being said, we begin by denoting the first position --- the one immediately to the left of the first character --- interesting, and going from there.
 
We will proceed through a series of observations to derive the algorithm. First, the designation of interesting and uninteresting positions is not unique; in particular, any uninteresting position can be relabelled interesting without affecting correctness or consistency. That being said, we begin by denoting the first position --- the one immediately to the left of the first character --- interesting, and going from there.
Line 44: Line 64:
 
''Lemma'' (Second observation): If <math>s</math> is the longest palindrome centered at the current position, and <math>t</math> is the longest palindromic proper suffix of <math>s</math>, then all positions between the current position and the centre of <math>t</math> may safely be denoted uninteresting.
 
''Lemma'' (Second observation): If <math>s</math> is the longest palindrome centered at the current position, and <math>t</math> is the longest palindromic proper suffix of <math>s</math>, then all positions between the current position and the centre of <math>t</math> may safely be denoted uninteresting.
  
''Proof'': Consider some position <math>i</math> between the centres of <math>s</math> and <math>t</math>. Suppose there exists a palindromic substring, <math>u</math>, of length greater than or equal to that of <math>s</math>, centered at <math>i</math>. Certainly the last character of <math>u</math> lies to the right of the last character of <math>s</math>, since <math>u</math> is centered further to the right and is longer. So we repeatedly remove one character from each end of <math>u</math> until <math>u</math> and <math>s</math> end at the same position. Since <math>u</math> is now shorter than <math>s</math> (since its centre is to the right but both <math>s</math> and <math>u</math> extend equally far on the right), it is a palindromic proper suffix of <math>s</math>. Since the centre of <math>u</math> is to the left of the centre of <math>t</math>, <math>u</math> is longer than <math>t</math>, a contradiction as <math>t</math> is maximal by hypothesis.
+
''Proof'': Consider some position <math>i</math> between the centres of <math>s</math> and <math>t</math>. Suppose there exists a palindromic substring, <math>u</math>, of length greater than or equal to that of <math>s</math>, centered at <math>i</math>. Certainly the last character of <math>u</math> lies to the right of the last character of <math>s</math>, since <math>u</math> is centered further to the right and is longer. So we repeatedly remove one character from each end of <math>u</math> until <math>u</math> and <math>s</math> end at the same position. Since <math>u</math> is now shorter than <math>s</math> (since its centre is to the right but both <math>s</math> and <math>t</math> extend equally far on the right), it is a palindromic proper suffix of <math>s</math>. Since the centre of <math>u</math> is to the left of the centre of <math>t</math>, <math>u</math> is longer than <math>t</math>, a contradiction as <math>t</math> is maximal by hypothesis.
  
 
To find the longest palindromic proper suffix, we make yet another observation: the longest palindromic proper suffix and the longest palindromic proper prefix are reflections of each other about the current centre. So we scan backward from the current position. For each position <math>i</math> to the left of the current position, we know the longest palindromic substring <math>u</math> centered there. If the first character of <math>u</math> is to the right of the first character of <math>s</math>, then the longest palindromic substring centered at the position obtained by reflecting <math>u</math> about the current position is identical to <math>s</math>. (If it were possible to extend it, it would also be possible to extend <math>s</math>.) This means we can fill in the corresponding position in the longest palindromic array immediately, and also that there is no palindromic prefix of <math>s</math> centered at <math>i</math> as <math>u</math>, and hence no corresponding palindromic suffix at the position obtained by reflecting <math>i</math> across the current centre. Otherwise, on the other hand, we know there ''is'' a palindromic (proper) prefix centered at <math>i</math>, which means there is a palindromic (proper) suffix centered at the position obtained by reflecting <math>i</math> across the current centre. Furthermore, since we are scanning backward from the current centre, the ''first'' such <math>i</math> we encounter is the centre of the ''longest'' palindromic proper prefix, so we have also found the ''longest'' palindromic proper suffix.
 
To find the longest palindromic proper suffix, we make yet another observation: the longest palindromic proper suffix and the longest palindromic proper prefix are reflections of each other about the current centre. So we scan backward from the current position. For each position <math>i</math> to the left of the current position, we know the longest palindromic substring <math>u</math> centered there. If the first character of <math>u</math> is to the right of the first character of <math>s</math>, then the longest palindromic substring centered at the position obtained by reflecting <math>u</math> about the current position is identical to <math>s</math>. (If it were possible to extend it, it would also be possible to extend <math>s</math>.) This means we can fill in the corresponding position in the longest palindromic array immediately, and also that there is no palindromic prefix of <math>s</math> centered at <math>i</math> as <math>u</math>, and hence no corresponding palindromic suffix at the position obtained by reflecting <math>i</math> across the current centre. Otherwise, on the other hand, we know there ''is'' a palindromic (proper) prefix centered at <math>i</math>, which means there is a palindromic (proper) suffix centered at the position obtained by reflecting <math>i</math> across the current centre. Furthermore, since we are scanning backward from the current centre, the ''first'' such <math>i</math> we encounter is the centre of the ''longest'' palindromic proper prefix, so we have also found the ''longest'' palindromic proper suffix.
Line 75: Line 95:
 
<ref name="ukkonen">E. Ukkonen. (1995). On-line construction of suffix trees. ''Algorithmica'' '''14'''(3):249-260. [http://www.cs.helsinki.fi/u/ukkonen/SuffixT1.pdf PDF] | [http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf PDF with figures]</ref>
 
<ref name="ukkonen">E. Ukkonen. (1995). On-line construction of suffix trees. ''Algorithmica'' '''14'''(3):249-260. [http://www.cs.helsinki.fi/u/ukkonen/SuffixT1.pdf PDF] | [http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf PDF with figures]</ref>
 
<ref name="tarjan">Gabow, H. N.; Tarjan, R. E. (1983), "A linear-time algorithm for a special case of disjoint set union", <i>Proceedings of the 15th ACM Symposium on Theory of Computing (STOC)</i>, pp.&#160;246–251, doi:[http://dx.doi.org/10.1145%2F800061.808753 10.1145/800061.808753]</ref>
 
<ref name="tarjan">Gabow, H. N.; Tarjan, R. E. (1983), "A linear-time algorithm for a special case of disjoint set union", <i>Proceedings of the 15th ACM Symposium on Theory of Computing (STOC)</i>, pp.&#160;246–251, doi:[http://dx.doi.org/10.1145%2F800061.808753 10.1145/800061.808753]</ref>
<ref name="akalin">Fred Akalin. Finding the longest palindromic substring in linear time. Retrieved 2016-10-01 from [https://www.akalin.com/longest-palindrome-linear-time https://www.akalin.com/longest-palindrome-linear-time]. ''(This source contains a Python implementation of the simple linear-time algorithm.)''</ref>
+
<ref name="akalin">Frederick Akalin. Finding the longest palindromic substring in linear time. Retrieved 2010-11-13 from [http://www.akalin.cx/2007/11/28/finding-the-longest-palindromic-substring-in-linear-time/ http://www.akalin.cx/2007/11/28/finding-the-longest-palindromic-substring-in-linear-time/]. ''(This source contains a Python implementation of the simple linear-time algorithm.)''</ref>
 
<ref name="manacher">Glenn Manacher. 1975. A New Linear-Time "On-Line" Algorithm for Finding the Smallest Initial Palindrome of a String. J. ACM 22, 3 (July 1975), 346-351. DOI=10.1145/321892.321896 http://doi.acm.org/10.1145/321892.321896 </ref>
 
<ref name="manacher">Glenn Manacher. 1975. A New Linear-Time "On-Line" Algorithm for Finding the Smallest Initial Palindrome of a String. J. ACM 22, 3 (July 1975), 346-351. DOI=10.1145/321892.321896 http://doi.acm.org/10.1145/321892.321896 </ref>
 
<ref name="galil">Alberto Apostolico, Dany Breslauer, and Zvi Galil (1994), "Parallel Detection of all Palindromes in a String", Comput. Sci.</ref>
 
<ref name="galil">Alberto Apostolico, Dany Breslauer, and Zvi Galil (1994), "Parallel Detection of all Palindromes in a String", Comput. Sci.</ref>
Line 88: Line 108:
 
** {{SPOJ|PALIM|Yet Another Longest Palindrome Problem}}
 
** {{SPOJ|PALIM|Yet Another Longest Palindrome Problem}}
 
** {{SPOJ|PALDR|Even Palindrome}}
 
** {{SPOJ|PALDR|Even Palindrome}}
* The problem Calf Flac from [http://train.usaco.org/ the USACO training pages] can be solved using the naive algorithm. ilove youuuuuuuuuuuuu
+
* The problem Calf Flac from [http://train.usaco.org/ the USACO training pages] can be solved using the naive algorithm.

Please note that all contributions to PEGWiki are considered to be released under the Attribution 3.0 Unported (see PEGWiki:Copyrights for details). If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource. Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)

Templates used on this page: