Editing Longest palindromic substring

Jump to: navigation, search

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision Your text
Line 2: Line 2:
  
 
The '''longest palindromic substring''' problem is exactly as it sounds: the problem of finding a maximal-length [[substring]] of a given string that is also a palindrome. For example, the longest palindromic substring of '''bananas''' is '''anana'''. The longest palindromic substring is not guaranteed to be unique; for example, in the string '''abracadabra''', there is no palindromic substring with length greater than three, but there are two palindromic substrings with length three, namely, '''aca''' and '''ada'''. In some applications it may be necessary to return ''all'' maximal-length palindromic substrings, in some, only one, and in some, only the maximum length itself. This article discusses algorithms for solving these problems.
 
The '''longest palindromic substring''' problem is exactly as it sounds: the problem of finding a maximal-length [[substring]] of a given string that is also a palindrome. For example, the longest palindromic substring of '''bananas''' is '''anana'''. The longest palindromic substring is not guaranteed to be unique; for example, in the string '''abracadabra''', there is no palindromic substring with length greater than three, but there are two palindromic substrings with length three, namely, '''aca''' and '''ada'''. In some applications it may be necessary to return ''all'' maximal-length palindromic substrings, in some, only one, and in some, only the maximum length itself. This article discusses algorithms for solving these problems.
 +
 +
==Theoretical discussion==
 +
===Lower bound===
 +
The worst-case running time of any correct algorithm must be at least <math>O(N)</math>, where <math>N</math> is the length of the string. This is because the algorithm cannot be correct unless it can verify whether the entire string is a palindrome (because in that case it is itself the longest palindromic substring), and it is not possible to conclude that a string is palindromic unless every character is examined.
 +
 +
One might ask whether the size of the output gives a better lower bound on the worst-case running time in the case in which it is necessary to return all maximal-length palindromic substrings (rather than simply reporting the maximum length). It turns out that it does not, because there can only be up to linearly many substrings of any given length. Specifically, there cannot be more than <math>N</math> different palindromic substrings of maximal length, because there cannot be more than <math>N</math> substrings of any given length in the first place.
 +
 +
===Upper bound===
 +
Given a substring of length <math>l</math>, we can determine whether it is a palindrome in <math>O(l)</math> time by iterating through it and checking whether the character at zero-based offset <math>i</math> always matches that at <math>l-i-1</math>. Furthermore, our original string, of length <math>N</math>, has <math>\Theta(N^2)</math> substrings each of length <math>O(N)</math>. Therefore, we conclude, by enumerating all substrings of our original string, checking whether each is a palindrome, and finding the longest one which is, we obtain an <math>O(N^3)</math> solution. Thus <math>O(N^3)</math> is an upper bound on the efficiency of the best algorithm, and we have established that the longest palindromic substring problem can be solved in polynomial time.
 +
 +
==Observation==
 +
Every palindromic substring has a ''centre''. For an odd-length substring, this is its middle character; for an even-length substring, this is the imaginary space between its two middle characters. Call each character and each space in the original string a ''position''. Then there are <math>2N+1</math> positions in any string of length <math>N</math> (to simplify things later on, we assume there is a position just before the first character and one just after the last), and any longest palindromic substring must have one of these as its centre.
 +
 +
Therefore, if we iterate through all positions of the string, and for each one determine the longest palindromic substring ''centered on that position'', then one of these is the longest palindromic substring of the whole string. For example, in the string '''opposes''', the longest palindromic substrings centered around each position are [(empty), '''o''', (empty), '''p''', '''oppo''', '''p''', (empty), '''o''', (empty), '''s''', (empty), '''ses''', (empty), '''s''', (empty)], and one of these --- '''oppo''' --- is the longest palindromic substring of the original string.
 +
 +
Being a bit clever in determining the longest palindromic substring centered about a given string improves the running time over the naive solution dramatically. We begin by noting that there definitely is a palindromic substring of length zero (the empty string) centered around the current center, if it is a space, or a palindromic substring of length one (the character itself) centered around the current center, if it is a character. Then, we repeatedly attempt to make a longer palindromic substring centered at our current location by adding the characters just before and just after the current longest palindromic substring. We can do this if and only if these characters match. If they do not match, there is no hope of any longer palindromic substring centered here existing, so we are done.
 +
 +
For example, suppose our current center is between the two '''b''''s in the word '''scabbards'''. We already know there exists a palindromic substring of length zero (the empty string) centered around our current location. To determine whether we can make it longer, we check the first character on either side, that is, the two '''b''''s. They match, so we know that '''bb''' is a palindromic substring centered at our current location. Then we try to extend it again, by checking the first character on either side. Now we find two '''a''''s, so we now know we have '''abba''', a palindromic substring of length 4, centered around our current location. Again we try to extend it, by checking one character on each side. We find the '''c''' and the '''r''', which do ''not'' match. Therefore we know that '''abba''' is the longest palindromic substring centered our current location --- substrings centered at our current location that are longer than '''cabbar''', such as '''scabbard''', have no hope of being palindromic.
 +
 +
Notice that once we have fixed the centre, we will not examine any character of the string more than once in determining the longest palindromic substring centered there. That is, determining the longest palindromic substring centered at any given location takes only <math>O(N)</math> time, and by iterating over all possible centres, an <math>O(N^2)</math> algorithm is obtained. (Note that this algorithm also gives ''all'' longest palindromic substrings: if the length of the palindrome centered around a given position is maximal, then we output the longest palindromic substring centered at this position.)
  
 
==Toward a better bound==
 
==Toward a better bound==
Line 75: Line 95:
 
<ref name="ukkonen">E. Ukkonen. (1995). On-line construction of suffix trees. ''Algorithmica'' '''14'''(3):249-260. [http://www.cs.helsinki.fi/u/ukkonen/SuffixT1.pdf PDF] | [http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf PDF with figures]</ref>
 
<ref name="ukkonen">E. Ukkonen. (1995). On-line construction of suffix trees. ''Algorithmica'' '''14'''(3):249-260. [http://www.cs.helsinki.fi/u/ukkonen/SuffixT1.pdf PDF] | [http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf PDF with figures]</ref>
 
<ref name="tarjan">Gabow, H. N.; Tarjan, R. E. (1983), "A linear-time algorithm for a special case of disjoint set union", <i>Proceedings of the 15th ACM Symposium on Theory of Computing (STOC)</i>, pp.&#160;246–251, doi:[http://dx.doi.org/10.1145%2F800061.808753 10.1145/800061.808753]</ref>
 
<ref name="tarjan">Gabow, H. N.; Tarjan, R. E. (1983), "A linear-time algorithm for a special case of disjoint set union", <i>Proceedings of the 15th ACM Symposium on Theory of Computing (STOC)</i>, pp.&#160;246–251, doi:[http://dx.doi.org/10.1145%2F800061.808753 10.1145/800061.808753]</ref>
<ref name="akalin">Fred Akalin. Finding the longest palindromic substring in linear time. Retrieved 2016-10-01 from [https://www.akalin.com/longest-palindrome-linear-time https://www.akalin.com/longest-palindrome-linear-time]. ''(This source contains a Python implementation of the simple linear-time algorithm.)''</ref>
+
<ref name="akalin">Frederick Akalin. Finding the longest palindromic substring in linear time. Retrieved 2010-11-13 from [http://www.akalin.cx/2007/11/28/finding-the-longest-palindromic-substring-in-linear-time/ http://www.akalin.cx/2007/11/28/finding-the-longest-palindromic-substring-in-linear-time/]. ''(This source contains a Python implementation of the simple linear-time algorithm.)''</ref>
 
<ref name="manacher">Glenn Manacher. 1975. A New Linear-Time "On-Line" Algorithm for Finding the Smallest Initial Palindrome of a String. J. ACM 22, 3 (July 1975), 346-351. DOI=10.1145/321892.321896 http://doi.acm.org/10.1145/321892.321896 </ref>
 
<ref name="manacher">Glenn Manacher. 1975. A New Linear-Time "On-Line" Algorithm for Finding the Smallest Initial Palindrome of a String. J. ACM 22, 3 (July 1975), 346-351. DOI=10.1145/321892.321896 http://doi.acm.org/10.1145/321892.321896 </ref>
 
<ref name="galil">Alberto Apostolico, Dany Breslauer, and Zvi Galil (1994), "Parallel Detection of all Palindromes in a String", Comput. Sci.</ref>
 
<ref name="galil">Alberto Apostolico, Dany Breslauer, and Zvi Galil (1994), "Parallel Detection of all Palindromes in a String", Comput. Sci.</ref>
Line 88: Line 108:
 
** {{SPOJ|PALIM|Yet Another Longest Palindrome Problem}}
 
** {{SPOJ|PALIM|Yet Another Longest Palindrome Problem}}
 
** {{SPOJ|PALDR|Even Palindrome}}
 
** {{SPOJ|PALDR|Even Palindrome}}
* The problem Calf Flac from [http://train.usaco.org/ the USACO training pages] can be solved using the naive algorithm. ilove youuuuuuuuuuuuu
+
* The problem Calf Flac from [http://train.usaco.org/ the USACO training pages] can be solved using the naive algorithm.

Please note that all contributions to PEGWiki are considered to be released under the Attribution 3.0 Unported (see PEGWiki:Copyrights for details). If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource. Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)

Templates used on this page: