Editing String
Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.
The edit can be undone.
Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision | Your text | ||
Line 2: | Line 2: | ||
==Definitions== | ==Definitions== | ||
− | An '''alphabet''', usually denoted <math>\Sigma</math> is a finite nonempty set (whose size is denoted <math>|\Sigma|</math>). It is often assumed that the alphabet is totally ordered, but this is not always | + | An '''alphabet''', usually denoted <math>\Sigma</math> is a finite nonempty set (whose size is denoted <math>|\Sigma|</math>). It is often assumed that the alphabet is totally ordered, but this is not always necesary. An element of this the alphabet is known as a '''character'''. |
The set of <math>n</math>-tuples of <math>\Sigma</math> is denoted <math>\Sigma^n</math>. A '''string of length <math>n</math>''' is an element of <math>\Sigma^n</math>. The set <math>\Sigma^*</math> is defined <math>\Sigma^0 \cup \Sigma^1 \cup ...</math>; an element of <math>\Sigma^*</math> is known simply as a '''string''' over <math>\Sigma</math>. The '''empty string''', denoted <math>\epsilon</math> or <math>\lambda</math>, is the unique element of <math>\Sigma^0</math>. The length of a string <math>S</math> is denoted <math>|S|</math>. (Note that the usual definition of "string" requires strings to have finite length, although arbitrarily long strings exist.) | The set of <math>n</math>-tuples of <math>\Sigma</math> is denoted <math>\Sigma^n</math>. A '''string of length <math>n</math>''' is an element of <math>\Sigma^n</math>. The set <math>\Sigma^*</math> is defined <math>\Sigma^0 \cup \Sigma^1 \cup ...</math>; an element of <math>\Sigma^*</math> is known simply as a '''string''' over <math>\Sigma</math>. The '''empty string''', denoted <math>\epsilon</math> or <math>\lambda</math>, is the unique element of <math>\Sigma^0</math>. The length of a string <math>S</math> is denoted <math>|S|</math>. (Note that the usual definition of "string" requires strings to have finite length, although arbitrarily long strings exist.) | ||
− | |||
− | |||
For ease of conceptualization, we shall usually assign a ''symbol'', a graphical representation, to each character of the alphabet and, considering a string as a sequence of characters, render it as a sequence of symbols. We shall usually do so in '''boldface''', but in this section we shall use ''italics'' to avoid confusion with terms being defined, hence, ''PEG''. On other occasions we may choose to represent them with the ordered list notation, hence, [''P'',''E'',''G'']. (This form is useful when the symbols consist of more than one glyph, as can be the case when they are integers; see below.) We will often number the characters of strings, sometimes starting from zero, sometimes starting from one. | For ease of conceptualization, we shall usually assign a ''symbol'', a graphical representation, to each character of the alphabet and, considering a string as a sequence of characters, render it as a sequence of symbols. We shall usually do so in '''boldface''', but in this section we shall use ''italics'' to avoid confusion with terms being defined, hence, ''PEG''. On other occasions we may choose to represent them with the ordered list notation, hence, [''P'',''E'',''G'']. (This form is useful when the symbols consist of more than one glyph, as can be the case when they are integers; see below.) We will often number the characters of strings, sometimes starting from zero, sometimes starting from one. | ||
Line 18: | Line 16: | ||
* The set of universal proteinogenic amino acids, {''A'', ''C'', ''D'', ''E'', ''F'', ''G'', ''H'', ''I'', ''K'', ''L'', ''M'', ''N'', ''P'', ''Q'', ''R'', ''S'', ''T'', ''V'', ''W'', ''Y''} (<math>|\Sigma|=20</math>). The primary structure of a protein is represented as a string over this alphabet. This and the preceding example are important alphabets in bioinformatics. | * The set of universal proteinogenic amino acids, {''A'', ''C'', ''D'', ''E'', ''F'', ''G'', ''H'', ''I'', ''K'', ''L'', ''M'', ''N'', ''P'', ''Q'', ''R'', ''S'', ''T'', ''V'', ''W'', ''Y''} (<math>|\Sigma|=20</math>). The primary structure of a protein is represented as a string over this alphabet. This and the preceding example are important alphabets in bioinformatics. | ||
− | A '''substring''' of a string <math>S = c_1 c_2 ... c_n</math> is | + | A '''substring''' of a string <math>S = c_1 c_2 ... c_n</math> is given by <math>s = c_i c_{i+1} ... c_j</math> for some <math>i, j \in \mathbb{N}</math> with <math>i \leq j \leq n</math>. Note that the empty string is considered a substring of all other strings. In other words, it is a series of (possibly zero) characters occurring consecutively in a string, taken in order. For example, ''the'', ''in'', ''here'', and ''therein'' are all substrings of ''therein'', as is <math>\epsilon</math>; however, ''tin'' is not (the characters are not consecutive in the original string), nor is ''rine'' (the characters do not occur in order in the original string). A '''prefix''' is a substring with <math>i = 1</math>, so that <math>\epsilon</math>, ''the'', ''there'', and ''therein'' are prefixes of ''therein''. A '''suffix''' is a substring with <math>j = n</math>, so that <math>\epsilon</math>, ''in'', ''rein'', ''herein'', and ''therein'' are suffixes of ''therein''. We write <math>s \sqsubset S</math> to indicate that <math>s</math> is a prefix of <math>S</math>. Likewise the notation <math>s \sqsupset S</math> denotes that <math>s</math> is a suffix of <math>S</math>. We say that <math>s</math> is a '''proper prefix''' of <math>S</math> when <math>s \sqsubset S</math> and <math>s \neq S</math>, with '''proper suffix''' defined analogously. |
These examples may have given the misleading impression that strings which do not represent "valid" words, such as ''ther'', are not valid strings. This is entirely untrue; ''ther'' is also a valid prefix of ''therein''. The definition of a string says nothing about the ''validity'' or ''meaning'' of a string. A '''language''' is a (possibly empty, often infinite) subset of <math>\Sigma^*</math>; an element of a language is often called a ''word'' (although this term should be used with caution). Thus, while every element of <math>\Sigma^*</math> is a valid string, not all are necessarily valid words in a language over <math>\Sigma^*</math>. For example, let <math>\Sigma</math> be defined as the Latin alphabet and <math>L \subseteq \Sigma^*</math> be the language consisting of the representations of all valid English words. (Note that we have been careful to distinguish the words themselves from their representations as strings.) Then ''ther'' is certainly a valid ''string'', being an element of <math>\Sigma^*</math>, but is not a valid ''word'' in <math>L</math>, since it does not represent an English word. | These examples may have given the misleading impression that strings which do not represent "valid" words, such as ''ther'', are not valid strings. This is entirely untrue; ''ther'' is also a valid prefix of ''therein''. The definition of a string says nothing about the ''validity'' or ''meaning'' of a string. A '''language''' is a (possibly empty, often infinite) subset of <math>\Sigma^*</math>; an element of a language is often called a ''word'' (although this term should be used with caution). Thus, while every element of <math>\Sigma^*</math> is a valid string, not all are necessarily valid words in a language over <math>\Sigma^*</math>. For example, let <math>\Sigma</math> be defined as the Latin alphabet and <math>L \subseteq \Sigma^*</math> be the language consisting of the representations of all valid English words. (Note that we have been careful to distinguish the words themselves from their representations as strings.) Then ''ther'' is certainly a valid ''string'', being an element of <math>\Sigma^*</math>, but is not a valid ''word'' in <math>L</math>, since it does not represent an English word. | ||
Line 24: | Line 22: | ||
Thus, a string of length <math>N</math> has <math>N+1</math> prefixes, <math>N+1</math> suffixes, <math>N</math> proper prefixes, <math>N</math> proper suffixes, and <math>1+N(N+1)/2</math> substrings. (Here we consider two substrings to be distinct if they start or end at different indices, except that the empty string is counted only once; see also [[counting distinct substrings]].) | Thus, a string of length <math>N</math> has <math>N+1</math> prefixes, <math>N+1</math> suffixes, <math>N</math> proper prefixes, <math>N</math> proper suffixes, and <math>1+N(N+1)/2</math> substrings. (Here we consider two substrings to be distinct if they start or end at different indices, except that the empty string is counted only once; see also [[counting distinct substrings]].) | ||
− | + | Strings can be '''concatenated'''. To concatenate two strings is to link them together to give a new string which begins with the first and ends with the second with no overlap. Formally, the concatenation of strings <math>S</math> and <math>T</math> is denoted <math>ST</math> and has the following three properties: | |
− | + | # <math>S \sqsubset ST</math> | |
− | + | # <math>T \sqsupset ST</math> | |
− | + | # <math>|ST| = |S|+|T|</math> | |
For example, concatenating ''there'' and ''in'' gives ''therein''. So do concatenating ''the'' and ''rein'', or <math>\epsilon</math> and ''therein'', or ''therein'' and <math>\epsilon</math>. Note that in general, concatenation is not commutative. | For example, concatenating ''there'' and ''in'' gives ''therein''. So do concatenating ''the'' and ''rein'', or <math>\epsilon</math> and ''therein'', or ''therein'' and <math>\epsilon</math>. Note that in general, concatenation is not commutative. | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− |