Editing String
==Definitions==
An '''alphabet''', usually denoted <math>\Sigma</math>, is a finite nonempty set (whose size is denoted <math>|\Sigma|</math>). It is often assumed that the alphabet is totally ordered, but this is not always necessary. An element of the alphabet is known as a '''character'''.
The set of <math>n</math>-tuples of <math>\Sigma</math> is denoted <math>\Sigma^n</math>. A '''string of length <math>n</math>''' is an element of <math>\Sigma^n</math>. The set <math>\Sigma^*</math> is defined as <math>\Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots</math>; an element of <math>\Sigma^*</math> is known simply as a '''string''' over <math>\Sigma</math>. The '''empty string''', denoted <math>\lambda</math>, is the unique element of <math>\Sigma^0</math>. (Under this usual definition, every string has finite length, but there is no upper bound on the length a string may have.)
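As an illustration of these definitions, the following Python sketch enumerates <math>\Sigma^n</math> for a small alphabet and a finite initial piece of <math>\Sigma^*</math>. The function names <code>strings_of_length</code> and <code>strings_up_to</code> are chosen here for illustration only, and strings are represented as tuples of characters so that the empty string appears as the empty tuple.

```python
from itertools import product


def strings_of_length(alphabet, n):
    """Return every element of Sigma^n as a tuple of characters."""
    # Sorting only fixes a listing order; the definition itself needs no order.
    return list(product(sorted(alphabet), repeat=n))


def strings_up_to(alphabet, max_len):
    """Yield Sigma^0, Sigma^1, ..., Sigma^max_len: a finite piece of Sigma^*."""
    for n in range(max_len + 1):
        yield from strings_of_length(alphabet, n)


binary = {"0", "1"}
print(strings_of_length(binary, 0))        # Sigma^0 holds only the empty string
print(len(strings_of_length(binary, 3)))   # |Sigma|^3 strings of length 3
```

Since <math>|\Sigma^n| = |\Sigma|^n</math>, the full set <math>\Sigma^*</math> is infinite, which is why the sketch stops at a chosen maximum length.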
For ease of conceptualization, we shall usually assign a ''symbol'', a graphical representation, to each character of the alphabet and, considering a string as a sequence of characters, render it as a sequence of symbols. We shall usually do so in '''boldface''', but in this section we shall use ''italics'' to avoid confusion with terms being defined, hence, ''PEG''. On other occasions we may choose to represent them with the ordered list notation, hence, [''P'',''E'',''G'']. (This form is useful when the symbols consist of more than one glyph, as can be the case when they are integers; see below.) We will often number the characters of strings, sometimes starting from zero, sometimes starting from one.
Examples of alphabets include:
* The binary alphabet, with exactly two characters (<math>|\Sigma|=2</math>), usually denoted ''0'' and ''1''. In virtually all computers ever built, data are strings over the binary alphabet, and are known as bit strings.
* The Latin alphabet, with the characters ''a'', ''A'', ''b'', ''B'', ..., ''z'', ''Z'' (<math>|\Sigma|=52</math>). English words may be considered strings over the Latin alphabet.
* The Unicode character set (the size of this alphabet depends somewhat on what we consider to be a valid Unicode character). Text files may be considered strings over this alphabet.
* The set of integers <math>\Sigma = \{0, 1, ..., N-1\}</math> for some <math>N \in \mathbb{N}</math>, for which <math>|\Sigma| = N</math>. Here the characters are also integers, and their corresponding symbols might have more than one glyph when <math>N > 10</math>. (Recall that the definition of ''character'' is broader than the common concept of the character as the smallest, indivisible element of writing.)
* The set of nitrogenous bases in DNA, {''A'', ''C'', ''G'', ''T''}, with <math>|\Sigma|=4</math>. Codons are considered members of <math>\Sigma^3</math>, and DNA sequences members of <math>\Sigma^*</math>.
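Two of the alphabets above can be sketched in Python as follows. This is only a minimal illustration: the helper names <code>is_string_over</code> and <code>codons</code> are hypothetical, and the choice to reject sequences whose length is not a multiple of three is an assumption made for the example, not something the definitions require.

```python
DNA = frozenset("ACGT")  # the four-character DNA alphabet


def is_string_over(s, alphabet):
    """Return True if every character of the sequence s belongs to alphabet."""
    return all(c in alphabet for c in s)


def codons(seq):
    """Split a DNA sequence into consecutive elements of Sigma^3."""
    if not is_string_over(seq, DNA):
        raise ValueError("not a string over the DNA alphabet")
    if len(seq) % 3 != 0:
        raise ValueError("length is not a multiple of 3")  # assumption of this sketch
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


print(codons("ATGGCC"))

# With an integer alphabet {0, ..., N-1}, the ordered list notation avoids
# ambiguity: [10, 3, 7] is a string of length 3 over the alphabet with N = 12.
print(is_string_over([10, 3, 7], range(12)))
```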
A '''substring''' of a string <math>S = c_1 c_2 ... c_n</math> is either the empty string or a string of the form <math>s = c_i c_{i+1} ... c_j</math> for some <math>i, j \in \mathbb{N}</math> with <math>1 \leq i \leq j \leq n</math>. In other words, it is a series of (possibly zero) characters occurring consecutively in a string, taken in order. Note that the empty string is considered a substring of every string. For example, ''the'', ''in'', ''here'', and ''therein'' are all substrings of ''therein'', as is <math>\lambda</math>; however, ''tin'' is not (the characters are not consecutive in the original string), nor is ''rine'' (the characters do not occur in order in the original string). A '''prefix''' is a substring that is either empty or has <math>i = 1</math>, so the empty string, ''the'', ''there'', and ''therein'' are prefixes of ''therein''. A '''suffix''' is a substring that is either empty or has <math>j = n</math>, so the empty string, ''in'', ''rein'', ''herein'', and ''therein'' are suffixes of ''therein''.
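The definitions of substring, prefix, and suffix can be checked on the ''therein'' example with a short Python sketch. The function names are illustrative. Note that Python slices are zero-based and half-open, so the slice <code>s[i:j]</code> corresponds to the characters <math>c_{i+1} ... c_j</math> in the one-based notation used above.

```python
def substrings(s):
    """Return the set of all substrings of s, including the empty string."""
    n = len(s)
    # s[i:j] with i == j is the empty string; i < j gives a nonempty run.
    return {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def prefixes(s):
    """Return all prefixes of s, from the empty string up to s itself."""
    return [s[:j] for j in range(len(s) + 1)]


def suffixes(s):
    """Return all suffixes of s, from s itself down to the empty string."""
    return [s[i:] for i in range(len(s) + 1)]


subs = substrings("therein")
print("here" in subs, "tin" in subs, "rine" in subs)
print(prefixes("therein"))
print(suffixes("therein"))
```

A string of length <math>n</math> has exactly <math>n + 1</math> prefixes and <math>n + 1</math> suffixes, counting the empty string and the string itself.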
These examples may have given the misleading impression that strings which do not represent "valid" words, such as ''ther'', are not valid strings. This is entirely untrue; ''ther'' is also a valid prefix of ''therein''. The definition of a string says nothing about the ''validity'' or ''meaning'' of a string. A '''language''' is a (possibly empty, often infinite) subset of <math>\Sigma^*</math>; an element of a language is often called a ''word'' (although this term should be used with caution). Thus, while every element of <math>\Sigma^*</math> is a valid string, not all are necessarily valid words in a given language over <math>\Sigma</math>. For example, let <math>\Sigma</math> be defined as the Latin alphabet and <math>L \subseteq \Sigma^*</math> be the language consisting of the representations of all valid English words. (Note that we have been careful to distinguish the words themselves from their representations as strings.) Then ''ther'' is certainly a valid ''string'', being an element of <math>\Sigma^*</math>, but is not a valid ''word'' in <math>L</math>, since it does not represent an English word.
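The distinction between a valid string and a valid word can be sketched in Python. The set <code>ENGLISH_SAMPLE</code> below is a tiny hypothetical stand-in for the language <math>L</math> of English words, used only so that the example is self-contained; the function names are likewise illustrative.

```python
import string

# The 52-character Latin alphabet: a..z and A..Z.
LATIN = frozenset(string.ascii_letters)

# Hypothetical stand-in for L; a real word list would be far larger.
ENGLISH_SAMPLE = frozenset({"the", "there", "therein", "here", "in"})


def is_string(s, alphabet=LATIN):
    """Return True if s is an element of Sigma^* for the given alphabet."""
    return all(c in alphabet for c in s)


def is_word(s, language=ENGLISH_SAMPLE, alphabet=LATIN):
    """Return True if s is a string over the alphabet and belongs to the language."""
    return is_string(s, alphabet) and s in language


print(is_string("ther"), is_word("ther"))        # a valid string, not a word in the sample
print(is_string("therein"), is_word("therein"))  # both a valid string and a word
```

Here membership in the language is a separate test layered on top of membership in <math>\Sigma^*</math>, mirroring the fact that <math>L</math> is a subset of <math>\Sigma^*</math>.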