What is the difference between Regex, Normal & Extended Replacement?
When you enter the Replacements menu you will encounter these terms. Let’s look closer!
Normal: The search text is matched exactly as typed, so you can replace any letter, number or word with any other letter, number or word you want.
Extended: In extended mode, the following escape sequences (a backslash followed by a single character and, for some sequences, a fixed number of digits) have special meaning and are not interpreted literally.
\n: the Line Feed control character LF (ASCII 0x0A)
\r: the Carriage Return control character CR (ASCII 0x0D)
\t: the TAB control character (ASCII 0x09)
\0: the NUL control character (ASCII 0x00)
\\: the literal backslash character (ASCII 0x5C)
\b: the binary representation of a byte, made of 8 digits which are either 1’s or 0’s. †
\o: the octal representation of a byte, made of 3 digits in the 0-7 range
\d: the decimal representation of a byte, made of 3 digits in the 0-9 range
\x: the hexadecimal representation of a byte, made of 2 digits in the 0-9, A-F/a-f range.
\u: the hexadecimal representation of a two-byte character, made of 4 digits in the 0-9, A-F/a-f range. In Unicode builds, finds a Unicode character (for instance, \u2020 matches the † character in a UTF-8 encoded file). In ANSI builds, finds characters requiring two bytes, as in the Shift-JIS encoding. †
†NOTE: While some of these Extended Search Mode escape sequences look like regular expression escape sequences, they are not identical. Ones marked with † are different from or not available in regular expressions.
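To make the list above concrete, here is a minimal Python sketch of how such escape sequences could be expanded into the characters they stand for. The function name expand_extended is hypothetical, and the sketch makes assumptions the list does not state: each numeric value is mapped straight to a character with chr(), without checking that it fits in a byte, and unrecognised sequences are left unchanged. It is an illustration, not the actual implementation behind the Replacements menu.

```python
import re

# One-character escapes from the list above.
SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "0": "\0", "\\": "\\"}

# Numeric escapes: group name -> numeric base of the digits that follow.
BASES = {"bin": 2, "oct": 8, "dec": 10, "hex": 16, "uni": 16}

_ESCAPE = re.compile(
    r"\\(?:(?P<simple>[nrt0\\])"
    r"|b(?P<bin>[01]{8})"          # \b + 8 binary digits
    r"|o(?P<oct>[0-7]{3})"         # \o + 3 octal digits
    r"|d(?P<dec>[0-9]{3})"         # \d + 3 decimal digits
    r"|x(?P<hex>[0-9A-Fa-f]{2})"   # \x + 2 hexadecimal digits
    r"|u(?P<uni>[0-9A-Fa-f]{4}))"  # \u + 4 hexadecimal digits
)


def expand_extended(text: str) -> str:
    """Expand extended-mode escapes; anything unrecognised is left as typed."""

    def repl(match: re.Match) -> str:
        simple = match.group("simple")
        if simple is not None:
            return SIMPLE[simple]
        for name, base in BASES.items():
            digits = match.group(name)
            if digits is not None:
                # Assumption: the value is used directly as a code point.
                return chr(int(digits, base))
        return match.group(0)

    return _ESCAPE.sub(repl, text)


print(repr(expand_extended(r"\x41\d066\o103\b01000100")))
```

The four numeric forms in the last line all describe ordinary letters (0x41, decimal 066, octal 103 and binary 01000100), which shows that \x, \d, \o and \b are just different notations for a character value.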
Regex: The Replace function treats the search text as a regular expression and replaces every part of the input field that matches it. Details of the syntax of regular expressions can be found at https://en.wikipedia.org/wiki/Regular_expression
Example: Suppose you had a list of features for a product description in which the features were separated by two line breaks and each was preceded by a tab, and you wanted them to appear with no double spacing and a bullet point, as shown below:
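As a rough illustration of regex replacement, the sketch below uses Python's built-in re module. The sample strings are invented for the example, and the regex dialect used by the Replacements menu may differ from Python's in its details, so treat the patterns as indicative only.

```python
import re

# Collapse any run of two or more spaces into a single space.
text = "Feature  one,   Feature two"
collapsed = re.sub(r" {2,}", " ", text)
print(collapsed)

# Capture groups let the replacement reuse parts of the match:
# here the two words around the hyphen are swapped.
swapped = re.sub(r"(\w+)-(\w+)", r"\2-\1", "left-right")
print(swapped)
```

The first pattern matches a variable amount of text, which neither Normal nor Extended mode can do. The second reuses captured text in the replacement.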
Before:
Feature one
Feature two
Feature three
After:
- Feature one
- Feature two
- Feature three
You could do this in Extended mode with two replacements.
First, you would replace the double carriage return, in this case \r\r, with a single \r.
Second, you would replace the tab, in this case \t, with \u2022, which is the Unicode code point for a bullet point.
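The same two replacements can be sketched in Python with plain string replacement. The sample text is an assumption: it supposes that each feature is preceded by a tab and that lines end with a bare carriage return (\r), as the steps above imply. Files with Windows line endings (\r\n) would need the first search string adjusted.

```python
# Assumed input: a tab before each feature, two carriage returns between features.
before = "\tFeature one\r\r\tFeature two\r\r\tFeature three"

# Replacement 1: collapse the double carriage return into a single one.
step1 = before.replace("\r\r", "\r")

# Replacement 2: swap each tab for the bullet character U+2022.
after = step1.replace("\t", "\u2022")

print(repr(after))
```

The order of the two replacements does not matter here, because neither search string overlaps the other.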