Regular expressions

A regular expression, also known as a regex or regexp, is a string that serves as a pattern (template) describing a set of strings. The pattern determines which strings belong to the set. A pattern consists of literal characters and metacharacters, which are characters that have a special meaning instead of a literal one.

 

Matching Symbols

Regular Expression Description

.

Matches any single character (in most regex engines, a line break is excluded by default).

^regex

Matches regex only at the beginning of the line.

regex$

Matches regex only at the end of the line.

[abc]

Set definition, can match the letter a or b or c.

[abc][vz]

Set definition, can match a or b or c followed by either v or z.

[^abc]

When a caret appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a or b or c.

[a-d1-7]

Ranges: matches a single character that is either a letter from a to d or a digit from 1 to 7. It does not match the two-character sequence d1 as one unit.

X|Z

Finds X or Z.

XZ

Finds X directly followed by Z.

$

Checks if a line end follows.
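The matching symbols above can be tried out in a short script. The following is a minimal sketch in Python, assuming its standard re module; the sample strings and variable names are made up for illustration, and other regex engines may differ in details.

```python
import re

# ^ and $ anchor the pattern to the start or end of the line.
starts_with_cat = bool(re.search(r"^cat", "catalog"))  # "cat" is at the start
not_at_start = bool(re.search(r"^cat", "bobcat"))      # "cat" is not at the start
ends_with_cat = bool(re.search(r"cat$", "bobcat"))     # "cat" is at the end

# [abc][vz]: one of a, b, c followed by one of v, z.
pairs = re.findall(r"[abc][vz]", "av bz cx cv")

# [^abc]: any character except a, b, or c.
negated = re.findall(r"[^abc]", "abcde")

# [a-d1-7]: a single character from a-d or 1-7, so "d1" yields two matches.
ranged = re.findall(r"[a-d1-7]", "d1 e8")

# X|Z: either X or Z.
either = re.findall(r"X|Z", "XYZ")

print(starts_with_cat, not_at_start, ends_with_cat)
print(pairs, negated, ranged, either)
```

Note that re.search looks for the pattern anywhere in the string, which is why the anchors ^ and $ are needed to pin it to the start or end.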

 

Meta Characters

The following meta characters have a pre-defined meaning and make certain common patterns easier to use, e.g., \d instead of [0-9].

Regular Expression Description

\d

Any digit, short for [0-9]

\D

A non-digit, short for [^0-9]

\s

A whitespace character, short for [ \t\n\x0b\r\f]

\S

A non-whitespace character, short for [^\s]

\w

A word character, short for [a-zA-Z_0-9]

\W

A non-word character, short for [^\w]

\S+

One or more non-whitespace characters

\b

Matches a word boundary where a word character is [a-zA-Z0-9_]

The letter of each meta character is the first letter of what it represents, e.g., digit, space, word, and boundary. Uppercase letters define the opposite.
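As an illustration of the meta characters, here is a minimal Python sketch using the standard re module. The sample text and variable names are invented. The re.ASCII flag is an assumption made so that \d and \w behave like the ASCII sets [0-9] and [a-zA-Z_0-9] listed above; without it, Python also accepts Unicode digits and letters.

```python
import re

text = "Order 66 shipped\tto  Bob_7"  # made-up sample with a tab and spaces

# \d: single digits.
digits = re.findall(r"\d", text, flags=re.ASCII)

# \S+: runs of non-whitespace characters, i.e. the text split on whitespace.
non_ws_runs = re.findall(r"\S+", text)

# \w+: runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", text, flags=re.ASCII)

# \D: remove everything that is not a digit.
only_digits = re.sub(r"\D", "", "+1 (555) 010-9999")

# \b: "to" as a whole word, not the "to" inside "into".
whole_word = re.search(r"\bto\b", "into to")

print(digits, non_ws_runs, words, only_digits, whole_word.start())
```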

 

Quantifier

A quantifier defines how often an element can occur. The symbols ?, *, + and {} specify how many times the preceding element may repeat.

Regular Expression Description Examples

*

Occurs zero or more times, is short for {0,}

X* finds zero or more occurrences of the letter X; .* finds any character sequence

+

Occurs one or more times, is short for {1,}

X+ finds one or more occurrences of the letter X

?

Occurs zero or one time, is short for {0,1}

X? finds zero or exactly one letter X

{X}

Occurs exactly X times; {} specifies the repetition count of the preceding literal

\d{3} searches for three digits, .{10} for any character sequence of length 10.

{X,Y}

Occurs between X and Y times

\d{1,4} means \d must occur at least once and at most four times.

*?

A ? after a quantifier makes it a reluctant quantifier. It tries to find the smallest match, so the quantified element stops at the first position where the rest of the pattern can match.
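The quantifiers can be compared side by side in a short script. This is a minimal sketch in Python, assuming its standard re module; the sample strings and variable names are invented for illustration.

```python
import re

# {X}: exactly three digits per match.
three_digits = re.findall(r"\d{3}", "12 345 6789")

# {X,Y}: one to four digits, taking as many as possible each time.
one_to_four = re.findall(r"\d{1,4}", "7 12345")

# ?: the letter u is optional.
optional_u = re.findall(r"colou?r", "color colour")

# *: zero occurrences are enough; +: at least one occurrence is required.
star_matches_empty = re.fullmatch(r"X*", "") is not None
plus_matches_empty = re.fullmatch(r"X+", "") is not None

# Greedy .+ runs to the last ">", reluctant .+? stops at the first ">".
greedy = re.search(r"<.+>", "<a><b>").group()
reluctant = re.search(r"<.+?>", "<a><b>").group()

print(three_digits, one_to_four, optional_u)
print(star_matches_empty, plus_matches_empty, greedy, reluctant)
```

The last two lines show the practical effect of the reluctant quantifier: the greedy form returns the whole string `<a><b>`, while the reluctant form returns only `<a>`.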

 

Grouping and back reference

You can group parts of your regular expression. In your pattern you group elements with round brackets, e.g., (). This allows you to assign a repetition operator to a complete group.

In addition, each group captures the part of the string that it matched and makes it available as a back reference. This allows you to use this part in the replacement.

In the replacement, you refer to a group via ${n}: ${1} is the first group, ${2} the second, etc.
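Grouping and group references in a replacement can be sketched as follows. This example assumes Python's standard re module, where a group is referenced in the replacement as \g<1>, \g<2>, and so on, rather than with the ${1} notation described above; the idea is the same. The sample strings and variable names are invented.

```python
import re

# A repetition operator applied to a whole group: "ab" one or more times.
repeated = re.fullmatch(r"(ab)+", "ababab") is not None

# Each pair of round brackets captures the text it matched.
match = re.search(r"(\w+) (\w+)", "hello world")
first_group = match.group(1)
second_group = match.group(2)

# Captured groups reused in the replacement: swap the two words.
swapped = re.sub(r"(\w+) (\w+)", r"\g<2> \g<1>", "hello world")

# Reorder the parts of a date from year-month-day to day/month/year.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\g<3>/\g<2>/\g<1>", "2024-01-31")

print(repeated, first_group, second_group, swapped, reordered)
```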

 

 

Negative look ahead

A negative look ahead lets you exclude a pattern. With it you can require that a string is not followed by another string.

A negative look ahead is defined via (?!pattern). For example, the following will match "a" if "a" is not followed by "b".

a(?!b)
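To see which occurrences of "a" this pattern accepts, here is a minimal Python sketch using the standard re module; the sample string and variable names are made up for illustration.

```python
import re

sample = "ab ac a"  # "a" appears at positions 0, 3 and 6

# Collect the start position of every "a" that is not followed by "b".
positions = [m.start() for m in re.finditer(r"a(?!b)", sample)]

# The look ahead only checks what follows; it is not part of the match itself.
matched_text = [m.group() for m in re.finditer(r"a(?!b)", sample)]

print(positions, matched_text)
```

The "a" at position 0 is rejected because "b" follows it. The "a" at the end of the string is accepted because nothing follows it at all.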

Regular expression examples

You can experiment with regular expressions at regexplanet.com.

 