Python RegEx


A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.


RegEx Module

Python has a built-in package called re, which can be used to work with Regular Expressions.

Import the re module:


import re


RegEx in Python

Once you have imported the re module, you can start using regular expressions:


Example

Search the string to see if it starts with "The" and ends with "Spain":

import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)


RegEx Functions

The re module provides a set of functions that allow us to search a string for a match:

Function Description
findall Returns a list containing all matches
search Returns a Match object if there is a match anywhere in the string
split Returns a list where the string has been split at each match
sub Replaces one or many matches with a string
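
As a quick orientation, the four functions in the table can be run side by side on the same sample string. This is only a minimal sketch; the variable names and the "_" replacement character are chosen here for illustration.

```python
import re

txt = "The rain in Spain"

all_matches = re.findall("ai", txt)   # list of every non-overlapping match
first_match = re.search("ai", txt)    # Match object for the first match, or None
pieces = re.split(r"\s", txt)         # split at each white-space character
replaced = re.sub(r"\s", "_", txt)    # replace each white-space character

print(all_matches)          # ['ai', 'ai']
print(first_match.start())  # 5
print(pieces)               # ['The', 'rain', 'in', 'Spain']
print(replaced)             # The_rain_in_Spain
```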


Metacharacters

Metacharacters are characters with a special meaning:

Character Description Example
[] A set of characters "[a-m]"
\ Signals a special sequence (can also be used to escape special characters) "\d"
. Any character (except newline character) "he..o"
^ Starts with "^hello"
$ Ends with "planet$"
* Zero or more occurrences "he.*o"
+ One or more occurrences "he.+o"
? Zero or one occurrences "he.?o"
{} Exactly the specified number of occurrences "he{2}o"
| Either or "falls|stays"
() Capture and group    
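
The table can be tried out with a short script. The sketch below assumes the sample text "hello planet" (picked here so that most of the example patterns have something to match) and uses findall() to show what each pattern finds.

```python
import re

txt = "hello planet"  # sample text assumed for this sketch

any_two = re.findall("he..o", txt)          # "." matches any single character
at_start = re.findall("^hello", txt)        # "^" anchors the match at the start
at_end = re.findall("planet$", txt)         # "$" anchors the match at the end
zero_or_more = re.findall("he.*o", txt)     # "*" allows zero or more characters
zero_or_one = re.findall("he.?o", txt)      # "?" allows at most one character
either = re.findall("falls|stays", txt)     # "|" matches either alternative
groups = re.search("(he)(llo)", txt).groups()  # "()" captures each group

print(any_two)       # ['hello']
print(at_start)      # ['hello']
print(at_end)        # ['planet']
print(zero_or_more)  # ['hello']
print(zero_or_one)   # [] because "ll" is two characters, not zero or one
print(either)        # [] because neither word is in the text
print(groups)        # ('he', 'llo')
```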


Special Sequences

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

Character Description Example
\A Returns a match if the specified characters are at the beginning of the string "\AThe"
\b Returns a match where the specified characters are at the beginning or at the end of a word (the "r" prefix makes sure that the string is treated as a "raw string") r"\bain", r"ain\b"
\B Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (the "r" prefix makes sure that the string is treated as a "raw string") r"\Bain", r"ain\B"
\d Returns a match where the string contains a digit (0-9) "\d"
\D Returns a match where the string contains a character that is NOT a digit "\D"
\s Returns a match where the string contains a white-space character "\s"
\S Returns a match where the string contains a character that is NOT a white-space character "\S"
\w Returns a match where the string contains a word character (a letter from a to z or A to Z, a digit from 0-9, or the underscore _ character) "\w"
\W Returns a match where the string contains a character that is NOT a word character "\W"
\Z Returns a match if the specified characters are at the end of the string "Spain\Z"
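
A few of these sequences can be checked with findall(). The sketch below assumes a sample string with a year appended (not part of the original examples) so that \d has something to match.

```python
import re

txt = "The rain in Spain 2024"  # sample text assumed for this sketch

at_beginning = re.findall(r"\AThe", txt)  # "The" at the start of the string
word_start = re.findall(r"\bain", txt)    # "ain" at the start of a word
word_end = re.findall(r"ain\b", txt)      # "ain" at the end of a word
digits = re.findall(r"\d", txt)           # each digit separately
words = re.findall(r"\w+", txt)           # runs of word characters
at_end = re.findall(r"2024\Z", txt)       # "2024" at the end of the string

print(at_beginning)  # ['The']
print(word_start)    # [] because no word starts with "ain"
print(word_end)      # ['ain', 'ain']
print(digits)        # ['2', '0', '2', '4']
print(words)         # ['The', 'rain', 'in', 'Spain', '2024']
print(at_end)        # ['2024']
```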


Sets

A set is a group of characters inside a pair of square brackets [] with a special meaning:

Set Description
[arn] Returns a match where one of the specified characters (a, r, or n) is present
[a-n] Returns a match for any lower case character, alphabetically between a and n
[^arn] Returns a match for any character EXCEPT a, r, and n
[0123] Returns a match where any of the specified digits (0, 1, 2, or 3) are present
[0-9] Returns a match for any digit between 0 and 9
[0-5][0-9] Returns a match for any two-digit number from 00 to 59
[a-zA-Z] Returns a match for any character alphabetically between a and z, lower case OR upper case
[+] In sets, +, *, ., |, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string
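
The sets above can be exercised the same way. This is a small sketch; the sample strings are assumptions chosen so that each set finds something.

```python
import re

txt = "8 times before 11:45 AM"  # sample text assumed for this sketch

single_digits = re.findall("[0-9]", txt)      # any single digit
two_digits = re.findall("[0-5][0-9]", txt)    # two-digit numbers from 00 to 59
upper_case = re.findall("[A-Z]", txt)         # upper case letters only
plus_signs = re.findall("[+]", "1 + 1 = 2")   # "+" is literal inside a set
not_lower = re.findall("[^a-z ]", "ab c1D")   # anything except a-z and space

print(single_digits)  # ['8', '1', '1', '4', '5']
print(two_digits)     # ['11', '45']
print(upper_case)     # ['A', 'M']
print(plus_signs)     # ['+']
print(not_lower)      # ['1', 'D']
```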


The findall() Function

The findall() function returns a list containing all matches.


Example

Print a list of all matches:

import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)

The list contains the matches in the order they are found.

If no matches are found, an empty list is returned:


Example

Return an empty list if no match was found:

import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)


The search() Function

The search() function searches the string for a match, and returns a Match object if there is a match.

If there is more than one match, only the first occurrence of the match will be returned:


Example

Search for the first white-space character in the string:

import re

txt = "The rain in Spain"
x = re.search(r"\s", txt)

print("The first white-space character is located in position:", x.start())

If no matches are found, the value None is returned:


Example

Make a search that returns no match:

import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)
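
Because search() returns None when nothing matches, calling a method such as .start() on the result directly would raise an AttributeError. One way to guard against that is sketched below; first_match_text is a hypothetical helper written for this illustration, not part of the re module.

```python
import re

txt = "The rain in Spain"

def first_match_text(pattern, text):
    """Return the text of the first match, or None if the pattern is not found."""
    match = re.search(pattern, text)
    if match is None:
        # No match: there is no Match object to call methods on
        return None
    return match.group()

found = first_match_text("ai", txt)
missing = first_match_text("Portugal", txt)

print(found)    # ai
print(missing)  # None
```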


The split() Function

The split() function returns a list where the string has been split at each match:


Example

Split at each white-space character:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)

You can control the number of splits by specifying the maxsplit parameter:


Example

Split the string only at the first occurrence:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)


The sub() Function

The sub() function replaces the matches with the text of your choice:


Example

Replace every white-space character with the number 9:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)

You can control the number of replacements by specifying the count parameter:


Example

Replace the first 2 occurrences:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x)


Match Object

A Match Object is an object containing information about the search and the result.


Note: If there is no match, the value None will be returned, instead of the Match Object.



Example

Do a search that will return a Match Object:

import re

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x) #this will print an object

The Match object has properties and methods used to retrieve information about the search, and the result:

.span() returns a tuple containing the start and end positions of the match.

.string returns the string passed into the function

.group() returns the part of the string where there was a match


Example

Print the position (start- and end-position) of the first match occurrence.

The regular expression looks for any word that starts with an upper case "S":

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())


Example

Print the string passed into the function:

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)
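
The three members described earlier (.span(), .string and .group()) are related: slicing the original string with the span gives the same text as .group(). A minimal sketch, reusing the pattern from the examples above (the variable names are chosen here for illustration):

```python
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)

start, end = x.span()             # start and end positions of the match
matched_text = x.group()          # the part of the string that matched
sliced_text = x.string[start:end] # the same text, sliced from the original string

print(x.span())      # (12, 17)
print(matched_text)  # Spain
print(sliced_text)   # Spain
```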

Note: If there is no match, the value None will be returned, instead of the Match Object.