A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.
RegEx can be used to check if a string contains a specified search pattern.
Python has a built-in package called re, which can be used to work with Regular Expressions.
Import the re module:
import re 
Once you have imported the re module, you can start using regular expressions:
Search the string to see if it starts with "The" and ends with "Spain":
import re
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt) 
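The snippet above stores the result in x but never inspects it. A minimal sketch of how the result might be checked, relying only on the fact that re.search() returns a Match object when the pattern matches and None otherwise (the printed messages are illustrative):

```python
import re

txt = "The rain in Spain"

# ^The anchors the pattern at the start of the string, Spain$ at the end,
# and .* allows any characters (except newline) in between.
x = re.search(r"^The.*Spain$", txt)

# A Match object is truthy; None is falsy.
if x:
    print("YES! We have a match!")
else:
    print("No match")
```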
The re module provides a set of functions that allow us to search a string for a match:
| Function | Description | 
|---|---|
| findall | Returns a list containing all matches | 
| search | Returns a Match object if there is a match anywhere in the string | 
| split | Returns a list where the string has been split at each match | 
| sub | Replaces one or many matches with a string | 
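As a quick orientation, the four functions in the table can be tried side by side on one string. This is only a sketch; the variable names are illustrative, and each function is covered in more detail later.

```python
import re

txt = "The rain in Spain"

all_matches = re.findall(r"ai", txt)   # list of every non-overlapping match
first_match = re.search(r"ai", txt)    # Match object for the first match only
pieces = re.split(r"\s", txt)          # split at each white-space character
replaced = re.sub(r"\s", "-", txt)     # replace each white-space character

print(all_matches)         # ['ai', 'ai']
print(first_match.span())  # (5, 7)
print(pieces)              # ['The', 'rain', 'in', 'Spain']
print(replaced)            # The-rain-in-Spain
```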
Metacharacters are characters with a special meaning:
| Character | Description | Example |
|---|---|---|
| [] | A set of characters | "[a-m]" |
| \ | Signals a special sequence (can also be used to escape special characters) | "\d" |
| . | Any character (except newline character) | "he..o" |
| ^ | Starts with | "^hello" |
| $ | Ends with | "planet$" |
| * | Zero or more occurrences | "he.*o" |
| + | One or more occurrences | "he.+o" |
| ? | Zero or one occurrences | "he.?o" |
| {} | Exactly the specified number of occurrences | "he{2}o" |
| \| | Either or | "falls\|stays" |
| () | Capture and group | |
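A short sketch that exercises several of the metacharacters from the table. The sample strings and variable names are made up for illustration; the patterns are taken from the Example column where one is given.

```python
import re

txt = "hello planet"

in_range = re.findall(r"[a-m]", txt)             # characters between a and m
two_any = re.findall(r"he..o", txt)              # "he", any two characters, "o"
starts = re.search(r"^hello", txt) is not None   # string starts with "hello"
ends = re.search(r"planet$", txt) is not None    # string ends with "planet"
either = re.findall(r"falls|stays", "The rain in Spain falls mainly in the plain")
double_l = re.findall(r"l{2}", txt)              # exactly two "l" in a row
groups = re.search(r"(\w+) (\w+)", txt).groups() # two captured groups

print(in_range)   # ['h', 'e', 'l', 'l', 'l', 'a', 'e']
print(two_any)    # ['hello']
print(starts, ends)
print(either)     # ['falls']
print(double_l)   # ['ll']
print(groups)     # ('hello', 'planet')
```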
A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:
| Character | Description | Example | 
|---|---|---|
| \A | Returns a match if the specified characters are at the beginning of the string | "\AThe" | 
| \b | Returns a match where the specified characters are at the beginning or at the end of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string") | r"\bain" r"ain\b" |
| \B | Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string") | r"\Bain" r"ain\B" |
| \d | Returns a match where the string contains digits (numbers from 0-9) | "\d" | 
| \D | Returns a match where the string DOES NOT contain digits | "\D" | 
| \s | Returns a match where the string contains a white space character | "\s" | 
| \S | Returns a match where the string DOES NOT contain a white space character | "\S" | 
| \w | Returns a match where the string contains any word characters (characters from a to Z, digits from 0-9, and the underscore _ character) | "\w" | 
| \W | Returns a match where the string DOES NOT contain any word characters | "\W" | 
| \Z | Returns a match if the specified characters are at the end of the string | "Spain\Z" | 
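The following sketch tries a few of the special sequences on a sample string. The string (with a year appended so that \d has something to match) and the variable names are assumptions for illustration. Raw strings (the r prefix) are used throughout so that Python does not interpret the backslashes itself.

```python
import re

txt = "The rain in Spain 2024"

digits = re.findall(r"\d", txt)         # each digit separately
words = re.findall(r"\S+", txt)         # runs of non-white-space characters
word_start = re.findall(r"\bain", txt)  # "ain" at the beginning of a word
word_end = re.findall(r"ain\b", txt)    # "ain" at the end of a word
not_start = re.findall(r"\Bain", txt)   # "ain" NOT at the beginning of a word
at_start = re.search(r"\AThe", txt)     # "The" at the very start of the string
at_end = re.search(r"2024\Z", txt)      # "2024" at the very end of the string

print(digits)      # ['2', '0', '2', '4']
print(words)       # ['The', 'rain', 'in', 'Spain', '2024']
print(word_start)  # []
print(word_end)    # ['ain', 'ain']
print(not_start)   # ['ain', 'ain']
print(at_start is not None, at_end is not None)
```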
A set is a set of characters inside a pair of square brackets [] with a special meaning:
| Set | Description | 
|---|---|
| [arn] | Returns a match where one of the specified characters (a, r, or n) is present |
| [a-n] | Returns a match for any lower case character, alphabetically between a and n |
| [^arn] | Returns a match for any character EXCEPT a, r, and n |
| [0123] | Returns a match where any of the specified digits (0, 1, 2, or 3) is present |
| [0-9] | Returns a match for any digit between 0 and 9 |
| [0-5][0-9] | Returns a match for any two-digit number from 00 to 59 |
| [a-zA-Z] | Returns a match for any character alphabetically between a and z, lower case OR upper case |
| [+] | In sets, +, *, ., \|, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string |
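A brief sketch of how some of these sets behave in practice. The sample string (with a time appended so the digit sets have something to match) and the variable names are illustrative assumptions.

```python
import re

txt = "The rain in Spain at 08:45"

arn = re.findall(r"[arn]", txt)              # any a, r, or n
two_digits = re.findall(r"[0-5][0-9]", txt)  # two-digit numbers from 00 to 59
non_letters = re.findall(r"[^a-zA-Z ]", txt) # anything except letters and spaces
plus_signs = re.findall(r"[+]", "1+1=2")     # + is a literal inside a set

print(arn)          # ['r', 'a', 'n', 'n', 'a', 'n', 'a']
print(two_digits)   # ['08', '45']
print(non_letters)  # ['0', '8', ':', '4', '5']
print(plus_signs)   # ['+']
```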
          
The findall() function returns a list containing all matches.
Print a list of all matches:
import re
txt = "The rain in Spain"
# Collect every occurrence of "ai" in the string
x = re.findall("ai", txt)
print(x)
The list contains the matches in the order they are found.
If no matches are found, an empty list is returned:
Return an empty list if no match was found:
import re
txt = "The rain in Spain"
# "Portugal" does not occur in the string, so the list is empty
x = re.findall("Portugal", txt)
print(x)
The search() function searches the string for a match, and returns a Match object if there is a match.
If there is more than one match, only the first occurrence of the match will be returned:
Search for the first white-space character in the string:
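import re
txt = "The rain in Spain"
# \s matches any white-space character; a raw string keeps the backslash intact
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())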
If no matches are found, the value None is returned:
Make a search that returns no match:
import re
txt = "The rain in Spain"
# "Portugal" does not occur in the string, so search() returns None
x = re.search("Portugal", txt)
print(x)
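Because search() returns None when nothing matches, calling a method such as .start() on the result without checking it first raises an AttributeError. One possible way to guard against that is sketched below; the helper name first_whitespace_position is hypothetical and not part of the re module.

```python
import re

def first_whitespace_position(text):
    """Return the index of the first white-space character, or None."""
    match = re.search(r"\s", text)
    # Only call .start() when a Match object was actually returned.
    return match.start() if match else None

print(first_whitespace_position("The rain in Spain"))  # 3
print(first_whitespace_position("Portugal"))           # None
```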
The split() function returns a list where the string has been split at each match:
Split at each white-space character:
import re
txt = "The rain in Spain"
# Split the string at every white-space character
x = re.split(r"\s", txt)
print(x)
You can control the number of splits by specifying the maxsplit parameter:
Split the string only at the first occurrence:
import re
txt = "The rain in Spain"
# maxsplit=1 stops after the first split; the rest of the string stays intact
x = re.split(r"\s", txt, maxsplit=1)
print(x)
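To see the effect of the maxsplit parameter more directly, the sketch below runs the same split with a few different limits. The dictionary name results is illustrative. Note that maxsplit=0, the default, means no limit.

```python
import re

txt = "The rain in Spain"

# Map each limit to the list produced by splitting with that limit.
results = {limit: re.split(r"\s", txt, maxsplit=limit) for limit in (0, 1, 2)}

for limit, parts in results.items():
    print(limit, parts)
# 0 ['The', 'rain', 'in', 'Spain']
# 1 ['The', 'rain in Spain']
# 2 ['The', 'rain', 'in Spain']
```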
The sub() function replaces the matches with the text of your choice:
Replace every white-space character with the number 9:
import re
txt = "The rain in Spain"
# Replace every white-space character with "9"
x = re.sub(r"\s", "9", txt)
print(x)
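As a further sketch of sub(), the pattern can match more than a single character, and a string with no matches is returned unchanged. The sample string with irregular spacing and the variable names are assumptions for illustration.

```python
import re

txt = "The   rain  in Spain"

# \s+ matches a whole run of white space, so each run becomes one space.
collapsed = re.sub(r"\s+", " ", txt)

# No digits in the string, so nothing is replaced.
unchanged = re.sub(r"\d", "#", txt)

print(collapsed)         # The rain in Spain
print(unchanged == txt)  # True
```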
You can control the number of replacements by specifying the count parameter:
Replace the first 2 occurrences:
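import re
txt = "The rain in Spain"
# count=2 replaces only the first two white-space characters
x = re.sub(r"\s", "9", txt, count=2)
print(x)
A Match object is an object containing information about the search and the result.
Note: If there is no match, the value None will be returned instead of the Match object.
Do a search that will return a Match object:
import re
txt = "The rain in Spain"
x = re.search("ai", txt)
print(x)  # this will print a Match object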
The Match object has properties and methods used to retrieve information about the search and the result:
.span() returns a tuple containing the start and end positions of the match.
.string returns the string passed into the function.
.group() returns the part of the string where there was a match.
Print the position (start and end position) of the first match occurrence.
The regular expression looks for any word that starts with an upper case "S":
import re
txt = "The rain in Spain"
# \b marks a word boundary, S is the literal letter, \w+ is the rest of the word
x = re.search(r"\bS\w+", txt)
print(x.span())
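Print the string passed into the function:
import re
txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)
Print the part of the string where there was a match.
The regular expression looks for any word that starts with an upper case "S":
import re
txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())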
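The Match object examples can be combined into one sketch that also guards against the None case. Note that .string is an attribute, not a method, while .span() and .group() are called with parentheses. The .start() and .end() methods used here return the two halves of the span separately.

```python
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)

# Only inspect the result when a Match object was returned.
if x is not None:
    print(x.span())            # (12, 17)
    print(x.start(), x.end())  # 12 17
    print(x.string)            # The rain in Spain
    print(x.group())           # Spain
else:
    print("No match")
```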