Regular Expressions
Regular expressions (regex) are powerful pattern matching tools that
let you search, extract, and manipulate text using complex patterns. Python's
re module provides functions for working with regex patterns. Master
regex and you can validate input, extract data, find/replace text, and process
structured information efficiently!
Basic Pattern Matching
The re module provides several functions for pattern matching. The
most common are re.search() (finds first match), re.findall()
(finds all matches), and re.match() (matches at string start).
- re.search(pattern, string): Finds the first match anywhere in the string; returns a Match object or None
- re.match(pattern, string): Matches only at the start of the string; returns a Match object or None
- re.findall(pattern, string): Finds all matches; returns a list of strings
- re.finditer(pattern, string): Finds all matches; returns an iterator of Match objects
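As a rough illustration of how these four functions differ, here is a minimal sketch. The sample sentence and variable names are made up for this example.

```python
import re

text = "Order 66 shipped on day 12, order 7 on day 30."

# re.search: first match anywhere in the string
first = re.search(r"\d+", text)
print(first.group())  # 66

# re.match: succeeds only if the pattern matches at position 0
at_start = re.match(r"\d+", text)
print(at_start)  # None, because the text starts with "Order"

# re.findall: list of every matched substring
numbers = re.findall(r"\d+", text)
print(numbers)  # ['66', '12', '7', '30']

# re.finditer: iterator of Match objects, handy when positions matter
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
print(positions[0])  # ('66', 6)
```

Reach for finditer() rather than findall() when you need more than the matched text, such as where each match starts.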
Common Regex Patterns
Regex patterns use special characters and sequences to match text. Here are the most commonly used patterns:
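The sketch below tries a handful of frequently used pattern elements against made-up sample strings. It is an illustrative selection, not a complete reference.

```python
import re

sample = "Room 42, call 555-0199 before 9am_sharp!"

digit_runs = re.findall(r"\d+", sample)      # \d = digit, + = one or more
words = re.findall(r"\w+", sample)           # \w = letter, digit, or underscore
spaces = re.findall(r"\s", sample)           # \s = any whitespace character
phone = re.findall(r"\d{3}-\d{4}", sample)   # {n} = exactly n repetitions
ranged = re.findall(r"\d{2,3}", "1 22 333 4444")     # {n,m} = between n and m
spellings = re.findall(r"colou?r", "color colour")   # ? = zero or one
stars = re.findall(r"ab*", "a ab abbb")              # * = zero or more
starts_with_room = re.search(r"^Room", sample) is not None  # ^ = start of string
ends_with_bang = re.search(r"!$", sample) is not None       # $ = end of string

print(digit_runs)  # ['42', '555', '0199', '9']
print(words)       # ['Room', '42', 'call', '555', '0199', 'before', '9am_sharp']
print(phone)       # ['555-0199']
print(ranged)      # ['22', '333', '444']
print(spellings)   # ['color', 'colour']
print(stars)       # ['a', 'ab', 'abbb']
```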
Use raw strings (r"pattern") for regex patterns to avoid escaping backslashes. In a raw string, \b stays as the two characters \b, which the regex engine reads as a word boundary, instead of being converted to a backspace character by Python.
Groups and Capturing
Parentheses () create groups that capture parts of the match. You can
access captured groups using the group() method on Match objects, or
use them in substitutions.
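A small sketch of capturing groups in practice. The date string, the key=value text, and the variable names are invented for illustration.

```python
import re

match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-03-15.")
if match:
    whole = match.group(0)              # the entire match
    year = match.group(1)               # first parenthesized group
    all_groups = match.groups()         # every captured group as a tuple
    print(whole)       # 2024-03-15
    print(year)        # 2024
    print(all_groups)  # ('2024', '03', '15')

# With more than one group in the pattern, findall returns a list of tuples
pairs = re.findall(r"(\w+)=(\d+)", "width=80 height=24")
print(pairs)  # [('width', '80'), ('height', '24')]
```

Note that group(0) is always the whole match; numbered groups start at 1 and follow the order of their opening parentheses.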
Substitution with re.sub
The re.sub() function replaces matches with replacement text. You can
use captured groups in the replacement string using \1, \2,
etc.
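The following sketch shows a plain replacement, a replacement that reorders captured groups, and the optional count argument. The input strings are examples chosen for this illustration.

```python
import re

# Plain replacement: collapse runs of whitespace into one space
cleaned = re.sub(r"\s+", " ", "too    many   spaces")
print(cleaned)  # too many spaces

# Group references: rewrite YYYY-MM-DD as DD/MM/YYYY
# The replacement is a raw string so \1, \2, \3 reach the regex engine intact
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "Due 2024-03-15")
print(swapped)  # Due 15/03/2024

# count limits how many matches are replaced (here, only the first two digits)
masked = re.sub(r"\d", "#", "PIN 1234", count=2)
print(masked)  # PIN ##34
```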
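If you use the same pattern many times, compile it once with re.compile() and reuse the resulting pattern object. This can be faster than calling re.search() repeatedly with the same pattern string!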
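A minimal sketch of reusing a compiled pattern. The name WORD_RE and the sample lines are hypothetical. Since the re module also keeps an internal cache of recently used patterns, the speedup is usually modest, so treat this as a sketch of the idiom rather than a guaranteed optimization.

```python
import re

# Compile once; the pattern object exposes search(), findall(), sub(), etc.
WORD_RE = re.compile(r"\b\w{5,}\b")  # words of five or more characters

lines = ["short words only", "regular expressions everywhere"]

# Reuse the same compiled pattern for every line
long_words = [WORD_RE.findall(line) for line in lines]
print(long_words)
# [['short', 'words'], ['regular', 'expressions', 'everywhere']]

# Pattern methods return Match objects or None, like the module-level functions
no_match = WORD_RE.search("tiny")
print(no_match)  # None
```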
Common Mistakes
1. Using match() instead of search()
# Wrong - match only works at start
import re
text = "Contact: user@example.com" # placeholder address
result = re.match(r"\S+@\S+", text) # Returns None!
print(result) # None - no match at start
# Correct - use search for anywhere in string
result = re.search(r"\S+@\S+", text) # Finds email
print(result.group()) # 'user@example.com'
2. Not using raw strings for patterns
# Wrong - backslashes need escaping
import re
# This tries to match a backspace character, not a word boundary!
pattern = "\bword\b" # \b is interpreted as backspace character
text = "a word here"
result = re.search(pattern, text) # Returns None - no backspace characters in text
# Correct - use raw string
pattern = r"\bword\b" # \b is word boundary
result = re.search(pattern, text) # Works correctly
print(result.group()) # 'word'
3. Forgetting that search/match return Match objects or None
# Wrong - calling .group() on None crashes
import re
text = "No email here"
match = re.search(r"\S+@\S+", text)
email = match.group() # AttributeError: 'NoneType' object has no attribute 'group'
# Correct - check for None first
match = re.search(r"\S+@\S+", text)
if match:
    email = match.group()
    print(email)
else:
    print("No email found")
4. Greedy vs non-greedy matching
# Wrong - greedy matching takes too much
import re
text = "<div>First</div> and <div>Second</div>"
# Greedy - matches from first < to last >
match = re.search(r"<div>.*</div>", text)
print(match.group()) # '<div>First</div> and <div>Second</div>' - too much!
# Correct - use non-greedy *?
match = re.search(r"<div>.*?</div>", text)
print(match.group()) # '<div>First</div>' - just first match
Exercise: Extract Phone Numbers
Task: Create a function that extracts phone numbers from text. Handle both formats: (123) 456-7890 and 123-456-7890.
Requirements:
- Import the re module
- Create a function extract_phones(text) that finds all phone numbers
- Use re.findall() to extract matches
- Support formats: (123) 456-7890 and 123-456-7890
- Return a list of all found phone numbers
- Test with text containing multiple phone numbers
Solution
import re
def extract_phones(text):
    """Extract phone numbers in formats (123) 456-7890 or 123-456-7890."""
    # Pattern: 3-digit area code (parentheses optional), an optional
    # space and/or hyphen, then 3 digits, a hyphen, and 4 digits
    pattern = r"\(?\d{3}\)?\s?-?\s?\d{3}-\d{4}"
    phones = re.findall(pattern, text)
    return phones
# Test the function
text = """
Contact us at (555) 123-4567 or 555-987-6543.
Our office is at 123-456-7890.
Call (888) 555-0000 for support.
"""
phones = extract_phones(text)
print("Found phone numbers:")
for phone in phones:
    print(f" - {phone}")
Summary
- re Module: Python's module for regular expression pattern matching
- re.search(): Finds first match anywhere in string, returns Match object or None
- re.match(): Matches only at string start, returns Match object or None
- re.findall(): Finds all matches, returns list of strings or tuples (if groups)
- re.sub(): Replaces matches with replacement text, supports group references like \1
- Raw Strings: Use r"pattern" to avoid escaping backslashes
- Groups: Use parentheses () to capture parts of matches, access with group()
- Common Patterns: \d (digit), \w (word), \s (space), + (1+), * (0+), ? (0 or 1), {n,m} (range)
- Compiled Regex: Use re.compile() for better performance when reusing patterns
What's Next?
Regular expressions are powerful tools for text processing! Next, we'll explore type hinting, which allows you to add type annotations to your Python code. While Python is dynamically typed, type hints improve code readability, enable better IDE support, and can catch errors with type checkers like mypy!