A regex capture group is a way to group part of a regex pattern inside parentheses. This allows us to retrieve only the part of the string matched by that group, rather than the complete match. Capturing groups are useful for tasks like data extraction, validation, or manipulation.
This post provides an in-depth guide to Python regex capturing groups, with examples:
Python Regex Capturing Groups
A Python regex capturing group extracts the part of a string that matches a particular pattern. A capturing group is defined by enclosing the pattern in parentheses. For example, to capture an all-uppercase word such as ‘PYTHON’, the pattern “\b[A-Z]+\b” is placed inside parentheses, giving “(\b[A-Z]+\b)”.
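To make the difference between the whole match and a captured group concrete, here is a minimal sketch. The sample text and the pattern “ID: (\d+)” are assumptions chosen only for illustration; they do not come from the examples in this post.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Order ID: 48213 shipped"

# The parentheses capture only the digits; "ID: " is matched but not captured.
match = re.search(r"ID: (\d+)", text)

whole_match = match.group(0)  # everything the pattern matched
captured = match.group(1)     # only the part inside the parentheses

print(whole_match)  # ID: 48213
print(captured)     # 48213
```

Here “group(0)” returns the complete match, while “group(1)” returns only what the first pair of parentheses captured.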
Let’s see different examples to capture various groups from strings using Python regex:
Example 1: Capturing Uppercase Word Group
The code below captures the first all-uppercase word from the given string:
Code:
import re
string_value = "Python Guide provide by ITSLINUXFOSS"
result = re.search(r"(\b[A-Z]+\b)", string_value)
print(result.group(1))
- The module named “re” is imported.
- The string value is initialized.
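- The “re.search()” function scans the string and returns a match object for the first location where the pattern matches.
- The pattern “(\b[A-Z]+\b)” is passed to “re.search()” as a raw string. Inside the parentheses, “[A-Z]+” matches one or more uppercase letters, and the “\b” word boundaries on either side ensure that only a whole word made up entirely of uppercase letters is captured.
- The “result.group(1)” call returns the text captured by the first group.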
Output:
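The first word in the string that matches the specified pattern, “ITSLINUXFOSS”, has been printed. The words “Python” and “Guide” are skipped because they contain lowercase letters.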
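To see what the “\b” word boundaries contribute, the sketch below runs the same search with and without them. The variable names are hypothetical; the string is the one from Example 1.

```python
import re

string_value = "Python Guide provide by ITSLINUXFOSS"

# With word boundaries: only a whole all-uppercase word can match.
with_boundaries = re.search(r"(\b[A-Z]+\b)", string_value).group(1)

# Without word boundaries: the capital "P" of "Python" already matches.
without_boundaries = re.search(r"([A-Z]+)", string_value).group(1)

print(with_boundaries)     # ITSLINUXFOSS
print(without_boundaries)  # P
```

Without the boundaries, the group captures the first run of uppercase letters it finds, even when that run is only the leading letter of a mixed-case word.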
Example 2: Capturing Number Group
The code below captures the first group of digits from the given string:
Code:
import re
string_value = "Python 2023 Guide provide by ITSLINUXFOSS"
result = re.search(r"(\b\d+)", string_value)
print(result.group(1))
- The “re” module is imported and the string is initialized.
- The pattern “(\b\d+)” captures the first run of one or more digits that starts at a word boundary.
- The “re.search()” function returns a match object if the pattern matches anywhere in the string, or None if it does not.
- The “result.group(1)” call returns the text captured by the first group.
Output:
The number that has been matched is “2023”.
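Because “re.search()” returns None when nothing matches, calling “group(1)” directly on the result raises an AttributeError for strings without digits. A minimal sketch of a guard follows; the helper name “first_number” is hypothetical and introduced only for this illustration.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r"(\b\d+)", text)
    if match is None:
        # No digits found, so there is no group to read.
        return None
    return match.group(1)

found = first_number("Python 2023 Guide provide by ITSLINUXFOSS")
missing = first_number("Python Guide provide by ITSLINUXFOSS")

print(found)    # 2023
print(missing)  # None
```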
Example 3: Capturing Multiple Groups
The code below captures multiple groups from the given string with a single pattern:
Code:
import re
string_value = "Python 2023 Guide provide by ITSLINUXFOSS"
output = re.search(r"(\b\d+).+(\b[A-Z]+\b)", string_value)
print(output.groups())
print('Capturing Numbers: ',output.group(1))
print('Capturing Uppercase Word: ',output.group(2))
- The module named “re” is imported at the start.
- The “re.search()” function takes a single pattern containing two capturing groups and searches the given string for a match.
- The group “(\b\d+)” captures the digits and the group “(\b[A-Z]+\b)” captures the all-uppercase word. The “.+” between them matches the text that separates the two groups.
- The “output.groups()” method returns all captured groups as a tuple.
- The “output.group()” method is called with the indexes 1 and 2 to display the first and second captured groups separately.
Output:
Both groups have been captured from the given string: “2023” and “ITSLINUXFOSS”.
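When a pattern has several groups, numeric indexes can become hard to read. As a hedged sketch, the same pattern can be written with named groups using Python's “(?P<name>...)” syntax. The group names “year” and “word” are assumptions made for this illustration, not part of the original example.

```python
import re

string_value = "Python 2023 Guide provide by ITSLINUXFOSS"

# Same pattern as Example 3, but each group is given a (hypothetical) name.
match = re.search(r"(?P<year>\b\d+).+(?P<word>\b[A-Z]+\b)", string_value)

# groups() still returns a tuple, which can be unpacked directly.
year, word = match.groups()

# groupdict() maps each group name to its captured text.
named = match.groupdict()

print(year)                 # 2023
print(word)                 # ITSLINUXFOSS
print(match.group("word"))  # ITSLINUXFOSS
```

Named groups remain accessible by number as well, so “match.group(1)” and “match.group("year")” return the same text.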
Example 4: Capturing Multiple Uppercase Word Groups
The following code captures every all-uppercase word in the given string:
Code:
import re
string_value = "PYTHON guide PROVIDED by ITSLINUXFOSS"
output = re.compile(r"(\b[A-Z]+\b)")
for m in output.finditer(string_value):
    print(m.group())
- The re module is imported and the string is initialized.
- The “re.compile()” function compiles the regular expression pattern “(\b[A-Z]+\b)” and assigns it to a variable named “output”.
- The for loop iterates over all matches of the pattern in the specified string using the “finditer()” method, which yields one match object per match.
- For each match object, the loop body retrieves the matched substring with the “group()” method and prints it.
Output:
The words in the given string that match the pattern have been printed: “PYTHON”, “PROVIDED”, and “ITSLINUXFOSS”.
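If only the captured text is needed and not the match objects, “re.findall()” is a shorter alternative to the loop. This is a sketch of a different technique from the “finditer()” approach above: when the pattern contains exactly one capturing group, “findall()” returns a list of the strings captured by that group.

```python
import re

string_value = "PYTHON guide PROVIDED by ITSLINUXFOSS"

# One capturing group, so findall() returns the group's text for every match.
words = re.findall(r"(\b[A-Z]+\b)", string_value)

print(words)  # ['PYTHON', 'PROVIDED', 'ITSLINUXFOSS']
```

The trade-off is that “findall()” gives up the match objects, so details such as match positions are only available through “finditer()”.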
Example 5: Capturing Multiple Number Groups
The code below captures every group of digits in the given string:
Code:
import re
string_value = "Python Guide 2023 Provided by JOSEPH to 300 Students"
output = re.compile(r"(\b\d+\b)")
for m in output.finditer(string_value):
    print(m.group())
- The “re.compile()” function compiles the regular expression pattern “(\b\d+\b)” and assigns it to a variable named “output”.
- The for loop uses the “finditer()” method to iterate through all matches of the pattern in the specified string.
- The “group()” method is used to retrieve the matched substring for each matching object.
Output:
The matched digit groups, “2023” and “300”, have been printed.
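Captured groups are always strings, so numeric work requires a conversion. The sketch below builds on Example 5. The variable names “pattern”, “numbers”, and “positions” are hypothetical. It converts each captured group to an integer and records where each group was found using the match object's “span()” method.

```python
import re

string_value = "Python Guide 2023 Provided by JOSEPH to 300 Students"
pattern = re.compile(r"(\b\d+\b)")

# Convert the text captured by group 1 to an integer for every match.
numbers = [int(m.group(1)) for m in pattern.finditer(string_value)]

# span(1) gives the (start, end) indexes of group 1 within the string.
positions = [m.span(1) for m in pattern.finditer(string_value)]

print(numbers)       # [2023, 300]
print(sum(numbers))  # 2323
for start, end in positions:
    # Slicing with the span recovers the captured text.
    print(string_value[start:end])
```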
Conclusion
Python regex capturing groups enable you to capture specific parts of a string based on a specified pattern, such as using the (\b\d+) pattern for capturing digits. A capturing group is defined by placing parentheses “( )” around the part of the pattern whose match you want to extract. To access the captured text, call group() or groups() on the match object returned by re.search(), or iterate over the match objects produced by finditer(). This guide presented various examples of capturing specified groups using Python’s “re” module.