Capture Groups in Regex


Capture groups in Regex are a tool that allows us to identify and extract specific parts of a text string.

To define a capture group in Regex, we simply need to delimit a portion of a regular expression with parentheses ( ).

This group allows us to capture the match of the delimited pattern. That is, in addition to matching parts of the text, we can store those matches and reuse them later in the expression or in the code.

Basic Syntax of Capture Groups

The basic use of capture groups involves delimiting patterns with parentheses. From there, we can extract the captured values to use them in subsequent operations.

The syntax of a capture group is simply wrapping the pattern in parentheses:

(pattern)

Where pattern is any subexpression we want to capture. Groups are numbered automatically from 1, in the order in which their opening parentheses appear, from left to right. The number 0 always refers to the complete match of the regular expression.
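As a minimal sketch of this numbering, the following Python snippet uses the standard re module. The input string and the key=value pattern are assumptions chosen only for illustration.

```python
import re

# Hypothetical input and pattern, used only to show how groups are numbered.
text = "width=100"
match = re.search(r"(\w+)=(\d+)", text)

whole = match.group(0)  # group 0: the complete match
key = match.group(1)    # group 1: first pair of parentheses
value = match.group(2)  # group 2: second pair of parentheses

print(whole)  # width=100
print(key)    # width
print(value)  # 100
```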

Capture Group Examples

Let’s look at some examples.

(\d{3})-(\d{2})-(\d{4})

This regular expression matches a hyphen-separated number in the form 123-45-6789 (three digits, two digits, four digits), where:

  • (\d{3}) captures the first three digits.
  • (\d{2}) captures the next two digits.
  • (\d{4}) captures the last four digits.
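A short Python sketch of how these three captures could be read in code. The sample text is an assumption; groups() returns all numbered captures as a tuple.

```python
import re

pattern = re.compile(r"(\d{3})-(\d{2})-(\d{4})")

# The sample text is hypothetical; any string containing the format works.
match = pattern.search("ID: 123-45-6789")

parts = match.groups()  # all numbered groups, in order
print(parts)  # ('123', '45', '6789')

# Tuple unpacking gives each capture its own name.
first, middle, last = parts
```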

As another example, suppose we have this pattern, which captures two words separated by whitespace.

(\w+)\s+(\w+)

The sequence (\w+) captures one or more word characters (letters, digits, and the underscore), and \s+ matches one or more whitespace characters.

If we apply the expression to the following text:

Hello World

The resulting captures would be:

  • Group 1: Hello
  • Group 2: World

Quantifiers Applied to Groups

One of the advantages of using groups is that we can apply quantifiers to the entire group instead of just an individual character. This allows us to define repetitions of complete sequences more precisely.

(\d{2}-){3}

This pattern matches two digits followed by a dash, and requires that sequence to be repeated exactly three times. In terms of the text it matches, it is equivalent to:

\d{2}-\d{2}-\d{2}-

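The grouped form is shorter and easier to read. Keep in mind that when a capturing group is repeated, most engines store only the last repetition in that group.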
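The following Python sketch compares the two forms on an assumed sample string. It also shows a detail worth knowing: in Python's re module, a repeated capturing group retains only the text of its final repetition.

```python
import re

text = "12-34-56-"  # hypothetical sample input

grouped = re.fullmatch(r"(\d{2}-){3}", text)
expanded = re.fullmatch(r"\d{2}-\d{2}-\d{2}-", text)

# Both forms accept the same text.
print(grouped.group(0))   # 12-34-56-
print(expanded.group(0))  # 12-34-56-

# The repeated group holds only its last repetition.
print(grouped.group(1))   # 56-

# Two repetitions are not enough for {3}.
too_short = re.fullmatch(r"(\d{2}-){3}", "12-34-")
print(too_short)          # None
```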

References to Capture Groups

One of the most useful features of capture groups is the ability to refer to them within the same expression or in subsequent operations. This can be done in several ways, depending on the context in which we are using the regular expression.

References in the Same Expression: Backreferences

Backreferences allow us to use a captured group later in the same expression. This is useful when we want to find parts of the text that repeat.

(\w+)\s+\1

In this expression, \1 refers to the text captured by the first group (\w+). This means the pattern looks for a word, followed by one or more whitespace characters, followed by exactly the same word again.

Applied to this text:

hello hello world

The pattern matches the sequence "hello hello", since the word "hello" appears twice in a row, and group 1 contains "hello".
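A minimal Python sketch of this backreference. The second half is an addition not covered above: without word boundaries (\b), the pattern can also match inside longer words, so a stricter variant is shown under that assumption.

```python
import re

text = "hello hello world"
match = re.search(r"(\w+)\s+\1", text)
print(match.group(0))  # hello hello
print(match.group(1))  # hello

# Without boundaries, "this is" contains "is is" (the tail of "this" plus "is").
loose = re.search(r"(\w+)\s+\1", "this is fine")
print(loose.group(0))  # is is

# Assumed refinement: \b anchors the match to whole words only.
strict = re.search(r"\b(\w+)\s+\1\b", "this is fine")
print(strict)          # None
```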

References in Replacements

When working with replacement functions in programming languages, capture groups allow us to access the captured matches and use them to form new strings. In most languages, the captures are referenced by number in the replacement string, with a syntax such as $1 or \1 depending on the language.

For example, in JavaScript, we can use capture groups inside the replace function:

let text = "2024-09-27";
let newText = text.replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
console.log(newText); // "27/09/2024"

Here, the pattern (\d{4})-(\d{2})-(\d{2}) captures the date format "2024-09-27", and the replace method reorders it into "day/month/year" format using the references $1, $2, and $3.
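For comparison, here is a sketch of the same reordering in Python, where re.sub uses \1, \2, \3 in the replacement string instead of $1, $2, $3. The helper name to_day_first is hypothetical and only shows that a function can build the replacement as well.

```python
import re

text = "2024-09-27"
date_pattern = r"(\d{4})-(\d{2})-(\d{2})"

# Replacement string with numbered references.
new_text = re.sub(date_pattern, r"\3/\2/\1", text)
print(new_text)  # 27/09/2024


def to_day_first(match):
    """Hypothetical helper: builds day/month/year from the three captures."""
    year, month, day = match.groups()
    return f"{day}/{month}/{year}"


# A function can be passed instead of a replacement string.
same_text = re.sub(date_pattern, to_day_first, text)
print(same_text)  # 27/09/2024
```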

Non-Capturing Groups

In some cases, we don’t need to capture the match of a group, but simply use parentheses to group parts of the pattern. For these cases, we can use non-capturing groups, which are defined using the syntax (?:pattern).

Example of a non-capturing group:

(?:\d{3})-\d{2}-\d{4}

Here, the first set of three digits (?:\d{3}) is still required for the expression to match, but its text is not stored and it does not receive a group number.
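The effect on numbering can be sketched in Python as follows. The sample input is an assumption; the last pattern adds capturing groups around the remaining parts to show that numbering starts after the non-capturing group.

```python
import re

text = "123-45-6789"  # hypothetical sample input

capturing = re.search(r"(\d{3})-\d{2}-\d{4}", text)
non_capturing = re.search(r"(?:\d{3})-\d{2}-\d{4}", text)

print(capturing.groups())      # ('123',)
print(non_capturing.groups())  # ()
print(non_capturing.group(0))  # 123-45-6789

# Non-capturing groups are skipped when numbering the other groups.
mixed = re.search(r"(?:\d{3})-(\d{2})-(\d{4})", text)
print(mixed.group(1))          # 45
print(mixed.group(2))          # 6789
```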

Named Capture Groups

In more advanced regular expressions, it is possible to use named capture groups to improve readability and handle more complex patterns. Instead of referring to a group by its number, we can assign it a name and refer to it explicitly.

The syntax for named groups varies by language, but in many cases it is like this:

(?<groupName>pattern)

For example:

(?<firstName>\w+)\s(?<lastName>\w+)

In this case, the pattern captures two words, naming the first group "firstName" and the second "lastName". Python's re module writes named groups as (?P<groupName>pattern) rather than (?<groupName>pattern), and lets us access the captures by name like this:

import re

text = "Luis Perez"
pattern = r"(?P<firstName>\w+)\s(?P<lastName>\w+)"
match = re.search(pattern, text)
print(match.group("firstName"))  # Luis
print(match.group("lastName"))  # Perez
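Building on the Python example, the sketch below shows two further ways to use the names, both from the standard re module: groupdict() collects all named captures into a dictionary, and \g<name> refers to a named group inside a replacement string. The "lastName, firstName" output format is an assumption for illustration.

```python
import re

pattern = re.compile(r"(?P<firstName>\w+)\s(?P<lastName>\w+)")
match = pattern.search("Luis Perez")

# All named captures as a dictionary.
names = match.groupdict()
print(names)           # {'firstName': 'Luis', 'lastName': 'Perez'}

# Named groups still receive a number as well.
print(match.group(1))  # Luis

# \g<name> references a named group in the replacement string.
swapped = pattern.sub(r"\g<lastName>, \g<firstName>", "Luis Perez")
print(swapped)         # Perez, Luis
```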