Lookarounds in Regex

Lookarounds are zero-width assertions that make a match conditional on what surrounds (comes before or after) the pattern we are looking for.

They are divided into two main categories:

  • Lookaheads: Check if a pattern follows another pattern.
  • Lookbehinds: Check if a pattern precedes another pattern.

Neither type of lookaround consumes characters in the text string. That is, the text they check is not part of the final match.

To use lookarounds, we employ the following syntax:

Type                  Syntax
Positive Lookahead    (?=pattern)
Negative Lookahead    (?!pattern)
Positive Lookbehind   (?<=pattern)
Negative Lookbehind   (?<!pattern)

Could they have made it more complicated and less intuitive? Probably not 😆
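
As a quick orientation, here is a minimal Python sketch that tries each of the four forms on one short string. The sample string and the four patterns are made up for illustration; only the standard re module is assumed.

```python
import re

# Hypothetical sample text, chosen so that each form matches exactly one thing
text = "x1 y2"

# Positive lookahead: a letter that IS followed by "1"
pos_ahead = re.findall(r"[a-z](?=1)", text)
# Negative lookahead: a letter that is NOT followed by "1"
neg_ahead = re.findall(r"[a-z](?!1)", text)
# Positive lookbehind: a digit that IS preceded by "x"
pos_behind = re.findall(r"(?<=x)\d", text)
# Negative lookbehind: a digit that is NOT preceded by "x"
neg_behind = re.findall(r"(?<!x)\d", text)

print(pos_ahead, neg_ahead, pos_behind, neg_behind)
# ['x'] ['y'] ['1'] ['2']
```

In every case the result contains only the letter or digit itself, never the neighbouring character that the lookaround inspected.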

How to use Lookahead

Positive Lookahead

The positive lookahead (?=pattern) checks if a specific pattern immediately follows another pattern. If the condition is met, the match occurs.

For example, let’s say we want to find all words that are followed by an exclamation mark.

¡Hola! ¿Cómo estas? ¡Esto es genial!

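In this case, the full pattern is \w+(?=!), where

  • \w+ matches the words
  • (?=!) ensures they are followed by an exclamation mark.
  • The result is Hola and genial, without the exclamation marks.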
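
The example above can be reproduced with a short Python sketch. It assumes the two pieces are combined into the single pattern \w+(?=!) and uses the standard re module; the span check at the end is an extra illustration that the exclamation mark is inspected but not consumed.

```python
import re

text = "¡Hola! ¿Cómo estas? ¡Esto es genial!"
pattern = r"\w+(?=!)"  # a word, only if "!" comes right after it

matches = re.findall(pattern, text)
print(matches)  # ['Hola', 'genial']

# The "!" is checked but not consumed: the match ends just before it
first = re.search(pattern, text)
print(first.group(), first.span())  # Hola (1, 5)
```

The span (1, 5) covers only the four letters of Hola; the "!" at index 5 stays outside the match.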

Negative Lookahead

The negative lookahead (?!pattern) checks that a specific pattern does not follow another pattern. If the condition is met (i.e., the pattern is not found), the match occurs.

For example, suppose we want to find words that are not followed by a question mark.

¡Hola! ¿Como estas? ¡Esto es increible!

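Here, the full pattern is \w+\b(?!\?), where

  • \w+\b matches whole words; the \b word boundary stops the engine from backtracking and settling for a partial word such as esta
  • (?!\?) ensures they are not followed by a question mark.
  • In this case, it matches every word except estas.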
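
A minimal Python sketch of this example, assuming the two pieces are combined into \w+\b(?!\?). The second call is an added comparison, not part of the original example: it drops the \b to show why the word boundary matters.

```python
import re

text = "¡Hola! ¿Como estas? ¡Esto es increible!"

# Whole words that are not followed by "?"
matches = re.findall(r"\w+\b(?!\?)", text)
print(matches)  # ['Hola', 'Como', 'Esto', 'es', 'increible']

# Without \b the engine backtracks one letter: "esta" is followed by "s",
# not by "?", so a partial word slips through
without_boundary = re.findall(r"\w+(?!\?)", text)
print(without_boundary)  # ['Hola', 'Como', 'esta', 'Esto', 'es', 'increible']
```

The comparison suggests that a negative lookahead after a greedy quantifier usually needs an anchor such as \b, otherwise the engine may shorten the match until the condition holds.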

How to use Lookbehind

Positive Lookbehind

The positive lookbehind (?<=pattern) checks that a specific pattern precedes another pattern. If the condition is met, the match occurs.

Let’s say we want to find all numbers that are preceded by a dollar sign.

import re

text = "El precio es $10 y el descuento es $2."
# \d+ matches a run of digits; (?<=\$) requires a "$" right before it
pattern = r'(?<=\$)\d+'

matches = re.findall(pattern, text)
print(matches)  # ['10', '2']

In this case,

  • \d+ matches the numbers
  • (?<=\$) ensures that the number is preceded by a dollar sign
  • The result is 10 and 2; the dollar signs themselves are not part of the match.

Negative Lookbehind

The negative lookbehind (?<!pattern) checks that a specific pattern does not precede another pattern. If the condition is met (i.e., the pattern is not found), the match occurs.

Suppose we want to find numbers that are not preceded by a dollar sign.

El precio es $10 y el descuento es $2.

Here, the pattern is (?<!\$)\d+, where

  • (?<!\$) ensures that the number is not preceded by a dollar sign
  • In this case, it only matches the 0 in $10: the 1 and the 2 are each preceded by $, but the 0 is preceded by 1.
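
The result can be checked with a short Python sketch, assuming the pattern (?<!\$)\d+. The stricter variant that follows is an assumption added for illustration, not part of the original example: it also rejects a preceding digit, so a number that starts with $ is never split. The second sample sentence is likewise hypothetical.

```python
import re

text = "El precio es $10 y el descuento es $2."

# Digits not preceded by "$": only the "0" of "$10" qualifies
matches = re.findall(r"(?<!\$)\d+", text)
print(matches)  # ['0']

# Assumed variant: the character before must be neither "$" nor a digit
strict = r"(?<![$\d])\d+"
strict_matches = re.findall(strict, text)
print(strict_matches)  # []

# Hypothetical sentence with one number that has no "$" in front
other_matches = re.findall(strict, "3 libros por $10")
print(other_matches)  # ['3']
```

Python's re module requires lookbehind patterns to have a fixed width, which the one-character class [$\d] satisfies.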