Regular expressions are text patterns used to search and manipulate strings of characters. They are widely used in programming for tasks such as searching, validation, and text substitution.
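As a rough illustration of those three tasks, here is a minimal sketch using Python's standard `re` module. The sample sentence and the date-shaped pattern are illustrative assumptions, and the pattern only checks the shape YYYY-MM-DD, not whether the date actually exists.

```python
import re

text = "Order 66 shipped on 2024-05-01."

# Searching: find the first run of digits in the text
first_number = re.search(r"\d+", text).group()

# Validation: check that an entire string has the shape YYYY-MM-DD (shape only)
def looks_like_date(s: str) -> bool:
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is not None

# Substitution: replace every digit with "#"
masked = re.sub(r"\d", "#", text)

print(first_number)                   # 66
print(looks_like_date("2024-05-01"))  # True
print(looks_like_date("05/01/2024"))  # False
print(masked)                         # Order ## shipped on ####-##-##.
```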
Syntax
|
Alternation, matches one of several patterns.
g|h #will match "g" or "h".
()
Groups patterns so that quantifiers and alternation apply to the whole group.
(hi)+ #will match "hi", "hihi", "hihihi", etc.
\
Escape character: makes the following special character match literally.
a\.b #will match "a.b", but not "axb".
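The three syntax elements can be checked with a small Python sketch; the test strings are made up for illustration. `re.fullmatch` is used so that the whole string has to match the pattern.

```python
import re

# Alternation: "g|h" matches either character wherever it appears
alternation_hits = re.findall(r"g|h", "ghost high")

# Grouping: "+" repeats the whole group "(hi)"; without the group it repeats only "i"
grouped = re.fullmatch(r"(hi)+", "hihihi") is not None
grouped_on_hiii = re.fullmatch(r"(hi)+", "hiii") is not None
ungrouped_on_hiii = re.fullmatch(r"hi+", "hiii") is not None

# Escaping: "\." matches only a literal dot, while a bare "." matches any character
escaped_dot = re.fullmatch(r"a\.b", "axb") is not None
bare_dot = re.fullmatch(r"a.b", "axb") is not None

print(alternation_hits)                   # ['g', 'h', 'h', 'g', 'h']
print(grouped, grouped_on_hiii, ungrouped_on_hiii)  # True False True
print(escaped_dot, bare_dot)              # False True
```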
Characters
.
Matches any character except newline.
a.c #will match "abc", "a+c", "a!c", etc.
\w
Matches any word character (letter, number, underscore).
\w+ #will match "word123", "hello_world", "123", etc.
\W
Matches any non-word character.
\W+ #will match "!@#", "-$%", etc. (but not "_", which is a word character).
\d
Matches any digit.
\d{3} #will match "123", "456", etc.
\D
Matches any non-digit character.
\D+ #will match "abc", "_!@", etc.
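A short Python sketch of these shorthand classes on an invented sample string; note in particular that the underscore counts as a word character.

```python
import re

sample = "a+c hello_world 42!"

# "." matches any character except a newline (default behaviour)
any_char = re.findall(r"a.c", "abc a+c a\nc")

word_runs = re.findall(r"\w+", sample)       # letters, digits and "_"
non_word_runs = re.findall(r"\W+", sample)   # everything else
digit_runs = re.findall(r"\d+", sample)
non_digit_runs = re.findall(r"\D+", sample)
underscore_is_word = re.fullmatch(r"\w", "_") is not None

print(any_char)        # ['abc', 'a+c']
print(word_runs)       # ['a', 'c', 'hello_world', '42']
print(non_word_runs)   # ['+', ' ', ' ', '!']
print(digit_runs)      # ['42']
print(non_digit_runs)  # ['a+c hello_world ', '!']
```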
Character Classes
[]
Defines a character class.
[abc]
Matches any character within the brackets.
[abc]+ #will match "a", "abc", "caba", etc.
[^]
Defines a negative character class.
E.g. [^abc]
Matches any character not within the brackets.
[^abc]+ #will match "123", "_!@", etc.
[-]
Defines a character range.
E.g. [a-z]
Matches any character in the range from a to z.
[a-z]+ #will match "hello", "example", etc.
E.g. [A-Z]
Matches any character in the range from A to Z.
[A-Z]+ #will match "UPPER", "CASE", etc.
E.g. [0-9]
Matches any digit.
[0-9]+ #will match "123", "4567", etc.
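[a-zA-Z]
Combines several ranges in one character class; the ranges are written back to back, with no separator.
E.g. [a-zA-Z]
Matches any character in the range from a to z or A to Z. A comma inside the brackets is matched literally, so [a-z,A-Z] would also match ",".
[a-zA-Z]+ #will match "Upper", "CASE", "lower", etc.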
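The character classes can be compared side by side in Python; the sample text is invented. The last line shows that a comma placed between ranges is treated as an ordinary character to match, not as a separator.

```python
import re

text = "cab, 123 Hello, WORLD"

in_set = re.findall(r"[abc]+", text)
not_in_set = re.findall(r"[^abc]+", "cab123")
lower = re.findall(r"[a-z]+", text)
upper = re.findall(r"[A-Z]+", text)
digits = re.findall(r"[0-9]+", text)
letters = re.findall(r"[a-zA-Z]+", text)
with_comma = re.findall(r"[a-z,A-Z]+", text)  # the comma is part of the class

print(in_set)      # ['cab']
print(not_in_set)  # ['123']
print(lower)       # ['cab', 'ello']
print(upper)       # ['H', 'WORLD']
print(digits)      # ['123']
print(letters)     # ['cab', 'Hello', 'WORLD']
print(with_comma)  # ['cab,', 'Hello,', 'WORLD']
```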
Whitespace and Line Breaks
\n
Matches a newline.
\t
Matches a tab.
\s
Matches any whitespace.
\s+ #will match " ", "   ", "\t\t", "\n", etc.
\S
Matches any non-whitespace character.
\S+ #will match "word", "123", "_!@", etc.
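A common use of these classes is tokenizing text on arbitrary whitespace. The following Python sketch uses an invented string that mixes spaces, a tab and a newline.

```python
import re

text = "one  two\tthree\nfour"

gaps = re.findall(r"\s+", text)      # runs of whitespace
tokens = re.findall(r"\S+", text)    # runs of non-whitespace
tab_count = len(re.findall(r"\t", text))
lines = re.split(r"\n", text)
words = re.split(r"\s+", text)       # split on any whitespace run

print(gaps)       # ['  ', '\t', '\n']
print(tokens)     # ['one', 'two', 'three', 'four']
print(tab_count)  # 1
print(lines)      # ['one  two\tthree', 'four']
```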
Quantifiers
*
Matches 0 or more occurrences of the preceding pattern.
a* #will match "", "a", "aa", "aaa", etc.
+
Matches 1 or more occurrences of the preceding pattern.
b+ #will match "b", "bb", "bbb", etc.
?
Matches 0 or 1 occurrence of the preceding pattern.
c? #will match "", "c", etc.
{n}
Matches exactly n occurrences of the preceding pattern.
d{3} #will match "ddd".
{n,}
Matches at least n occurrences of the preceding pattern.
e{2,} #will match "ee", "eee", "eeee", etc.
{n,m}
Matches between n and m occurrences of the preceding pattern.
f{1,3} #will match "f", "ff", or "fff".
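The quantifier bounds can be verified with `re.fullmatch`, which requires the whole string to match. This Python sketch also shows that quantifiers are greedy: when searching, they take as many repetitions as the bound allows. The helper name `whole` is just an illustrative choice.

```python
import re

def whole(pattern: str, s: str) -> bool:
    """True if the entire string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

star_empty = whole(r"a*", "")           # zero occurrences are allowed
plus_empty = whole(r"b+", "")           # at least one "b" is required
optional_two = whole(r"c?", "cc")       # at most one "c"
exact = whole(r"d{3}", "ddd")
at_least = whole(r"e{2,}", "e")         # fewer than two
upper_bound = whole(r"f{1,3}", "ffff")  # more than three

# Greedy: re.search returns the longest run the bound allows
greedy = re.search(r"f{1,3}", "ffff").group()

print(star_empty, plus_empty, optional_two)  # True False False
print(exact, at_least, upper_bound)          # True False False
print(greedy)                                # fff
```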
Anchors and Boundaries
^
Matches the start of the string (or the start of each line in multiline mode).
^start #will match "start of line", "start_here", etc.
$
Matches the end of the string (or the end of each line in multiline mode).
end$ #will match "the end", "goes to end", etc., but not "end of line".
\b
Matches a word boundary.
\bword\b #will match "word" in "word", "a word here", "word!", but not in "wording", "sword", or "my_word" (the underscore is a word character).
\B
Matches a non-word boundary.
\Bnon\B #will match "anonymous", "canonical", but not "non-stop" or "nonprofit".
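The anchors and boundaries can be tested with `re.search`, which looks for a match anywhere in the string. In this Python sketch the helper `has` and all test strings are illustrative assumptions.

```python
import re

def has(pattern: str, s: str) -> bool:
    """True if pattern matches anywhere in s."""
    return re.search(pattern, s) is not None

start_anchor = [has(r"^start", s) for s in ("start here", "a start")]
end_anchor = [has(r"end$", s) for s in ("goes to end", "end of line")]
word_boundary = [has(r"\bword\b", s) for s in ("a word here", "wording", "sword", "my_word")]
non_boundary = [has(r"\Bnon\B", s) for s in ("anonymous", "non-stop", "nonprofit")]

print(start_anchor)   # [True, False]
print(end_anchor)     # [True, False]
print(word_boundary)  # [True, False, False, False]
print(non_boundary)   # [True, False, False]
```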
Inline Modifiers
(?i)
Case-insensitive modifier.
(?i)hello #will match "hello", "HELLO", "hElLo", etc.
(?m)
Multiline modifier: ^ and $ match at the start and end of every line, not only of the whole string.
(?m)^start #will match both occurrences of "start" in "start one\nstart two".
(?s)
Dotall modifier: . also matches newline characters.
(?s)start.*end #will match "start\nmiddle\nend".
(?x)
Verbose modifier: whitespace in the pattern is ignored and # starts a comment.
(?x) a b c #will match "abc", because the spaces in the pattern are ignored.
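Python's `re` module accepts these inline modifiers at the start of a pattern. The sketch below runs each one on an invented three-line string and contrasts it with the unmodified pattern.

```python
import re

text = "start one\nstart two\nthe end"

ci = re.findall(r"(?i)hello", "hello HELLO hElLo")

without_m = re.findall(r"^start", text)        # "^" only at the start of the string
with_m = re.findall(r"(?m)^start", text)       # "^" at the start of every line

without_s = re.search(r"start.*end", text)     # None: "." stops at "\n"
with_s = re.search(r"(?s)start.*end", text)    # "." also crosses newlines

# Verbose: spaces and the trailing comment in the pattern are ignored
verbose = re.fullmatch(r"(?x) a b c  # spaces and this comment are ignored", "abc")

print(ci)              # ['hello', 'HELLO', 'hElLo']
print(without_m)       # ['start']
print(with_m)          # ['start', 'start']
print(without_s)       # None
print(verbose.group()) # abc
```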
Lookarounds
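(?=pattern)
Positive lookahead: matches if the text that follows matches pattern, without consuming that text.
\d+(?=px) #will match "10" in "10px", but not the digits in "10em".
(?!pattern)
Negative lookahead: matches if the text that follows does NOT match pattern.
q(?!u) #will match the "q" in "Iraq" and "qat", but not in "queen".
(?<=pattern)
Positive lookbehind: matches if the text that precedes matches pattern.
(?<=\$)\d+ #will match "25" in "$25".
(?<!pattern)
Negative lookbehind: matches if the text that precedes does NOT match pattern.
(?<!\$)\b\d+ #will match "30" in "30 euros", but not the number in "$25".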
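Lookarounds are zero-width: they check the surrounding text but do not include it in the match. The following Python sketch, with invented inputs, shows all four forms. Python's `re` module requires lookbehind patterns to have a fixed width.

```python
import re

prices = "$25 and 30 and $7"

lookahead = re.findall(r"\d+(?=px)", "10px 20em 30px")  # "px" is checked, not captured
neg_lookahead = [re.search(r"q(?!u)", w) is not None for w in ("Iraq", "qat", "queen")]
lookbehind = re.findall(r"(?<=\$)\d+", prices)          # numbers preceded by "$"
neg_lookbehind = re.findall(r"(?<!\$)\b\d+", prices)    # numbers not preceded by "$"

print(lookahead)       # ['10', '30']
print(neg_lookahead)   # [True, True, False]
print(lookbehind)      # ['25', '7']
print(neg_lookbehind)  # ['30']
```

The `\b` in the last pattern matters: without it, `(?<!\$)\d+` would still match the "5" in "$25", because that digit is preceded by "2", not "$".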
Flags
Flags are used with regular expressions to modify their behavior during pattern matching in a string.
In C#, flags are passed as RegexOptions values to the Regex constructor (or to static methods such as Regex.Match()) and can be combined with |. For example:
new Regex("pattern", RegexOptions.IgnoreCase) for the IgnoreCase flag
new Regex("pattern", RegexOptions.Multiline) for the Multiline flag
new Regex("pattern", RegexOptions.Singleline) for the Singleline (dotall) flag
new Regex("pattern", RegexOptions.IgnorePatternWhitespace) for the IgnorePatternWhitespace (verbose) flag
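In JavaScript, flags are written after the closing slash of a regular expression literal. For example:
/pattern/i for the i (case-insensitive) flag
/pattern/g for the g (global) flag, which finds all matches instead of stopping at the first
/pattern/m for the m (multiline) flag
/pattern/s for the s (dotall) flag
/pattern/u for the u (unicode) flag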
In Python, flags are passed as an additional argument to re.compile() (or to functions such as re.search()) and can be combined with |. For example:
re.compile(r'pattern', re.I) for the I (case-insensitive) flag
re.compile(r'pattern', re.M) for the M (multiline) flag
re.compile(r'pattern', re.S) for the S (dotall) flag
re.compile(r'pattern', re.U) for the U (unicode) flag, which is already the default for str patterns in Python 3
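In Python, flags combine with the bitwise OR operator, and each flag has an inline equivalent. The sketch below, on an invented two-line string, shows that `re.I | re.M` and the inline `(?im)` give the same result.

```python
import re

text = "Start here\nstart there"

# Flags passed to re.compile and combined with "|"
flag_hits = re.compile(r"^start", re.I | re.M).findall(text)

# The same behaviour written as inline modifiers
inline_hits = re.findall(r"(?im)^start", text)

# Without re.M, "^" only matches at the very beginning of the string
no_multiline = re.compile(r"^start", re.I).findall(text)

# re.S lets "." match a newline
dotall = re.compile(r"a.b", re.S).fullmatch("a\nb") is not None
no_dotall = re.compile(r"a.b").fullmatch("a\nb") is not None

print(flag_hits)          # ['Start', 'start']
print(inline_hits)        # ['Start', 'start']
print(no_multiline)       # ['Start']
print(dotall, no_dotall)  # True False
```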
Examples in Different Languages
In these examples, the regular expression used is \broja\b, which searches for the word “roja” as a whole word. The examples, in C#, JavaScript, and Python, search for this word in the text “La casa es roja y azul.” (“The house is red and blue.”) and display the matches found.
C#
```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string text = "La casa es roja y azul.";
        // Pattern to search for the whole word "roja"
        string pattern = @"\broja\b";
        // Create the Regex object
        Regex regex = new Regex(pattern);
        // Find all matches
        MatchCollection matches = regex.Matches(text);
        // Print the matches
        foreach (Match match in matches)
        {
            Console.WriteLine($"Found: {match.Value}");
        }
    }
}
```
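JavaScript
```javascript
const text = "La casa es roja y azul.";
// Pattern to search for the whole word "roja"; the g flag returns every match
const pattern = /\broja\b/g;
// Find matches (match() returns null when nothing matches, so fall back to an empty array)
const matches = text.match(pattern) || [];
// Print the matches
matches.forEach(match => {
  console.log(`Found: ${match}`);
});
```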
Python
```python
import re

text = "La casa es roja y azul."
# Pattern to search for the whole word "roja"
pattern = r'\broja\b'
# Find all matches
matches = re.findall(pattern, text)
# Print the matches
for match in matches:
    print(f"Found: {match}")
```
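When the position of each hit is also needed, Python's `re.finditer` returns Match objects instead of plain strings. This sketch uses an extended sample sentence (an assumption, not from the examples above) that also contains "rojas", to show that the word boundaries in \broja\b exclude it.

```python
import re

text = "La casa es roja y azul; las flores son rojas."

# re.finditer yields Match objects that record where each hit starts and ends
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\broja\b", text)]

# Without \b the pattern also matches the "roja" inside "rojas"
loose_count = len(re.findall(r"roja", text))

for word, start, end in spans:
    print(f"Found: {word} at {start}-{end}")  # Found: roja at 11-15
print(loose_count)                            # 2
```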