Regular expressions (regex) are a powerful text pattern matching tool supported by virtually all modern programming languages. Whether the task is data validation, text search and replace, or log analysis, regex is an essential skill for developers. This guide takes you from the basics to advanced usage.
Table of Contents
- Key Takeaways
- What is a Regular Expression?
- Basic Regex Syntax
- Advanced Features
- Common Regex Patterns
- Code Examples
- Regex Best Practices
- FAQ
- Summary
Key Takeaways
- Pattern Matching: Regular expressions are a language for describing string patterns, used for searching, matching, and manipulating text.
- Universal Support: Almost all programming languages support regular expressions with similar syntax.
- Powerful Capabilities: Complex text patterns can be described with concise expressions.
- Performance Considerations: Complex regex can cause performance issues; design carefully.
- Readability: Regex can be hard to read; consider adding comments or splitting complex patterns.
- Testing is Critical: Always thoroughly test regex before using in production.
What is a Regular Expression?
A Regular Expression (Regex) is a formal language for describing string patterns. It originated from mathematical theory in the 1950s, first proposed by mathematician Stephen Cole Kleene. Today, regular expressions have become the standard tool for text processing, widely used for:
- Data Validation: Validating user input (email, phone, password, etc.)
- Text Search: Finding specific patterns in large amounts of text
- Text Replacement: Batch modifying text that matches patterns
- Data Extraction: Extracting structured information from text
- Log Analysis: Parsing and analyzing log files
The core idea of regular expressions is to use special characters and rules to describe a class of strings, rather than a specific string.
Basic Regex Syntax
Character Matching
The most basic regular expressions are literal characters that match themselves:
| Pattern | Description | Example |
|---|---|---|
| `abc` | Matches literal string "abc" | "abc" ✓, "abcd" ✓ |
| `.` | Matches any single character (except newline) | `a.c` matches "abc", "a1c" |
| `\d` | Matches any digit [0-9] | `\d\d` matches "42" |
| `\D` | Matches any non-digit | `\D` matches "a" |
| `\w` | Matches word character [a-zA-Z0-9_] | `\w+` matches "hello_123" |
| `\W` | Matches non-word character | `\W` matches "@" |
| `\s` | Matches whitespace (space, tab, etc.) | `a\sb` matches "a b" |
| `\S` | Matches non-whitespace | `\S+` matches "hello" |
| `\\` | Matches backslash itself | `\\` matches "\" |
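The shorthand classes in the table can be tried out directly. The following is a minimal Python sketch; the sample strings are made up for illustration. It also shows one engine-specific caveat: in Python 3, `\d` on text patterns matches non-ASCII digits as well unless `re.ASCII` is set.

```python
import re

sample = "Order #42: hello_123 shipped"  # illustrative input

two_digits = re.findall(r"\d\d", sample)      # pairs of consecutive digits
words = re.findall(r"\w+", sample)            # runs of word characters
dot_hits = re.findall(r"a.c", "abc a1c ac")   # '.' must consume exactly one character

# "\u0663" is ARABIC-INDIC DIGIT THREE: a digit, but not in [0-9].
unicode_digit = re.fullmatch(r"\d", "\u0663") is not None
ascii_only = re.fullmatch(r"\d", "\u0663", flags=re.ASCII) is not None

print(two_digits, words, dot_hits, unicode_digit, ascii_only)
```

If strict `[0-9]` behavior matters (for example when validating IDs), writing the class out explicitly or passing `re.ASCII` avoids the surprise.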
Quantifiers
Quantifiers specify how many times the preceding element can occur:
| Quantifier | Description | Example |
|---|---|---|
| `*` | Matches 0 or more times | `a*` matches "", "a", "aaa" |
| `+` | Matches 1 or more times | `a+` matches "a", "aaa", not "" |
| `?` | Matches 0 or 1 time | `a?` matches "", "a" |
| `{n}` | Matches exactly n times | `a{3}` matches "aaa" |
| `{n,}` | Matches at least n times | `a{2,}` matches "aa", "aaa", "aaaa" |
| `{n,m}` | Matches n to m times | `a{2,4}` matches "aa", "aaa", "aaaa" |
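To see where each quantifier stops accepting input, it helps to test against the whole string rather than a prefix. This is a small Python sketch; `fully_matches` is a hypothetical helper name, not part of any library.

```python
import re

def fully_matches(pattern: str, text: str) -> bool:
    """Return True only when the entire string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

cases = [
    (r"a*", ""),          # zero repetitions are allowed
    (r"a+", ""),          # at least one 'a' is required
    (r"a?", "aa"),        # at most one 'a'
    (r"a{3}", "aaa"),     # exactly three
    (r"a{2,}", "a"),      # two or more
    (r"a{2,4}", "aaaaa"), # between two and four
]
for pattern, text in cases:
    print(f"{pattern!r} vs {text!r}: {fully_matches(pattern, text)}")
```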
Anchors
Anchors match positions rather than characters:
| Anchor | Description | Example |
|---|---|---|
| `^` | Matches start of string | `^hello` matches strings starting with "hello" |
| `$` | Matches end of string | `world$` matches strings ending with "world" |
| `\b` | Matches word boundary | `\bcat\b` matches "cat" but not "category" |
| `\B` | Matches non-word boundary | `\Bcat` matches "cat" in "concat" but not in "category" |
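Because anchors match positions, the easiest way to get a feel for them is to look at where matches start. A short Python sketch with made-up sample text:

```python
import re

text = "cat category concat"  # illustrative input

starts_with = re.search(r"^hello", "hello world") is not None
ends_with = re.search(r"world$", "hello world") is not None

# \b requires a word boundary on both sides, so only the standalone word matches.
whole_words = re.findall(r"\bcat\b", text)

# \B requires that "cat" is NOT at a word boundary, so only the "cat" inside "concat" matches.
inner_positions = [m.start() for m in re.finditer(r"\Bcat", text)]

print(starts_with, ends_with, whole_words, inner_positions)
```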
Groups and Capturing
Groups allow you to treat multiple characters as a single unit:
| Syntax | Description | Example |
|---|---|---|
| `(abc)` | Capturing group, matches and remembers "abc" | `(ab)+` matches "abab" |
| `(?:abc)` | Non-capturing group, matches but doesn't remember | `(?:ab)+` matches "abab" |
| `\1`, `\2` | Backreference to nth capturing group | `(a)(b)\1\2` matches "abab" |
| `(?<name>abc)` | Named capturing group | `(?<year>\d{4})` |
| `(a\|b)` | Alternation, matches a or b | `(cat\|dog)` matches "cat" or "dog" |
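The difference between capturing and non-capturing groups shows up in what the API returns. The sketch below uses Python, where named groups are spelled `(?P<name>...)`; the sample strings are illustrative.

```python
import re

# With one capturing group, findall returns the group (its last repetition),
# not the whole match. A non-capturing group returns the whole match.
captured = re.findall(r"(ab)+", "abab ab")
uncaptured = re.findall(r"(?:ab)+", "abab ab")

# Backreference: \1 must repeat exactly what group 1 matched (a doubled word).
doubled = re.search(r"\b(\w+)\s+\1\b", "this is is a test")

# Named groups can be read back as a dictionary.
date = re.fullmatch(r"(?P<year>\d{4})-(?P<month>\d{2})", "2026-01")

print(captured, uncaptured, doubled.group(0), date.groupdict())
```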
Character Classes
Character classes define a set of characters that can match:
| Syntax | Description | Example |
|---|---|---|
| `[abc]` | Matches any one of a, b, or c | `[aeiou]` matches vowels |
| `[^abc]` | Matches any character except a, b, c | `[^0-9]` matches non-digits |
| `[a-z]` | Matches any character from a to z | `[A-Za-z]` matches any letter |
| `[0-9]` | Matches any digit from 0 to 9 | Equivalent to `\d` |
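Character classes are often used to filter or strip characters. A brief Python sketch, with sample inputs chosen only for illustration:

```python
import re

vowels = re.findall(r"[aeiou]", "regular")            # each vowel, in order
digits_only = re.sub(r"[^0-9]", "", "(555) 010-9999") # drop everything that is not a digit

def is_hex(text: str) -> bool:
    """Hypothetical helper: True when text consists only of hex digits."""
    return re.fullmatch(r"[0-9A-Fa-f]+", text) is not None

print(vowels, digits_only, is_hex("1aF3"), is_hex("1aG3"))
```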
Advanced Features
Lookaround Assertions
Lookaround assertions match positions without consuming characters:
| Syntax | Name | Description |
|---|---|---|
| `(?=pattern)` | Positive Lookahead | Matches position followed by pattern |
| `(?!pattern)` | Negative Lookahead | Matches position not followed by pattern |
| `(?<=pattern)` | Positive Lookbehind | Matches position preceded by pattern |
| `(?<!pattern)` | Negative Lookbehind | Matches position not preceded by pattern |
Examples:
# Positive lookahead: Match digits followed by "USD"
\d+(?=USD)
Input: "100USD" → Matches "100"
# Negative lookahead: Match "foo" not followed by "bar"
foo(?!bar)
Input: "foobaz" → Matches "foo"
Input: "foobar" → No match
# Positive lookbehind: Match digits preceded by "$"
(?<=\$)\d+
Input: "$100" → Matches "100"
# Negative lookbehind: Match "happy" not preceded by "un"
(?<!un)happy
Input: "happy" → Matches
Input: "unhappy" → No match
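The four examples above can be checked in Python as follows. This is a sketch; `first_match` is a hypothetical helper. Note that Python only accepts fixed-width lookbehind patterns, which these are.

```python
import re

def first_match(pattern: str, text: str):
    """Return the first matched substring, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

amount = first_match(r"\d+(?=USD)", "100USD")      # the lookahead is not part of the match
foo_ok = first_match(r"foo(?!bar)", "foobaz")
foo_blocked = first_match(r"foo(?!bar)", "foobar")
dollars = first_match(r"(?<=\$)\d+", "$100")       # the "$" is checked but not consumed
happy_blocked = first_match(r"(?<!un)happy", "unhappy")

print(amount, foo_ok, foo_blocked, dollars, happy_blocked)
```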
Greedy vs Non-Greedy Matching
By default, quantifiers are greedy and match as many characters as possible. Adding ? after a quantifier makes it non-greedy (lazy):
| Greedy | Non-Greedy | Description |
|---|---|---|
| `*` | `*?` | Match 0 or more, as few as possible |
| `+` | `+?` | Match 1 or more, as few as possible |
| `?` | `??` | Match 0 or 1, as few as possible |
| `{n,m}` | `{n,m}?` | Match n to m, as few as possible |
Example:
Input: "<div>hello</div><div>world</div>"
Greedy: <div>.*</div>
Result: "<div>hello</div><div>world</div>" (matches entire string)
Non-greedy: <div>.*?</div>
Result: "<div>hello</div>" (matches first div only)
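The same comparison, sketched in Python with `re.findall` so that every match is listed:

```python
import re

html = "<div>hello</div><div>world</div>"

greedy = re.findall(r"<div>.*</div>", html)   # .* runs to the last </div>
lazy = re.findall(r"<div>.*?</div>", html)    # .*? stops at the nearest </div>

print(greedy)
print(lazy)
```

With the lazy quantifier, each `div` becomes its own match instead of one match spanning the whole input.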
Flags/Modifiers
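Flags modify how the regex engine matches. Availability varies by engine: for example, g belongs to JavaScript-style APIs (other languages use separate find-all functions), and JavaScript has no x flag:
| Flag | Description |
|---|---|
| `i` | Case-insensitive matching |
| `g` | Global matching (find all matches) |
| `m` | Multiline mode (^ and $ match line start/end) |
| `s` | Single-line mode (. matches newlines too) |
| `u` | Unicode mode |
| `x` | Extended mode (ignore whitespace, allow comments) |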
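As one concrete mapping, Python exposes these flags as constants on the `re` module (`re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL`) and replaces the g flag with `findall`/`finditer`. A minimal sketch with an illustrative two-line input:

```python
import re

text = "Hello\nhello world"  # illustrative two-line input

any_case = re.findall(r"hello", text, flags=re.IGNORECASE)
line_starts = re.findall(r"^hello", text, flags=re.IGNORECASE | re.MULTILINE)
string_start_only = re.findall(r"^hello", text, flags=re.IGNORECASE)

# Without DOTALL, '.' refuses to match the newline between the two words.
dot_default = re.search(r"Hello.hello", text)
dot_all = re.search(r"Hello.hello", text, flags=re.DOTALL)

print(any_case, line_starts, string_start_only, dot_default, dot_all is not None)
```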
Common Regex Patterns
Email Validation
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Breakdown:
- `^` - Start of string
- `[a-zA-Z0-9._%+-]+` - Username part with letters, digits, and special chars
- `@` - At symbol
- `[a-zA-Z0-9.-]+` - Domain part
- `\.` - Dot
- `[a-zA-Z]{2,}` - TLD, at least 2 letters
- `$` - End of string
Test Cases:
- ✓ user@example.com
- ✓ john.doe+tag@company.co.uk
- ✗ invalid@
- ✗ @nodomain.com
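The test cases above can be run as a small Python check. This sketch uses `fullmatch` rather than `match`, an assumption made here because in Python `$` also matches just before a trailing newline; `is_valid_email` is a hypothetical helper name.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def is_valid_email(address: str) -> bool:
    """Syntactic check only; it does not prove the mailbox exists."""
    return EMAIL_RE.fullmatch(address) is not None

for candidate in ["user@example.com", "john.doe+tag@company.co.uk",
                  "invalid@", "@nodomain.com", "user@example.com\n"]:
    print(repr(candidate), is_valid_email(candidate))
```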
Phone Number Validation
US Phone Number:
^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
International Phone (E.164 format):
^\+?[1-9]\d{1,14}$
Breakdown:
- `^\+?` - Optional plus sign at start
- `[1-9]` - First digit cannot be zero
- `\d{1,14}` - 1 to 14 more digits
- `$` - End of string
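Both phone patterns can be exercised with a few sample numbers. The following Python sketch uses made-up numbers and hypothetical helper names (`is_us_phone`, `is_e164`):

```python
import re

US_PHONE_RE = re.compile(r"^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$")
E164_RE = re.compile(r"^\+?[1-9]\d{1,14}$")

def is_us_phone(text: str) -> bool:
    return US_PHONE_RE.fullmatch(text) is not None

def is_e164(text: str) -> bool:
    return E164_RE.fullmatch(text) is not None

print(is_us_phone("(555) 123-4567"), is_us_phone("+1 555-123-4567"), is_us_phone("555-1234"))
print(is_e164("+14155552671"), is_e164("4155552671"), is_e164("+0123"))
```

The US pattern is permissive about separators, so it is better suited to accepting user input than to enforcing one canonical format.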
URL Validation
^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})[\/\w \.-]*\/?$
More Complete URL Validation:
^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$
Breakdown:
- `^(https?|ftp):\/\/` - Protocol part
- `[^\s/$.?#]` - First character of domain
- `.` - Any single character after it
- `[^\s]*` - Rest of URL
- `$` - End of string
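A quick way to see what the second pattern accepts is to run it against a handful of inputs. This is a Python sketch with illustrative URLs; `looks_like_url` is a hypothetical helper name.

```python
import re

URL_RE = re.compile(r"^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$")

def looks_like_url(text: str) -> bool:
    """Shape check only; it does not resolve or fetch the URL."""
    return URL_RE.fullmatch(text) is not None

for candidate in ["https://example.com/path?q=1", "ftp://files.example.org",
                  "http://", "https://exa mple.com", "example.com"]:
    print(candidate, looks_like_url(candidate))
```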
IP Address Validation
IPv4 Address:
^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
Breakdown:
- `25[0-5]` - Matches 250-255
- `2[0-4]\d` - Matches 200-249
- `[01]?\d\d?` - Matches 0-199
- `\.` - Dot separator
- `{3}` - First three octets
- Final group - Last octet without trailing dot
IPv6 Address (full eight-group form only; compressed forms such as ::1 are not matched):
^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$
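The sketch below runs both patterns in Python and, as a cross-check that is not part of the original patterns, compares them with the standard library's `ipaddress` module, which also understands compressed IPv6 notation. `is_ip` is a hypothetical helper name.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$")
IPV6_FULL_RE = re.compile(r"^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$")

def is_ip(text: str) -> bool:
    """Parser-based alternative: accepts any valid IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False

full_v6 = "2001:0db8:85a3:0000:0000:8a2e:0370:7334"
print(bool(IPV4_RE.match("192.168.1.1")), bool(IPV4_RE.match("256.1.1.1")))
print(bool(IPV6_FULL_RE.match(full_v6)), bool(IPV6_FULL_RE.match("::1")), is_ip("::1"))
```

When the goal is validation rather than extraction from free text, a dedicated parser like this is usually easier to maintain than extending the regex.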
Password Strength Validation
At least 8 characters with uppercase, lowercase, and digit:
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d]{8,}$
Stronger Password (with special characters):
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Breakdown:
- `(?=.*[a-z])` - At least one lowercase letter
- `(?=.*[A-Z])` - At least one uppercase letter
- `(?=.*\d)` - At least one digit
- `(?=.*[@$!%*?&])` - At least one special character
- `{8,}` - At least 8 characters
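One detail worth seeing in action: besides the lookahead requirements, the final character class also restricts which characters are allowed at all. A Python sketch with made-up passwords:

```python
import re

BASIC_RE = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d]{8,}$")
STRONG_RE = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$")

def passes(pattern: re.Pattern, password: str) -> bool:
    return pattern.fullmatch(password) is not None

print(passes(BASIC_RE, "Passw0rd"))     # meets every basic requirement
print(passes(BASIC_RE, "Passw0rd!"))    # '!' is outside [a-zA-Z\d]
print(passes(STRONG_RE, "Passw0rd!"))   # '!' is in the allowed special set
print(passes(STRONG_RE, "Passw0rd#"))   # '#' is not in the allowed special set
print(passes(STRONG_RE, "password1!"))  # no uppercase letter
```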
Credit Card Validation
Visa:
^4[0-9]{12}(?:[0-9]{3})?$
Mastercard (classic 51-55 prefixes; the newer 2221-2720 range is not covered):
^5[1-5][0-9]{14}$
General Credit Card (Luhn algorithm validation needed separately):
^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})$
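Since the pattern only checks prefix and length, the Luhn checksum has to be computed separately. The following Python sketch combines the two; `luhn_ok` and `is_plausible_card` are hypothetical helper names, and the numbers used are widely published test numbers, not real cards.

```python
import re

CARD_RE = re.compile(
    r"^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})$"
)

def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    checksum = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # every second digit, counting from the right
            d *= 2
            if d > 9:       # same as adding the two digits of the product
                d -= 9
        checksum += d
    return checksum % 10 == 0

def is_plausible_card(raw: str) -> bool:
    number = re.sub(r"[ -]", "", raw)  # tolerate spaces and dashes in user input
    return CARD_RE.fullmatch(number) is not None and luhn_ok(number)

print(is_plausible_card("4111 1111 1111 1111"), is_plausible_card("4111111111111112"))
```

Passing both checks only means the number is well-formed; it says nothing about whether the card exists or is active.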
Code Examples
JavaScript
// Basic matching
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
const email = "user@example.com";
console.log(emailRegex.test(email)); // true
// Using match to extract matches
const text = "Contact: 123-456-7890 or 098-765-4321";
const phoneRegex = /\d{3}-\d{3}-\d{4}/g;
const phones = text.match(phoneRegex);
console.log(phones); // ["123-456-7890", "098-765-4321"]
// Using capturing groups
const urlRegex = /^(https?):\/\/([^\/]+)(\/.*)?$/;
const url = "https://example.com/path/to/page";
const match = url.match(urlRegex);
if (match) {
  console.log("Protocol:", match[1]); // "https"
  console.log("Domain:", match[2]); // "example.com"
  console.log("Path:", match[3]); // "/path/to/page"
}
// Using named capturing groups
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = "2026-01-12".match(dateRegex);
console.log(dateMatch.groups.year); // "2026"
console.log(dateMatch.groups.month); // "01"
console.log(dateMatch.groups.day); // "12"
// Replacement operations
const masked = "1234567890".replace(/(\d{3})\d{4}(\d{3})/, "$1****$2");
console.log(masked); // "123****890"
// Using exec for iterative matching
const regex = /\d+/g;
const str = "Price: $100, Quantity: 50";
let result;
while ((result = regex.exec(str)) !== null) {
  console.log(`Found ${result[0]} at position ${result.index}`);
}
// Output:
// Found 100 at position 8
// Found 50 at position 23
Python
import re
# Basic matching
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
email = "user@example.com"
if re.match(email_pattern, email):
    print("Email is valid")
# Find all matches
text = "Contact: 123-456-7890 or 098-765-4321"
phones = re.findall(r'\d{3}-\d{3}-\d{4}', text)
print(phones) # ['123-456-7890', '098-765-4321']
# Using capturing groups
url_pattern = r'^(https?):\/\/([^\/]+)(\/.*)?$'
url = "https://example.com/path/to/page"
match = re.match(url_pattern, url)
if match:
    print(f"Protocol: {match.group(1)}") # https
    print(f"Domain: {match.group(2)}") # example.com
    print(f"Path: {match.group(3)}") # /path/to/page
# Using named capturing groups
date_pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
date_match = re.match(date_pattern, "2026-01-12")
if date_match:
    print(date_match.group('year')) # 2026
    print(date_match.group('month')) # 01
    print(date_match.group('day')) # 12
# Replacement operations
phone = "1234567890"
masked = re.sub(r'(\d{3})\d{4}(\d{3})', r'\1****\2', phone)
print(masked) # 123****890
# Compile regex for better performance
pattern = re.compile(r'\d+')
numbers = pattern.findall("Price: $100, Quantity: 50")
print(numbers) # ['100', '50']
# Using finditer to get match objects
for match in re.finditer(r'\d+', "Price: $100, Quantity: 50"):
    print(f"Found {match.group()} at position {match.start()}-{match.end()}")
Java
import java.util.regex.*;
import java.util.ArrayList;
import java.util.List;
public class RegexExample {
    public static void main(String[] args) {
        // Basic matching
        String emailPattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$";
        String email = "user@example.com";
        boolean isValid = email.matches(emailPattern);
        System.out.println("Email valid: " + isValid);

        // Find all matches
        String text = "Contact: 123-456-7890 or 098-765-4321";
        Pattern phonePattern = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");
        Matcher matcher = phonePattern.matcher(text);
        List<String> phones = new ArrayList<>();
        while (matcher.find()) {
            phones.add(matcher.group());
        }
        System.out.println(phones); // [123-456-7890, 098-765-4321]

        // Using capturing groups
        String urlPattern = "^(https?)://([^/]+)(/.*)?$";
        String url = "https://example.com/path/to/page";
        Pattern pattern = Pattern.compile(urlPattern);
        Matcher urlMatcher = pattern.matcher(url);
        if (urlMatcher.matches()) {
            System.out.println("Protocol: " + urlMatcher.group(1)); // https
            System.out.println("Domain: " + urlMatcher.group(2)); // example.com
            System.out.println("Path: " + urlMatcher.group(3)); // /path/to/page
        }

        // Using named capturing groups (Java 7+)
        String datePattern = "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})";
        Pattern dateRegex = Pattern.compile(datePattern);
        Matcher dateMatcher = dateRegex.matcher("2026-01-12");
        if (dateMatcher.matches()) {
            System.out.println("Year: " + dateMatcher.group("year"));
            System.out.println("Month: " + dateMatcher.group("month"));
            System.out.println("Day: " + dateMatcher.group("day"));
        }

        // Replacement operations
        String phone = "1234567890";
        String masked = phone.replaceAll("(\\d{3})\\d{4}(\\d{3})", "$1****$2");
        System.out.println(masked); // 123****890
    }
}
Go
package main
import (
    "fmt"
    "regexp"
)

func main() {
    // Basic matching
    emailPattern := `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`
    emailRegex := regexp.MustCompile(emailPattern)
    email := "user@example.com"
    fmt.Println("Email valid:", emailRegex.MatchString(email))

    // Find all matches
    text := "Contact: 123-456-7890 or 098-765-4321"
    phoneRegex := regexp.MustCompile(`\d{3}-\d{3}-\d{4}`)
    phones := phoneRegex.FindAllString(text, -1)
    fmt.Println(phones) // [123-456-7890 098-765-4321]

    // Using capturing groups
    urlPattern := `^(https?)://([^/]+)(/.*)?$`
    urlRegex := regexp.MustCompile(urlPattern)
    url := "https://example.com/path/to/page"
    matches := urlRegex.FindStringSubmatch(url)
    if len(matches) > 0 {
        fmt.Println("Protocol:", matches[1]) // https
        fmt.Println("Domain:", matches[2])   // example.com
        fmt.Println("Path:", matches[3])     // /path/to/page
    }

    // Using named capturing groups
    datePattern := `(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`
    dateRegex := regexp.MustCompile(datePattern)
    dateMatch := dateRegex.FindStringSubmatch("2026-01-12")
    names := dateRegex.SubexpNames()
    for i, name := range names {
        if name != "" && i < len(dateMatch) {
            fmt.Printf("%s: %s\n", name, dateMatch[i])
        }
    }

    // Replacement operations
    phone := "1234567890"
    replaceRegex := regexp.MustCompile(`(\d{3})\d{4}(\d{3})`)
    masked := replaceRegex.ReplaceAllString(phone, "$1****$2")
    fmt.Println(masked) // 123****890

    // Using ReplaceAllStringFunc for complex replacements
    text2 := "Price $100"
    numRegex := regexp.MustCompile(`\d+`)
    result := numRegex.ReplaceAllStringFunc(text2, func(s string) string {
        return "[" + s + "]"
    })
    fmt.Println(result) // Price $[100]
}
Regex Best Practices
1. Keep It Simple
Complex regex is hard to maintain and debug. If possible, split complex patterns into multiple simple ones:
// Not recommended: One complex regex
const complexRegex = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;
// Recommended: Multiple simple checks
function validatePassword(password) {
  if (password.length < 8) return false;
  if (!/[a-z]/.test(password)) return false;
  if (!/[A-Z]/.test(password)) return false;
  if (!/\d/.test(password)) return false;
  if (!/[@$!%*?&]/.test(password)) return false;
  return true;
}
2. Use Non-Capturing Groups
If you don't need to capture the match, use non-capturing groups (?:...). They skip storing the submatch, which can be slightly faster and keeps group numbering uncluttered:
// Capturing group (saves match result)
/(cat|dog) food/
// Non-capturing group (doesn't save, more efficient)
/(?:cat|dog) food/
3. Avoid Catastrophic Backtracking
Certain regex patterns can cause exponential backtracking and severe performance issues:
// Dangerous: Can cause catastrophic backtracking
/(a+)+$/
// Safe: Use atomic groups or more precise patterns
/a+$/
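Before swapping a nested pattern for a simpler one, it is worth confirming that both accept the same inputs. The Python sketch below does that on a few short strings and then times both patterns on inputs that cannot match; the input sizes are kept small on purpose, and actual timings depend on the machine and engine.

```python
import re
import time

nested = re.compile(r"(a+)+$")  # nested quantifier from the warning above
flat = re.compile(r"a+$")       # simplified pattern

# Both patterns should agree on whether a match exists.
samples = ["a", "aaaa", "aaaa!", "baaa", ""]
agree = all(bool(nested.search(s)) == bool(flat.search(s)) for s in samples)

def seconds_to_reject(pattern: re.Pattern, n: int) -> float:
    """Time a failing search on n 'a' characters followed by '!'."""
    text = "a" * n + "!"
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start

print("same results:", agree)
for n in (10, 14, 18):
    print(n, seconds_to_reject(nested, n), seconds_to_reject(flat, n))
```

In a backtracking engine the nested pattern's time roughly doubles with each extra character, which is why untrusted input combined with such patterns is a denial-of-service risk.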
4. Pre-compile Regular Expressions
When using regex in loops, pre-compile for better performance:
import re
# Less efficient: re.match looks the pattern up in its internal cache on every iteration
for line in lines:
    if re.match(r'\d+', line):
        process(line)
# Recommended: Pre-compile
pattern = re.compile(r'\d+')
for line in lines:
    if pattern.match(line):
        process(line)
5. Use Anchors
When you know the match position, use anchors to improve performance:
// Not recommended: Searches entire string
/hello/
// Recommended: If you know it's at the start
/^hello/
6. Test Edge Cases
Always test various edge cases before production use:
- Empty strings
- Very long strings
- Special characters
- Unicode characters
- Newline characters
FAQ
What's the difference between regex and wildcards?
Wildcards (like * and ?) are simplified pattern matching, mainly used for filename matching. Regular expressions are more powerful, supporting complex pattern descriptions, capturing groups, assertions, and other advanced features.
| Feature | Wildcards | Regular Expressions |
|---|---|---|
| `*` | Matches any characters | Matches preceding char 0+ times |
| `?` | Matches single character | Matches preceding char 0 or 1 time |
| Complexity | Simple | Powerful but complex |
| Use Case | Filename matching | Text processing, validation |
How do I debug complex regular expressions?
- Use online tools: Like our Regex Tester to see matches in real-time
- Build incrementally: Start with simple patterns and add complexity gradually
- Add comments: Use extended mode (x flag) to add comments
- Use visualization tools: Convert regex to visual diagrams
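As an illustration of the comment-based approach, here is a date pattern written in Python's extended mode (`re.VERBOSE`), where whitespace in the pattern is ignored and `#` starts a comment. The pattern and group names are chosen for this sketch.

```python
import re

DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -                  # literal separator
    (?P<month>\d{2})   # two-digit month
    -                  # literal separator
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,
)

parts = DATE_RE.fullmatch("2026-01-12").groupdict()
print(parts)
```

Because whitespace is ignored in this mode, a literal space in the pattern has to be written as `\ ` or `[ ]`.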
How can I optimize regex performance?
- Use anchors to limit search scope
- Avoid unnecessary capturing groups
- Use non-greedy matching when appropriate
- Pre-compile regular expressions
- Avoid nested quantifiers (like `(a+)+`)
- Use more specific character classes
What are the differences between regex in different programming languages?
Most programming languages use similar, PCRE-style regex syntax, but there are subtle differences. Go's regexp package, for example, follows RE2 syntax, which leaves out lookarounds and backreferences in exchange for linear-time matching:
| Feature | JavaScript | Python | Java | Go |
|---|---|---|---|---|
| Lookbehind | ✓ (ES2018+) | ✓ (fixed width) | ✓ | ✗ |
| Named Groups | `(?<name>)` | `(?P<name>)` | `(?<name>)` | `(?P<name>)` |
| Unicode | Needs u flag | Default | Default | Default |
| Atomic Groups | ✗ | ✓ (3.11+) | ✓ | ✗ |
How can I test regex without writing code?
You can use online tools like our free Regex Tester to:
- Test regular expressions in real-time
- View matches and capturing groups
- Get code examples in multiple programming languages
- Save and share your regular expressions
Summary
Regular expressions are a core skill every developer should master. While the learning curve may be steep, once mastered, they will greatly improve your text processing efficiency.
Quick Summary:
- Start with basic syntax: character matching, quantifiers, anchors
- Master groups and capturing groups
- Learn advanced features like lookaround assertions
- Memorize common patterns (email, phone, URL, etc.)
- Follow performance optimization best practices
- Practice and test frequently