Regular expressions (regex) are a powerful text pattern matching tool supported by virtually all modern programming languages. Whether the task is data validation, text search and replace, or log analysis, regular expressions are an essential skill for developers. This guide takes you from the basics to advanced usage.

Key Takeaways

  • Pattern Matching: Regular expressions are a language for describing string patterns, used for searching, matching, and manipulating text.
  • Universal Support: Almost all programming languages support regular expressions with similar syntax.
  • Powerful Capabilities: Complex text patterns can be described with concise expressions.
  • Performance Considerations: Complex regex can cause performance issues; design carefully.
  • Readability: Regex can be hard to read; consider adding comments or splitting complex patterns.
  • Testing is Critical: Always test regex thoroughly before using it in production.

Want to quickly test your regular expressions? Try our free online tool with real-time matching and support for multiple programming language syntaxes.

Test Your Regex Now - Free Online Regex Tester

What is a Regular Expression?

A regular expression (regex) is a pattern, written in a formal notation, that describes a set of strings. The concept originated in mathematical theory in the 1950s, when mathematician Stephen Cole Kleene first proposed it. Today, regular expressions are a standard tool for text processing, widely used for:

  1. Data Validation: Validating user input (email, phone, password, etc.)
  2. Text Search: Finding specific patterns in large amounts of text
  3. Text Replacement: Batch modifying text that matches patterns
  4. Data Extraction: Extracting structured information from text
  5. Log Analysis: Parsing and analyzing log files

The core idea of regular expressions is to use special characters and rules to describe a class of strings, rather than a specific string.

Basic Regex Syntax

Character Matching

The most basic regular expressions are literal characters that match themselves. Metacharacters and shorthand classes match whole categories of characters:

Pattern | Description | Example
abc | Matches literal string "abc" | "abc" ✓, "abcd" ✓
. | Matches any single character (except newline) | "a.c" matches "abc", "a1c"
\d | Matches any digit [0-9] | "\d\d" matches "42"
\D | Matches any non-digit | "\D" matches "a"
\w | Matches word character [a-zA-Z0-9_] | "\w+" matches "hello_123"
\W | Matches non-word character | "\W" matches "@"
\s | Matches whitespace (space, tab, etc.) | "a\sb" matches "a b"
\S | Matches non-whitespace | "\S+" matches "hello"
\\ | Matches backslash itself | "\\" matches "\"
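
To see these classes in action, here is a minimal Python sketch using the standard re module. The sample string is an assumption chosen purely for illustration.

```python
import re

# Hypothetical sample text that contains digits, word characters, and a symbol.
sample = "a1c hello_123 @"

digits = re.findall(r"\d", sample)            # every single digit
words = re.findall(r"\w+", sample)            # runs of word characters
symbols = re.findall(r"[^\w\s]", sample)      # neither word character nor whitespace
dot_match = re.fullmatch(r"a.c", "a1c") is not None  # "." stands in for "1"

print(digits, words, symbols, dot_match)
```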

Quantifiers

Quantifiers specify how many times the preceding element can occur:

Quantifier | Description | Example
* | Matches 0 or more times | a* matches "", "a", "aaa"
+ | Matches 1 or more times | a+ matches "a", "aaa", not ""
? | Matches 0 or 1 time | a? matches "", "a"
{n} | Matches exactly n times | a{3} matches "aaa"
{n,} | Matches at least n times | a{2,} matches "aa", "aaa", "aaaa"
{n,m} | Matches n to m times | a{2,4} matches "aa", "aaa", "aaaa"
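
The quantifier rules can be checked with a short Python sketch. The helper name matches_whole and the negative test strings are assumptions added for illustration; re.fullmatch is used so each pattern must cover the entire text.

```python
import re

# (pattern, text, expected) triples based on the quantifier table.
cases = [
    ("a*", "", True),
    ("a+", "", False),
    ("a?", "aa", False),
    ("a{3}", "aaa", True),
    ("a{2,}", "a", False),
    ("a{2,4}", "aaaaa", False),
]

def matches_whole(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

for pattern, text, expected in cases:
    print(f"{pattern!r} on {text!r}: {matches_whole(pattern, text)} (expected {expected})")
```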

Anchors

Anchors match positions rather than characters:

Anchor | Description | Example
^ | Matches start of string | ^hello matches strings starting with "hello"
$ | Matches end of string | world$ matches strings ending with "world"
\b | Matches word boundary | \bcat\b matches "cat" but not "category"
\B | Matches non-word boundary | \Bcat matches "cat" in "bobcat" but not in "category"
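
As a rough illustration of how anchors constrain position, the following Python sketch uses a made-up sentence. Note that \Bcat only matches when "cat" is preceded by another word character.

```python
import re

text = "cat category concat"  # hypothetical sample text

starts_with_cat = re.search(r"^cat", text) is not None
ends_with_cat = re.search(r"cat$", text) is not None

# \bcat\b requires a boundary on both sides, so only the standalone word matches.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat requires a word character before "c", so only the "cat" inside "concat" matches.
inner = [m.start() for m in re.finditer(r"\Bcat", text)]

print(starts_with_cat, ends_with_cat, whole_words, inner)
```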

Groups and Capturing

Groups allow you to treat multiple characters as a single unit:

Syntax | Description | Example
(abc) | Capturing group, matches and remembers "abc" | (ab)+ matches "abab"
(?:abc) | Non-capturing group, matches but doesn't remember | (?:ab)+ matches "abab"
\1, \2 | Backreference to nth capturing group | (a)(b)\1\2 matches "abab"
(?<name>abc) | Named capturing group | (?<year>\d{4})
(a|b) | Alternation, matches a or b | (cat|dog) matches "cat" or "dog"
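
The group constructs can be tried out with a small Python sketch. Python spells named groups as (?P<name>...), and the input strings here are assumptions for illustration.

```python
import re

# Backreferences: \1 and \2 repeat whatever groups 1 and 2 captured.
backref = re.fullmatch(r"(a)(b)\1\2", "abab")
captured = backref.groups()

# A non-capturing group repeats "ab" without storing anything.
non_capturing = re.fullmatch(r"(?:ab)+", "abab").groups()

# Named group, using Python's (?P<name>...) spelling.
year = re.search(r"(?P<year>\d{4})", "Released in 2026").group("year")

# Alternation picks either branch.
pets = re.findall(r"cat|dog", "cat and dog")

print(captured, non_capturing, year, pets)
```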

Character Classes

Character classes define a set of characters that can match:

Syntax | Description | Example
[abc] | Matches any one of a, b, or c | [aeiou] matches vowels
[^abc] | Matches any character except a, b, c | [^0-9] matches non-digits
[a-z] | Matches any character from a to z | [A-Za-z] matches any letter
[0-9] | Matches any digit from 0 to 9 | Equivalent to \d

Advanced Features

Lookaround Assertions

Lookaround assertions match positions without consuming characters:

Syntax | Name | Description
(?=pattern) | Positive Lookahead | Matches position followed by pattern
(?!pattern) | Negative Lookahead | Matches position not followed by pattern
(?<=pattern) | Positive Lookbehind | Matches position preceded by pattern
(?<!pattern) | Negative Lookbehind | Matches position not preceded by pattern

Examples:

# Positive lookahead: Match digits followed by "USD"
\d+(?=USD)
Input: "100USD" → Matches "100"

# Negative lookahead: Match "foo" not followed by "bar"
foo(?!bar)
Input: "foobaz" → Matches "foo"
Input: "foobar" → No match

# Positive lookbehind: Match digits preceded by "$"
(?<=\$)\d+
Input: "$100" → Matches "100"

# Negative lookbehind: Match "happy" not preceded by "un"
(?<!un)happy
Input: "happy" → Matches
Input: "unhappy" → No match
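
The four examples above can be reproduced with Python's re module, as in this minimal sketch. Python's lookbehind requires a fixed-width pattern, which both (?<=\$) and (?<!un) satisfy.

```python
import re

lookahead = re.search(r"\d+(?=USD)", "100USD").group()       # digits before "USD"
neg_lookahead_ok = re.search(r"foo(?!bar)", "foobaz")        # matches "foo"
neg_lookahead_fail = re.search(r"foo(?!bar)", "foobar")      # None
lookbehind = re.search(r"(?<=\$)\d+", "$100").group()        # digits after "$"
neg_lookbehind_ok = re.search(r"(?<!un)happy", "happy")      # matches
neg_lookbehind_fail = re.search(r"(?<!un)happy", "unhappy")  # None

print(lookahead, lookbehind)
print(neg_lookahead_ok is not None, neg_lookahead_fail is None)
print(neg_lookbehind_ok is not None, neg_lookbehind_fail is None)
```

In each case the assertion itself is not part of the match: "USD" and "$" are checked but not returned.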

Greedy vs Non-Greedy Matching

By default, quantifiers are greedy and match as many characters as possible. Adding ? after a quantifier makes it non-greedy (lazy):

Greedy | Non-Greedy | Description
* | *? | Match 0 or more, as few as possible
+ | +? | Match 1 or more, as few as possible
? | ?? | Match 0 or 1, as few as possible
{n,m} | {n,m}? | Match n to m, as few as possible

Example:

Input: "<div>hello</div><div>world</div>"

Greedy: <div>.*</div>
Result: "<div>hello</div><div>world</div>" (matches entire string)

Non-greedy: <div>.*?</div>
Result: "<div>hello</div>" (matches first div only)
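
The same comparison, sketched in Python under the assumption that the input is the string shown above. re.search returns the first match, while re.findall shows that the lazy pattern finds each div separately.

```python
import re

html = "<div>hello</div><div>world</div>"

# Greedy: .* runs to the last </div> in the string.
greedy_first = re.search(r"<div>.*</div>", html).group()

# Lazy: .*? stops at the first </div> it can reach.
lazy_first = re.search(r"<div>.*?</div>", html).group()
lazy_all = re.findall(r"<div>.*?</div>", html)

print(greedy_first)
print(lazy_first)
print(lazy_all)
```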

Flags/Modifiers

Flags modify how the regex engine matches. Availability and spelling vary by engine:

Flag | Description
i | Case-insensitive matching
g | Global matching (find all matches)
m | Multiline mode (^ and $ match line start/end)
s | Single-line (dotall) mode (. matches newlines too)
u | Unicode mode
x | Extended mode (ignore whitespace, allow comments; not available in JavaScript)
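
In Python these flags are spelled as constants on the re module, and there is no g flag because re.findall and re.finditer already return every match. A minimal sketch, with a made-up two-line string:

```python
import re

text = "Hello\nworld"  # hypothetical two-line input

case_insensitive = re.findall(r"hello", text, re.IGNORECASE)  # i flag
line_starts = re.findall(r"^\w+", text, re.MULTILINE)         # m flag
without_dotall = re.search(r"Hello.world", text)              # "." stops at the newline
with_dotall = re.search(r"Hello.world", text, re.DOTALL)      # s flag
every_o = re.findall(r"o", text)                              # all matches, no g flag needed

print(case_insensitive, line_starts, every_o)
print(without_dotall is None, with_dotall is not None)
```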

Common Regex Patterns

Email Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Breakdown:

  • ^ - Start of string
  • [a-zA-Z0-9._%+-]+ - Username part with letters, digits, and special chars
  • @ - At symbol
  • [a-zA-Z0-9.-]+ - Domain part
  • \. - Dot
  • [a-zA-Z]{2,} - TLD, at least 2 letters
  • $ - End of string

Test Cases:

  • user@example.com ✓
  • john.doe+tag@company.co.uk ✓
  • invalid@ ✗
  • @nodomain.com ✗
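
The test cases can be run against the pattern with a short Python sketch. The helper name is_valid_email is an assumption. It uses fullmatch because in Python "$" also matches just before a trailing newline, which plain re.match would let through.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def is_valid_email(address: str) -> bool:
    """Check the address against the pattern, rejecting trailing newlines."""
    return EMAIL_RE.fullmatch(address) is not None

test_cases = [
    "user@example.com",
    "john.doe+tag@company.co.uk",
    "invalid@",
    "@nodomain.com",
]
results = {address: is_valid_email(address) for address in test_cases}

for address, ok in results.items():
    print(f"{address!r}: {'valid' if ok else 'invalid'}")
```

A pattern like this only checks the shape of the address. It does not confirm that the mailbox exists.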

Phone Number Validation

US Phone Number:

^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

International Phone (E.164 format):

^\+?[1-9]\d{1,14}$

Breakdown:

  • ^\+? - Optional plus sign at start
  • [1-9] - First digit cannot be zero
  • \d{1,14} - 1 to 14 more digits
  • $ - End of string
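
A minimal Python sketch of the E.164 check follows. The function name is_e164 and the sample numbers are assumptions for illustration.

```python
import re

# Same pattern as above; fullmatch makes the ^ and $ anchors unnecessary.
E164_RE = re.compile(r"\+?[1-9]\d{1,14}")

def is_e164(number: str) -> bool:
    """Return True for an optional '+' followed by 2 to 15 digits, first digit non-zero."""
    return E164_RE.fullmatch(number) is not None

print(is_e164("+14155552671"))       # 11 digits with a leading plus
print(is_e164("+0123456789"))        # starts with zero
print(is_e164("+1234567890123456"))  # 16 digits, one too many
```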

URL Validation

^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)\/?$

Simpler, More Permissive URL Validation (protocol required):

^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$

Breakdown:

  • ^(https?|ftp):\/\/ - Protocol part
  • [^\s/$.?#] - First character of the host (not whitespace, /, $, ., ?, or #)
  • . - Any single character
  • [^\s]* - Rest of URL (no whitespace)
  • $ - End of string

IP Address Validation

IPv4 Address:

^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$

Breakdown:

  • 25[0-5] - Matches 250-255
  • 2[0-4]\d - Matches 200-249
  • [01]?\d\d? - Matches 0-199 (leading zeros such as "01" are also accepted)
  • \. - Dot separator
  • {3} - First three octets
  • Last octet without trailing dot
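
The IPv4 pattern can be exercised with a short Python sketch. The helper name is_ipv4 and the sample addresses are assumptions. The last call shows that octets with leading zeros pass, which may or may not be acceptable for a given application.

```python
import re

OCTET = r"(25[0-5]|2[0-4]\d|[01]?\d\d?)"
IPV4_RE = re.compile(rf"({OCTET}\.){{3}}{OCTET}")

def is_ipv4(text: str) -> bool:
    """Return True when text is four dot-separated octets in the range 0-255."""
    return IPV4_RE.fullmatch(text) is not None

print(is_ipv4("192.168.1.1"))
print(is_ipv4("256.1.1.1"))     # first octet out of range
print(is_ipv4("1.2.3"))         # only three octets
print(is_ipv4("01.02.03.004"))  # leading zeros are accepted by this pattern
```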

IPv6 Address (full eight-group form only; the "::" shorthand is not matched):

^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$

Password Strength Validation

At least 8 characters, letters and digits only, including at least one uppercase letter, one lowercase letter, and one digit:

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d]{8,}$

Stronger Password (with special characters):

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Breakdown:

  • (?=.*[a-z]) - At least one lowercase letter
  • (?=.*[A-Z]) - At least one uppercase letter
  • (?=.*\d) - At least one digit
  • (?=.*[@$!%*?&]) - At least one special character
  • {8,} - At least 8 characters
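
As a sketch of how the lookaheads combine, the Python below runs the stronger pattern and, separately, reports which rule failed. A single regex cannot report that on its own. The names RULES and failed_rules and the sample passwords are assumptions for illustration.

```python
import re

STRONG_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

# One simple pattern per requirement, mirroring the lookaheads above.
RULES = {
    "lowercase letter": r"[a-z]",
    "uppercase letter": r"[A-Z]",
    "digit": r"\d",
    "special character": r"[@$!%*?&]",
}

def failed_rules(password: str) -> list:
    """List the requirements the password does not meet."""
    failed = [name for name, pattern in RULES.items() if not re.search(pattern, password)]
    if len(password) < 8:
        failed.append("at least 8 characters")
    return failed

print(STRONG_RE.fullmatch("Passw0rd!") is not None)
print(failed_rules("password"))
```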

Credit Card Validation

Visa:

^4[0-9]{12}(?:[0-9]{3})?$

Mastercard (51-55 prefix range only; the newer 2221-2720 range is not covered):

^5[1-5][0-9]{14}$

General Credit Card (Visa, Mastercard, American Express, Discover; Luhn algorithm validation needed separately):

^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})$
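
Since the pattern only checks prefix and length, a Luhn checksum is usually run afterwards. The following is a minimal sketch of that second step. The function names luhn_valid and is_plausible_card are assumptions, and 4111111111111111 is a widely used Visa test number, not a real card.

```python
import re

CARD_RE = re.compile(
    r"(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})"
)

def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum the digits."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0

def is_plausible_card(number: str) -> bool:
    """Pattern check for brand and length, then the checksum."""
    return CARD_RE.fullmatch(number) is not None and luhn_valid(number)

print(is_plausible_card("4111111111111111"))
print(is_plausible_card("4111111111111112"))
```

Passing both checks only means the number is well-formed. It says nothing about whether the card exists or is active.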

Code Examples

JavaScript

// Basic matching
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
const email = "user@example.com";
console.log(emailRegex.test(email)); // true

// Using match to extract matches
const text = "Contact: 123-456-7890 or 098-765-4321";
const phoneRegex = /\d{3}-\d{3}-\d{4}/g;
const phones = text.match(phoneRegex);
console.log(phones); // ["123-456-7890", "098-765-4321"]

// Using capturing groups
const urlRegex = /^(https?):\/\/([^\/]+)(\/.*)?$/;
const url = "https://example.com/path/to/page";
const match = url.match(urlRegex);
if (match) {
  console.log("Protocol:", match[1]); // "https"
  console.log("Domain:", match[2]);   // "example.com"
  console.log("Path:", match[3]);     // "/path/to/page"
}

// Using named capturing groups
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = "2026-01-12".match(dateRegex);
console.log(dateMatch.groups.year);  // "2026"
console.log(dateMatch.groups.month); // "01"
console.log(dateMatch.groups.day);   // "12"

// Replacement operations
const masked = "1234567890".replace(/(\d{3})\d{4}(\d{3})/, "$1****$2");
console.log(masked); // "123****890"

// Using exec for iterative matching
const regex = /\d+/g;
const str = "Price: $100, Quantity: 50";
let result;
while ((result = regex.exec(str)) !== null) {
  console.log(`Found ${result[0]} at position ${result.index}`);
}
// Output:
// Found 100 at position 8
// Found 50 at position 23

Python

import re

# Basic matching
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
email = "user@example.com"
if re.match(email_pattern, email):
    print("Email is valid")

# Find all matches
text = "Contact: 123-456-7890 or 098-765-4321"
phones = re.findall(r'\d{3}-\d{3}-\d{4}', text)
print(phones)  # ['123-456-7890', '098-765-4321']

# Using capturing groups
url_pattern = r'^(https?):\/\/([^\/]+)(\/.*)?$'
url = "https://example.com/path/to/page"
match = re.match(url_pattern, url)
if match:
    print(f"Protocol: {match.group(1)}")  # https
    print(f"Domain: {match.group(2)}")    # example.com
    print(f"Path: {match.group(3)}")      # /path/to/page

# Using named capturing groups
date_pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
date_match = re.match(date_pattern, "2026-01-12")
if date_match:
    print(date_match.group('year'))   # 2026
    print(date_match.group('month'))  # 01
    print(date_match.group('day'))    # 12

# Replacement operations
phone = "1234567890"
masked = re.sub(r'(\d{3})\d{4}(\d{3})', r'\1****\2', phone)
print(masked)  # 123****890

# Compile regex for better performance
pattern = re.compile(r'\d+')
numbers = pattern.findall("Price: $100, Quantity: 50")
print(numbers)  # ['100', '50']

# Using finditer to get match objects
for match in re.finditer(r'\d+', "Price: $100, Quantity: 50"):
    print(f"Found {match.group()} at position {match.start()}-{match.end()}")

Java

import java.util.regex.*;
import java.util.ArrayList;
import java.util.List;

public class RegexExample {
    public static void main(String[] args) {
        // Basic matching
        String emailPattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$";
        String email = "user@example.com";
        boolean isValid = email.matches(emailPattern);
        System.out.println("Email valid: " + isValid);

        // Find all matches
        String text = "Contact: 123-456-7890 or 098-765-4321";
        Pattern phonePattern = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");
        Matcher matcher = phonePattern.matcher(text);
        List<String> phones = new ArrayList<>();
        while (matcher.find()) {
            phones.add(matcher.group());
        }
        System.out.println(phones); // [123-456-7890, 098-765-4321]

        // Using capturing groups
        String urlPattern = "^(https?)://([^/]+)(/.*)?$";
        String url = "https://example.com/path/to/page";
        Pattern pattern = Pattern.compile(urlPattern);
        Matcher urlMatcher = pattern.matcher(url);
        if (urlMatcher.matches()) {
            System.out.println("Protocol: " + urlMatcher.group(1)); // https
            System.out.println("Domain: " + urlMatcher.group(2));   // example.com
            System.out.println("Path: " + urlMatcher.group(3));     // /path/to/page
        }

        // Using named capturing groups (Java 7+)
        String datePattern = "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})";
        Pattern dateRegex = Pattern.compile(datePattern);
        Matcher dateMatcher = dateRegex.matcher("2026-01-12");
        if (dateMatcher.matches()) {
            System.out.println("Year: " + dateMatcher.group("year"));
            System.out.println("Month: " + dateMatcher.group("month"));
            System.out.println("Day: " + dateMatcher.group("day"));
        }

        // Replacement operations
        String phone = "1234567890";
        String masked = phone.replaceAll("(\\d{3})\\d{4}(\\d{3})", "$1****$2");
        System.out.println(masked); // 123****890
    }
}

Go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Basic matching
    emailPattern := `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`
    emailRegex := regexp.MustCompile(emailPattern)
    email := "user@example.com"
    fmt.Println("Email valid:", emailRegex.MatchString(email))

    // Find all matches
    text := "Contact: 123-456-7890 or 098-765-4321"
    phoneRegex := regexp.MustCompile(`\d{3}-\d{3}-\d{4}`)
    phones := phoneRegex.FindAllString(text, -1)
    fmt.Println(phones) // [123-456-7890 098-765-4321]

    // Using capturing groups
    urlPattern := `^(https?)://([^/]+)(/.*)?$`
    urlRegex := regexp.MustCompile(urlPattern)
    url := "https://example.com/path/to/page"
    matches := urlRegex.FindStringSubmatch(url)
    if len(matches) > 0 {
        fmt.Println("Protocol:", matches[1]) // https
        fmt.Println("Domain:", matches[2])   // example.com
        fmt.Println("Path:", matches[3])     // /path/to/page
    }

    // Using named capturing groups
    datePattern := `(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`
    dateRegex := regexp.MustCompile(datePattern)
    dateMatch := dateRegex.FindStringSubmatch("2026-01-12")
    names := dateRegex.SubexpNames()
    for i, name := range names {
        if name != "" && i < len(dateMatch) {
            fmt.Printf("%s: %s\n", name, dateMatch[i])
        }
    }

    // Replacement operations
    phone := "1234567890"
    replaceRegex := regexp.MustCompile(`(\d{3})\d{4}(\d{3})`)
    masked := replaceRegex.ReplaceAllString(phone, "$1****$2")
    fmt.Println(masked) // 123****890

    // Using ReplaceAllStringFunc for complex replacements
    text2 := "Price $100"
    numRegex := regexp.MustCompile(`\d+`)
    result := numRegex.ReplaceAllStringFunc(text2, func(s string) string {
        return "[" + s + "]"
    })
    fmt.Println(result) // Price $[100]
}

Regex Best Practices

1. Keep It Simple

Complex regex is hard to maintain and debug. If possible, split complex patterns into multiple simple ones:

// Not recommended: One complex regex
const complexRegex = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

// Recommended: Multiple simple checks
function validatePassword(password) {
  if (password.length < 8) return false;
  if (!/[a-z]/.test(password)) return false;
  if (!/[A-Z]/.test(password)) return false;
  if (!/\d/.test(password)) return false;
  if (!/[@$!%*?&]/.test(password)) return false;
  return true;
}

2. Use Non-Capturing Groups

If you don't need the captured text, use a non-capturing group (?:...). It avoids storing the match, which can be slightly faster, and it keeps the numbering of the remaining groups simple:

// Capturing group (saves match result)
/(cat|dog) food/

// Non-capturing group (doesn't save, more efficient)
/(?:cat|dog) food/

3. Avoid Catastrophic Backtracking

Certain regex patterns can cause exponential backtracking and severe performance issues:

// Dangerous: Can cause catastrophic backtracking
/(a+)+$/

// Safer: remove the nested quantifier (JavaScript has no atomic groups)
/a+$/

4. Pre-compile Regular Expressions

When using regex in loops, pre-compile for better performance:

import re

# Not recommended: looks the pattern up in re's internal cache on every iteration
for line in lines:
    if re.match(r'\d+', line):
        process(line)

# Recommended: Pre-compile
pattern = re.compile(r'\d+')
for line in lines:
    if pattern.match(line):
        process(line)

5. Use Anchors

When you know the match position, use anchors to improve performance:

// Not recommended: Searches entire string
/hello/

// Recommended: If you know it's at the start
/^hello/

6. Test Edge Cases

Always test various edge cases before production use:

  • Empty strings
  • Very long strings
  • Special characters
  • Unicode characters
  • Newline characters

FAQ

What's the difference between regex and wildcards?

Wildcards (like * and ?) are simplified pattern matching, mainly used for filename matching. Regular expressions are more powerful, supporting complex pattern descriptions, capturing groups, assertions, and other advanced features.

Feature | Wildcards | Regular Expressions
* | Matches any characters | Matches preceding char 0+ times
? | Matches single character | Matches preceding char 0 or 1 time
Complexity | Simple | Powerful but complex
Use Case | Filename matching | Text processing, validation
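
The difference in what ? means can be seen side by side in Python, which ships wildcard matching in the standard fnmatch module. The file names below are made up for illustration.

```python
import fnmatch
import re

names = ["report.txt", "report1.txt", "notes.md"]  # hypothetical file names

# Wildcard: "?" stands for exactly one character.
wildcard_hits = fnmatch.filter(names, "report?.txt")

# Regex: ".?" means zero or one of any character, and the literal dot must be escaped.
regex_hits = [name for name in names if re.fullmatch(r"report.?\.txt", name)]

# Wildcard "*" corresponds to ".*" in a regex.
star_hits = fnmatch.filter(names, "*.txt")
dot_star_hits = [name for name in names if re.fullmatch(r".*\.txt", name)]

print(wildcard_hits, regex_hits)
print(star_hits, dot_star_hits)
```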

How do I debug complex regular expressions?

  1. Use online tools: Like our Regex Tester to see matches in real-time
  2. Build incrementally: Start with simple patterns and add complexity gradually
  3. Add comments: Use extended mode (x flag) to add comments
  4. Use visualization tools: Convert regex to visual diagrams
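
As an example of the commenting approach, Python exposes extended mode as re.VERBOSE. In this minimal sketch the date pattern from the earlier examples is split across lines with one comment per part.

```python
import re

date_pattern = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -                  # literal separator
    (?P<month>\d{2})   # two-digit month
    -                  # literal separator
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,
)

match = date_pattern.fullmatch("2026-01-12")
parts = match.groupdict() if match else {}
print(parts)
```

In this mode, whitespace in the pattern is ignored, so a literal space has to be written as "\ " or "[ ]".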

How can I optimize regex performance?

  1. Use anchors to limit search scope
  2. Avoid unnecessary capturing groups
  3. Use non-greedy matching when appropriate
  4. Pre-compile regular expressions
  5. Avoid nested quantifiers (like (a+)+)
  6. Use more specific character classes

What are the differences between regex in different programming languages?

Most programming languages use similar regex syntax (PCRE-style), but there are subtle differences:

Feature | JavaScript | Python | Java | Go
Lookbehind | ✓ (ES2018+) | ✓ (fixed-width only) | ✓ | ✗
Named Groups | (?<name>) | (?P<name>) | (?<name>) | (?P<name>)
Unicode | Needs u flag | Default | Default | Default
Atomic Groups | ✗ | ✓ (3.11+) | ✓ | ✗

How can I test regex without writing code?

You can use online tools like our free Regex Tester to:

  • Test regular expressions in real-time
  • View matches and capturing groups
  • Get code examples in multiple programming languages
  • Save and share your regular expressions

Summary

Regular expressions are a core skill every developer should master. While the learning curve may be steep, once mastered, they will greatly improve your text processing efficiency.

Quick Summary:

  • Start with basic syntax: character matching, quantifiers, anchors
  • Master groups and capturing groups
  • Learn advanced features like lookaround assertions
  • Memorize common patterns (email, phone, URL, etc.)
  • Follow performance optimization best practices
  • Practice and test frequently
