Regex Syntax: A Complete Guide to JavaScript Regular Expressions

Introduction: Why Regex Is Worth Learning

Regular expressions look intimidating — a dense run of slashes, brackets, and backslashes — but they are one of the highest-leverage skills a developer can build. A single well-formed pattern can replace dozens of lines of string-scanning code, validate user input, extract fields from a log line, or rewrite text in one pass. This guide builds JavaScript regex from first principles, with patterns you can paste straight into our Regex Tester to watch them work.

Everything here uses the JavaScript (ECMAScript) flavor, the engine in every browser and in Node.js. If you have only ever copied regex from forums, this article will turn that guesswork into understanding.

The Building Blocks: Literals and Metacharacters

At its simplest, a regex matches literal text. The pattern cat matches the substring "cat" anywhere it appears. What makes regex powerful is metacharacters: symbols with special meaning. The dot . matches any single character except a line terminator such as a newline. The characters ^ and $ anchor to the start and end of the input. To match a literal metacharacter, escape it with a backslash: \. matches a real dot, \$ a real dollar sign.

Forgetting to escape is the single most common beginner mistake. The pattern 3.14 does not just match "3.14" — the unescaped dot also matches "3x14" or "3914". Write 3\.14 when you mean a literal point.
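To make the escaping point concrete, here is a small sketch that runs both patterns against the three strings mentioned above. The variable names are illustrative only.

```javascript
// Unescaped dot: the "." stands for any single character in that position.
const loose = /3.14/;
// Escaped dot: only a literal "." is accepted.
const strict = /3\.14/;

const samples = ["3.14", "3x14", "3914"];

const looseResults = samples.map((s) => loose.test(s));   // [true, true, true]
const strictResults = samples.map((s) => strict.test(s)); // [true, false, false]

console.log(looseResults, strictResults);
```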

Character Classes

Square brackets define a set of characters, any one of which may match. [aeiou] matches a single vowel. Ranges shorten this: [a-z] matches any lowercase letter, [0-9] any digit, [A-Za-z0-9_] a typical identifier character. A caret at the start negates the set: [^0-9] matches any character that is not a digit.

Common classes have shorthand escapes: \d for a digit, \w for a word character (ASCII letters, digits, underscore), and \s for whitespace. Their uppercase forms negate them, so \D is any non-digit. These shorthands keep patterns readable: \d{4}-\d{2}-\d{2} describes an ISO date far more clearly than spelling out every range.
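The following sketch shows sets, a negated set, and the shorthand escapes side by side. The sample strings are made up for illustration.

```javascript
const text = "Released on 2024-03-15.";

// Shorthand and spelled-out ranges describe the same date shape.
const shortForm = text.match(/\d{4}-\d{2}-\d{2}/)[0];           // "2024-03-15"
const longForm = text.match(/[0-9]{4}-[0-9]{2}-[0-9]{2}/)[0];   // "2024-03-15"

// A set matches any one of its members; with g, match() returns every hit.
const vowels = "regular".match(/[aeiou]/g);                     // ["e", "u", "a"]

// A negated set: strip everything that is not a digit.
const digitsOnly = "a1b22c".replace(/[^0-9]/g, "");             // "122"

// \D is the negation of \d.
const hasNonDigit = /\D/.test("12a4");                          // true

console.log(shortForm, longForm, vowels, digitsOnly, hasNonDigit);
```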

Quantifiers: How Many Times

Quantifiers control repetition. * means zero or more, + means one or more, and ? means zero or one (optional). Braces give exact counts: {3} means exactly three, {2,5} between two and five, and {2,} two or more.

By default quantifiers are greedy: they consume as much as possible, then backtrack if the rest of the pattern fails. Appending ? makes them lazy, matching as little as possible. The difference is dramatic. Against the text <a><b>, the greedy <.*> matches the entire string in one match, while the lazy <.*?> matches each tag separately. Paste both into the Regex Tester and watch the highlighting change — it is the fastest way to internalize the concept.
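The greedy and lazy behavior described above can be checked in a few lines. This is a minimal sketch using the same <a><b> text, plus one brace quantifier with anchors added so the count applies to the whole string.

```javascript
const html = "<a><b>";

// Greedy: .* runs to the end, then backtracks to the last ">".
const greedy = html.match(/<.*>/g);   // ["<a><b>"]

// Lazy: .*? stops at the first ">" it can.
const lazy = html.match(/<.*?>/g);    // ["<a>", "<b>"]

// {2,5}: between two and five repetitions.
const counts = ["a", "aa", "aaaaaa"].map((s) => /^a{2,5}$/.test(s)); // [false, true, false]

console.log(greedy, lazy, counts);
```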

Groups and Captures

Parentheses group part of a pattern and, by default, capture what they match. In (\d{4})-(\d{2}), group 1 captures the year and group 2 the month. You reference captures in replacements as $1 and $2. Named groups make this self-documenting: (?<year>\d{4})-(?<month>\d{2}) lets you refer to $<year> instead of a number.

When you only need grouping — for a quantifier or alternation — but do not want a capture slot, use a non-capturing group (?:…). For example, (?:ab)+ matches one or more "ab" sequences without creating a numbered group. This keeps your capture numbering clean in complex patterns.
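As a rough illustration of the three kinds of group, the sketch below reads numbered and named captures from a match result and uses a named reference in a replacement. The input "2024-03" and the variable names are assumptions for the example.

```javascript
const dateText = "2024-03";

// Numbered captures: index 0 is the whole match, 1 and 2 are the groups.
const numbered = dateText.match(/(\d{4})-(\d{2})/);
const year = numbered[1];   // "2024"
const month = numbered[2];  // "03"

// Named captures are exposed on the .groups object.
const named = dateText.match(/(?<year>\d{4})-(?<month>\d{2})/);
const namedYear = named.groups.year; // "2024"

// Named references in a replacement string use $<name>.
const swapped = dateText.replace(/(?<year>\d{4})-(?<month>\d{2})/, "$<month>/$<year>"); // "03/2024"

// Non-capturing group: repetition applies to "ab", but no capture slot is created.
const repeated = "ababab!".match(/(?:ab)+/);
const repeatedText = repeated[0];      // "ababab"
const slotCount = repeated.length;     // 1 (only the whole match)

console.log(year, month, namedYear, swapped, repeatedText, slotCount);
```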

Alternation

The pipe | means "or". The pattern cat|dog|bird matches any of the three words. Combine it with groups to scope the alternation: gr(a|e)y matches both "gray" and "grey". Be careful with precedence — alternation has very low priority, so ^cat|dog$ means "starts with cat" or "ends with dog", not "cat or dog on its own line". Wrap with a group when in doubt: ^(cat|dog)$.
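The precedence trap is easy to see with a handful of test words. In this sketch the word list is invented to show where the unscoped pattern accepts too much.

```javascript
const unscoped = /^cat|dog$/;   // "starts with cat" OR "ends with dog"
const scoped = /^(cat|dog)$/;   // exactly "cat" or exactly "dog"

const words = ["cat", "dog", "category", "hotdog"];

const unscopedResults = words.map((w) => unscoped.test(w)); // [true, true, true, true]
const scopedResults = words.map((w) => scoped.test(w));     // [true, true, false, false]

// Alternation scoped to a single letter inside a word.
const spellings = ["gray", "grey", "gruy"].map((w) => /gr(a|e)y/.test(w)); // [true, true, false]

console.log(unscopedResults, scopedResults, spellings);
```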

Anchors and Boundaries

Anchors match positions, not characters. ^ is the start of the string (or line, with the m flag), $ the end. The word boundary \b matches the position between a word character and a non-word character, which is invaluable for whole-word matching. \bcat\b matches "cat" in "the cat sat" but not in "category". Its negation \B matches positions that are not word boundaries.
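A short sketch of these position assertions follows. The "concatenate" and three-line examples are added here for illustration and are not from the text above.

```javascript
// \b: whole-word matching.
const wholeWord = /\bcat\b/;
const inSentence = wholeWord.test("the cat sat"); // true
const inCategory = wholeWord.test("category");    // false

// \B: "cat" only when it sits inside a longer word.
const inner = /\Bcat\B/;
const innerHit = inner.test("concatenate");       // true
const innerMiss = inner.test("the cat sat");      // false

// ^ and $ match per line only when the m flag is set.
const lines = "one\ntwo\nthree";
const wholeString = lines.match(/^\w+$/g);        // null (no single word spans the whole string)
const perLine = lines.match(/^\w+$/gm);           // ["one", "two", "three"]

console.log(inSentence, inCategory, innerHit, innerMiss, wholeString, perLine);
```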

Lookaround: Match Without Consuming

Lookahead and lookbehind assert that something does or does not appear, without including it in the match. A positive lookahead (?=…) requires what follows; a negative lookahead (?!…) forbids it. For example, \d+(?= dollars) matches the number in "50 dollars" but not the word. Lookbehind works the same in reverse: (?<=\$)\d+ matches the digits in "$50" without including the dollar sign in the match, and (?<!…) is the negative form. Modern JavaScript engines support both, which you can confirm live in the Regex Tester.
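Here is a minimal sketch of the lookaround forms. The negative-lookahead pattern adds \b on both sides of the digits, an assumption made for this example so the engine cannot satisfy the assertion by backtracking to a shorter number such as "5".

```javascript
// Positive lookahead: digits only when " dollars" follows.
const amount = "50 dollars".match(/\d+(?= dollars)/)[0];   // "50"

// Positive lookbehind: digits only when "$" precedes them.
const afterSign = "$50".match(/(?<=\$)\d+/)[0];            // "50"

// Negative lookahead: a whole number NOT followed by " dollars".
const notDollars = /\b\d+\b(?! dollars)/;
const euros = "50 euros".match(notDollars);                // matches "50"
const dollars = "50 dollars".match(notDollars);            // null

console.log(amount, afterSign, euros && euros[0], dollars);
```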

The Flags That Change Everything

Flags modify how the whole pattern behaves. The global flag g finds every match instead of the first. The ignore-case flag i makes matching case-insensitive. The multiline flag m makes anchors match per line. The dotall flag s lets the dot match newlines. The unicode flag u enables full code-point handling and property escapes like \p{L}. The sticky flag y requires each match to begin exactly at the regex's lastIndex position, which tokenizers rely on. Toggling these one at a time in the tester is the clearest way to feel their effect.
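The sketch below toggles each flag on a small input. The sample strings are invented, and the accented letters are written as escapes so the example does not depend on file encoding.

```javascript
const sample = "Cat\ncat\nCAT";

const firstOnly = sample.match(/cat/);        // one match, found at index 4
const allExact = sample.match(/cat/g);        // ["cat"]
const anyCase = sample.match(/cat/gi);        // ["Cat", "cat", "CAT"]
const perLine = sample.match(/^cat$/gim);     // ["Cat", "cat", "CAT"], one per line

// s: let the dot match a newline.
const dotDefault = /a.b/.test("a\nb");        // false
const dotAll = /a.b/s.test("a\nb");           // true

// u: Unicode property escapes such as \p{L} (any letter).
const letters = "na\u00EFve caf\u00E9".match(/\p{L}+/gu); // two words, accents included

// y: the match must start exactly at lastIndex.
const sticky = /\d+/y;
sticky.lastIndex = 3;
const stickyHit = sticky.test("ab 42");       // true, "42" starts at index 3
const stickyMiss = /\d+/y.test("ab 42");      // false, index 0 is "a"

console.log(firstOnly.index, allExact, anyCase, perLine, dotDefault, dotAll, letters, stickyHit, stickyMiss);
```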

Substitution: Find and Replace with Power

Regex shines in search-and-replace. In JavaScript, str.replace(/(\w+)@(\w+)/g, '$2 at $1') swaps the two halves of a simple address. Replacement strings understand $1 for numbered groups, $<name> for named groups, $& for the entire match, and $$ for a literal dollar. The Regex Tester's Substitution panel previews exactly how these tokens expand, so you can refine a transformation before committing it to code.
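Each replacement token can be tried in isolation. This is a small sketch with made-up inputs; the last line uses a replacer function, a standard alternative to token strings that the text above does not cover.

```javascript
// $1 and $2: numbered groups.
const swappedAddress = "user@example".replace(/(\w+)@(\w+)/g, "$2 at $1"); // "example at user"

// $<name>: named groups.
const fullName = "Doe, Jane".replace(/(?<last>\w+), (?<first>\w+)/, "$<first> $<last>"); // "Jane Doe"

// $&: the entire match.
const bracketed = "hello world".replace(/\w+/g, "[$&]"); // "[hello] [world]"

// $$: a literal dollar sign, here followed by $& for the matched digits.
const priced = "price: 5".replace(/\d+/, "$$$&"); // "price: $5"

// A function receives the match and its groups and returns the replacement.
const flipped = "a-b".replace(/(\w)-(\w)/, (match, left, right) => right + "-" + left); // "b-a"

console.log(swappedAddress, fullName, bracketed, priced, flipped);
```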

Common Pitfalls and How to Avoid Them

  • Catastrophic backtracking. Nested quantifiers like (a+)+$ can take exponential time on certain inputs, freezing your program. Prefer specific patterns, and avoid overlapping repetition.
  • Reusing a global regex. A /g object carries its lastIndex between calls to test() and exec(). Reusing the same object on different strings can skip matches, because each search resumes from the stored position. Create a fresh regex or reset lastIndex to 0.
  • Validating complex formats with one pattern. Email and HTML are famously hard to validate fully with regex. Use a pattern for a quick shape check, then verify semantically.
  • Assuming the dot matches newlines. It does not without the s flag, which trips up multiline parsing.
  • Unescaped special characters. Dots, plus signs, and parentheses in your literal text must be escaped or they change the pattern's meaning.
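Two of the pitfalls above are easy to reproduce. In this sketch, escapeRegExp is a hypothetical helper name (it is not a built-in); it follows a widely used pattern for escaping user-supplied text before building a regex from it.

```javascript
// Pitfall: one /g object reused with test() remembers lastIndex.
const shared = /cat/g;
const inputs = ["cat", "cat", "cat"];
const buggy = inputs.map((s) => shared.test(s));   // [true, false, true]

// Fix: a regex literal inside the callback creates a fresh object on each call.
const fixed = inputs.map((s) => /cat/g.test(s));   // [true, true, true]

// Pitfall: special characters in literal text change the pattern.
// Hypothetical helper: backslash-escape every regex metacharacter.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const rawPattern = new RegExp("1+1=2");               // "+" is read as a quantifier
const safePattern = new RegExp(escapeRegExp("1+1=2")); // matches the literal text

const rawResult = rawPattern.test("1+1=2");   // false
const safeResult = safePattern.test("1+1=2"); // true

console.log(buggy, fixed, rawResult, safeResult);
```

An alternative fix for the first pitfall is to set shared.lastIndex = 0 before each call, or to drop the g flag when only a yes/no answer is needed.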

When Not to Use Regex

Regular expressions match flat, lexical patterns brilliantly, but they cannot parse arbitrarily nested structures. Balanced parentheses, full HTML documents, and JSON are not regular languages — they need a real parser. If you find yourself writing a monstrous pattern to handle nesting, stop and reach for the appropriate tool: a JSON parser, an HTML parser, or a proper grammar. Knowing this boundary is itself a mark of regex fluency.
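As a rough illustration of that boundary, the sketch below compares a naive regex check for balanced parentheses with a tiny depth counter, and reads nested data with the built-in JSON.parse. The function isBalanced and the sample inputs are hypothetical, written only for this example.

```javascript
// A regex can check the outer shape, but it cannot count nesting depth.
const naive = /^\(.*\)$/;

// Hypothetical helper: track depth while scanning character by character.
function isBalanced(text) {
  let depth = 0;
  for (const ch of text) {
    if (ch === "(") {
      depth += 1;
    } else if (ch === ")") {
      depth -= 1;
      if (depth < 0) return false; // closed more than was opened
    }
  }
  return depth === 0;
}

const naiveVerdict = naive.test("(()");   // true, although the text is unbalanced
const realVerdict = isBalanced("(()");    // false
const goodVerdict = isBalanced("(()())"); // true

// For JSON, use the parser that ships with the language.
const payload = '{"user": {"name": "Ada", "roles": ["admin", "dev"]}}';
const userName = JSON.parse(payload).user.name; // "Ada"

console.log(naiveVerdict, realVerdict, goodVerdict, userName);
```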

Worked Examples You Can Test Right Now

Theory sticks when you apply it. Here are five small problems with patterns to paste into the Regex Tester, each demonstrating a concept from above.

1. Extract every hashtag from a post. Pattern: #\w+ with the g flag. Against "Loving #regex and #javascript today" it highlights "#regex" and "#javascript". The i flag is not needed here, because \w already matches both uppercase and lowercase letters.

2. Pull the year, month, and day from ISO dates. Pattern: (?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2}). The named-group columns in the match table show each component separately, which is far clearer than counting positional groups.

3. Match a price but not the currency word. Pattern: \d+(?:\.\d{2})?(?= USD). The lookahead requires " USD" to follow without including it in the match, so "49.99 USD" yields just "49.99".

4. Find duplicated words. Pattern: \b(\w+)\s+\1\b with the i and g flags. The backreference \1 matches the same text the first group captured, catching "the the" in a sentence — a classic proofreading aid.

5. Normalize whitespace. In the Substitution panel, set the pattern to \s+ with the g flag and the replacement to a single space. Any run of spaces, tabs, or newlines collapses to one space, a common cleanup step before further processing.

Each of these is intentionally small. Real patterns grow by composing exactly these pieces — character classes, quantifiers, groups, anchors, and lookaround — one constraint at a time, verifying at every step.
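For readers who prefer code to the tester, the five patterns above can be sketched as plain JavaScript. The sample inputs for examples 2, 4, and 5 are assumptions chosen for illustration.

```javascript
// 1. Hashtags.
const hashtags = "Loving #regex and #javascript today".match(/#\w+/g);
// ["#regex", "#javascript"]

// 2. ISO date parts via named groups.
const iso = "2024-03-15".match(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/).groups;
// iso.y is "2024", iso.m is "03", iso.d is "15"

// 3. Price without the currency word.
const price = "49.99 USD".match(/\d+(?:\.\d{2})?(?= USD)/)[0];
// "49.99"

// 4. Duplicated words, using the backreference \1.
const dupes = "Paris in the the spring".match(/\b(\w+)\s+\1\b/gi);
// ["the the"]

// 5. Collapse any run of whitespace to one space.
const normalized = "a  b\t\nc".replace(/\s+/g, " ");
// "a b c"

console.log(hashtags, iso.y, iso.m, iso.d, price, dupes, normalized);
```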

Putting It Together

The fastest way to learn regex is to experiment with immediate feedback. Take any pattern from this guide, paste it into the Regex Tester, and edit it character by character while watching the highlights and the capture-group table respond. Within an afternoon, the dense syntax stops looking like noise and starts reading like the precise, powerful language it is. Pair it with our JSON Formatter when you are extracting structured data, and you have a complete, private, browser-based toolkit for wrangling text.

Keep this guide bookmarked as a reference. The syntax you use rarely — lookbehind, Unicode property escapes, sticky matching — is exactly the syntax you will forget, and a quick test in the tool is faster than searching documentation. Over time you will build a personal library of patterns that solve recurring problems in your own codebase, and reading an unfamiliar expression written by a colleague will become second nature rather than a chore. That fluency compounds: every pattern you understand makes the next one easier to read and write.
