Regex Fundamentals
Match text patterns with the core regex syntax used across tools and languages.
Theory
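A regular expression describes a text pattern. The same core syntax works in grep, Python, JavaScript, and most editors, although support for shorthand classes such as \d varies by tool (grep -E, for example, expects [0-9] instead). The building blocks: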
- Literals match themselves; metacharacters (. * + ? [] () ^ $) are special
- Character classes: \d digit, \w word char, \s whitespace, [a-z] range
- Quantifiers: * (0+), + (1+), ? (0 or 1), {2,4} (range)
- Anchors: ^ start of line, $ end of line, \b word boundary
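The building blocks listed above can be tried out in a few lines of Python. This is a minimal sketch using the standard `re` module; the sample string is made up for illustration.

```python
import re

# Hypothetical sample text, chosen only to exercise each building block.
sample = "order 42 shipped to zone b7"

# Character class: \d matches one digit, so every digit is its own match.
digits = re.findall(r"\d", sample)

# Quantifier: + extends the match to a run of one or more digits.
numbers = re.findall(r"\d+", sample)

# Range class plus quantifier: runs of lowercase letters only.
lower_words = re.findall(r"[a-z]+", sample)

# Anchor: ^ matches only at the start, so "order" is found but "zone" is not.
starts_with_order = re.search(r"^order", sample) is not None
starts_with_zone = re.search(r"^zone", sample) is not None

print(digits)        # ['4', '2', '7']
print(numbers)       # ['42', '7']
print(lower_words)   # ['order', 'shipped', 'to', 'zone', 'b']
print(starts_with_order, starts_with_zone)  # True False
```

Raw strings (`r"..."`) keep Python from interpreting the backslashes before the regex engine sees them.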
Build patterns incrementally and test them on real samples — a tool like regex101 shows exactly what each part matches.
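One way to follow the incremental approach without leaving the terminal is to grow the pattern in steps and print what each step matches. The sketch below assumes a made-up log line that begins with the date used in the exercises; it is an illustration, not a replacement for an interactive tester.

```python
import re

# Hypothetical log line used as the "real sample".
line = "2026-06-22 ERROR disk full on /dev/sda1"

# Grow the pattern one piece at a time and inspect the match after each step.
steps = [
    r"\d",                   # a single digit
    r"\d{4}",                # exactly four digits (the year)
    r"\d{4}-\d{2}",          # year, hyphen, month
    r"^\d{4}-\d{2}-\d{2}",   # full date, anchored to the start of the line
]

matches = []
for pattern in steps:
    m = re.search(pattern, line)
    matches.append(m.group(0) if m else None)
    print(f"{pattern!r:26} -> {matches[-1]!r}")
```

Each step should match a little more of the date; if a step suddenly matches nothing, the piece just added is the one to fix.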
Real-World Example
- \d{3}-\d{4} matches 555-1234
- ^ERROR matches lines starting with ERROR
- \.log$ matches strings ending in .log
- [A-Za-z0-9._%+-]+@… matches the local part of an email
- \bcat\b matches the word "cat", not "category"

grep -E '\bERROR\b' app.log # whole word ERROR
grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}' log # lines starting with a date

Hands-On Exercise
- Write a pattern that matches a 5-digit US ZIP code.
- Match all lines that start with a timestamp like 2026-06-22.
- Match a filename ending in .log or .txt.
- Explain why \. is needed to match an IP address literally.
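To check answers to exercises like these, a small helper that runs a pattern against strings that should and should not match can be enough. The `check` function below is a hypothetical helper written for this sketch, and it is demonstrated on patterns from the Real-World Example section rather than on the exercise solutions.

```python
import re

def check(pattern, should_match, should_not_match):
    """Return True if pattern is found in every should_match string
    and in none of the should_not_match strings."""
    regex = re.compile(pattern)
    hits = all(regex.search(s) for s in should_match)
    misses = not any(regex.search(s) for s in should_not_match)
    return hits and misses

# Test strings are invented for illustration.
results = {
    "phone": check(r"\d{3}-\d{4}", ["555-1234"], ["5551234", "55-1234"]),
    "error": check(r"^ERROR", ["ERROR disk full"], ["INFO ok", "no ERROR here"]),
    "log": check(r"\.log$", ["app.log"], ["app.log.bak", "catalog"]),
}
print(results)  # every value should be True
```

The "catalog" case hints at the last exercise: with an unescaped dot, `.log$` would accept it, because the dot would match the "a".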
Cheat Sheet
| Token | Matches |
|---|---|
| . | any character |
| \d \w \s | digit / word / whitespace |
| [abc] [^abc] | set / negated set |
| * + ? | 0+, 1+, 0 or 1 |
| {n,m} | between n and m |
| ^ $ | line start / end |
| \b | word boundary |
| a\|b | a or b |
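A few cheat-sheet tokens (sets, bounded repetition, alternation) are easiest to absorb by running them. The following is a brief sketch in Python with invented input strings.

```python
import re

# [abc] matches one character from the set; [^abc] matches one character not in it.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

# {n,m} bounds the repeat count: here, two to four digits per match.
bounded = re.findall(r"\d{2,4}", "7 42 12345")

# a|b matches either alternative.
pets = re.findall(r"cat|dog", "cat, dog, bird")

print(vowels)      # ['e', 'e']
print(non_vowels)  # ['r', 'g', 'x']
print(bounded)     # ['42', '1234']
print(pets)        # ['cat', 'dog']
```

Note that the lone "7" is skipped (too short) and "12345" yields only "1234", since the leftover "5" does not reach the two-digit minimum.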
Common Interview Questions
What does the dot . match, and how do you match a literal dot?
. matches any single character (except newline by default). Escape it as \. to match a literal period.
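A short Python sketch of this answer, with made-up inputs. `re.DOTALL` is Python's flag for letting the dot match newlines; other tools name the equivalent option differently.

```python
import re

candidates = ["1.2", "1x2"]

# Unescaped, the dot stands for any one character, so "1x2" slips through.
loose = [s for s in candidates if re.fullmatch(r"1.2", s)]

# Escaped, only a literal period matches.
strict = [s for s in candidates if re.fullmatch(r"1\.2", s)]

# By default the dot does not match a newline; re.DOTALL lifts that restriction.
default_match = re.search(r"a.b", "a\nb")
dotall_match = re.search(r"a.b", "a\nb", flags=re.DOTALL)

print(loose)   # ['1.2', '1x2']
print(strict)  # ['1.2']
print(default_match is None, dotall_match is not None)  # True True
```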
What's the difference between * and + ?
* matches zero or more of the preceding element; + matches one or more (so it
requires at least one occurrence).
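The difference shows up on an input with zero occurrences. A minimal illustration in Python, using invented sample strings:

```python
import re

samples = ["ac", "abc", "abbbc"]

# b* allows zero b's, so "ac" matches.
star = [s for s in samples if re.fullmatch(r"ab*c", s)]

# b+ needs at least one b, so "ac" is rejected.
plus = [s for s in samples if re.fullmatch(r"ab+c", s)]

print(star)  # ['ac', 'abc', 'abbbc']
print(plus)  # ['abc', 'abbbc']
```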
What does \b do?
It’s a word boundary anchor — \bcat\b matches the standalone word “cat” but not
“category” or “scatter”.
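The same comparison can be run in Python as a quick sketch. The raw string matters here: without the `r` prefix, Python would read `\b` as a backspace character instead of passing it to the regex engine.

```python
import re

texts = ["cat", "the cat sat", "category", "scatter"]

# With word boundaries, only the standalone word matches.
whole_word = [t for t in texts if re.search(r"\bcat\b", t)]

# Without them, "cat" is found inside longer words too.
substring = [t for t in texts if re.search(r"cat", t)]

print(whole_word)  # ['cat', 'the cat sat']
print(substring)   # ['cat', 'the cat sat', 'category', 'scatter']
```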