python.regex_compile
Performance
Medium
Detects regex patterns that are compiled repeatedly inside loops or frequently called functions instead of being compiled once at module level.
Why It Matters
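Compiling regex patterns repeatedly:
- Wastes CPU cycles: regex compilation is expensive compared with matching against an already compiled pattern
- Increases latency: each request pays the compilation cost, or at least a cache lookup
- Relies on a limited cache: Python’s `re` module caches compiled patterns, but the cache has a fixed size, so code that uses many distinct patterns can evict entries and trigger recompilation
- Degrades under load: the performance impact scales with request rate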
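The caching behavior can be observed directly in CPython: `re.compile()` and the module-level helpers such as `re.match()` consult an internal cache keyed on the pattern and flags, so repeated calls hand back the same compiled object, yet every call still pays for the lookup, and `re.purge()` (or eviction when the cache fills) forces a fresh compile. A minimal sketch, assuming CPython; object identity of cached patterns is an implementation detail, and the helper names below are hypothetical:

```python
import re

DATE = r'\d{4}-\d{2}-\d{2}'

def cached_twice() -> bool:
    """Return True if two re.compile() calls for one pattern yield the same object."""
    first = re.compile(DATE)
    second = re.compile(DATE)  # served from re's internal cache in CPython
    return first is second

def recompiled_after_purge() -> bool:
    """Return True if clearing the cache forces a brand-new Pattern object."""
    before = re.compile(DATE)
    re.purge()                 # empties the module-level pattern cache
    after = re.compile(DATE)   # cache miss: the pattern is compiled again
    return before is not after

if __name__ == "__main__":
    print("same object from cache:", cached_twice())
    print("new object after purge:", recompiled_after_purge())
```

Hoisting the pattern into a module-level constant sidesteps both the per-call lookup and the risk of eviction.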
Example
```python
import re

# ❌ Before (pattern compiled, or fetched from re's cache, on every call)
def validate_email(email):
    pattern = re.compile(r'^[\w\.-]+@[\w\.-]+\.\w+$')
    return pattern.match(email) is not None

def process_items(items):
    for item in items:
        # Pattern is looked up (and compiled on a cache miss) every iteration
        if re.match(r'\d{4}-\d{2}-\d{2}', item.date):
            process(item)  # process() stands in for application logic

# ✅ After (compiled once)
EMAIL_PATTERN = re.compile(r'^[\w\.-]+@[\w\.-]+\.\w+$')
DATE_PATTERN = re.compile(r'\d{4}-\d{2}-\d{2}')

def validate_email(email):
    return EMAIL_PATTERN.match(email) is not None

def process_items(items):
    for item in items:
        if DATE_PATTERN.match(item.date):
            process(item)
```
What Unfault Detects
- `re.compile()` inside functions or loops
- `re.match()`, `re.search()`, `re.findall()` with literal patterns in loops
- Repeated pattern compilation in hot paths
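The detection rules above can be approximated with Python's standard `ast` module. The sketch below is a hypothetical illustration, not Unfault's implementation: it flags `re.compile`, `re.match`, `re.search`, and `re.findall` calls whose first argument is a string literal and that sit inside a function or loop body. It only recognizes the `re.<name>(...)` call form; the class and function names are invented for this example.

```python
import ast

REGEX_FUNCS = {"compile", "match", "search", "findall"}

class RegexInHotPath(ast.NodeVisitor):
    """Collect (line, call) for literal-pattern re calls nested in functions or loops."""

    def __init__(self) -> None:
        self.depth = 0  # number of enclosing function/loop scopes
        self.findings: list[tuple[int, str]] = []

    def _enter(self, node: ast.AST) -> None:
        self.depth += 1
        self.generic_visit(node)
        self.depth -= 1

    # Functions and loops both count as "hot" scopes in this sketch.
    visit_FunctionDef = visit_AsyncFunctionDef = _enter
    visit_For = visit_AsyncFor = visit_While = _enter

    def visit_Call(self, node: ast.Call) -> None:
        func = node.func
        is_re_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "re"
            and func.attr in REGEX_FUNCS
        )
        has_literal = (
            bool(node.args)
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, (str, bytes))
        )
        if self.depth > 0 and is_re_call and has_literal:
            self.findings.append((node.lineno, f"re.{func.attr}"))
        self.generic_visit(node)  # keep walking nested calls

def find_regex_in_hot_paths(source: str) -> list[tuple[int, str]]:
    visitor = RegexInHotPath()
    visitor.visit(ast.parse(source))
    return visitor.findings
```

A real checker would also need to resolve aliases such as `import re as regex` and `from re import match`; those cases are left out to keep the sketch short.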
Auto-Fix
Unfault generates patches that move regex compilation to module level:
```python
import re

# Patched: moved to module level
_PATTERN_1 = re.compile(r'\d{4}-\d{2}-\d{2}')

def process_dates(items):
    for item in items:
        if _PATTERN_1.match(item.date):
            process(item)
```
Performance Impact
| Approach | Time for 10,000 matches |
|---|---|
| Compiled in loop | ~150ms |
| Pre-compiled | ~12ms |
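Absolute timings depend on the pattern, input, hardware, and Python version. A minimal `timeit` harness along these lines can reproduce the comparison locally; the sample data and function names are made up for illustration. Because of `re`'s cache, the "compiled in loop" variant mostly measures the per-call compile-and-lookup overhead rather than a full compilation on every iteration.

```python
import re
import timeit

DATE_TEXT = r'\d{4}-\d{2}-\d{2}'
DATE_PATTERN = re.compile(DATE_TEXT)

# Illustrative input: half the strings look like dates, half do not.
SAMPLES = ["2024-01-31", "not-a-date"] * 5_000  # 10,000 strings

def count_compiled_in_loop(items: list[str]) -> int:
    # re.compile() runs on every iteration (served from re's cache after the first call).
    return sum(1 for s in items if re.compile(DATE_TEXT).match(s))

def count_precompiled(items: list[str]) -> int:
    # The pattern object is built once at import time and reused.
    return sum(1 for s in items if DATE_PATTERN.match(s))

if __name__ == "__main__":
    for fn in (count_compiled_in_loop, count_precompiled):
        seconds = timeit.timeit(lambda: fn(SAMPLES), number=10) / 10
        print(f"{fn.__name__}: {seconds * 1000:.2f} ms per 10,000 matches")
```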