
python.regex_compile

Category: Performance | Severity: Medium

Detects regex patterns that are compiled repeatedly inside loops or frequently called functions instead of being compiled once at module level.

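Compiling regex patterns repeatedly:

  • Wastes CPU cycles: compiling a pattern is expensive relative to a single match
  • Increases latency: each request pays the compilation (or cache-lookup) cost
  • Relies on a bounded cache: Python's re module caches compiled patterns, but the cache holds only a few hundred entries, so code that uses many distinct patterns can evict entries and trigger recompilation, and even a cache hit costs a lookup on every call
  • Degrades under load: the overhead scales with request rate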
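The cache behavior described above can be observed directly. The sketch below assumes CPython's current re implementation, which returns the same cached Pattern object when the same pattern string is compiled again and evicts old entries once its bounded cache fills. The function names are hypothetical, and the exact cache size differs between Python versions.

```python
import re

PATTERN = r"\d{4}-\d{2}-\d{2}"


def cache_hit_returns_same_object() -> bool:
    """Compile the same pattern string twice and check whether re reused its cached object."""
    re.purge()  # clear re's internal cache so the experiment starts cold
    first = re.compile(PATTERN)
    return re.compile(PATTERN) is first


def still_cached_after(n_other: int) -> bool:
    """Compile n_other distinct patterns, then check whether PATTERN is still cached."""
    re.purge()
    first = re.compile(PATTERN)
    for i in range(n_other):
        re.compile(f"item-{i}")  # each distinct string becomes a new cache entry
    return re.compile(PATTERN) is first


if __name__ == "__main__":
    print(cache_hit_returns_same_object())
    print(still_cached_after(10))
    print(still_cached_after(2_000))
```

On recent CPython releases the first two checks report a cache hit, while compiling a couple of thousand other patterns pushes PATTERN out of the cache, so the next call recompiles it.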
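```python
import re

# ❌ Before: the pattern is compiled (or fetched from re's cache) on every call
def validate_email(email):
    pattern = re.compile(r'^[\w\.-]+@[\w\.-]+\.\w+$')
    return pattern.match(email) is not None

def process_items(items):
    for item in items:
        # re.match() resolves the pattern string on every iteration
        if re.match(r'\d{4}-\d{2}-\d{2}', item.date):
            process(item)  # process() is application code defined elsewhere
```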
```python
import re

# ✅ After: compiled once, when the module is imported
EMAIL_PATTERN = re.compile(r'^[\w\.-]+@[\w\.-]+\.\w+$')
DATE_PATTERN = re.compile(r'\d{4}-\d{2}-\d{2}')

def validate_email(email):
    return EMAIL_PATTERN.match(email) is not None

def process_items(items):
    for item in items:
        if DATE_PATTERN.match(item.date):
            process(item)  # process() is application code defined elsewhere
```
The rule flags:

  • re.compile() calls inside functions or loops
  • re.match(), re.search(), or re.findall() called with literal patterns inside loops
  • Repeated pattern compilation in hot paths
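One way a check like this could work is to walk the syntax tree and flag re calls whose pattern argument is a string literal and that execute on every loop iteration. The following is a minimal sketch using the standard ast module, not Unfault's implementation; LoopRegexFinder and find_loop_regexes are hypothetical names. It covers only the loop case and ignores comprehensions, aliased imports such as "from re import match", and the function-level case.

```python
from __future__ import annotations

import ast

# re functions whose first positional argument is the pattern.
RE_FUNCS = {"compile", "match", "search", "fullmatch", "findall", "finditer", "sub", "split"}


class LoopRegexFinder(ast.NodeVisitor):
    """Record re.<func>("literal", ...) calls that run on every loop iteration."""

    def __init__(self) -> None:
        self.loop_depth = 0
        self.findings: list[tuple[int, str]] = []

    def _visit_in_loop(self, nodes: list[ast.AST]) -> None:
        self.loop_depth += 1
        for child in nodes:
            self.visit(child)
        self.loop_depth -= 1

    def visit_For(self, node: ast.For) -> None:
        self.visit(node.iter)           # the iterable is evaluated once
        self._visit_in_loop(node.body)  # the body runs on every iteration
        for stmt in node.orelse:        # the else clause runs at most once
            self.visit(stmt)

    def visit_While(self, node: ast.While) -> None:
        # The condition is re-evaluated on every iteration, so it counts as loop code.
        self._visit_in_loop([node.test, *node.body])
        for stmt in node.orelse:
            self.visit(stmt)

    def visit_Call(self, node: ast.Call) -> None:
        func = node.func
        if (
            self.loop_depth > 0
            and isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "re"
            and func.attr in RE_FUNCS
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            self.findings.append((node.lineno, func.attr))
        self.generic_visit(node)


def find_loop_regexes(source: str) -> list[tuple[int, str]]:
    """Return (line, function) pairs for literal-pattern re calls inside loops."""
    finder = LoopRegexFinder()
    finder.visit(ast.parse(source))
    return finder.findings
```

Treating a for loop's iterable separately from its body avoids flagging code such as "for m in re.findall(...)", where the pattern is resolved only once.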

Unfault generates patches that move regex compilation to module level:

```python
import re

# Patched: moved to module level
_PATTERN_1 = re.compile(r'\d{4}-\d{2}-\d{2}')

def process_dates(items):
    for item in items:
        if _PATTERN_1.match(item.date):
            process(item)  # process() is application code defined elsewhere
```
| Approach | Time for 10,000 matches |
|---|---|
| Compiled in loop | ~150 ms |
| Pre-compiled | ~12 ms |
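Timings like these depend on the machine, the Python version, and the pattern. A minimal timeit sketch for measuring the difference locally follows; the sample data, function names, and pass count are illustrative assumptions, not the benchmark behind the table.

```python
import re
import timeit

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

# Hypothetical input: 10,000 strings, half of which start with a date.
SAMPLES = ["2024-01-15", "not a date"] * 5_000


def compile_in_loop(values: list[str]) -> int:
    # re.compile() runs on every iteration (served from re's cache after the first call).
    return sum(1 for v in values if re.compile(r"\d{4}-\d{2}-\d{2}").match(v))


def module_function_in_loop(values: list[str]) -> int:
    # re.match() with a pattern string performs a cache lookup on every iteration.
    return sum(1 for v in values if re.match(r"\d{4}-\d{2}-\d{2}", v))


def precompiled(values: list[str]) -> int:
    # The compiled Pattern object is reused, so there is no per-iteration lookup.
    return sum(1 for v in values if DATE_PATTERN.match(v))


def benchmark(number: int = 20) -> dict[str, float]:
    """Average seconds per pass over SAMPLES (10,000 matches) for each approach."""
    variants = {
        "compile_in_loop": compile_in_loop,
        "module_function_in_loop": module_function_in_loop,
        "precompiled": precompiled,
    }
    return {
        name: timeit.timeit(lambda fn=fn: fn(SAMPLES), number=number) / number
        for name, fn in variants.items()
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name:>25}: {seconds * 1000:.1f} ms")
```

Because both loop variants hit re's cache after the first iteration, the measured gap mostly reflects per-call lookup overhead; code that cycles through more distinct patterns than the cache holds would also pay for full recompilation.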