Charcuterie: A Visual Unicode Similarity Explorer for Finding Confusable Characters
Charcuterie is an interactive web tool that helps users explore visual similarity between Unicode characters — finding lookalikes that could be used for spoofing, homograph attacks, or creative typography. The project has gained 70 points on Hacker News with 10 comments.
The Problem It Solves
Unicode contains over 149,000 characters (149,186 as of Unicode 15.0), many of which look identical or nearly identical to each other:
- Latin A (U+0041) vs Cyrillic А (U+0410) — visually identical
- Latin O (U+004F) vs Cyrillic О (U+041E) — visually identical
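- Micro sign µ (U+00B5) vs Greek small letter mu μ (U+03BC), nearly identical in most fonts
- Various dashes: hyphen-minus (U+002D), en dash (U+2013), em dash (U+2014), minus sign (U+2212), all easy to mistake for one another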
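The pairs above render alike but are distinct codepoints, which is easy to confirm programmatically. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for this illustration and is not part of Charcuterie.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return a label such as 'U+0041 LATIN CAPITAL LETTER A' for one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

# Lookalike pairs: Latin vs Cyrillic A, Latin vs Cyrillic O, micro sign vs Greek mu.
pairs = [("A", "\u0410"), ("O", "\u041e"), ("\u00b5", "\u03bc")]

for left, right in pairs:
    # The glyphs look the same, but the strings never compare equal.
    print(f"{describe(left)}  |  {describe(right)}  |  equal: {left == right}")
```

Every pair prints `equal: False`, which is why a plain string comparison cannot catch a lookalike substitution.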
These confusable characters are exploited in:
- Domain spoofing: Creating fake URLs that look legitimate
- Phishing attacks: Using lookalike characters in emails
- Code obfuscation: Mixing lookalike characters to hide malicious code
- Social media impersonation: Creating usernames that look identical to real ones
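Many of these attacks mix scripts inside a single name, so one rough defence is to flag identifiers whose letters come from more than one script. The sketch below is a heuristic only: Python's standard library does not expose the Unicode Script property, so it assumes the first word of each character name (LATIN, CYRILLIC, GREEK) is a good enough stand-in. That assumption fails for characters such as the micro sign, whose name starts with MICRO. The function names are hypothetical.

```python
import unicodedata

def rough_scripts(text: str) -> set[str]:
    """Approximate the scripts used in `text` via the first word of each letter's name."""
    scripts = set()
    for ch in text:
        if ch.isalpha():  # skip digits, dots, hyphens and other non-letters
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts

def looks_mixed(text: str) -> bool:
    """True when the letters appear to come from more than one script."""
    return len(rough_scripts(text)) > 1

genuine = "paypal.com"
spoofed = "p\u0430ypal.com"  # second letter is CYRILLIC SMALL LETTER A

print(rough_scripts(genuine), looks_mixed(genuine))
print(rough_scripts(spoofed), looks_mixed(spoofed))
```

A check like this misses whole-script spoofs, where every letter is swapped for a lookalike from the same foreign script, so it complements confusable detection rather than replacing it.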
How Charcuterie Works
The tool provides:
- Visual similarity search: Find characters that look like a given character
- Confusability groups: Characters grouped by visual similarity
- Interactive exploration: Browse and compare characters side by side
- Unicode metadata: See codepoints, categories, and properties
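To make the idea of confusability groups and per-character metadata concrete, here is a small sketch. It does not reflect how Charcuterie computes similarity, which the source does not describe; the `LOOKALIKES` table is a hand-picked, hypothetical sample, and `build_groups` and `metadata` are illustrative names.

```python
import unicodedata

# Hypothetical, hand-picked (character, prototype) pairs for illustration only.
LOOKALIKES = [
    ("\u0410", "A"),  # CYRILLIC CAPITAL LETTER A
    ("\u0391", "A"),  # GREEK CAPITAL LETTER ALPHA
    ("\u041e", "O"),  # CYRILLIC CAPITAL LETTER O
    ("\u039f", "O"),  # GREEK CAPITAL LETTER OMICRON
]

def build_groups(pairs):
    """Group characters under a shared prototype; each group starts with the prototype."""
    groups = {}
    for ch, proto in pairs:
        groups.setdefault(proto, [proto]).append(ch)
    return groups

def metadata(ch):
    """Return (codepoint label, general category, character name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))

for proto, members in build_groups(LOOKALIKES).items():
    print(f"Group {proto!r}:")
    for ch in members:
        print("   ", ch, *metadata(ch))
```

All members of a group share the general category `Lu` (uppercase letter) here, so category alone cannot tell them apart; the codepoint and name can.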
Real-World Applications
- Security: Identify potential homograph attacks in domains and URLs
- Font design: Understand which characters need distinct glyphs
- Localization: Discover encoding issues caused by confusable characters
- Typography: Explore the richness of the Unicode character set
- Data cleaning: Find and fix character substitution errors in datasets
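For the data-cleaning case, a first pass might list unexpected non-ASCII characters and apply Unicode compatibility normalization. The sketch below uses only the standard `unicodedata` module; the helper names are hypothetical. Note that NFKC folds compatibility variants such as the micro sign or fullwidth letters, but it does not map cross-script lookalikes such as Cyrillic А onto Latin A.

```python
import unicodedata

def flag_non_ascii(text):
    """List (index, codepoint label, name) for every character outside ASCII."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

def nfkc(text):
    """Apply NFKC normalization, which folds compatibility variants."""
    return unicodedata.normalize("NFKC", text)

# A dataset value where 'A' was silently replaced by its Cyrillic lookalike.
print(flag_non_ascii("C\u0410T"))

# NFKC rewrites the micro sign to Greek mu and a fullwidth A to plain A ...
print(nfkc("\u00b5m") == "\u03bcm", nfkc("\uff21") == "A")
# ... but leaves the Cyrillic capital A untouched.
print(nfkc("\u0410") == "\u0410")
```

Catching the Cyrillic case needs confusable data of the kind described under Technical Background, not normalization alone.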
The Name
Charcuterie is the French term for cured and prepared meats, a playful reference to the tool slicing up and examining the character set.
Technical Background
Unicode confusability is formally specified in Unicode Technical Standard #39, Unicode Security Mechanisms (UTS #39), which provides confusable-character data and algorithms for detecting confusable strings. Charcuterie makes this data accessible through a visual interface rather than technical documentation.
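The core idea in that specification is a "skeleton": two strings are treated as confusable when their skeletons match. The sketch below is a simplified reading of that procedure (decompose with NFD, map each character to its prototype, decompose again) and omits details of the full standard. The `SAMPLE` entries are written in the style of the published confusables.txt data file and should be checked against the real file before being relied on; none of this is taken from Charcuterie's own code.

```python
import unicodedata

# A few entries in the style of confusables.txt: source ; target ; type # comment
SAMPLE = """
0410 ; 0041 ; MA  # CYRILLIC CAPITAL LETTER A -> LATIN CAPITAL LETTER A
0430 ; 0061 ; MA  # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
043E ; 006F ; MA  # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
0440 ; 0070 ; MA  # CYRILLIC SMALL LETTER ER -> LATIN SMALL LETTER P
"""

def parse_confusables(text):
    """Build a {source character: prototype string} table from confusables-style lines."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        source = chr(int(fields[0], 16))
        # The target may be a sequence of several space-separated codepoints.
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table

def skeleton(text, table):
    """Simplified skeleton: NFD, replace each character by its prototype, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(table.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

def confusable(a, b, table):
    """True when both strings reduce to the same skeleton."""
    return skeleton(a, table) == skeleton(b, table)

TABLE = parse_confusables(SAMPLE)
spoofed = "\u0440\u0430ypal"  # Cyrillic er and a, followed by Latin 'ypal'
print(confusable(spoofed, "paypal", TABLE))
```

With the full data file loaded in place of `SAMPLE`, the same comparison covers far more lookalikes than this four-entry table.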
Source: elastiq.ch / HN — 70 points, 10 comments