| 1 | +# Homograph / Homoglyph Attacks in Phishing |
| 2 | + |
| 3 | +{{#include ../../banners/hacktricks-training.md}} |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +A homograph (aka homoglyph) attack abuses the fact that many **Unicode code points from non-Latin scripts are visually identical or extremely similar to ASCII characters**. By replacing one or more Latin characters with their look-alike counterparts, an attacker can craft: |
| 8 | + |
| 9 | +* Display names, subjects or message bodies that look legitimate to the human eye but bypass keyword-based detections. |
| 10 | +* Domains, sub-domains or URL paths that fool victims into believing they are visiting a trusted site. |
| 11 | + |
| 12 | +Because software compares strings by their **Unicode code points** rather than by their rendered glyphs, a single substituted character is enough to defeat naïve string comparisons (e.g., `"Ρаypal.com"`, which starts with Greek capital Rho and Cyrillic `а`, vs. `"Paypal.com"`). |
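A minimal sketch of why a naive comparison fails, using only the standard library. The lookalike string (Greek capital Rho plus Cyrillic `а`) is illustrative, and `describe` is a hypothetical helper name, not part of any tool referenced here.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for every character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

legit = "Paypal.com"
lookalike = "\u03a1\u0430ypal.com"  # Greek capital Rho + Cyrillic small a

print(legit == lookalike)  # False: the code points differ
# Compatibility normalisation does not fold cross-script lookalikes either
print(unicodedata.normalize("NFKC", lookalike) == legit)  # False

# Show exactly which positions were substituted
for (_, cp1, n1), (_, cp2, n2) in zip(describe(legit), describe(lookalike)):
    if cp1 != cp2:
        print(f"{cp1} {n1}  vs  {cp2} {n2}")
```

Printing the code points next to their names is usually the fastest way to confirm a suspected substitution during triage.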
| 13 | + |
| 14 | +## Typical Phishing Workflow |
| 15 | + |
| 16 | +1. **Craft message content** – Replace specific Latin letters in the impersonated brand / keyword with visually indistinguishable characters from another script (Greek, Cyrillic, Armenian, Cherokee, etc.). |
| 17 | +2. **Register supporting infrastructure** – Optionally register a homoglyph domain and obtain a TLS certificate (most CAs perform no visual-similarity checks). |
| 18 | +3. **Send email / SMS** – The message contains homoglyphs in one or more of the following locations: |
| 19 | +   * Sender display name (e.g., `Ηеlрdеѕk`) |
| 20 | +   * Subject line (`Urgеnt Аctіon Rеquіrеd`) |
| 21 | +   * Hyperlink text or fully qualified domain name |
| 22 | +4. **Redirect chain** – Victim is bounced through seemingly benign websites or URL shorteners before landing on the malicious host that harvests credentials / delivers malware. |
| 23 | + |
| 24 | +## Unicode Ranges Commonly Abused |
| 25 | + |
| 26 | +| Script | Range | Example glyph | Looks like | |
| 27 | +|--------|-------|---------------|------------| |
| 28 | +| Greek | U+0370-03FF | `Η` (U+0397) | Latin `H` | |
| 29 | +| Greek | U+0370-03FF | `ρ` (U+03C1) | Latin `p` | |
| 30 | +| Cyrillic | U+0400-04FF | `а` (U+0430) | Latin `a` | |
| 31 | +| Cyrillic | U+0400-04FF | `е` (U+0435) | Latin `e` | |
| 32 | +| Armenian | U+0530-058F | `օ` (U+0585) | Latin `o` | |
| 33 | +| Cherokee | U+13A0-13FF | `Ꭲ` (U+13A2) | Latin `T` | |
| 34 | + |
| 35 | +> Tip: Full Unicode charts are available at [unicode.org](https://home.unicode.org/). |
| 36 | + |
| 37 | +## Detection Techniques |
| 38 | + |
| 39 | +### 1. Mixed-Script Inspection |
| 40 | + |
| 41 | +Legitimate email sent to an English-speaking organisation rarely mixes characters from multiple scripts. A simple but effective heuristic is to: |
| 42 | + |
| 43 | +1. Iterate each character of the inspected string. |
| 44 | +2. Map the code point to its Unicode script (the first word of the character name is a workable approximation). |
| 45 | +3. Raise an alert if more than one script is present **or** if non-Latin scripts appear where they are not expected (display name, domain, subject, URL, etc.). |
| 46 | + |
| 47 | +Python proof-of-concept: |
| 48 | + |
| 49 | +```python |
| 50 | +import unicodedata as ud |
| 51 | +from collections import defaultdict |
| 52 | + |
| 53 | +SUSPECT_FIELDS = { |
| 54 | +    "display_name": "Ηоmоgraph Illusion",  # example data |
| 55 | +    "subject": "Finаnꮯiаl Տtatеmеnt", |
| 56 | +    "url": "https://xn--messageconnecton-2kb.blob.core.windows.net"  # punycode is pure ASCII: decode it before inspection |
| 57 | +} |
| 58 | + |
| 59 | +for field, value in SUSPECT_FIELDS.items(): |
| 60 | +    blocks = defaultdict(int) |
| 61 | +    for ch in value: |
| 62 | +        if not ch.isalpha(): |
| 63 | +            continue  # ignore digits, spaces and punctuation |
| 64 | +        # The first word of the character name approximates the script |
| 65 | +        name = ud.name(ch, 'UNKNOWN') |
| 66 | +        block = name.split(' ')[0]     # e.g., 'LATIN', 'CYRILLIC' |
| 67 | +        blocks[block] += 1 |
| 68 | +    if len(blocks) > 1: |
| 69 | +        print(f"[!] Mixed scripts in {field}: {dict(blocks)} -> {value}") |
| 70 | +``` |
| 71 | + |
| 72 | +### 2. Punycode Normalisation (Domains) |
| 73 | + |
| 74 | +Internationalised Domain Names (IDNs) are encoded with **punycode** (`xn--`). Converting every hostname to punycode and then back to Unicode allows matching against a whitelist or performing similarity checks (e.g., Levenshtein distance) **after** the string has been normalised. |
| 75 | + |
| 76 | +```python |
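| 77 | +import idna  # third-party package (pip install idna) |
| 78 | +hostname = "Ρаypal.com"   # Greek Rho + Cyrillic a |
| 79 | +puny = idna.encode(hostname, uts46=True).decode()  # uts46 lowercases the Rho, which strict IDNA 2008 rejects |
| 80 | +print(puny)  # ASCII-compatible form: the lookalike label becomes an "xn--" label |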
| 81 | +``` |
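The reverse direction (punycode back to Unicode) can be sketched with the standard library's `punycode` codec alone. This is a simplified illustration: `to_unicode`, `to_ascii` and `ALLOWLIST` are hypothetical names, and the helpers skip the mapping and validation steps that a full IDNA implementation performs.

```python
def to_unicode(hostname):
    """Decode every xn-- label of a hostname back to Unicode."""
    labels = []
    for label in hostname.lower().split("."):
        if label.startswith("xn--"):
            label = label[4:].encode("ascii").decode("punycode")
        labels.append(label)
    return ".".join(labels)

def to_ascii(hostname):
    """Encode non-ASCII labels as xn-- labels (no IDNA mapping or validation)."""
    labels = []
    for label in hostname.split("."):
        if not label.isascii():
            label = "xn--" + label.encode("punycode").decode("ascii")
        labels.append(label)
    return ".".join(labels)

ALLOWLIST = {"paypal.com"}  # hypothetical allowlist of trusted hostnames

lookalike = "\u03c1\u0430ypal.com"  # Greek small rho + Cyrillic small a
ace = to_ascii(lookalike)
decoded = to_unicode(ace)
print(ace, "->", decoded, "| allowlisted:", decoded in ALLOWLIST)
```

Comparing the decoded Unicode form, rather than the raw `xn--` string, is what makes later script checks or edit-distance comparisons meaningful.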
| 82 | + |
| 83 | +### 3. Homoglyph Dictionaries / Algorithms |
| 84 | + |
| 85 | +Tools such as **dnstwist** (homoglyph fuzzer) or **urlcrazy** can enumerate visually similar domain permutations and are useful for proactive takedown / monitoring. |
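A homoglyph dictionary can be sketched in a few lines. The mapping below is an assumption limited to the six glyphs listed under "Unicode Ranges Commonly Abused"; production tooling would need a far larger confusables table. `CONFUSABLES`, `skeleton` and `impersonates` are hypothetical names.

```python
# Lookalike -> Latin mapping, restricted to the glyphs in the table on this page
CONFUSABLES = {
    "\u0397": "H",  # Greek capital Eta
    "\u03c1": "p",  # Greek small rho
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u0585": "o",  # Armenian small oh
    "\u13a2": "T",  # Cherokee letter I
}

def skeleton(text):
    """Fold known lookalike characters to their Latin counterparts."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)

def impersonates(candidate, brands):
    """Return the brand a string imitates, or None. An exact match is not imitation."""
    folded = skeleton(candidate).lower()
    for brand in brands:
        if folded == brand.lower() and candidate.lower() != brand.lower():
            return brand
    return None

print(impersonates("p\u0430yp\u0430l", ["paypal"]))  # paypal
print(impersonates("paypal", ["paypal"]))            # None
```

Folding to a skeleton before comparison catches substitutions that a plain string match misses, while the exact-match guard keeps the genuine brand name from being flagged.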
| 86 | + |
| 87 | +## Prevention & Mitigation |
| 88 | + |
| 89 | +* Enforce strict DMARC/DKIM/SPF policies – prevent spoofing from unauthorised domains. |
| 90 | +* Implement the detection logic above in **Secure Email Gateways** and **SIEM/XSOAR** playbooks. |
| 91 | +* Flag or quarantine messages where the domain shown in the display name differs from the actual sender domain. |
| 92 | +* Educate users: copy-paste suspicious text into a Unicode inspector, hover over links, never trust URL shorteners. |
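The display-name check from the list above could look roughly like this. It is a sketch under stated assumptions: `display_name_mismatch` and `DOMAIN_RE` are hypothetical names, the sample headers are made up, and the regular expression only matches ASCII domain-like tokens, so lookalike characters would have to be folded to Latin first.

```python
import re
from email.utils import parseaddr

# Matches ASCII domain-like tokens such as "paypal.com" inside a display name
DOMAIN_RE = re.compile(r"[a-z0-9-]+(?:\.[a-z0-9-]+)+", re.IGNORECASE)

def display_name_mismatch(from_header):
    """Return (claimed, actual) when the display name names a domain
    that differs from the real sender domain, otherwise None."""
    display_name, address = parseaddr(from_header)
    actual = address.rpartition("@")[2].lower()
    for claimed in DOMAIN_RE.findall(display_name):
        if claimed.lower() != actual:
            return claimed.lower(), actual
    return None

print(display_name_mismatch('"paypal.com Support" <billing@example.net>'))
print(display_name_mismatch('"Example Billing" <billing@example.net>'))
```

The first header yields a `(claimed, actual)` pair that a gateway rule could quarantine on; the second contains no domain-like token in the display name and returns `None`.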
| 93 | + |
| 94 | +## Real-World Examples |
| 95 | + |
| 96 | +* Display name: `Сonfidеntiаl Ꭲiꮯkеt` (Cyrillic `С`, `е`, `а`; Cherokee `Ꭲ` and `ꮯ`). |
| 97 | +* Domain chain: `bestseoservices.com` ➜ municipal `/templates` directory ➜ `kig.skyvaulyt.ru` ➜ fake Microsoft login at `mlcorsftpsswddprotcct.approaches.it.com` protected by custom OTP CAPTCHA. |
| 98 | +* Spotify impersonation: `Sρօtifս` sender with link hidden behind `redirects.ca`. |
| 99 | + |
| 100 | +These samples originate from Unit 42 research (July 2025) and illustrate how homograph abuse is combined with URL redirection and CAPTCHA evasion to bypass automated analysis. |
| 101 | + |
| 102 | +## References |
| 103 | + |
| 104 | +- [The Homograph Illusion: Not Everything Is As It Seems](https://unit42.paloaltonetworks.com/homograph-attacks/) |
| 105 | +- [Unicode Character Database](https://home.unicode.org/) |
| 106 | +- [dnstwist – domain permutation engine](https://github.com/elceef/dnstwist) |
| 107 | + |
| 108 | +{{#include ../../banners/hacktricks-training.md}} |