Commit bfaada8

authored
Merge pull request #1191 from HackTricks-wiki/update_The_Homograph_Illusion__Not_Everything_Is_As_It_Se_20250726_013005
The Homograph Illusion Not Everything Is As It Seems
2 parents 137ecc9 + 6fd514c commit bfaada8

File tree

3 files changed

+113
-0
lines changed

src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@
3232
- [Clone a Website](generic-methodologies-and-resources/phishing-methodology/clone-a-website.md)
3333
- [Detecting Phishing](generic-methodologies-and-resources/phishing-methodology/detecting-phising.md)
3434
- [Discord Invite Hijacking](generic-methodologies-and-resources/phishing-methodology/discord-invite-hijacking.md)
35+
- [Homograph Attacks](generic-methodologies-and-resources/phishing-methodology/homograph-attacks.md)
3536
- [Mobile Phishing Malicious Apps](generic-methodologies-and-resources/phishing-methodology/mobile-phishing-malicious-apps.md)
3637
- [Phishing Files & Documents](generic-methodologies-and-resources/phishing-methodology/phishing-documents.md)
3738
- [Basic Forensic Methodology](generic-methodologies-and-resources/basic-forensic-methodology/README.md)

src/generic-methodologies-and-resources/phishing-methodology/README.md

Lines changed: 4 additions & 0 deletions
@@ -25,6 +25,10 @@
2525
- **Hyphenated subdomain**: Replace the **dot with a hyphen** in a subdomain (e.g., www-zelster.com).
2626
- **New TLD**: Same domain using a **new TLD** (e.g., zelster.org)
2727
- **Homoglyph**: It **replaces** a letter in the domain name with **letters that look similar** (e.g., zelfser.com).
28+
29+
{{#ref}}
30+
homograph-attacks.md
31+
{{#endref}}
2832
- **Transposition:** It **swaps two letters** within the domain name (e.g., zelsetr.com).
2933
- **Singularization/Pluralization**: Adds or removes “s” at the end of the domain name (e.g., zeltsers.com).
3034
- **Omission**: It **removes one** of the letters from the domain name (e.g., zelser.com).
src/generic-methodologies-and-resources/phishing-methodology/homograph-attacks.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
1+
# Homograph / Homoglyph Attacks in Phishing
2+
3+
{{#include ../../banners/hacktricks-training.md}}
4+
5+
## Overview
6+
7+
A homograph (aka homoglyph) attack abuses the fact that many **Unicode code points from non-Latin scripts are visually identical or extremely similar to ASCII characters**. By replacing one or more Latin characters with their look-alike counterparts, an attacker can craft:
8+
9+
* Display names, subjects or message bodies that look legitimate to the human eye but bypass keyword-based detections.
10+
* Domains, sub-domains or URL paths that fool victims into believing they are visiting a trusted site.
11+
12+
Because every glyph is identified internally by its **Unicode code point**, a single substituted character is enough to defeat naïve string comparisons (e.g., `"Παypal.com"` vs. `"Paypal.com"`).
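
A minimal Python sketch of why the comparison fails, using only the standard library. The helper name `describe` is hypothetical, and the spoofed string is written with escapes so the substituted code points are explicit:

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

legit = "Paypal.com"
# Greek capital Pi + Greek small alpha in place of Latin "Pa"
spoof = "\u03a0\u03b1ypal.com"

# The strings render similarly but are different code point sequences
print(legit == spoof)
for ch, cp, name in describe(spoof)[:2]:
    print(ch, cp, name)
```

Printing the code point and character name for each position makes the substitution visible even when the rendered glyphs are not distinguishable.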
13+
14+
## Typical Phishing Workflow
15+
16+
1. **Craft message content** – Replace specific Latin letters in the impersonated brand / keyword with visually indistinguishable characters from another script (Greek, Cyrillic, Armenian, Cherokee, etc.).
17+
2. **Register supporting infrastructure** – Optionally register a homoglyph domain and obtain a TLS certificate (most CAs perform no visual-similarity checks).
18+
3. **Send email / SMS** – The message contains homoglyphs in one or more of the following locations:
19+
* Sender display name (e.g., `Ηеlрdеѕk`)
20+
* Subject line (`Urgеnt Аctіon Rеquіrеd`)
21+
* Hyperlink text or fully qualified domain name
22+
4. **Redirect chain** – Victim is bounced through seemingly benign websites or URL shorteners before landing on the malicious host that harvests credentials / delivers malware.
23+
24+
## Unicode Ranges Commonly Abused
25+
26+
| Script | Range | Example glyph | Looks like |
|--------|-------|---------------|------------|
| Greek | U+0370-03FF | `Η` (U+0397) | Latin `H` |
| Greek | U+0370-03FF | `ρ` (U+03C1) | Latin `p` |
| Cyrillic | U+0400-04FF | `а` (U+0430) | Latin `a` |
| Cyrillic | U+0400-04FF | `е` (U+0435) | Latin `e` |
| Armenian | U+0530-058F | `օ` (U+0585) | Latin `o` |
| Cherokee | U+13A0-13FF | `Ꭲ` (U+13A2) | Latin `T` |
34+
35+
> Tip: Full Unicode charts are available at [unicode.org](https://home.unicode.org/).
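
The ranges in the table can be turned into a simple lookup. The sketch below is illustrative only: `BLOCKS` and `block_of` are hypothetical names, and the list covers just the four blocks mentioned here, so a real deployment would need a fuller table.

```python
# Block ranges copied from the table above; anything else is reported as "Other".
BLOCKS = [
    ("Greek", 0x0370, 0x03FF),
    ("Cyrillic", 0x0400, 0x04FF),
    ("Armenian", 0x0530, 0x058F),
    ("Cherokee", 0x13A0, 0x13FF),
]

def block_of(ch):
    """Return the name of the block containing ch, based on its code point."""
    cp = ord(ch)
    if cp < 0x80:
        return "ASCII"
    for name, low, high in BLOCKS:
        if low <= cp <= high:
            return name
    return "Other"

# The example glyphs from the table, written as escapes
for ch in "\u0397\u03c1\u0430\u0435\u0585\u13a2":
    print(f"U+{ord(ch):04X} -> {block_of(ch)}")
```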
36+
37+
## Detection Techniques
38+
39+
### 1. Mixed-Script Inspection
40+
41+
Legitimate emails sent to an English-speaking organisation rarely mix characters from multiple scripts, so mixed-script text in a message is a strong phishing indicator. A simple but effective heuristic is to:
42+
43+
1. Iterate each character of the inspected string.
44+
2. Map the code point to its Unicode block.
45+
3. Raise an alert if more than one script is present **or** if non-Latin scripts appear where they are not expected (display name, domain, subject, URL, etc.).
46+
47+
Python proof-of-concept:
48+
49+
```python
import unicodedata as ud
from collections import defaultdict

SUSPECT_FIELDS = {
    "display_name": "Ηоmоgraph Illusion",  # example data
    "subject": "Finаnꮯiаl Տtatеmеnt",
    # Punycode is pure ASCII: decode xn-- labels to Unicode before inspecting them
    "url": "https://xn--messageconnecton-2kb.blob.core.windows.net"
}

for field, value in SUSPECT_FIELDS.items():
    blocks = defaultdict(int)
    for ch in value:
        if not ch.isalpha():
            continue  # digits, spaces and punctuation carry no script information
        # The first word of the character name approximates the script,
        # e.g. 'LATIN', 'CYRILLIC', 'GREEK'
        name = ud.name(ch, 'UNKNOWN')
        blocks[name.split(' ')[0]] += 1
    if len(blocks) > 1:
        print(f"[!] Mixed scripts in {field}: {dict(blocks)} -> {value}")
```
71+
72+
### 2. Punycode Normalisation (Domains)
73+
74+
Internationalised Domain Names (IDNs) are encoded with **punycode** (`xn--`). Converting every hostname to punycode and then back to Unicode allows matching against a whitelist or performing similarity checks (e.g., Levenshtein distance) **after** the string has been normalised.
75+
76+
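```python
import idna

hostname = "Ρаypal.com"  # Greek Rho + Cyrillic a
# uts46=True lower-cases the label first; strict IDNA 2008 rejects the capital Rho
puny = idna.encode(hostname, uts46=True).decode()
print(puny)  # ASCII form; the first label starts with "xn--"
```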
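
The reverse direction, decoding `xn--` labels and comparing the result against a list of protected domains, could be sketched as follows. This uses Python's built-in `punycode` codec rather than the `idna` package. `CONFUSABLES`, `ALLOWLIST` and all function names are hypothetical, and the mapping contains only look-alikes mentioned on this page.

```python
# Hypothetical confusables map: homoglyph -> Latin letter it imitates
CONFUSABLES = {"\u03c1": "p", "\u0430": "a", "\u0435": "e", "\u0585": "o"}
ALLOWLIST = {"paypal.com"}  # example protected domains

def to_unicode(hostname):
    """Decode every xn-- label of a hostname back to Unicode."""
    labels = []
    for label in hostname.lower().split("."):
        if label.startswith("xn--"):
            label = label[4:].encode("ascii").decode("punycode")
        labels.append(label)
    return ".".join(labels)

def skeleton(hostname):
    """Replace known homoglyphs with the Latin letters they imitate."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in to_unicode(hostname))

def is_lookalike(hostname):
    """True if the name is not allow-listed but collapses onto an allow-listed one."""
    return to_unicode(hostname) not in ALLOWLIST and skeleton(hostname) in ALLOWLIST

# Build a spoofed hostname: Greek rho + Cyrillic a + "ypal"
spoof = "xn--" + "\u03c1\u0430ypal".encode("punycode").decode("ascii") + ".com"
print(spoof, is_lookalike(spoof), is_lookalike("paypal.com"))
```

Comparing skeletons instead of raw strings is one way to perform the similarity check after normalisation; an edit-distance comparison could be layered on top for near misses.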
82+
83+
### 3. Homoglyph Dictionaries / Algorithms
84+
85+
Tools such as **dnstwist** (`--fuzzers homoglyph`) or **urlcrazy** can enumerate visually similar domain permutations and are useful for proactive takedown / monitoring.
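
To illustrate the idea behind such tools (this is not how dnstwist or urlcrazy are implemented), a permutation generator can be sketched in a few lines. `HOMOGLYPHS` and `homoglyph_variants` are hypothetical names, and the map is deliberately tiny:

```python
from itertools import product

# Hypothetical, deliberately small map: Latin letter -> look-alike code points
HOMOGLYPHS = {
    "a": ["\u0430"],  # Cyrillic a
    "e": ["\u0435"],  # Cyrillic e
    "o": ["\u0585"],  # Armenian o
    "p": ["\u03c1"],  # Greek rho
}

def homoglyph_variants(label):
    """Yield every variant of label with one or more letters substituted."""
    choices = [[ch] + HOMOGLYPHS.get(ch, []) for ch in label]
    for combo in product(*choices):
        candidate = "".join(combo)
        if candidate != label:
            yield candidate

variants = list(homoglyph_variants("paypal"))
print(len(variants))  # 15: four substitutable letters give 2**4 combinations, minus the original
```

Each variant can then be converted to its `xn--` form and checked against DNS or certificate transparency logs as part of brand monitoring.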
86+
87+
## Prevention & Mitigation
88+
89+
* Enforce strict DMARC/DKIM/SPF policies – prevent spoofing from unauthorised domains.
90+
* Implement the detection logic above in **Secure Email Gateways** and **SIEM/XSOAR** playbooks.
91+
* Flag or quarantine messages where display name domain ≠ sender domain.
92+
* Educate users: copy-paste suspicious text into a Unicode inspector, hover links, never trust URL shorteners.
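
The display-name check in the list above could be prototyped as below. This is a sketch under assumptions: `display_name_mismatch` and `DOMAIN_RE` are hypothetical, the regular expression only recognises ASCII domain-like tokens, and homoglyph-laden display names would need to be normalised first.

```python
import re
from email.utils import parseaddr

# Matches ASCII tokens that look like a domain, e.g. "paypal.com"
DOMAIN_RE = re.compile(r"\b([a-z0-9-]+(?:\.[a-z0-9-]+)+)\b", re.IGNORECASE)

def display_name_mismatch(from_header):
    """Return domains claimed in the display name that differ from the sender domain."""
    display, address = parseaddr(from_header)
    sender_domain = address.rpartition("@")[2].lower()
    claimed = [d.lower() for d in DOMAIN_RE.findall(display)]
    return [
        d for d in claimed
        if d != sender_domain and not sender_domain.endswith("." + d)
    ]

print(display_name_mismatch('"paypal.com Billing" <billing@evil.example>'))
print(display_name_mismatch('PayPal Support <service@paypal.com>'))
```

A non-empty result means the display name advertises a domain the message was not sent from, which is a reasonable trigger for flagging or quarantine.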
93+
94+
## Real-World Examples
95+
96+
* Display name: `Сonfidеntiаl Ꭲiꮯkеt` (Cyrillic `С`, `е`, `а`; Cherokee `Ꭲ`; `ꮯ`, which mimics a Latin small capital `C`).
97+
* Domain chain: `bestseoservices.com` ➜ municipal `/templates` directory ➜ `kig.skyvaulyt.ru` ➜ fake Microsoft login at `mlcorsftpsswddprotcct.approaches.it.com` protected by custom OTP CAPTCHA.
98+
* Spotify impersonation: `Sρօtifս` sender with link hidden behind `redirects.ca`.
99+
100+
These samples originate from Unit 42 research (July 2025) and illustrate how homograph abuse is combined with URL redirection and CAPTCHA evasion to bypass automated analysis.
101+
102+
## References
103+
104+
- [The Homograph Illusion: Not Everything Is As It Seems](https://unit42.paloaltonetworks.com/homograph-attacks/)
105+
- [Unicode Character Database](https://home.unicode.org/)
106+
- [dnstwist – domain permutation engine](https://github.com/elceef/dnstwist)
107+
108+
{{#include ../../banners/hacktricks-training.md}}
