diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index 4a374c24d79..4f96c20dc55 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -32,6 +32,7 @@
   - [Clone a Website](generic-methodologies-and-resources/phishing-methodology/clone-a-website.md)
   - [Detecting Phishing](generic-methodologies-and-resources/phishing-methodology/detecting-phising.md)
   - [Discord Invite Hijacking](generic-methodologies-and-resources/phishing-methodology/discord-invite-hijacking.md)
+  - [Homograph Attacks](generic-methodologies-and-resources/phishing-methodology/homograph-attacks.md)
   - [Mobile Phishing Malicious Apps](generic-methodologies-and-resources/phishing-methodology/mobile-phishing-malicious-apps.md)
   - [Phishing Files & Documents](generic-methodologies-and-resources/phishing-methodology/phishing-documents.md)
 - [Basic Forensic Methodology](generic-methodologies-and-resources/basic-forensic-methodology/README.md)
diff --git a/src/generic-methodologies-and-resources/phishing-methodology/README.md b/src/generic-methodologies-and-resources/phishing-methodology/README.md
index a5ac2912c70..7c77192d691 100644
--- a/src/generic-methodologies-and-resources/phishing-methodology/README.md
+++ b/src/generic-methodologies-and-resources/phishing-methodology/README.md
@@ -25,6 +25,10 @@
 - **hypened subdomain**: Change the **dot for a hyphen** of a subdomain (e.g., www-zelster.com).
 - **New TLD**: Same domain using a **new TLD** (e.g., zelster.org)
 - **Homoglyph**: It **replaces** a letter in the domain name with **letters that look similar** (e.g., zelfser.com).
+
+{{#ref}}
+homograph-attacks.md
+{{#endref}}
 - **Transposition:** It **swaps two letters** within the domain name (e.g., zelsetr.com).
 - **Singularization/Pluralization**: Adds or removes “s” at the end of the domain name (e.g., zeltsers.com).
 - **Omission**: It **removes one** of the letters from the domain name (e.g., zelser.com).
diff --git a/src/generic-methodologies-and-resources/phishing-methodology/homograph-attacks.md b/src/generic-methodologies-and-resources/phishing-methodology/homograph-attacks.md
new file mode 100644
index 00000000000..d540597c6fa
--- /dev/null
+++ b/src/generic-methodologies-and-resources/phishing-methodology/homograph-attacks.md
@@ -0,0 +1,108 @@
+# Homograph / Homoglyph Attacks in Phishing
+
+{{#include ../../banners/hacktricks-training.md}}
+
+## Overview
+
+A homograph (aka homoglyph) attack abuses the fact that many **Unicode code points from non-Latin scripts are visually identical or extremely similar to ASCII characters**. By replacing one or more Latin characters with their look-alike counterparts, an attacker can craft:
+
+* Display names, subjects or message bodies that look legitimate to the human eye but bypass keyword-based detections.
+* Domains, sub-domains or URL paths that fool victims into believing they are visiting a trusted site.
+
+Because every character is identified internally by its **Unicode code point**, a single substituted character is enough to defeat naïve string comparisons (e.g., `"Ραypal.com"`, written with Greek capital Rho and alpha, vs. `"Paypal.com"`).
+
+## Typical Phishing Workflow
+
+1. **Craft message content** – Replace specific Latin letters in the impersonated brand / keyword with visually indistinguishable characters from another script (Greek, Cyrillic, Armenian, Cherokee, etc.).
+2. **Register supporting infrastructure** – Optionally register a homoglyph domain and obtain a TLS certificate (most CAs perform no visual-similarity checks).
+3. **Send email / SMS** – The message contains homoglyphs in one or more of the following locations:
+   * Sender display name (e.g., `Ηеlрdеѕk`)
+   * Subject line (`Urgеnt Аctіon Rеquіrеd`)
+   * Hyperlink text or fully qualified domain name
+4. **Redirect chain** – The victim is bounced through seemingly benign websites or URL shorteners before landing on the malicious host that harvests credentials / delivers malware.
+
+## Unicode Ranges Commonly Abused
+
+| Script | Range | Example glyph | Looks like |
+|--------|-------|---------------|------------|
+| Greek | U+0370-03FF | `Η` (U+0397) | Latin `H` |
+| Greek | U+0370-03FF | `ρ` (U+03C1) | Latin `p` |
+| Cyrillic | U+0400-04FF | `а` (U+0430) | Latin `a` |
+| Cyrillic | U+0400-04FF | `е` (U+0435) | Latin `e` |
+| Armenian | U+0530-058F | `օ` (U+0585) | Latin `o` |
+| Cherokee | U+13A0-13FF | `Ꭲ` (U+13A2) | Latin `T` |
+
+> Tip: Full Unicode charts are available at [unicode.org](https://home.unicode.org/).
+
+## Detection Techniques
+
+### 1. Mixed-Script Inspection
+
+Legitimate email sent to an English-speaking organisation rarely mixes characters from multiple scripts within a single field. A simple but effective heuristic is to:
+
+1. Iterate over each character of the inspected string.
+2. Map the code point to its script (the proof-of-concept approximates this with the first word of the Unicode character name).
+3. Raise an alert if more than one script is present **or** if non-Latin scripts appear where they are not expected (display name, domain, subject, URL, etc.).
+
+Python proof-of-concept:
+
+```python
+import unicodedata as ud
+from collections import defaultdict
+
+SUSPECT_FIELDS = {
+    "display_name": "Ηоmоgraph Illusion",  # example data
+    "subject": "Finаnꮯiаl Տtatеmеnt",
+    "url": "https://xn--messageconnecton-2kb.blob.core.windows.net"  # punycode is pure ASCII, so decode it before inspection
+}
+
+for field, value in SUSPECT_FIELDS.items():
+    blocks = defaultdict(int)
+    for ch in value:
+        if ch.isascii():
+            blocks['Latin'] += 1
+        else:
+            name = ud.name(ch, 'UNKNOWN')
+            block = name.split(' ')[0]  # first word approximates the script, e.g. 'CYRILLIC'
+            blocks[block] += 1
+    if len(blocks) > 1:
+        print(f"[!] Mixed scripts in {field}: {dict(blocks)} -> {value}")
+```
+
+### 2. Punycode Normalisation (Domains)
+
+Internationalised Domain Names (IDNs) are encoded with **punycode** (labels prefixed with `xn--`). Converting every hostname to punycode and then back to Unicode allows matching against a whitelist or performing similarity checks (e.g., Levenshtein distance) **after** the string has been normalised.
+
+```python
+import idna
+hostname = "Ρаypal.com"  # Greek Rho + Cyrillic a
+puny = idna.encode(hostname, uts46=True).decode()  # UTS #46 mapping lower-cases the capital Rho, which strict IDNA 2008 rejects
+print(puny)  # ASCII form: xn--ypal-<suffix>.com
+```
+
+### 3. Homoglyph Dictionaries / Algorithms
+
+Tools such as **dnstwist** (`--fuzzers homoglyph`) or **urlcrazy** can enumerate visually similar domain permutations and are useful for proactive takedown / monitoring.
+
+## Prevention & Mitigation
+
+* Enforce strict DMARC/DKIM/SPF policies – prevent spoofing from unauthorised domains.
+* Implement the detection logic above in **Secure Email Gateways** and **SIEM/XSOAR** playbooks.
+* Flag or quarantine messages where the domain shown in the display name differs from the actual sender domain.
+* Educate users: copy-paste suspicious text into a Unicode inspector, hover over links, never trust URL shorteners.
+
+## Real-World Examples
+
+* Display name: `Сonfidеntiаl Ꭲiꮯkеt` (Cyrillic `С`, `е`, `а`; Cherokee `Ꭲ`; Cherokee small letter `ꮯ`).
+* Domain chain: `bestseoservices.com` ➜ municipal `/templates` directory ➜ `kig.skyvaulyt.ru` ➜ fake Microsoft login at `mlcorsftpsswddprotcct.approaches.it.com` protected by custom OTP CAPTCHA.
+* Spotify impersonation: `Sρօtifս` sender with link hidden behind `redirects.ca`.
+
+These samples originate from Unit 42 research (July 2025) and illustrate how homograph abuse is combined with URL redirection and CAPTCHA evasion to bypass automated analysis.
+
+## References
+
+- [The Homograph Illusion: Not Everything Is As It Seems](https://unit42.paloaltonetworks.com/homograph-attacks/)
+- [Unicode Character Database](https://home.unicode.org/)
+- [dnstwist – domain permutation engine](https://github.com/elceef/dnstwist)
+
+{{#include ../../banners/hacktricks-training.md}}