Deobfuscating Android Apps with Androidmeda A Smarter Way to... #1175

Merged
@@ -31,8 +31,79 @@ By executing the code in a controlled environment, dynamic analysis **allows for
- **Identifying Obfuscation Techniques**: By monitoring the application's behavior, dynamic analysis can help identify specific obfuscation techniques being used, such as code virtualization, packers, or dynamic code generation.
- **Uncovering Hidden Functionality**: Obfuscated code may contain hidden functionalities that are not apparent through static analysis alone. Dynamic analysis allows for the observation of all code paths, including those conditionally executed, to uncover such hidden functionalities.

### Automated De-obfuscation with LLMs (Androidmeda)

While the previous sections focus on fully manual strategies, 2025 saw the emergence of *Large Language Model (LLM)-powered* tooling that can automate much of the tedious renaming and control-flow recovery work.
One representative project is **[Androidmeda](https://github.com/In3tinct/Androidmeda)** – a Python utility that takes *decompiled* Java sources (e.g. produced by `jadx`) and returns a cleaned-up, commented and security-annotated version of the code.

#### Key capabilities
* Renames meaningless identifiers generated by ProGuard / DexGuard / DashO / Allatori / … to *semantic* names.
* Detects and restructures **control-flow flattening**, replacing opaque switch-case state machines with normal loops / if-else constructs.
* Decrypts common **string encryption** patterns when possible.
* Injects **inline comments** that explain the purpose of complex blocks.
* Performs a *lightweight static security scan* and writes the findings to `vuln_report.json` with severity levels (informational → critical).
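
To make the control-flow flattening bullet concrete, here is a minimal sketch of the pattern and its restructured equivalent. It is written in Python for brevity (real targets are decompiled Java), both functions are hypothetical illustrations, and it is not output produced by Androidmeda.

```python
def checksum_flattened(data):
    """Flattened form: a dispatcher loop driven by an opaque state variable."""
    state, i, total = 0, 0, 0
    while True:
        if state == 0:      # loop condition check
            state = 1 if i < len(data) else 3
        elif state == 1:    # loop body
            total = (total + data[i]) % 256
            state = 2
        elif state == 2:    # index increment
            i += 1
            state = 0
        elif state == 3:    # exit block
            return total


def checksum_recovered(data):
    """Recovered form: the same logic expressed as an ordinary loop."""
    total = 0
    for value in data:
        total = (total + value) % 256
    return total


sample = [10, 20, 250]
# Both forms compute the same result; only the control-flow shape differs.
print(checksum_flattened(sample) == checksum_recovered(sample))
```

The flattened version hides the loop inside a state machine, which is what a de-flattening pass has to recognise before it can emit the readable form.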

#### Installation
```bash
git clone https://github.com/In3tinct/Androidmeda
cd Androidmeda
pip3 install -r requirements.txt
```

#### Preparing the inputs
1. Decompile the target APK with `jadx` (or any other decompiler) and keep only the *source* directory that contains the `.java` files:
```bash
jadx -d input_dir/ target.apk
```
2. (Optional) Trim `input_dir/` so that it only contains the application packages you want to analyse – this greatly speeds up processing and reduces LLM costs.
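
The optional trimming step can be scripted. The sketch below copies only selected packages out of the `sources/` directory that `jadx` writes under its output folder. The function name `trim_sources`, the destination directory and the example package name are hypothetical, and Python 3.8+ is assumed for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def trim_sources(sources_root, dest_root, packages):
    """Copy only the listed Java packages from sources_root into dest_root.

    Returns the packages that were found and copied, in input order.
    """
    sources_root, dest_root = Path(sources_root), Path(dest_root)
    copied = []
    for package in packages:
        rel = Path(*package.split("."))  # com.example.app -> com/example/app
        src = sources_root / rel
        if not src.is_dir():
            continue  # package not present in this decompile
        shutil.copytree(src, dest_root / rel, dirs_exist_ok=True)
        copied.append(package)
    return copied
```

For example, `trim_sources("input_dir/sources", "trimmed_dir", ["com.example.app"])` would leave only that package under `trimmed_dir/`, which can then be passed as `--source_dir`.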

#### Usage examples

Remote provider (Gemini-1.5-flash):
```bash
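# The key variable must match the provider: OPENAI_API_KEY is for OpenAI models.
# GEMINI_API_KEY is assumed here for --llm_provider google; confirm the name in the project README.
export GEMINI_API_KEY=<your_key>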
python3 androidmeda.py \
--llm_provider google \
--llm_model gemini-1.5-flash \
--source_dir input_dir/ \
--output_dir out/ \
--save_code true
```

Offline (local `ollama` backend with llama3.2):
```bash
python3 androidmeda.py \
--llm_provider ollama \
--llm_model llama3.2 \
--source_dir input_dir/ \
--output_dir out/ \
--save_code true
```
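
When several packages are processed separately, the invocation can be assembled programmatically. This is a minimal sketch that uses only the flags shown above. The helper names `build_command` and `run_batch` are hypothetical, and the script is assumed to run from inside the cloned `Androidmeda` directory.

```python
import subprocess


def build_command(provider, model, source_dir, output_dir):
    """Return the argv list for one run, using only the flags shown above."""
    return [
        "python3", "androidmeda.py",
        "--llm_provider", provider,
        "--llm_model", model,
        "--source_dir", source_dir,
        "--output_dir", output_dir,
        "--save_code", "true",
    ]


def run_batch(provider, model, jobs):
    """Run one invocation per (source_dir, output_dir) pair.

    Returns a dict mapping each source_dir to the process exit code.
    """
    exit_codes = {}
    for source_dir, output_dir in jobs:
        completed = subprocess.run(build_command(provider, model, source_dir, output_dir))
        exit_codes[source_dir] = completed.returncode
    return exit_codes
```

A call such as `run_batch("ollama", "llama3.2", [("trimmed_dir/", "out/")])` would then run the offline example once per job and report non-zero exit codes for failed runs.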

#### Output
* `out/vuln_report.json` – JSON array with `file`, `line`, `issue`, `severity`.
* A mirrored package tree with **de-obfuscated `.java` files** (only if `--save_code true`).
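
The report can be triaged with a few lines of Python. The sketch below assumes the report is a JSON array of objects with the `file`, `line`, `issue` and `severity` fields listed above. The intermediate severity names (`low`, `medium`, `high`) are an assumption, since only the endpoints of the scale are documented here, and the function names are hypothetical.

```python
import json

# Assumed ordering: only "informational" and "critical" are named in the text above.
SEVERITY_ORDER = ["informational", "low", "medium", "high", "critical"]


def _rank(finding):
    """Return the severity index of a finding, or -1 if it is unrecognised."""
    severity = str(finding.get("severity", "")).lower()
    return SEVERITY_ORDER.index(severity) if severity in SEVERITY_ORDER else -1


def filter_findings(findings, min_severity="high"):
    """Keep findings at or above min_severity, most severe first."""
    threshold = SEVERITY_ORDER.index(min_severity)
    selected = [f for f in findings if _rank(f) >= threshold]
    return sorted(selected, key=_rank, reverse=True)


def load_report(path):
    """Load the JSON array written by the tool."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Typical use would be `filter_findings(load_report("out/vuln_report.json"), "medium")`. Findings with an unrecognised severity are dropped by the filter, so it is worth checking the raw report once to confirm the labels match the assumed list.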

#### Tips & troubleshooting
* **Skipped class** ⇒ usually caused by an unparsable method; isolate the package or update the parser regex.
* **Slow run-time / high token usage** ⇒ point `--source_dir` to *specific* app packages instead of the entire decompile.
* Always *manually review* the vulnerability report – LLM hallucinations can lead to false positives / negatives.
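
To find out which classes were skipped, the input and output trees can be compared. This sketch assumes the output tree mirrors the relative paths of the input `.java` files (as described in the Output section) and that `--save_code true` was used. The function name `find_skipped` is hypothetical.

```python
from pathlib import Path


def find_skipped(source_dir, output_dir):
    """Return sorted relative paths of .java files with no counterpart in output_dir."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    skipped = []
    for java_file in source_dir.rglob("*.java"):
        rel = java_file.relative_to(source_dir)
        if not (output_dir / rel).is_file():
            skipped.append(rel.as_posix())
    return sorted(skipped)
```

Running `find_skipped("input_dir/", "out/")` yields the files to isolate and re-run, or to inspect for unparsable methods.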

#### Practical value – Crocodilus malware case study
Feeding a heavily obfuscated sample from the 2025 *Crocodilus* banking trojan through Androidmeda reduced analysis time from *hours* to *minutes*: the tool recovered call-graph semantics, revealed calls to accessibility APIs and hard-coded C2 URLs, and produced a concise report that could be imported into analysts’ dashboards.
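
Once the sources are readable, indicators like the ones mentioned above can be pulled out with a simple scan. This is a rough sketch and not part of Androidmeda: the URL regex is deliberately loose, the marker list only covers a few well-known Android accessibility classes, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Loose URL pattern: stops at whitespace, quotes, angle brackets and ')'.
URL_RE = re.compile(r"https?://[^\s\"'<>)]+")
ACCESSIBILITY_MARKERS = (
    "AccessibilityService",
    "AccessibilityEvent",
    "AccessibilityNodeInfo",
)


def scan_source(text):
    """Return hard-coded URLs and accessibility class names found in one source text."""
    urls = sorted(set(URL_RE.findall(text)))
    markers = [m for m in ACCESSIBILITY_MARKERS if m in text]
    return {"urls": urls, "accessibility": markers}


def scan_tree(root):
    """Scan every .java file under root; keep only files with at least one hit."""
    hits = {}
    for java_file in Path(root).rglob("*.java"):
        result = scan_source(java_file.read_text(encoding="utf-8", errors="replace"))
        if result["urls"] or result["accessibility"]:
            hits[java_file.as_posix()] = result
    return hits
```

Calling `scan_tree("out/")` on the de-obfuscated tree gives a per-file map of candidate C2 URLs and accessibility usage. Every hit still needs manual confirmation.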

---

## References and Further Reading

- [https://maddiestone.github.io/AndroidAppRE/obfuscation.html](https://maddiestone.github.io/AndroidAppRE/obfuscation.html)
- BlackHat USA 2018: “Unpacking the Packed Unpacker: Reverse Engineering an Android Anti-Analysis Library” [[video](https://www.youtube.com/watch?v=s0Tqi7fuOSU)]
  - This talk goes over reverse engineering one of the most complex anti-analysis native libraries I’ve seen used by an Android application. It covers mostly obfuscation techniques in native code.
- REcon 2019: “The Path to the Payload: Android Edition” [[video](https://recon.cx/media-archive/2019/Session.005.Maddie_Stone.The_path_to_the_payload_Android_Edition-J3ZnNl2GYjEfa.mp4)]
  - This talk discusses a series of obfuscation techniques, solely in Java code, that an Android botnet was using to hide its behavior.
- Deobfuscating Android Apps with Androidmeda (blog post) – [mobile-hacker.com](https://www.mobile-hacker.com/2025/07/22/deobfuscating-android-apps-with-androidmeda-a-smarter-way-to-read-obfuscated-code/)
- Androidmeda source code – [https://github.com/In3tinct/Androidmeda](https://github.com/In3tinct/Androidmeda)

@@ -42,4 +113,3 @@ By executing the code in a controlled environment, dynamic analysis **allows for
{{#include ../../banners/hacktricks-training.md}}