diff --git a/src/mobile-pentesting/android-app-pentesting/manual-deobfuscation.md b/src/mobile-pentesting/android-app-pentesting/manual-deobfuscation.md
index a77773ba535..4f412b15e6a 100644
--- a/src/mobile-pentesting/android-app-pentesting/manual-deobfuscation.md
+++ b/src/mobile-pentesting/android-app-pentesting/manual-deobfuscation.md
@@ -31,8 +31,79 @@ By executing the code in a controlled environment, dynamic analysis **allows for
 - **Identifying Obfuscation Techniques**: By monitoring the application's behavior, dynamic analysis can help identify specific obfuscation techniques being used, such as code virtualization, packers, or dynamic code generation.
 - **Uncovering Hidden Functionality**: Obfuscated code may contain hidden functionalities that are not apparent through static analysis alone. Dynamic analysis allows for the observation of all code paths, including those conditionally executed, to uncover such hidden functionalities.
+### Automated De-obfuscation with LLMs (Androidmeda)
+
+While the previous sections focus on fully manual strategies, in 2025 a new class of *Large-Language-Model (LLM) powered* tooling emerged that can automate most of the tedious renaming and control-flow recovery work.
+One representative project is **[Androidmeda](https://github.com/In3tinct/Androidmeda)** – a Python utility that takes *decompiled* Java sources (e.g. produced by `jadx`) and returns a greatly cleaned-up, commented and security-annotated version of the code.
+
+#### Key capabilities
+* Renames meaningless identifiers generated by ProGuard / DexGuard / DashO / Allatori / … to *semantic* names.
+* Detects and restructures **control-flow flattening**, replacing opaque switch-case state machines with normal loops / if-else constructs.
+* Decrypts common **string encryption** patterns when possible.
+* Injects **inline comments** that explain the purpose of complex blocks.
+* Performs a *lightweight static security scan* and writes the findings to `vuln_report.json` with severity levels (informational → critical).
+
+#### Installation
+```bash
+git clone https://github.com/In3tinct/Androidmeda
+cd Androidmeda
+pip3 install -r requirements.txt
+```
+
+#### Preparing the inputs
+1. Decompile the target APK with `jadx` (or any other decompiler) and keep only the *source* directory that contains the `.java` files:
+   ```bash
+   jadx -d input_dir/ target.apk
+   ```
+2. (Optional) Trim `input_dir/` so that it only contains the application packages you want to analyse – this greatly speeds up processing and reduces LLM costs.
+
+#### Usage examples
+
+Remote provider (Gemini-1.5-flash):
+```bash
+export OPENAI_API_KEY=
+python3 androidmeda.py \
+    --llm_provider google \
+    --llm_model gemini-1.5-flash \
+    --source_dir input_dir/ \
+    --output_dir out/ \
+    --save_code true
+```
+
+Offline (local `ollama` backend with llama3.2):
+```bash
+python3 androidmeda.py \
+    --llm_provider ollama \
+    --llm_model llama3.2 \
+    --source_dir input_dir/ \
+    --output_dir out/ \
+    --save_code true
+```
+
+#### Output
+* `out/vuln_report.json` – JSON array with `file`, `line`, `issue`, `severity`.
+* A mirrored package tree with **de-obfuscated `.java` files** (only if `--save_code true`).
+
+#### Tips & troubleshooting
+* **Skipped class** ⇒ usually caused by an unparsable method; isolate the package or update the parser regex.
+* **Slow run-time / high token usage** ⇒ point `--source_dir` to *specific* app packages instead of the entire decompile.
+* Always *manually review* the vulnerability report – LLM hallucinations can lead to false positives / negatives.
+
+#### Practical value – Crocodilus malware case study
+Feeding a heavily obfuscated sample from the 2025 *Crocodilus* banking trojan through Androidmeda reduced analysis time from *hours* to *minutes*: the tool recovered call-graph semantics, revealed calls to accessibility APIs and hard-coded C2 URLs, and produced a concise report that could be imported into analysts’ dashboards.
+
+---
+
 ## References and Further Reading
+- [https://maddiestone.github.io/AndroidAppRE/obfuscation.html](https://maddiestone.github.io/AndroidAppRE/obfuscation.html)
+- BlackHat USA 2018: “Unpacking the Packed Unpacker: Reverse Engineering an Android Anti-Analysis Library” [[video](https://www.youtube.com/watch?v=s0Tqi7fuOSU)]
+  - This talk goes over reverse engineering one of the most complex anti-analysis native libraries I’ve seen used by an Android application. It covers mostly obfuscation techniques in native code.
+- REcon 2019: “The Path to the Payload: Android Edition” [[video](https://recon.cx/media-archive/2019/Session.005.Maddie_Stone.The_path_to_the_payload_Android_Edition-J3ZnNl2GYjEfa.mp4)]
+  - This talk discusses a series of obfuscation techniques, solely in Java code, that an Android botnet was using to hide its behavior.
+- Deobfuscating Android Apps with Androidmeda (blog post) – [mobile-hacker.com](https://www.mobile-hacker.com/2025/07/22/deobfuscating-android-apps-with-androidmeda-a-smarter-way-to-read-obfuscated-code/)
+- Androidmeda source code – [https://github.com/In3tinct/Androidmeda](https://github.com/In3tinct/Androidmeda)
+
 - [https://maddiestone.github.io/AndroidAppRE/obfuscation.html](https://maddiestone.github.io/AndroidAppRE/obfuscation.html)
 - BlackHat USA 2018: “Unpacking the Packed Unpacker: Reverse Engineering an Android Anti-Analysis Library” \[[video](https://www.youtube.com/watch?v=s0Tqi7fuOSU)]
   - This talk goes over reverse engineering one of the most complex anti-analysis native libraries I’ve seen used by an Android application. It covers mostly obfuscation techniques in native code.
@@ -42,4 +113,3 @@ By executing the code in a controlled environment, dynamic analysis **allows for
 {{#include ../../banners/hacktricks-training.md}}
-
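
The Output section added in this change describes `out/vuln_report.json` as a JSON array whose entries carry `file`, `line`, `issue` and `severity`, and the tips recommend reviewing that report manually. A minimal sketch of how such a report might be triaged before review is given below. It is not part of Androidmeda: the helper names (`load_report`, `severity_rank`, `triage`) are hypothetical, and the intermediate severity names `low`, `medium` and `high` are an assumption, since the text only names the endpoints `informational` and `critical`.

```python
import json

# Assumed ordering: only "informational" and "critical" are named in the
# documentation; the three levels in between are a guess.
SEVERITY_ORDER = ["informational", "low", "medium", "high", "critical"]


def load_report(path):
    """Load the findings list from a vuln_report.json-style file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def severity_rank(finding):
    """Return the position of a finding's severity, or -1 if unrecognised."""
    severity = str(finding.get("severity", "")).lower()
    if severity in SEVERITY_ORDER:
        return SEVERITY_ORDER.index(severity)
    return -1


def triage(findings, minimum="high"):
    """Keep findings at or above `minimum`, most severe first."""
    threshold = SEVERITY_ORDER.index(minimum)
    kept = [f for f in findings if severity_rank(f) >= threshold]
    return sorted(kept, key=severity_rank, reverse=True)
```

Calling `triage(load_report("out/vuln_report.json"))` would then return only the `high` and `critical` entries, which can be read against the corresponding de-obfuscated `.java` files. Findings with an unrecognised severity are dropped by the filter, so it may be worth listing them separately if the real report uses different labels.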