TABLE OF CONTENTS 📌
- Document Type Definition (DTD)
- Entities
- How XXE Works
- The Attack
- The Vulnerable Code (PHP Example)
- Mitigation
Extensible Markup Language (XML)
XML is a structured format for storing and transporting data that is easy for both humans and machines to understand. It uses tags to label and organize information, similar to how folders organize documents in a filing cabinet. For example, the following XML snippet represents a person’s details:
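Here is a quick illustration. The sketch below uses Python's standard-library xml.etree.ElementTree to read a hypothetical person record. The element names and values are made up for this example and are not the original snippet.

```python
import xml.etree.ElementTree as ET

# Hypothetical person record: each tag labels one piece of information.
PERSON_XML = """<person>
  <name>Jane Doe</name>
  <address>1 Example Street</address>
  <email>jane@example.com</email>
  <phone>555-0100</phone>
</person>"""


def read_person(xml_text: str) -> dict:
    """Return a dict mapping each child tag of the root element to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    print(read_person(PERSON_XML))
```

Because the tags name each field, a program can pick out values by name instead of relying on their position.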
Document Type Definition (DTD)
A Document Type Definition (DTD) defines the structure and rules of an XML document, acting as a blueprint or guideline. Similar to a database schema, it specifies which elements (tags) and attributes are allowed, ensuring the XML document adheres to a specific format. This helps maintain consistency and accuracy when two systems exchange data using XML.
For example, if we want to ensure that an XML document about people will always include a name, address, email, and phone number, we would define those rules through a DTD as shown below:
In the above DTD, `<!ELEMENT>` defines the elements (tags) that are allowed, like name, address, email, and phone, whereas #PCDATA stands for parsed character data, meaning the element will contain just plain text.
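Python's built-in XML parsers are non-validating. They read a DTD's element declarations but do not enforce them. The sketch below hand-codes the two rules just described. It assumes a hypothetical people root element whose content model is (name, address, email, phone) and whose children are #PCDATA. It approximates what a validating parser would check and is not a real DTD validator.

```python
import xml.etree.ElementTree as ET

# Assumed content model, mirroring a declaration such as
#   <!ELEMENT people (name, address, email, phone)>
REQUIRED_CHILDREN = ["name", "address", "email", "phone"]


def follows_people_rules(xml_text: str) -> bool:
    """Hand-written check approximating the DTD rules (not real validation)."""
    root = ET.fromstring(xml_text)
    if root.tag != "people":
        return False
    # Sequence rule: exactly these children, in this order.
    if [child.tag for child in root] != REQUIRED_CHILDREN:
        return False
    # #PCDATA rule: each child holds plain text only, with no nested elements.
    return all(len(child) == 0 for child in root)


VALID = ("<people><name>Jane</name><address>1 Example St</address>"
         "<email>jane@example.com</email><phone>555-0100</phone></people>")
MISSING_EMAIL = ("<people><name>Jane</name><address>1 Example St</address>"
                 "<phone>555-0100</phone></people>")

if __name__ == "__main__":
    print(follows_people_rules(VALID))          # True
    print(follows_people_rules(MISSING_EMAIL))  # False
```

A document can carry `<!ELEMENT>` declarations and still parse without error here even when it breaks them. That is why such checks need either a validating parser or explicit code.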
Entities
Entities in XML serve as placeholders that help manage and insert large chunks of data or reference external resources, making the XML file easier to manage, especially when data is repeated. There are two main types of entities:
- Internal Entities: These are defined directly within the XML document and can be used to avoid repeating common data. For example:
- External Entities: These refer to data stored outside the XML document, typically in external files or resources. For example, an external entity can be defined like this: `<!ENTITY ext SYSTEM "http://tryhackme.com/robots.txt">`
In the provided XML, ext is defined as an external entity. External entities in XML allow the inclusion of external data or resources by referencing their location (usually a URI).
- ext is the name of the external entity (you can replace ext with any valid name that follows the XML naming rules).
- Key Rules for Naming Entities:
- Must start with a letter or underscore (_).
- Can contain letters, digits, hyphens (-), underscores (_), and periods (.).
- Should not start with the string xml (in any combination of upper and lower case), as such names are reserved.
- SYSTEM "http://tryhackme.com/robots.txt" specifies the external resource associated with ext.
- SYSTEM indicates a URI is being used to fetch the external content.
- http://tryhackme.com/robots.txt is the location of the resource.
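Python's xml.etree.ElementTree makes the difference between the two entity types easy to see. It expands internal entities. As the Python documentation notes, it does not expand external ones and raises a ParseError instead of fetching the resource. In this sketch the company entity and its value are hypothetical. The ext declaration reuses the tryhackme.com URL discussed above, and no network request is made.

```python
import xml.etree.ElementTree as ET

# Internal entity: the replacement text is declared inside the document.
INTERNAL_DOC = """<!DOCTYPE note [
  <!ENTITY company "Example Corp">
]>
<note><from>&company;</from></note>"""

# External entity: the replacement text would come from the SYSTEM URI.
EXTERNAL_DOC = """<!DOCTYPE note [
  <!ENTITY ext SYSTEM "http://tryhackme.com/robots.txt">
]>
<note><from>&ext;</from></note>"""


def text_of_from(xml_text: str) -> str:
    """Parse the document and return the text of its <from> element."""
    try:
        return ET.fromstring(xml_text).find("from").text
    except ET.ParseError as err:
        # ElementTree does not resolve external entities; it reports them
        # as undefined instead of fetching the URI.
        return f"refused: {err}"


if __name__ == "__main__":
    print(text_of_from(INTERNAL_DOC))  # Example Corp
    print(text_of_from(EXTERNAL_DOC))
```

Other parsers, or the same parser family under different settings, may resolve the external entity instead. XXE depends on exactly that behavior.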
XML External Entities: SYSTEM vs PUBLIC
In XML, external entities can be declared using SYSTEM or PUBLIC. Here’s the difference:
1. SYSTEM
- Specifies the location of the external resource using a URI.
- Simple and straightforward, used for most cases.
Example: `<!ENTITY example SYSTEM "http://example.com/data.txt">`
2. PUBLIC
- Adds a public identifier (a unique string describing the resource) alongside the URI.
- Often used for standardized resources (e.g., DTDs or schemas).
Type | Purpose | Example |
---|---|---|
SYSTEM | Direct link to the resource | `<!ENTITY example SYSTEM "http://example.com/data.txt">` |
PUBLIC | Descriptive identifier + link | `<!ENTITY example PUBLIC "-//Example//Data File//EN" "http://example.com/data.txt">` |
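The sketch below shows how a parser records the two forms. It uses Python's low-level xml.parsers.expat module and its EntityDeclHandler callback to capture each declaration's public and system identifiers without fetching anything. The entity names sys_example and pub_example are hypothetical. The table uses the name example for both, but if the same entity is declared twice, only the first declaration counts.

```python
from xml.parsers import expat

# Both declarations from the table, under distinct (hypothetical) names.
DOC = """<!DOCTYPE data [
  <!ENTITY sys_example SYSTEM "http://example.com/data.txt">
  <!ENTITY pub_example PUBLIC "-//Example//Data File//EN" "http://example.com/data.txt">
]>
<data/>"""


def external_ids(xml_text: str) -> dict:
    """Map each declared entity name to its (public_id, system_id) pair."""
    found = {}

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        found[name] = (public_id, system_id)

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)  # declarations are reported, not resolved
    return found


if __name__ == "__main__":
    for name, (public_id, system_id) in external_ids(DOC).items():
        print(name, public_id, system_id)
```

The SYSTEM entity reports no public identifier (None). The PUBLIC entity carries both the descriptive identifier and the URI.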
Warning: XML External Entity (XXE) Attacks
External entities like these are the basis of XML External Entity (XXE) attacks, a vulnerability in which an attacker abuses external entities to:
- Fetch sensitive files from the server.
- Make network requests on behalf of the server.
- Potentially cause a denial of service or other malicious effects.
Recommendation: Ensure that XML parsers are configured securely (e.g., disable external entities if not required).
Exploiting the XXE
How XXE Works
XML is a data format that uses tags to structure information. It can include entities, which are placeholders for data. External entities refer to data located outside the XML document itself (often on a remote server or local file system).
A vulnerable web application might use a function like simplexml_load_string() in PHP to process XML input. If the application doesn’t properly disable external entity loading (e.g., by using libxml_disable_entity_loader(true) in PHP), an attacker can inject malicious XML code.
The Attack
Attackers craft a specially designed XML request containing an external entity declaration. For example:
This code defines an entity named payload that points to the /etc/hosts file on the server. The <product_id> tag then uses &payload; to include the contents of /etc/hosts.
When the vulnerable application processes this XML, it resolves the external entity and reveals the contents of /etc/hosts (which maps host names to IP addresses and can expose internal network details) as part of the response. The attacker could replace /etc/hosts with other file paths to potentially exfiltrate any file readable by the web server user.
The Vulnerable Code (PHP Example)
The following PHP code demonstrates the vulnerability:
libxml_disable_entity_loader(false) explicitly enables the loading of external entities, making the application vulnerable.
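The same kind of switch exists outside PHP. The sketch below is a rough Python analog for illustration, not a translation of the original PHP. The standard library's SAX parser has a feature_external_ges flag that controls whether external general entities are resolved. It has defaulted to off since Python 3.7.1, and turning it on plays the same role as libxml_disable_entity_loader(false).

The sketch works in three steps:
- It writes a stand-in "secret" to a temporary file.
- It sends a payload built as described in The Attack section.
- It echoes the product_id text back, the way a vulnerable endpoint might.

The helper names build_payload, read_product_id and ProductIdEcho are hypothetical.

```python
import io
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class ProductIdEcho(ContentHandler):
    """Collects the text of <product_id>, as an endpoint echoing it might."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.chunks = []

    def startElement(self, name, attrs):
        if name == "product_id":
            self.inside = True

    def endElement(self, name):
        if name == "product_id":
            self.inside = False

    def characters(self, content):
        if self.inside:
            self.chunks.append(content)


def build_payload(path: str) -> bytes:
    """Hypothetical XXE payload: entity 'payload' points at a local file."""
    return (
        f'<!DOCTYPE order [<!ENTITY payload SYSTEM "{path}">]>'
        "<order><product_id>&payload;</product_id></order>"
    ).encode("utf-8")


def read_product_id(xml_bytes: bytes, resolve_external: bool) -> str:
    handler = ProductIdEcho()
    parser = xml.sax.make_parser()
    # True mimics the vulnerable setup; False is the default since 3.7.1.
    parser.setFeature(feature_external_ges, resolve_external)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


if __name__ == "__main__":
    # Stand-in for a sensitive file such as /etc/hosts.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("SECRET-123")
        secret_path = f.name
    try:
        attack = build_payload(secret_path)
        print(read_product_id(attack, resolve_external=True))         # SECRET-123
        print(repr(read_product_id(attack, resolve_external=False)))  # ''
    finally:
        os.remove(secret_path)
```

With resolution enabled, the file's contents land inside product_id and flow straight into the response. With the default setting, the entity is skipped and nothing leaks.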
Mitigation
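The simplest and most effective way to prevent XXE vulnerabilities is to disable external entity loading in your XML parser. In PHP versions before 8.0, use libxml_disable_entity_loader(true). The function is deprecated as of PHP 8.0, because libxml 2.9.0 and later no longer substitute entities by default. On those versions, the important thing is not to re-enable substitution (for example, with the LIBXML_NOENT option) when parsing untrusted input. Similar settings exist for other programming languages. Thorough input validation is also crucial.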
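A common defense-in-depth approach, alongside the parser settings above, is to refuse any untrusted document that carries a DOCTYPE at all, since entities can only be declared in a DTD. The Python sketch below takes this approach in two passes:
- A first pass through xml.parsers.expat uses a StartDoctypeDeclHandler that raises an exception.
- If no DOCTYPE is found, a second pass builds the tree with xml.etree.ElementTree.

The names parse_untrusted and DTDForbidden are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # No DTD means no internal or external entity declarations.
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed in untrusted input")


def parse_untrusted(xml_text: str) -> ET.Element:
    # Pass 1: a bare expat scan whose only job is to spot a DOCTYPE.
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(xml_text, True)
    # Pass 2: the input is DTD-free, so build the tree normally.
    return ET.fromstring(xml_text)


if __name__ == "__main__":
    ok = parse_untrusted("<order><product_id>42</product_id></order>")
    print(ok.find("product_id").text)  # 42
    attack = ('<!DOCTYPE order [<!ENTITY payload SYSTEM "/etc/hosts">]>'
              "<order><product_id>&payload;</product_id></order>")
    try:
        parse_untrusted(attack)
    except DTDForbidden as err:
        print("rejected:", err)
```

The trade-off is that legitimate documents relying on a DTD are rejected too. This suits endpoints that only ever expect plain, DTD-free XML.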