Extensible Markup Language (XML)

XML is a structured format for storing and transporting data that is easy for both humans and machines to understand. It uses tags to label and organize information, similar to how folders organize documents in a filing cabinet. For example, the following XML snippet represents a person’s details:

<people> 
	<name>Glitch</name> 
	<address>Wareville</address> 
	<email>glitch@wareville.com</email> 
	<phone>111000</phone> 
</people>
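
To illustrate how a program might consume such a document, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The choice of Python and the names PERSON_XML and read_person are assumptions for illustration; only the XML itself comes from the text above.

```python
import xml.etree.ElementTree as ET

# The person record from the text, stored as a string.
PERSON_XML = """
<people>
    <name>Glitch</name>
    <address>Wareville</address>
    <email>glitch@wareville.com</email>
    <phone>111000</phone>
</people>
"""

def read_person(xml_text):
    """Return the children of the root element as a tag -> text dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text for child in root}

person = read_person(PERSON_XML)
print(person["name"], person["email"])
```

Each tag becomes a dictionary key, which mirrors the "labelled folder" idea: the tag names the data, and the text inside it is the data.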

Document Type Definition (DTD)

A Document Type Definition (DTD) defines the structure and rules of an XML document, acting as a blueprint or guideline. Similar to a database schema, it specifies which elements (tags) and attributes are allowed, ensuring the XML document adheres to a specific format. This helps maintain consistency and accuracy when two systems exchange data using XML.

For example, if we want to ensure that an XML document about people will always include a name, address, email, and phone number, we would define those rules through a DTD as shown below:

<!DOCTYPE people [ 
	<!ELEMENT people (name, address, email, phone)>
	<!ELEMENT name (#PCDATA)> 
	<!ELEMENT address (#PCDATA)> 
	<!ELEMENT email (#PCDATA)> 
	<!ELEMENT phone (#PCDATA)> 
]>

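In the above DTD, <!ELEMENT> defines the elements (tags) that are allowed, like name, address, email, and phone. The first declaration says that people must contain those four child elements, in that order. #PCDATA stands for parsed character data, meaning the element will contain only plain text.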
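
The rule expressed by the DTD can be sketched in code. Python's standard library does not validate documents against a DTD, so the example below is a hand-written stand-in that checks the same constraints (root element, required children in order, text-only children). It is not a real DTD validator, and the names REQUIRED_CHILDREN and matches_people_rule are hypothetical.

```python
import xml.etree.ElementTree as ET

# Order required by: <!ELEMENT people (name, address, email, phone)>
REQUIRED_CHILDREN = ["name", "address", "email", "phone"]

def matches_people_rule(xml_text):
    """Return True if the document follows the structure the DTD describes."""
    root = ET.fromstring(xml_text)
    if root.tag != "people":
        return False
    # The children must be exactly the required tags, in the required order.
    if [child.tag for child in root] != REQUIRED_CHILDREN:
        return False
    # (#PCDATA) means text only, so no child may contain nested elements.
    return all(len(child) == 0 for child in root)

complete = (
    "<people><name>Glitch</name><address>Wareville</address>"
    "<email>glitch@wareville.com</email><phone>111000</phone></people>"
)
missing_fields = "<people><name>Glitch</name><phone>111000</phone></people>"

print(matches_people_rule(complete), matches_people_rule(missing_fields))
```

A validating parser performs this kind of check automatically from the DTD, which is what lets two systems agree on a format without custom checking code.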


Entities

Entities in XML serve as placeholders that help manage and insert large chunks of data or reference external resources, making the XML file easier to manage, especially when data is repeated. There are two main types of entities:

  1. Internal Entities: These are defined directly within the XML document and can be used to avoid repeating common data. For example:
<!ENTITY phoneNumber "123-456-7890">
  2. External Entities: These refer to data stored outside the XML document, typically in external files or resources. For example, an external entity can be defined like this:
<!DOCTYPE people [
   <!ENTITY ext SYSTEM "http://tryhackme.com/robots.txt">
]>
<people>
   <name>Glitch</name>
   <address>&ext;</address>
   <email>glitch@wareville.com</email>
   <phone>111000</phone>
</people>

In the provided XML, ext is defined as an external entity. External entities in XML allow the inclusion of external data or resources by referencing their location (usually a URI).

  • ext is the name of the external entity. You can replace ext with any other name, as long as it follows the XML naming rules.
    • Key Rules for Naming Entities:
      1. Must start with a letter or underscore (_).
      2. Can contain letters, digits, hyphens (-), underscores (_), and periods (.).
      3. Cannot start with the string xml (case-insensitive) as it is reserved.
  • SYSTEM "http://tryhackme.com/robots.txt" specifies the external resource associated with ext.
  • SYSTEM indicates a URI is being used to fetch the external content.
  • http://tryhackme.com/robots.txt is the location of the resource.
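
The naming rules listed above can be turned into a small checker. This is a sketch under a stated simplification: it only accepts ASCII letters, whereas the XML specification allows a wider range of characters in names. The function name is_valid_entity_name is hypothetical.

```python
import re

# ASCII-only simplification of the rules above:
# first character is a letter or underscore; the rest may be
# letters, digits, hyphens, underscores, or periods.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_valid_entity_name(name):
    """Apply the three naming rules from the list above."""
    # Rule 3: names starting with "xml" (any letter case) are reserved.
    if name.lower().startswith("xml"):
        return False
    # Rules 1 and 2: check the first character and the allowed character set.
    return _NAME_RE.fullmatch(name) is not None

for candidate in ["ext", "_data.v2", "2fast", "XmlThing", "bad name"]:
    print(candidate, is_valid_entity_name(candidate))
```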

XML External Entities: SYSTEM vs PUBLIC

In XML, external entities can be declared using SYSTEM or PUBLIC. Here’s the difference:

1. SYSTEM
  • Specifies the location of the external resource using a URI.
  • Simple and straightforward, used for most cases.

Example:

<!ENTITY example SYSTEM "http://example.com/data.txt">

2. PUBLIC
  • Adds a public identifier (a unique string describing the resource) alongside the URI.
  • Often used for standardized resources (e.g., DTDs or schemas).

Example:

<!ENTITY example PUBLIC "-//Example//Data File//EN" "http://example.com/data.txt">

Type   | Purpose                       | Example
SYSTEM | Direct link to the resource   | <!ENTITY example SYSTEM "http://example.com/data.txt">
PUBLIC | Descriptive identifier + link | <!ENTITY example PUBLIC "-//Example//Data File//EN" "http://example.com/data.txt">
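
To see how a parser distinguishes the two forms, the sketch below uses Python's xml.parsers.expat module and its EntityDeclHandler callback to list the external entities a document declares. The entity names sysExample and pubExample and the function list_external_entities are hypothetical; two different names are used because a DTD should not declare the same entity twice. Nothing is fetched from the network here, because the entities are only declared and never referenced.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE people [
  <!ENTITY sysExample SYSTEM "http://example.com/data.txt">
  <!ENTITY pubExample PUBLIC "-//Example//Data File//EN" "http://example.com/data.txt">
]>
<people/>"""

def list_external_entities(xml_text):
    """Return (name, public_id, system_id) for each external entity declared."""
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a value and no system identifier;
        # external entities are the ones with a system identifier.
        if system_id is not None:
            found.append((name, public_id, system_id))

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return found

for name, public_id, system_id in list_external_entities(DOC):
    print(name, public_id, system_id)
```

A SYSTEM declaration reports no public identifier, while a PUBLIC declaration reports both the identifier and the URI.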

Warning: XML External Entity (XXE) Attacks

External entities are the feature abused in XML External Entity (XXE) attacks, a vulnerability where an attacker supplies XML containing external entity declarations in order to:

  • Fetch sensitive files from the server.
  • Make network requests on behalf of the server.
  • Cause denial of service or other malicious effects.

Recommendation: Ensure that XML parsers are configured securely (e.g., disable external entities if not required).


Exploiting the XXE

How XXE Works

XML is a data format that uses tags to structure information. It can include entities, which are placeholders for data. External entities refer to data located outside the XML document itself (often on a remote server or local file system).

A vulnerable web application might use a function like simplexml_load_string() in PHP to process XML input. If the application enables entity substitution (for example, by passing the LIBXML_NOENT flag) and doesn't disable external entity loading (e.g., by using libxml_disable_entity_loader(true) on PHP versions before 8.0), an attacker can inject malicious XML.
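
Whether an external entity is resolved depends entirely on the parser and its configuration. As a point of comparison, the sketch below uses Python's xml.etree.ElementTree rather than PHP. According to the Python documentation, this parser expands internal entities but does not expand external ones and raises a ParseError when one is referenced. The file:///etc/hosts URI and the function names are assumptions for illustration, and the file is never read.

```python
import xml.etree.ElementTree as ET

# Internal entity: the value lives inside the document itself.
INTERNAL = """<!DOCTYPE people [
  <!ENTITY phoneNumber "123-456-7890">
]>
<people><phone>&phoneNumber;</phone></people>"""

# External entity: the value would have to be loaded from a URI.
EXTERNAL = """<!DOCTYPE people [
  <!ENTITY ext SYSTEM "file:///etc/hosts">
]>
<people><address>&ext;</address></people>"""

def phone_from_internal():
    """Parse the document; the internal entity is replaced by its value."""
    return ET.fromstring(INTERNAL).findtext("phone")

def external_is_rejected():
    """Return True if the parser refuses the external entity reference."""
    try:
        ET.fromstring(EXTERNAL)
    except ET.ParseError:
        return True
    return False

print(phone_from_internal())
print(external_is_rejected())
```

A parser that behaves like the second case by default is not exposed to the file-read attack described in this section; the vulnerability arises when external entity resolution is switched on for untrusted input.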


The Attack

Attackers craft a specially designed XML request containing an external entity declaration. For example:

<!DOCTYPE foo [<!ENTITY payload SYSTEM "/etc/hosts"> ]>
<wishlist>
  <user_id>1</user_id>
  <item>
    <product_id>&payload;</product_id>
  </item>
</wishlist>

This code defines an entity named payload that points to the /etc/hosts file on the server. The <product_id> tag then uses &payload; to include the contents of /etc/hosts.

When the vulnerable application processes this XML, it resolves the external entity and returns the contents of /etc/hosts (which maps hostnames to IP addresses and can reveal internal network details) as part of the response. The attacker could replace /etc/hosts with other file paths to potentially exfiltrate any file readable by the web server user.
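
Because only the path inside the entity declaration changes between requests, the request body is effectively a template. The sketch below, written in Python for illustration, shows that structure for use in a lab environment. The names WISHLIST_TEMPLATE and build_wishlist_payload are hypothetical, and the XML layout is copied from the example above.

```python
# The wishlist request from the text, with the entity target left as a placeholder.
WISHLIST_TEMPLATE = """<!DOCTYPE foo [<!ENTITY payload SYSTEM "{target}"> ]>
<wishlist>
  <user_id>1</user_id>
  <item>
    <product_id>&payload;</product_id>
  </item>
</wishlist>"""

def build_wishlist_payload(target):
    """Return the wishlist XML with the entity pointing at the given path or URI."""
    # A double quote would end the quoted system identifier early.
    if '"' in target:
        raise ValueError("target must not contain a double quote")
    return WISHLIST_TEMPLATE.format(target=target)

print(build_wishlist_payload("/etc/hosts"))
```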


The Vulnerable Code (PHP Example)

The following PHP code demonstrates the vulnerability:

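<?php
// Passing false keeps the external entity loader enabled
// (this function is deprecated as of PHP 8.0).
libxml_disable_entity_loader(false);
$xml_data = $_POST['xml']; // Untrusted XML data received from the user
// LIBXML_NOENT tells libxml to substitute entities, including external ones.
$wishlist = simplexml_load_string($xml_data, "SimpleXMLElement", LIBXML_NOENT);
// ... further processing ...
?>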

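libxml_disable_entity_loader(false) explicitly enables the loading of external entities, and the LIBXML_NOENT flag tells the parser to substitute entity references with their values. Together, applied to user-supplied XML, they make the application vulnerable.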


Mitigation

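The simplest and most effective way to prevent XXE vulnerabilities is to disable external entity processing in your XML parser. In PHP, avoid passing LIBXML_NOENT when parsing untrusted input; on PHP versions before 8.0 you can also call libxml_disable_entity_loader(true). That function is deprecated as of PHP 8.0, because libxml 2.9.0 and later do not substitute entities by default. Similar settings exist for other programming languages. Input validation is a useful additional layer, but it does not replace a securely configured parser.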
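
When an application has no need for DTDs at all, a stricter option is to reject any document that carries a DOCTYPE declaration before parsing it. The sketch below shows that idea in Python with the standard library only. It illustrates the approach and is not the PHP fix; the names DTDForbidden and parse_untrusted are hypothetical.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""

def parse_untrusted(xml_text):
    """Reject documents with a DOCTYPE, then parse the rest normally."""
    checker = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Entities can only be declared inside a DTD, so refusing the
        # DOCTYPE refuses every entity declaration along with it.
        raise DTDForbidden("DOCTYPE declarations are not accepted")

    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(xml_text, True)
    return ET.fromstring(xml_text)

safe = "<wishlist><user_id>1</user_id></wishlist>"
attack = '<!DOCTYPE foo [<!ENTITY payload SYSTEM "/etc/hosts"> ]><wishlist/>'

print(parse_untrusted(safe).findtext("user_id"))
try:
    parse_untrusted(attack)
except DTDForbidden as error:
    print("rejected:", error)
```

The trade-off is that legitimate documents using a DTD are refused as well, so this suits endpoints that only ever expect plain data such as the wishlist request.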