XML Injection Explained: Risks, Real Attacks, and Complete Defense Guide

XML Injection is the manipulation of XML input to alter how an application parses or interprets data. It occurs when user-controlled input is inserted into an XML document without proper validation, allowing attackers to inject nodes, attributes, entities, or payloads that modify control flow, bypass logic, or trigger dangerous parser behaviors. In today’s API-heavy and integration-driven ecosystems, XML Injection remains a real-world threat that security teams cannot ignore.

Unlike simple input tampering, XML Injection exploits the expressive power of XML itself, impacting complex systems including SOAP, SAML, IoT devices, enterprise integrations, and legacy financial systems.

Why XML Injection Still Matters

Even though JSON dominates modern applications, XML is deeply embedded in enterprise software, authentication protocols, and backend integrations. Attackers can abuse XML structures to:

  • Tamper with business logic
  • Inject unauthorized nodes
  • Manipulate XPath queries
  • Trigger XXE-style file disclosure
  • Break schema validation
  • Cause XML-based Denial of Service

The flexibility of XML makes its misuse particularly powerful.

Real CVE Example: CVE-2025-13526

CVE-2025-13526 demonstrates how a simple XML parsing misconfiguration can lead to full file disclosure. The system allowed upload of XML configuration files, but failed to disable external entity resolution.

Example malicious payload:

xml

<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<config>
  <name>&xxe;</name>
</config>

The server returned the contents of /etc/passwd, showing how XML Injection combined with XXE can expose critical files.

Attack Surfaces of XML Injection

Attackers abuse:

  • Node Injection
  • Attribute Injection
  • XPath Query Manipulation
  • Schema (XSD) Manipulation
  • XXE payloads
  • Entity expansion for DoS
  • Logic bypass inside XML-based workflows

Each category impacts a different layer of the system.

Attack Examples

Node Injection

xml

<order><item>ProductA</item></order>

Injected payload:

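xml

</order><admin>true</admin><order>

If the application splices this input into the document unescaped, the closing and reopening tags change its structure: an <admin> element appears that the application never intended to emit. Strict parsers may reject the result, but lenient parsers or string-based processing may honor the injected node and grant unauthorized privileges.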
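The effect can be reproduced with a minimal sketch using only the Python standard library. The `build_order_unsafe` helper and the payload are hypothetical; the payload is adapted so that the document stays well-formed once it is spliced into the `<item>` element, and it is not taken from a real application.

```python
import xml.etree.ElementTree as ET


def build_order_unsafe(item: str) -> str:
    """Hypothetical vulnerable builder: splices raw input into the XML text."""
    return f"<order><item>{item}</item></order>"


# Attacker-supplied value that closes <item>, adds <admin>, then reopens <item>.
payload = "ProductA</item><admin>true</admin><item>ProductB"

order = ET.fromstring(build_order_unsafe(payload))
child_tags = [child.tag for child in order]   # tags directly under <order>
admin_flag = order.findtext("admin")          # value of the injected node

print(child_tags)
print(admin_flag)
```

The parsed document now contains an `admin` element next to the two `item` elements, and any downstream code that reads `order.findtext("admin")` would treat it as genuine.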

XPath Injection

python

query = f"//users/user[username/text()='{user}']"

Malicious input:

text

' or '1'='1

The predicate becomes username/text()='' or '1'='1', which is true for every node, so the query returns all user records instead of one.
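One way to defuse this, sketched below under the assumption that the XPath engine in use offers no parameter binding, is to quote the value as an XPath 1.0 string literal before it reaches the query. XPath 1.0 has no escape character inside literals, so a value containing both quote types has to be rebuilt with `concat()`. The helper names `xpath_literal` and `build_user_query` are hypothetical. Where the library supports XPath variables (lxml's `xpath()` accepts `$name` placeholders with keyword arguments, for instance), that is usually the simpler option.

```python
def xpath_literal(value: str) -> str:
    """Return `value` as a safely quoted XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: split on single quotes and rejoin via concat().
    parts = [f"'{part}'" for part in value.split("'")]
    return "concat(" + ", \"'\", ".join(parts) + ")"


def build_user_query(user: str) -> str:
    """Build the lookup query with the user value confined to a literal."""
    return f"//users/user[username/text()={xpath_literal(user)}]"


hostile = "' or '1'='1"
query = build_user_query(hostile)
print(query)
```

The hostile input now sits inside a double-quoted literal, so it is compared as a username rather than parsed as part of the predicate.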

XXE Payload

xml

<!DOCTYPE foo [
  <!ENTITY payload SYSTEM "file:///etc/hostname">
]>
<root>&payload;</root>

Schema Manipulation

xml

<xs:element name="amount" type="xs:string"/>

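If an attacker can influence the schema, for example by replacing a numeric type with xs:string as shown, values that should be rejected (such as a non-numeric amount) pass validation.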

Billion Laughs DoS

xml

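<!DOCTYPE root [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;">
  <!ENTITY c "&b;&b;">
]>
<root>&c;</root>

Each entity doubles the one before it, so &c; expands to four copies of "ha". Real attacks chain more levels with a larger fan-out, and the exponential expansion exhausts the parser's memory and CPU.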
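The growth is easy to quantify. The sketch below computes the expanded size for a given base string, fan-out, and number of nesting levels; the function names are illustrative. The second call uses the figures commonly quoted for the classic "billion laughs" document (the string "lol", ten references per level, nine levels), which is an assumption about that well-known payload rather than something taken from this article.

```python
def expanded_length(base_text: str, fanout: int, levels: int) -> int:
    """Length of the final entity after `levels` rounds of `fanout`-fold nesting."""
    return len(base_text) * fanout ** levels


def expand(base_text: str, fanout: int, levels: int) -> str:
    """Materialize the expansion; only sensible for tiny inputs."""
    value = base_text
    for _ in range(levels):
        value = value * fanout
    return value


# The three-entity example above: "ha", doubled twice.
small = expand("ha", fanout=2, levels=2)

# Classic shape of the attack: computed, never materialized.
classic = expanded_length("lol", fanout=10, levels=9)

print(small, len(small))
print(f"{classic:,} characters")
```

A payload of well under a kilobyte therefore asks the parser to hold several gigabytes of text, which is why entity expansion limits matter more than input size limits.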

Defensive Techniques

Disable entity resolution

python

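from lxml import etree

# Do not expand entities, and never fetch external resources over the network
parser = etree.XMLParser(resolve_entities=False, no_network=True)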
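Where the parser in use cannot be configured this way, a stricter alternative is to refuse any document that declares a DOCTYPE at all, since both XXE and entity expansion depend on one. The following is a minimal sketch using the standard library's `xml.parsers.expat`; the names `DoctypeForbidden` and `assert_no_doctype` are hypothetical, and the check is meant as a gate before the real parser runs, not as a replacement for it.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when a document declares a DOCTYPE."""


def assert_no_doctype(xml_text: str) -> None:
    """Raise DoctypeForbidden if `xml_text` contains a DOCTYPE declaration."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort as soon as the declaration starts, before any entity is defined.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(xml_text, True)  # malformed input raises expat.ExpatError


xxe_document = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE data [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<config><name>&xxe;</name></config>"
)

try:
    assert_no_doctype(xxe_document)
    rejected = False
except DoctypeForbidden:
    rejected = True

print("rejected:", rejected)
```

Rejecting DTDs outright is only appropriate when no legitimate client sends them, which is typically the case for API payloads and uploaded configuration files.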

Use safe libraries

python

import defusedxml.ElementTree as ET

Strong schema validation

python

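schema = etree.XMLSchema(etree.parse("schema.xsd"))  # lxml; the schema path is a placeholder
is_valid = schema.validate(etree.fromstring(input_xml, parser))  # True or False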
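The standard library has no XSD validator, so the sketch below is a stand-in rather than schema validation proper: it applies the same intent (an allowlist of elements and a typed `amount`) by hand. The `<order>` layout, `ALLOWED_CHILDREN`, and `validate_order` are hypothetical and chosen only to match the earlier examples.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

ALLOWED_CHILDREN = {"item", "amount"}  # hypothetical content model for <order>


def validate_order(xml_text: str) -> list:
    """Return a list of validation errors; an empty list means the order passed."""
    errors = []
    root = ET.fromstring(xml_text)
    if root.tag != "order":
        errors.append(f"unexpected root element: {root.tag}")
    for child in root:
        if child.tag not in ALLOWED_CHILDREN:
            errors.append(f"unexpected element: {child.tag}")
    amount = root.findtext("amount")
    if amount is None:
        errors.append("missing amount")
    else:
        try:
            value = Decimal(amount.strip())
        except InvalidOperation:
            errors.append("amount is not a decimal number")
        else:
            # Reject NaN, infinities, and negative totals.
            if not value.is_finite() or value < 0:
                errors.append("amount is out of range")
    return errors


ok = validate_order("<order><item>A</item><amount>19.99</amount></order>")
injected = validate_order(
    "<order><item>A</item><admin>true</admin><amount>19.99</amount></order>"
)
bad_type = validate_order("<order><item>A</item><amount>abc</amount></order>")
print(ok, injected, bad_type)
```

A real XSD gives the same guarantees declaratively and covers far more cases; the point here is only that unexpected nodes and loosely typed values should be rejected before business logic sees them.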

Never concatenate XML strings

Safer approach:

python

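from xml.etree.ElementTree import Element, SubElement

el = Element("user")
name_tag = SubElement(el, "name")
name_tag.text = user_input  # the serializer escapes markup characters in text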
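To see why building the tree through an API is safer, the sketch below serializes a hostile value and parses the result back. It uses only `xml.etree.ElementTree`; the `build_user` helper and the input string are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_user(name: str) -> str:
    """Build <user><name>...</name></user> through the tree API and serialize it."""
    user = ET.Element("user")
    name_tag = ET.SubElement(user, "name")
    name_tag.text = name  # stored as text; never interpreted as markup
    return ET.tostring(user, encoding="unicode")


hostile = "bob</name><admin>true</admin><name>"
serialized = build_user(hostile)
reparsed = ET.fromstring(serialized)

print(serialized)
print([child.tag for child in reparsed])
```

The angle brackets in the input are written out as `&lt;` and `&gt;`, so the round trip yields a single `name` element whose text is the original string and no `admin` node.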

Comparison Table

Type             | Target                 | Severity    | Impact
XML Injection    | Structure manipulation | Medium–High | Logic bypass
XPath Injection  | Query control          | High        | Unauthorized access
XXE              | Parser misuse          | High        | File read / SSRF
Schema Injection | Validation bypass      | Medium      | Integrity risks

XML Injection in Automated Penetration Testing

Modern AI-driven penetration testing platforms—such as Penligent—perform structure mutation, XML fuzzing, XPath payload testing, and parser misconfiguration detection. These platforms:

  • Discover XML-based attack surfaces
  • Identify injection points
  • Auto-learn schema behavior
  • Generate adaptive payloads
  • Validate parser behavior under multiple configurations

This significantly expands detection coverage.

Security Baselines and Automated Fix Recommendations

Automated platforms can also provide:

  • Parser configuration audits
  • Detection of unsafe XML libraries
  • Verification of entity resolution settings
  • Enforcement of strict XSD validation
  • CI/CD blocking of insecure XML operations

This transforms detection into actionable remediation.

Example Scripts

Semgrep Rules — Python XML Injection / XXE / Unsafe Parsing

Create a file:

semgrep-rules/python-xml-injection.yaml

yaml

rules:
  # RULE 1: xml.etree on untrusted input (not hardened against entity expansion)
  - id: python-unsafe-xml-etree
    severity: ERROR
    message: |
      xml.etree.ElementTree is not hardened for untrusted XML.
      Depending on the Expat version, it can be exposed to entity-expansion attacks.
      Use defusedxml instead.
    metadata:
      cwe: "CWE-776"
      owasp: "A05:2021 Security Misconfiguration"
    patterns:
      - pattern: |
          import xml.etree.ElementTree as ET
    languages: [python]

  # RULE 2: Direct use of xml.dom.minidom (unsafe XML parser)
  - id: python-unsafe-xml-minidom
    severity: ERROR
    message: |
      xml.dom.minidom does not disable DTDs or ENTITY expansion.
      Do NOT use it for untrusted XML.
    metadata:
      cwe: "CWE-611"
    pattern: |
      import xml.dom.minidom
    languages: [python]

  # RULE 3: Dangerous lxml XMLParser with resolve_entities=True
  - id: python-lxml-resolve-entities-enabled
    severity: ERROR
    message: |
      lxml XMLParser() with resolve_entities=True enables XXE.
      Set resolve_entities=False and load_dtd=False.
    metadata:
      cwe: "CWE-611"
    pattern: |
      XMLParser(resolve_entities=True, ...)
    languages: [python]

  # RULE 4: lxml with load_dtd=True (dangerous)
  - id: python-lxml-load-dtd
    severity: WARNING
    message: |
      load_dtd=True enables DTD processing and may allow entity expansion.
      Disable unless absolutely required.
    metadata:
      cwe: "CWE-611"
    pattern: |
      XMLParser(load_dtd=True, ...)
    languages: [python]

  # RULE 5: Missing safe parser config when using lxml.etree.fromstring()
  # A single-argument call means no hardened parser was passed.
  - id: python-lxml-fromstring-no-safe-config
    severity: WARNING
    message: |
      lxml.etree.fromstring() called without a hardened XMLParser.
      Ensure resolve_entities=False, no_network=True.
    metadata:
      cwe: "CWE-611"
    pattern: |
      lxml.etree.fromstring($XML)
    languages: [python]

  # RULE 6: Not using defusedxml when parsing external XML
  - id: python-defusedxml-not-used
    severity: INFO
    message: |
      Missing use of defusedxml, recommended for secure XML parsing in Python.
    metadata:
      cwe: "CWE-611"
    pattern-either:
      - pattern: |
          ET.parse($X)
      - pattern: |
          etree.parse($X)
    languages: [python]

  # RULE 7: xmltodict without forcing defused parser
  - id: python-xmltodict-unsafe
    severity: WARNING
    message: |
      xmltodict can invoke an underlying XML parser that allows XXE.
      Use defusedxml parser instead.
    pattern: |
      xmltodict.parse($X)
    languages: [python]

  # RULE 8: Dangerous use of eval() on XML-derived input
  # One multi-statement pattern: the assignment, anything in between, then eval().
  - id: python-eval-from-xml
    severity: CRITICAL
    message: |
      eval() on XML-derived input may lead to RCE.
      Never evaluate parsed XML values directly.
    metadata:
      cwe: "CWE-95"
      owasp: "A03:2021 Injection"
    pattern: |
      $VAL = $XML.xpath(...)
      ...
      eval($VAL)
    languages: [python]

  # RULE 9: Unsafe pickle.loads on XML-sourced data
  - id: python-pickle-on-xml
    severity: CRITICAL
    message: |
      pickle.loads() on input derived from XML can lead to arbitrary code execution.
      Avoid pickle on user data.
    metadata:
      cwe: "CWE-502"
    pattern: |
      $DATA = etree.fromstring(...)
      ...
      pickle.loads($DATA)
    languages: [python]

Complete Python SAST Scanner

This complements Semgrep by scanning for risky patterns including:

  • Unsafe XML parser imports
  • ENTITY / DOCTYPE usage
  • eval, exec, pickle, marshal, yaml.load patterns
  • XXE triggers in .xml, .xsd, .wsdl files

Create: python_sast_scanner.py

python

#!/usr/bin/env python3
"""
python_sast_scanner.py

Lightweight, fast Python-specific SAST scanner specializing in:
  - XML Injection / XXE
  - unsafe parser usage
  - dangerous functions (eval, exec, pickle.loads, yaml.load)

Safe for CI; does not execute any code.
Outputs JSON and non-zero exit on findings.
"""
import json
import os
import re
import sys

PATTERNS = {
    # XML / XXE
    "xml_etree": r"import\s+xml\.etree\.ElementTree",
    "xml_minidom": r"import\s+xml\.dom\.minidom",
    "lxml_resolve_entities": r"resolve_entities\s*=\s*True",
    "lxml_load_dtd": r"load_dtd\s*=\s*True",
    "doctype_in_xml": r"<!DOCTYPE",
    "entity_in_xml": r"<!ENTITY",
    # Dangerous functions
    "eval_usage": r"\beval\s*\(",
    "exec_usage": r"\bexec\s*\(",
    "pickle_loads": r"pickle\.loads\s*\(",
    "marshal_loads": r"marshal\.loads\s*\(",
    "yaml_unsafe_load": r"yaml\.load\s*\(",
}

IGNORED = {".git", "venv", "env", "__pycache__", "dist", "build", "node_modules"}


def scan_file(path):
    """Return one finding per (line, rule) match in the given file."""
    findings = []
    try:
        with open(path, "r", encoding="utf-8", errors="ignore") as f:
            for i, line in enumerate(f, 1):
                for rule, regex in PATTERNS.items():
                    if re.search(regex, line):
                        findings.append({
                            "rule": rule,
                            "line": line.strip(),
                            "lineno": i,
                        })
    except Exception:
        # Unreadable files are skipped rather than failing the whole scan
        pass
    return findings


def walk(root="."):
    """Scan every .py, .xml, .xsd, and .wsdl file under root."""
    results = {}
    for dp, dirs, files in os.walk(root):
        # Prune ignored directories in place so os.walk does not descend into them
        dirs[:] = [d for d in dirs if d not in IGNORED]
        for f in files:
            if f.endswith((".py", ".xml", ".xsd", ".wsdl")):
                full = os.path.join(dp, f)
                hits = scan_file(full)
                if hits:
                    results[full] = hits
    return results


def main():
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    results = walk(root)
    print(json.dumps({"results": results, "file_count": len(results)}, indent=2))
    if results:
        sys.exit(3)  # non-zero exit so CI fails on findings
    sys.exit(0)


if __name__ == "__main__":
    main()

What This Covers – Python SAST Scope

XML Injection / XXE

✔ Unsafe XML parsers
✔ lxml with resolve_entities=True
✔ DTD loading (potentially dangerous)
✔ XXE markers in stored XML files

Code injection

✔ eval()
✔ exec()

Unsafe deserialization

✔ pickle.loads()
✔ marshal.loads()
✔ unsafe yaml.load()

Other unsafe patterns

✔ ENTITY / DOCTYPE in .xml, .xsd, .wsdl
✔ xmltodict without hardened parser

Conclusion

XML Injection remains a relevant and dangerous vulnerability category. Its impact ranges from business logic bypass to file disclosure and denial of service. Understanding XML behaviors, securing parsers, validating structure, and incorporating automated penetration testing are essential steps in ensuring robust application security.
