XML Injection is the manipulation of XML input to alter how an application parses or interprets data. It occurs when user-controlled input is inserted into an XML document without proper validation, allowing attackers to inject nodes, attributes, entities, or payloads that modify control flow, bypass logic, or trigger dangerous parser behaviors. In today’s API-heavy and integration-driven ecosystems, XML Injection remains a real-world threat that security teams cannot ignore.
Unlike simple input tampering, XML Injection exploits the expressive power of XML itself, impacting complex systems including SOAP, SAML, IoT devices, enterprise integrations, and legacy financial systems.
Why XML Injection Still Matters
Even though JSON dominates modern applications, XML is deeply embedded in enterprise software, authentication protocols, and backend integrations. Attackers can abuse XML structures to:
- Tamper with business logic
- Inject unauthorized nodes
- Manipulate XPath queries
- Trigger XXE-style file disclosure
- Break schema validation
- Cause XML-based Denial of Service
The flexibility of XML makes its misuse particularly powerful.
Real CVE Example: CVE-2025-13526
CVE-2025-13526 demonstrates how a simple XML parsing misconfiguration can lead to arbitrary file disclosure. The affected system accepted uploaded XML configuration files but did not disable external entity resolution.
Example malicious payload:
```xml
<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<config>
  <name>&xxe;</name>
</config>
```
The server returned the contents of /etc/passwd, showing how XML Injection combined with XXE can expose critical files.
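One way to close off this class of bug regardless of parser defaults is to reject any document that carries a DTD before it reaches the real parser. The sketch below uses only the standard library (`xml.parsers.expat` and `xml.etree.ElementTree`). `parse_untrusted` and `DTDForbiddenError` are hypothetical names. defusedxml's `forbid_dtd=True` option applies the same policy as a maintained library.

```python
from xml.parsers import expat
import xml.etree.ElementTree as ET


class DTDForbiddenError(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration (hypothetical name)."""


def _forbid_doctype(name, system_id, public_id, has_internal_subset):
    # Exceptions raised inside an expat handler propagate out of Parse().
    raise DTDForbiddenError(f"DOCTYPE {name!r} is not allowed in untrusted XML")


def parse_untrusted(data: str) -> ET.Element:
    # Pass 1: a bare expat parser whose only job is to reject any DOCTYPE.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _forbid_doctype
    checker.Parse(data, True)
    # Pass 2: with no DTD, no entities can be declared, so nothing can expand.
    return ET.fromstring(data)
```

Rejecting every DOCTYPE is deliberately blunt. Legitimate documents that depend on a DTD will also fail, which is usually acceptable for untrusted uploads such as configuration files.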
Attack Surfaces of XML Injection
Attackers abuse:
- Node Injection
- Attribute Injection
- XPath Query Manipulation
- Schema (XSD) Manipulation
- XXE payloads
- Entity expansion for DoS
- Logic bypass inside XML-based workflows
Each category impacts a different layer of the system.
Attack Examples
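Node Injection
Consider an order document built from user input:
```xml
<order><item>ProductA</item></order>
```
If the `<item>` value is concatenated into the document without escaping, an attacker can submit a value that closes the current element and opens new ones:
```xml
</order><admin>true</admin><order>
```
Depending on where the value lands, the result is either malformed XML or a well-formed document containing an attacker-controlled `<admin>` node. If downstream code trusts that node, it may grant unauthorized privileges.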
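The standard library makes the difference between concatenation and a tree API easy to demonstrate. In this minimal sketch, the payload is a hypothetical variant of the one above. It closes `<item>` rather than `<order>`, so the concatenated result stays well-formed and the injected `<admin>` node survives parsing. `build_order_unsafe` and `build_order_safe` are illustrative names.

```python
import xml.etree.ElementTree as ET


def build_order_unsafe(item: str) -> str:
    # VULNERABLE: user input is spliced into the markup verbatim.
    return f"<order><item>{item}</item></order>"


def build_order_safe(item: str) -> str:
    # The tree API stores the input as text and escapes <, > and & on output.
    order = ET.Element("order")
    ET.SubElement(order, "item").text = item
    return ET.tostring(order, encoding="unicode")


payload = "ProductA</item><admin>true</admin><item>"

unsafe = ET.fromstring(build_order_unsafe(payload))
safe = ET.fromstring(build_order_safe(payload))

print(unsafe.find("admin") is not None)  # True: attacker-controlled node present
print(safe.find("admin") is None)        # True: payload is only item text
```

Because the safe builder escapes the payload, `safe.find("item").text` round-trips to exactly the string the user submitted.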
XPath Injection
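```python
# VULNERABLE: user input is interpolated directly into the XPath expression
query = f"//users/user[username/text()='{user}']"
```
Malicious input:
```text
' or '1'='1
```
The predicate becomes `username/text()='' or '1'='1'`, which is always true, so the query returns every user record.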
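Below is a minimal sketch of the problem and one fix, using only the standard library. The vulnerable builder reproduces the query from the text. `find_user_safe` (a hypothetical helper) walks the tree and compares the username as plain data, so quotes in the input carry no syntactic meaning. If lxml is available, binding the value as an XPath variable achieves the same separation inside XPath itself, for example `root.xpath("//user[username=$name]", name=user)`.

```python
import xml.etree.ElementTree as ET

# Small illustrative user directory.
USERS = ET.fromstring(
    "<users>"
    "<user><username>alice</username><role>admin</role></user>"
    "<user><username>bob</username><role>viewer</role></user>"
    "</users>"
)


def build_query_unsafe(user: str) -> str:
    # VULNERABLE pattern from the text: input spliced into the XPath string.
    return f"//users/user[username/text()='{user}']"


def find_user_safe(root: ET.Element, username: str) -> list:
    # The input is only compared as data and is never parsed as query syntax.
    return [u for u in root.iter("user") if u.findtext("username") == username]


attack = "' or '1'='1"
print(build_query_unsafe(attack))          # //users/user[username/text()='' or '1'='1']
print(len(find_user_safe(USERS, attack)))  # 0
print(len(find_user_safe(USERS, "alice"))) # 1
```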
XXE Payload
```xml
<!DOCTYPE foo [
  <!ENTITY payload SYSTEM "file:///etc/hostname">
]>
<root>&payload;</root>
```
Schema Manipulation
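```xml
<xs:element name="amount" type="xs:string"/>
```
If an attacker can modify or substitute the schema, declaring `amount` as `xs:string` instead of a numeric type such as `xs:decimal` lets non-numeric or malformed amounts pass validation.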
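A loosened schema silently accepts bad values, so it can help to enforce critical types in application code as well. The sketch below assumes a hypothetical document of the form `<payment><amount>10.50</amount></payment>`. It checks `amount` with Python's `decimal` module, so a swapped-in `xs:string` definition does not let non-numeric amounts through. `read_amount` is an illustrative name.

```python
from decimal import Decimal, InvalidOperation
import xml.etree.ElementTree as ET


def read_amount(xml_text: str) -> Decimal:
    """Parse <amount> and enforce its numeric type in code, independent of any XSD."""
    raw = ET.fromstring(xml_text).findtext("amount")
    try:
        value = Decimal(raw)
    except (InvalidOperation, TypeError):  # TypeError: <amount> missing, raw is None
        raise ValueError(f"amount is not a decimal number: {raw!r}") from None
    if not value.is_finite():  # rejects "NaN" and "Infinity", which Decimal accepts
        raise ValueError(f"amount must be finite: {raw!r}")
    return value
```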
Billion Laughs DoS
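```xml
<!ENTITY a "ha"><!ENTITY b "&a;&a;"><!ENTITY c "&b;&b;">
```
Each entity doubles the one before it, so the expanded size grows exponentially with the length of the chain. Extended far enough, expansion exhausts the parser's memory and CPU.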
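The growth is simple to quantify. Suppose the first entity expands to a base string and every later entity references its predecessor `fan_out` times. Then the last entity in a chain of `levels` entities expands to `len(base) * fan_out ** (levels - 1)` characters. The helper names below are illustrative. The second function builds the string directly to check the formula on a small case.

```python
def expanded_length(base: str, fan_out: int, levels: int) -> int:
    """Length of the last entity in a chain where each level repeats the previous one."""
    return len(base) * fan_out ** (levels - 1)


def expand_directly(base: str, fan_out: int, levels: int) -> str:
    """Materialize the expansion (only sensible for tiny inputs)."""
    text = base
    for _ in range(levels - 1):
        text *= fan_out
    return text


# The three declarations in the text: a="ha", b=&a;&a;, c=&b;&b;
print(expanded_length("ha", 2, 3))     # 8, i.e. "hahahaha"
print(expanded_length("ha", 2, 30))    # 1073741824
# Classic billion laughs: "lol" with ten references per level, ten levels
print(expanded_length("lol", 10, 10))  # 3000000000
```

Thirty doubling levels already yield over a billion characters. This is why hardened parsers limit or forbid entity expansion rather than trying to recognize specific payloads.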
Defensive Techniques
Disable entity resolution
```python
from lxml import etree
parser = etree.XMLParser(resolve_entities=False, no_network=True)
```
Use safe libraries
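```python
import defusedxml.ElementTree as ET
root = ET.fromstring(untrusted_xml)  # raises EntitiesForbidden if the input declares entities
```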
Strong schema validation
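```python
schema = etree.XMLSchema(file="schema.xsd")  # lxml; "schema.xsd" is a placeholder path
schema.assertValid(doc)  # doc: tree parsed with the hardened parser; raises DocumentInvalid
```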
Never concatenate XML strings
Safer approach:
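```python
from xml.etree.ElementTree import Element, SubElement
el = Element("user")
SubElement(el, "name").text = user_input  # escaped automatically on serialization
```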
Comparison Table
| Type | Target | Severity | Impact |
|---|---|---|---|
| XML Injection | Structure manipulation | Medium–High | Logic bypass |
| XPath Injection | Query control | High | Unauthorized access |
| XXE | Parser misuse | High | File read / SSRF |
| Schema Injection | Validation bypass | Medium | Integrity risks |

XML Injection in Automated Penetration Testing
Modern AI-driven penetration testing platforms—such as Penligent—perform structure mutation, XML fuzzing, XPath payload testing, and parser misconfiguration detection. These platforms:
- Discover XML-based attack surfaces
- Identify injection points
- Auto-learn schema behavior
- Generate adaptive payloads
- Validate parser behavior under multiple configurations
This significantly expands detection coverage.
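To make "structure mutation" concrete, here is a toy harness in the same spirit. It does not describe how Penligent or any particular platform works. The payload list, `fuzz_builder`, and the expected-tag check are assumptions chosen for illustration. The harness feeds mutations into an XML-building function and flags any input that makes the document malformed or adds elements the application did not intend.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical mutation seeds, each trying to escape a text node.
MUTATIONS = [
    "</item><admin>true</admin><item>",  # close the element, inject a sibling
    "<![CDATA[</item>]]>",               # hide markup inside CDATA
    "&amp;<admin/>",                     # entity followed by a raw element
    "<!--",                              # unterminated comment
]


def fuzz_builder(build: Callable[[str], str], expected_tags: set) -> list:
    """Return (payload, reason) pairs for mutations that alter document structure."""
    findings = []
    for payload in MUTATIONS:
        try:
            root = ET.fromstring(build(payload))
        except ET.ParseError:
            findings.append((payload, "malformed"))
            continue
        tags = {el.tag for el in root.iter()}
        if tags != expected_tags:
            findings.append((payload, f"unexpected tags: {sorted(tags - expected_tags)}"))
    return findings


def unsafe_builder(item: str) -> str:
    return f"<order><item>{item}</item></order>"  # string concatenation


def safe_builder(item: str) -> str:
    order = ET.Element("order")
    ET.SubElement(order, "item").text = item      # escaped on serialization
    return ET.tostring(order, encoding="unicode")


EXPECTED = {"order", "item"}
for payload, reason in fuzz_builder(unsafe_builder, EXPECTED):
    print(repr(payload), "->", reason)
print(fuzz_builder(safe_builder, EXPECTED))  # []
```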
Security Baselines and Automated Fix Recommendations
Automated platforms can also provide:
- Parser configuration audits
- Detection of unsafe XML libraries
- Verification of entity resolution settings
- Enforcement of strict XSD validation
- CI/CD blocking of insecure XML operations
This transforms detection into actionable remediation.
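A check that verifies entity resolution settings can be as small as a probe document. In this sketch, the canary string, `audit_entity_handling`, and `strict_parse` are hypothetical. The probe declares an internal entity, and the audit reports whether a given parse function rejects it, expands it, or leaves it unexpanded. The standard library's `ET.fromstring` expands internal entities, so it is reported as `expanded`.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical canary value; any string unlikely to occur naturally works.
CANARY = "ENTITY_CANARY_7f3a"
PROBE_DOC = f'<!DOCTYPE r [<!ENTITY e "{CANARY}">]><r>&e;</r>'


def audit_entity_handling(parse: Callable[[str], ET.Element]) -> str:
    """Return 'rejected', 'expanded', or 'not expanded' for a parse function."""
    try:
        root = parse(PROBE_DOC)
    except Exception:  # any refusal, e.g. ParseError or a custom error, counts
        return "rejected"
    text = "".join(root.itertext())
    return "expanded" if CANARY in text else "not expanded"


def strict_parse(doc: str) -> ET.Element:
    # Illustrative hardened entry point: refuse any DTD outright.
    if "<!DOCTYPE" in doc:
        raise ValueError("DTD forbidden")
    return ET.fromstring(doc)


print(audit_entity_handling(ET.fromstring))  # expanded
print(audit_entity_handling(strict_parse))   # rejected
```

A CI job could run the same audit against the application's real parse entry point and fail the build unless it reports `rejected`.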
Example Scripts
Semgrep Rules — Python XML Injection / XXE / Unsafe Parsing
Create a file:
semgrep-rules/python-xml-injection.yaml
```yaml
rules:
  # RULE 1: xml.etree.ElementTree is not hardened for untrusted XML
  - id: python-unsafe-xml-etree
    severity: ERROR
    message: |
      xml.etree.ElementTree is not hardened for untrusted XML. It does not
      forbid DTDs and, with Expat older than 2.4.1, is vulnerable to
      entity-expansion DoS (billion laughs, quadratic blowup).
      Use defusedxml instead.
    metadata:
      cwe: "CWE-776"
      owasp: "A05:2021 Security Misconfiguration"
    pattern: import xml.etree.ElementTree as ET
    languages: [python]

  # RULE 2: xml.dom.minidom is not hardened for untrusted XML
  - id: python-unsafe-xml-minidom
    severity: ERROR
    message: |
      xml.dom.minidom does not forbid DTDs or internal entity expansion.
      Do NOT use it for untrusted XML.
    metadata:
      cwe: "CWE-776"
    pattern: import xml.dom.minidom
    languages: [python]

  # RULE 3: dangerous lxml XMLParser with resolve_entities=True
  - id: python-lxml-resolve-entities-enabled
    severity: ERROR
    message: |
      lxml XMLParser() with resolve_entities=True enables XXE.
      Set resolve_entities=False and load_dtd=False.
    metadata:
      cwe: "CWE-611"
    pattern: XMLParser(resolve_entities=True, ...)
    languages: [python]

  # RULE 4: lxml with load_dtd=True
  - id: python-lxml-load-dtd
    severity: WARNING
    message: |
      load_dtd=True enables DTD processing and may allow entity expansion.
      Disable it unless absolutely required.
    metadata:
      cwe: "CWE-611"
    pattern: XMLParser(load_dtd=True, ...)
    languages: [python]

  # RULE 5: lxml fromstring() without an explicit hardened parser
  - id: python-lxml-fromstring-no-safe-config
    severity: WARNING
    message: |
      etree.fromstring() called without a hardened XMLParser.
      Pass one with resolve_entities=False and no_network=True.
    metadata:
      cwe: "CWE-611"
    pattern: etree.fromstring($XML)
    languages: [python]

  # RULE 6: parsing external XML without defusedxml
  - id: python-defusedxml-not-used
    severity: INFO
    message: |
      Consider defusedxml, the recommended library for parsing untrusted XML in Python.
    metadata:
      cwe: "CWE-611"
    pattern-either:
      - pattern: ET.parse($X)
      - pattern: etree.parse($X)
    languages: [python]

  # RULE 7: xmltodict without a defused parser
  - id: python-xmltodict-unsafe
    severity: WARNING
    message: |
      xmltodict can invoke an underlying XML parser that allows XXE.
      Use a defusedxml-based parser instead.
    pattern: xmltodict.parse($X)
    languages: [python]

  # RULE 8: eval() on XML-derived input
  - id: python-eval-from-xml
    severity: ERROR
    message: |
      eval() on XML-derived input may lead to RCE.
      Never evaluate parsed XML values directly.
    metadata:
      cwe: "CWE-95"
      owasp: "A03:2021 Injection"
    patterns:
      - pattern-inside: |
          $VAL = $XML.xpath(...)
          ...
      - pattern: eval($VAL)
    languages: [python]

  # RULE 9: pickle.loads on XML-sourced data
  - id: python-pickle-on-xml
    severity: ERROR
    message: |
      pickle.loads() on data derived from XML can lead to arbitrary code
      execution. Avoid pickle on user data.
    metadata:
      cwe: "CWE-502"
    patterns:
      - pattern-inside: |
          $DATA = etree.fromstring(...)
          ...
      - pattern: pickle.loads($DATA)
    languages: [python]
```
Complete Python SAST Scanner
This complements Semgrep by scanning for risky patterns including:
- Unsafe XML parser imports
- ENTITY / DOCTYPE usage
- `eval`, `exec`, `pickle`, `marshal`, `yaml.load` patterns
- XXE triggers in `.xml`, `.xsd`, `.wsdl` files
Create: python_sast_scanner.py
```python
#!/usr/bin/env python3
"""
python_sast_scanner.py

Lightweight, fast Python-specific SAST scanner specializing in:
- XML Injection / XXE
- unsafe parser usage
- dangerous functions (eval, exec, pickle.loads, yaml.load)

Safe for CI; does not execute any code. Outputs JSON and exits non-zero on findings.
"""
import json
import os
import re
import sys

PATTERNS = {
    # XML / XXE
    "xml_etree": r"import\s+xml\.etree\.ElementTree",
    "xml_minidom": r"import\s+xml\.dom\.minidom",
    "lxml_resolve_entities": r"resolve_entities\s*=\s*True",
    "lxml_load_dtd": r"load_dtd\s*=\s*True",
    "doctype_in_xml": r"<!DOCTYPE",
    "entity_in_xml": r"<!ENTITY",
    # Dangerous functions
    "eval_usage": r"\beval\s*\(",
    "exec_usage": r"\bexec\s*\(",
    "pickle_loads": r"pickle\.loads\s*\(",
    "marshal_loads": r"marshal\.loads\s*\(",
    "yaml_unsafe_load": r"yaml\.load\s*\(",
}

IGNORED = {".git", "venv", "env", "__pycache__", "dist", "build", "node_modules"}


def scan_file(path):
    """Return one finding per (line, rule) match in a single file."""
    findings = []
    try:
        with open(path, "r", encoding="utf-8", errors="ignore") as f:
            for i, line in enumerate(f, 1):
                for rule, regex in PATTERNS.items():
                    if re.search(regex, line):
                        findings.append({"rule": rule, "line": line.strip(), "lineno": i})
    except OSError:
        pass  # unreadable file: skip it
    return findings


def walk(root="."):
    """Scan .py/.xml/.xsd/.wsdl files under root, skipping IGNORED directories."""
    results = {}
    for dp, dirs, files in os.walk(root):
        dirs[:] = [d for d in dirs if d not in IGNORED]  # prune in place
        for f in files:
            if f.endswith((".py", ".xml", ".xsd", ".wsdl")):
                full = os.path.join(dp, f)
                hits = scan_file(full)
                if hits:
                    results[full] = hits
    return results


def main():
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    results = walk(root)
    # file_count is the number of files with at least one finding
    print(json.dumps({"results": results, "file_count": len(results)}, indent=2))
    sys.exit(3 if results else 0)


if __name__ == "__main__":
    main()
```
What This Covers – Python SAST Scope
XML Injection / XXE
✔ Unsafe XML parsers
✔ lxml with resolve_entities=True
✔ DTD loading (potentially dangerous)
✔ XXE markers in stored XML files
Code injection
✔ eval()
✔ exec()
Unsafe deserialization
✔ pickle.loads()
✔ marshal.loads()
✔ unsafe yaml.load()
Other unsafe patterns
✔ ENTITY / DOCTYPE in .xml, .xsd, .wsdl
✔ xmltodict without hardened parser (Semgrep rule 7 only)
Conclusion
XML Injection remains a relevant and dangerous vulnerability category. Its impact ranges from business logic bypass to file disclosure and denial of service. Understanding XML behaviors, securing parsers, validating structure, and incorporating automated penetration testing are essential steps in ensuring robust application security.

