XML Injection Explained: Risks, Real Attacks, and Complete Defense Guide

XML Injection is the manipulation of XML input to alter how an application parses or interprets data. It occurs when user-controlled input is inserted into an XML document without proper validation, allowing attackers to inject nodes, attributes, entities, or payloads that modify control flow, bypass logic, or trigger dangerous parser behaviors. In today’s API-heavy and integration-driven ecosystems, XML Injection remains a real-world threat that security teams cannot ignore.

Unlike simple input tampering, XML Injection exploits the expressive power of XML itself, impacting complex systems including SOAP, SAML, IoT devices, enterprise integrations, and legacy financial systems.

Why XML Injection Still Matters

Even though JSON dominates modern applications, XML is deeply embedded in enterprise software, authentication protocols, and backend integrations. Attackers can abuse XML structures to:

Tamper with business logic
Inject unauthorized nodes
Manipulate XPath queries
Trigger XXE-style file disclosure
Break schema validation
Cause XML-based Denial of Service

The flexibility of XML makes its misuse particularly powerful.

Real CVE Example: CVE-2025-13526

CVE-2025-13526 demonstrates how a simple XML parsing misconfiguration can lead to full file disclosure. The system allowed upload of XML configuration files, but failed to disable external entity resolution.

Example malicious payload:

xml

<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<config>
  <name>&xxe;</name>
</config>

The server returned the contents of /etc/passwd, showing how XML Injection combined with XXE can expose critical files.

Attack Surfaces of XML Injection

Attackers abuse:

Node Injection
Attribute Injection
XPath Query Manipulation
Schema (XSD) Manipulation
XXE payloads
Entity expansion for DoS
Logic bypass inside XML-based workflows

Each category impacts a different layer of the system.

Attack Examples

Node Injection Manipulation

xml

<order><item>ProductA</item></order>

Injected payload:

xml

</order><admin>true</admin><order>

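The injected fragment closes the original <order> element early and introduces an <admin> node. A consumer that trusts the document structure may then read that node and grant unauthorized privileges.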
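The effect is easiest to see by parsing the result. The following is a minimal sketch using only the standard library; the `<request>` wrapper, the `build_order_unsafe` helper, and the payload are hypothetical and chosen so that the injected document stays well-formed.

```python
import xml.etree.ElementTree as ET


def build_order_unsafe(item: str) -> str:
    # Vulnerable on purpose: user input is pasted into the markup unescaped.
    return f"<request><order><item>{item}</item></order></request>"


# Hypothetical attacker input: closes <item> and <order>, adds <admin>,
# then reopens both elements so the document still parses.
payload = "x</item></order><admin>true</admin><order><item>y"

root = ET.fromstring(build_order_unsafe(payload))
injected = root.find("admin")

# The attacker-supplied text is now a real element in the tree.
print("admin node present:", injected is not None)
print("order elements:", len(root.findall("order")))
```

Any code that later checks for an `<admin>` child would see the attacker's node as if the application had written it.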

XPath Injection

python

query = f"//users/user[username/text()='{user}']"

Malicious input:

text

' or '1'='1

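The injected quote closes the string literal and appends a condition that is always true, so the query matches every user record instead of one.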
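One way to blunt this, sketched here with the standard library only, is to check the value against an allow-list before it ever reaches the query string. The sample document, the `find_user` helper, and the permitted character set are assumptions for illustration, not part of any particular application.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical user store used only for this sketch.
USERS_XML = """
<users>
  <user><username>alice</username><role>admin</role></user>
  <user><username>bob</username><role>viewer</role></user>
</users>
""".strip()

# Assumed policy: usernames are short and contain no quotes or spaces.
_SAFE_USERNAME = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def find_user(root, username):
    """Look up a user only after the name passes the allow-list."""
    if not _SAFE_USERNAME.fullmatch(username):
        raise ValueError("username contains characters outside the allow-list")
    # ElementTree supports the [tag='text'] predicate form.
    return root.findall(f".//user[username='{username}']")


root = ET.fromstring(USERS_XML)
print(len(find_user(root, "alice")))

try:
    find_user(root, "' or '1'='1")
except ValueError as exc:
    print("rejected:", exc)
```

Where lxml is available, its `xpath()` method accepts XPath variables as keyword arguments (for example `root.xpath("//user[username=$name]", name=user)`), which avoids building the query from strings at all.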

XXE Payload

xml

<!DOCTYPE foo [
  <!ENTITY payload SYSTEM "file:///etc/hostname">
]>
<root>&payload;</root>

Schema Manipulation

xml

<xs:element name="amount" type="xs:string"/>

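Declaring amount as xs:string rather than a numeric type such as xs:decimal means the schema accepts any text, so the validation the application expects no longer happens.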

Billion Laughs DoS

xml

<!DOCTYPE root [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;">
  <!ENTITY c "&b;&b;">
]>
<root>&c;</root>

Massive expansion overloads the parser.
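The growth is exponential in the number of nesting levels, which a little arithmetic makes concrete. The helper below is a hypothetical illustration that only computes sizes and never expands anything.

```python
def expanded_size(base_len: int, fanout: int, levels: int) -> int:
    """Characters produced when an entity of base_len characters is
    referenced fanout times per level across the given nesting levels."""
    return base_len * fanout ** levels


# The three-entity snippet above: "ha" (2 chars), doubled twice.
small = expanded_size(2, 2, 2)

# The classic shape of the attack: a 3-character string, ten references
# per level, nine levels of nesting.
classic = expanded_size(3, 10, 9)

print(small, classic)
```

A payload of well under a kilobyte can therefore ask the parser for gigabytes of text, which is why expansion limits or outright rejection of entity declarations matter.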


Defensive Techniques

Disable entity resolution

python

from lxml import etree

parser = etree.XMLParser(resolve_entities=False, no_network=True)
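Where lxml is not an option, a stricter stance is to refuse any document that carries a DOCTYPE at all. The sketch below uses the standard library's Expat bindings; `DTDForbidden` and `reject_doctype` are hypothetical names, and the approach assumes the application never needs DTDs from untrusted input.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""


def reject_doctype(xml_text):
    """Parse xml_text once and fail as soon as a DOCTYPE is seen."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising here aborts the parse before any entity is used.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)


XXE_DOC = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE data [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<config><name>&xxe;</name></config>"
)

try:
    reject_doctype(XXE_DOC)
except DTDForbidden as exc:
    print("blocked:", exc)

reject_doctype("<config><name>ok</name></config>")  # passes silently
```

Running this check before handing the document to the real parser keeps the policy in one place, at the cost of parsing the input twice.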

Use safe libraries

python

import defusedxml.ElementTree as ET

Strong schema validation

python

schema.validate(input_xml)
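The line above presumably relies on a compiled XSD object, such as lxml's `etree.XMLSchema`. As a rough standard-library stand-in, and not a replacement for real XSD validation, the same idea can be sketched as explicit structural and type checks. The `<order>` layout, `ALLOWED_CHILDREN`, and `validate_order` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Hypothetical schema: <order> holds exactly one <item> and one <amount>.
ALLOWED_CHILDREN = ["amount", "item"]


def validate_order(xml_text):
    """Return the parsed root if the document matches the expected shape."""
    root = ET.fromstring(xml_text)
    if root.tag != "order":
        raise ValueError("root element must be <order>")

    tags = sorted(child.tag for child in root)
    if tags != ALLOWED_CHILDREN:
        raise ValueError(f"unexpected child elements: {tags}")

    try:
        amount = Decimal((root.findtext("amount") or "").strip())
    except InvalidOperation:
        raise ValueError("amount must be a decimal number") from None
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive, finite number")
    return root


ok = validate_order("<order><item>ProductA</item><amount>19.99</amount></order>")
print(ok.findtext("item"))
```

The point is that unknown nodes and wrongly typed values are rejected rather than ignored, which is the behavior strict schema validation is meant to provide.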

Never concatenate XML strings

Safer approach:

python

from xml.etree.ElementTree import Element, SubElement

el = Element("user")
name_tag = SubElement(el, "name")
name_tag.text = user_input
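A quick round trip shows why the element API is safer: the serializer escapes markup characters in text, so hostile input stays data. This is a small sketch with a hypothetical `build_user` helper and payload.

```python
import xml.etree.ElementTree as ET


def build_user(user_input: str) -> str:
    """Build <user><name>...</name></user> without string concatenation."""
    el = ET.Element("user")
    name_tag = ET.SubElement(el, "name")
    name_tag.text = user_input  # stored as text, escaped on serialization
    return ET.tostring(el, encoding="unicode")


# Hypothetical hostile input that tries to smuggle in an <admin> node.
hostile = "bob</name><admin>true</admin><name>"

xml_out = build_user(hostile)
parsed = ET.fromstring(xml_out)

print(xml_out)
print("admin node present:", parsed.find("admin") is not None)
```

After parsing, the `<name>` element contains the original string verbatim and no `<admin>` element exists in the tree.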

Comparison Table

Type	Target	Severity	Impact
XML Injection	Structure manipulation	Medium–High	Logic bypass
XPath Injection	Query control	High	Unauthorized access
XXE	Parser misuse	High	File read / SSRF
Schema Injection	Validation bypass	Medium	Integrity risks


XML Injection in Automated Penetration Testing

Modern AI-driven penetration testing platforms—such as Penligent—perform structure mutation, XML fuzzing, XPath payload testing, and parser misconfiguration detection. These platforms:

Discover XML-based attack surfaces
Identify injection points
Auto-learn schema behavior
Generate adaptive payloads
Validate parser behavior under multiple configurations

This significantly expands detection coverage.

Security Baselines and Automated Fix Recommendations

Automated platforms can also provide:

Parser configuration audits
Detection of unsafe XML libraries
Verification of entity resolution settings
Enforcement of strict XSD validation
CI/CD blocking of insecure XML operations

This transforms detection into actionable remediation.

Example Scripts

Semgrep Rules — Python XML Injection / XXE / Unsafe Parsing

Create a file:

semgrep-rules/python-xml-injection.yaml

yaml

rules:
  # RULE 1: Unsafe use of xml.etree - default parser is not hardened against entity expansion
  - id: python-unsafe-xml-etree
    severity: ERROR
    message: |
      xml.etree.ElementTree is not hardened for untrusted XML.
      It does not expand external entities, but depending on the bundled
      Expat version it can be exposed to entity-expansion DoS.
      Use defusedxml instead.
    metadata:
      cwe: "CWE-776"
      owasp: "A05:2021 Security Misconfiguration"
    patterns:
      - pattern: import xml.etree.ElementTree as ET
    languages: [python]

  # RULE 2: Direct use of xml.dom.minidom - unsafe XML parser
  - id: python-unsafe-xml-minidom
    severity: ERROR
    message: |
      xml.dom.minidom does not disable DTDs or ENTITY expansion.
      Do NOT use it for untrusted XML.
    metadata:
      cwe: "CWE-611"
    pattern: import xml.dom.minidom
    languages: [python]

  # RULE 3: Dangerous lxml XMLParser with resolve_entities=True
  - id: python-lxml-resolve-entities-enabled
    severity: ERROR
    message: |
      lxml XMLParser() with resolve_entities=True enables XXE.
      Set resolve_entities=False and load_dtd=False.
    metadata:
      cwe: "CWE-611"
    pattern: lxml.etree.XMLParser(..., resolve_entities=True, ...)
    languages: [python]

  # RULE 4: lxml with load_dtd=True - dangerous
  - id: python-lxml-load-dtd
    severity: WARNING
    message: |
      load_dtd=True enables DTD processing and may allow entity expansion.
      Disable unless absolutely required.
    metadata:
      cwe: "CWE-611"
    pattern: lxml.etree.XMLParser(..., load_dtd=True, ...)
    languages: [python]

  # RULE 5: Missing safe parser config when using lxml.fromstring()
  - id: python-lxml-fromstring-no-safe-config
    severity: WARNING
    message: |
      lxml.fromstring() called without a hardened XMLParser.
      Ensure resolve_entities=False, no_network=True.
    metadata:
      cwe: "CWE-611"
    pattern: lxml.etree.fromstring($XML)
    languages: [python]

  # RULE 6: Not using defusedxml when parsing external XML
  - id: python-defusedxml-not-used
    severity: INFO
    message: |
      Missing use of defusedxml, recommended for secure XML parsing in Python.
    metadata:
      cwe: "CWE-611"
    pattern-either:
      - pattern: ET.parse($X)
      - pattern: etree.parse($X)
    languages: [python]

  # RULE 7: xmltodict without forcing defused parser
  - id: python-xmltodict-unsafe
    severity: WARNING
    message: |
      xmltodict can invoke an underlying XML parser that allows XXE.
      Use defusedxml parser instead.
    pattern: xmltodict.parse($X)
    languages: [python]

  # RULE 8: Dangerous use of eval() on XML-derived input
  - id: python-eval-from-xml
    severity: CRITICAL
    message: |
      eval() on XML-derived input may lead to RCE.
      Never evaluate parsed XML values directly.
    metadata:
      cwe: "CWE-95"
      owasp: "A03:2021 Injection"
    pattern: |
      $VAL = $XML.xpath(...)
      ...
      eval($VAL)
    languages: [python]

  # RULE 9: Unsafe pickle.loads on XML-sourced data
  - id: python-pickle-on-xml
    severity: CRITICAL
    message: |
      pickle.loads() on input derived from XML can lead to arbitrary code execution.
      Avoid pickle on user data.
    metadata:
      cwe: "CWE-502"
    pattern: |
      $DATA = etree.fromstring(...)
      ...
      pickle.loads($DATA)
    languages: [python]

Complete Python SAST Scanner

This complements Semgrep by scanning for risky patterns including:

Unsafe XML parser imports
ENTITY / DOCTYPE usage
eval, exec, pickle, marshal, yaml.load patterns
XXE triggers in .xml, .xsd, .wsdl files

Create: python_sast_scanner.py

python

#!/usr/bin/env python3
"""
python_sast_scanner.py

Lightweight, fast Python-specific SAST scanner specializing in:
  - XML Injection / XXE
  - unsafe parser usage
  - dangerous functions (eval, exec, pickle.loads, yaml.load)

Safe for CI; does not execute any code.
Outputs JSON and exits non-zero on findings.
"""
import json
import os
import re
import sys

PATTERNS = {
    # XML / XXE
    "xml_etree": r"import\s+xml\.etree\.ElementTree",
    "xml_minidom": r"import\s+xml\.dom\.minidom",
    "lxml_resolve_entities": r"resolve_entities\s*=\s*True",
    "lxml_load_dtd": r"load_dtd\s*=\s*True",
    "doctype_in_xml": r"<!DOCTYPE",
    "entity_in_xml": r"<!ENTITY",
    # Dangerous functions
    "eval_usage": r"\beval\s*\(",
    "exec_usage": r"\bexec\s*\(",
    "pickle_loads": r"pickle\.loads\s*\(",
    "marshal_loads": r"marshal\.loads\s*\(",
    "yaml_unsafe_load": r"yaml\.load\s*\(",
}

# Directories that are never scanned.
IGNORED = {".git", "venv", "env", "__pycache__", "dist", "build", "node_modules"}


def scan_file(path):
    """Return one finding per (line, rule) match in a single file."""
    findings = []
    try:
        with open(path, "r", encoding="utf-8", errors="ignore") as f:
            for i, line in enumerate(f, 1):
                for rule, regex in PATTERNS.items():
                    if re.search(regex, line):
                        findings.append({
                            "rule": rule,
                            "line": line.strip(),
                            "lineno": i,
                        })
    except OSError:
        # Unreadable files are skipped rather than failing the whole scan.
        pass
    return findings


def walk(root="."):
    """Scan every .py, .xml, .xsd, and .wsdl file under root."""
    results = {}
    for dp, dirs, files in os.walk(root):
        # Prune ignored directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if d not in IGNORED]
        for f in files:
            if f.endswith((".py", ".xml", ".xsd", ".wsdl")):
                full = os.path.join(dp, f)
                hits = scan_file(full)
                if hits:
                    results[full] = hits
    return results


def main():
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    results = walk(root)
    print(json.dumps({"results": results, "file_count": len(results)}, indent=2))
    if results:
        sys.exit(3)
    sys.exit(0)


if __name__ == "__main__":
    main()

What This Covers – Python SAST Scope

XML Injection / XXE

✔ Unsafe XML parsers
✔ lxml with resolve_entities=True
✔ DTD loading (potentially dangerous)
✔ XXE markers in stored XML files

Code injection

✔ eval()
✔ exec()

Unsafe deserialization

✔ pickle.loads()
✔ marshal.loads()
✔ unsafe yaml.load()

Other unsafe patterns

✔ ENTITY / DOCTYPE in .xml, .xsd, .wsdl
✔ xmltodict without hardened parser

Conclusion

XML Injection remains a relevant and dangerous vulnerability category. Its impact ranges from business logic bypass to file disclosure and denial of service. Understanding XML behaviors, securing parsers, validating structure, and incorporating automated penetration testing are essential steps in ensuring robust application security.
