Problem
Say you're comparing two XML files, for example web.config files taken from two installations of a website, looking for the cause of an environment-specific problem. There's nothing more annoying than trawling through a thousand semantically neutral whitespace differences flagged by your text-based diff tool while hunting for the one meaningful difference.
For example, these two chunks of XML:
<add name="settingname" value="value" />
and
<add
name="settingname"
value="value"
></add>
mean pretty much the same thing, but to a text comparison tool, they look very different.
Solution
One solution is to transform the XML to a more "canonical" form before diffing. (I know "more canonical" is a bit like "more pregnant", but you get my meaning.)
The following chunk of source compiles into a lightweight command-line tool which takes two parameters:
- Input XML filename
- Output XML filename
The tool loads the input file, and then writes a "canonical" version of its XML.
The output XML is:
- Uniformly indented based on the number of ancestor elements
- Written with one attribute per line (helps out text diff tools, which are very line-oriented)
- More normalised in its whitespace (tabs, spaces and newlines)
Basically, it's just a sneaky way of eliminating some (but not all) of the irrelevant (i.e. semantically neutral) variation between two textual representations of what should be similar underlying XML.
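If you want to see what "some (but not all)" means in practice, here is a minimal sketch that applies the same writer settings to strings in memory, so you can try it on the two chunks from the Problem section without creating files. The class name CanonicaliseSketch and the helper Canonicalise are hypothetical names of mine, and OmitXmlDeclaration is an extra setting the tool itself doesn't use; I've added it only to keep the printed output short.

```csharp
using System;
using System.Text;
using System.Xml;

public static class CanonicaliseSketch
{
    // Hypothetical helper: same idea as the tool, but string in, string out.
    public static string Canonicalise(string xml)
    {
        // PreserveWhitespace defaults to false, so insignificant
        // whitespace between elements is dropped on load.
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var settings = new XmlWriterSettings
        {
            Indent = true,
            NewLineOnAttributes = true,
            NewLineHandling = NewLineHandling.Replace,
            OmitXmlDeclaration = true // assumption: added here for brevity only
        };

        var sb = new StringBuilder();
        using (var writer = XmlWriter.Create(sb, settings))
        {
            doc.WriteContentTo(writer);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The two chunks from the Problem section.
        var compact = "<add name=\"settingname\" value=\"value\" />";
        var spread = "<add\nname=\"settingname\"\nvalue=\"value\"\n></add>";

        Console.WriteLine(Canonicalise(compact));
        Console.WriteLine();
        Console.WriteLine(Canonicalise(spread));
    }
}
```

Running it is the quickest way to see which differences survive. For instance, XmlElement.IsEmpty records whether an element was read in the short form or the long form, so the difference between a self-closed tag and an explicit end tag may well remain in the output even though the attribute layout is now identical.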
Just in case: the canonicalization I'm talking about here has different goals, and applies a different set of transformations, than the canonicalization described here.
Source
using System;
using System.Xml;

namespace CanonicaliseXml
{
    class Program
    {
        static int Main(string[] args)
        {
            // This tool is useful if you're comparing two XML documents that may have
            // different tab-ification or indenting/line-breaking, but derive from a
            // common source, so that their actual content is reasonably similar.
            if (args.Length != 2)
            {
                Console.Error.WriteLine("Usage: CanonicaliseXml <input XML filename> <output XML filename>");
                return 1;
            }

            // Load the input. XmlDocument drops insignificant whitespace by default
            // (PreserveWhitespace is false).
            var source = new XmlDocument();
            source.Load(args[0]);

            var settings = new XmlWriterSettings
            {
                Indent = true,                            // indent by element depth
                NewLineOnAttributes = true,               // one attribute per line
                NewLineHandling = NewLineHandling.Replace // normalise line endings
            };

            // Write the document back out using the settings above.
            using (var dest = XmlWriter.Create(args[1], settings))
            {
                source.WriteContentTo(dest);
            }

            return 0;
        }
    }
}