Comparing XML files using text-based diff tools - the quick-and-dirty approach

by acha11 11. May 2009 09:00

Problem

Say you're comparing two XML files, for example the web.config files taken from two installations of a website, looking for the cause of an environment-specific problem. There's nothing more annoying than trawling through a thousand semantically neutral whitespace differences flagged by your text-based diff tool, looking for the one meaningful difference.

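For example, these two chunks of XML:

   <add name="settingname" value="value" />

and

   <add
      name="settingname"
       value="value"
    ></add>

mean essentially the same thing: whitespace between attributes is insignificant, and an element with no content can be written either as a self-closing tag or with an explicit end tag. To a text comparison tool, though, they look very different.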

Solution

One solution is to transform the XML into a more "canonical" form before diffing. (I know "more canonical" is a bit like "more pregnant", but you get my meaning.)

The source below compiles into a lightweight command-line tool that takes two parameters:

  1. Input XML filename
  2. Output XML filename

The tool loads the input file and writes a "canonical" version of its XML to the output file.

The output XML is:

  • Uniformly indented according to the number of ancestor elements
  • One attribute per line (this helps text differs that are strongly line-oriented)
  • More normalised with respect to whitespace such as tabs, spaces and newlines

Basically, it's just a sneaky way of eliminating some (but not all) of the irrelevant (i.e. semantically neutral) variation between two textual representations of what should be similar underlying XML.
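If some of the remaining variation still gets in the way, one possible extra pass is sketched below. It is my own assumption, not part of the tool in this post: it reorders each element's attributes (attribute order carries no meaning in XML) and marks elements with no child nodes as empty, so that <add ...></add> and <add ... /> serialise the same way. ExtraNormalisation, Normalise and LoadAndNormalise are hypothetical names; the code only uses standard System.Xml members (XmlAttributeCollection.RemoveAll and Append, XmlElement.IsEmpty) and compares OuterXml to keep the demo short.

```csharp
using System;
using System.Linq;
using System.Xml;

static class ExtraNormalisation
{
    const string XmlnsNamespace = "http://www.w3.org/2000/xmlns/";

    // Hypothetical extra pass (not part of the original tool). For every element it:
    //  * reorders attributes: namespace declarations first, then the rest by name,
    //    because attribute order carries no meaning in XML;
    //  * marks elements with no child nodes as empty, so <x></x> is written as <x />.
    public static void Normalise(XmlNode node)
    {
        if (node is XmlElement element)
        {
            var sorted = element.Attributes.Cast<XmlAttribute>()
                .OrderBy(a => a.NamespaceURI == XmlnsNamespace ? 0 : 1)
                .ThenBy(a => a.Name, StringComparer.Ordinal)
                .ToList();
            element.Attributes.RemoveAll();
            foreach (var attribute in sorted)
                element.Attributes.Append(attribute);

            if (!element.HasChildNodes)
                element.IsEmpty = true;
        }

        // Recurse into children (the document node, elements, etc.).
        foreach (XmlNode child in node.ChildNodes)
            Normalise(child);
    }

    // Loads the XML (PreserveWhitespace is false by default, so indentation is dropped),
    // optionally applies the extra pass, and returns the serialised result.
    public static string LoadAndNormalise(string xml, bool extraPass)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        if (extraPass)
            Normalise(doc);
        return doc.OuterXml;
    }

    static void Main()
    {
        // Same setting, but different attribute order and empty-element syntax.
        string a = "<settings><add value=\"value\" name=\"settingname\" /></settings>";
        string b = "<settings>\n  <add name=\"settingname\"\n       value=\"value\"></add>\n</settings>";

        Console.WriteLine(LoadAndNormalise(a, false) == LoadAndNormalise(b, false)); // False
        Console.WriteLine(LoadAndNormalise(a, true) == LoadAndNormalise(b, true));   // True
    }
}
```

To use this in the tool itself you would call Normalise(source) between source.Load(...) and source.WriteContentTo(...). Putting namespace declarations ahead of ordinary attributes is just a cautious choice, so a prefix is always declared before attributes that use it.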

Just in case: the canonicalization I'm talking about here has different goals and applies a different set of transformations than the canonicalization described here.

Source

    using System;
    using System.Xml;

    namespace CanonicaliseXml
    {
        class Program
        {
            // This tool's useful if you're comparing two XML documents that may have
            // different tab-ification or indenting/line-breaking, but derive from a common
            // source, so that their actual content is reasonably similar.
            static int Main(string[] args)
            {
                if (args.Length != 2)
                {
                    Console.Error.WriteLine("Usage: CanonicaliseXml <input.xml> <output.xml>");
                    return 1;
                }

                // PreserveWhitespace is false by default, so whitespace-only text nodes
                // (the original indentation and line breaks between elements) are dropped on load.
                var source = new XmlDocument();
                source.Load(args[0]);

                var settings = new XmlWriterSettings()
                {
                    Indent = true,                            // indent by element depth
                    NewLineOnAttributes = true,               // one attribute per line
                    NewLineHandling = NewLineHandling.Replace // normalise line breaks in text
                };

                using (var dest = XmlWriter.Create(args[1], settings))
                {
                    source.WriteContentTo(dest);
                }

                return 0;
            }
        }
    }
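To see what the tool does to the two chunks from the Problem section, here is a minimal in-memory variant: the same load-then-write steps with the same XmlWriterSettings, but reading from a string and writing to a StringWriter. CanonicaliseInMemory is a hypothetical name, and OmitXmlDeclaration is my addition, only there to keep the printed output short.

```csharp
using System;
using System.IO;
using System.Xml;

static class CanonicaliseInMemory
{
    // Same transformation as the tool, but string in, string out.
    public static string Canonicalise(string xml)
    {
        var source = new XmlDocument();   // PreserveWhitespace defaults to false
        source.LoadXml(xml);

        var settings = new XmlWriterSettings
        {
            Indent = true,
            NewLineOnAttributes = true,
            NewLineHandling = NewLineHandling.Replace,
            OmitXmlDeclaration = true     // not in the tool; keeps the output short
        };

        using (var text = new StringWriter())
        {
            // Dispose the writer first so everything is flushed into the StringWriter.
            using (var dest = XmlWriter.Create(text, settings))
                source.WriteContentTo(dest);
            return text.ToString();
        }
    }

    static void Main()
    {
        // The two chunks from the Problem section.
        Console.WriteLine(Canonicalise("<add name=\"settingname\" value=\"value\" />"));
        Console.WriteLine("----");
        Console.WriteLine(Canonicalise("<add\n      name=\"settingname\"\n       value=\"value\"\n    ></add>"));
    }
}
```

As far as I can tell, both chunks come out with one attribute per line and identical indentation. The difference that survives is the closing tag: XmlDocument remembers whether an element was written self-closing (XmlElement.IsEmpty) and writes it back the same way, which is one of the "not all" cases mentioned above.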

 

 
