PrestoSoft Blog :: Home

Tuesday, December 29, 2020

ExamDiff Pro 12.0: New and Improved Diff Algorithms

ExamDiff Pro 12.0 will feature the biggest improvement to the core diff algorithm since fuzzy line matching was introduced in version 4.5: new options for customizing the diff algorithm, plus a new diff block alignment optimization.

That's kind of a mouthful, so let's illustrate it with an example.

Diff Alignment Optimization

Here's an example of a comparison that was not quite ideal before:

Note that the boundary of the deleted block here is a little surprising: rather than capturing the whole comment above the deleted method, ExamDiff Pro matches the comment's opening <summary> line to the identical line in the next comment. It's not exactly wrong, but it looks odd and makes it harder to see what actually changed.

To better resolve issues like this, ExamDiff Pro 12.0 introduces a new option called Optimize Diff Block Alignment in a new Diff Algorithm section of the Text Compare | Advanced settings page:

With this option turned on, ExamDiff Pro uses a new heuristic for determining the boundaries of diff blocks, based on the open-source work done in diff-slider-tools. Check out how our problem spot looks now:

Much better! The diff block now covers the whole method and comment as we would expect.
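For the curious, here is a rough idea of what a heuristic like this does. A deleted block can often "slide" up or down without changing the edit it describes: if the block's first line equals the line just after it, shifting the block down one line deletes an identical sequence of lines. The heuristic enumerates these equivalent positions and picks the one whose boundaries look most natural. The Python sketch below is a simplified illustration only. The scoring rule (a bonus for cuts next to a blank line, a penalty equal to the indentation of the next non-blank line) and the function names are hypothetical. Neither ExamDiff Pro's implementation nor the diff-slider-tools heuristic works exactly like this; the latter weighs more factors.

```python
from typing import List, Optional, Tuple


def indent(line: str) -> Optional[int]:
    """Indentation width of a line, or None if the line is blank."""
    if not line.strip():
        return None
    return len(line) - len(line.lstrip(" "))


def cut_penalty(lines: List[str], split: int) -> int:
    """Hypothetical cost of a block boundary between lines[split - 1] and lines[split].
    Lower is better: cuts next to a blank line or the file edge earn a bonus,
    and cuts before deeply indented lines are penalized."""
    penalty = 0
    if split == 0 or split == len(lines) or indent(lines[split - 1]) is None:
        penalty -= 10
    for line in lines[split:]:
        width = indent(line)
        if width is not None:  # first non-blank line after the cut
            penalty += width
            break
    return penalty


def slide_block(lines: List[str], start: int, end: int) -> Tuple[int, int]:
    """Move the deleted block lines[start:end] to its lowest-penalty equivalent position."""
    # Slide up while the line above the block equals the block's last line.
    while start > 0 and lines[start - 1] == lines[end - 1]:
        start, end = start - 1, end - 1
    best, best_score = (start, end), cut_penalty(lines, start) + cut_penalty(lines, end)
    # Slide down while the block's first line equals the line below the block.
    while end < len(lines) and lines[start] == lines[end]:
        start, end = start + 1, end + 1
        score = cut_penalty(lines, start) + cut_penalty(lines, end)
        if score < best_score:
            best, best_score = (start, end), score
    return best


old_file = [
    "class C {",
    "    /// <summary>",
    "    /// Does A.",
    "    /// </summary>",
    "    void A() { }",
    "",
    "    /// <summary>",
    "    /// Does B.",
    "    /// </summary>",
    "    void B() { }",
    "}",
]
# Suppose a naive diff deletes indices 2-6 ("Does A." through the next "<summary>").
print(slide_block(old_file, 2, 7))  # (1, 6)
```

Here the block slides up by one line, so it covers A's whole comment, the method itself, and the blank line after it, which mirrors the improvement in the screenshots above.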

Note the pink diff bar on the left, which indicates that this is a moved block rather than a deleted block. ExamDiff Pro previously could not detect this as a moved block because its boundaries were slightly off and didn't match those of the corresponding block in the other file. We can now right-click the diff bar and click Locate Matching Moved Block to see the corresponding block in the right file:

Pretty cool: a seemingly small tweak to the diff block alignment has given us a much clearer picture of what's happening here.
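Why do the boundaries matter so much for move detection? As a hypothetical illustration (ExamDiff Pro's actual moved-block matching isn't described here and is likely more tolerant), suppose a deleted block is reported as moved only when an added block elsewhere contains exactly the same lines. A block whose boundaries are off by one line is a rotation of its counterpart and fails that test, while the realigned block passes. The function name `find_moved_blocks` is ours.

```python
from typing import Dict, List, Sequence, Tuple

Block = Tuple[int, List[str]]  # (first line number, lines of the block)


def find_moved_blocks(deleted: Sequence[Block], added: Sequence[Block]) -> List[Tuple[int, int]]:
    """Pair each deleted block with an added block that has identical content.
    Returns (deleted_start, added_start) pairs under a hypothetical exact-match rule."""
    by_content: Dict[Tuple[str, ...], List[int]] = {}
    for start, lines in added:
        by_content.setdefault(tuple(lines), []).append(start)
    moves = []
    for start, lines in deleted:
        candidates = by_content.get(tuple(lines))
        if candidates:
            # Each added block can be the target of at most one move.
            moves.append((start, candidates.pop(0)))
    return moves


method_a = ["/// <summary>", "/// Does A.", "/// </summary>", "void A() { }", ""]
# Boundaries shifted down by one line: the same lines, rotated.
misaligned = method_a[1:] + method_a[:1]

print(find_moved_blocks([(2, misaligned)], [(20, method_a)]))  # []
print(find_moved_blocks([(1, method_a)], [(20, method_a)]))    # [(1, 20)]
```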

Because the Optimize Diff Block Alignment option almost always produces better-looking comparison results, it is on by default, but it can be turned off at any time on the Text Compare | Advanced settings page.

Diff Algorithm Selection (advanced feature)

In addition to this option, you can now choose among six diff algorithms: the Classic (default) algorithm, Line-by-line (previously enabled by a separate Force line-by-line comparison option), and four new algorithms brought in from the open-source LibXDiff library (Myers, Minimal, Patience, and Histogram):

This feature is intended for advanced users only. In general, we have found that ExamDiff Pro's Classic algorithm (itself a heavily modified version of Myers) gives the best results, but one of the alternative algorithms may do better on some inputs. This paper by Nugroho, Hata, and Matsumoto gives a good overview of the Myers, Minimal, Patience, and Histogram algorithms, and this blog post by Lup Peng offers examples of when each of these four algorithms can be helpful.
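To give a feel for how two of these algorithms differ, here is a compact Python sketch. `myers_distance` is the greedy core of Myers' O(ND) algorithm. It computes only the length of the shortest edit script (lines deleted plus lines inserted), not the script itself, and leaves out the refinements used by LibXDiff and by ExamDiff Pro's Classic algorithm. `patience_anchors` shows the first step of patience diff: it keeps only lines that occur exactly once in each file and picks the longest run of them that appears in the same order in both (a longest increasing subsequence, found here with patience sorting). A full patience diff would then recurse into the gaps between these anchors and fall back to an ordinary diff there; that part is omitted. Both function names are ours, for illustration only.

```python
from bisect import bisect_left
from collections import Counter
from typing import List, Sequence, Tuple


def myers_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the shortest edit script (deletions + insertions) turning a into b."""
    n, m = len(a), len(b)
    v = {1: 0}  # v[k]: furthest x reached so far on diagonal k = x - y
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Step down from diagonal k+1 (insertion) or right from k-1 (deletion),
            # whichever has reached further.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            # Follow the "snake": runs of matching lines cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # not reached: d = n + m always suffices


def patience_anchors(a: Sequence[str], b: Sequence[str]) -> List[Tuple[int, int]]:
    """(i, j) pairs of lines unique in both a and b, kept in the longest common order."""
    count_a, count_b = Counter(a), Counter(b)
    pos_b = {line: j for j, line in enumerate(b)}
    # Candidate matches, in order of a; each line occurs once in each file.
    cands = [(i, pos_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and count_b[line] == 1]
    # Longest increasing subsequence of the j values, via patience sorting.
    tops: List[int] = []      # j value on top of each pile
    top_idx: List[int] = []   # index into cands of each pile's top card
    prev = [-1] * len(cands)  # back-pointer to a card on the previous pile
    for idx, (_, j) in enumerate(cands):
        pile = bisect_left(tops, j)
        if pile > 0:
            prev[idx] = top_idx[pile - 1]
        if pile == len(tops):
            tops.append(j)
            top_idx.append(idx)
        else:
            tops[pile] = j
            top_idx[pile] = idx
    # Follow back-pointers from the top of the last pile.
    anchors = []
    idx = top_idx[-1] if top_idx else -1
    while idx != -1:
        anchors.append(cands[idx])
        idx = prev[idx]
    return anchors[::-1]


print(myers_distance("ABCABBA", "CBABAC"))  # 5
old = ["{", "alpha", "}", "{", "beta", "}"]
new = ["{", "alpha", "}", "{", "gamma", "}", "{", "beta", "}"]
print(patience_anchors(old, new))  # [(1, 1), (4, 7)]
```

The first call uses the worked example from Myers' paper. In the second, the repeated `{` and `}` lines are never used as anchors; ignoring such common boilerplate lines is one reason patience diff is often said to produce more readable diffs of source code.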
