← til

Git diff algorithm options

February 3, 2019
git

I've recently refactored a method with a large case statement and wanted to check a diff of the changes.

I describe the refactoring in another post, but the original method looked something like this:

def parse_word(word)
  case word
  when URI::DEFAULT_PARSER.make_regexp
    request.url = word
  when /\A(-H|--header)\Z/
    parse_header
  when /\A--compressed\Z/
    request.headers['Accept-Encoding'] ||= 'deflate, gzip'
  # ... (more conditionals) ...
  end
end

I've replaced the case with a single module method call and created a bunch of classes, to match each of the scenarios I'd like to cover.

module Word
  def self.parse(request, words, word)
    base_class = Word::Base.new(request, words)
    subclass = base_class.subclasses
                         .detect { |klass| klass === word }
    subclass&.new(words, request)&.parse(word)
  end

  class Base
    def initialize(words, request)
      @words = words
      @request = request
    end

    # List of subclasses, first-inherited class appears first.
    def subclasses
      ObjectSpace.each_object(Base.singleton_class).
        reject { |klass| klass == Base }.reverse
    end

    def self.===(word)
      regex =~ word
    end

    def self.regex
      raise NotImplementedError, "#{name} doesn't have regex defined"
    end

    private

    attr_reader :words
    attr_reader :request
  end

  class Header < Base; end
  class Compressed < Base; end
  # ...(bunch of other class definitions)...
end

def parse_word(word)
  Word.parse(request, words, word)
end

Git uses Myers' algorithm as a default for diffs, which gave me the following output:

-    def parse_word(word)
-      case word
-      when URI::DEFAULT_PARSER.make_regexp
-        request.url = word
-      when /\A(-H|--header)\Z/
-        parse_header
-      when /\A--compressed\Z/
+    module Word
+      def self.parse(request, words, word)
+        base_class = Word::Base.new(request, words)
+        subclass = base_class.subclasses
+                             .detect { |klass| klass === word }
+        subclass&.new(words, request)&.parse(word)
+      end
+
+      class Base
+        def initialize(words, request)
+          @words = words
+          @request = request
+        end
+
+        # List of subclasses, first-inherited class appears first.
+        def subclasses
+          ObjectSpace.each_object(Base.singleton_class).reject { |klass| klass == Base }.reverse
+        end
+
+        def self.===(word)
+          regex =~ word
+        end
+
+        def self.regex
+          raise NotImplementedError, "#{name} doesn't have regex defined"
+        end
+
+        private
+
+        attr_reader :words
+        attr_reader :request
+      end
+
+      class Header < Base; end
+      class Compressed < Base
+        def self.regex
+          /\A--compressed\Z/.freeze
+        end
+
+        def parse(*)
           request.headers['Accept-Encoding'] ||= 'deflate, gzip'
-      case /\A-X|\A--request\Z/
+        end
+      end
+
+      class Request < Base
+        def self.regex

Notice how diff output breaks the old method and inserts the new code in the middle of it. Super ugly. I've recently discovered that the diff command supports some options you can pass it.

So entering this in the console:

git diff --histogram

Gave me:

-    def parse_word(word)
-      case word
-      when URI::DEFAULT_PARSER.make_regexp
-        request.url = word
-      when /\A(-H|--header)\Z/
-        parse_header
-      when /\A'--compressed'\Z/
-        request.headers['Accept-Encoding'] ||= 'deflate, gzip'
-      when /\A-X|\A--request\Z/
-        request.method = extract_method(word)
-      when /\A(-u|--user)\Z/
-        request.headers['Authorization'] ||= "Basic #{Base64.strict_encode64(words.shift)}"
-      when /\A(-b|--cookie)\Z/
-        request.headers['Set-Cookie'] ||= words.shift
-      when /\A(-A|--user-agent)\Z/
-        request.headers['User-Agent'] ||= words.shift
-      when /\A(-I|--head)\Z/
-        request.method = 'HEAD'
-      when /\A(-d|--data|--data-ascii)\Z/
-        parse_data
-      when /\A--data-binary\Z/
-        parse_data(strip_newlines: false)
-      when /\A--data-raw\Z/
-        parse_data(raw: true)
-      when /\A--data-urlencode\Z/
-        parse_data(url_encode: true)
-      when /\./
-        request.url = "http://#{word}" if request.url == ''
+    module Word
+      def self.parse(request, words, word)
+        base_class = Word::Base.new(request, words)
+        subclass = base_class.subclasses
+                             .detect { |klass| klass === word }
+        subclass&.new(words, request)&.parse(word)
+      end
+
+      class Base
+        def initialize(words, request)
+          @words = words
+          @request = request
+        end
+
+        # List of subclasses, first-inherited class appears first.
+        def subclasses
+          ObjectSpace.each_object(Base.singleton_class)
+                     .reject { |klass| klass == Base }.reverse
+        end
+
+        def self.===(word)
+          regex =~ word
+        end
+
+        def self.regex
+          raise NotImplementedError, "#{name} doesn't have regex defined"
+        end
+
+        private
+
+        attr_reader :words
+        attr_reader :request
+      end

Much better if you ask me, since now it's more obvious I've replaced the case statement in the method.