← notes

Deep dive: DidYouMean

December 28, 2019
ruby

Since version 2.3.0, Ruby comes bundled with did_you_mean, a handy gem for detecting typos.

Running this snippet of code:

def greet
  "hello!"
end

greeet

Will produce the following error message:

Traceback (most recent call last):
fun.rb:5:in `<main>': undefined local variable or method
                      `greeet' for main:Object (NameError)
Did you mean?  greet

The bottom part of that error message was produced by Did You Mean.

Let's re-implement the functionality provided by the gem in order to figure out how it works.

Reinvent the wheel

The error message produced contains the standard NameError message, with an additional question for the typo. Let's try to do the same thing for the ZeroDivisionError.

This snippet of code:

1 / 0

Will produce the following error message:

Traceback (most recent call last):
        1: from fun.rb:1:in `<main>'
fun.rb:1:in `/': divided by 0 (ZeroDivisionError)

Overriding the to_s instance method of the error class, enables error message customizations. Let's customize the ZeroDivisionError message to include custom text:

class ZeroDivisionError
  def to_s
    "#{super}\nOops, this looks like infinity to me."
  end
end

1 / 0

Running this will produce:

Traceback (most recent call last):
        1: from fun.rb:7:in `<main>'
fun.rb:7:in `/': divided by 0 (ZeroDivisionError)
Oops, this looks like infinity to me.

This is a very simplified version of what Did You Mean does under the hood, so let's dive right in.

Look under the hood

Let's look into the internals of this gem:

git clone git@github.com:ruby/did_you_mean.git
cd did_you_mean
git checkout 8175f6e

Let's confirm our hunch that to_s is being overriden somewhere:

grep "def to_s" -nR lib

Which leads us to the following:

module DidYouMean
  module Correctable
    def original_message
      method(:to_s).super_method.call
    end

    def to_s
      msg = super.dup
      suggestion = DidYouMean.formatter.message_for(corrections)

      msg << suggestion if !msg.end_with?(suggestion)
      msg
    rescue
      super
    end

    def corrections
      @corrections ||= spell_checker.corrections
    end

    def spell_checker
      SPELL_CHECKERS[self.class.to_s].new(self)
    end
  end
end

Looks like we're fetching a spell checker for this type of error:

module DidYouMean
  module Correctable
    def original_message
      method(:to_s).super_method.call
    end

    def to_s
      msg = super.dup
      suggestion = DidYouMean.formatter.message_for(corrections)

      msg << suggestion if !msg.end_with?(suggestion)
      msg
    rescue
      super
    end

    def corrections
      @corrections ||= spell_checker.corrections
    end

    def spell_checker
      SPELL_CHECKERS[self.class.to_s].new(self)
    end
  end
end

Spell checkers are being set in the entry file:

module DidYouMean
  # Map of error types and spell checker objects.
  SPELL_CHECKERS = Hash.new(NullChecker)

  # Adds +DidYouMean+ functionality to an error using a given spell checker
  def self.correct_error(error_class, spell_checker)
    SPELL_CHECKERS[error_class.name] = spell_checker
    error_class.prepend(Correctable) unless error_class < Correctable
  end

  correct_error NameError, NameErrorCheckers
  correct_error KeyError, KeyErrorChecker
  correct_error NoMethodError, MethodNameChecker

  # Returns the currenctly set formatter. By default, it is set to +DidYouMean::Formatter+.
  def self.formatter
    @@formatter
  end

  # Updates the primary formatter used to format the suggestions.
  def self.formatter=(formatter)
    @@formatter = formatter
  end

  self.formatter = PlainFormatter.new
end

This is a pretty elegant piece of code.

The class method DidYouMean.correct_error not only populates the SPELL_CHECKERS hash with the spell checker for each error class, but also prepends Correctable. Since we've got here by investigating what is SPELL_CHECKERS, we now know this:

SPELL_CHECKERS = {
  "NameError" => NameErrorCheckers,
  "KeyError" => KeyErrorCheckers,
  "NoMethodError" => MethodNameChecker
}

Since Did You Mean is doing different things for these error classes, let's assume that we have spelled a method name wrong and NoMethodError was raised. Did You Mean is going to use MethodNameChecker for this.

Since Correctable is retrieving corrections from spell checker:

module DidYouMean
  module Correctable
    def original_message
      method(:to_s).super_method.call
    end

    def to_s
      msg = super.dup
      suggestion = DidYouMean.formatter.message_for(corrections)

      msg << suggestion if !msg.end_with?(suggestion)
      msg
    rescue
      super
    end

    def corrections
      @corrections ||= spell_checker.corrections
    end

    def spell_checker
      SPELL_CHECKERS[self.class.to_s].new(self)
    end
  end
end

Let's dive into MethodNameChecker#corrections:

require_relative "../spell_checker"

module DidYouMean
  class MethodNameChecker
    attr_reader :method_name, :receiver

    NAMES_TO_EXCLUDE = { NilClass => nil.methods }
    NAMES_TO_EXCLUDE.default = []

    # +MethodNameChecker::RB_RESERVED_WORDS+ is the list of reserved words in
    # Ruby that take an argument. Unlike
    # +VariableNameChecker::RB_RESERVED_WORDS+, these reserved words require
    # an argument, and a +NoMethodError+ is raised due to the presence of the
    # argument.
    #
    # The +MethodNameChecker+ will use this list to suggest a reversed word if
    # a +NoMethodError+ is raised and found closest matches.
    #
    # Also see +VariableNameChecker::RB_RESERVED_WORDS+.
    RB_RESERVED_WORDS = %i(
      alias
      case
      def
      defined?
      elsif
      end
      ensure
      for
      rescue
      super
      undef
      unless
      until
      when
      while
      yield
    )

    def initialize(exception)
      @method_name  = exception.name
      @receiver     = exception.receiver
      @private_call = exception.respond_to?(:private_call?) ? exception.private_call? : false
    end

    def corrections
      @corrections ||= SpellChecker.new(dictionary: RB_RESERVED_WORDS + method_names).correct(method_name) - NAMES_TO_EXCLUDE[@receiver.class]
    end

    def method_names
      method_names = receiver.methods + receiver.singleton_methods
      method_names += receiver.private_methods if @private_call
      method_names.uniq!
      method_names
    end
  end
end

For our case, this checker is initialized with an instance of NoMethodError. Method names are collected elegantly by combining Object#methods and Object#signleton_methods on NameError#receiver:

require_relative "../spell_checker"

module DidYouMean
  class MethodNameChecker
    attr_reader :method_name, :receiver

    NAMES_TO_EXCLUDE = { NilClass => nil.methods }
    NAMES_TO_EXCLUDE.default = []

    # +MethodNameChecker::RB_RESERVED_WORDS+ is the list of reserved words in
    # Ruby that take an argument. Unlike
    # +VariableNameChecker::RB_RESERVED_WORDS+, these reserved words require
    # an argument, and a +NoMethodError+ is raised due to the presence of the
    # argument.
    #
    # The +MethodNameChecker+ will use this list to suggest a reversed word if
    # a +NoMethodError+ is raised and found closest matches.
    #
    # Also see +VariableNameChecker::RB_RESERVED_WORDS+.
    RB_RESERVED_WORDS = %i(
      alias
      case
      def
      defined?
      elsif
      end
      ensure
      for
      rescue
      super
      undef
      unless
      until
      when
      while
      yield
    )

    def initialize(exception)
      @method_name  = exception.name
      @receiver     = exception.receiver
      @private_call = exception.respond_to?(:private_call?) ? exception.private_call? : false
    end

    def corrections
      @corrections ||= SpellChecker.new(dictionary: RB_RESERVED_WORDS + method_names).correct(method_name) - NAMES_TO_EXCLUDE[@receiver.class]
    end

    def method_names
      method_names = receiver.methods + receiver.singleton_methods
      method_names += receiver.private_methods if @private_call
      method_names.uniq!
      method_names
    end
  end
end

SpellChecker is initialized with a dictionary that contains Ruby's reserved words and all of the methods from the receiver, or the object that raised the exception.

Let's dive into the internals of SpellChecker:

require_relative "levenshtein"
require_relative "jaro_winkler"

module DidYouMean
  class SpellChecker
    def initialize(dictionary:)
      @dictionary = dictionary
    end

    def correct(input)
      input     = normalize(input)
      threshold = input.length > 3 ? 0.834 : 0.77

      words = @dictionary.select do |word|
        JaroWinkler.distance(normalize(word), input) >= threshold
      end
      words.reject! { |word| input == word.to_s }
      words.sort_by! { |word| JaroWinkler.distance(word.to_s, input) }
      words.reverse!

      # Correct mistypes
      threshold   = (input.length * 0.25).ceil
      corrections = words.select do |c|
        Levenshtein.distance(normalize(c), input) <= threshold
      end

      # Correct misspells
      if corrections.empty?
        corrections = words.select do |word|
          word   = normalize(word)
          length = if input.length < word.length
                     input.length
                   else
                     word.length
                   end

          Levenshtein.distance(word, input) < length
        end.first(1)
      end

      corrections
    end

    private

    def normalize(str_or_symbol) #:nodoc:
      str = str_or_symbol.to_s.downcase
      str.tr!("@", "")
      str
    end
  end
end

Spell checker calculates two distances for detecting word similarities:

  • Jaro-Winkler distance - metric measuring an edit distance between two words
  • Levenshtein distance - metric measuring the difference between two words

Jaro-Winkler distance

This metric gives us a number called edit distance representing the similarity between two words:

  • 0 means no similarity
  • 1 means that they are identical

Edit distance is calculated by counting the minimum number of operations required to transform one string into the other.

Original algorithm is called "Jaro Similarity" and "Jaro-Winkler" is an improvement of that, giving more favorable rating to the similarity of the beginning of compared words.

I will not go into too much detail about how this algorithm works, but here's an example for using it to suggest correct band names:

DICTIONARY = [
  "the beatles",
  "iron maiden",
  "the clash",
  "ramones",
  "madness",
  "the rolling stones"
]

def spell_check(term, treshold = 0.83)
  DICTIONARY.select do |word|
    DidYouMean::JaroWinkler.distance(word, term) >= treshold
  end.sort_by do |word|
    DidYouMean::JaroWinkler.distance(word, term)
  end.reverse
end

# Since beginnings are similar,
# we get useful suggestions.

spell_check("the beles")          # ["the beatles", "the clash"]
spell_check("sadness")            # ["madness"]
spell_check("the rolling santas") # ["the rolling stones"]
spell_check("ironman")            # ["iron maiden"]
spell_check("romans")             # ["ramones"]

# Since only endings are similar,
# we don't get useful suggestions.

spell_check("polite maiden")      # []
spell_check("car clash")          # []
spell_check("bowling stones")     # []
spell_check("los beatles")        # []

Levenshtein distance

This metric gives us the number of edits needed to convert one word into another.

  • 0 means no edits are needed - words are identical

Here are a couple of examples:

  • Levenshtein distance between foo and bar is 3 since we had to change 3 characters
  • Levenshtein distance between ruby and rugby is 1 since we had to change 1 character

I will not go into too much detail about how this algorithm works, but let's use it on our previous example for spell checking band names:

DICTIONARY = [
  "the beatles",
  "iron maiden",
  "the clash",
  "ramones",
  "madness",
  "the rolling stones"
]

def spell_check(term, threshold = 0.83)
  threshold  = (term.length * 0.25).ceil

  DICTIONARY.select do |word|
    DidYouMean::Levenshtein.distance(word, term) <= threshold
  end.sort_by do |word|
    DidYouMean::Levenshtein.distance(word, term)
  end
end

# We no longer compare beginnings,
# so we get different results.

spell_check("the beles")          # ["the beatles"] <- changed
spell_check("sadness")            # ["madness"]
spell_check("the rolling santas") # ["the rolling stones"]
spell_check("ironman")            # [] <- changed
spell_check("romans")             # [] <- changed

# Since only endings are similar,
# we don't get useful suggestions.

spell_check("polite maiden")  # []
spell_check("car clash")      # ["the clash"] <- changed
spell_check("bowling stones") # []
spell_check("los beatles")    # ["the beatles"] <- changed

Notice the threshold set to 25%. This means that we are not going to consider two words similar if we have to change more than 25% of the misspelled word to get to the valid word.

Combining the algorithms

Did You Mean uses a combination of these two algorithms in order to improve its accuracy:

DICTIONARY = [
  "the beatles",
  "iron maiden",
  "the clash",
  "ramones",
  "madness",
  "the rolling stones"
]

def spell_check(term, threshold = 0.83)
  first_pass = DICTIONARY.select do |word|
    DidYouMean::JaroWinkler.distance(word, term) >= threshold
  end.sort_by do |word|
    DidYouMean::JaroWinkler.distance(word, term)
  end.reverse

  threshold  = (term.length * 0.25).ceil
  corrections = first_pass.select do |word|
    DidYouMean::Levenshtein.distance(word, term) <= threshold
  end.sort_by do |word|
    DidYouMean::Levenshtein.distance(word, term)
  end
end

# Since beginnings are similar,
# we get useful suggestions.

spell_check("the beles")          # ["the beatles"]
spell_check("sadness")            # ["madness"]
spell_check("the rolling santas") # ["the rolling stones"]
spell_check("ironman")            # []
spell_check("romans")             # []

# Since only endings are similar,
# we don't get useful suggestions.

spell_check("polite maiden")  # []
spell_check("car clash")      # [] <- changed
spell_check("bowling stones") # []
spell_check("los beatles")    # [] <- changed

Back to formatter

Now that we know how corrections get generated, we can go back to the original line inside Correctable:

module DidYouMean
  module Correctable
    def original_message
      method(:to_s).super_method.call
    end

    def to_s
      msg = super.dup
      suggestion = DidYouMean.formatter.message_for(corrections)

      msg << suggestion if !msg.end_with?(suggestion)
      msg
    rescue
      super
    end

    def corrections
      @corrections ||= spell_checker.corrections
    end

    def spell_checker
      SPELL_CHECKERS[self.class.to_s].new(self)
    end
  end
end

The formatter is set to PlainFormatter here:

module DidYouMean
  # Map of error types and spell checker objects.
  SPELL_CHECKERS = Hash.new(NullChecker)

  # Adds +DidYouMean+ functionality to an error using a given spell checker
  def self.correct_error(error_class, spell_checker)
    SPELL_CHECKERS[error_class.name] = spell_checker
    error_class.prepend(Correctable) unless error_class < Correctable
  end

  correct_error NameError, NameErrorCheckers
  correct_error KeyError, KeyErrorChecker
  correct_error NoMethodError, MethodNameChecker

  # Returns the currenctly set formatter. By default, it is set to +DidYouMean::Formatter+.
  def self.formatter
    @@formatter
  end

  # Updates the primary formatter used to format the suggestions.
  def self.formatter=(formatter)
    @@formatter = formatter
  end

  self.formatter = PlainFormatter.new
end

PlainFormatter simply appends the suggestion message to the end of exception message:

# frozen-string-literal: true

module DidYouMean
  # The +DidYouMean::PlainFormatter+ is the basic, default formatter for the
  # gem. The formatter responds to the +message_for+ method and it returns a
  # human readable string.
  class PlainFormatter

    # Returns a human readable string that contains +corrections+. This
    # formatter is designed to be less verbose to not take too much screen
    # space while being helpful enough to the user.
    #
    # @example
    #
    #   formatter = DidYouMean::PlainFormatter.new
    #
    #   # displays suggestions in two lines with the leading empty line
    #   puts formatter.message_for(["methods", "method"])
    #
    #   Did you mean?  methods
    #                   method
    #   # => nil
    #
    #   # displays an empty line
    #   puts formatter.message_for([])
    #
    #   # => nil
    #
    def message_for(corrections)
      corrections.empty? ? "" :
        "\nDid you mean?  #{corrections.join("\n               ")}"
    end
  end
end

And that's it, we now know how Did You Mean appends its corrections to the error messages.

Bonus

As a reward for completing this article, I'd like to introduce a gem that autofixes the suggestions from DidYouMean. It's called I Did Mean and it comes with Rails integration.

Try it out and let me know what you think!