Since version 2.3.0, Ruby comes bundled
with did_you_mean,
a handy gem for detecting typos.
Running this snippet of code:
def greet
  "hello!"
end

greeet
Will produce the following error message:
Traceback (most recent call last):
fun.rb:5:in `<main>': undefined local variable or method `greeet' for main:Object (NameError)
Did you mean? greet
The bottom part of that error message was produced by
Did You Mean.
Let's re-implement the functionality provided by the
gem in order to figure out how it works.
Reinvent the wheel
The error message produced contains the standard NameError
message, with an additional question for the typo. Let's try
to do the same thing for the ZeroDivisionError.
This snippet of code:
1/0
Will produce the following error message:
Traceback (most recent call last):
1: from fun.rb:1:in `<main>'
fun.rb:1:in `/': divided by 0 (ZeroDivisionError)
Overriding the to_s instance method of the error class
lets us customize the error message. Let's customize
the ZeroDivisionError message to include custom
text:
class ZeroDivisionError
  def to_s
    "#{super}\nOops, this looks like infinity to me."
  end
end

1 / 0
Running this will produce:
Traceback (most recent call last):
1: from fun.rb:7:in `<main>'
fun.rb:7:in `/': divided by 0 (ZeroDivisionError)
Oops, this looks like infinity to me.
This is a very simplified version of what Did You Mean
does under the hood, so let's dive right in.
Look under the hood
Let's look into the internals
of this gem:
git clone git@github.com:ruby/did_you_mean.git
cd did_you_mean
git checkout 8175f6e
Let's confirm our hunch that to_s is being overridden
somewhere:
module DidYouMean
  # Map of error types and spell checker objects.
  SPELL_CHECKERS = Hash.new(NullChecker)

  # Adds +DidYouMean+ functionality to an error using a given spell checker
  def self.correct_error(error_class, spell_checker)
    SPELL_CHECKERS[error_class.name] = spell_checker
    error_class.prepend(Correctable) unless error_class < Correctable
  end

  correct_error NameError, NameErrorCheckers
  correct_error KeyError, KeyErrorChecker
  correct_error NoMethodError, MethodNameChecker

  # Returns the currenctly set formatter. By default, it is set to +DidYouMean::Formatter+.
  def self.formatter
    @@formatter
  end

  # Updates the primary formatter used to format the suggestions.
  def self.formatter=(formatter)
    @@formatter = formatter
  end

  self.formatter = PlainFormatter.new
end
This is a pretty elegant piece of code.
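The class method DidYouMean.correct_error not only
populates the SPELL_CHECKERS hash with the spell checker for each error class,
but also prepends Correctable to that class. Since we got here by investigating what SPELL_CHECKERS is,
we now know which spell checker handles each error class: NameErrorCheckers for NameError,
KeyErrorChecker for KeyError, and MethodNameChecker for NoMethodError.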
Since Did You Mean is doing different things for these error
classes, let's assume that we have spelled a method name wrong
and NoMethodError was raised. Did You Mean is going to
use MethodNameChecker for this.
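To make the prepend mechanism concrete, here is a minimal sketch of the same pattern outside the gem. Every name in it (TOY_CHECKERS, ToyCorrectable, NullTypoChecker, FixedTypoChecker, register) is hypothetical and chosen for illustration; it only mimics the shape of correct_error and is not the gem's code.

```ruby
# Hypothetical stand-ins; none of these names come from the gem.
class NullTypoChecker
  def initialize(_error); end

  def corrections
    []
  end
end

class FixedTypoChecker
  def initialize(_error); end

  def corrections
    ["greet"]
  end
end

# Error classes that were never registered fall back to NullTypoChecker.
TOY_CHECKERS = Hash.new(NullTypoChecker)

module ToyCorrectable
  # Because the module is prepended, this to_s is found before the
  # error class's own, and super reaches the original implementation.
  def to_s
    suggestions = TOY_CHECKERS[self.class.name].new(self).corrections
    return super if suggestions.empty?

    "#{super}\nDid you mean? #{suggestions.join(', ')}"
  end
end

class ToyError < StandardError; end
class OtherError < StandardError; end

# Same two steps as correct_error: remember the checker, prepend once.
def register(error_class, checker)
  TOY_CHECKERS[error_class.name] = checker
  error_class.prepend(ToyCorrectable) unless error_class < ToyCorrectable
end

register(ToyError, FixedTypoChecker)

puts ToyError.new("undefined method `greeet'").to_s
puts OtherError.new("something else").to_s
```

Only the registered class gets the extra line; OtherError keeps its plain message because ToyCorrectable was never prepended to it.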
Correctable retrieves its corrections from the spell checker, so let's look at MethodNameChecker:
require_relative "../spell_checker"

module DidYouMean
  class MethodNameChecker
    attr_reader :method_name, :receiver

    NAMES_TO_EXCLUDE = { NilClass => nil.methods }
    NAMES_TO_EXCLUDE.default = []

    # +MethodNameChecker::RB_RESERVED_WORDS+ is the list of reserved words in
    # Ruby that take an argument. Unlike
    # +VariableNameChecker::RB_RESERVED_WORDS+, these reserved words require
    # an argument, and a +NoMethodError+ is raised due to the presence of the
    # argument.
    #
    # The +MethodNameChecker+ will use this list to suggest a reversed word if
    # a +NoMethodError+ is raised and found closest matches.
    #
    # Also see +VariableNameChecker::RB_RESERVED_WORDS+.
    RB_RESERVED_WORDS = %i(
      alias case def defined? elsif end ensure for
      rescue super undef unless until when while yield
    )

    def initialize(exception)
      @method_name  = exception.name
      @receiver     = exception.receiver
      @private_call = exception.respond_to?(:private_call?) ? exception.private_call? : false
    end

    def corrections
      @corrections ||= SpellChecker.new(dictionary: RB_RESERVED_WORDS + method_names).correct(method_name) - NAMES_TO_EXCLUDE[@receiver.class]
    end

    def method_names
      method_names = receiver.methods + receiver.singleton_methods
      method_names += receiver.private_methods if @private_call
      method_names.uniq!
      method_names
    end
  end
end
In our case, this checker is initialized with an instance of NoMethodError. Method names
are collected by combining Object#methods and Object#singleton_methods on
NameError#receiver (plus Object#private_methods for private calls):
def method_names
  method_names = receiver.methods + receiver.singleton_methods
  method_names += receiver.private_methods if @private_call
  method_names.uniq!
  method_names
end
SpellChecker is initialized with a dictionary that contains Ruby's reserved
words and all of the methods from the receiver, that is, the object the misspelled method was called on.
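The two pieces of information the checker reads from the exception, the misspelled name and the receiver, can be inspected directly. The sketch below uses only core Ruby; describe_typo is a hypothetical helper name, and the candidate list covers the public-call case only.

```ruby
# Hypothetical helper: runs a block and reports what a checker would see
# if the block raises NoMethodError.
def describe_typo
  yield
  nil
rescue NoMethodError => error
  {
    misspelled: error.name,      # the method name that was called
    receiver: error.receiver,    # the object it was called on
    candidates: (error.receiver.methods + error.receiver.singleton_methods).uniq
  }
end

info = describe_typo { "hello".upcas }

puts info[:misspelled].inspect
puts info[:receiver].inspect
puts info[:candidates].include?(:upcase)
```

The correctly spelled :upcase is among the candidates, which is what gives a spell checker something to match the typo against.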
The spell checker calculates two distances for detecting word similarities:
Jaro-Winkler distance - a similarity score between two words, based on the characters they share
Levenshtein distance - the minimum number of single-character edits needed to turn one word into the other
Jaro-Winkler distance
This metric gives us a similarity score between two words, a number between 0 and 1:
0 means no similarity
1 means that they are identical
The score is based on how many characters the two words share at roughly the same positions, and on how many of those shared characters appear in a different order (transpositions).
The original algorithm is called "Jaro Similarity". "Jaro-Winkler" is an improvement on it,
giving a more favorable rating to words whose beginnings match.
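For readers who want to see the idea in code, here is a minimal textbook-style sketch of Jaro and Jaro-Winkler similarity in plain Ruby. It is not the gem's implementation: the method names are hypothetical, and the scaling factor of 0.1, the four-character prefix cap, and the 0.7 boost threshold are the commonly cited defaults, assumed here.

```ruby
# Jaro similarity: 0.0 (nothing in common) to 1.0 (identical).
def jaro_similarity(a, b)
  return 1.0 if a == b
  return 0.0 if a.empty? || b.empty?

  # Characters only count as matching if they are at most `window` apart.
  window = [[a.length, b.length].max / 2 - 1, 0].max
  a_matched = Array.new(a.length, false)
  b_matched = Array.new(b.length, false)
  matches = 0

  a.each_char.with_index do |char, i|
    low = [i - window, 0].max
    high = [i + window, b.length - 1].min
    (low..high).each do |j|
      next if b_matched[j] || b[j] != char

      a_matched[i] = b_matched[j] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches.zero?

  # Matched characters that line up in a different order are transpositions.
  a_seq = a.chars.select.with_index { |_, i| a_matched[i] }
  b_seq = b.chars.select.with_index { |_, j| b_matched[j] }
  transpositions = a_seq.zip(b_seq).count { |x, y| x != y } / 2

  m = matches.to_f
  (m / a.length + m / b.length + (m - transpositions) / m) / 3
end

# Winkler's adjustment: boost scores for words sharing a prefix (max 4 chars).
def jaro_winkler_similarity(a, b, scaling: 0.1, boost_threshold: 0.7)
  jaro = jaro_similarity(a, b)
  return jaro if jaro <= boost_threshold

  prefix = a.chars.zip(b.chars).take(4).take_while { |x, y| x == y }.length
  jaro + prefix * scaling * (1 - jaro)
end

# MARTHA / MARHTA is a commonly used pair: Jaro is about 0.944,
# and the shared "MAR" prefix lifts Jaro-Winkler to about 0.961.
puts jaro_similarity("MARTHA", "MARHTA").round(3)
puts jaro_winkler_similarity("MARTHA", "MARHTA").round(3)
```

The prefix bonus only ever raises a score, which is why words with matching beginnings are rated more favorably than words with matching endings.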
I will not go into too much detail about how this algorithm works, but here's an example of using it
to suggest correct band names:
DICTIONARY = [
  "the beatles",
  "iron maiden",
  "the clash",
  "ramones",
  "madness",
  "the rolling stones"
]

def spell_check(term, threshold = 0.83)
  DICTIONARY.select do |word|
    DidYouMean::JaroWinkler.distance(word, term) >= threshold
  end.sort_by do |word|
    DidYouMean::JaroWinkler.distance(word, term)
  end.reverse
end

# Since beginnings are similar,
# we get useful suggestions.
spell_check("the beles") # ["the beatles", "the clash"]
spell_check("sadness") # ["madness"]
spell_check("the rolling santas") # ["the rolling stones"]
spell_check("ironman") # ["iron maiden"]
spell_check("romans") # ["ramones"]

# Since only endings are similar,
# we don't get useful suggestions.
spell_check("polite maiden") # []
spell_check("car clash") # []
spell_check("bowling stones") # []
spell_check("los beatles") # []
Levenshtein distance
This metric gives us the number of edits needed to convert one word into another.
0 means no edits are needed - words are identical
Here are a couple of examples:
Levenshtein distance between foo and bar is 3 since we had to change 3 characters
Levenshtein distance between ruby and rugby is 1 since we had to insert 1 character
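The two examples above can be checked with a short dynamic-programming sketch. This is a standard textbook formulation written for illustration, with a hypothetical method name; it is not the gem's implementation.

```ruby
# Minimum number of single-character insertions, deletions, and
# substitutions needed to turn string a into string b.
def levenshtein(a, b)
  # previous[j] is the distance between the processed prefix of a
  # and the first j characters of b.
  previous = (0..b.length).to_a

  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # delete char_a
        current[j - 1] + 1,     # insert char_b
        previous[j - 1] + cost  # substitute, or keep a matching character
      ].min
    end
    previous = current
  end

  previous.last
end

puts levenshtein("foo", "bar")    # 3
puts levenshtein("ruby", "rugby") # 1
puts levenshtein("ruby", "ruby")  # 0
```

Keeping only the previous row is enough because each cell depends on the row above it and the cell to its left.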
I will not go into too much detail about how this algorithm works, but let's use
it on our previous example for spell checking band names:
DICTIONARY = [
  "the beatles",
  "iron maiden",
  "the clash",
  "ramones",
  "madness",
  "the rolling stones"
]

def spell_check(term)
  # Allow edits to at most 25% of the misspelled term, rounded up.
  threshold = (term.length * 0.25).ceil

  DICTIONARY.select do |word|
    DidYouMean::Levenshtein.distance(word, term) <= threshold
  end.sort_by do |word|
    DidYouMean::Levenshtein.distance(word, term)
  end
end

# We no longer compare beginnings,
# so we get different results.
spell_check("the beles") # ["the beatles"] <- changed
spell_check("sadness") # ["madness"]
spell_check("the rolling santas") # ["the rolling stones"]
spell_check("ironman") # [] <- changed
spell_check("romans") # [] <- changed

# Since only endings are similar,
# we don't get useful suggestions.
spell_check("polite maiden") # []
spell_check("car clash") # ["the clash"] <- changed
spell_check("bowling stones") # []
spell_check("los beatles") # ["the beatles"] <- changed
Notice the threshold set to 25%. This means that we are
not going to consider two words similar if we have to change
more than 25% of the misspelled word to get to the valid word.
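As a quick worked example of that threshold, the sketch below prints the edit budget for a few of the terms used above. edit_budget is a hypothetical helper name that simply wraps the same expression.

```ruby
# Hypothetical helper mirroring the threshold expression from the example:
# a quarter of the term's length, rounded up.
def edit_budget(term)
  (term.length * 0.25).ceil
end

["sadness", "ironman", "the beles", "the rolling santas"].each do |term|
  puts "#{term.inspect}: #{term.length} characters, up to #{edit_budget(term)} edits"
end
```

For "ironman" the budget is 2 edits, while "iron maiden" is 4 characters longer and therefore needs at least 4 insertions, which is why that suggestion disappears under Levenshtein.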
Combining the algorithms
Did You Mean uses a combination of these two algorithms in order
to improve its accuracy:
DICTIONARY = [
  "the beatles",
  "iron maiden",
  "the clash",
  "ramones",
  "madness",
  "the rolling stones"
]

def spell_check(term, threshold = 0.83)
  first_pass = DICTIONARY.select do |word|
    DidYouMean::JaroWinkler.distance(word, term) >= threshold
  end.sort_by do |word|
    DidYouMean::JaroWinkler.distance(word, term)
  end.reverse

  threshold = (term.length * 0.25).ceil

  corrections = first_pass.select do |word|
    DidYouMean::Levenshtein.distance(word, term) <= threshold
  end.sort_by do |word|
    DidYouMean::Levenshtein.distance(word, term)
  end
end

# Since beginnings are similar,
# we get useful suggestions.
spell_check("the beles") # ["the beatles"]
spell_check("sadness") # ["madness"]
spell_check("the rolling santas") # ["the rolling stones"]
spell_check("ironman") # []
spell_check("romans") # []

# Since only endings are similar,
# we don't get useful suggestions.
spell_check("polite maiden") # []
spell_check("car clash") # [] <- changed
spell_check("bowling stones") # []
spell_check("los beatles") # [] <- changed
Back to formatter
Now that we know how corrections get generated, we can go back to the formatter that turns them into text. It is configured at the bottom of the DidYouMean module:
# Returns the currenctly set formatter. By default, it is set to +DidYouMean::Formatter+.
def self.formatter
  @@formatter
end

# Updates the primary formatter used to format the suggestions.
def self.formatter=(formatter)
  @@formatter = formatter
end

self.formatter = PlainFormatter.new
PlainFormatter simply appends the suggestion message to the end of exception message:
# frozen-string-literal: true

module DidYouMean
  # The +DidYouMean::PlainFormatter+ is the basic, default formatter for the
  # gem. The formatter responds to the +message_for+ method and it returns a
  # human readable string.
  class PlainFormatter
    # Returns a human readable string that contains +corrections+. This
    # formatter is designed to be less verbose to not take too much screen
    # space while being helpful enough to the user.
    #
    # @example
    #
    #   formatter = DidYouMean::PlainFormatter.new
    #
    #   # displays suggestions in two lines with the leading empty line
    #   puts formatter.message_for(["methods", "method"])
    #
    #   Did you mean? methods
    #   method
    #   # => nil
    #
    #   # displays an empty line
    #   puts formatter.message_for([])
    #
    #   # => nil
    #
    def message_for(corrections)
      corrections.empty? ? "" : "\nDid you mean? #{corrections.join("\n ")}"
    end
  end
end
And that's it, we now know how Did You Mean appends its corrections
to the error messages.
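Since the formatter only needs to respond to message_for, the same interface can be sketched with a different layout. BulletFormatter below is a hypothetical class written for illustration; only the message_for method name and its corrections argument are taken from the code above.

```ruby
# Hypothetical formatter; only the message_for interface comes from the gem.
class BulletFormatter
  # Returns an empty string when there is nothing to suggest, otherwise
  # a leading newline followed by one bullet per correction.
  def message_for(corrections)
    return "" if corrections.empty?

    lines = corrections.map { |correction| "  * #{correction}" }
    "\nDid you mean one of these?\n#{lines.join("\n")}"
  end
end

formatter = BulletFormatter.new
puts "undefined method `grete'#{formatter.message_for(%w[greet great])}"
puts "undefined method `xyzzy'#{formatter.message_for([])}"

# Assumption: with the gem loaded, an object like this could be installed
# through the DidYouMean.formatter= writer shown earlier.
```

Returning an empty string for no corrections keeps the original error message unchanged, which matches how PlainFormatter behaves.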
Bonus
As a reward for completing this article, I'd like to introduce a gem
that autofixes the suggestions from DidYouMean. It's called I Did Mean
and it comes with Rails integration.
Try it out and let me know what you think!