Refactoring a large case statement

I'm currently building a gem for converting curl commands to something else, and one early problem I needed to deal with was parsing the curl command arguments.

The basic idea for a gem is to convert curl command to a sweet looking hash that we can use to do magical things. So when given:

curl https://isrubydead.com/

Our program converts it to:

{
  "body": "",
  "method": "GET",
  "headers": {},
  "url": "https://isrubydead.com/"
}

We'll call this hash a Request, so let's define it:

Request = Struct.new(:body, :method, :headers, :url) do
  def initialize
    self.method = 'GET'
    self.headers = {}
    self.url = ''
    self.body = ''
  end
end

Our program should expect the curl command as a command line argument, so let's use ARGV.first to read the curl command:

require 'shellwords'

module ParseCurl
  class << self
    def parse(command)
      self.request = Request.new
      self.parser = Parser.new(request)

      parser.parse(Shellwords.split(command.strip))

      request.to_h
    end

    private

    attr_accessor :request
    attr_accessor :parser
  end
end

ParseCurl.parse(ARGV.first)

We use Shellwords here to convert the curl command to an array of tokens, to make it easier to parse.

Now that we have a place to store the result of parsing the curl command, and we are passing the command line argument to a parser, it's time to write an actual parser.

Being a very serious and very senior developer, I've ended up with this very appealing class:

class Parser
  def initialize(request)
    @request = request
  end

  def parse(words)
    self.words = words

    while (word = words.shift)
      parse_word(word)
    end
  end

  private

  attr_accessor :words
  attr_reader :request

  def parse_word(word)
    case word
    when URI::DEFAULT_PARSER.make_regexp
      request.url = word
    when /\A(-H|--header)\Z/
      parse_header
    when /\A--compressed\Z/
      request.headers['Accept-Encoding'] ||= 'deflate, gzip'
    when /\A-X|\A--request\Z/
      request.method = extract_method(word)
    when /\A(-u|--user)\Z/
      request.headers['Authorization'] ||=
        "Basic #{Base64.strict_encode64(words.shift)}"
    when /\A(-b|--cookie)\Z/
      request.headers['Set-Cookie'] ||= words.shift
    when /\A(-A|--user-agent)\Z/
      request.headers['User-Agent'] ||= words.shift
    when /\A(-I|--head)\Z/
      request.method = 'HEAD'
    when /\A(-d|--data|--data-ascii)\Z/
      parse_data
    when /\A--data-binary\Z/
      parse_data(strip_newlines: false)
    when /\A--data-raw\Z/
      parse_data(raw: true)
    when /\A--data-urlencode\Z/
      parse_data(url_encode: true)
    when /\./
      request.url = "http://#{word}" if request.url == ''
    end
  end
end

I know, a thing a beauty.

What's also very nice is that this class also have to know everything about parsing curl arguments; from header and method arguments to various data arguments.

Super nice! Just look at these beautiful methods, that all belong to this class:

class Parser
  private

  def parse_data(**args)
    request.method = 'POST' if request.method == 'GET' ||
                               request.method == 'HEAD'
    request.headers['Content-Type'] ||=
      'application/x-www-form-urlencoded'

    data = extract_data(args)

    request.body = if request.body.empty?
                     data
                   else
                     "#{request.body}&#{data}"
                   end
  end

  def parse_header
    header = extract_header(words.shift)

    return unless header

    request.headers[header.keys.first] =
      header.values.first
  end

  def extract_data(args)
    data = words.shift
    data = read_file(data, args) if data =~ /\A@/ || args[:raw]
    data
  end

  def read_file(word,
                strip_newlines: true,
                raw: false,
                url_encode: false)
    path = raw ? word : word[1..-1]
    content = File.read(path)
    content.gsub!(/\n|\r/, '') if strip_newlines
    content = CGI.escape(content) if url_encode
    content
  rescue Errno::ENOENT
    file_path = File.expand_path(data[1..-1])
    raise FileNotFoundError,
      "Cannot read file: #{file_path}. Does it exist?"
  end

  # Needed for extracting -XPUT and similar formats.
  def extract_method(word)
    return words.shift unless word =~ /^-X/

    if !(method = word.match(/^-X(.*)/)[1]).empty?
      method
    else
      words.shift
    end
  end

  def extract_header(header)
    [header.split(/: (.+)/)].to_h
  rescue StandardError
    nil
  end
end

Marvelous! This class knows absolutely everything about curl and all the arguments and options. We could ask it the curl's shoe size, and it would give us the answer.

Now that we have thought this class so much, and it has achieved the master's degree worth of education in the University of curl, I'm a little worried it's going to take over the world.

Sure, it only knows about curl for now, but what happens when someone decides it should know about, I don't know, some scary AI thing. It looks innocent right now, but then you turn around and it gets interested in doors.

I don't trust it one bit, so let's steal some knowledge from this class and let's spread it to other classes.

I'll now start explaining the refactoring step by step, but you can take a look at the final result if you'd like to skip.

This class will from now on be ignorant of curl arguments, so the humongous case statement becomes:

class Parser
  def parse_word(word)
    Word.parse(request, words, word)
  end
end

Since every scenario we have to cover in the case statement consists of the regex check and the parsing part, the idea is to abstract those into class.

Let's introduce the Word::Base class:

module Word
  class Base
    # Used to check if the word given matches the subclass' regex.
    def self.===(word)
      word =~ regex
    end

    def self.regex
      raise NotImplementedError, "#{name} doesn't have .regex defined"
    end

    def parse(*)
      raise NotImplementedError, "#{self.class.name} doesn't have #parse defined"
    end
  end
end

The plan is to have one base class, that will be inherited for every scenario we'd like to cover. Here's the first example of one such class:

module Word
  class Header < Base
    def self.regex
      /\A(-H|--header)\Z/
    end

    def parse(*)
      header = extract_header(words.shift)

      return unless header

      request.headers[header.keys.first] = header.values.first
    end

    private

    def extract_header(header)
      [header.split(/: (.+)/)].to_h
    rescue StandardError
      nil
    end
  end
end

This class stores information about everything that's needed to parse a header argument. It detects the argument (when the curl command includes -H or --header in arguments) and then changes the request hash, which we return in the end.

Since this class has to know about words and request too, we'll add that to the base class. We expect other descendants to need that too:

module Word
  class Base
    def initialize(words, request)
      @words = words
      @request = request
    end

    private

    attr_reader :words
    attr_reader :request
  end
end

The next problem we have is that there's no connection between the original parse_word method and this new class we have created, so let's write a method that connects those two:

module Word
  def self.parse(request, words, word)
    subclass = Word::Base.subclasses
                         .detect { |klass| klass === word }
    subclass&.new(words, request)&.parse(word)
  end

  class Base
    # List of subclasses, first-inherited class first.
    def self.subclasses
      ObjectSpace.each_object(Base.singleton_class)
                 .reject { |klass| klass == Base }.reverse
    end
  end
end

With this method we can find a Word::Base subclass that matches the word that was given and then parses that word. If we find the subclass, we try to parse the word. So our current code works for parsing --header or -h words from curl.

Our code now supports parsing headers, so let's proceed to add classes for every argument we'd like to support:

module Word
  class Compressed < Base
    def self.regex
      /\A--compressed\Z/
    end

    def parse(*)
      request.headers['Accept-Encoding'] ||= 'deflate, gzip'
    end
  end

  class Request < Base
    def self.regex
      /\A-X|\A--request\Z/
    end

    def parse(word)
      request.method = extract_method(word)
    end

    private

    # Needed for extracting -XPUT and similar formats.
    def extract_method(word)
      return words.shift unless word =~ /^-X/

      if !(method = word.match(/^-X(.*)/)[1]).empty?
        method
      else
        words.shift
      end
    end
  end

  class Authorization < Base
    def self.regex
      /\A(-u|--user)\Z/
    end

    def parse(*)
      request.headers['Authorization'] ||=
        "Basic #{Base64.strict_encode64(words.shift)}"
    end
  end

  class Cookie < Base
    def self.regex
      /\A(-b|--cookie)\Z/
    end

    def parse(*)
      request.headers['Set-Cookie'] ||= words.shift
    end
  end

  class UserAgent < Base
    def self.regex
      /\A(-A|--user-agent)\Z/
    end

    def parse(*)
      request.headers['User-Agent'] ||= words.shift
    end
  end

  class Head < Base
    def self.regex
      /\A(-I|--head)\Z/
    end

    def parse(*)
      request.method = 'HEAD'
    end
  end

  class Data < Base
    def self.regex
      /\A(-d|--data|--data-ascii)\Z/
    end

    def options
      {}
    end

    def parse(*)
      request.method = 'POST' if request.method == 'GET' ||
                                 request.method == 'HEAD'
      request.headers['Content-Type'] ||=
        'application/x-www-form-urlencoded'

      data = extract_data(options)

      request.body = if request.body.empty?
                       data
                     else
                       "#{request.body}&#{data}"
                     end
    end

    def extract_data(args)
      data = words.shift
      data = read_file(data, args) if data =~ /\A@/ ||
                                      args[:raw]
      data
    end

    def read_file(word,
                  strip_newlines: true,
                  raw: false,
                  url_encode: false)
      path = raw ? word : word[1..-1]
      content = File.read(path)
      content.gsub!(/\n|\r/, '') if strip_newlines
      content = CGI.escape(content) if url_encode
      content
    rescue Errno::ENOENT
      file_path = File.expand_path(data[1..-1])
      raise FileNotFoundError,
            "Cannot read file: #{file_path}. Does it exist?"
    end
  end

  class StripNewlinesData < Data
    def self.regex
      /\A--data-binary\Z/
    end

    def options
      { strip_newlines: false }
    end
  end

  class RawData < Data
    def self.regex
      /\A--data-raw\Z/
    end

    def options
      { raw: true }
    end
  end

  class URLEncodeData < Data
    def self.regex
      /\A--data-urlencode\Z/
    end

    def options
      { url_encode: true }
    end
  end

  class URL < Base
    def self.===(word)
      word =~ URI::DEFAULT_PARSER.make_regexp
    end

    def parse(word)
      request.url = word
    end
  end

  class NonStrictURL < Base
    def self.regex
      /\./
    end

    def parse(word)
      request.url = "http://#{word}" if request.url == ''
    end
  end
end

Here's the final code we end up with:

module Word
  def self.parse(request, words, word)
    subclass = Word::Base.subclasses
                         .detect { |klass| klass === word }
    subclass&.new(words, request)&.parse(word)
  end

  class Base
    def initialize(words, request)
      @words = words
      @request = request
    end

    # List of subclasses, first-inherited class appears first.
    def self.subclasses
      ObjectSpace.each_object(Base.singleton_class)
                 .reject { |klass| klass == Base }.reverse
    end

    # Used to check if the word given matches the subclass' regex.
    def self.===(word)
      word =~ regex
    end

    def self.regex
      raise NotImplementedError,
            "#{name} doesn't have .regex defined"
    end

    def parse(*)
      raise NotImplementedError,
            "#{self.class.name} doesn't have #parse defined"
    end

    private

    attr_reader :words
    attr_reader :request
  end

  class Header < Base
    def self.regex
      /\A(-H|--header)\Z/
    end

    def parse(*)
      header = extract_header(words.shift)

      return unless header

      request.headers[header.keys.first] = header.values.first
    end

    private

    def extract_header(header)
      [header.split(/: (.+)/)].to_h
    rescue StandardError
      nil
    end
  end

  class Compressed < Base
    def self.regex
      /\A--compressed\Z/
    end

    def parse(*)
      request.headers['Accept-Encoding'] ||= 'deflate, gzip'
    end
  end

  class Request < Base
    def self.regex
      /\A-X|\A--request\Z/
    end

    def parse(word)
      request.method = extract_method(word)
    end

    private

    # Needed for extracting -XPUT and similar formats.
    def extract_method(word)
      return words.shift unless word =~ /^-X/

      if !(method = word.match(/^-X(.*)/)[1]).empty?
        method
      else
        words.shift
      end
    end
  end

  class Authorization < Base
    def self.regex
      /\A(-u|--user)\Z/
    end

    def parse(*)
      request.headers['Authorization'] ||=
        "Basic #{Base64.strict_encode64(words.shift)}"
    end
  end

  class Cookie < Base
    def self.regex
      /\A(-b|--cookie)\Z/
    end

    def parse(*)
      request.headers['Set-Cookie'] ||= words.shift
    end
  end

  class UserAgent < Base
    def self.regex
      /\A(-A|--user-agent)\Z/
    end

    def parse(*)
      request.headers['User-Agent'] ||= words.shift
    end
  end

  class Head < Base
    def self.regex
      /\A(-I|--head)\Z/
    end

    def parse(*)
      request.method = 'HEAD'
    end
  end

  class Data < Base
    def self.regex
      /\A(-d|--data|--data-ascii)\Z/
    end

    def options
      {}
    end

    def parse(*)
      request.method = 'POST' if request.method == 'GET' ||
                                 request.method == 'HEAD'
      request.headers['Content-Type'] ||=
        'application/x-www-form-urlencoded'

      data = extract_data(options)

      request.body = if request.body.empty?
                       data
                     else
                       "#{request.body}&#{data}"
                     end
    end

    def extract_data(args)
      data = words.shift
      data = read_file(data, args) if data =~ /\A@/ ||
                                      args[:raw]
      data
    end

    def read_file(word,
                  strip_newlines: true,
                  raw: false,
                  url_encode: false)
      path = raw ? word : word[1..-1]
      content = File.read(path)
      content.gsub!(/\n|\r/, '') if strip_newlines
      content = CGI.escape(content) if url_encode
      content
    rescue Errno::ENOENT
      file_path = File.expand_path(data[1..-1])
      raise FileNotFoundError,
            "Cannot read file: #{file_path}. Does it exist?"
    end
  end

  class StripNewlinesData < Data
    def self.regex
      /\A--data-binary\Z/
    end

    def options
      { strip_newlines: false }
    end
  end

  class RawData < Data
    def self.regex
      /\A--data-raw\Z/
    end

    def options
      { raw: true }
    end
  end

  class URLEncodeData < Data
    def self.regex
      /\A--data-urlencode\Z/
    end

    def options
      { url_encode: true }
    end
  end

  class URL < Base
    def self.===(word)
      word =~ URI::DEFAULT_PARSER.make_regexp
    end

    def parse(word)
      request.url = word
    end
  end

  class NonStrictURL < Base
    def self.regex
      /\./
    end

    def parse(word)
      request.url = "http://#{word}" if request.url == ''
    end
  end
end

class Parser
  def initialize(request)
    @request = request
  end

  def parse(words)
    self.words = words

    while (word = words.shift)
      parse_word(word)
    end
  end

  private

  attr_accessor :words
  attr_reader :request

  def parse_word(word)
    Word.parse(request, words, word)
  end
end

module ParseCurl
  class << self
    def parse(command)
      self.request = Request.new
      self.parser = Parser.new(request)

      parser.parse(Shellwords.split(command.strip))

      request.to_h
    end

    private

    attr_accessor :request
    attr_accessor :parser
  end
end

ParseCurl.parse(ARGV.first)

Note how each class contains knowledge specific to the argument we're trying to cover with it and nothing else. I think that's an improvement since it's still easy to support new arguments, and we don't pollute a single class with many methods.