Olá, sou o Jônatas Davi Paganini e esse é meu blog. Sou programador, tenho alguns projetos no github e escrevo livremente aqui no ideia.me.

Veja minhas talks ou conecte-se via twitter / github / instagram / facebook / linkedin / soundcloud / strava / meetup.

Search in Ruby AST

Some months ago I started coding some cops for RuboCop.

I was amazed in my first contact with the Ruby AST and what I can do with that.

After some time looking for other cops, I discovered the NodePattern and it was an amazing experience. It reminds the mind blowing as when I learned regular expressions, and now I got the same in the Abstract Syntax Tree.

To check how the AST works, please install the gem parser. It brings a helpful command ruby-parse that can print the AST representation of some Ruby code.

Let's start looking for the AST and how it works. The following class declaration:

class SimpleClass

Looking the AST representation with the ruby-parse utility:

$ ruby-parse simple_class.rb
  (const nil :SimpleClass) nil nil)

I created a small project to help me understand and do experiments with Ruby AST and test the node pattern. You can check out it and follow this step by step:

$ git clone git@github.com:jonatas/rubocopular.git

Then basically it has a bin/console that allows you to play with to simple methods node and test.

Lets check how it works:

$ cd rubocopular
$ bin/console

node is a method that can parse the AST representation of some Ruby as string code.

node('a = 1') # => s(:lvasgn, :a, s(:int, 1))

So, the output is pretty similar with ruby-parse. As it's using pry you can check that the method s is a simple wrap to a node:

show-method s
# From: /Users/jonatasdp/.rbenv/versions/2.3.3/lib/ruby/gems/2.3.0/gems/rubocop-0.48.1/lib/rubocop/ast/sexp.rb @ line 11:
# Owner: RuboCop::AST::Sexp
# Visibility: public
# Number of lines: 3

def s(type, *children)
  Node.new(type, children)

And basically, the node is an object that has a #type and #children. The children can be symbols or other nodes and so on.

node('a = 1')          # => s(:lvasgn, :a, s(:int, 1))
node('a = 1').type     # => :lvasgn
node('a = 1').children # => [:a, s(:int, i)]

Let's check a few more examples:

node('class A ; end')    # => s(:class, s(:const, nil, :A), nil, nil)

Extended class:

node('class A < B; end') # => s(:class, s(:const, nil, :A), s(:const, nil, :B), nil)

Implementing a method:

node('class A < B; def method; end end')
# s(:class,
#   s(:const, nil, :A),
#   s(:const, nil, :B),
#   s(:def, :method,
#     s(:args), nil))

You can use puts to check the same result as ruby-parse outputs:

puts node('class A < B; def method; end end')

The output should be:

  (const nil :A)
  (const nil :B)
  (def :method
    (args) nil))

Then all are single nodes with a #type and possible #children.

A few examples with math:

node('1 + 2 + 3') # s(:send, s(:send, s(:int, 1), :+, s(:int, 2)), :+, s(:int, 3))

Going deeply in some levels:

node('1 + 2 + (3 / 5 * (6 ** 7))')

The output should be a tree a bit more complex:

    s(:int, 1), :+,
    s(:int, 2)), :+,
        s(:int, 3), :/,
        s(:int, 5)), :*,
          s(:int, 6), :**,
          s(:int, 7))))))

Other example with methods:

puts node('a.b.c(d).e(f)')

AST output:

      (send nil :a) :b) :c
    (send nil :d)) :e
  (send nil :f))

Cool yeah? Let's have fun with the Node Pattern 🕶

RuboCop Node Pattern

The node pattern is a cool engine that allows you to match in some AST code using the node pattern. It's pretty similar of use regular expressions for me. Let's come back with our a = 1 assignment example:

node('a = 1') # => s(:lvasgn, :a, s(:int, 1))

Did you remember the test method I mentioned before?

test('(...)', 'a = 1') # => true

Cool, it matches with "something with last children".

Now let's check if it's a :lvasg aka local variable assignment:

test('(:lvasgn)','a = 1') # => nil

It does not match because the node has parameters. Lets specify that it comes with something else using the ... shortcut:

test('(:lvasgn ...)','a = 1') # => true


It's hard sometimes to just understand based on true and false. Let's use $ to capture and go deeply in the comprehension:

test('(:lvasgn $...)','a = 1') # => [:a, s(:int, 1)]

Cool! It captures the children.

Let's ignore the variable name using _ that for me represents "one thing", I don't care the name or value for a while.

test('(:lvasgn _ $...)','a = 1') # => [s(:int, 1)]

We can also capture only the value:

test('(:lvasgn _ (int $_)','a = 1') # => 1

Or the assigned node:

test('(:lvasgn _ $(int _))','a = 1') # => s(:int, 1)

Or capture multiple things independently:

test('(:lvasgn _ ($int $_))','a = 1') # => [:int, 1]

That's cool no?

Let's imagine we want to restrict our search for float or int:

test('(:lvasgn _ (int _))','a = 1') # => true

Testing the expression with a float will not work:

test('(:lvasgn _ (int _))','a = 1.0') # => nil

Now let's introduce {} that we can union patterns:

test('(:lvasgn _ ({int float} _))','a = 1.0') # => true

And we can capture the type of the field as well:

test('(:lvasgn _ (${int float} _))','a = 1.0') # => :float
test('(:lvasgn _ (${int float} _))','a = 1') # => :int

This search thing was amazing for me, and I tried to encapsulate in a search command to allow me to search in code with this expression as I do with grep or ag.

The prototype is in rubocopular project too. I created the bin/search for it:

The search command syntax is pretty simple:

bin/search 'pattern' *path-to-ruby-files

Let's try something:

$ bin/search '(const ... )' lib/*.rb                                                                                                                  11:58:01
lib/rubocopular.rb:5: Rubocopular
lib/rubocopular.rb:7: RuboCop::ProcessedSource
lib/rubocopular.rb:11: RuboCop::ProcessedSource
lib/rubocopular.rb:11: IO
lib/rubocopular.rb:15: RuboCop::AST::Node
lib/rubocopular.rb:16: RuboCop::NodePattern
lib/rubocopular.rb:20: RuboCop::NodePattern

Nice, we can filter a bit more and search only for things under RuboCop scope:

bin/search '(const nil :RuboCop ... )' lib/*.rb                                                                                                     11:59:09
lib/rubocopular.rb:7: RuboCop
lib/rubocopular.rb:11: RuboCop
lib/rubocopular.rb:15: RuboCop
lib/rubocopular.rb:16: RuboCop
lib/rubocopular.rb:20: RuboCop

It's cool, but we can also do it with grep. The difference here is that we're listing the nodes not only the code, than we can print some multiline nodes.

Let's print all check static methods defined in Rubocopular library:

$ bin/search '(defs ... )' lib/*.rb                                                                                                                   12:00:48
lib/rubocopular.rb:6: def self.node(code)
    RuboCop::ProcessedSource.new(code.to_s, 2.3).ast
lib/rubocopular.rb:10: def self.parse_source(path)
    RuboCop::ProcessedSource.new(IO.read(path), 2.3, path)
lib/rubocopular.rb:14: def self.test(pattern, code)
    code = node(code) unless code.is_a?(RuboCop::AST::Node)
lib/rubocopular.rb:19: def self.inspect(pattern, code)
    RuboCop::NodePattern.new(pattern.gsub(/(\.{3}|_)/, '$\1')).match(node(code))

Awesome, it works!

Thanks for reading until here. If you're interested in AST search, I'm working in another library that is my "node pattern" implementation running without RuboCop dependency. The idea is just starting: https://github.com/jonatas/fast

You can do similar things with plain ruby code and arrays to build the matcher:

Fast.match?(s(:send, s(:send, nil, :a), :b), [:send, '...'])) # => true

Checkout the current specs.

In the next step, I'm thinking about do a f() for find and c() to capture things. Not sure exactly how to proceed but the idea is something like:

Fast.match?(ast, f(:send, f(:send, f(:send, c(:send, nil, '_'), '_'), :c), '_'))) # => s(:send, nil, :a)

Any thoughts or ideas about how to be expressive in this search?