Wimdu Developers

Code like a Local

Linepipe: Processing Your Data One Step at a Time

Processing collections in Ruby is something we do almost every day. We take an array of elements, map it, sort it, filter it and turn it into a Hash or whatnot. The Enumerable interface is second nature to every seasoned Ruby developer.

What does this fluent interface look like? Let’s take a supermarket shelf as an example.

A supermarket shelf

We are given this task: from all our shelves, take all the non-perishable products that are more expensive than $80, create a 10% discount for them, and return those discounts.

Our Enumerable instincts kick in and produce this:

shelves
  .map(&:products)
  .flatten
  .reject(&:perishable?)
  .select { |product| product.price > 80 }
  .map { |product| Discount.new(product, percent: 10) }
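To try the chain out in isolation, here is a minimal, self-contained sketch. The Product, Shelf and Discount structs and the sample data are hypothetical stand-ins (the post does not show the real classes), and the sketch assumes each shelf exposes a products collection:

```ruby
# Hypothetical stand-ins for the domain objects; names and fields are assumptions.
Product = Struct.new(:name, :price, :perishable) do
  def perishable?
    perishable
  end
end

Shelf = Struct.new(:products)

Discount = Struct.new(:product, :percent) do
  # Accept the keyword form used in the post: Discount.new(product, percent: 10)
  def initialize(product, percent:)
    super(product, percent)
  end
end

# Sample data, with prices in dollars.
shelves = [
  Shelf.new([Product.new("Whisky", 120, false), Product.new("Salmon", 95, true)]),
  Shelf.new([Product.new("Rice", 3, false), Product.new("Olive oil", 85, false)])
]

discounts = shelves
  .map(&:products)
  .flatten
  .reject(&:perishable?)
  .select { |product| product.price > 80 }
  .map { |product| Discount.new(product, percent: 10) }

puts discounts.map { |discount| discount.product.name }.inspect
# prints ["Whisky", "Olive oil"]
```

The salmon is dropped for being perishable and the rice for being too cheap, so only two discounts come out.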

Six months after you wrote this code, someone changes the price field to hold cents instead of dollars (all prices get multiplied by 100). You are not aware of this change, but you see the tests that cover this code fail, returning more discounts than you expected.

You look at the code and have no idea where the failure is: any of the 5 steps in the chain could have gone wrong. The good old debugging technique, tap, comes to the rescue:

shelves
  .map(&:products).tap { |x| puts x.inspect }
  .flatten
  .reject(&:perishable?)
  .select { |product| product.price > 80 }
  .map { |product| Discount.new(product, percent: 10) }

You keep moving the tap down the chain until you get to the #select step, where you see the filter is not actually filtering anything. Bug caught!
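As a rough illustration of that debugging session, the sketch below reproduces the cents bug with hypothetical sample data (the Product struct is again an assumed stand-in) and uses tap to compare how many products enter and leave the #select step:

```ruby
# Hypothetical stand-in for the real product class.
Product = Struct.new(:name, :price, :perishable) do
  def perishable?
    perishable
  end
end

# Prices are now stored in cents, so 300 means $3.00.
products = [
  Product.new("Whisky", 12_000, false),
  Product.new("Rice", 300, false),
  Product.new("Olive oil", 8_500, false)
]

count_before_select = nil

selected = products
  .reject(&:perishable?).tap { |list| count_before_select = list.size }
  .select { |product| product.price > 80 } # still compares against 80 "dollars"
  .tap { |list| puts "select kept #{list.size} of #{count_before_select}" }
# prints "select kept 3 of 3"
```

Even the $3.00 rice passes the filter, which points straight at the price comparison.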

Linepipe: a better way to develop, benchmark and test pipelines of work

Since we write this kind of chain all the time, why not step it up? Let’s say we want the ability to test, and maybe even benchmark, every step of the chain, or just some important steps. So let’s write the algorithm using linepipe:

linepipe = Linepipe.develop do
  data { shelves }

  step("Get their products") { |data|
    data.map(&:products).flatten
  }

  step("Reject the perishable") { |data|
    data.reject(&:perishable?)
  }.expect('none of them are perishable') { |data|
    data.none? { |product| product.perishable? }
  }

  step("Filter by price") { |data|
    data.select { |product| product.price > 80 }
  }

  step("Generate discounts") { |data|
    data.map { |product| Discount.new(product, percent: 10) }
  }
end

linepipe.output # => [#<Discount ...>, #<Discount ...>, ...]

As we declared it with #develop, expectations will run after every step that has them (like "Reject the perishable", which checks that no product is perishable once the step has run).
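To make the idea of per-step expectations concrete, here is a toy sketch in plain Ruby. It is not Linepipe's implementation or API: TinyPipeline, its step method and the expectation: keyword are all hypothetical names invented for this illustration.

```ruby
# A toy pipeline, NOT Linepipe itself: every name here is hypothetical.
class TinyPipeline
  Step = Struct.new(:name, :transform, :expectation)

  def initialize
    @steps = []
  end

  # Registers a named step. The optional expectation is a lambda
  # that is checked against the step's output.
  def step(name, expectation: nil, &transform)
    @steps << Step.new(name, transform, expectation)
    self
  end

  # Feeds the input through every step in order and raises as soon
  # as a step's expectation returns a falsy value.
  def run(input)
    @steps.reduce(input) do |data, step|
      result = step.transform.call(data)
      if step.expectation && !step.expectation.call(result)
        raise "Expectation failed after step: #{step.name}"
      end
      result
    end
  end
end

pipeline = TinyPipeline.new
  .step("Keep even numbers", expectation: ->(data) { data.all?(&:even?) }) { |data| data.select(&:even?) }
  .step("Double them") { |data| data.map { |n| n * 2 } }

result = pipeline.run([1, 2, 3, 4])
puts result.inspect
# prints [4, 8]
```

A failing expectation names the step that broke, which is the information the tap session had to dig out by hand.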

We can also black-box-test any given step from our unit-test suite, for example:

step = linepipe["Filter by price"]
expect(step.apply(products)).to eq(filtered_products)

When we’re comfortable with the result, we can switch #develop for #run, which eliminates the overhead of running the expectations and reporting to the standard output:

linepipe = Linepipe.run do
  ...
end

We can even benchmark every step to see how it performs in comparison with other steps. It’s as easy as just calling #benchmark instead of #run:

linepipe = Linepipe.benchmark(10_000) do
  ...
end

Which will output:

Rehearsal ---------------------------------------------------------
Get their products      0.040000   0.000000   0.040000 (  0.043888)
Reject the perishable   0.020000   0.000000   0.020000 (  0.022620)
Filter by price         0.020000   0.000000   0.020000 (  0.015405)
Generate discounts      0.050000   0.000000   0.050000 (  0.050341)
------------------------------------------------ total: 0.130000sec

                            user     system      total        real
Get their products      0.040000   0.000000   0.040000 (  0.035150)
Reject the perishable   0.020000   0.000000   0.020000 (  0.021736)
Filter by price         0.010000   0.010000   0.020000 (  0.016627)
Generate discounts      0.060000   0.000000   0.060000 (  0.058812)

So we can get an overview of how the work is distributed among the steps (and maybe parallelize some of them, etc.).
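The report above has the shape of what Ruby's standard Benchmark.bmbm prints (a rehearsal pass followed by the real measurements). Whether Linepipe uses it internally is an assumption; the sketch below only shows how a similar per-step report could be produced with the standard library, using hypothetical steps stored as lambdas:

```ruby
require "benchmark"

numbers = (1..1_000).to_a

# Hypothetical steps, each a lambda from a collection to a collection.
steps = {
  "Keep even numbers" => ->(data) { data.select(&:even?) },
  "Double them"       => ->(data) { data.map { |n| n * 2 } }
}

# Precompute the input of each step so that every report
# measures only its own step, not the ones before it.
inputs = {}
steps.reduce(numbers) do |data, (name, transform)|
  inputs[name] = data
  transform.call(data)
end

iterations = 1_000
label_width = steps.keys.map(&:length).max

Benchmark.bmbm(label_width) do |bm|
  steps.each do |name, transform|
    bm.report(name) { iterations.times { transform.call(inputs[name]) } }
  end
end
```

Timings vary from machine to machine, so no sample output is given here.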

Contribute!

Linepipe is open source and on GitHub, so feel free to use it in your projects, and submit Pull Requests to make it better! :)

Josep M. Bach (Txus)