Thursday, July 16, 2009

How Not to Do CouchDB Views


Tonight, I'm going to find some bugs (dammit). Last night, I indulged in my first look at the assembled Sinatra / CouchDB version of my family cookbook. I found an issue or two, but no crashes. I always have a crash or two.

Boundary conditions are always a good place to probe. I spent much time and wrote many a Cucumber scenario covering boundary conditions of pagination. I seem to have done my job well. Pagination works. Sorting works. Manually entering invalid page numbers even works. Bummer.

I do note a couple of issues in the GitHub issue tracker for my project. I forgot a picture of the meal on the meal page. There are no links between date-ordered recipes. The refined query is not always displayed and, when it is, does not supply the correct query parameter. Small stuff. I'm hunting bigger game.

I finally find something as I am moving between meals-by-month listings. There is a problem when listing meals from August of 2008:



Ah, a type error converting from String to Integer. I knew there had to be one of those somewhere. I am strangely comforted knowing that I finally found it.

The error occurs in the Haml template, while parsing one of the meals' dates:

.meals
  - @meals.each do |meal|
    - date = Date.parse(meal['date'])

I am an old-school debugger: no fancy debug mode for me. I stick with print STDERR (or $stderr.print in Ruby terms), or simply raise the data so that it shows up on the error page. I am not sure which meal is causing the trouble, so I dump all of the meals from July 2008 in the Sinatra action:

get %r{/meals/(\d+)/(\d+)} do |year, month|
  # Grouped by_month view: one row per month, whose value holds that month's meals
  url = "#{@@db}/_design/meals/_view/by_month?group=true&key=%22#{year}-#{month}%22"
  data = RestClient.get url
  @meals = JSON.parse(data)['rows'].first['value']
  @month = "#{year}-#{month}"

  # Temporary debugging: show the meals on the error page (pretty_inspect comes from 'pp')
  raise @meals.pretty_inspect

  url = "#{@@db}/_design/meals/_view/count_by_month?group=true"
  data = RestClient.get url
  @count_by_year = JSON.parse(data)['rows']

  haml :meal_by_month
end

What I get from that is:
[[{"title"=>"Star Wars: The Dinner",…},
  {"title"=>"Lemony Start for the Weekend",…}],
 [{"title"=>"Ideal Pasta for an 8 Year Old Boy",…},
  {"title"=>"Spinach Pie",…},
  {"title"=>"You Each Take a Side",…}]]

Hunh? An array of arrays? How did that happen?

Sigh. It turns out I wildly abused map/reduce. Here is the reduce function for the by_month view in the meals design doc (ignore the map function for now):

function(keys, values, rereduce) { return values; }

The problem is rereduce. CouchDB sets that third argument to true when it calls the reduce function a second time to combine intermediate reduce results; in that case keys is null and values holds the outputs of the earlier reduce calls (here, arrays of meals). More info on reduce and rereduce can be found in the documentation. Essentially, when CouchDB is working with large datasets (like the 1,000+ documents in my development database), it divides and conquers, but it expects some help from the reduce function, which is why the rereduce parameter is supplied.
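To see how returning values unchanged turns into an array of arrays, here is a small Ruby imitation of the two passes. It is only a sketch: the `couch_reduce` helper and the fixed chunk size are hypothetical, the values are simplified to meal titles, and real CouchDB decides for itself how rows are split (roughly, by how they are stored in its B-tree). The first pass calls the reduce function once per chunk with `rereduce` false; the second pass calls it once more with `rereduce` true on the chunk results.

```ruby
# Stand-in for the naive JavaScript reduce from the by_month view:
#   function(keys, values, rereduce) { return values; }
naive_reduce = lambda do |_keys, values, _rereduce|
  values
end

# Hypothetical simulation of CouchDB's two-pass reduce. A fixed chunk size
# stands in for CouchDB's own grouping of rows, and keys are simplified
# (real CouchDB passes [key, doc_id] pairs on the first pass).
def couch_reduce(reduce_fn, rows, chunk_size)
  # First pass: reduce each chunk of map rows (rereduce = false).
  partials = rows.each_slice(chunk_size).map do |chunk|
    keys   = chunk.map { |row| row[:key] }
    values = chunk.map { |row| row[:value] }
    reduce_fn.call(keys, values, false)
  end

  # Second pass: combine the partial results (rereduce = true, keys = nil).
  reduce_fn.call(nil, partials, true)
end

rows = [
  { key: "2008-07", value: "Star Wars: The Dinner" },
  { key: "2008-07", value: "Lemony Start for the Weekend" },
  { key: "2008-07", value: "Ideal Pasta for an 8 Year Old Boy" },
  { key: "2008-07", value: "Spinach Pie" },
  { key: "2008-07", value: "You Each Take a Side" }
]

result = couch_reduce(naive_reduce, rows, 2)
p result
```

The second pass hands the first pass's arrays straight back, so the result is an array of three arrays rather than one flat list of five titles, the same shape as the dump above.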

When re-reducing, I need to flatten the array of arrays in values back into a single array, which can be done with some ugly JavaScript:

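function(keys, values, rereduce) {
  if (rereduce) {
    // values holds the arrays returned by earlier reduce calls: flatten them
    var a = [];
    for (var i=0; i<values.length; i++) {
      for (var j=0; j<values[i].length; j++) {
        a.push(values[i][j]);
      }
    }
    return a;
  }
  else {
    // first pass: values are the map values for this chunk of rows
    return values;
  }
}
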
Yup, that is some pretty ugly JavaScript. What makes it even worse is that it is completely unnecessary!

The original problem I was solving was finding all of the meals in a particular month. I solved it with a grouped by_month map/reduce view. More experience with CouchDB has taught me that the same problem is easily solved without a reduce function at all, by querying a simple map-only view with a startkey and an endkey.
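A startkey/endkey query returns the rows of a view whose keys fall between the two keys, inclusive, in the view's sorted key order. Below is a rough Ruby imitation of that, offered as a sketch only: the July keys are the by_date keys from the curl output further down, the June and August keys are made up, and plain string comparison stands in for CouchDB's collation, which orders uniformly formatted YYYY/MM/DD strings like these the same way. Real CouchDB seeks straight to the startkey in its index instead of scanning every key as `select` does here.

```ruby
# Keys shaped like the by_date view keys (YYYY/MM/DD).
# The June and August entries are made up for illustration.
keys = [
  "2008/06/29",
  "2008/07/10",
  "2008/07/13",
  "2008/07/19",
  "2008/07/21",
  "2008/07/28",
  "2008/08/02"
].sort # view rows are kept sorted by key

# Imitate ?startkey=...&endkey=... : both ends are inclusive.
def key_range(sorted_keys, startkey, endkey)
  sorted_keys.select { |key| key >= startkey && key <= endkey }
end

# "00" sorts before any real day and "99" after any real day,
# so this range covers every day in July 2008.
july = key_range(keys, "2008/07/00", "2008/07/99")
puts july
```

The day placeholders never have to be valid dates; they only have to sort before the first real day and after the last one.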

I already have a by_date view (map only). I can grab all of the meals from July 2008 by querying the by_date view with a startkey of 2008/07/00 and an endkey of 2008/07/99. Since 00 sorts before any real day and 99 after any real day, that range covers the whole month. Demonstrating this with curl:
cstrom@jaynestown:~/repos/eee-code$ curl http://localhost:5984/eee/_design/meals/_view/by_date?startkey=%222008/07/00%22\&endkey=%222008/07/99%22
{"total_rows":500,"offset":491,"rows":[
{"id":"2008-07-10","key":"2008/07/10","value":["2008-07-10","You Each Take a Side"]},
{"id":"2008-07-13","key":"2008/07/13","value":["2008-07-13","Star Wars: The Dinner"]},
{"id":"2008-07-19","key":"2008/07/19","value":["2008-07-19","Lemony Start for the Weekend"]},
{"id":"2008-07-21","key":"2008/07/21","value":["2008-07-21","Spinach Pie"]},
{"id":"2008-07-28","key":"2008/07/28","value":["2008-07-28","Ideal Pasta for an 8 Year Old Boy"]}
]}
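Back in the Sinatra app, the same query could be built from the year and month captured by the route. This is a minimal sketch, assuming the by_date view and the YYYY/MM/DD key format shown above; the `by_date_month_url` helper is hypothetical, and the database URL is passed in rather than read from `@@db`. View query parameters are JSON values, so each string key is JSON-encoded (keeping its double quotes) before being URL-escaped.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: build a by_date query covering one whole month.
# Assumes by_date keys look like "2008/07/10" and month is zero-padded ("07").
def by_date_month_url(db, year, month)
  startkey = "#{year}/#{month}/00"  # sorts before the 1st of the month
  endkey   = "#{year}/#{month}/99"  # sorts after the last day of the month

  # String keys must stay JSON strings (quoted); encode_www_form then
  # percent-escapes the quotes and slashes.
  params = {
    "startkey" => startkey.to_json,
    "endkey"   => endkey.to_json
  }
  query = URI.encode_www_form(params)

  "#{db}/_design/meals/_view/by_date?#{query}"
end

puts by_date_month_url("http://localhost:5984/eee", "2008", "07")
```

Fetching that URL with RestClient.get and reading JSON.parse(data)['rows'] would give one row per meal, with no reduce and no rereduce involved. The sketch assumes the month arrives zero-padded, as in /meals/2008/07; a request for month "7" would build keys like 2008/7/00 that match nothing.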
Grr... That's a much better solution than my silly map/reduce. Looks like I have some refactoring to do. Refactoring and some warnings to add to old blog posts.

I'll get started on that tomorrow.

1 comment:

  1. The reduce you wrote is not only ugly, but it also goes against the idea of reduce, which is not "combine", but REDUCE. Would be good to point that out.
