Archive for July, 2010

couchdb bulk document transfer

July 17, 2010 1 comment

I have a few test and development servers scattered around at various locations, some of which are there just because of convenience for the different people and locations I work with. I needed to copy some databases from one couchdb server to the other, but since the servers weren’t running the same version of couchdb I couldn’t just copy the files. The bulk document functionality almost gets me there, but it has extra junk besides just the bare documents. I also wanted to be able to remove the _rev tags, since in this case they are just an extra headache (I’m deleting the database and loading it fresh from the other site).

My solution was to write a little method in PHP to dump a database and write out the contents to a file. The approach I used was sparked by the specific project I was working on when I needed it, so it is written based on Zend Framework. I’m just using Zend_HTML for my CouchDB interaction, which enables working with couchdb very nicely.

First, we need to get the data from CouchDB.

       $client = new Zend_Http_Client( );
        $resp = $client->setUri( $this->URI . '/' . $db . '/_all_docs?include_docs=true' )
                ->request( 'GET' );
        $body = $resp->getBody( );

        if( $resp->getStatus( ) != 200 ){
            echo '<div class="error">Status: ' . $resp->getStatus( ) . "<br>Did you spell the database name right?<br><pre>$body</pre></div>"; 

We’re using Zend_Http_Client to connect to CouchDB. We set the URI, which we have in our class constructor. I’m getting it from Zend_Config, and piecing together the login credentials if they are in the config, but that’s a different post. The $db is just the database name we’re dumping. If the status returned in the response isn’t 200, that almost always means (in this specific context) that we asked for a database that isn’t there ($db doesn’t exist in CouchDB), so since we can’t continue we show an error and exit.

So now we have a big glob of JSON in $body, but it isn’t formatted the way we need it to load, and we need a file to write this into. One side note here: I’m assuming the size of our database is fairly moderate. My use case was a couple thousand documents, and this approached worked well. You would need to handle it in pieces if you had more data than you could process at once.

So in the next part we convert the JSON to an array so we can iterate through it in PHP, and we drop the parts we don’t care about. We only need the part that’s in “rows.” CouchDB gives us extra information about the number of rows in the data and some other stuff we aren’t going to use here.

        $data = Zend_Json::decode( $body );
        $total_rows = $data['total_rows'];
        $data = $data['rows'];

I use Zend_Json::decode instead of just json_decode here because it converts to a nice array instead of to an object, and its just nicer in this situation to work with.

        $resource = fopen( $outpath, 'w' );

        $count = 0;
        $comma = "{\"docs\": [\n";
        foreach( $data as $key => $value ){
            if( !$preserve_rev ) unset( $value['doc']['_rev'] ); // we aren't going to use this to update, so we don't need the _rev
            $docs = $comma . Zend_Json::encode( $value['doc'] );
            fwrite( $resource, $docs );
            unset( $data[$key] );
            $comma = ",\n";
        fwrite( $resource, "\n]}" );
        fclose( $resource );
        $filesize = filesize( $outpath );
        echo "<div>CouchDB reported $total_rows rows.  I wrote $count rows to $outfile, with a resulting file size of $filesize bytes.</div>";

The first line here is opening a file so we can write to it. Make sure you check if the file exists or not and that you are safely handling the file name first.

We’re using $count and $total_rows (from that last section) to report on the results at the end.

We add the JSON opening to the front of our file (the first $comma setting does that), and iterate through the array, sticking each document from the data into the file, separating each document with a comma. We unset the documents from the array as we use them, just to tidy up as we go.

After we’ve gone through the whole array we close the JSON array and object, close the file, get the filesize to report, and output a summary of what we did.

This results in a file at $outpath that we can move to the destination server and just load in. Typically for my case it means I copy the file, delete and recreate the database, and push the file in as the new data. So far I’ve done that from the command line with curl, an example of why CouchDB is so easy to work with.

Categories: couchDB, Zend Framework