Couchzilla

Couchzilla – CouchDB/Cloudant access for Julians.

Philosophy

We've tried to wrap the CouchDB API as thinly as possible, hiding the JSON and the HTTP but adding no overwrought abstractions on top. That means that a CouchDB JSON document is represented as its de-serialisation into native Julia types. For example, the document

{
  "_id": "45c4affe6f40c7aaf0ba533f7a6601a2",
  "_rev": "1-47e8deed9ccfcf8d061f7721d3ba085c",
  "item": "Malus domestica",
  "prices": {
    "Fresh Mart": 1.59,
    "Price Max": 5.99,
    "Apples Express": 0.79
  }
}

is represented as

Dict{UTF8String,Any}(
  "_rev"   => "1-47e8deed9ccfcf8d061f7721d3ba085c",
  "prices" => Dict{UTF8String,Any}("Fresh Mart"=>1.59,"Price Max"=>5.99,"Apples Express"=>0.79),
  "_id"    => "45c4affe6f40c7aaf0ba533f7a6601a2",
  "item"   => "Malus domestica"
)

Along similar lines, Couchzilla returns CouchDB's JSON responses converted as-is into native Julia types.
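
Because documents are plain Dicts, ordinary Julia indexing and iteration apply to them. The following is a minimal sketch that uses a hand-built Dict in place of a document fetched from a server; the String key type is an assumption based on the output shown later in this document, and nothing here calls Couchzilla itself.

```julia
# A hand-built stand-in for a document returned by the library.
doc = Dict{String,Any}(
    "_id"    => "45c4affe6f40c7aaf0ba533f7a6601a2",
    "_rev"   => "1-47e8deed9ccfcf8d061f7721d3ba085c",
    "item"   => "Malus domestica",
    "prices" => Dict{String,Any}("Fresh Mart" => 1.59, "Price Max" => 5.99, "Apples Express" => 0.79),
)

# Nested JSON objects become nested Dicts, so indexing chains naturally.
fresh_mart = doc["prices"]["Fresh Mart"]

# Fields that may be absent can be read with a default instead of raising a KeyError.
colour = get(doc, "colour", "unknown")

# Sort the (shop => price) pairs by price and take the cheapest one.
cheapest = first(sort(collect(doc["prices"]); by = last))
println(cheapest.first, " sells at ", cheapest.second)
```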

CouchDB vs Cloudant

IBM Cloudant offers a clustered version of CouchDB as a service. What started out as a fork has, with version 2.0 of CouchDB, now largely come back together, and Cloudant now does (nearly) all its work directly in the Apache CouchDB repos. However, some features of Cloudant make no sense in the CouchDB context, so there are still some differences. Couchzilla tries to cover both bases, but makes no attempt to hide Cloudant-only functionality when using CouchDB.

The main differences are:

Text indexes – Cloudant integrates with Lucene. CouchDB only has JSON indexes in its Mango implementation.
Rate capping – as Cloudant sells its service in terms of provisioned throughput capacity, Cloudant will occasionally return a 429 error indicating that the cap has been hit.
API keys – Cloudant has a separate auth system distinct from CouchDB's _users database.
Geospatial indexes – Cloudant has sophisticated geospatial capabilities which are not present in CouchDB.

Getting Started

Couchzilla defines two types, Client and Database. Client represents an authenticated connection to the remote CouchDB instance. Using this you can perform database-level operations, such as creating, listing and deleting databases. The Database immutable type represents a client that is connected to a specific database, allowing you to perform document-level operations.

Install the library using the normal Julia facilities: Pkg.add("Couchzilla").

Let's load up the credentials from environment variables.

username = ENV["COUCH_USER"]
password = ENV["COUCH_PASS"]
host     = ENV["COUCH_HOST_URL"] # e.g. https://accountname.cloudant.com

We can now create a client connection, and use that to create a new database:

dbname = "mynewdb"
client = Client(username, password, host)
db, created = createdb(client, dbname)

On return, created is true if the database was newly created, and false if it already existed.

We can now add documents to the new database using createdoc. It returns an array of Dicts showing the {id, rev} tuples of the new documents:

result = createdoc(db, [
    Dict("name" => "adam",    "data" => "hello"),
    Dict("name" => "billy",   "data" => "world"),
    Dict("name" => "cecilia", "data" => "authenticate"),
    Dict("name" => "davina",  "data" => "cloudant"),
    Dict("name" => "eric",    "data" => "blobbyblobbyblobby")
])

5-element Array{Any,1}:
 Dict{String,Any}(Pair{String,Any}("ok",true),Pair{String,Any}("rev","1-783f91178091c10cce61c326473e8849"),Pair{String,Any}("id","93790b75ed6a59e5002cb0eddb78b42d"))
 Dict{String,Any}(Pair{String,Any}("ok",true),Pair{String,Any}("rev","1-9ecba7e9a824a6fdcfb005c454fea12e"),Pair{String,Any}("id","93790b75ed6a59e5002cb0eddb78b69c"))
 Dict{String,Any}(Pair{String,Any}("ok",true),Pair{String,Any}("rev","1-e05530fc65101ed432c5ee457d327952"),Pair{String,Any}("id","93790b75ed6a59e5002cb0eddb78c304"))
 Dict{String,Any}(Pair{String,Any}("ok",true),Pair{String,Any}("rev","1-446bb325003aa6a995bde4e7c3dd513f"),Pair{String,Any}("id","93790b75ed6a59e5002cb0eddb78c867"))
 Dict{String,Any}(Pair{String,Any}("ok",true),Pair{String,Any}("rev","1-e1f2181b3b4d7fa285b4516eee02d287"),Pair{String,Any}("id","93790b75ed6a59e5002cb0eddb78c8a1"))

This form of createdoc creates multiple documents using a single HTTP POST, which is the most efficient way of creating multiple new documents.

We can read a document back using readdoc, hitting the CouchDB primary index. Note that reading back a document you just created is normally bad practice, as it will sooner or later fall foul of CouchDB's eventual consistency and give rise to sporadic, hard-to-troubleshoot errors. Having said that, let's do it anyway, and hope for the best:

id = result[2]["id"]
readdoc(db, id)

Dict{String,Any} with 4 entries:
  "_rev" => "1-9ecba7e9a824a6fdcfb005c454fea12e"
  "name" => "billy"
  "_id"  => "93790b75ed6a59e5002cb0eddb78b69c"
  "data" => "world"

This returns the winning revision for the given id as a Dict.

Conflict handling in CouchDB and eventual consistency are beyond the scope of this documentation, but worth understanding fully before using CouchDB in anger.

Query

Mango (also known as Cloudant Query) is a declarative query language inspired by MongoDB. It allows us to query the database in a (slightly) more ad-hoc fashion than using map reduce views.

In order to use this feature we first need to set up the necessary indexes:

mango_index(db, ["name", "data"])

Dict{String,Any} with 3 entries:
  "name"   => "f519be04f7f80838b6a88811f75de4fb83d966dd"
  "id"     => "_design/f519be04f7f80838b6a88811f75de4fb83d966dd"
  "result" => "created"

We can now use this index to retrieve data:

mango_query(db, q"name=davina")

Couchzilla.QueryResult(Dict{AbstractString,Any}[Dict{AbstractString,Any}(Pair{AbstractString,Any}("_rev","1-446bb325003aa6a995bde4e7c3dd513f"),Pair{AbstractString,Any}("name","davina"),Pair{AbstractString,Any}("_id","93790b75ed6a59e5002cb0eddb78c867"),Pair{AbstractString,Any}("data","cloudant"))],"")

The construct q"..." (see @q_str) is a custom string literal that takes a simple DSL expression and converts it to the JSON representation of a Mango selector. If you are familiar with Mango selectors, you can pass the raw JSON expression instead:

mango_query(db, Selector("{\"name\":{\"\$eq\":\"davina\"}}"))

Couchzilla.QueryResult(Dict{AbstractString,Any}[Dict{AbstractString,Any}(Pair{AbstractString,Any}("_rev","1-446bb325003aa6a995bde4e7c3dd513f"),Pair{AbstractString,Any}("name","davina"),Pair{AbstractString,Any}("_id","93790b75ed6a59e5002cb0eddb78c867"),Pair{AbstractString,Any}("data","cloudant"))],"")
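
The backslash escapes in the raw JSON form are easy to get wrong. On recent Julia versions (1.x), raw string literals let the same selector text be written without escapes. The sketch below only checks that the two spellings produce the same string; passing the result to Selector assumes that the constructor accepts any string, as the example above suggests.

```julia
# The escaped form used in the example above.
escaped = "{\"name\":{\"\$eq\":\"davina\"}}"

# The same text as a raw, triple-quoted literal: no escaping of quotes or dollar signs.
unescaped = raw"""{"name":{"$eq":"davina"}}"""

println(unescaped)
println(escaped == unescaped)
```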

There are also coroutine versions of some of the functions that return data from views. If we had many results to process, we could use paged_mango_query in a Julia Task:

for page in @task paged_mango_query(db, q"name=davina"; pagesize=10)
    # Do something with the page.docs array
end

This version uses the limit and skip parameters and issues an HTTP(S) request per page.
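
To make the limit and skip mechanics concrete, here is a self-contained sketch that pages over an in-memory array in the same manner. It is an illustration only: fetch_page and all_pages are hypothetical names standing in for one request and for the paging loop, and are not part of Couchzilla.

```julia
# Hypothetical stand-in for one request: skip `nskip` items, then return at most `limit`.
fetch_page(items, nskip, limit) = items[nskip+1:min(nskip + limit, length(items))]

# Request page after page until a short or empty page signals the end of the results.
function all_pages(items; pagesize=10)
    pages = Vector{typeof(items)}()
    nskip = 0
    while true
        page = fetch_page(items, nskip, pagesize)
        isempty(page) && break
        push!(pages, page)
        length(page) < pagesize && break
        nskip += pagesize
    end
    return pages
end

pages = all_pages(collect(1:25); pagesize=10)
println(length.(pages))
```

Each loop iteration corresponds to one round trip, which is why a larger page size trades memory for fewer requests.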

Views

A powerful feature of CouchDB is its secondary indexes, known as views. They are created using a map function, most commonly written in JavaScript, and optionally a reduce part. For example, to create a view on the name field, we use the following:

view_index(db, "my_ddoc", "my_view",
"""
function(doc) {
  if(doc && doc.name) {
    emit(doc.name, 1);
  }
}""")

Dict{String,Any} with 3 entries:
  "ok"  => true
  "rev" => "1-b950984b19bb1b8bb43513c9d5b235bc"
  "id"  => "_design/my_ddoc"

To read from this view, use the view_query method:

view_query(db, "my_ddoc", "my_view"; keys=["davina", "billy"])

Dict{String,Any} with 3 entries:
  "rows"       => Any[Dict{String,Any}(Pair{String,Any}("key","davina"),Pair{St…
  "offset"     => 1
  "total_rows" => 5

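Conceptually, the map function runs over every document and the emitted rows are kept sorted by key. The following sketch mimics that in plain Julia over a few in-memory documents. It is a rough mental model under simplifying assumptions, not how CouchDB builds or stores its indexes, and it uses no Couchzilla calls.

```julia
docs = [
    Dict("name" => "davina", "data" => "cloudant"),
    Dict("name" => "adam",   "data" => "hello"),
    Dict("data" => "no name field"),   # skipped, like the doc.name check in the map function
    Dict("name" => "billy",  "data" => "world"),
]

# Rough equivalent of the JavaScript map function: emit (doc.name, 1) when a name exists.
emitted = [(doc["name"], 1) for doc in docs if haskey(doc, "name")]

# View rows are ordered by key.
rows = sort(emitted; by = first)

# Rough equivalent of keys=["davina", "billy"]: look up each requested key in turn.
wanted = ["davina", "billy"]
matches = [row for key in wanted for row in rows if first(row) == key]
println(matches)
```
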
Cloudant has an interactive tool for trying out Mango Query which is a useful resource:

Cloudant Query demo

Using attachments

CouchDB can store files alongside documents as attachments. This can be a convenient feature for many applications, but it has drawbacks, especially in terms of performance. If you find that you need to store large (say greater than a couple of megabytes) binary attachments, you should probably consider a dedicated, separate file store and only use CouchDB for metadata.

To write an attachment, use put_attachment, which expects an {id, rev} tuple referencing an existing document in the database, the attachment's name and content type, and the path to the file holding the attachment:

data = createdoc(db, Dict("item" => "screenshot"))
result = put_attachment(db, data["id"], data["rev"], "test.png", "image/png", "data/test.png")

In order to read the attachment, use get_attachment, which returns an IO stream:

att = get_attachment(db, result["id"], "test.png"; rev=result["rev"])
open("data/fetched.png", "w") do f
  write(f, att)
end

Geospatial queries

One of the fancier aspects of Cloudant is its geospatial capabilities, and Couchzilla provides access to this functionality. Using this it is possible to use Cloudant to answer questions such as "show me all documents that fall within a given radius of a given point". A full description of this capability is beyond the scope of this document, but Cloudant provides rich documentation on the subject.

In order to try out the geospatial stuff using Couchzilla, we first need some data. Cloudant provides an open database that you can replicate into your own account here. It's a database of the locations of reported crimes in the Boston area.

Let's connect Couchzilla to a replica of this database, and run through the examples from Cloudant's geospatial tutorial page. We can re-use the client from before:

geodb = connectdb(client, "crimes")

The database already contains the necessary geospatial indexes. Had this not been the case we could have indexed it using geo_index.

So let's list the first 200 crimes within a radius of 10,000m of the Boston State House:

result = geo_query(geodb, "geodd", "geoidx";
  lat    = 42.357963,
  lon    = -71.063991,
  radius = 10000.0,
  limit  = 200)
result["rows"]

200-element Array{Any,1}:
 Dict{String,Any}(Pair{String,Any}("rev","1-caa129c6e0c9e7667cd401675859da2a"),Pair{String,Any}("id","79f14b64c57461584b152123e38fcf2b"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.0666,42.3593]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-e7c7eb51c49d7e5fab38b33b19542106"),Pair{String,Any}("id","79f14b64c57461584b152123e38c548a"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.0646,42.3612]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-de437f29d19bb55a495693fa40975962"),Pair{String,Any}("id","79f14b64c57461584b152123e38b22cc"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.06,42.3616]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-4c4650e64d0cc0bb01e32a0b5aca2802"),Pair{String,Any}("id","79f14b64c57461584b152123e3917804"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.06,42.3616]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-e557e2555201054b924f618299cb9b64"),Pair{String,Any}("id","79f14b64c57461584b152123e392e828"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.06,42.3616]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-86261a0030776d68d98f805afec21c94"),Pair{String,Any}("id","79f14b64c57461584b152123e38a779d"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.0587,42.3594]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-0892e7f4eb551df2453e9a11b274e190"),Pair{String,Any}("id","79f14b64c57461584b152123e38d6b78"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.0587,42.3594]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-4ce963293c1810c3fc8fe606e9345e8e"),Pair{String,Any}("id","79f14b64c57461584b152123e38ee226"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.0587,42.3594]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-816e850ff5ec2249993675fd568b2e9c"),Pair{String,Any}("id","79f14b64c57461584b152123e3927629"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.0587,42.3594]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-59e512ec186a17dc3e94a3f1d7c13392"),Pair{String,Any}("id","79f14b64c57461584b152123e392867d"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.0587,42.3594]),Pair{String,Any}("type","Point"))))
 ⋮
 Dict{String,Any}(Pair{String,Any}("rev","1-be45124918034417ce77adbd99d3d54f"),Pair{String,Any}("id","79f14b64c57461584b152123e38c8ead"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.1331,42.3634]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-e17545f877d7fc1442abe71557ec44c8"),Pair{String,Any}("id","79f14b64c57461584b152123e391c876"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.1073,42.3038]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-50e1dd9b9ad194f90a0fb4f9001d1b43"),Pair{String,Any}("id","79f14b64c57461584b152123e3929889"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.0551,42.289]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-f8407a2467b8fea166aa451994de75da"),Pair{String,Any}("id","79f14b64c57461584b152123e38b682a"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.0773,42.2896]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-459aadf6156187de8c11ecce3b5f1f28"),Pair{String,Any}("id","79f14b64c57461584b152123e38afe98"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.0501,42.2897]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-1d1c012db58954c6d799646e0e009728"),Pair{String,Any}("id","79f14b64c57461584b152123e38b0d38"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.0473,42.2902]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-21dea1eb417bff225b4932acbe983314"),Pair{String,Any}("id","79f14b64c57461584b152123e38c9b44"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.1097,42.3042]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-edd6492692311118baaa8cbb980ef1c5"),Pair{String,Any}("id","79f14b64c57461584b152123e38d51e7"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.1341,42.349]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-13144e283f47d611d62d9f11d94161be"),Pair{String,Any}("id","79f14b64c57461584b152123e39168d7"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.135,42.3504]),Pair{String,Any}("type","Point"))))
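
Note that the coordinates in each row follow the GeoJSON order, longitude first and latitude second. As a sanity check on the radius, the distance from the query point to a returned point can be estimated with the haversine formula. This is a self-contained sketch under a spherical-Earth assumption; haversine_m is a hypothetical helper, and Cloudant's own distance calculation may differ slightly.

```julia
# Great-circle distance in metres between two (lat, lon) points given in degrees.
function haversine_m(lat1, lon1, lat2, lon2; earth_radius_m = 6_371_000.0)
    p1, p2 = deg2rad(lat1), deg2rad(lat2)
    dlat = p2 - p1
    dlon = deg2rad(lon2 - lon1)
    a = sin(dlat / 2)^2 + cos(p1) * cos(p2) * sin(dlon / 2)^2
    return 2 * earth_radius_m * asin(sqrt(a))
end

# Coordinates of the first row above, in GeoJSON order: [lon, lat].
lon, lat = -71.0666, 42.3593

# Distance from the query point used in the geo_query call.
d = haversine_m(42.357963, -71.063991, lat, lon)
println(round(d), " m from the query point")
```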

We can specify a polygon for the Commercial Street corridor, which should yield only two docs:

result = geo_query(geodb, "geodd", "geoidx";
  g="POLYGON ((-71.0537124 42.3681995 0,-71.054399 42.3675178 0,-71.0522962 42.3667409 0,-71.051631 42.3659324 0,-71.051631 42.3621431 0,-71.0502148 42.3618577 0,-71.0505152 42.3660275 0,-71.0511589 42.3670263 0,-71.0537124 42.3681995 0))")
result["rows"]

2-element Array{Any,1}:
 Dict{String,Any}(Pair{String,Any}("rev","1-f0551b24741f182c5944621f87f9ac76"),Pair{String,Any}("id","79f14b64c57461584b152123e38d6349"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.0511,42.3651]),Pair{String,Any}("type","Point"))))
 Dict{String,Any}(Pair{String,Any}("rev","1-8a9f1673b2b15232bbbb956a7f3b5397"),Pair{String,Any}("id","79f14b64c57461584b152123e3924516"),Pair{String,Any}("geometry",Dict{String,Any}(Pair{String,Any}("coordinates",Any[-71.052,42.3667]),Pair{String,Any}("type","Point"))))

If you want to delete a database, simply call deletedb:

deletedb(client, dbname)

Dict{String,Any} with 1 entry:
  "ok" => true

Handling Cloudant's rate capping

Cloudant pushes most of its work upstream to Apache CouchDB. However, not everything Cloudant does makes sense for CouchDB, and one such example is throughput throttling. Cloudant, currently only in its Bluemix guise, prices its service in terms of provisioned throughput capacity for lookups, writes and queries. This means that you purchase a certain maximum number of requests per second, bucketed by type. This is similar in spirit to how other database services are priced (e.g. DynamoDB).

When you hit capacity, Cloudant will return an error, signified by the HTTP status code 429 (Too Many Requests). This means that the request was not successful, and will need to be retried at a later stage. Couchzilla optionally gives you a way to deal with 429 errors:

retry_settings!(;enabled=true, max_retries=5, delay_ms=10)

This enables retrying of requests that fail with a 429. A request is tried a maximum of 5 times, with a delay of 10 ms added cumulatively, plus a little noise (randomly between 1 and 10 ms). This is a module-global setting, so it applies to all Clients created within the same Julia session.

You can retrieve the current settings using:

retry_settings()

Note that this behaviour is not enabled by default, and relying on it alone on a rate-capped cluster will only help with temporary transgressions – your own code must still handle the case where the max retries are exceeded.
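
The general shape of such a retry loop, including the point at which the error is handed back to the caller, might look like the sketch below. This is an illustration and not Couchzilla's implementation: TooManyRequests and with_retries are hypothetical names, and "added cumulatively" is interpreted here as the delay growing by delay_ms on each attempt.

```julia
# Hypothetical error type standing in for an HTTP 429 response.
struct TooManyRequests <: Exception end

# Call `f`, retrying on TooManyRequests for at most `max_retries` attempts in total.
function with_retries(f; max_retries=5, delay_ms=10)
    for attempt in 1:max_retries
        try
            return f()
        catch err
            # Anything other than a rate-limit error is passed straight on.
            err isa TooManyRequests || rethrow()
            # Out of attempts: the caller has to deal with the error.
            attempt == max_retries && rethrow()
            # The delay grows with each attempt, plus 1 to 10 ms of random noise.
            sleep((attempt * delay_ms + rand(1:10)) / 1000)
        end
    end
end

# Usage: a fake request that is rate-limited twice before succeeding.
calls = Ref(0)
result = with_retries() do
    calls[] += 1
    calls[] < 3 ? throw(TooManyRequests()) : "ok"
end
println(result, " after ", calls[], " attempts")
```

Calling code should still wrap the outermost call in its own try/catch, since a sustained overload will exhaust the attempts and surface the error.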

Using Cloudant's API keys for auth

Cloudant has an auth system distinct from the traditional CouchDB style based on the _users database. By using API keys you can grant and revoke a client application's access. API keys have roles attached to them, a combination of _admin, _reader, _writer, _replicator and _creator. It's not quite as straightforward as it may seem. _reader grants read-only access. TODO

In order to use the API key system, you need four steps:

1. Create the key using

data = make_api_key(client::Client)

2. Assign key to a database, with the appropriate roles

current = get_permissions(db)
result = set_permissions(db, current; key=data["key"], roles=["_reader", "_writer"])

3. Create a new client connection using the new key

api_client = Client(data["key"], data["password"], host)

4. Create a database connection using the new client

api_db = connectdb(api_client, "dbname")

There is one gotcha here that you need to be aware of. API keys are created on a central Cloudant admin cluster, and then replicated back to the one you're using. This means that running through the four steps above may occasionally fail to authenticate (step 3) for a good few minutes whilst the update percolates through. It helps to treat API keys as something to be created up front, rather than on the fly.

Client

# Couchzilla.Client — Type.

type Client
  url
  cookies

  Client(username::AbstractString, password::AbstractString, urlstr::AbstractString; auth=true) = 
    cookieauth!(new(URI(urlstr)), username, password, auth)
end

The Client type represents an authenticated connection to a remote CouchDB/Cloudant instance.
