A Little Riak Book for LFE

Buckets and Bucket Types

Although we've been using bucket types and buckets as namespaces up to now, they are capable of more.

Different use-cases will dictate whether a bucket is heavily written to, or largely read from. You may use one bucket to store logs, one bucket could store session data, while another may store shopping cart data. Sometimes low latency is important, while other times it's high durability. And sometimes we just want buckets to react differently when a write occurs.

Quorum

The basis of Riak's availability and tolerance is that it can read from, or write to, multiple nodes. Riak allows you to adjust these N/R/W values (which we covered under Concepts) on a per-bucket basis.

N/R/W

N is the number of total nodes that a value should be replicated to, defaulting to 3. But we can set this n_val to less than the total number of nodes.

Any bucket property, including n_val, can be set by sending a props value as a JSON object to the bucket URL. Let's set the n_val to 5 nodes, meaning that objects written to cart will be replicated to 5 nodes.

curl -i -XPUT "$RIAK/types/default/buckets/cart/props" \
  -H "Content-Type: application/json" \
  -d '{"props":{"n_val":5}}'

You can take a peek at the bucket's properties by issuing a GET to the bucket.

Note: Riak returns unformatted JSON. If you have a command-line tool like jsonpp (or json_pp) installed, you can pipe the output there for easier reading. The results below are a subset of all the props values.

curl "$RIAK/types/default/buckets/cart/props" | jsonpp
{
  "props": {
    ...
    "dw": "quorum",
    "n_val": 5,
    "name": "cart",
    "postcommit": [],
    "pr": 0,
    "precommit": [],
    "pw": 0,
    "r": "quorum",
    "rw": "quorum",
    "w": "quorum",
    ...
  }
}

As you can see, n_val is 5. That's expected. But you may also have noticed that the cart props returned both r and w as quorum, rather than a number. So what is a quorum?

Symbolic Values

A quorum is one more than half of all the total replicated nodes (floor(N/2) + 1). This figure is important, since if more than half of all nodes are written to, and more than half of all nodes are read from, then you will get the most recent value (under normal circumstances).

Here's an example with the above n_val of 5 ({A,B,C,D,E}). Your w is a quorum (which is 3, or floor(5/2)+1), so a PUT may respond successfully after writing to {A,B,C} ({D,E} will eventually be replicated to). Immediately after, a read quorum may GET values from {C,D,E}. Even if D and E have older values, you have pulled a value from node C, meaning you will receive the most recent value.

What's important is that your reads and writes overlap. As long as r+w > n, in the absence of sloppy quorum (below), you'll be able to get the newest values. In other words, you'll have a reasonable level of consistency.

A quorum is an excellent default, since you're reading and writing from a balance of nodes. But if you have specific requirements, like a log that is often written to, but rarely read, you might find it make more sense to wait for a successful write from a single node, but read from all of them. This affords you an overlap

curl -i -XPUT "$RIAK/types/default/buckets/logs/props" \
  -H "Content-Type: application/json" \
  -d '{"props":{"w":"one","r":"all"}}'
  • all - All replicas must reply, which is the same as setting r or w equal to n_val
  • one - Setting r or w equal to 1
  • quorum - A majority of the replicas must respond, that is, “half plus one”.

Sloppy Quorum

In a perfect world, a strict quorum would be sufficient for most write requests. However, at any moment a node could go down, or the network could partition, or squirrels get caught in the tubes, triggering the unavailability of a required nodes. This is known as a strict quorum. Riak defaults to what's known as a sloppy quorum, meaning that if any primary (expected) node is unavailable, the next available node in the ring will accept requests.

Think about it like this. Say you're out drinking with your friend. You order 2 drinks (W=2), but before they arrive, she leaves temporarily. If you were a strict quorum, you could merely refuse both drinks, since the required people (N=2) are unavailable. But you'd rather be a sloppy drunk... erm, I mean sloppy quorum. Rather than deny the drink, you take both, one accepted on her behalf (you also get to pay).

A Sloppy Quorum

When she returns, you slide her drink over. This is known as hinted handoff, which we'll look at again in the next chapter. For now it's sufficient to note that there's a difference between the default sloppy quorum (W), and requiring a strict quorum of primary nodes (PW).

More than R's and W's

Some other values you may have noticed in the bucket's props object are pw, pr, and dw.

pr and pw ensure that many primary nodes are available before a read or write. Riak will read or write from backup nodes if one is unavailable, because of network partition or some other server outage. This p prefix will ensure that only the primary nodes are used, primary meaning the vnode which matches the bucket plus N successive vnodes.

(We mentioned above that r+w > n provides a reasonable level of consistency, violated when sloppy quorums are involved. pr+pw > n allows for a much stronger assertion of consistency, although there are always scenarios involving conflicting writes or significant disk failures where that too may not be enough.)

Finally dw represents the minimal durable writes necessary for success. For a normal w write to count a write as successful, a vnode need only promise a write has started, with no guarantee that write has been written to disk, aka, is durable. The dw setting means the backend service (for example Bitcask) has agreed to write the value. Although a high dw value is slower than a high w value, there are cases where this extra enforcement is good to have, such as dealing with financial data.

Per Request

It's worth noting that these values (except for n_val) can be overridden per request.

Consider a scenario in which you have data that you find very important (say, credit card checkout), and want to help ensure it will be written to every relevant node's disk before success. You could add ?dw=all to the end of your write.

curl -i -XPUT "$RIAK/types/default/buckets/cart/keys/cart1?dw=all" \
  -H "Content-Type: application/json" \
  -d '{"paid":true}'

If any of the nodes currently responsible for the data cannot complete the request (i.e., hand off the data to the storage backend), the client will receive a failure message. This doesn't mean that the write failed, necessarily: if two of three primary vnodes successfully wrote the value, it should be available for future requests. Thus trading availability for consistency by forcing a high dw or pw value can result in unexpected behavior.

Hooks

Another utility of buckets are their ability to enforce behaviors on writes by way of hooks. You can attach functions to run either before, or after, a value is committed to a bucket.

Precommit hooks are functions that run before a write is called. A precommit hook has the ability to cancel a write altogether if the incoming data is considered bad in some way. A simple precommit hook is to check if a value exists at all.

I put my custom Erlang code files under the riak installation ./custom/my_validators.erl.

-module(my_validators).
-export([value_exists/1]).

%% Object size must be greater than 0 bytes
value_exists(RiakObject) ->
  Value = riak_object:get_value(RiakObject),
  case erlang:byte_size(Value) of
    0 -> {fail, "A value sized greater than 0 is required"};
    _ -> RiakObject
  end.

Then compile the file.

erlc my_validators.erl

Install the file by informing the Riak installation of your new code with an advanced.config file that lives alongside riak.conf in each node, then rolling restart each node.

{riak_kv,
  {add_paths, ["./custom"]}
}

Then you need to do set the Erlang module (my_validators) and function (value_exists) as a JSON value to the bucket's precommit array {"mod":"my_validators","fun":"value_exists"}.

curl -i -XPUT "$RIAK/types/default/buckets/cart/props" \
  -H "Content-Type:application/json" \
  -d '{"props":{"precommit":[{"mod":"my_validators","fun":"value_exists"}]}}'

If you try and post to the cart bucket without a value, you should expect a failure.

curl -XPOST "$RIAK/types/default/buckets/cart/keys" \
  -H "Content-Type:application/json"
A value sized greater than 0 is required

You can also write precommit functions in JavaScript, though Erlang code will execute faster.

Post-commits are similar in form and function, albeit executed after the write has been performed. Key differences:

  • The only language supported is Erlang.
  • The function's return value is ignored, thus it cannot cause a failure message to be sent to the client.