Although we've been using bucket types and buckets as namespaces up to now, they are capable of more.
Different use-cases will dictate whether a bucket is heavily written to, or largely read from. You may use one bucket to store logs, one bucket could store session data, while another may store shopping cart data. Sometimes low latency is important, while other times it's high durability. And sometimes we just want buckets to react differently when a write occurs.
The basis of Riak's availability and fault tolerance is that it can read from, or write to, multiple nodes. Riak allows you to adjust these N/R/W values (which we covered under Concepts) on a per-bucket basis.
N is the number of total nodes that a value should be replicated to, defaulting to 3. But we can set this n_val
to less than the total number of nodes.
Any bucket property, including n_val
, can be set by sending a props
value as a JSON object to the bucket URL. Let's set the n_val
to 5 nodes, meaning that objects written to cart
will be replicated to 5 nodes.
curl -i -XPUT "$RIAK/types/default/buckets/cart/props" \
-H "Content-Type: application/json" \
-d '{"props":{"n_val":5}}'
You can take a peek at the bucket's properties by issuing a GET to the bucket.
Note: Riak returns unformatted JSON. If you have a command-line tool like jsonpp (or json_pp) installed, you can pipe the output there for easier reading. The results below are a subset of all the props
values.
curl "$RIAK/types/default/buckets/cart/props" | jsonpp
{
  "props": {
    ...
    "dw": "quorum",
    "n_val": 5,
    "name": "cart",
    "postcommit": [],
    "pr": 0,
    "precommit": [],
    "pw": 0,
    "r": "quorum",
    "rw": "quorum",
    "w": "quorum",
    ...
  }
}
As you can see, n_val
is 5. That's expected. But you may also have noticed that the cart props
returned both r
and w
as quorum
, rather than a number. So what is a quorum?
A quorum is one more than half of the total number of replicas (floor(N/2) + 1). This figure is important, since if more than half of all replicas are written to, and more than half are read from, then the two sets must share at least one replica, so you will get the most recent value (under normal circumstances).
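As a quick sanity check of the formula, here is a minimal sketch in plain shell arithmetic. The `quorum` helper is a hypothetical name used only for illustration; it is not part of Riak or its tooling.

```shell
# quorum N: print floor(N/2) + 1 for a positive replica count N.
# Shell integer division truncates, which equals floor for positive numbers.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

quorum 3   # prints 2
quorum 5   # prints 3
```

Note that an even n_val still needs a strict majority: with 4 replicas the quorum is 3, not 2.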
Here's an example with the above n_val
of 5 ({A,B,C,D,E}). Your w
is a quorum (which is 3
, or floor(5/2)+1
), so a PUT may respond successfully after writing to {A,B,C} ({D,E} will eventually be replicated to). Immediately after, a read quorum may GET values from {C,D,E}. Even if D and E have older values, you have pulled a value from node C, meaning you will receive the most recent value.
What's important is that your reads and writes overlap. As long as r+w > n
, in the absence of sloppy quorum (below), you'll be able to get the newest values. In other words, you'll have a reasonable level of consistency.
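The overlap rule is simple enough to check by hand, but a small sketch makes the arithmetic concrete. The `overlaps` function below is a hypothetical helper, not a Riak command; it only compares numbers and knows nothing about sloppy quorums.

```shell
# overlaps R W N: print "yes" when R + W > N, meaning every read set must
# share at least one replica with every write set; print "no" otherwise.
overlaps() {
  if [ $(( $1 + $2 )) -gt "$3" ]; then
    echo yes
  else
    echo no
  fi
}

overlaps 3 3 5   # quorum reads and writes with n_val 5: prints yes
overlaps 1 1 3   # single-node reads and writes with n_val 3: prints no
```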
A quorum is an excellent default, since you're reading from and writing to a balance of nodes. But if you have specific requirements, like a log that is often written to but rarely read, you might find it makes more sense to wait for a successful write from a single node, but read from all of them. This still affords you an overlap, since r + w is then one more than n_val.
curl -i -XPUT "$RIAK/types/default/buckets/logs/props" \
-H "Content-Type: application/json" \
-d '{"props":{"w":"one","r":"all"}}'
all - All replicas must reply, which is the same as setting r or w equal to n_val
one - Setting r or w equal to 1
quorum - A majority of the replicas must respond, that is, “half plus one”.
In a perfect world, a strict quorum would be sufficient for most write requests. However, at any moment a node could go down, or the network could partition, or squirrels get caught in the tubes, making a required node unavailable. Insisting on responses from exactly those required nodes is known as a strict quorum. Riak defaults to what's known as a sloppy quorum, meaning that if any primary (expected) node is unavailable, the next available node in the ring will accept requests.
Think about it like this. Say you're out drinking with your friend. You order 2 drinks (W=2), but before they arrive, she leaves temporarily. If you were a strict quorum, you could merely refuse both drinks, since the required people (N=2) are unavailable. But you'd rather be a sloppy drunk... erm, I mean sloppy quorum. Rather than deny the drink, you take both, one accepted on her behalf (you also get to pay).
When she returns, you slide her drink over. This is known as hinted handoff, which we'll look at again in the next chapter. For now it's sufficient to note that there's a difference between the default sloppy quorum (W), and requiring a strict quorum of primary nodes (PW).
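To make the difference between W and PW concrete, here is a toy model in plain shell. It is only a sketch under loud assumptions: the ring is flattened to a space-separated list of node names, and `pick_nodes` and `primaries_up` are hypothetical helpers. Real Riak works with vnodes and preference lists, not with anything this simple.

```shell
# pick_nodes RING N DOWN: print the N nodes that would accept a write under
# a sloppy quorum. Unavailable nodes are skipped and the next node in the
# ring stands in for them.
pick_nodes() {
  ring=$1; want=$2; down=$3; picked=""; count=0
  for node in $ring; do
    [ "$count" -ge "$want" ] && break
    case " $down " in
      *" $node "*) continue ;;   # node is down, try the next one
    esac
    picked="${picked:+$picked }$node"
    count=$((count + 1))
  done
  echo "$picked"
}

# primaries_up RING N DOWN: count how many of the first N (primary) nodes
# are available. A pw or pr setting is compared against this number.
primaries_up() {
  ring=$1; n=$2; down=$3; seen=0; up=0
  for node in $ring; do
    [ "$seen" -ge "$n" ] && break
    seen=$((seen + 1))
    case " $down " in
      *" $node "*) ;;            # primary is down, do not count it
      *) up=$((up + 1)) ;;
    esac
  done
  echo "$up"
}

pick_nodes "A B C D E" 3 "B"     # prints: A C D
primaries_up "A B C D E" 3 "B"   # prints: 2
```

In this model, a write with w=3 still succeeds while B is down because D steps in, whereas a write with pw=3 would fail because only two primaries are reachable.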
Some other values you may have noticed in the bucket's props
object are pw
, pr
, and dw
.
pr
and pw
ensure that many primary nodes are available before a read or write. Riak will read or write from backup nodes if one is unavailable, because of network partition or some other server outage. This p
prefix will ensure that only the primary nodes are used, primary meaning the vnode which matches the bucket plus N successive vnodes.
(We mentioned above that r+w > n
provides a reasonable level of consistency, violated when sloppy quorums are involved. pr+pw > n
allows for a much stronger assertion of consistency, although there are always scenarios involving conflicting writes or significant disk failures where that too may not be enough.)
Finally dw
represents the minimal durable writes necessary for success. For a normal w
write to count a write as successful, a vnode need only promise a write has started, with no guarantee that write has been written to disk, aka, is durable. The dw
setting means the backend service (for example Bitcask) has agreed to write the value. Although a high dw
value is slower than a high w
value, there are cases where this extra enforcement is good to have, such as dealing with financial data.
It's worth noting that these values (except for n_val
) can be overridden per request.
Consider a scenario in which you have data that you find very important (say, credit card checkout), and want to help ensure it will be written to every relevant node's disk before success. You could add ?dw=all
to the end of your write.
curl -i -XPUT "$RIAK/types/default/buckets/cart/keys/cart1?dw=all" \
-H "Content-Type: application/json" \
-d '{"paid":true}'
If any of the nodes currently responsible for the data cannot complete the request (i.e., hand off the data to the storage backend), the client will receive a failure message. This doesn't mean that the write failed, necessarily: if two of three primary vnodes successfully wrote the value, it should be available for future requests. Thus trading availability for consistency by forcing a high dw
or pw
value can result in unexpected behavior.
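Reads can be tuned per request in the same way, by appending parameters such as r or pr to the key's URL. The sketch below only builds the URL so it can be inspected without a running cluster; `key_url` is a hypothetical helper, and the fallback address for `$RIAK` is an assumption that should be replaced with your own node's HTTP address.

```shell
# Assumption: $RIAK points at a node's HTTP interface, as in the examples
# above. The fallback value here is only a placeholder.
RIAK="${RIAK:-http://localhost:8098}"

# key_url BUCKET KEY [PARAM=VALUE ...]: print the object URL in the default
# bucket type, with any per-request overrides appended as a query string.
key_url() {
  url="$RIAK/types/default/buckets/$1/keys/$2"
  shift 2
  sep="?"
  for param in "$@"; do
    url="$url$sep$param"
    sep="&"
  done
  echo "$url"
}

# Print the URL for a read that waits for a single replica.
key_url cart cart1 r=1

# To issue the read against a running node:
#   curl -i "$(key_url cart cart1 r=1)"
```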
Another utility of buckets is their ability to enforce behaviors on writes by way of hooks. You can attach functions to run either before, or after, a value is committed to a bucket.
Precommit hooks are functions that run before a write is called. A precommit hook has the ability to cancel a write altogether if the incoming data is considered bad in some way. A simple precommit hook is to check if a value exists at all.
I put my custom Erlang code files under the Riak installation, at ./custom/my_validators.erl.
-module(my_validators).
-export([value_exists/1]).

%% Object size must be greater than 0 bytes
value_exists(RiakObject) ->
    Value = riak_object:get_value(RiakObject),
    case erlang:byte_size(Value) of
        0 -> {fail, "A value sized greater than 0 is required"};
        _ -> RiakObject
    end.
Then compile the file.
erlc my_validators.erl
Install the file by informing the Riak installation of your new code with an advanced.config
file that lives alongside riak.conf
in each node, then rolling restart each node.
[
  {riak_kv, [
    {add_paths, ["./custom"]}
  ]}
].
Then you need to set the Erlang module (my_validators) and function (value_exists) as a JSON value in the bucket's precommit array: {"mod":"my_validators","fun":"value_exists"}.
curl -i -XPUT "$RIAK/types/default/buckets/cart/props" \
-H "Content-Type:application/json" \
-d '{"props":{"precommit":[{"mod":"my_validators","fun":"value_exists"}]}}'
If you try and post to the cart
bucket without a value, you should expect a failure.
curl -XPOST "$RIAK/types/default/buckets/cart/keys" \
-H "Content-Type:application/json"
A value sized greater than 0 is required
You can also write precommit functions in JavaScript, though Erlang code will execute faster.
Post-commits are similar in form and function, albeit executed after the write has been performed. Key differences: