It can be hard to think outside the table, but once you do, you may find interesting patterns to use in any database, even a relational one.sql-databases
sql-databases. Feel free to use a relational database when you're ↩
willing to sacrifice the scalability, performance, and availability of Riak...but why would you?
If you thoroughly absorbed the earlier content, some of this may feel redundant, but the implications of the key/value model are not always obvious.
As with most such lists, these are guidelines rather than hard rules, but take them seriously.
(@keys) Know your keys.
The cardinal rule of any key/value datastore: the fastest way to get data is to know what to look for, which means knowing which key you want. How do you pull that off? Well, that's the trick, isn't it? The best way to always know the key you want is to be able to programmatically reproduce it based on information you already have. Need to know the sales data for one of your client's magazines in December 2013? Store it in a **sales** bucket and name the key after the client, magazine, and month/year combo. Guess what? Retrieving it will be much faster than running a SQL `SELECT *` statement in a relational database. And if it turns out that the magazine didn't exist yet, and there are no sales figures for that month? No problem. A negative response, especially for immutable data, is among the fastest operations Riak offers. Because keys are only unique within a bucket, the same unique identifier can be used in different buckets to represent different information about the same entity (e.g., a customer address might be in an `address` bucket with the customer id as its key, whereas the customer id as a key in a `contacts` bucket would presumably contain contact information).
(@namespace) Know your namespaces.
Riak has several levels of namespaces when storing data. Historically, buckets have been what most thought of as Riak's virtual namespaces. The newest level is provided by **bucket types**, introduced in Riak 2.0, which allow you to group buckets for configuration and security purposes. Less obviously, keys are their own namespaces. If you want a hierarchy for your keys that looks like `sales/customer/month`, you don't need nested buckets: you just need to name your keys appropriately, as discussed in (@keys). `sales` can be your bucket, while each key is prepended with customer name and month.
(@views) Know your queries.
Writing data is cheap. Disk space is cheap. Dynamic queries in Riak are very, very expensive. As your data flows into the system, generate the views you're going to want later. That magazine sales example from (@keys)? The December sales numbers are almost certainly aggregates of smaller values, but if you know in advance that monthly sales numbers are going to be requested frequently, when the last data arrives for that month the application can assemble the full month's statistics for later retrieval. Yes, getting accurate business requirements is non-trivial, but many Riak applications are version 2 or 3 of a system, written once the business discovered that the scalability of MySQL, Postgres, or MongoDB simply wasn't up to the job of handling their growth.
(@small) Take small bites.
Remember your parents' advice over dinner? They were right. When creating objects that will be updated, constrain their scope and keep the number of contained elements low to reduce the odds of multiple clients attempting to update the data concurrently.
(@indexes) Create your own indexes.
Riak offers metadata-driven secondary indexes (2i) and full-text indexes (Riak Search) for values, but these face scaling challenges: in order to identify all objects for a given index value, roughly a third of the cluster must be involved. For many use cases, creating your own indexes is straightforward and much faster/more scalable, since you'll be managing and retrieving a single object. See [Conflict Resolution](#conflict-resolution) for more discussion of this.
(@immutable) Embrace immutability.
As we discussed in [Mutability], immutable data offers a way out of some of the challenges of running a high-volume, high-velocity datastore. If possible, segregate mutable from non-mutable data, ideally using different buckets for [request tuning][Request tuning]. [Datomic](http://www.datomic.com) is a unique data storage system that leverages immutability for all data, with Riak commonly used as a backend datastore. It treats any data item in its system as a "fact," to be potentially superseded by later facts but never updated.
(@hybrid) Don't fear hybrid solutions.
As much as we would all love to have a database that is an excellent solution for any problem space, we're a long way from that goal. In the meantime, it's a perfectly reasonable (and very common) approach to mix and match databases for different needs. Riak is very fast and scalable for retrieving keys, but it's decidedly suboptimal at ad hoc queries. If you can't model your way out of that problem, don't be afraid to store keys alongside searchable metadata in a relational or other database that makes querying simpler, and once you have the keys you need, grab the values from Riak. Just make sure that you consider failure scenarios when doing so; it would be unfortunate to compromise Riak's availability by rendering it useless when your other database is offline.