Taste of Riak

Secondary Indexes

If you're coming from an SQL world, Secondary Indexes (2i) are a lot like SQL indexes. They are a way to quickly look up objects based on a secondary key, without scanning through the whole dataset. This makes it very easy to find groups of related data by values, or even ranges of values. To properly show this off, we will now add some more data to our application, and add some secondary index entries at the same time.

Let's define some utility functions:

;; Format an Erlang datetime tuple as zero-padded "YYYYMMDDhhmmss" text,
;; so that index values sort chronologically.
(defun format-date
  ((`#(#(,year ,month ,day) #(,hour ,min ,sec)))
    (io_lib:format "~B~2..0B~2..0B~2..0B~2..0B~2..0B"
                   `(,year ,month ,day ,hour ,min ,sec))))

;; Convert an integer key such as 1 into the binary key #B(49).
(defun int->bin (integer)
  (list_to_binary
    (integer_to_list integer)))

;; Fetch an order, attach both secondary indexes to its metadata,
;; and store it again.
(defun add-indices (order-key)
  (let* ((`#(ok ,order) (lric:get client-pid order-bucket (int->bin order-key)))
         (order-data (binary_to_term (lrico:get-value order)))
         (order-meta (lrico:get-update-metadata order))
         ;; Binary index on the formatted creation date.
         (index-1 `#(#(binary_index "created-date")
                     (,(format-date (order-created-date order-data)))))
         (md-1 (lrico:set-secondary-index order-meta `(,index-1)))
         ;; Integer index on the salesperson's id.
         (index-2 `#(#(integer_index "salesperson-id")
                     (,(order-salesperson-id order-data))))
         (md-2 (lrico:set-secondary-index md-1 `(,index-2)))
         (order (lrico:update-metadata order md-2)))
    (lric:put client-pid order)))
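
As a quick sanity check, here is a sketch of what format-date produces. The datetime tuple below is made up for illustration and is not taken from the order data. io_lib:format returns a character list that may be nested, so lists:flatten is used here only to make the result readable in the REPL:

```lfe
;; Hypothetical datetime tuple, for illustration only.
> (lists:flatten (format-date #(#(2014 12 1) #(9 5 0))))
"20141201090500"
```

Because every field after the year is zero-padded to a fixed width, these strings sort in chronological order, which is what makes them usable in range queries.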

Let's update those orders:

> (lists:foreach #'add-indices/1 '(1 2 3))

As you may have noticed, ordinary Key/Value data is opaque to 2i, so we have to add entries to the indices at the application level. Now let's find all of Jane Appleseed's processed orders. We'll look up the orders by searching the "salesperson-id" integer index for Jane's id of 9000.

> (lric:get-index-eq
    client-pid
    order-bucket
    '#(integer_index "salesperson-id")
    9000)

Which returns:

#(ok #(index_results_v1 (#B(49) #B(51)) undefined undefined))

where #B(49) and #B(51) represent orders 1 and 3, respectively, as can be verified by the following:

> (binary "1")
#B(49)
> (binary "3")
#B(51)
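
If you would rather work with integer order ids than raw binaries, a small helper can unpack the result. This is a minimal sketch: result-keys is a hypothetical name, not part of lric, and the tuple shape it matches is assumed from the output shown above.

```lfe
;; Hypothetical helper (not part of lric): pull the keys out of a
;; successful index query result and convert each one to an integer.
(defun result-keys
  ((`#(ok #(index_results_v1 ,keys ,_ ,_)))
   (lists:map (lambda (key) (erlang:binary_to_integer key)) keys)))
```

Applied to the result above, it gives back the order ids directly:

```lfe
> (result-keys #(ok #(index_results_v1 (#B(49) #B(51)) undefined undefined)))
(1 3)
```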

Jane processed orders 1 and 3. We used an “integer” index to reference Jane's id. Next let's use a “binary” index. Let's say that the VP of Sales wants to know how many orders came in between the start of December 2014 and the start of February 2015. In this case, we can exploit 2i's range queries. Let's search the "created-date" binary index for entries between 20141201 and 20150201.

> (lric:get-index-range
    client-pid
    order-bucket
    '#(binary_index "created-date")
    (binary "20141201")
    (binary "20150201"))
#(ok #(index_results_v1 (#B(49) #B(50)) undefined undefined))

That gives us orders 1 and 2. Note that by extending the range one more day, we would get all orders:

> (lric:get-index-range
    client-pid
    order-bucket
    '#(binary_index "created-date")
    (binary "20141201")
    (binary "20150202"))
#(ok #(index_results_v1 (#B(51) #B(50) #B(49)) undefined undefined))

Boom! Easy-peasy. We used 2i's range feature to search for a range of values, and demonstrated binary indexes.
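
Why does one extra day matter? Our index values carry a time component after the date, and binary values are ordered byte by byte. The sketch below uses plain Erlang term comparison as a stand-in for that ordering, with a made-up index value for an order created on 1 February 2015. The actual creation time of order 3 is not shown here, so treat that value as an assumption.

```lfe
;; "20150201093000" is a hypothetical index value (2015-02-01 09:30:00).
;; It is longer than the bare date "20150201", so it sorts after it and
;; falls outside a range that ends at "20150201".
> (=< (binary "20150201093000") (binary "20150201"))
false
;; Ending the range at "20150202" brings it back in.
> (=< (binary "20150201093000") (binary "20150202"))
true
```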

So, to recap:

  • You can use Secondary Indexes to quickly look up an object based on a secondary id other than the object's key.
  • Indices can have either Integer or Binary (String) keys.
  • You can search for specific values, or a range of values.
  • Riak will return a list of keys that match the index query.