5 Strategies for Random Records from DB

Benchmarking different approaches to fetch random database records


TLDR: I’m now using Strategy #5

If you’ve ever needed to fetch a random record from a database, whether in Rails or another framework/language, you may have found that there are several strategies to choose from.

Why Get Random Records?

Before we dive in, let’s consider why you’d need random records at all. Random behavior is rarely desired in applications since it leads to unpredictable results. But sometimes business requirements demand it.

Common use cases include displaying random content (like a “random image” feature or “random author” on a book tracking site), or testing/debugging to catch edge cases and unexpected behavior.


Strategy #1 - Use RANDOM()

Use the database’s built-in random function (RANDOM() for PostgreSQL/SQLite, RAND() for MySQL/MariaDB). This is the most robust approach, but slow as tables grow—around 260ms on my 1M-record test table.

Pros:

- Truly random and robust: works with any ID type and tolerates gaps from deleted records

Cons:

- Slow on large tables, since the database has to order every row (~260ms on 1M records)

Author.order('RANDOM()').limit(1)

Strategy #2 - Pluck & Array.sample

Pluck all IDs from the database and use Ruby’s Array#sample to pick one at random. Faster than Strategy #1 (~50ms on 1M records), but watch out for memory consumption as tables grow.

Pros:

- Faster than ordering by `RANDOM()` (~50ms on 1M records)
- Unaffected by ID gaps from deleted records

Cons:

- Loads every ID into memory, so memory consumption grows with the table

Author.find(Author.pluck(:id).sample)

Strategy #3 - find_by with Min/Max ID

A third strategy is to generate a random ID between 1 and the record count, then use find_by to look up the record. This strategy is extremely fast, taking around 17ms on a table with 1,000,000 records, but it can be brittle if there are deleted records.

Pros:

- Extremely fast (~17ms on 1M records): a single indexed primary-key lookup

Cons:

- Brittle: assumes sequential integer IDs starting at 1, and a randomly generated ID that hit a deleted record returns nil

Author.find_by(id: rand(1..Author.count))

Strategy #4 - Random Offset

Use a random offset to look up a record. Not as fast (~120ms on 1M records), but works with any ID type and needs no error handling.

Pros:

- Works with any ID type, with no gaps problem and no error handling needed

Cons:

- Slower than ID-based lookups (~120ms on 1M records), since OFFSET still scans past the skipped rows

# rand(Author.count) returns 0..(count - 1), all valid offsets
Author.offset(rand(Author.count)).limit(1)

Strategy #5 - Where with Min/Max

My preferred approach: generate a random ID between min and max values, then use a where lookup. Extremely fast (1-6ms on 1M records) and handles deleted records gracefully.

Pros:

- Extremely fast (1-6ms on 1M records)
- Tolerates gaps from deleted records

Cons:

- Assumes integer IDs
- Not perfectly uniform: records immediately after a gap are more likely to be picked

# `random_id` can be memoized / cached, but beware(!):
# a value cached from the unscoped table can produce
# empty results when used inside a scope
random_id = rand(Author.minimum(:id)..Author.maximum(:id))
Author.where("id >= ?", random_id).limit(1)
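To see why the `id >= random_id` lookup tolerates deleted records, here is a plain-Ruby simulation over a hypothetical gapped ID set (the `ids` array is illustrative, not real data):

```ruby
# IDs remaining after some records were deleted (illustrative data)
ids = [1, 2, 5, 9, 14]

# Strategy #5: pick a random ID anywhere in the min..max range...
random_id = rand(ids.min..ids.max)

# ...then take the first surviving ID at or above it,
# analogous to WHERE id >= ? ORDER BY id LIMIT 1
picked = ids.select { |id| id >= random_id }.min

# `picked` always exists, because random_id never exceeds ids.max
```

Note the trade-off: an ID that lands in a gap resolves to the next surviving record, so records right after a large gap are picked more often than the rest.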

I used to prefer Strategy #3 (even with occasional deleted records, it stayed fast). Strategy #4 also worked well. However, while answering a Stack Overflow question that led to this blog post, I discovered Strategy #5, which is now my preferred option.


Tips and Caveats

A couple of tips to keep in mind when implementing these strategies:

Caching Author.count:

If the exact current count isn’t critical, preload Author.count and memoize it (e.g., as Author.total_count), or cache it in an app-load config. This eliminates the count query overhead and speeds up Strategy #3 to under 5ms and Strategy #4 to about 100ms. Here’s an example of how to do this:

# Memoized at the class level; the value goes stale until the process restarts
def self.total_count
  @total_count ||= count
end

Be Careful with Optimizations:

However, I would warn against going overboard with optimizations here, such as caching min/max values: depending on the context, scoping, etc., it may lead to unexpected results. If you use Strategy #5, the bottleneck won’t be in the database, so making three quick queries is preferable, unless it really, really matters for your WebScale™ app.
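As a concrete (hypothetical) illustration of the scoping pitfall, here is a plain-Ruby sketch: a `random_id` cached against the full table can exceed every ID in a narrow scope, so the scoped lookup comes back empty. The data and the cached value are assumptions for illustration:

```ruby
# IDs matching some scope, e.g. Author.where(country: "NL") -- illustrative
scoped_ids = [10, 42, 99]

# A random_id cached earlier against the full 1M-row, unscoped table
random_id = 500_000

# The in-memory equivalent of: scope.where("id >= ?", random_id).limit(1)
hits = scoped_ids.select { |id| id >= random_id }

# hits is empty: every scoped ID falls below the cached random_id
```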

Error Handling:

For error handling, you have two options: retry lookups until you find an existing record, or perform multiple Strategy #3 lookups and randomly select from the non-nil results. Depending on how many gaps you have in your data set, Strategy #3 may not be the best choice.
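A minimal sketch of the retry option, simulated over a plain-Ruby hash standing in for the table (`find_random_with_retry` and the sample data are assumptions for illustration):

```ruby
# Records keyed by ID; IDs 3, 4, and 6-8 were deleted (illustrative data)
records = { 1 => "Alice", 2 => "Bob", 5 => "Carol", 9 => "Dave" }

# Retry random ID lookups until one hits an existing record,
# giving up after max_attempts to avoid an unbounded loop
def find_random_with_retry(records, max_attempts: 20)
  min_id, max_id = records.keys.minmax
  max_attempts.times do
    candidate = rand(min_id..max_id)
    return records[candidate] if records.key?(candidate)
  end
  nil # every attempt landed in a gap
end

author = find_random_with_retry(records)
```

The denser your ID range, the fewer attempts this needs; with very sparse data, the retry count climbs and Strategy #1 or #4 becomes more attractive.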


Your Thoughts?

Did I miss any approaches, or make a glaring blunder? Please, comment and let me know :)
