Rust and ScyllaDB: Prepared Statements, Paging and Retries

9 Min to complete

A previous lesson, Rust and ScyllaDB: Getting Started, explained how to use the Sylla Rust driver to create applications that interact with ScyllaDB. This lesson covers more advanced topics such as prepared statements, paging, and retries. These topics will be explained and demonstrated using the ScyllaDB Rust driver.

Starting ScyllaDB in Docker

Before starting the cluster, make sure the aio-max-nr value is high enough (1048576 or more).

This parameter determines the maximum number of allowable Asynchronous non-blocking I/O (AIO) concurrent requests by the Linux Kernel, and it helps ScyllaDB perform in a heavy I/O workload environment.

Check the value:

cat /proc/sys/fs/aio-max-nr

If it needs to be changed:

echo "fs.aio-max-nr = 1048576" >> /etc/sysctl.conf

sysctl -p /etc/sysctl.conf

If you haven’t done so yet, download the example from git:

git clone https://github.com/scylladb/scylla-code-samples.git

cd scylla-code-samples/Rust_ScyllaDB_Driver/chat/

To quickly get ScyllaDB up and running, use the official Docker image:

docker run \
  -p 9042:9042/tcp \
  --name some-scylla \
  --hostname rust-scylla \
  -d scylladb/scylla:6.2.0 \
   --smp 1 --memory=750M --overprovisioned 1

Example Application

In this example, you’ll create a console application that reads messages from standard input and puts them into a table in ScyllaDB.

Create the keyspace and the table:


docker exec -it some-scylla cqlsh


CREATE KEYSPACE IF NOT EXISTS log WITH REPLICATION = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};


CREATE TABLE IF NOT EXISTS log.messages (
  id bigint,
  message text,
  PRIMARY KEY (id)
);

Now, look at the main code of the application:

The application connects to the database, reads some lines from the console, and stores them in the table log.messages. It then reads those lines from the table and prints them.

So far, this is quite similar to what you saw in the previous lesson. Using this application you’ll see how some minor changes can improve the performance of the application.

Prepared statements

In every iteration of the while loop, we want to insert new data into the log.messages table. Doing so naively is inefficient as every call to session.query_unpaged would send the entire query string to the database, which then parses it. To avoid unnecessary database-side calculations, one can prepare a query in advance using the session.prepare method. A call to this method will return a PreparedStatement object, which can be used later with session.execute_iter() / session.execute_single_page() / session.execute_unpaged() to execute the desired request.

What Exactly are Prepared Statements?

A prepared statement is a query parsed by ScyllaDB and then saved for later use. One of the valuable benefits of using prepared statements is that you can continue to reuse the same query while modifying variables in the query to match parameters such as names, addresses, and locations.

When asked to prepare a CQL statement, the client library will send a CQL statement to ScyllaDB. ScyllaDB will then create a unique fingerprint for that CQL statement by MD5 hashing it. ScyllaDB then uses this hash to check its query cache and see if it has already seen it. If it has, it will send back a reference to that cached CQL statement. If ScyllaDB does not have that unique query hash in its cache, it will then proceed to parse the query and insert the parsed output into its cache.

The client will then be able to send and execute a request specifying the statement id (which is encapsulated in the PreparedStatement object) and providing the (bound) variables, as you will see next.

Using Prepared Statements In the Application

Go over the sample code above and modify it to use prepared statements.

The first step is to create a prepared statement (with the help of session.prepare) before the while loop. Next, you need to replace session.query_unpaged with session.execute_unpaged inside the while loop.

After these two steps, the app will reuse the prepared statement insert_message instead of sending raw queries. This significantly improves performance.

Paging

Look at the last lines of the application:

There is a call to the Session::query_unpaged method, and an unprepared select query is sent. Since this query is only executed once, it isn’t worth preparing. However, if we suspect that the result will be large, it might be better to use paging.

What is Paging?

Paging is a way to return a lot of data in manageable chunks.

Without paging, the coordinator node prepares a single result entity that holds all the data and returns it. In the case of a large result, this may have a significant performance impact as it might use up a lot of memory, both on the client and the ScyllaDB side.

To avoid this, when using paging, the results are transmitted in chunks of limited size, one chunk at a time. After transmitting each chunk, the database stops and waits for the client to request the next one. This is repeated until the entire result set is transmitted. This is called paging.

The client can limit the size of the pages according to the number of rows it can contain. If a page reaches the size limit before it reaches the client-provided row limit, it’s called a short page or short read.

Adding Paging to Our App

As you may have guessed by now, Session::query_unpaged does not use paging. It fetches the whole result into memory in one go. An alternative Session method uses paging under the hood – Session::query_iter (Session::execute_iter is another alternative that works with prepared statements). The Session::query_iter method takes a query and a value list as arguments and returns an async iterator (stream) over the result Rows. This is how it is used:

After the query_iter invocation, the driver starts a background task that fetches subsequent rows. The caller task (the one that invoked query_iter) consumes newly fetched rows by using an iterator-like stream interface. The caller and the background task run concurrently, so one of them can fetch new rows while the other consumes them.

By adding paging to the app, you reduce memory usage and increase the application’s performance.

It’s worth noting that you don’t need to rely on the transparent paging provided by the query_iter/execute_iter methods – you can use paging manually with query_single_page / execute_single_page. You can find more information about this in documentation.

Retries

After a query fails, the driver might decide to retry it based on the retry policy and on the query itself. The retry policy can be configured for the whole Session or just for a single query.

Provided Retry Policies

The driver offers two policies to choose from:

Fallthrough Retry Policy – never retries and returns all errors straight to the user.
Default Retry Policy – used by default, might retry if there is a high chance of success.

It is possible to provide a custom retry policy by implementing RetryPolicy and RetrySesssion.

Using Retry Policies

The key to enjoying the benefits of retry policies is to provide more information about query idempotency. A query is idempotent if it can be applied multiple times without changing the result of the initial application. The driver will not retry a failed query if it is not idempotent. The user is expected to mark queries as idempotent, as the driver does not parse query strings.

Mark the app’s select statement as an idempotent one:

By making this change, you will be able to use retries (provided by the default retry policy) in case of a select statement execution error.

Running the Application

Run the application.

Exit the CQL Shell if you are still in it:

exit

And run the app, providing an input line.

echo 'this is a message' | cargo run

The application reads all the lines from stdin, writes them into the database, and prints them back:

If you rerun it, you will see that there are two lines stored now.

Summary

In this lesson, you saw an example chat application that uses ScyllaDB and the Rust driver. In the application, you saw how prepared statements, paging, and retries could be used to improve the app.

The blog post Stop Wasting ScyllaDB’s CPU Time by Not Being Prepared goes into more detail about using prepared statements and the performance impact of using them. Other valuable resources on this topic are the ScyllaDB Specific Drivers Overview lesson on ScyllaDB University and the Rust Driver documentation page about prepared statements.

You can read more about paging on the Paged Queries documentation page. This topic is also covered in the Paging and Shard Awareness lesson at ScyllaDB University.

The Rust Driver Retry Policy Configuration documentation page has more info on this topic.

Previous Lesson

Back to Course

Next Lesson