Rust and ScyllaDB: Getting Started

16 Min to complete

ScyllaDB and Rust

In this lesson, you’ll build a simple Rust application that will connect to a ScyllaDB cluster and perform basic queries. For establishing communication between the application and the ScyllaDB server, you will use the scylla-rust-driver, which is an open-source ScyllaDB driver for Rust.

The ScyllaDB Rust Driver is a client-side driver for ScyllaDB written in pure Rust with a fully async API using Tokio. Although optimized for ScyllaDB, the driver is also compatible with Apache Cassandra®.

Starting ScyllaDB in Docker

Before starting the cluster, make sure the aio-max-nr value is high enough (1048576 or more). 

This parameter sets the maximum number of concurrent asynchronous I/O (AIO) requests that the Linux kernel allows. A high value helps ScyllaDB perform well under heavy I/O workloads.

Check the value: 

cat /proc/sys/fs/aio-max-nr

If it needs to be changed, run the following as root:

echo "fs.aio-max-nr = 1048576" >> /etc/sysctl.conf
sysctl -p /etc/sysctl.conf

If you haven’t done so yet, download the example from git:

git clone https://github.com/scylladb/scylla-code-samples.git
cd scylla-code-samples/Rust_Scylla_Driver/ps-logger/

To quickly get ScyllaDB up and running, use the official Docker image:

docker run \
  -p 9042:9042/tcp \
  --name some-scylla \
  --hostname some-scylla \
  -d scylladb/scylla:5.2.0 \
   --smp 1 --memory=750M --overprovisioned 1

Note that this lesson assumes the ScyllaDB instance runs on your local machine.

Wait a few seconds until the node is up. Use cqlsh to create a keyspace and table on the ScyllaDB server:

docker exec -it some-scylla cqlsh

Data Schema

The application will be able to store and query temperature time-series data. Each measurement will contain the following information:

  • The sensor ID for the sensor that measured the temperature
  • The time the temperature was measured
  • The temperature value 

First, create a keyspace called tutorial:

CREATE KEYSPACE IF NOT EXISTS tutorial
  WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
};

As this is just an example, you’ll use SimpleStrategy with a single datacenter. The cluster has a single node, so set the Replication Factor to one.

Keep in mind that SimpleStrategy should not be used in production.

The main query will fetch the temperatures reported by a specific device over a given time interval, so create the following table:

CREATE TABLE IF NOT EXISTS tutorial.temperature (
  device UUID,
  time timestamp,
  temperature smallint,
  PRIMARY KEY(device, time)
);

To learn more, see Basic Data Modeling.

The application you’re building will be able to query all temperatures measured by a given device within a selected time frame. That’s why you will use the following SELECT query:

where ? will be replaced with actual values – device ID, time-from, and time-to, respectively.
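A query matching that description could be kept in the application as a Rust constant. This is a sketch under assumptions: the table and column names come from the table defined above, both time bounds are taken to be exclusive (inclusive bounds would use `>=` and `<=`), and the constant name `SELECT_QUERY` is illustrative.

```rust
/// Illustrative constant name. The query text is an assumption based on the
/// table defined above and the three placeholders the lesson describes.
const SELECT_QUERY: &str = "
    SELECT * FROM tutorial.temperature
        WHERE device = ?
        AND time > ?
        AND time < ?
";

fn main() {
    // Each `?` is a bind marker: device ID, time-from, time-to, in that order.
    let markers = SELECT_QUERY.matches('?').count();
    println!("{} bind markers in: {}", markers, SELECT_QUERY.trim());
}
```

Because `device` is the partition key and `time` is the clustering key, a range restriction on `time` within a single `device` fits the primary key chosen above.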

Next, exit the CQL shell:

exit

Rust and Connection to the DB

If you don’t already have Rust and Cargo installed, install them using rustup:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh 

This installs the Rust compiler along with Cargo and other helpful tools.

The application name is temperature, and the required dependencies are defined in the Cargo.toml file:

Where:

  • uuid – Package that provides UUID support.
  • tokio – Provides the async runtime in which database queries are executed.
  • scylla – Rust driver for ScyllaDB and Cassandra.
  • chrono – Package for working with time.

The file `/src/result.rs` contains just:

You’ll use this `Result` type in the application code to make error handling easier. It allows you to return a result of a generic type `T` or any error that can be converted to a boxed error (`Box<dyn Error>`).
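A minimal sketch of how such an alias behaves, assuming it is defined as `std::result::Result<T, Box<dyn Error>>`. The exact definition in `/src/result.rs` may differ, for example by adding `Send + Sync` bounds. The `parse_temperature` helper is hypothetical and only exists to show `?` converting a concrete error into the boxed one.

```rust
use std::error::Error;

/// Assumed shape of the alias: any `T` on success, a boxed error on failure.
pub type Result<T> = std::result::Result<T, Box<dyn Error>>;

/// Hypothetical helper: parses a temperature reading from text.
fn parse_temperature(raw: &str) -> Result<i16> {
    // `?` converts the `ParseIntError` into `Box<dyn Error>` automatically.
    let value: i16 = raw.trim().parse()?;
    Ok(value)
}

fn main() {
    match parse_temperature("21") {
        Ok(t) => println!("parsed temperature: {}", t),
        Err(e) => eprintln!("error: {}", e),
    }
    match parse_temperature("warm") {
        Ok(t) => println!("parsed temperature: {}", t),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

With this alias, functions that can fail in different ways (parsing, I/O, database calls) can all share one return type.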

Now it’s possible to declare the `result` module and use the `Result` type in `/src/main.rs`:

The `main` function works asynchronously by using `tokio`. The following makes sure it returns the result:

The file `/src/db.rs` will hold the logic for working with the ScyllaDB instance. The first step is to establish a database session.
For this example, you won’t use authentication:

Note: See the driver documentation for an example of user authentication.

The file `/src/main.rs` imports the `db` module:

And then it initializes the session like so:

With this, you’ll use the `SCYLLA_URI` environment variable or `127.0.0.1:9042` if not provided.
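The fallback can be sketched with the standard library alone. The `resolve_uri` helper and the `DEFAULT_URI` constant are hypothetical names used for illustration. Only the variable name `SCYLLA_URI` and the default address come from the lesson.

```rust
use std::env;

/// Default address used when `SCYLLA_URI` is not set (from the lesson text).
const DEFAULT_URI: &str = "127.0.0.1:9042";

/// Hypothetical helper: uses the override if present, otherwise the default.
fn resolve_uri(from_env: Option<String>) -> String {
    from_env.unwrap_or_else(|| DEFAULT_URI.to_string())
}

fn main() {
    // `env::var` returns `Err` when the variable is unset or not valid
    // Unicode; `.ok()` turns that into `None` so the default applies.
    let uri = resolve_uri(env::var("SCYLLA_URI").ok());
    println!("connecting to {} ...", uri);
}
```

Taking the environment lookup as an argument keeps the helper easy to test without touching the process environment.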

Notice the `.await` after `create_session`. That’s because `async` functions return a `Future`. A future can be awaited inside another `async` function to get its actual value, which in this case is `Result<Session, Error>`. Finally, the `?` after `.await` ensures that if `create_session` returns an error instead of a session, the error is propagated up and the application terminates, printing the error.
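The interplay of `.await` and `?` can be tried without any dependencies. In this sketch, `Session` and `create_session` are hypothetical stand-ins for the driver's types, and `block_on` is a tiny hand-written executor standing in for Tokio. It is only suitable for futures that complete without waiting on I/O.

```rust
use std::error::Error;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Result<T> = std::result::Result<T, Box<dyn Error>>;

/// Stand-in for the driver's session type.
#[derive(Debug)]
struct Session {
    uri: String,
}

/// Hypothetical stand-in for `create_session`: fails on an empty address.
async fn create_session(uri: &str) -> Result<Session> {
    if uri.is_empty() {
        return Err("empty URI".into());
    }
    Ok(Session { uri: uri.to_string() })
}

/// Awaits the future, then uses `?` to return early if it produced an error.
async fn run(uri: &str) -> Result<String> {
    let session = create_session(uri).await?;
    Ok(format!("connected to {}", session.uri))
}

/// Waker that does nothing; enough for futures that never wait on I/O.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny executor standing in for Tokio: polls until the future is ready.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

fn main() {
    println!("{:?}", block_on(run("127.0.0.1:9042")));
    println!("{:?}", block_on(run("")));
}
```

Calling `run` on its own does nothing until the returned future is polled, which is why an executor (Tokio in the lesson, `block_on` here) is needed.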

**Note**: In this lesson, there is a single node, so choosing a load balancing strategy doesn’t make a difference. The available load balancing strategies are listed in the documentation. By default, `Token aware Round robin` is used.

Next, the file `/src/db.rs` defines functions for creating the keyspace and the table that store temperature measurements. They run the following queries:

The `main` function then calls `initialize` to create the keyspace and table:

The file `/src/temperature_measurement.rs` defines a structure that will represent a single temperature measurement:

Next, the file `/src/duration.rs` contains a custom implementation of `Duration` that can be both serialized and deserialized:

In addition to the standard `Debug`, two more traits are derived on `TemperatureMeasurement`: `FromRow` and `ValueList`, both provided by the driver. `FromRow` lets you convert database rows into instances of `TemperatureMeasurement`, and `ValueList` lets you pass an instance of `TemperatureMeasurement` as the value argument to a query instead of listing all of its fields separately. At compile time, `derive` generates this code through the associated procedural macros. Procedural macros and `derive` are an advanced topic outside the scope of this lesson.
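To see what `derive` does without pulling in the driver, here is a simplified stand-in for the measurement struct that derives only standard-library traits. The field types are assumptions chosen to avoid external crates: `u128` replaces `Uuid` and an `i64` millisecond count replaces the custom `Duration`. The driver's `FromRow` and `ValueList` are deliberately left out.

```rust
/// Simplified stand-in for the lesson's struct. `u128` and `i64` replace the
/// `Uuid` and custom `Duration` types so the sketch needs no external crates.
#[derive(Debug, Clone, PartialEq)]
struct TemperatureMeasurement {
    device: u128,     // CQL `uuid` is 128 bits wide
    time_ms: i64,     // CQL `timestamp` is milliseconds since the Unix epoch
    temperature: i16, // CQL `smallint` is a 16-bit signed integer
}

fn main() {
    let m = TemperatureMeasurement {
        device: 1,
        time_ms: 1_700_000_000_000,
        temperature: 21,
    };
    // Neither the `Debug` formatting nor `==` was written by hand;
    // `derive` generated both implementations at compile time.
    println!("{:?}", m);
    assert_eq!(m, m.clone());
}
```

The driver's derive macros work on the same principle, generating the row conversion and value serialization code from the struct definition.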

The file `/src/db.rs` defines the insert query. ScyllaDB will use each value as a replacement for ?:

Here `ValueList` allows you to use query string templates with ? as a placeholder for dynamic values. The values themselves are supplied by passing an instance of the struct whose fields correspond to the placeholders.
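An insert statement of this kind might look as follows when kept as a Rust constant. The query text is an assumption derived from the table defined earlier, and `INSERT_QUERY` is an illustrative name.

```rust
/// Illustrative constant name. The column list is an assumption based on the
/// `tutorial.temperature` table created earlier in the lesson.
const INSERT_QUERY: &str = "
    INSERT INTO tutorial.temperature (device, time, temperature)
        VALUES (?, ?, ?)
";

fn main() {
    // One bind marker per listed column, bound in the order the columns appear.
    let markers = INSERT_QUERY.matches('?').count();
    println!("{} bind markers in: {}", markers, INSERT_QUERY.trim());
}
```

Listing the columns explicitly makes the mapping between each ? and its value visible in the query text itself.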

Reading Measurements

Next, the select-query logic is defined in the `/src/db.rs` module:

The important steps are:

  • Make a select query with the specified parameters (device ID, start date, and end date).
  • Await the response and convert it into rows.
  • The rows might be empty; `unwrap_or_default` ensures that you get an empty `Vec` in that case.
  • Once the rows are obtained, convert each row with `into_typed::<TemperatureMeasurement>()`, which uses the code generated by the `FromRow` derive macro.
  • `into_typed` yields a `Result` for each row because any conversion might fail. `.map(|v| v.map_err(From::from))` converts each row’s error into the generic error defined in `/src/result.rs`.
  • Finally, `collect` gathers the converted values into a vector.
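The last four steps follow a general iterator pattern that can be tried with the standard library alone. In this sketch, `convert_rows` is a hypothetical stand-in: each "row" is a string, and parsing it into an `i16` plays the role of the driver's typed-row conversion.

```rust
use std::error::Error;

type Result<T> = std::result::Result<T, Box<dyn Error>>;

/// Hypothetical stand-in for the typed-row conversion: each "row" is a string
/// that may or may not parse into a temperature.
fn convert_rows(rows: Option<Vec<&str>>) -> Result<Vec<i16>> {
    rows
        // A missing row set becomes an empty `Vec` instead of an error.
        .unwrap_or_default()
        .into_iter()
        // Per-row conversion yields `Result<i16, ParseIntError>`.
        .map(|raw| raw.parse::<i16>())
        // Convert each row's error into the boxed error type.
        .map(|v| v.map_err(From::from))
        // Collecting into `Result<Vec<_>>` stops at the first failed row.
        .collect()
}

fn main() {
    println!("{:?}", convert_rows(Some(vec!["21", "19"])));
    println!("{:?}", convert_rows(None));
    println!("{:?}", convert_rows(Some(vec!["21", "oops"])).is_err());
}
```

Collecting into a `Result<Vec<_>>` rather than a `Vec<Result<_>>` means the caller either gets every measurement or a single error, never a partially converted list.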

Now, back in `/src/main.rs` you can see the rest of the `main` function, imports, and modules:

To run the example:

cargo run

Conclusion

In this lesson, you created a simple Rust application that connects to a single-node ScyllaDB cluster and stores and selects temperature measurements from sensors.

Topics not covered in this lesson include query preparation, query batching, and execution of prepared queries, as well as multi-node clusters. Stay tuned for more lessons, and check out the scylla-rust-driver examples and documentation.
