This lesson covers “Non-Prepared Statements” and “Non-Token Aware Requests” in ScyllaDB.
In this lesson, we’re going to analyze the data model that will be used throughout the entire section and show you how to use the Scylla Monitoring Dashboard to identify problems caused by non-prepared statements.
Introduction
Welcome to the How to Write Better Apps section!
Ok, so let’s say that we’re building a Twitter clone. What would our MVP be in this case?
We will be working on two major features: Tweets and the Timeline.
The “Tweet Feature” only has tweet_id as its partition key, since it’s not a complex query.
For the “Timeline Feature”, however, we’re going to use username as the partition key and cluster it by a timeuuid in descending order, so that each query retrieves the latest tweets first.
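To make the Timeline model concrete, here is a minimal in-memory sketch in Rust using only the standard library. It does not talk to Scylla: a HashMap keyed by username stands in for the partitions, and a BTreeMap keyed by Reverse<u128> mimics a timeuuid clustering column in descending order. TimelineTable, insert, and latest are hypothetical names, and the u128 is a stand-in for a real timeuuid, assuming that a larger value means a newer tweet.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeMap, HashMap};

/// Hypothetical in-memory analogy of the timeline table:
/// partition key = username, clustering key = timeuuid in DESC order.
/// A u128 stands in for the timeuuid (assumption: larger = newer).
#[derive(Default)]
struct TimelineTable {
    partitions: HashMap<String, BTreeMap<Reverse<u128>, String>>,
}

impl TimelineTable {
    /// Adds a row to the partition of `username`, creating it if needed.
    fn insert(&mut self, username: &str, timeuuid: u128, tweet: &str) {
        self.partitions
            .entry(username.to_string())
            .or_default()
            .insert(Reverse(timeuuid), tweet.to_string());
    }

    /// Reads a single partition. `Reverse` flips the BTreeMap ordering,
    /// so rows come back newest first, like CLUSTERING ORDER BY ... DESC.
    fn latest(&self, username: &str, limit: usize) -> Vec<&str> {
        match self.partitions.get(username) {
            Some(rows) => rows.values().take(limit).map(|s| s.as_str()).collect(),
            None => Vec::new(),
        }
    }
}

fn main() {
    let mut table = TimelineTable::default();
    table.insert("alice", 1, "first");
    table.insert("alice", 3, "third");
    table.insert("alice", 2, "second");
    table.insert("bob", 5, "other partition");
    println!("{:?}", table.latest("alice", 2)); // ["third", "second"]
}
```

Because the rows are already sorted newest first inside the partition, reading the latest tweets only means taking the first rows, with no sorting at read time.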
When someone tweets something, all of their followers should receive the new tweet in their individual timelines.
Now, imagine that you have 1 MILLION followers and you tweet something: that means 1 million new rows in Scylla. Your application and database must be monstrously fast to process and ingest all of that data.
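Here is a rough sketch of that fan-out on write, again in plain Rust with hypothetical names (TimelineRow, fan_out). It only builds the rows in memory; in a real application, each row would be one INSERT into the timeline table, so a single tweet costs as many writes as the author has followers.

```rust
/// One row of the hypothetical timeline table (names are illustrative).
#[derive(Debug, Clone, PartialEq)]
struct TimelineRow {
    username: String, // partition key: the follower whose timeline receives the tweet
    timeuuid: u128,   // clustering key; a u128 stands in for a real timeuuid
    tweet_id: u128,   // points back at the row in the tweets table
}

/// Fan-out on write: one tweet becomes one row (one insert) per follower.
fn fan_out(followers: &[String], timeuuid: u128, tweet_id: u128) -> Vec<TimelineRow> {
    followers
        .iter()
        .map(|follower| TimelineRow {
            username: follower.clone(),
            timeuuid,
            tweet_id,
        })
        .collect()
}

fn main() {
    // 1,000,000 followers means 1,000,000 timeline rows for a single tweet.
    let followers: Vec<String> = (0..1_000_000).map(|i| format!("follower_{i}")).collect();
    let rows = fan_out(&followers, 42, 7);
    println!("rows to insert: {}", rows.len());
}
```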
And every time something lands in a timeline, we also need to retrieve it quickly for the user. Let’s say that every time you scroll, you fetch the 50 latest tweets from your private timeline.
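One way to serve each scroll, sketched here in memory, is to read 50 rows from the user’s partition and, on the next scroll, read the next 50 rows that are older than the last timeuuid already shown. PAGE_SIZE, Partition, and next_page are hypothetical names. In CQL this would roughly correspond to a query restricted by the partition key, a `timeuuid < ?` condition, and `LIMIT 50`, but that exact query is an assumption, not something shown in this lesson.

```rust
use std::cmp::Reverse;
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Unbounded};

const PAGE_SIZE: usize = 50; // tweets fetched per scroll

/// One user's timeline partition: timeuuid -> tweet_id, newest first
/// (Reverse keys mimic a DESC clustering order).
type Partition = BTreeMap<Reverse<u128>, u128>;

/// Returns up to PAGE_SIZE timeuuids, newest first, strictly older than `before`.
/// `None` means "first scroll": start from the newest row.
fn next_page(partition: &Partition, before: Option<u128>) -> Vec<u128> {
    let start = match before {
        // With Reverse keys, "after Reverse(t)" in map order means "older than t".
        Some(t) => Excluded(Reverse(t)),
        None => Unbounded,
    };
    partition
        .range((start, Unbounded))
        .take(PAGE_SIZE)
        .map(|(Reverse(t), _)| *t)
        .collect()
}

fn main() {
    let mut timeline: Partition = BTreeMap::new();
    for t in 1..=120u128 {
        timeline.insert(Reverse(t), t * 10); // hypothetical tweet ids
    }
    let first = next_page(&timeline, None); // first scroll: the 50 newest
    let second = next_page(&timeline, first.last().copied()); // next scroll
    println!("first page: {}..{}, second page starts at {}", first[0], first[first.len() - 1], second[0]);
}
```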
First Gauge: CQL Non-Prepared Statements
Ok, now let’s jump into the first monitoring Gauge: the Non-Prepared Statement one.
Sometimes, when you’re rushing to develop an MVP, you forget to pay attention to good development practices, like preparing your statements.
It happens, but in Scylla, if you just FORGET about it, it can turn into a massive performance problem.
In this scenario, we left two insert queries unprepared, and suddenly our system got slower and processed fewer queries.
Before we fix it, it’s important to remember that this gauge should ALWAYS stay at 0%, meaning that not a single unprepared statement is running in our application.
This is what a “non-prepared statement” looks like: you just append the values inside the query string and send it to Scylla.
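A minimal sketch of that string building, using a hypothetical tweets table and hypothetical columns. Every call produces a different query text because the UUID is part of it. This sketch also does no escaping of quotes at all, which is one more reason not to build queries this way.

```rust
/// Builds a non-prepared INSERT by pasting the values straight into the CQL text.
/// Table and column names are hypothetical; no escaping is done here.
fn unprepared_insert(tweet_id: &str, username: &str, text: &str) -> String {
    format!("INSERT INTO tweets (tweet_id, username, text) VALUES ({tweet_id}, '{username}', '{text}')")
}

fn main() {
    // Two inserts that differ only in the UUID produce two different query texts.
    let first = unprepared_insert("a8098c1a-f86e-11da-bd1a-00112444be1e", "alice", "hello");
    let second = unprepared_insert("5c1f0e2a-3b4d-4e6f-8a9b-0c1d2e3f4a5b", "alice", "hello");
    println!("{first}\n{second}\nsame text? {}", first == second);
}
```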
The coordinator then receives the statement, looks at its statement cache, and checks whether an identical query is already there.
After realizing that this is a new query, it adds it to the cached queries, processes the statement to check that everything matches, and then inserts the data.
Now imagine doing these three steps for every query you send. Because we’re sending raw UUIDs inside the query text, the statement is always different, so it never matches anything already in the cache.
Or you can prepare your statements, so that the query contains only placeholders; it is cached once and reused from then on. The values are bound and serialized on the driver side, so the coordinator is left with only one task: inserting the data.
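The difference can be sketched with a toy coordinator in Rust. It follows the three steps described above (look up the text, parse on a miss, execute) and assumes the cache is keyed by the exact query text. Coordinator, run_unprepared, and run_prepared are hypothetical names; they only count work and do not model Scylla’s internals.

```rust
use std::collections::HashMap;

/// Toy coordinator. Assumption: the statement cache is keyed by the exact query text.
#[derive(Default)]
struct Coordinator {
    cache: HashMap<String, u32>, // query text -> statement id
    parses: u32,                 // how many times a statement had to be parsed
    executions: u32,
}

impl Coordinator {
    /// Returns the cached id, parsing and caching the statement on a miss.
    fn prepare(&mut self, query: &str) -> u32 {
        if let Some(&id) = self.cache.get(query) {
            return id;
        }
        self.parses += 1;
        let id = self.cache.len() as u32;
        self.cache.insert(query.to_string(), id);
        id
    }

    /// Non-prepared path: every request carries the full text, values included.
    fn execute_unprepared(&mut self, query: &str) {
        self.prepare(query);
        self.executions += 1;
    }

    /// Prepared path: only the statement id and the driver-serialized values arrive.
    fn execute_prepared(&mut self, id: u32, _values: (u32, &str)) {
        debug_assert!(self.cache.values().any(|&cached| cached == id), "unknown statement id");
        self.executions += 1;
    }
}

fn run_unprepared(n: u32) -> Coordinator {
    let mut c = Coordinator::default();
    for i in 0..n {
        // The value is part of the text, so each statement is brand new.
        c.execute_unprepared(&format!("INSERT INTO tweets (tweet_id, text) VALUES ({i}, 'hi')"));
    }
    c
}

fn run_prepared(n: u32) -> Coordinator {
    let mut c = Coordinator::default();
    // Prepared once: the text only contains placeholders.
    let id = c.prepare("INSERT INTO tweets (tweet_id, text) VALUES (?, ?)");
    for i in 0..n {
        c.execute_prepared(id, (i, "hi"));
    }
    c
}

fn main() {
    let (u, p) = (run_unprepared(10_000), run_prepared(10_000));
    println!("unprepared: {} executions, {} parses, {} cached", u.executions, u.parses, u.cache.len());
    println!("prepared:   {} executions, {} parses, {} cached", p.executions, p.parses, p.cache.len());
}
```

With 10,000 inserts, the non-prepared run parses and caches 10,000 distinct statements, while the prepared run parses once and afterwards only ships values.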
Now compare how much faster your database is after this simple change.
Second Gauge: Non-Token Aware Queries
Now let’s talk about the Token Aware policy. Imagine you have a 5-node cluster with a replication factor of 3.
Token Aware is the policy that ensures your request hits one of the 3 nodes responsible for storing the data being accessed. Without it, in an environment with more nodes than the replication factor, your request may hit a node that holds different partitions, and that node then has to forward the request to the node where your data is actually placed.
Depending on the driver you are using, you will find TokenAware as a policy, and you should use it together with the RoundRobin policy, which helps your load balancing by spreading requests evenly among the nodes.
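Here is a toy routing simulation for the 5-node, RF = 3 example. The ring, the hash function (std’s DefaultHasher, not Scylla’s Murmur3 partitioner), and the placement rule (the owning node plus the next two nodes) are simplifying assumptions, and token, replicas, and forwarded are hypothetical names. It compares plain round robin over every node with token-aware round robin over the replicas, counting how many requests land on a node that has to forward them.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const NODES: usize = 5; // the 5-node cluster from the example
const RF: usize = 3; // replication factor 3

/// Toy token: std's DefaultHasher, NOT Scylla's Murmur3 partitioner.
fn token(partition_key: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    partition_key.hash(&mut hasher);
    hasher.finish()
}

/// Toy placement: each node owns an equal slice of the token range, and the
/// replicas are the owner plus the next RF - 1 nodes on the ring.
fn replicas(partition_key: &str) -> Vec<usize> {
    let slice = u64::MAX / NODES as u64 + 1;
    let owner = (token(partition_key) / slice) as usize;
    (0..RF).map(|i| (owner + i) % NODES).collect()
}

/// Counts requests whose coordinator is not a replica and must forward them.
fn forwarded(partition_key: &str, requests: usize, token_aware: bool) -> usize {
    let owners = replicas(partition_key);
    (0..requests)
        .filter(|&i| {
            let coordinator = if token_aware {
                owners[i % RF] // token aware + round robin among the replicas
            } else {
                i % NODES // plain round robin over every node
            };
            !owners.contains(&coordinator)
        })
        .count()
}

fn main() {
    let total = 100;
    for aware in [false, true] {
        let hops = forwarded("alice", total, aware);
        println!("token_aware={aware}: {hops}/{total} requests needed an extra hop");
    }
}
```

For 100 requests against one partition, plain round robin sends 40 of them to the 2 nodes that hold no replica, while the token-aware version sends none, and still rotates the load across all 3 replicas.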
The Rust driver uses RoundRobin and TokenAware by default. You also have the option to use the DefaultPolicy builder and create your own policy, depending on your project’s needs.
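After you enable it, the driver learns the topology of the cluster and sends each request to the right node every time. This routing relies on the driver knowing the partition key of each request, which it typically gets from prepared statements.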
There will always be a number ticking in this gauge if you have more nodes than your keyspace’s replication factor, because the coordinator also writes data to the remaining replicas but never reads from them.
It’s not a problem as long as it stays at low percentages.
Instead of just showing you some graphs, let’s jump into a real-world application with real numbers.
Real World with the Scylla Monitoring Stack
I’m running the Scylla Monitoring Stack locally, along with a three-node cluster. As you can see here, my application is getting 7.6k requests per second. The read latency is huge, and if we jump into the CQL dashboard, you can see that all of our queries are running without prepared statements. Another thing to pay attention to is how many requests are hitting the cache and how many are missing it.
If we switch to the second example, which uses prepared statements, you’ll see a significant difference. In this example, I’m preparing the statements before sending them. Let’s see how this affects our payload and the data being sent.
After giving it some time to settle, we can see that our P99 latency dropped, the average read and write latencies also dropped, and the number of requests increased. The hits on the cache went to zero because it was overloaded. In the CQL dashboard, there isn’t a single non-prepared statement running, meaning every request now uses a prepared statement.
I hope you enjoyed this, and see you next class.