Introduction

In Scylla, unlike in relational databases, the data model is built around the queries and not just around the domain entities. When creating the data model, we take into account both the conceptual data model and the application workflow: which queries will be performed, by which users, and how often.

The main goal of data modeling in Scylla is to return results fast. To achieve that, we want:

  • Even data distribution: data should be evenly spread across the cluster so that every node holds roughly the same amount of data. Scylla determines which node stores a piece of data by hashing its partition key. Therefore, choosing a good partition key is crucial. More on this later.
  • A minimal number of partitions accessed per read query: writes are cheap in Scylla; reads, not so much. For fast reads, all the data required by a read query should ideally be stored in a single table, and in as few partitions as possible. It’s fine to duplicate data across tables to achieve this.
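To illustrate why the choice of partition key affects data distribution, here is a minimal Python sketch. It is not Scylla's actual placement logic: the node names are hypothetical, SHA-256 stands in for the real partitioner hash, and a simple modulo replaces the token ring. It only shows the general idea that a key with many distinct values spreads rows evenly, while a key with few distinct values cannot.

```python
import hashlib
from collections import Counter

# Hypothetical three-node cluster (names are illustrative only).
NODES = ["node-a", "node-b", "node-c"]


def node_for(partition_key: str) -> str:
    """Pick a node by hashing the partition key.

    Simplified stand-in: a real cluster maps hash tokens to nodes
    through a token ring, not a plain modulo.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


def distribution(partition_keys):
    """Count how many rows land on each node, one row per key given."""
    return Counter(node_for(key) for key in partition_keys)


# High-cardinality key: 3000 rows, each with its own partition key.
by_pet = distribution(f"pet-{i}" for i in range(3000))

# Low-cardinality key: 3000 rows that share only two partition keys.
by_species = distribution(["dog", "cat"] * 1500)

print("per-pet key:    ", dict(by_pet))
print("per-species key:", dict(by_species))
```

With the per-pet key, every node ends up with a similar share of the rows. With the per-species key, the rows can land on at most two nodes no matter how large the cluster is, because there are only two distinct key values to hash.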

Things we should NOT focus on:

  • Avoiding data duplication: to get efficient reads we sometimes have to duplicate data. More about that and about denormalization later in this lesson. In a later lesson we will learn how to avoid duplication in some cases using Secondary Indexes.
  • Minimizing the number of writes: writes in Scylla aren’t free, but they are very efficient and “cheap”. Scylla is optimized for high write throughput. Reads are usually more expensive than writes and are harder to fine-tune. We are usually willing to increase the number of writes in order to make reads more efficient.

In this lesson and in future lessons we will use an example based on a veterinary clinic named 4Paws Clinic. In this clinic, each admitted animal is connected to a heart rate monitor, which logs its heart rate and other vital information every five seconds.
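To make the trade-off between extra writes and cheaper reads concrete, here is a minimal Python sketch based on the clinic example. It uses plain dictionaries as stand-ins for tables, with the dictionary key playing the role of the partition key. The table names, the owner field, and the two queries are assumptions made for illustration; they are not the schema used later in the course.

```python
from collections import defaultdict

# Each dict stands in for one table; its key acts as the partition key.
# Both names are hypothetical and exist only for this sketch.
heartrate_by_pet = defaultdict(list)    # query: all readings for a pet
heartrate_by_owner = defaultdict(list)  # query: all readings for an owner's pets


def record_reading(pet_id, owner_id, timestamp, heart_rate):
    """One logical write becomes two writes, one per query-specific table."""
    heartrate_by_pet[pet_id].append((timestamp, heart_rate))
    heartrate_by_owner[owner_id].append((timestamp, pet_id, heart_rate))


def readings_for_pet(pet_id):
    """Read served from a single partition of a single table."""
    return list(heartrate_by_pet.get(pet_id, []))


def readings_for_owner(owner_id):
    """Also a single-partition read, thanks to the duplicated data."""
    return list(heartrate_by_owner.get(owner_id, []))


# Readings arrive every five seconds (timestamps in seconds).
record_reading("pet-1", "owner-9", 0, 92)
record_reading("pet-1", "owner-9", 5, 95)
record_reading("pet-2", "owner-9", 5, 120)

print(readings_for_pet("pet-1"))
print(readings_for_owner("owner-9"))
```

Each reading is stored twice, so writes cost more, but each of the two queries is answered by looking up exactly one key in exactly one table, with no scanning or joining.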
