Data Model and Architecture Overview

7 min to complete

To register a user, click here:

To join our community Slack channel, click here:


The data in ScyllaDB is stored in a set of rows that are organized as tables. Each row has a primary key that identifies it. The data is partitioned by this primary key (more precisely, by its partition key component), and it can be retrieved according to the primary key. Keyspaces are the highest level of the data model, and they typically contain many tables. Tables are defined within keyspaces. Each table contains a set of columns and a defined primary key, and its data is stored in a set of rows. Columns define the data structure in the table. Each column has a defined data type, such as integer, Boolean, text, and so on. We’ll elaborate on this in the Data Modeling course.
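To make the keyspace, table, column, and row hierarchy concrete, here is a minimal in-memory sketch in Python. It is not ScyllaDB code or a ScyllaDB API: the `Keyspace` and `Table` classes and the `example_keyspace` and `users` names are hypothetical, and Python types stand in for column data types.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """A table: typed columns plus a primary key that identifies each row."""
    name: str
    columns: dict        # column name -> Python type (stand-in for a column data type)
    primary_key: tuple   # names of the columns that form the primary key
    rows: dict = field(default_factory=dict)  # primary-key values -> row

    def insert(self, row: dict) -> None:
        # Check each value against the type declared for its column.
        for column, value in row.items():
            if not isinstance(value, self.columns[column]):
                raise TypeError(f"{column} expects {self.columns[column].__name__}")
        key = tuple(row[c] for c in self.primary_key)
        self.rows[key] = row  # in this sketch, writing the same key replaces the row

    def get(self, *key):
        # Rows are retrieved by their primary key; None means no such row.
        return self.rows.get(tuple(key))


@dataclass
class Keyspace:
    """The highest level of the model: a named container of tables."""
    name: str
    tables: dict = field(default_factory=dict)

    def create_table(self, name, columns, primary_key):
        table = Table(name, columns, tuple(primary_key))
        self.tables[name] = table
        return table


# Hypothetical example data, for illustration only.
keyspace = Keyspace("example_keyspace")
users = keyspace.create_table(
    "users",
    columns={"user_id": int, "name": str, "active": bool},
    primary_key=["user_id"],
)
users.insert({"user_id": 1, "name": "Ada", "active": True})
print(users.get(1))
```

The sketch leaves out partitioning and replication entirely. It only shows that a table lives inside a keyspace, that columns carry types, and that the primary key is what identifies and retrieves a row.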

To understand how ScyllaDB offers high availability, we first need to understand some basic terms. A node is a unit of storage in ScyllaDB. It consists of the ScyllaDB database server software running on a computer server. A cluster is a collection of nodes that ScyllaDB uses to store the data. A cluster typically consists of at least three nodes. ScyllaDB replicates data according to a replication strategy defined by the user. This strategy determines the placement of the replicated data.

ScyllaDB runs nodes in a hash ring. All nodes are equal: there are no master, slave, or replica sets. The Replication Factor, or RF, is the number of nodes to which data is replicated. Data is typically replicated to multiple nodes. An RF of 1 means there is only one copy of a row in a cluster, and there is no way to recover the data if the node is compromised or goes down. An RF of 2 means that there are two copies of a row in a cluster. An RF of at least 3 is used in most systems. Data is always replicated automatically. Read or write operations can occur on data stored on any of the replica nodes. The Consistency Level, or CL, determines how many replicas in a cluster must acknowledge a read or write operation before it is considered successful.
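The relationship between RF and CL can be sketched with a little arithmetic. The Python snippet below is an illustration, not ScyllaDB's implementation. It assumes three commonly used consistency levels (ONE, QUORUM, and ALL), with a quorum meaning a majority of the replicas. The function names `required_acks` and `is_successful` are made up for this example.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must acknowledge an operation at a given CL."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # a majority of the replicas
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def is_successful(acks: int, consistency_level: str, replication_factor: int) -> bool:
    """An operation succeeds once enough replicas have acknowledged it."""
    return acks >= required_acks(consistency_level, replication_factor)


# With RF = 3, suppose two of the three replicas acknowledge a write.
rf = 3
for cl in ("ONE", "QUORUM", "ALL"):
    print(cl, required_acks(cl, rf), is_successful(2, cl, rf))
# prints:
# ONE 1 True
# QUORUM 2 True
# ALL 3 False
```

With an RF of 3, a quorum is two replicas, so the operation can still succeed when one replica is unavailable. With an RF of 2, a quorum is also two replicas, so a quorum operation needs both copies to respond.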

In the example above, our client sends a request to write partition 1 to node V. The data for partition 1 is replicated to nodes W, X, and Z, as we have a Replication Factor, or RF, of 3. We will go into this in more depth and discuss these terms in the High Availability lesson.

ScyllaDB is a database that scales out and up. ScyllaDB adopted much of its distributed scale-out design from the Apache Cassandra project, which adopted distribution concepts from Amazon Dynamo and data modeling concepts from Google BigTable. In the world of big data, a single node cannot hold the entire data set, so a cluster of nodes is needed. A ScyllaDB cluster is a collection of nodes, or ScyllaDB instances, visualized as a ring. A token is a value in a range, used to identify both nodes and partitions. The partition key is the unique identifier for a partition and is represented as a token, which is hashed from the primary key.

A partition is a subset of data that is stored on a node and replicated across nodes. On the physical layer, a partition is a unit of data stored on a node and is identified by a partition key. A partition key is the primary means of looking up the set of rows that make up a partition. It serves to identify the node in the cluster that stores a given partition, as well as to distribute data across the nodes in a cluster. The partitioner, or partition hash function, uses the partition key to determine which node in the cluster stores the data. It does this by computing a token for each partition key. The hashed output of the partition key determines its placement within the cluster. In the Architecture lesson, we will take a closer look at the ScyllaDB design and architecture.
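The path from partition key to token to node can be sketched as a toy hash ring in Python. This is a simplified illustration under stated assumptions, not ScyllaDB's actual placement logic: MD5 stands in for the real partitioner (ScyllaDB's default is based on Murmur3), each node owns a single evenly spaced token, and replicas are simply the next nodes clockwise on the ring. The `Ring` class, `token_for` function, and `RING_SIZE` constant are hypothetical names.

```python
import bisect
import hashlib

RING_SIZE = 2**64  # hypothetical token range for this sketch


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (MD5 used here as a stand-in partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # a value in [0, 2**64)


class Ring:
    def __init__(self, node_names):
        # Assumption: one evenly spaced token per node, in ascending order.
        step = RING_SIZE // len(node_names)
        self.tokens = [i * step for i in range(len(node_names))]
        self.nodes = list(node_names)

    def replicas(self, partition_key: str, replication_factor: int):
        """Return the nodes that would hold copies of the given partition."""
        token = token_for(partition_key)
        # Find the first node whose token is >= the partition's token,
        # wrapping around to the start of the ring if there is none.
        start = bisect.bisect_left(self.tokens, token) % len(self.nodes)
        count = min(replication_factor, len(self.nodes))
        # Walk clockwise from that node to collect the replicas.
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(count)]


ring = Ring(["V", "W", "X", "Y", "Z"])
print(token_for("partition 1"))
print(ring.replicas("partition 1", replication_factor=3))
```

Because the token depends only on the partition key, the same key always maps to the same set of nodes, and different keys spread across the ring. Which nodes a particular key lands on depends on the hash function and token assignment, so the node names printed here are not meant to match any specific diagram.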

The next part of the course is the Quick Wins lesson. Here we’ll dive right in. You’ll see how easy it is to install and run ScyllaDB and to run some basic database queries. You’ll get a chance to try it yourself using some hands-on examples. The ScyllaDB community is available to help you whenever you need it. Your first assignment is to register a user here. Also, join our community Slack channel here.

Thanks and enjoy the rest of the course.