A Graph Data System Powered by ScyllaDB and JanusGraph – Part 2

S110: The Mutant Monitoring System (MMS) and Integrations

This is the second lesson on using JanusGraph, a graph data system, with ScyllaDB as the data storage layer.

In the previous lesson, you learned how to deploy JanusGraph with ScyllaDB as the underlying database. That lesson used Docker, and the main steps were: spinning up a virtual machine on AWS, running the JanusGraph server, running a Gremlin Console to connect to the new server, spinning up a three-node ScyllaDB cluster, setting it as the data storage for the JanusGraph server, and performing some basic graph operations.

In this lesson, you will learn about the JanusGraph data model, how data is persisted using ScyllaDB as a backend for JanusGraph, and you’ll see an example of persistence in case of server failure.

This lesson does not depend on the previous lesson. However, it is recommended that you start there as it covers more basic concepts not discussed here.

Data Model, JanusGraph and ScyllaDB

The three main building blocks of the data model in JanusGraph are:

Vertices (or nodes) represent entities. They can have labels, but unlike edge labels, vertex labels are optional.
Edges represent relationships between nodes. They connect vertices and have a label that defines the relationship. For example, an edge with the label “father” denotes that relationship between the two vertices it connects.
Properties on vertices and edges are key-value pairs. Each property key has a data type.
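
As a rough illustration of these three building blocks, here is a minimal Python sketch. It is not JanusGraph's API: the class names Vertex and Edge, the ids, and the second person "guy_sr" are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Vertex:
    """An entity in the graph. The label is optional for vertices."""
    id: int
    label: Optional[str] = None
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A relationship between two vertices. The label is required for edges."""
    label: str
    out_vertex: Vertex
    in_vertex: Vertex
    properties: Dict[str, Any] = field(default_factory=dict)


# Two labeled vertices, each with a key-value property.
child = Vertex(id=1, label="person", properties={"name": "guy"})
parent = Vertex(id=2, label="person", properties={"name": "guy_sr"})

# An edge labeled "father" pointing from the child to the parent.
father_edge = Edge(label="father", out_vertex=child, in_vertex=parent)

# A vertex without a label is still valid in this model.
unlabeled = Vertex(id=3)
```

In this sketch an Edge cannot be built without a label, while a Vertex can, which mirrors the rule described above.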

It’s recommended to explicitly define the schema. However, it can also be implicitly defined.

For more information on the data model, see the JanusGraph documentation.

JanusGraph can be used with many databases, including relational ones. When using JanusGraph with ScyllaDB, the benefits are similar to those of using ScyllaDB in general:

Your data becomes highly available
It’s easy to scale your data storage layer by adding (or removing) nodes, with no downtime required
There is no single point of failure, as all nodes are equal, and there is no master/slave architecture
The system is easy to maintain and auto-tunes. There is no need to worry about garbage collection and the like (as there is with Cassandra)
Last but not least, ScyllaDB is open source, and you avoid vendor lock-in.

You can learn more about these and other ScyllaDB attributes in the Essentials course.

When working with JanusGraph and ScyllaDB, here are some of the important settings (in the file conf/janusgraph-cql.properties), with example values:

Replication Factor: storage.cql.replication-factor=3
Read consistency level: storage.cql.read-consistency-level=LOCAL_QUORUM
Write consistency level: storage.cql.write-consistency-level=LOCAL_QUORUM
Keyspace name: storage.cql.keyspace=scyllaks
Local datacenter: storage.cql.local-datacenter=us-west1
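
To see how the example replication factor and consistency levels interact, here is a small Python sketch of the usual quorum arithmetic. It assumes the replication factor of 3 applies to the local datacenter; the function names quorum and tolerated_failures are made up for this illustration and are not part of JanusGraph or ScyllaDB.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM or LOCAL_QUORUM request."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while quorum requests still succeed."""
    return replication_factor - quorum(replication_factor)


rf = 3  # the example replication factor from the settings above
print(quorum(rf))              # 2
print(tolerated_failures(rf))  # 1
```

With these example values, reads and writes at LOCAL_QUORUM each need two of the three replicas, so the graph stays available when a single replica is down.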

Other settings deal with security, encryption, and so on. You can learn more in the JanusGraph Configuration Reference.

System Setup and Persistence

If you haven’t done so yet, start the cloud server, and set it up, as described in the section System Setup in the previous lesson.

At this point, you should have your three-node ScyllaDB cluster up and running and a running JanusGraph server.

This lesson assumes you are starting with a fresh graph.

Open a new terminal and connect to the server (replace the user and Public IPv4 DNS address placeholders below with those of your server):

ssh -i ~/Downloads/aws/xxx.pem <user>@<Public IPv4 DNS address>

Next, in this new terminal, connect to one of the ScyllaDB nodes with the cql shell:

docker exec -it scylla-node1 cqlsh

Check which keyspaces are defined:

desc keyspaces

You can see that only the default system keyspaces with the “system” prefix are defined.

Next, in a new terminal tab, open a Gremlin console and connect to the server (you might have to replace the Docker network name, ec2-user_web in this example):

docker run --rm --network ec2-user_web --link janusgraph-server:janusgraph -e GREMLIN_REMOTE_HOSTS=janusgraph -it janusgraph/janusgraph:latest ./bin/gremlin.sh

:remote connect tinkerpop.server conf/remote.yaml

:remote console

Now back in the cql shell, reexamine the keyspaces:

desc keyspaces

A “janusgraph” keyspace is defined. Let’s examine it:

use janusgraph;

desc tables

In JanusGraph with ScyllaDB (or Cassandra) as a data storage layer, all the nodes and edges are stored in a table called edgestore.

select * from edgestore;

Since our graph is empty, you can see that the table has no rows.

JanusGraph stores data using the Bigtable data model:

[Figure: Bigtable data model layout. Source: JanusGraph data model documentation]

When using ScyllaDB (or Cassandra) as the backend, the key in the layout above is the partition key and Column 1 is the clustering key. Together, they make up the primary key.
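
The partition key and clustering key relationship can be sketched in Python as a map of rows, where each row keeps its cells sorted by column. This is only a toy model under stated assumptions: the class WideRowTable and the byte values used for keys, columns, and values are hypothetical and do not reflect JanusGraph's actual binary encoding.

```python
from bisect import insort
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[bytes, bytes]  # (column, value)


class WideRowTable:
    """Toy Bigtable-style table: partition key -> cells sorted by column."""

    def __init__(self) -> None:
        self._rows: Dict[bytes, List[Cell]] = defaultdict(list)

    def put(self, key: bytes, column: bytes, value: bytes) -> None:
        cells = self._rows[key]
        # (key, column) acts as the primary key, so a repeated write overwrites.
        cells[:] = [cell for cell in cells if cell[0] != column]
        insort(cells, (column, value))

    def row(self, key: bytes) -> List[Cell]:
        """Return all cells of one partition, ordered by the clustering key."""
        return list(self._rows.get(key, []))


table = WideRowTable()
# Hypothetical cells for a single vertex, written out of order.
table.put(b"\x01", b"\x24", b"name=guy")
table.put(b"\x01", b"\x02", b"exists")

for column, value in table.row(b"\x01"):
    print(column.hex(), value)
```

All cells that share a key live in one partition, and reading a partition returns them in column order regardless of the order in which they were written.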

Back in the Gremlin console, add a vertex to the graph:

g.addV('person').property('name', 'guy')

And in the cql shell terminal, check what happens in the table:

select * from edgestore;

The key column stores the id of the vertex you just added. 0x02 states that the vertex exists.

You can read more about JanusGraph’s data model in the documentation.

Since we’re using ScyllaDB as the backend, we’d expect our graph to be persisted even if there is a failure in the server/client.

To see this, stop and restart janusgraph-server.

From the Gremlin console, check that there is indeed one node in our graph:

g.V().count()

From a third terminal, connect to the server, list the Docker containers, and stop the JanusGraph server:

docker ps -a

docker stop janusgraph-server

Now, if you try to execute a command in the Gremlin console, you will get a connection error:

g.V().count()

From the third terminal tab, restart the JanusGraph server:

docker start janusgraph-server

In the Gremlin console, reconnect to the server:

:remote connect tinkerpop.server conf/remote.yaml

And by reading the graph vertices, you can see that the graph still has the single vertex we previously added:

g.V().count()
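
The reasoning behind this result can be modeled with a short Python sketch. It is a simplified simulation, not the real JanusGraph or ScyllaDB behavior: the classes Storage and GraphServer are hypothetical stand-ins for the ScyllaDB cluster and janusgraph-server.

```python
class Storage:
    """Stands in for the ScyllaDB cluster: it outlives the server process."""

    def __init__(self) -> None:
        self.vertices = []


class GraphServer:
    """Stands in for janusgraph-server: it keeps no graph data of its own."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage
        self.running = True

    def stop(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def _require_running(self) -> None:
        if not self.running:
            raise ConnectionError("server is not running")

    def add_vertex(self, label: str, **properties) -> None:
        self._require_running()
        self._storage.vertices.append((label, properties))

    def count(self) -> int:
        self._require_running()
        return len(self._storage.vertices)


server = GraphServer(Storage())
server.add_vertex("person", name="guy")
count_before = server.count()

server.stop()
try:
    server.count()
    failed_while_stopped = False
except ConnectionError:
    failed_while_stopped = True

server.start()
count_after = server.count()
print(count_before, failed_while_stopped, count_after)  # 1 True 1
```

Because the server object only forwards requests to the storage object, stopping and restarting it does not affect the stored vertex.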

Summary

This lesson covered more advanced topics in JanusGraph and ScyllaDB.

You learned about the data model JanusGraph uses and the advantages of using ScyllaDB as a data layer for JanusGraph.

Next, you saw how data is persisted in CQL and what happens if the JanusGraph server fails. Since we used a client-server model with ScyllaDB as the storage layer, the data is persisted even if the JanusGraph server or client fails.
