The topics covered in this lesson are roles and permissions, confidentiality, non-repudiation, the relationship between roles, users, and permissions, and how this all comes together.
The second area I want to talk about, and I'm especially excited about it because I refactored a lot of code and added support for much of this, is roles, permissions, and role management.

An important concept in security is confidentiality. I hope you're all familiar with this one: it says that the things that should be secret remain secret, and that only the people who are allowed to access our data can access it. In ScyllaDB, this means we want to limit who can read certain parts of the database.

Another important concept in security is non-repudiation. The definition is that nobody can refute the state of the data, but I think that's a little hard to understand. What it really means is that we know exactly how the data came to be the way it is: who did what, when, and how they did it. And we can be sure of that information.
So the combination of describing what exists in the database and defining who can do what is called authorization. We define a resource as an identifier for something in the database: a resource can describe a table, for example, or a keyspace encompassing many tables, or even a particular role (I'll get to what a role is in a second). We also have a fixed set of permissions, like CREATE, ALTER, or SELECT. A particular CQL operation is allowed given a particular combination of a resource and a permission. For example, if Joe Smith needs to be able to query the events table, I enable that by granting him the SELECT permission on the events table. As with users, internally this is just a table in ScyllaDB, with columns for a user, a resource, and a set of permissions.
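To make this model concrete, here is a minimal Python sketch of a permissions table keyed by (user, resource), with a grant operation and the permission check that runs before each query. It is an illustration under simplifying assumptions, not ScyllaDB's implementation: the table is an in-memory dict, the `Permission` enum lists only a subset of CQL permissions, and the names `PermissionsTable`, `grant`, `lookup`, `is_authorized`, and the resource string `app.events` are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class Permission(Enum):
    """A fixed set of permissions, as in CQL (only a subset shown)."""
    CREATE = "CREATE"
    ALTER = "ALTER"
    SELECT = "SELECT"
    MODIFY = "MODIFY"


class PermissionsTable:
    """Hypothetical in-memory stand-in for the permissions table:
    one row per (user, resource), holding the set of granted permissions."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], set[Permission]] = defaultdict(set)

    def grant(self, user: str, resource: str, permission: Permission) -> None:
        self._rows[(user, resource)].add(permission)

    def revoke(self, user: str, resource: str, permission: Permission) -> None:
        self._rows[(user, resource)].discard(permission)

    def lookup(self, user: str, resource: str) -> frozenset[Permission]:
        # The extra query: fetch what was granted for this user and resource.
        # dict.get does not trigger defaultdict's factory, so no empty rows appear.
        return frozenset(self._rows.get((user, resource), set()))


def is_authorized(table: PermissionsTable, user: str, resource: str,
                  required: Permission) -> bool:
    """Allow the operation only if its required permission was granted."""
    return required in table.lookup(user, resource)


table = PermissionsTable()
table.grant("jsmith", "app.events", Permission.SELECT)
print(is_authorized(table, "jsmith", "app.events", Permission.SELECT))  # True
print(is_authorized(table, "jsmith", "app.events", Permission.MODIFY))  # False
```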
So checking permissions means that before executing any query, we first have to query this permissions table for the user and the resource, identify which permissions have been granted for that combination, and then make sure that the operation being executed is within that set of permissions. That means every time we execute a query, we have to run a second query to check the permissions. You might think there's a lot of overhead in doing that, and you'd probably be right, so we have a cache that holds all of this so you don't have to execute all these additional CQL queries. It's a time-based cache: after a certain amount of time an entry is evicted and the data is reloaded, and you can configure that time in the scylla.yaml configuration file. So this is a nice model, and it works fairly well.
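The caching idea can be sketched as a small time-to-live cache placed in front of the permissions lookup. This is a hedged illustration, not ScyllaDB's cache: `TTLPermissionsCache`, its `validity_seconds` parameter, the `loads` counter, and the injectable clock are hypothetical names chosen so the example can fake the passage of time deterministically.

```python
import time
from typing import Callable, Dict, FrozenSet, Tuple

Key = Tuple[str, str]  # (user, resource)


class TTLPermissionsCache:
    """Hypothetical time-based cache in front of a permissions lookup.

    An entry older than `validity_seconds` is treated as expired and is
    reloaded from the backing lookup on the next access.
    """

    def __init__(
        self,
        loader: Callable[[str, str], FrozenSet[str]],
        validity_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader          # stands in for the query on the permissions table
        self._validity = validity_seconds
        self._clock = clock            # injectable so the example can fake time
        self._entries: Dict[Key, Tuple[float, FrozenSet[str]]] = {}
        self.loads = 0                 # how many backing lookups were performed

    def get(self, user: str, resource: str) -> FrozenSet[str]:
        key = (user, resource)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._validity:
            return entry[1]            # still fresh: no extra query
        perms = self._loader(user, resource)  # missing or expired: reload
        self.loads += 1
        self._entries[key] = (now, perms)
        return perms


# Usage with a fake clock so expiry is deterministic.
grants = {("jsmith", "app.events"): frozenset({"SELECT"})}
fake_now = [0.0]
cache = TTLPermissionsCache(
    loader=lambda user, resource: grants.get((user, resource), frozenset()),
    validity_seconds=2.0,
    clock=lambda: fake_now[0],
)
cache.get("jsmith", "app.events")  # first access: loaded from the table
cache.get("jsmith", "app.events")  # still at t=0: served from the cache
fake_now[0] = 3.0
cache.get("jsmith", "app.events")  # expired: reloaded
print(cache.loads)  # 2
```

The design trade-off is staleness: a permission revoked in the table can remain in effect until its cached entry expires, which is the price paid for skipping the extra query on every request.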
But then you start to realize that there are difficulties in practice, mostly operational ones. If you have lots of different tables, lots of different types of data, and lots of users, then every time you add a new user you have to go through the hassle of granting permissions on the appropriate resources for that user. The problem is even more annoying when you want to change permissions: you may have to repeat the same change for many, many users. Even describing it to you now, it sounds incredibly tedious, and as we all know, when systems are hard to use and tedious, laziness usually wins. What people frequently do in the face of systems that are hard to use is try to subvert them. How would you subvert a system like this? You'd probably create a single user like developer, with a password like Nintendo, give everybody the same username and password, and everybody just uses that. The problem is that you're subverting the system and missing the point of having users entirely. All the benefits we've talked about go out the window: you have shared credentials, you don't have non-repudiation, you don't know who did what, and you can't limit access to information. It's a big mess.

So we want to avoid systems where doing the right thing is hard; we want to develop systems where doing the right thing is taken out of people's hands. A role is a concept that stems from the idea that if you have a large number of users, they're probably going to do similar kinds of things and need access to the same kinds of data. So we extract the set of permissions and resources into a named entity called a role. A role is just a collection of permissions on resources, and unlike users, a role can be granted to another role. Let me explain what that means with an example.
For example, all the analysts at our company are going to be assigned the role analyst, and all analysts need to be able to query the customers table. Joe Smith is an analyst, so we grant him the analyst role, which means that Joe Smith inherits all the permissions that have been granted to the analyst role.

Now you may be asking: what about users? The interesting thing is that all users are roles; roles are a generalization of users. I like to think of users as just roles that can log in to the system. There's a login flag: if a role is allowed to log in, I call it a user, and otherwise it's just a role. But really, roles encompass users, and roles are the abstraction we want to be thinking about here.

Implementing roles required lots of changes to the code, with refactoring and restructuring, but that's not really what I want to talk about. The most important thing is that, logically, roles and their relationships to each other form a directed graph. In this example, I have the user Joe Smith (jsmith), the user Tim Jones (tjones), and two roles, analyst and team_lead. I grant analyst to jsmith, so jsmith is an analyst. Tim is a team_lead, but I also want team_lead to have all the permissions of analyst, so I grant analyst to team_lead. In the graph that's formed, an edge from one node to another means that the first node is a member of the second: jsmith is a member of analyst, so jsmith is an analyst. If we want to identify all the roles of a particular user or role, we just follow the graph: tjones is a team_lead, following the next edge tjones is also an analyst, and then there are no more edges to follow. We implement this directed graph, again, as a table in CQL, storing for each node the collection of nodes its edges point to.
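Following the graph can be sketched in a few lines of Python. This is a minimal illustration, not ScyllaDB's code: it assumes a hypothetical in-memory dict of `Role` objects, where each role records the roles it is a member of (its outgoing edges), its directly granted permissions, and a `can_login` flag that makes it a user. `all_roles` walks the edges breadth-first, and a role's effective permissions are the union over every role it reaches. The helper names and the resource name `app.customers` are invented for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Role:
    # Hypothetical model: a "user" is simply a role with can_login=True.
    name: str
    can_login: bool = False
    member_of: set[str] = field(default_factory=set)                # outgoing edges
    permissions: set[tuple[str, str]] = field(default_factory=set)  # (resource, permission)


roles: dict[str, Role] = {}


def create_role(name: str, can_login: bool = False) -> None:
    roles[name] = Role(name, can_login)


def grant_role(role: str, grantee: str) -> None:
    """Add the edge grantee -> role: grantee becomes a member of role."""
    roles[grantee].member_of.add(role)


def all_roles(name: str) -> set[str]:
    """Every role reachable from `name` by following edges, including itself."""
    seen = {name}
    queue = deque([name])
    while queue:
        for parent in roles[queue.popleft()].member_of:
            if parent not in seen:  # never enqueue a node twice
                seen.add(parent)
                queue.append(parent)
    return seen


def effective_permissions(name: str) -> set[tuple[str, str]]:
    """Union of the permissions granted to every role `name` reaches."""
    perms: set[tuple[str, str]] = set()
    for r in all_roles(name):
        perms |= roles[r].permissions
    return perms


create_role("analyst")
create_role("team_lead")
create_role("jsmith", can_login=True)
create_role("tjones", can_login=True)
roles["analyst"].permissions.add(("app.customers", "SELECT"))
grant_role("analyst", "jsmith")     # jsmith is an analyst
grant_role("team_lead", "tjones")   # tjones is a team_lead
grant_role("analyst", "team_lead")  # team_lead gets all of analyst's permissions

print(sorted(all_roles("tjones")))      # ['analyst', 'team_lead', 'tjones']
print(effective_permissions("jsmith"))  # {('app.customers', 'SELECT')}
```

The `seen` set keeps the walk finite even if the graph contained a cycle, and it avoids visiting a role twice when two paths lead to it.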
What's very nice about this feature of roles is that once we enforce the difference between a user, who is an individual identity, and a function, which is what they're actually doing with the database, we get access to a whole lot of nice security features, in addition to the functionality features that Avi alluded to and that I'll mention again in a second.

One of them is auditing. Auditing is the means by which we record who is doing what to the database at any time. We can track who is issuing queries, which tables are affected, the IP address, the date, and the time. This feature is available in the enterprise edition of ScyllaDB, and you can output this information to ScyllaDB internal tables or to syslog. It's very configurable and very fine-grained. But if you don't maintain a proper separation between users and roles, you don't benefit from this information, because the audit log will only reflect very broad information that isn't particularly useful. Here is a very small example from our documentation. I think the formatting is a little bit off, I'm sorry about that, but you can see that tjones dropped the team_roster from the NBA, and I hear they're really angry about that.
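To illustrate why the separation between users and roles matters for auditing, here is a hypothetical sketch of audit records and a query over them. The `AuditRecord` fields mirror what is listed above (who issued the query, which table, the IP address, the date and time), but the field names, the `who_did` helper, the `nba` keyspace, and all sample values are invented for the example; this is not ScyllaDB's audit log format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical fields modeled on what the talk says is tracked;
    # this is not the schema of ScyllaDB's audit tables.
    role: str
    client_ip: str
    timestamp: datetime
    keyspace: str
    table: str
    operation: str


def who_did(log: list[AuditRecord], operation: str, table: str) -> set[str]:
    """Return the login names that performed `operation` on `table`."""
    return {r.role for r in log if r.operation == operation and r.table == table}


ts = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # illustrative timestamp

# With individual logins, the record names a person.
per_person = [
    AuditRecord("jsmith", "10.0.0.5", ts, "nba", "team_roster", "SELECT"),
    AuditRecord("tjones", "10.0.0.7", ts, "nba", "team_roster", "DROP TABLE"),
]
print(who_did(per_person, "DROP TABLE", "team_roster"))  # {'tjones'}

# With one shared "developer" login, every record names the same account,
# so the log can no longer say who actually dropped the table.
shared = [
    AuditRecord("developer", "10.0.0.5", ts, "nba", "team_roster", "SELECT"),
    AuditRecord("developer", "10.0.0.7", ts, "nba", "team_roster", "DROP TABLE"),
]
print(who_did(shared, "DROP TABLE", "team_roster"))  # {'developer'}
```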
Before I go on, I did want to come back to what Avi mentioned earlier in his keynote: the notion of roles as the way we interact with the database, rather than being tied to a particular user. Because a user can inherit a role, when we talk about features like role-based SLAs, or different kinds of workflows based on particular needs, we can use roles to express them and implement them internally. For example, if we have an analytics workflow and a real-time workflow, we can define an analytics role, run all our analytics queries with that role, and then define certain performance characteristics for queries executed by that role. This model lets us manage different access patterns in the database in ways we really couldn't before, when we had only the broad notion of users: how do you deal with an "analytics user" when a given session can only be logged in as one user at a time? So this is very useful and important stuff that has performance implications as well.