Generally, online analytical processing (OLAP) and online transaction processing (OLTP) are two different data processing systems designed for different purposes. OLAP is optimized for complex data analysis and reporting, while OLTP is optimized for transactional processing and real-time updates.
ScyllaDB has slightly different names for the same thing:
OLTP = Interactive, requests are generated individually as a result of some interactive process (website clicking, user-specific client application)
OLAP = Batch, requests normally are part of larger computations and usually will have some dependencies, i.e – some queries will run which will determine the next set of queries to run etc…
This topic covers the Workload Types, characteristics, and real-world examples.
Transcript
Workload types.
I took it from the AWS website.
They are trying to explain when to use OLAP vs.
OLTP, which is the basic two
workload characteristics that we have.
One OLAP is for analytical processing, which means that it crunches
a lot of numbers and does very long computations.
It consumes a lot of data.
And the second one is OLTP, which is transactional.
Even though we don’t have transactions in Scylla per se.
Still, a lot of those characteristics are apply
for real time users and their expectations from the database responsiveness and
what it implies.
So Scylla
has slightly different names for the same thing.
So OLTP becomes interactive.
Those are requests that are generated quite individually,
so by some users like web
clicking or user specific client application
or some phone app that needs like a small piece
of information to present in real time.. And the other one OLAP is batch.
Batch requests are
normally part of a larger computation that involve scans.
And probably they are multi-stage,
meaning that the first some portion of the data is scanned
and then the client application decides what to do
or which queries to execute next.
So the queries also have some correlation to them, or at least
they are kind of dependent.. So they are executed, maybe parallelly
but with the bounded concurrency and one after the other.
So this is the main characteristic in
as a summary.
So as far as concurrency go, OLTP often has very high concurrency
because it’s generated by different users and it’s independent.
So we have high concurrency.
It can even be viewed from the Scylla side as an unbounded concurrency
if it’s a very large number.. While batch has low to medium concurrency
because most of the time the application will have some number of threads
that executes the computation.
Each thread executes like a query at the time, so the
concurrency is bounded.
About timeout requirements, so for OLTP
or Interactive, it’s typically low.
So imagine a user clicking a button.
It doesn’t want to wait several seconds for the page to load simply
because the database hasn’t responded yet. And for OLAP or batch,
it can be even multiple seconds or several tens of seconds.
It can be tolerated because most of the time it’s
used to generate like large reports and then the data,
the computation result is saved for later use.
So just as an example,
consider some stock market website where the user can view
two kinds of information.. One is a real time stock data.
This is the OLTP the interactive, so a user just want to browse
through the stocks and get the real time
data for the performance of the stock.
And maybe this website also offers some kind of insights
about some stocks based on history
results or all sorts of computations.
So those computations normally are done
beforehand and are just saved.
The results are saved into the database and just fetched.
So the computation itself is a batch computation
and it can take a very long time.
So, as for the interactive workload, it’s a real time data.
So it has some short validity window to it.
We don’t want to stall those requests too much.
The concurrency is probably high because it’s a factor
of the number of users in the website.. So we cannot anticipate it completely.
Every user can issue several queries on the database side
simultaneously, so it’s can be effectively unbounded
and it’s also uncorrelated or independent.
So the users request the data without any dependency between each other.
So you can kind of experience
it in waves where there are times that you get a lot of requests that
at a very short period of time and other times
where you have low activity, for example, when the stock market is closed
about the batch computation,
it’s by any means not real time, of course.
The computations probably are done
at some points during the day.
Typically, the administrator or whoever runs the process
will choose a period of the day where the database activity
from the real time side is low, but it’s not mandatory.
If the computation takes a full day, so you can’t choose this period,
at least not for the stock market.. Then the results are kind of saved
and retrieved later.
It probably involves some scan
or partial scan queries that might take long time to complete.
So probably we can tolerate long timeouts.
The requests are also
probably correlated, or at least the dependent,
which means that a bunch or a batch of queries will run.
Then there will be some computation and then another batch of the query
will run in order to fetch more information.