The data model used in this application, plus a recap of basic concepts like keyspace, table, replication factor, and so on. This example application uses the Time Window Compaction Strategy (TWCS) and TTLs.
So we have a cluster running, whether in the cloud or on your own on-premise machines. Let's continue with the second part, which is basically to get an application consuming the database and start using it. That said, hold your horses, cowboy, because before moving to an application we really need to think a little bit about the data model.

There are some rules of thumb that you need to think about. The first thing, which I already mentioned, is the replication factor: you always need to adjust the replication factor to fit your use case and your architecture. By default we suggest replication factor 3 and NetworkTopologyStrategy. You can populate the data model with cqlsh (I'm showing some example commands at the top of the slide), but you still have to think about the data model and be diligent about how you select your partition keys and your clustering keys. The clustering keys give the ordering within a partition, but with a clustering key you also add rows to a certain partition, so it influences the partition's size. And of course, the more distinct partition keys you have, the better the data is distributed among the CPUs. You don't want to stick everything into a single partition key; you want to spread your data over multiple partition keys, and ideally not over too many clustering keys, because then you'll get big partitions.

How does this translate into the demo we will show today? I mentioned that we want replication factor 3, so that's how we write it here, and I also mentioned that you want to use NetworkTopologyStrategy. I would like to emphasize NetworkTopologyStrategy on this slide; it's very important, because most people just copy SimpleStrategy when they see it in examples. Never use SimpleStrategy. I even suggest using NetworkTopologyStrategy when you are testing in your local environment, because then any change to the topology of the cluster, like the number of nodes or datacenters, is automatically accounted for, and you will see its effect immediately in development as well. What is tricky in development is the replication factor, because developers usually don't start three nodes; they just have one node, and with one node it's very hard to achieve replication factor 3. So if you have just one node, go only for replication factor 1, and of course if you have two nodes, go for replication factor 2. That is basically the only problem you might see when using this setting for the keyspace.

Here I have the table that we will use today. The application I will present basically collects stocks: there is the symbol of the stock, there is the date when the stock value was gathered, and there is the value of the stock, i.e., what the price was at that date. I mentioned that it's very important to pick a correct primary key. If you think about my primary key, it's not really ideal, but I still think it's good enough: it is the symbol of the stock, and what Scylla does is run it through the murmur3 hash, and this hash is translated to a token, which should give you a good, even distribution of those symbols among the CPUs in your cluster.
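To make this concrete, here is a minimal CQL sketch of the keyspace and table as described above. The keyspace name stocks_demo, the datacenter name dc1, and the exact column types are my assumptions for illustration; the actual demo schema may differ:

    -- Keyspace with NetworkTopologyStrategy and replication factor 3;
    -- drop the factor to 1 if you develop against a single node
    CREATE KEYSPACE IF NOT EXISTS stocks_demo
        WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

    -- One partition per stock symbol; rows inside the partition
    -- are ordered by the clustering key "date"
    CREATE TABLE IF NOT EXISTS stocks_demo.stocks (
        symbol text,     -- partition key, hashed with murmur3 into a token
        date   date,     -- clustering key, orders rows within the partition
        value  decimal,  -- price of the stock on that date
        PRIMARY KEY (symbol, date)
    );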
What is interesting here in my design is that it's a flaw and a deliberate design decision at the same time; it's actually a good design if you think about it, and I'll explain it on the next slide. I'm using date as the clustering key, and if you know how this works, you might tell me, "you have a mistake there, developer." Yes, it might look like a mistake, but I'm thinking about the cardinality of the date here: my plan is to limit this date range using TTLs, and that's why the single partition for a single symbol will not grow that much. If you want to have smaller partitions, you would need a composite partition key, but I will not show composite keys during this demo; I will leave that up to you. You can look at composite keys in the documentation; it's very good to understand them, and I think it's especially good to understand them for time series data. And this is time series data, because we will measure the price of the stock every day.

Because we will have a stock application that collects the prices of the stocks over one year, this is where the second part of the data modeling comes in. You need to think: I don't really want to have big partitions. Even though the row is not that big, because it's only symbol, date, and value, you still don't want to have, say, 100,000 records per partition. That's why I'm continuing with the data model in a way where I will use a TTL of one year. You can set a TTL on the table; how to set it up is not on the slide, but I will just assume that you will look up how this can be done in the documentation.

I'm showing something else that is more important here, and this is the strategy for time series data. It's a good practice, or a best practice, to schedule your data to fit within 12 to 13 windows. This will split your data into roughly 13 buckets, you can call them that, and then if you are looking up some older or historical data, you only need to scan the appropriate window: Scylla will know that this data belongs to a certain window and will just go there. That window will already be compacted for you, and of course when a window goes past the TTL, Scylla will immediately know that the whole window has expired and will throw it away. So in this case I will have, in the worst case, 13 windows at most, because there will be at most one old window that is still waiting to TTL out. This translates to my window size: if the retention is one year, I need to divide 365 days by 12, and this comes to roughly 31 days, so my window is 31 days.

While accepting that, you need to consider one more thing: once a window is closed, it's not really good to go and modify that window. So you should also plan your time series data to not overwrite historical data, i.e., not write outside of the current window. If you back-populate some data within the current window, that's okay, compactions will take care of it, but if you start doing that for older windows, you might get into trouble, because those writes will overlap two windows, and that comes with a visible performance penalty, and you don't want that.
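Since the exact TTL and compaction setup is deferred to the documentation in the talk, here is a hedged sketch of how the one-year TTL and the 31-day TWCS windows described above could be declared. The table name follows the earlier sketch and the numbers follow the reasoning in this section, so treat it as an illustration rather than the exact demo configuration:

    -- Expire rows after one year (TTL is in seconds: 365 * 24 * 3600)
    -- and compact the data into ~31-day time windows (~12-13 windows total)
    ALTER TABLE stocks_demo.stocks
        WITH default_time_to_live = 31536000
        AND compaction = {
            'class': 'TimeWindowCompactionStrategy',
            'compaction_window_unit': 'DAYS',
            'compaction_window_size': 31
        };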
So, I have a question here: how big will the partition be if we collect prices daily? That's actually very easy to work out: divide 365 by the 31-day window and you get about 12 windows, which tells you how much of the data is touched at one time. The partition itself will be 365 rows at maximum, but what you actually need to scan sits inside one window, so roughly 31 rows.
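As a usage sketch, a query for one month of one symbol touches a single partition, and with TWCS the scan is limited to the SSTables of the window(s) covering that date range. The symbol 'AAPL' and the dates here are made-up values for the example:

    -- Reads one partition; TWCS limits the scan to the matching window(s)
    SELECT date, value
      FROM stocks_demo.stocks
     WHERE symbol = 'AAPL'
       AND date >= '2023-01-01'
       AND date <  '2023-02-01';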
Before I go further: maybe you didn't catch most of the stuff I said, but it's all well documented, so you can look at the documentation about compaction strategies, and you can look at the other compaction strategies as well. Note that TWCS is not the default one; the default strategy is either Size-Tiered (STCS) or Incremental (ICS). So you can study the Time Window strategy and the others later and compare which one fits your use case.