Databases, Indexing ft. Designing Data-Intensive Applications — Part 1
Databases, The beginning
Databases and indexing are hot topics for software engineers to read up on. It’s very helpful to know what kind of database to use, how databases perform indexing, and how we can leverage it. Fortunately for us, Chapter 3, Storage and Retrieval, of the book Designing Data-Intensive Applications covers exactly this. I found this chapter mind-blowing, and it is indeed one of the coolest reads about databases. I will write a summary of this chapter today, but I strongly recommend reading the chapter completely. It will definitely beat the best TV show you might have seen (yeah, GOT and FRIENDS too 😉, okay okay, I haven’t seen Money Heist).
Why bother about databases? You obviously won’t be writing your own database, but while designing an application you need to serve the requirements with a suitable database, and knowing how databases work, their pros and cons, and the ways to leverage their specialties makes you more powerful 😎.
How does the Simplest Database (Log-Structured) look?
- Let’s say the format of the data to store is a key-value pair; the value can be a string, JSON, etc. For example:
KEY1 : {NAME : SACHIN, CITY: MUMBAI}
- We will use a text file to store this data.
- To add new data, we will append a new key-value entry to the file.
- To get the data, we will do a linear search from the end of the file and return the first match (this gives the latest updated value of a key).
- Complexity of adding new data: O(1), as we are just appending to the file.
- Complexity of getting the value of a key: worst case O(N), as we iterate over all the log entries from the last line of the file.
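Here’s a minimal sketch of this log-structured database in Python (the file name and the comma-separated line format are my own choices, just for illustration):

```python
DB_FILE = "database.log"  # hypothetical file name for our log

def db_set(key, value):
    # O(1) write: just append "key,value" as a new line at the end of the file.
    with open(DB_FILE, "a") as f:
        f.write(f"{key},{value}\n")

def db_get(key):
    # O(N) read: scan every line and keep the last match, which is
    # the most recently written value for the key.
    result = None
    with open(DB_FILE) as f:
        for line in f:
            k, _, v = line.rstrip("\n").partition(",")
            if k == key:
                result = v
    return result

db_set("KEY1", "{NAME: SACHIN, CITY: MUMBAI}")
db_set("KEY1", "{NAME: SACHIN, CITY: PUNE}")
print(db_get("KEY1"))  # -> {NAME: SACHIN, CITY: PUNE}
```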
Now the data retrieval time is high here, but we need a faster retrieval rate. So let’s maintain metadata which acts as a signpost to the address of the desired key. This is an index: an additional structure derived from the primary data. One can add, remove, or change an index without affecting the primary data. It speeds up reads, but since this additional structure must be updated on every write, it slows down writes.
Simple Database + Hash Index:
We can use a hash table to read data faster from disk. The hash table data structure can access a key’s value in O(1), but that is in memory. How can it support a large volume of data which cannot fit in memory? Answer:
- Store the primary data on disk and update the in-memory hash table on every write.
- The hash table’s key is the key of the inserted data, and its value is the byte offset at which the record was stored in the disk file.
- To read data, look up the key in this table, get its offset, and make a disk call to read the data at that offset in the file.
- And that, my friends, is your very own index. Not the best one, but it is the basis of more complex indexes.
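Here’s a rough sketch of that hash index on top of the same toy log (binary mode so the offsets are plain byte positions; again just an illustration):

```python
DB_FILE = "database.log"
index = {}  # in-memory hash table: key -> byte offset of the record in the file

def db_set(key, value):
    with open(DB_FILE, "ab") as f:   # binary append: offsets are byte positions
        index[key] = f.tell()        # remember where this record starts
        f.write(f"{key},{value}\n".encode())

def db_get(key):
    if key not in index:
        return None
    with open(DB_FILE, "rb") as f:
        f.seek(index[key])           # one seek instead of a full scan
        line = f.readline().decode().rstrip("\n")
    return line.partition(",")[2]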
Simple Database + Segmentation and Compaction:
The log file contains redundant data, as one key can have multiple entries in the log while we only ever read the key’s last updated value, which can make the disk run out of space. How can we prevent this? Segment the file: let’s write key-values into a segment until it reaches a size limit and then start writing into a new segment. But we are still not removing the redundant data! We do that with compaction, which is:
- Merging the segment files by creating a new merged segment file.
- Keeping only each key’s last updated value.
- Ignoring the redundant old values of each key.
- Switching the read requests to the new segment.
- Deleting the old segments (which were used for merging) after the merge.
Assuming multiple writes to the same key on average, compaction reduces the disk usage.
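A sketch of compaction over the same toy format; given segment files ordered oldest to newest, it keeps only each key’s latest value (the function and path names are made up):

```python
def compact(segment_paths, merged_path):
    # segment_paths are ordered oldest -> newest, so later writes win.
    latest = {}
    for path in segment_paths:
        with open(path) as f:
            for line in f:
                key, _, value = line.rstrip("\n").partition(",")
                latest[key] = value   # a newer entry overwrites an older one
    # Write one entry per key into the new merged segment.
    with open(merged_path, "w") as out:
        for key, value in latest.items():
            out.write(f"{key},{value}\n")
    # After this, reads switch to merged_path and the old segments can be deleted.
```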
Simple Database + Hash Index + Segmentation and Compaction:
- Each segment will have its own in-memory hash table.
- A read request first checks the hash table of the most recently updated segment, then continues searching the hash tables of older segments in order of last-updated time.
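A sketch of that read path: per-segment hash tables checked from the newest segment to the oldest, returning the first hit:

```python
def db_get(key, segments):
    # segments: list of (hash_table, file_path) pairs, ordered newest -> oldest.
    for table, path in segments:
        if key in table:              # the newest segment containing the key wins
            with open(path, "rb") as f:
                f.seek(table[key])
                return f.readline().decode().rstrip("\n").partition(",")[2]
    return None                       # key not present in any segment
```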
Limitations:
- Hash tables can’t scale if the volume of keys becomes greater than the memory size.
- Range queries are inefficient: since the keys are not sorted, every key in the range must be looked up individually.
Now let’s read on and explore better indexing structures that overcome these limitations.
Simple Database + Sorted String Table (SSTable) + Segmentation and Compaction:
- In the simple database we appended each new entry at the end of the log file, but now we want the entries in each segment to be kept sorted by key. This structure is called an SSTable. We will discuss its advantages soon.
- Merging the segments is easier with SSTables, as the key-values in each segment are sorted; it’s basically merge-sort style merging.
- Since the keys are sorted, not all of them need to be stored in the hash table. The index stores keys at regular intervals, and for any query that falls inside an interval we just scan the keys in that interval until we find a match (see the sketch after this list).
- Since reads scan over an interval of keys anyway, we can group the records of each interval into a block and compress it, saving disk space and I/O.
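Here’s a sketch of a sparse index over one SSTable segment: only every Nth key is kept in memory, and a read seeks to the nearest preceding indexed key and scans forward (the sample keys and offsets are made up):

```python
import bisect

# Sparse in-memory index: every Nth key of the sorted segment and its byte offset.
sparse_keys = ["apple", "mango", "peach"]   # hypothetical sampled keys
sparse_offsets = [0, 4096, 8192]            # byte offsets of those keys

def sstable_get(key, segment_path):
    # Find the last sampled key <= the search key...
    i = bisect.bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None                         # key sorts before the whole segment
    with open(segment_path, "rb") as f:
        f.seek(sparse_offsets[i])
        # ...then scan forward; keys are sorted, so we can stop early.
        for raw in f:
            k, _, v = raw.decode().rstrip("\n").partition(",")
            if k == key:
                return v
            if k > key:
                break                       # passed the spot where key would be
    return None
```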
B-Trees Database:
- Like an SSTable, a B-Tree stores the data sorted by key.
- A log-structured DB breaks the database into variable-size segments and writes sequentially, but a B-Tree breaks the database into fixed-size blocks (pages). A page is a block of memory (e.g. 4 KB) which contains a series of references to other pages, in sorted order with respect to key. For example:
Page100 : [&Page100, &Page200, &Page300, &Page400, &Page500]
Page200 : [&Page210, &Page220, &Page230, &Page240, &Page250]
Page220 : [&Page220, &Page221, &Page222, &Page223, &Page224, ……]
- Search: Let’s say we have a B-Tree with the above pages, and the key to search is 222. The root page is Page100, so we get the Page200 address from Page100, the Page220 address from Page200, and the Page222 address from Page220. Note: Page222 is a leaf page.
- Update: Same as search; after finding the key’s page, we update the value.
- Insert: Find a page whose range can take the new key. If the page is full, split it into two and update the parent page’s pointers. (Nice algorithmic question, right? :P )
- Reliability: While updating, a B-Tree might also need to update parent pages, and if the DB crashes at such a point the index can end up in a corrupt state. To avoid this, a Write-Ahead Log (WAL) is maintained: an append-only file which must be updated before any operation is applied to the B-Tree. We can use this file to bring the B-Tree back to a consistent state after a crash. For concurrency, latches (lightweight locks) are used.
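Here’s a bare-bones sketch of the search described above, over in-memory pages (a real B-Tree reads pages from disk and handles splits, the WAL, and latches; this only shows the descent):

```python
import bisect

class Page:
    def __init__(self, keys, children=None, values=None):
        self.keys = keys          # sorted keys inside this page
        self.children = children  # len(keys)+1 child pages, or None for a leaf
        self.values = values      # values stored in a leaf page

def btree_search(page, key):
    # Walk down one page per level until we reach a leaf.
    while page.children is not None:
        i = bisect.bisect_right(page.keys, key)
        page = page.children[i]   # follow the child whose range covers the key
    # Leaf page: look up the key directly.
    i = bisect.bisect_left(page.keys, key)
    if i < len(page.keys) and page.keys[i] == key:
        return page.values[i]
    return None

left = Page(keys=[100, 200], values=["x", "y"])
right = Page(keys=[221, 222, 223], values=["a", "b", "c"])
root = Page(keys=[221], children=[left, right])
print(btree_search(root, 222))    # -> b
```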
SSTables vs B-Trees:
- SSTables usually have faster writes and slower reads compared to B-Trees, though this can vary with the workload.
- A B-Tree needs to write everything at least twice: first to the WAL and then to the page.
- SSTables compress better, as B-Tree pages can still have unused memory inside them.
- SSTables might not always perform well, as the background compaction process can interfere with ongoing reads and writes.
Clustered Index:
- The value of a key in an index can be the actual value (like a row in a relational DB) or the address of the actual value.
- When the index stores the actual value, it is known as a clustered index.
- When the value is the address of the actual value, it is a non-clustered index. In MySQL, secondary indexes point to the primary index.
- Primary indexes are usually clustered, whereas secondary indexes are non-clustered.
Multicolumn index:
- A combination of multiple columns can also be indexed to make queries more efficient.
- The query must filter on the columns in the same order as the indexed column sequence, i.e. a leading prefix of the indexed columns (see the sketch after this list).
- For multi-dimensional data (e.g. geospatial queries), R-Trees can be used to implement such indexes.
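A toy illustration of why the order matters: a multicolumn index behaves like a sorted list of tuples, so it only helps when you filter on a leading prefix of the indexed columns (the data is made up):

```python
import bisect

# Index on (city, name): entries are sorted by city first, then by name.
index = sorted([("mumbai", "sachin"), ("pune", "virat"), ("mumbai", "rahul")])

# Filtering on the leading column works: all "mumbai" entries are contiguous.
lo = bisect.bisect_left(index, ("mumbai",))
hi = bisect.bisect_right(index, ("mumbai", chr(0x10FFFF)))  # upper-bound sentinel
print(index[lo:hi])   # -> [('mumbai', 'rahul'), ('mumbai', 'sachin')]

# Filtering on "name" alone cannot use this index: the matching entries
# are scattered, so the whole index would have to be scanned.
```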
In-Memory Databases:
- We don’t store everything in RAM because it is not cheap and is not durable (data is lost on power failure).
- But as RAM is becoming cheaper, we can use it for relatively small datasets, handling the power-loss issue with battery-powered RAM or by writing logs to disk.
- When such a DB is restarted, it uses the disk or another replica to get back to its original state.
- Redis is a good example, where we can use data models like sets, which are difficult to implement efficiently with disk-based databases.
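For instance, using the redis-py client (this assumes a Redis server running locally on the default port; the key names are hypothetical):

```python
import redis

r = redis.Redis()  # connects to localhost:6379 by default

# Sets live in memory, so membership tests and set operations are cheap.
r.sadd("cities:visited", "MUMBAI", "PUNE")
r.sadd("cities:wishlist", "PUNE", "DELHI")

print(r.sismember("cities:visited", "PUNE"))          # True
print(r.sinter("cities:visited", "cities:wishlist"))  # {b'PUNE'}
```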
Okay, this might be getting too much; we will continue in the next post…….
If you have any feedback, please feel free to comment or reply to the email.
If you like this and want to read more, please share this and subscribe to the Newsletter.
Find me @darshan_kadu