Database Performance with Openstack Swift Improvements – Part 3

Posted on: February 22nd, 2012 by stephenbroeker

In Part 2 of this blog post series, I proposed replacing SQLite with MySQL as a database engine and changing the database schema to use chunking. These changes ensure that database performance is consistent for Account and Container Databases.

But what about Objects? As previously stated, Object data is stored in files. Are there problems with this?

Yes. Storing Object data in files is nice in that it is simple and Object data is easy to find – just look in the file system. But what about performance?

A file system adds a lot of overhead to the equation; functionality is not free. There are a number of problems with this approach.

Database Performance Problem 1: Buffer Cache

File systems use an in-memory buffer cache to speed up repetitive, sequential, small file IO. But Swift Objects are written once, with no partial updates or partial reads: an Object is always rewritten in full and always read in full. So the buffer cache consumes a lot of memory without improving performance. In fact, it degrades performance because of the added overhead.

Database Performance Problem 2: Multi Use of the File System

You will recall that the file system is also used to store the Account and Container databases. In addition, XFS supports Extended Attributes. These allow name/value pairs to be attached to a file. Swift uses XFS Extended Attributes to store user defined headers (meta data).

The end result is that the file system serves three purposes: database, meta data, and object data. This results in poor disk performance. Each of these components uses the file system in a different way, so their access patterns interfere with one another and cause constant disk head seeks. For the best performance, we want to minimize seeks and keep the disk heads moving in one direction.

Database Performance Problem 3: Using File System Meta Data

Querying file system meta data performs poorly. File systems are great at moving data but terrible at querying it, because traversing inodes and data blocks is expensive. Consider the code to do an “ls”. This is basically a straightforward problem: get the inode for a directory, perform multiple reads on the inode data, then filter and sort the results. Yet it takes a substantial amount of code. Now consider the case where the meta data is in a database: the “ls” code becomes a single database query.
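To make the “single database query” point concrete, here is a minimal sketch of a listing done as one query. It is an illustration only: the `object` table is a hypothetical subset of the Object Table columns, the `list_objects` helper is my own name, and Python's built-in `sqlite3` module is used purely so the example runs on its own (the same statement would work against MySQL).

```python
import sqlite3

# Hypothetical, simplified Object Table; sqlite3 keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object ("
    " name TEXT PRIMARY KEY,"
    " size INTEGER,"
    " deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO object (name, size) VALUES (?, ?)",
    [
        ("photos/dog.jpg", 3400),
        ("photos/cat.jpg", 1200),
        ("photos/old.jpg", 10),
        ("docs/readme.txt", 80),
    ],
)
# Mark one row as deleted so the listing has something to filter out.
conn.execute("UPDATE object SET deleted = 1 WHERE name = ?", ("photos/old.jpg",))


def list_objects(conn, prefix):
    """Return live object names under a prefix, filtered and sorted by one query."""
    rows = conn.execute(
        "SELECT name FROM object WHERE deleted = 0 AND name LIKE ? ORDER BY name",
        (prefix + "%",),
    )
    return [name for (name,) in rows]


print(list_objects(conn, "photos/"))
```

The filtering and sorting that an “ls” implementation has to code by hand are expressed here in the WHERE and ORDER BY clauses. Note that this sketch does not escape `%` or `_` characters in the prefix.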

Database Performance Problem 4: All Components in a Single Repository

The database, meta data, and object data are all in the same file system. They thus are all in the same repository. Thus if the repo is down, then nothing is available. Now consider moving the meta data into the database and moving the database to a different repo. The result is that the database is still available if the file system is down. And the file system is still available if the database is down.

Database Performance Solution

The solution to these four problems is to move the file system meta data into the database and then move the database to another repo. In addition, scrap the file system and change the database to point directly into raw disk partitions. This means that Object Data would no longer be stored as files. Instead, the Object Table (in the Container Database) would record the partition name, partition offset, and object size. An object would thus be defined by the 3-tuple (partition, offset, size). We thus remove the file system, with all of its complexity and overhead, and use raw disk partitions. To summarize:

  1. Replace SQLite with MySQL.
  2. Separate the database repo from the file system.
  3. Remove the file system.
  4. Move all meta data into the database.
  5. Change Object Table to use raw disk partitions.
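As a rough illustration of the (partition, offset, size) idea, the sketch below writes two objects back to back and reads one of them using nothing but its tuple. Everything here is an assumption made for the example: the `ObjectLocation`, `write_object`, and `read_object` names are hypothetical, and an ordinary temporary file stands in for a raw disk partition so the code can run anywhere.

```python
import os
import tempfile
from typing import NamedTuple


class ObjectLocation(NamedTuple):
    """The 3-tuple that would live in the Object Table (hypothetical name)."""
    partition: str  # device path; here an ordinary file stands in for a raw partition
    offset: int     # byte offset of the object within the partition
    size: int       # object length in bytes


def write_object(partition, offset, data):
    """Write the whole object at the given offset and return its location."""
    with open(partition, "r+b") as f:
        f.seek(offset)
        f.write(data)
    return ObjectLocation(partition, offset, len(data))


def read_object(loc):
    """Read the whole object back using only its (partition, offset, size)."""
    with open(loc.partition, "rb") as f:
        f.seek(loc.offset)
        return f.read(loc.size)


# Stand-in "partition": an empty temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    partition_path = tmp.name

first = write_object(partition_path, 0, b"hello")
second = write_object(partition_path, first.offset + first.size, b"swift object")
first_data = read_object(first)
second_data = read_object(second)
print(second, second_data)

os.remove(partition_path)
```

A real implementation would also need free-space management and I/O aligned to the device block size, neither of which is shown here.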

That’s how I’ve fixed the problem. How have you solved these database performance issues?

Openstack Swift Database Performance Improvements Part 2

Posted on: February 15th, 2012 by stephenbroeker

In my previous Openstack post I laid the groundwork for my proposed solution to improve Openstack Swift database performance. The solution consists of two parts: MySQL and Database Chunking.

Database Performance with Openstack Improvement: MySQL

For the first part of the solution, I propose replacing SQLite with MySQL as a database engine. As the name implies, SQLite is fine for small databases, but has performance problems with larger Openstack databases. MySQL is perfect for this problem, since a database is represented as a file system directory, and database tables are represented as files.

Openstack Performance Improvement: Database Chunking

For the second part of the solution, I propose using Database Chunking, that is, breaking up the Container and Object Tables into chunks. The result would be database queries on reasonably sized tables. The table structure would be tiered, and each table would be of type “index” or “data”. An Index Table points to the next level of tables, which are either “index” or “data”. Data Tables contain the actual table data and are thus the leaf nodes of the table schema.

The optimal size of each table chunk would have to be determined through experimentation, but for purposes of argument, let us assume a table chunk size of 100,000 rows. So for an empty Account, the Container Table schema would be:

Container Index 1 -> empty

After the first container is created, the Container Table schema would be:

Container Index 1 -> Container Data 1
Container Data 1 -> 1 row

When container number 100,001 gets created, the Container Table schema would be:

Container Index 1 -> Container Data 1, Container Data 2
Container Data 1 -> 100,000 rows
Container Data 2 -> 1 row

So Container Index 1 can point to 100,000 Data Tables of 100,000 rows each, and can therefore map 100,000 × 100,000 = 10 billion containers. When container number 10,000,000,001 gets created, the Container Table schema would be:

Container Index 1 -> Container Index 2, Container Index 3
Container Index 2 -> Container Data 1 ... Container Data 100,000
Container Index 3 -> Container Data 100,001
Container Data 1 -> 100,000 rows
...
Container Data 100,000 -> 100,000 rows
Container Data 100,001 -> 1 row
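The arithmetic behind these schemas can be sketched in a few lines. This is only an illustration under the assumed chunk size of 100,000 rows: the `locate` and `index_levels` functions are hypothetical names, and containers are numbered from 1 in creation order, as in the examples above.

```python
CHUNK_SIZE = 100_000  # assumed chunk size from the text, not a tuned value


def locate(container_number, chunk_size=CHUNK_SIZE):
    """Map a 1-based container number to (data table number, row within that table)."""
    if container_number < 1:
        raise ValueError("container numbers start at 1")
    data_table = (container_number - 1) // chunk_size + 1
    row = (container_number - 1) % chunk_size + 1
    return data_table, row


def index_levels(total_containers, chunk_size=CHUNK_SIZE):
    """Return how many tiers of Index Tables are needed for this many containers."""
    levels = 1
    capacity = chunk_size * chunk_size  # one index tier: chunk_size tables of chunk_size rows
    while capacity < total_containers:
        capacity *= chunk_size  # each extra tier multiplies capacity by chunk_size
        levels += 1
    return levels


for n in (1, 100_001, 10_000_000_001):
    print(n, locate(n), index_levels(n))
```

Container 100,001 lands in row 1 of Container Data 2, and container 10,000,000,001 lands in row 1 of Container Data 100,001 while forcing a second tier of Index Tables, which matches the schemas above.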

This Database Table Schema will essentially scale forever. Thus these database changes will ensure that database performance is consistent for Accounts and Containers. But what about Objects? In my next blog post, I will first identify problems with Swift Object data storage and then present a solution.

In the meantime, how do you deal with Objects?

Openstack Swift Database Performance Part 1

Posted on: February 8th, 2012 by stephenbroeker

In my last two posts (Python’s Strengths & Weaknesses) I have been describing the operation of Openstack Swift Storage. Swift storage basically consists of four components: Ring, Database, Zones, and File system. I’m proposing some performance improvements to this design. But first we need to understand the Swift database schema. An Openstack Account Database consists of two tables: Account Stat and Container. And a Container Database consists of two tables: Container Stat and Object.

Openstack Account Stat Table

The following is a detailed view of the Openstack Database Account Stat Table:
account
created_at
put_timestamp
delete_timestamp
container_count
object_count
bytes_used
hash
id
status
status_changed_at
metadata

Openstack Container Table

And the following is a detailed view of the Container Table:
ROWID
name
put_timestamp
delete_timestamp
object_count
bytes_used
deleted

Notice that both the Account Stat Table and the Container Table have deleted attributes. These attributes are required because rows in these tables are never deleted; they are just marked as deleted. The reason is that deleting a row would require some type of synchronization (locking), in case another thread was accessing the same database. And we all know that locking in the Cloud is a very bad thing: it would destroy scaling. So these tables are append or update only; deletes are not allowed.
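The mark-as-deleted pattern looks roughly like the sketch below. This is not Swift's actual code: the table is a trimmed-down version of the Container Table columns listed above, the helper names are hypothetical, and Python's built-in `sqlite3` module is used so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE container ("
    " name TEXT PRIMARY KEY,"
    " put_timestamp TEXT,"
    " delete_timestamp TEXT,"
    " deleted INTEGER DEFAULT 0)"
)


def put_container(conn, name, timestamp):
    """Append a row for a newly created container."""
    conn.execute(
        "INSERT INTO container (name, put_timestamp) VALUES (?, ?)",
        (name, timestamp),
    )


def delete_container(conn, name, timestamp):
    """Mark the row as deleted with an UPDATE; the row itself is never removed."""
    conn.execute(
        "UPDATE container SET deleted = 1, delete_timestamp = ? WHERE name = ?",
        (timestamp, name),
    )


def live_containers(conn):
    """List only the containers that have not been marked as deleted."""
    rows = conn.execute("SELECT name FROM container WHERE deleted = 0 ORDER BY name")
    return [name for (name,) in rows]


put_container(conn, "a", "1328659200.0")
put_container(conn, "b", "1328659201.0")
delete_container(conn, "a", "1328659202.0")

total_rows = conn.execute("SELECT COUNT(*) FROM container").fetchone()[0]
print(live_containers(conn), total_rows)
```

After the delete, only "b" is listed, yet the table still holds two rows. That is exactly why these tables only ever grow.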

This is all well and good for performance, but what happens when these tables grow? The Account Stat Table will never grow; it will always have one and only one row. But the Container Table will grow with time, as containers are created for the account. So what happens to SQLite performance when the Container Table gets large? First, since an SQLite database is a file, file performance will degrade as the file grows. Second, database query performance will also degrade as the file grows.

Container Database

The following is a detailed view of the Container Stat Table:
account
container
created_at
put_timestamp
delete_timestamp
object_count
reported_put_timestamp
reported_delete_timestamp
reported_object_count
reported_bytes_used
hash
id
status
status_changed_at
metadata

Object Table

And the following is a detailed view of the Object Table:
ROWID
name
created_at
size
content_type
etag
deleted

Notice that both the Container Stat Table and the Object Table have deleted attributes, just like the Account Database, and for the same reasons. So both the Container and Object Tables will have performance problems as containers and objects are created over time.

Next time I’ll propose a solution that consists of two parts: MySQL and Database Chunking.