Column-oriented database management systems: what do they deliver for analysts?

Author: Krishnamurthi, Malini
Position: Report
  1. INTRODUCTION

    Reacting to recent increases in computer processing power and memory size, as well as an explosion in the size of data warehouses, software developers have begun introducing new commercial database products based on an alternative paradigm: the column-oriented database management system or "column-store." Known to researchers since the 1970s, these specialized systems are organized to focus on individual columns within records rather than entire rows. This approach is said to sacrifice easy updatability and place a greater burden on processors in exchange for increased query speeds, a lighter input/output burden, and greater overall efficiency. A review of current research was conducted to test the veracity of these claims so that corporate data analysts can determine whether column-stores will truly deliver the proposed benefits. The results provide ample justification for the use of column-stores instead of traditional row-oriented systems for analytical applications, and they paint a clear picture of the benefits that analysts can obtain by employing column-stores.

    Given the current popularity of column-stores as a research topic, there is an abundance of literature describing their history, characteristics, and attendant issues. With the perspective of the professional analyst in mind and a focus on business-related concerns, both academic and corporate literature was reviewed in order to produce a document that captures the essence of the subject and delivers a clear recommendation. The items chosen for review included both scholarly works written for presentation at major international database conferences and papers produced by database manufacturers. A concise discussion of the topic and a resulting recommendation for analysts evolved out of the information extracted from the literature.

    The remainder of this paper is organized as follows: The preceding paragraphs introduced the background of the topic. Section two presents the characteristics and features of the two types of database in question. Section three presents a discussion, and the paper ends with a conclusion.

  2. THE TWO DATABASES

    A proper understanding of column-stores requires a fair amount of background information, so we begin with a brief description of conventional row-oriented databases.

    2.1 FEATURES OF ROW-ORIENTED DATABASES

    Conventional row-oriented database management systems (also known as "row-stores") store all attributes of a given record in a row spread across multiple pages on a disk. Identification and tracking focus on entire rows, and when queries are run, the DBMS scans all fields in the current row before moving on to the next. This places less demand on processors than a column-oriented process, but it increases the time spent on disk access (Holloway & DeWitt, 2008). As a result, this structure slows queries, although it handles updates quickly and efficiently. The speed reduction can be mitigated somewhat by creating indexes that pre-identify certain values relevant to common queries, but the indexes themselves consume system resources (Loshin, 2009).
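
    The trade-off described above can be illustrated with a minimal Python sketch. It is not drawn from any of the cited works: the sales table, the field layout, and the function names are all hypothetical, and counting the fields touched is only a rough stand-in for disk access. The sketch scans whole rows to answer a query about a single attribute, then shows how an index narrows the scan at the cost of an extra structure that must be stored and maintained.

```python
# Hypothetical sales table held row by row, as a row-store might keep it.
# Each tuple is (id, name, region, amount); the data is made up.
rows = [
    (1, "Alice", "East", 120.0),
    (2, "Bob", "West", 75.5),
    (3, "Carla", "East", 210.0),
    (4, "Dev", "North", 42.0),
]
REGION, AMOUNT = 2, 3  # positions of two fields within each row


def total_by_scan(region):
    """Sum amounts for one region by reading every row in full."""
    total, fields_read = 0.0, 0
    for row in rows:
        fields_read += len(row)  # the whole row is fetched, used or not
        if row[REGION] == region:
            total += row[AMOUNT]
    return total, fields_read


def build_region_index():
    """Map each region to the positions of its rows (extra storage)."""
    index = {}
    for pos, row in enumerate(rows):
        index.setdefault(row[REGION], []).append(pos)
    return index


def total_by_index(index, region):
    """Sum amounts for one region, reading only the rows the index names."""
    total, fields_read = 0.0, 0
    for pos in index.get(region, []):
        fields_read += len(rows[pos])
        total += rows[pos][AMOUNT]
    return total, fields_read


def insert_row(row):
    """An update touches a single place: the new row is appended whole."""
    rows.append(row)
```

    In this toy table the full scan reads all sixteen stored fields to total one region, while the indexed version reads only the two matching rows. The index, however, is a second structure that would have to be updated whenever a row is inserted.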

    2.2 FEATURES OF COLUMN-ORIENTED DATABASES

    Bosswetter (2009) noted that the main feature of a column-store is the splitting of each single table into a series of relations, with one for each attribute or column in a given record. Hanson and Price (2010) further elaborated that each attribute is stored in its own set of disk storage locations, or "pages," which contain only values for that particular attribute. During a query, the relevant column values are retrieved and reassembled, while columns that are not required are neither scanned nor retrieved. This places a high demand on processors but reduces the need for disk access. The integrity of each record can be maintained despite the separation in storage because each part of a given record will usually be in the same position in each column. This allows the data itself to serve as an index (Loshin, 2009). The structure is optimized for data retrieval, while the separate storage and handling of columns makes updates difficult (Harizopoulos, Abadi, & Boncz, 2009).
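
    A minimal Python sketch, again hypothetical and not taken from the cited works, shows the same idea in miniature. Each attribute lives in its own list, a query reads only the lists it needs, and a record is reassembled by taking the value at the same position from every list. The table contents and function names are assumptions made for illustration, and the count of values read is only a rough proxy for input/output.

```python
# Hypothetical sales table held column by column. Position i in every
# list belongs to the same record; the data is made up.
columns = {
    "id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Carla", "Dev"],
    "region": ["East", "West", "East", "North"],
    "amount": [120.0, 75.5, 210.0, 42.0],
}


def total_for_region(region):
    """Sum amounts for one region, touching only two of the four columns."""
    total, values_read = 0.0, 0
    for pos, value in enumerate(columns["region"]):
        values_read += 1  # one value from the region column
        if value == region:
            total += columns["amount"][pos]  # same position, other column
            values_read += 1
    return total, values_read


def reassemble(pos):
    """Rebuild a full record from the values stored at one position."""
    return tuple(col[pos] for col in columns.values())


def insert_record(record):
    """An update must write to every column; returns the number of writes."""
    for name, value in zip(columns, record):
        columns[name].append(value)
    return len(columns)
```

    Here the query reads six values rather than every stored field, because the id and name columns are never touched. Inserting one record, by contrast, requires a separate write to each of the four columns, which hints at why updates are harder in this layout.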

    2.2.1 HISTORY OF COLUMN STORES

    Researchers created the first column-stores in the 1970s and found that the new structure streamlined the processing of queries against large datasets by reducing input/output demand. However, the low processor speeds and memory sizes of that era negated this benefit because the performance level of the hardware forced systems to access their...
