This blog article should appeal to a wide audience. This article is not specific to DB2, but speaks to the larger role of the database in the IT organization.
What is a Database?
At the most basic level, a database is an organized collection of information. Technically, you could consider a filing cabinet a database. Even an MS Excel Spreadsheet could be considered a database. Nearly any collection of information can be considered a database. Traditionally, a Relational Database is what is meant when someone uses the term “Database”, but this is not universally true.
You need a database to store and access information. That information could be on any topic and come from any source. Often businesses have many databases, with each likely to be focused on data that is used in different ways. There may be separate databases for an E-commerce system and for internal payroll, for example. How many databases are used is extremely variable and dependent on the size and organization of a business.
The databases most frequently used in recent years are relational databases that roughly follow E.F. Codd’s rules for relational databases. Relational databases follow a consistent system of organization that includes obejects like tables that store the data, relationships between those tables, and rules about what kind of data can be in a table. This is a highly flexible model that has served us well for many years.
The language used to retreive data from relational databases is SQL (sometimes called structured query language). SQL is a language that focuses on describing the data desired in terms of the relationships between tables, without knowing where on disk the data is stored.
What is a Database Management System?
A Database Management System or DBMS is the software that is used specifically to manage data stored. Popular database management systems include IBM’s DB2, IBM’s Informix, Oracle, Microsoft’s SQL Server, MySQL, Postgres, and many others.
Most of the systems listed are most often thought of as Relational Database Management Systems (RDBMSes), but they are only relational if the objects in them conform to the rules for relational databases. They all have ways of being used as distictly non-relational. They can handle key-value stores, Object-Orientation, XML, and many other organizational schemes.
A Database Management System generally:
- Controls where data is stored on disk
- Translates SQL into the details of how data is to be retreived
- Controls access to data
- Provides transactional control, including ACID and transaction logging
- Provides objects like Indexes, Views, and Stored Procedures that are complimetary to the data
- Provides tools for backup, recovery, high availability, and disaster recovery
What is ACID?
ACID is a term used to refer to several properties of transactions. One of the points of using a database management system is to provide a system in which changes to data are not lost once they are considered committed. Most relational database management systems can be configured to allow a recovery of data to be done to any milisecond in the last day or week or month. They are not dependent on only being able to recover to the time at which a backup was last taken.
ACID stands for:
- Atomicity means that all aspects of a transaction either fail or succeed together.
- Consistency means that each transaction will bring the database from one consistent state to another – that no transaction will violate the defined rules.
- Isolation means that the data is changed in the same way by two transactions executing concurrently as it would be if those transactions were executed serially.
- Durability means that once a transaction is completed successfully, it is externalized to disk or somewhere else such that if the computer lost power (or had another error), the transaction would still be in place in the permanently stored data.
What is No-SQL?
No-SQL is a flavor of database that has become popular in recent years. It specifically rejects the relational model for one or more other models that may be more efficient for specific use cases. Common No-SQL models include key-value stores, graph databases, and document databases. The feature set provided in No-SQL databases is highly variable and must be understood thoroughly before being chosen to be used in a certain context.
Over time, some No-SQL vendors have offered partial support for SQL because of SQL’s flexibility in allowing the description of the data to be retreived without needing to know exactly where or how the data is stored. Because of this, it is sometimes now thought of as Not-Only-SQL.
No-SQL database management systems are less likely to offer rock-solid transactional control (particularly ACID), and recoverability should be investigated thoroughly. They introduce concepts like “eventual consistency” that work well for some applications and not so much for other applications.
Where Databases Live
Databases often live on their own dedicated server or cluster of servers. Databases should nearly never be accessible directly from the internet, and are often behind layers of firewalls. Even companies worth only a few million may store billions of dollars worth of data.
Often a three-tier architecture is used for internet applications. The first layer, which is directly accessible from the internet is the web server. The web servers can talk to the application servers, which are not directly accessible from the internet. The application servers then talk to the database servers, which are not directly accessible from the internet or even from the web servers.
Why a Dedicated Database Administrator is Needed
I continually run into people who think their application support team can just support the DBMS as if it were another application. I don’t believe this is true. Specialized support is required for a database. The thorough understanding of the database concepts laid out here in excruciating detail and dozens more is required. Even more so, specialized knowledge of the specific vendor of the database management system and how that vendor implements these concepts is critical. Each system has peculiarities in how various areas are handled and the best ways to accomplish a given task. Trying to support a database without a skilled professional puts one of the most valuable resources of a business – its data – at risk. A serious database failure can easily put a company out of business or lose it millions.