So I debated about even writing this post. But the truth is that I find database history fascinating. I’m sure this post has nothing on the Wikipedia article, but I’m going to give it a go anyway. Note that this is my own, sometimes unsupported, view of database history and may include inaccuracies.
The early history of databases
As a DB2 DBA and a former IBMer, I’m biased towards IBM’s history of databases – in which IBM seems to claim that it developed the first computer database either for American Airlines in 1962 (SABRE) or for the moon mission in 1964 (see http://www.dbisoftware.com/db2nightshow/20110425-Z01-roger-miller.pdf). When I researched this blog post, I could find very little to support this outside of IBM sources. In my head I will continue to link early computer databases to the space program – if for no other reason than I dreamed of being an astronaut as a child, and this is the closest I will ever come!
I say first ‘computer’ database intentionally, because examples of organized collections of data (the true definition of a database) are everywhere before the computer age. Ancient libraries and encyclopedias were certainly examples – even simple filing cabinets could be considered such. But obviously when I say database for the rest of this post, I’m referring to something on a computer.
Early databases were hierarchical, and could only be navigated based on their pre-defined structure. You had to know the structure, including fixed-width fields, in order to access the data, and you couldn’t run a query as we know it today – relationships other than one parent to one child were not easy to define. Even when they did manage to attach multiple children to a parent, you couldn’t relate children to each other. They were slightly better than flat files, but bore more resemblance to flat files than to the relational databases of today.
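To make the difference concrete, here is a minimal sketch in Python. It is only an illustration, not how any real hierarchical DBMS stored data: the customer names, the part names, and both helper functions are made up for the example, and SQLite simply stands in for "a relational database". The hierarchical version has to walk the structure parent by parent, while the relational version just states what it wants.

```python
import sqlite3

# Hierarchical style (hypothetical data): each parent record owns its
# children, and the only way to reach a child is to walk down from its parent.
hierarchy = {
    "ACME": {"orders": [{"id": 1, "part": "bolt"}, {"id": 2, "part": "nut"}]},
    "Globex": {"orders": [{"id": 3, "part": "bolt"}]},
}


def customers_ordering(part):
    """Answer 'who ordered this part?' by visiting every parent and child."""
    found = []
    for customer, record in hierarchy.items():
        for order in record["orders"]:
            if order["part"] == part:
                found.append(customer)
                break  # one match is enough for this customer
    return sorted(found)


# Relational style: the same facts in one table, queried declaratively.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, part TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "ACME", "bolt"), (2, "ACME", "nut"), (3, "Globex", "bolt")],
)


def customers_ordering_sql(part):
    """Same question, but the database decides how to find the rows."""
    rows = conn.execute(
        "SELECT DISTINCT customer FROM orders WHERE part = ? ORDER BY customer",
        (part,),
    )
    return [row[0] for row in rows]


print(customers_ordering("bolt"))
print(customers_ordering_sql("bolt"))
```

Both functions give the same answer here. The point is that the first one only works because the code knows the exact shape of the structure, while the second would keep working if the table gained columns or the question changed.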
The generally accepted father of the relational database (http://en.wikipedia.org/wiki/Relational_database) is E. F. Codd, who worked for IBM and published a paper on the topic in 1970. One of the interesting things to me is where the term “relational” comes from. Some may assume it comes from the fact that defining relationships between tables is an integral part of a relational database, but in fact ‘relation’ is another name for a table in relational algebra. Based on a user comment, though, a table would actually be a ‘bag’ in relational algebra, which shows that I have a computer business degree and not a computer science one. The premise that the relational database is not named for the relationships defined between data is still solid, though: it is named based on relational algebra. As the commenter below stated, however, relational algebra is not about the relation or table itself, but about manipulating it.
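The set-versus-bag distinction is easy to see in a few lines of Python. This is just a sketch using Python's built-in `set` and `collections.Counter` as stand-ins for the mathematical ideas; the rows are the product names and release years mentioned in this post, with one row deliberately duplicated.

```python
from collections import Counter

# Three rows as they might sit in a table with no key; one is a duplicate.
rows = [("Oracle", 1979), ("DB2", 1983), ("Oracle", 1979)]

# A relation is a set of tuples: a duplicate tuple cannot exist in it.
relation = set(rows)

# A bag (multiset) remembers how many times each tuple occurs.
bag = Counter(rows)

print(len(relation))            # number of distinct tuples
print(sum(bag.values()))        # number of rows, duplicates included
print(bag[("Oracle", 1979)])    # how many times this row appears
```

A table with a primary key cannot hold the duplicate row in the first place, which is why such a table behaves like a relation rather than a bag.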
It seems that IBM did not recognize the power of E.F. Codd’s concept, and did not devote the resources to developing it into a product as early as it could have. When it was developed, E.F. Codd was not directly in charge, and thus relational databases today do not exactly match some of his key points – though part of the reason for that may be the gap between theory and reality, too.
It is interesting to note that no “Relational” DBMS of today actually meets all of the rules E.F. Codd set forth to define a relational database. http://en.wikipedia.org/wiki/Codd%27s_12_rules
Which came first – DB2 or Oracle?
Several Oracle DBAs I’ve met proudly proclaim that Oracle was the “first relational database”. Oracle became “commercially available” as a relational database in June of 1979 (http://en.wikipedia.org/wiki/Oracle_Corporation), which is before DB2. IBM had “System R” internally before that, and “SQL/DS” in 1981 (later renamed to DB2 for VM/VSE). DB2 as such was released in 1983.
More recent DB2 history
What we call DB2 UDB or DB2 for LUW first became available in the early 90’s. At that time, it included OS/2 support as well.
It gets hairy when you try to describe the dramatic changes in the product since its inception. For many features, Oracle, DB2, or some other RDBMS could lay claim to being the “first” to implement them, or the first to implement them in a certain way. RDBMSes sure have come a long way since their inception.
I started as a DBA back in 2001 – shortly after DB2 UDB version 7 became available – and worked on versions as old as 5. Enhancements that I particularly remember are: compressed backups, online LOAD, LOAD without locking the whole tablespace, online REORG, the ability to shrink DMS tablespaces, all of the online memory changes and STMM, moving archive logging away from userexit and into the db cfg, drastic locking changes with both registry parameters and the new currently committed behavior, drastic improvements in data propagator, HADR, and TSA integration for HADR. They say the GUI is better with each version, and I do give it a try from time to time, but I’ve never really been fond of GUIs. There are other things I know about and think sound great, but that I just haven’t gotten a chance to try yet, like data compression, true XML, and pureScale.
Relational vs. NoSQL
This is a debate I hear from developers from time to time. They question the need to even use a relational database. I’m obviously very biased on this topic, but I just don’t think that NoSQL databases I’ve seen provide the transaction control, concurrency, and flexibility that a relational database can provide. If all you’re using a database for is to store static settings, well, yeah, NoSQL might be better for that – but if you’re running an OLTP database with concurrent users, I just don’t see NoSQL as a viable option.
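For anyone wondering what I mean by transaction control, here is a small sketch using Python's built-in `sqlite3` module. The `accounts` table, the names, and the `transfer` helper are all invented for the example, and SQLite is just a convenient relational engine to demonstrate with; this says nothing about what any particular NoSQL product can or cannot do. The idea is that two updates either both happen or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # no overdrafts allowed
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(amount, src, dst):
    """Move money as one unit of work; undo both updates if either fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit is undone too.
        return False


def balances():
    """Return the current balances as a dict of name to amount."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(500, "alice", "bob"), balances())
print(transfer(30, "alice", "bob"), balances())
```

The first transfer fails because it would overdraw the source account, and the credit that had already been applied is rolled back with it. The second succeeds and both rows change together.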
So, want to argue or agree with me on anything? Comments are always welcome, even if it’s to point out something I missed or something that I’m wrong on. If anyone has nice detailed links or documents on DB2 or database history, I’d love to read them.
Corrected/updated on 2/6/2012 based on user comment.
Interesting article. Until we have unlimited RAM, a DB is needed. It will be the main core of all applications. Ironically, it will be the first thing that the applications blame for their poor coding. I find the debate over who was first entertaining and mostly semantics.
All DBAs, no matter what dialect, will agree: the relational format was a profound change in the industry. (Yes, my roots are the aforementioned hierarchical.) Whichever flavor you pick, all the work comes from the roots and founder of the relational model: Dr. Codd.
So even with super-fast storage (unlimited RAM), won’t we still need a way to organize the data? And a way to query the same data in different ways?
(and thanks for reading the blog – always thrilled to get comments and even more so from people that I’ve learned so much from)
Nice write-up… just a quick correction (or perhaps, more accurately, an extended explanation). OS/2 Extended Edition Database Manager was the precursor to DB2 UDB. OS/2 1.0 was released in December, 1987. OS/2 1.20 EE was not released until early 1989. Eventually IBM completely rewrote the database manager software in the Toronto Lab. At this point Database Manager became DB2. The name was appended with a slash and extra description depending on the OS it ran on: DB2/2 for OS/2 and DB2/6000 for the RS/6000. It wasn’t until later in the 1990s that the name UDB was used.
Thanks for the details!
Nice write up. Regarding the debate about when the first commercial relational data base was released, I would direct you to the website http://www.multicians.org which chronicles the history and people associated with the Multics operating system. In 1976 Honeywell released MRDS (Multics Relational Data Store). See: http://www.multicians.org/history.html and http://www.mcjones.org/System_R/mrds.html.
As described, the architects of MRDS were heavily influenced by the System R work done by E.F. Codd and others at IBM. The frustration of the System R proponents at IBM’s decision to not produce a commercial version was paralleled by the Multics and MRDS proponents’ dismay at Honeywell management’s unwillingness to place bigger bets on Multics and MRDS.
MRDS on Multics had a couple of key technical advantages over the implementation of System R, which really helped commercial viability. The Multics operating system had built-in support, implemented in hardware, for information sharing, ACL-based access control, and inter-process communication, all available to user-level programmers. None of these were available in the IBM operating system at the time.
Although Honeywell exited the general-purpose computer business, and Multics was never a great commercial success, in the early to mid 1980s, Multics systems were used by several very large enterprises, including Fortune 10 companies and government agencies, to handle their largest data management applications.
Interesting. Am I correct in understanding that MRDS on Multics is no longer a going commercial concern? Are there still legacy systems using it?
Honeywell Multics is no longer available. The website http://www.multicians.org covers the whole history, including the shutdown of the last site in 2000. Honeywell transferred its general-purpose computer business to the French company Bull H.N. in 1986.
A number of important inaccuracies:
First, a relation is not a table. A relation is a set, a table (without a natural key) is a bag.
Second, neither the definition of a relation nor that of a table has anything to do with relational algebra. The relational algebra is about the manipulation, not the nature, of a relation. What we can say is that relations are defined on set theory, and the relational algebra and calculus are defined on predicate logic.
Thank you for the corrections. I’ll update the post based on this. Despite having a college degree with a focus in database management and over 10 years of experience as a DBA, I’ve never done relational algebra.
You are welcome!
It is a pity education has gotten so bad, we have to educate ourselves…
Not so fast. I refer both you and Leandro to the Wikipedia entry on relational algebra: http://en.wikipedia.org/wiki/Relational_algebra which discusses relational algebra, sets, and relational data bases.
Other references include this article here on relational algebra: http://c2.com/cgi/wiki?RelationalAlgebra and to the paper here: http://infolab.stanford.edu/~widom/cs346/ioannidis.pdf which discusses relational algebra in the context of query optimization.
Some more history: I was part of a small marketing team at Honeywell responsible for Multics and put together the marketing materials which attempted to explain the concept of relational data bases to people who literally had never heard of them before. In order to do this, we had to interpret the complex technical jargon from the IBM Systems Journal articles by Ted Codd and company and explain it in terms people could understand. To give you a sense of the language those papers were written in, see: http://en.wikipedia.org/wiki/Relational_model.
In this effort, I talked about Set Theory, with presentation slides of Venn Diagrams to describe the join (i.e. AND) and the union (i.e. OR) of two sets (i.e. DOMAINS), and went on to talk about elements of sets and tuples. In order to explain this to people who were familiar with then-contemporary commercial network and hierarchical data bases, I would tell them that a 5-tuple was the same as a data base record with five fields.
We never talked about tables. That came later. When I first heard relational data bases described as a bunch of tables with rows and columns, my reaction was, why didn’t I think of that? Because that’s what effective marketing is all about. It was described by a VP marketing mentor of mine as “taking the square pegs that engineering produces and fitting them into the round holes the customers have.”
Leandro is right about one thing. Tables have nothing to do with relational algebra. In this context, tables have to do with marketing. But if I were Leandro, I’d be thankful the marketeers abused the mathematics. If they hadn’t, generations of data base architects and programmers would be in other lines of work.
Awesome, thanks so much for the details!
This seems to be the first database ever…
As much as I’d like IBM not to have been involved, I won’t pretend they weren’t.
However, any database of that era was not relational.
I am just suggesting that this database should not be left out of a history of databases (even if it’s not relational). It can easily be categorized as one of the most important databases in the world, due to its consequences. I stress that it should be a historical landmark. Of course, it’s just an opinion.