Posted by admin on May 10, 2010

The road to SQL started with Dr. E.F. Codd’s paper, “A Relational Model of Data for Large Shared Data Banks”, published in Communications of the ACM in June 1970. His colleagues at IBM, Donald Chamberlin and Raymond Boyce, were working on a query language (originally named SQUARE, Specifying Queries As Relational Expressions) that culminated in the 1974 paper, “SEQUEL: A Structured English Query Language”. Since that time, SQL has become the dominant language for relational database systems. In recent years, frameworks and architectures have arrived on the programming scene that attempt to hide (or completely remove) the use of SQL and relational databases, allowing developers to focus even more on user interfaces, business logic and platform support. We have also seen the rise of alternatives to relational databases, the so-called “NoSQL” data stores. Are we witnessing the end of SQL and relational databases?

In a December DDJ podcast interview by Mike Riley, I was asked the question: “With the growing trend of ORMs (Object Relational Mapping), some software developers are saying that SQL is becoming less relevant. What are your opinions about that statement?” Thinking over the holidays about that question and its implications, both for ORMs and beyond them, I spent some time looking more closely at frameworks like Ruby on Rails Active Record and Hibernate. These frameworks still rely on team members who work on the design, development and maintenance of relational databases. The work that Microsoft has done with LINQ also reduces the impedance mismatch between programming languages and database languages.

The “NoSQL movement” and cloud-based data stores are striving to completely remove developers from a reliance on the SQL language and relational databases. Some developers think that the NoSQL movement is something completely new. It is not. Object databases appeared in the 1980s, and Ray Ozzie started the commercial document-centric datastore business with Lotus Notes in the 1990s. Charlie Caro, Senior Software Engineer who works on Embarcadero’s InterBase SQL database engine, told me “At the time, it seemed almost heretical that a data model that ignored concurrency control could flourish. But Ozzie recognized the benefits of distribution, replication and ease-of-setup would outweigh the very infrequent occurrence of update conflicts in managing documents and messages. Even so, there were opt-in capabilities if documents needed to be locked for safe modification without data loss. But the default was to just let it fly with no update coordination.”

NoSQL, according to Wikipedia, is “an umbrella term for a loosely defined class of non-relational data stores”. The term in its current sense is credited to Rackspace employee Eric Evans. In his blog post last October, Eric talks about the name NoSQL (now generally taken to mean Not Only SQL). The real nugget in the post is “the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for.” Adam Keys, in his The Real Adam blog post, offered up an alternative term, “Post-Relational”. One goal of some NoSQL databases is to remove the overhead and memory footprint of relational databases. Other goals include closer affinity with programming languages, use of web technologies and RPC calls for access, and optional forms of data query.

In a recent blog post, “The ‘NoSQL’ Discussion has Nothing to Do With SQL”, Professor Michael Stonebraker discusses comparisons of SQL and NoSQL databases. SQL and NoSQL databases can be compared using any or all of the following features and characteristics. (Note: there are many more that can be added to this list for either or both types of databases. Post a comment with additional items that might characterize the differences between the two types.)

  • Vertically and Horizontally Scalable – relational databases traditionally reside on one server, which is scaled vertically by adding more processors, memory and storage. Relational databases residing on multiple servers usually use replication to keep the databases synchronized. NoSQL databases can reside on a single server but more often are designed to scale horizontally across a cloud of servers. (See “NoSQL: Distributed and Scalable Non-Relational Database Systems”.)
  • Columns, Key/Value Store, Tuple Store – relational databases are usually composed of a collection of columns in a table and/or view (fixed schema, join operations). NoSQL databases often store a combination of key and value pairs or tuples (schema free, an ordered list of elements).
  • In-Memory Dataset, On-Disk Storage – relational databases almost always reside on a disk drive or a storage area network. Sets of database rows are brought into memory as part of SQL select or stored procedure operations. Some (but not all) of the NoSQL databases are designed to exist in memory for speed and can be persisted to disk.
  • Document-Oriented, Collection-Oriented, Column-Oriented, Object-Oriented, Set-Oriented, Row-Oriented – Document-oriented databases contain records of documents, fields and XML. Collection-oriented datasets provide closer affinity to object-oriented programming languages. Relational databases are characterized by data organized in tables, rows and columns (typically Row-Oriented storage). SQL select operations normally return cursors to a single row or a set of rows containing the columns specified. Object-oriented databases have been around since the dawn of object-oriented programming, but relational databases are by far the dominant database architecture in use today (and will be for years to come). Are object databases NoSQL databases? The rise of object-relational mapping (ORM) frameworks marries object-oriented programming to the far more widely used relational databases. Data in NoSQL databases is often organized into objects, key/value pairs or tuples. NoSQL database selection operations are often done in code or through an interface.
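
To make the schema contrast in the list above concrete, here is a minimal Python sketch. It uses the standard library's sqlite3 module as a stand-in for a relational database and a plain dict as a stand-in for a key/value store. The table name, keys and fields are all invented for illustration and do not come from any of the products discussed in this series.

```python
import json
import sqlite3

# Relational side: a fixed schema, declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users (id, name, city) VALUES (?, ?, ?)", (1, "Ada", "London"))
conn.execute("INSERT INTO users (id, name, city) VALUES (?, ?, ?)", (2, "Grace", "New York"))

# SQL states what to select; the engine decides how to find it.
sql_rows = conn.execute("SELECT name FROM users WHERE city = ?", ("London",)).fetchall()

# Key/value side: no schema; each value is an opaque blob to the store.
kv_store = {}
kv_store["user:1"] = json.dumps({"name": "Ada", "city": "London"})
kv_store["user:2"] = json.dumps({"name": "Grace", "languages": ["COBOL"]})  # different fields are fine

# Lookup by key is direct...
ada = json.loads(kv_store["user:1"])

# ...but selecting by a field means scanning and filtering in application code.
kv_names = [
    doc["name"]
    for doc in (json.loads(raw) for raw in kv_store.values())
    if doc.get("city") == "London"
]
```

Both sides return the same answer here. The difference is where the work lives: in the database engine for SQL, in the application for the key/value store.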

In an email conversation, Charlie Caro told me the following: “If Facebook has to manage 100,000,000′s of user profiles, a distributed, context-free key-value store is probably the way to go. Simple lookup by large numbers of users but only a single updater of the profile by the owning user means a transactional database is probably overkill. Multiple readers, with one updater, needs no concurrency control.  In many cases, the simpler setup and ease-of-use of the NoSQL solutions is what attracts its user community. SQL databases require more setup (schema et al) but those schemas are exactly what give relational databases higher performance opportunity in parallel DBMS implementations. This ease of use premium finds its way in programming languages as well. Many developers today prefer scripting languages over their compiled counterparts, which have static type checking for safer programs. Scripting languages are just so forgiving and easy to get started with and projects to compile scripts into .NET/Java intermediate byte codes are increasing their performance.”  We both agreed that it’s really all about having the right tool for the job, and it always has been!  No sense hammering a screw into the wall when you can use a screwdriver.
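
The distributed key-value store Charlie describes can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each "node" is an in-process dict rather than a server, the class and key names are hypothetical, and keys are placed with a plain hash-modulo rule.

```python
import hashlib


class ShardedStore:
    """Toy key/value store that spreads keys across several in-process 'nodes'."""

    def __init__(self, node_count):
        # Each dict stands in for one server (an assumption for illustration).
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash, so the same key always lands on the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self._node_for(key)].get(key, default)


sharded = ShardedStore(node_count=4)
for user_id in range(1000):
    sharded.put(f"profile:{user_id}", {"id": user_id})

profile = sharded.get("profile:42")
sizes = [len(node) for node in sharded.nodes]  # how many profiles each node holds
```

A lookup touches exactly one node, which is why simple reads scale out so easily. The modulo rule has a known weakness: changing the node count moves most keys, which is why Dynamo-style stores use consistent hashing instead.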

For years to come, most of us will still rely on and use relational databases and SQL.  I certainly will.  I will also continue to look for better ways to isolate and encapsulate data access in my applications.  As always, any project decisions have to match with the user and business requirements.  For new projects, I believe, we have genuine non-relational alternatives on the table (pun intended).

In the first installment, I covered some background on the SQL language and relational databases, the present and future of relational databases, the rise of frameworks that hide some of the complexities of database programming, and the rise of the NoSQL databases. In this second installment I will take a walk through (some of) the currently available open and closed source NoSQL databases. Then, in part 3, I will point you to NoSQL Internet resources and past/upcoming events, and offer some guidance for developers.

It’s amazing to see how many open and closed source alternative data stores have appeared, and more are appearing every day. If I have left off one or more of your favorite NoSQL databases, post a comment. Below you will find many different types of NoSQL databases: Document-Oriented, Collection-Oriented, Column-Oriented, Object-Oriented, Graph-Oriented, Set-Oriented, Row-Oriented and more.

AllegroGraph

Company/Org: Franz Inc.
Type: Graph
Description: Modern, high performance, persistent graph database.
Storage: Disk based, meta-data and data triples.
API(s): SPARQL, Prolog

Berkeley DB

Company/Org: Oracle
Type: Key/Value
Description: C language embeddable library for enterprise-grade, concurrent, transactional storage services. Thread safe to avoid data corruption or loss.
Storage: B-tree, hash table, persistent queue
API(s): C, C++ and Java
Notes: Use the Berkeley DB XML layer on top of Berkeley DB for XML-based applications. Comparison of Berkeley DB and relational databases

BigTable

Company/Org: Google
Type: Sparse, distributed, persistent multidimensional sorted map.
Description: Distributed storage system for structured data. Data model provides dynamic control over data layout and format. Data can live in memory or on disk.
Storage: Data is stored as an uninterpreted array of bytes. Client applications can create structured and semi-structured data inside the byte arrays.
API(s): Python, GQL, Sawzall API, REST, various.
Notes: Overview: Bigtable: A Distributed Storage System for Structured Data (PDF format)
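
The “sparse, distributed, persistent multidimensional sorted map” description is easier to picture with a small sketch. The Python below models only the map itself, indexed by row key, column key and timestamp, with uninterpreted bytes as values. It ignores distribution and persistence entirely, and the class, method names and sample data are hypothetical, not Google's API.

```python
class SparseSortedMap:
    """Toy model of a (row, column, timestamp) -> bytes map."""

    def __init__(self):
        self._cells = {}  # only cells that were written exist, hence "sparse"

    def put(self, row, column, timestamp, value):
        self._cells[(row, column, timestamp)] = value

    def get(self, row, column):
        """Return the most recent value for a cell, or None if it was never written."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def scan(self, start_row, end_row):
        """Yield cells with start_row <= row < end_row, in sorted key order."""
        for key in sorted(self._cells):
            row, column, timestamp = key
            if start_row <= row < end_row:
                yield row, column, timestamp, self._cells[key]


table = SparseSortedMap()
table.put("com.example.www", "contents:", 1, b"<html>v1</html>")
table.put("com.example.www", "contents:", 2, b"<html>v2</html>")
table.put("com.example.www", "anchor:other.org", 1, b"Example")
table.put("org.other", "contents:", 1, b"<html>other</html>")

latest = table.get("com.example.www", "contents:")
scanned_rows = [row for row, _column, _ts, _value in table.scan("com.", "com/")]
```

Because rows are kept in sorted order, a range scan over a key prefix (here, everything starting with "com.") reads neighbouring rows together, which is what makes the choice of row key so important in this family of stores.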

Cassandra

Company/Org: Apache
Type: Dimensional hash table
Description: Highly scalable distributed database. Combines Dynamo’s distributed design and Bigtable’s column family data model.
Storage: Clusters of multiple keyspaces. The keyspace is a name space for column families. Columns are comprised of a name, value and timestamp.
API(s): Java, Ruby, Perl, Python, C#, Thrift framework.
Notes: Open sourced by Facebook in 2008. Wiki, FAQ, Examples
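
As a rough illustration of the storage description above (keyspace, column families, and columns made of a name, value and timestamp), here is a toy Python model. It is a sketch only: the names are hypothetical, nothing here is Cassandra's client API, and resolving competing writes purely by newest timestamp is a simplification of what the real system does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: bytes
    timestamp: int


# One keyspace: column family name -> {row key -> {column name -> Column}}
keyspace = {"Users": {}}


def insert(column_family, row_key, column):
    """Store a column, keeping whichever write carries the newest timestamp."""
    row = keyspace[column_family].setdefault(row_key, {})
    current = row.get(column.name)
    if current is None or column.timestamp >= current.timestamp:
        row[column.name] = column


insert("Users", "alice", Column("email", b"alice@example.com", 100))
insert("Users", "alice", Column("email", b"alice@new.example.com", 200))
insert("Users", "alice", Column("email", b"stale@example.com", 150))  # older write, ignored
insert("Users", "bob", Column("city", b"Oslo", 120))  # rows need not share columns

alice_email = keyspace["Users"]["alice"]["email"].value
bob_columns = set(keyspace["Users"]["bob"])
```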

CouchDB

Company/Org: Apache
Type: Document
Description: Distributed database with incremental replication, bi-directional conflict detection and management.
Storage: Ad-hoc and schema-free with a flat address space.
API(s): RESTful JSON API. JavaScript query language.
Notes: CouchDB Introduction, Technical Overview
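
Conflict detection in a document store is commonly built on revisions: an update must name the revision it was based on, and a stale revision is rejected. The Python sketch below shows that idea with integer revision numbers. It is a simplified stand-in with hypothetical class and method names, not CouchDB's actual HTTP API or revision format.

```python
class ConflictError(Exception):
    """Raised when an update is based on an out-of-date revision."""


class DocumentStore:
    def __init__(self):
        self._docs = {}  # document id -> (revision number, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)  # hand back a copy so callers edit their own version

    def put(self, doc_id, body, expected_rev=None):
        current_rev = self._docs.get(doc_id, (None, None))[0]
        if current_rev != expected_rev:
            raise ConflictError(f"{doc_id}: expected rev {expected_rev}, found {current_rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


doc_store = DocumentStore()
rev1 = doc_store.put("memo", {"text": "draft"})

# Two editors read the same revision...
rev_a, doc_a = doc_store.get("memo")
rev_b, doc_b = doc_store.get("memo")

# ...the first update succeeds and bumps the revision...
doc_a["text"] = "edited by A"
rev2 = doc_store.put("memo", doc_a, expected_rev=rev_a)

# ...so the second update, still based on the old revision, is rejected.
doc_b["text"] = "edited by B"
try:
    doc_store.put("memo", doc_b, expected_rev=rev_b)
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

No locks are held while the editors work; the conflict only surfaces at write time, and it is up to the application to decide how to merge or retry.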

db4o

Company/Org: Versant
Type: Object
Description: Java and .NET dual license (commercial and open source) object database.
Storage: Data objects are stored in the way they are defined in the application.
API(s): Java, .NET languages.
Notes: db4o database runtime engine, about db4o

Dovetaildb

Company/Org: Millstone Creative Works
Type: JSON-based
Description: Schemaless database similar to Amazon’s SimpleDB. Open source, standalone Java application server.
Storage: JSON data format, “bags” (similar to tables).
API(s): HTTP and Javascript APIs
Notes: Dovetaildb JavaScript API reference manual

Dynomite

Company/Org: Cliff Moon
Type: Key/Value
Description: Open source Amazon Dynamo clone written in Erlang.
Storage: Distributed key/value store, pluggable storage engines.
API(s): Thrift API
Notes: Dynomite Wiki

eXtreme Scale

Company/Org: IBM
Type: In-memory grid/cache
Description: Distributed cache processes, partitions, replicates and manages data across servers.
Storage: Data and database cache, “near cache” for local subset of data. Java persistent cache. Map reduce support.
API(s): Java APIs, REST data service
Notes: eXtreme Scale Document library web site

GT.M

Company/Org: FIS
Type: Hierarchical, multi-dimensional sparse arrays, content associative memory
Description: Small footprint, multi-dimensional array with full support for ACID transactions, optimistic concurrency and software transactional memory.
Storage: Unstructured array of bytes. Can be Key/Value, document oriented, schema-less, dictionary or any other data model.
API(s): Mumps, C/C++, SQL
Notes: GT.M FAQ

hamsterDB

Company/Org: Christoph Rupp
Type: Embedded storage library
Description: Lightweight embedded database engine. Supports on disk and in memory databases.
Storage: B+tree with variable length keys.
API(s): C++, Python, .NET and Java
Notes: hamsterdb FAQ, examples, tutorial

HBase

Company/Org: Apache
Type: Sparse, distributed, persistent multidimensional sorted map.
Description: Open source, distributed, column-oriented, “Bigtable like” store
Storage: Data row has a sortable row key and an arbitrary number of columns, each containing arrays of bytes.
API(s): Java API, Thrift API, RESTful API
Notes: Part of Apache Hadoop project. HBase Wiki, FAQ

Hypertable

Company/Org: Zvents Inc.
Type: Sparse, distributed, persistent multidimensional sorted map.
Description: High performance distributed data storage system designed to run on distributed filesystems (but can run on local filesystems). Modeled after Google Bigtable.
Storage: Row key (primary key), column family, column qualifier, time stamp.
API(s): C++, Thrift API, HQL
Notes: Hypertable Architectural overview, FAQ

Infinispan

Company/Org: JBoss Community
Type: Grid/Cache
Description: Scalable, highly available, peer to peer, data grid platform.
Storage: Key/Value pair with optional expiration lifespan.
API(s): Java, PHP, Python, Ruby, C
Notes: Infinispan FAQ, Wiki

InfoGrid

Company/Org:
Type: Graph
Description: Internet graph database made up of nodes and edges. Supports in-memory and persistent storage alternatives including RDBMS, file system, file grid, and custom storage.
Storage: Nodes (MeshObjects) and edges (relationships). MeshObjects can have entity types and properties, and can participate in relationships. MeshObjects raise events.
API(s): RESTful web services.
Notes: InfoGrid Overview, FAQ

Keyspace

Company/Org: Scalien
Type: Key/Value
Description: Distributed (master/slave) key-value data store delivering strong consistency, fault-tolerance and high availability.
Storage: Uses the Berkeley DB library for local storage. Key/Value pairs and their state are replicated to multiple servers.
API(s): C/C++, Python, PHP, HTTP
Notes: Keyspace Overview, FAQ

MemcacheDB

Company/Org:
Type: Key/Value
Description: High performance, high reliability persistent storage engine for key/value object storage.
Storage: Uses Berkeley DB as storage library/backend.
API(s): Memcache protocol, C, Python, Java, Perl
Notes: MemcacheDB complete guide (PDF format)

Mnesia

Company/Org: Ericsson
Type: Key/Value
Description: Multiuser distributed database including support for replication and dynamic reconfiguration.
Storage: Organized as a set of tables made up of Erlang records. Tables also have properties including type, location, persistence, etc.
API(s): Erlang
Notes: Mnesia Reference manual

MongoDB

Company/Org: 10gen
Type: Document
Description: Scalable, high-performance, open source, schema-free, document-oriented database
Storage: JSON-like data schemas, dynamic queries, indexing, replication, MapReduce
API(s): C, C++, Java, JavaScript, Perl, PHP, Python, Ruby, C#, Erlang, Go, Groovy, Haskell, Scala, F#
Notes: MongoDB Documentation Index

Neo4J

Company/Org: Neo Technology
Type: Graph
Description: Embedded, small footprint, disk based, transactional graph database written in Java. Dual license – free and commercial.
Storage: Graph-oriented data model with nodes, relationships and properties.
API(s): Java, Python, Ruby, Scala, Groovy, PHP, RESTful API.
Notes: Neo4J Wiki, API, FAQ
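
The graph-oriented model of nodes, relationships and properties can be sketched without any database at all. The Python below is a toy illustration with invented data and hypothetical function names; it is not Neo4j's API, only the shape of the data and of a traversal over it.

```python
from collections import deque

# Nodes carry properties; relationships are typed, directed, and carry properties too.
nodes = {
    1: {"name": "Alice"},
    2: {"name": "Bob"},
    3: {"name": "Carol"},
    4: {"name": "Dave"},
}
relationships = [
    (1, "KNOWS", 2, {"since": 2005}),
    (2, "KNOWS", 3, {"since": 2008}),
    (3, "KNOWS", 4, {"since": 2009}),
]


def neighbors(node_id, rel_type):
    """Node ids reachable from node_id over one outgoing relationship of rel_type."""
    return [
        end
        for start, kind, end, _props in relationships
        if start == node_id and kind == rel_type
    ]


def traverse(start_id, rel_type, max_depth):
    """Breadth-first traversal returning the names found within max_depth hops."""
    seen = {start_id}
    queue = deque([(start_id, 0)])
    found = []
    while queue:
        node_id, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for next_id in neighbors(node_id, rel_type):
            if next_id not in seen:
                seen.add(next_id)
                found.append(nodes[next_id]["name"])
                queue.append((next_id, depth + 1))
    return found


friends_of_friends = traverse(1, "KNOWS", max_depth=2)
```

In a relational database the same "friends of friends" question is a self-join per hop; in a graph store it is a walk along stored relationships.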

Redis

Company/Org:
Type: Key/Value
Description: Key/Value store with the dataset kept in memory and saved to disk asynchronously. “not just another key-value DB”
Storage: Values can be strings, lists, sets and sorted sets.
API(s): Python, Ruby, PHP, Erlang, Lua, C, C#, Java, Scala, Perl
Notes: Redis Wiki

SimpleDB

Company/Org: Amazon
Type: Item/Attribute/Value
Description: Scalable Web Service providing data storage, query and indexing in Amazon’s cloud.
Storage: Items (like rows of data), Attributes (like column headers), and Values (can be multiple values)
API(s): SOAP, REST
Notes: SimpleDB FAQ, Getting Started Guide, Developer Guide, API
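
The item/attribute/value model described above differs from a table row in one notable way: an attribute can hold several values at once. Here is a minimal in-memory Python sketch of that shape. The function names and sample data are hypothetical and this is not the SimpleDB web service API.

```python
from collections import defaultdict

# item name -> attribute name -> set of string values
domain = defaultdict(lambda: defaultdict(set))


def put_attribute(item_name, attribute, value):
    """Add a value to an attribute; earlier values for the attribute are kept."""
    domain[item_name][attribute].add(value)


def select(attribute, value):
    """Names of items where the attribute holds the given value, in sorted order."""
    return sorted(
        name
        for name, attributes in domain.items()
        if value in attributes.get(attribute, ())
    )


put_attribute("item_001", "color", "red")
put_attribute("item_001", "color", "blue")  # second value for the same attribute
put_attribute("item_001", "size", "L")
put_attribute("item_002", "color", "red")   # no "size" attribute at all

red_items = select("color", "red")
large_items = select("size", "L")
```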

Tokyo Cabinet

Company/Org: Mikio Hirabayashi
Type: Key/Value
Description: Library (written in C) of functions for managing files of key/value pairs. Multi-thread support.
Storage: Keys and Values can have variable byte length. Binary data and strings can be used as a key and a value.
API(s): C, Perl, Ruby, Java, Lua.
Notes: Tokyo Cabinet Specifications, presentation (PDF format). Also available: Tokyo Tyrant (remote service), Tokyo Dystopia (full text search), Tokyo Promenade (content management).

Voldemort

Company/Org: LinkedIn
Type: Hash Table
Description: “It is basically just a big, distributed, persistent, fault-tolerant hash table.” High performance and availability.
Storage: Each key is unique to a store. Each key can have at most one value. Supported types: JSON, string, identity, protobuf, java-serialization.
API(s): Java, C++, custom clients
Notes: Project Voldemort Wiki, Client how-to

We now have lots of choices for non-relational databases. Building up a NoSQL knowledge and experience base will definitely help managers, architects and developers compare and contrast these choices with what they already know about relational databases. Relational databases and the SQL language are still the architecture and lingua franca for the design, development and management of database applications. While we are still at the beginning of the use of databases in cloud infrastructures, we can move forward faster because of all of the work and collaboration that is taking place. Depending on the user and business requirements, we can choose between existing relational database technologies and the NoSQL alternatives.

Stay tuned for the conclusion (part three) of this blog series, where I’ll give you links to additional NoSQL Internet resources and past/upcoming events, offer some guidance for developers, highlight some of the comments and answer some of the questions from the first two parts.

If you are looking for more information about NoSQL and non-relational databases, check out the following sites, blog posts, and articles:

There are several upcoming and recent NoSQL conferences and meet ups with great information and solutions for architects and developers.  Here are just a few:

It has been great to see the comments and suggestions that have been posted on my Computerworld blog and also on Digg.  Thank you to everyone for joining the discussion about relational and non-relational database development.  Here are excerpts from the many comments.

  • Emil Eifrem (Neo4j) commented: “You talk about scaling to size and handling Facebook’s 100M user profiles. That’s an important use case and one that for example a key-value store handles brilliantly. But it turns out most companies aren’t Facebook. You can categorize the four emerging categories of NOSQL databases (key-value stores, column family stores, document dbs and graph databases) along the axes of scaling to size and scaling to complexity. For more information about that, see this blog post. Graph databases (like e.g. Neo4j, which I’m involved with, or Sones) excels at representing complex and rapidly evolving domain models and then traversing them with high performance.”
  • Mongo-DB Developer commented: “We have seen the most common use case to date being use of nosql solutions as operational data store of web infrastructure projects. By operational, I mean, problems with real time writes and reads (contrast with data warehousing with bulk occasional loading). For these sort of problems these solutions work well and also fit well with agile development methods where the somewhat ‘schemaless’ (or more accurately, columnless) nature of some of the solutions, and the dynamically typed nature of the storage, really helps.”
  • Peter R commented: “I have already seen, in the domain I work in, the movement away from straight up SQL databases. XML databases are one technology that will be stealing a lot of SQL’s thunder (if they haven’t already). Do I think SQL will ever die? No. But the key is that there will be/are more options that need to be thought about when designing a system now.”
  • Anonymous commented: “I agree object databases have a purpose. They are great for large datasets that need to be replicated and called by a key. However SQL provides a very important capability and that it is to be able to query data across a number of datasets very efficiently, this will be very hard to duplicate in a simple key value database.”
  • Johannes Ernst commented: “One of the difficulties for “normal” developers with many of the NoSQL technologies that you’ve described so far has been the learning curve and the additional work required: e.g. it’s easy and everybody knows how to put “every customer can place one or more orders” into a relational database, but what if the only thing you have is keys and opaque values? Compared to many other NoSQL alternatives, graph databases provide a high level of abstraction, freeing developers to concentrate on their application, while still bringing many of the same NoSQL benefits. For example, in InfoGrid (http://infogrid.org/), a project I’m involved in, you can define “Customer” and “Order” and their relationship, and the InfoGrid graph database takes care of storing and retrieving data and enforcing the relationship. In our experience, that makes graph databases much more approachable to developers than many other NoSQL technologies.”
  • Database-ed commented: “The problem is that when folks think about storing information that they need to retrieve, they are so ingrained to SQL that they fail to think of other means. The Facebook example is a case in point. Who is ever going to ask for an accurate report of every user in Facebook? If you miss something the first time you go looking, you can always present it later. The end user doesn’t know you lost it, they assume it didn’t exist at the time and now it does. Yet you still need to store the data for easy retrieval. One problem with SQL is that it ties you into the relationships. Facebook is about letting people build the relationships based on the fields they want to build them on, not the ones you might think of. I know, it can be done within the confines of SQL, but it is a lot harder to do when the size gets large.”
  • Raptor007 commented: “Some tasks that are poorly serviced by SQL may get switched over to a new method, but other implementations that are perfectly suited to SQL will continue using it. As they quoted Eric Evans in the article, ‘the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for.’”
  • Miracle Blue commented: “While I highly doubt there’s going to be any significant migration away from SQL and the like any time soon, I think more web developers will start experimenting with data stores and other data solutions as we move further into the cloud.”
  • TheUnGod commented: “And as companies turn to ask their SQL DBAs what they think of this, they’ll say “lets stick with SQL.” Honestly, there are so many people that support SQL right now that will not switch any time soon this article is just bogus. You can’t make a switch like that until people can support it properly.”
  • SteelChicken commented: “Document centric is pretty dumb if you plan on doing any sort of analytics and data mining. Great for workflow and such.”
  • Angusm commented: “The significance of the NoSQL movement is that it adds new tools that offer better solutions to specific problems. The future probably belongs to NoSQL in the sense of ‘not-only SQL’, rather than ‘no SQL’. Don’t imagine that NoSQL solutions offer a free lunch though. I had an educational experience when I changed a view definition in a CouchDB data store and my first trivial query took an hour to come back. CouchDB can be pleasingly fast when all its indexes are built, but if you have to rebuild those indexes from scratch … well, let’s just say that’s not something you want to do on a live client-facing site.”
  • Afex2win commented: “digg is one of the bigger proponents of Cassandra, a distributed data store in the vein of which the article is talking about. http://about.digg.com/blog/looking-future-cassandra”
  • Drmangrum commented: “SQL will be around for awhile. It’s good at doing what it was designed to do. However, there are many times when people use SQL simply because there is nothing better out there. As data complexity rises, a new method for accessing and persisting that data will have to be investigated. Part of the problem with many of the alternate solutions is that few people know how to use them.”
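
Several of the comments above turn on the same point: SQL makes it cheap to query across related datasets, while a store of keys and opaque values leaves that work to the developer. The Python sketch below illustrates the “every customer can place one or more orders” case both ways. It uses the standard library's sqlite3 module and a plain dict; the table names, keys and the hand-maintained index are all invented for illustration.

```python
import sqlite3

# Relational version: the relationship is declared once and queried with a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item TEXT
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'monitor'), (12, 2, 'mouse');
""")
sql_result = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "WHERE c.name = ? ORDER BY o.id",
    ("Ada",),
).fetchall()

# Key/value version: the application maintains its own index from customer to orders.
kv = {
    "customer:1": {"name": "Ada"},
    "customer:2": {"name": "Grace"},
    "order:10": {"customer": "customer:1", "item": "keyboard"},
    "order:11": {"customer": "customer:1", "item": "monitor"},
    "order:12": {"customer": "customer:2", "item": "mouse"},
    # This index must be updated by hand on every insert or delete.
    "orders_by_customer:customer:1": ["order:10", "order:11"],
    "orders_by_customer:customer:2": ["order:12"],
}


def orders_for(customer_key):
    """Application-side 'join': follow the index, then fetch each order."""
    name = kv[customer_key]["name"]
    return [
        (name, kv[order_key]["item"])
        for order_key in kv["orders_by_customer:" + customer_key]
    ]


kv_result = orders_for("customer:1")
```

Both approaches give the same answer. The key/value version is perfectly workable, but the relationship now lives in application code rather than in the database, which is the learning curve several commenters describe.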

For years to come, most of us will still use relational databases and SQL.  At the same time, I will continue to look for better ways to design and implement databases in my applications.  As always, any project decision has to match with the user and business requirements.  For new projects, I believe, we now have genuine non-relational alternatives on the table (pun intended).

Programming is Life!

<blogs.computerworld.com>

