Databases: Intro and Overview


A database is a collection of data organized in a way that makes searching and retrieval relatively easy and more efficient. Adding a database to a site can really help in terms of content management, site freshness, and making a site more “dynamic.” It can also make it much easier to manage the design of a site, as you'll have less pages to deal with.

For database that will be tied to a web site, the most commonly used types are ?lat-file?and relational.

Flat-File Databases
A flat-file is basically what it sounds like, a single file which represents a single table of information. Within a flat-file database, fields are usually separated by a tab or a comma, while individual records are separated by a new line. These databases can be difficult to sort on and are very prone to data corruption and redundancy. They also cause headaches when you need to modify data (just imagine having to go through 15,000 lines of records to update 10 records). They can, however, be very useful for single column data, such as a word list used to check password validity.

Relational Databases
In a relational database data is store in multitables tables, with each table containing a related set of data. Each table is comprised of a set of rows and columns, with each row having the same number of columns. It is easier to retrieve a subset of data from a table, or to combine multiple tables into a larger data set when you use a relational database. For more web applications, a relational database will provide the best results

There are dozens, if not hundreds, of relational database systems on the market, however the five most common ones are: Oracle, MS SQL, MS Access, mySQL, and PostgreSQL. Let's take a quick look at each:

Oracle

The ?randdaddy?of all databases, Oracle is definitely designed with power-users in mind, and it is not recommended for new database users. If you are running an extremely large site, such as Amazon, however, Oracle would probably be the database for you.

Current Version: 10g
Cost: $4,995 – 40,000 and up per processor
Platforms: HP-UX, Linux, Mac OS, Windows, and Solaris

Pros
Cons

Powerful – There are few things a Oracle database can not do

Experience – Oracle has been around for 27 years

Bragging Rights – Instantly impress your peers and it looks good on your resume

Cost – the prices are astronomical and if you have an Oracle database you will also need a LARGE server to house it

Complexity – Because it is so powerful, Oracle can be extremely challenging to learn

MS SQL Server

Microsoft SQL Server is a good database that offers most of the features needed by users, while being slightly more affordable than Oracle. It? only as big as it needs to be, so it? also easier to learn.

Current Version: 2000
Cost: $5,000 – 20,000 per processor
Platforms: Windows only

Pros
Cons

Less Bulky ?MS SQL databases take up less hard drive space than comparable Oracle databases

Easy to Manage ?SQL Server is relatively quick and easy to use and administer when compared to others such as Oracle

Security ?most of you have probably heard about the Slammer issue, and other security problems that have plagued MS SQL

Limits ?MS SQL may get bogged down by extremely large databases (i.e. Amazon size)

Microsoft Access

MS Access is often the most underrated and undervalued relational database. This database is the easiest to learn with and offers the most extensive GUI interface.

Current Version: 2003
Cost: $229 ?also included in most MS Office versions
Platforms: Windows only

Pros
Cons

Affordable ?Compared to the thousands you? spend on Oracle or MS SQL, the Access price is negligible
Simplicity ?Its GUI interface makes it very easy to use and most users can pick it up quickly

Capacity ?ccess was never intended to handle a large number of concurrent users, so it can get bogged down on a heavily trafficked site
Simplicity ?ccess lacks some features that are standard in other relational databases, such as stored procedures and some complex queries

mySQL

This extremely popular open-source database is well known amongst those who run Linux systems or who have websites hosted on Linux servers.

Current Version: 4.0.20
Cost: Free
Platforms: Linux, Windows, Solaris, FreeBSD, Mac OS X, HP-UX, IBM AIX, QNX, Novell Netware, OpenBSD, SGI Irix, Dec OSF

Pros
Cons

Cross-Platform ?No one beats mySQL in its ability to run on multiple OSes
Non-Greedy ?mySQL doesn? need as many resources as some of the higher end databases, so it can be run on a less powerful server

Features ?mySQL has greatly improved in its support of common SQL functions in recent versions, but it still lacks the ability to have stored procedures, triggers, and views; also, no production version supports subqueries

PostgreSQL

PostgreSQL is the ?racle?of the open-source databases, with a robust set of features and ability to handle large databases with ease. Unlike Oracle, however, it is relatively easy to install, easier to learn, and doesn? kill your budget.

Current Version: 7.4.2
Cost: Free
Platforms: Linux and Windows

Pros
Cons

Powerful ?Nearly as robust as Oracle
Cost ?Can? beat free ?
Updated Regularly ?PostgreSQL is regularly updated, usually once a month or more, thanks to its open source nature

Windows ?While PostgreSQL is available for Windows, it takes some hoops to get it installed and will likely overwhelm those who are not familiar with Linux systems

How Do You Choose?
There are several things you should look at when considering which database you will use (presuming you have a choice).

How will it be used?
For a site that has simple databasing needs, a larger database would only be overkill. Most consumer and small business sites, for example, do not need a database like Oracle or even MS SQL.

How many users?
If you have a low-traffic site, you have a wider range of options available to you. If you have get a lot of traffic, though, you will want to stick to higher end databases that can handle the load. MS Access, for example, can not handle a lot of traffic, so it would not be good on a database heavy site that gets tons of hits every day.

Experience of Administrator
If the person who will be creating the database has little databasing experience, you don? want to throw them into the fire with an extremely complicated system. I.E. If your administrator is new to databasing, do not give them Oracle to learn on! That's just plain cruel! In terms of ease in learning. MS Access is the easiest, while PostgreSQL, mySQL, and MS SQL are about the same in terms of learning curves.

Server Restrictions
You don? want to buy a database that requires a more powerful server than you have (or can afford), so make sure you look at each database? minimum and optimal specifications to see what will and won't work with your system. Depending on your site load and databasing needs, you may need to consider a second server, which would also add to the overall cost.

Costs
If you're on a tight budget, obviously Oracle or MS SQL can be out of reach as far as pricing goes. You should also consider server cost, if you have to buy a server to go with the database, and the costs of administration (or paying an administration for higher end databases like Oracle or MS SQL).

Company Stability
You should also consider how long the company or organization that makes the database is in business, and how stable the business is. You don't want to get stuck with a database that may no longer be supported in a year or two. This is why it's a good idea to stick to one of the “big five” as they are all well established and unlikely to go anywhere for a long time to come.