Database Design: Normalization


Why should a web developer care?
A badly designed database is not only a pain to work with, but it can bog down your web server. In a worse case scenario, a bad database design can make your server grind to a halt! Although it may take an initial switch in thinking, most web developers will find that a well designed and normalized database is much easier to work with, massage for data, and run complex queries against. It's also easier to maintain data integrity with a good database

Normalization

So, what is normalization, you ask?
Normalization is basically a fancy databasing term for “optimizing the database tables.” If a database is normalized, then it is optimized.
The normalization process gets rid of redundant data and ensures that each field in a table relates to the other fields

Why Normalize?
Why Not?

A normalized database runs more efficiently. Queries may seem more complex, but they result in mor accurate data.

A normalized database tends to use less hard drive space than a non-normalized one.

A normalized database has fewer problems with data integrity as data that would otherwise “repeat” is in just one place.

A normalized database requires more processing power and can appear to be slower than a non-normalized database

Using a flat-file system, in which case you just can't normalize and still use your database.

Queries are more complex if you need data from multiple tables.

Laziness…let's be honest. It takes more work to normalize than to just throw it all in one table, especially if you are not going to have to be dealing with it for long.

Normalization: The Normal Forms!
There are five regular normal forms, or guidelines, in the normalization process. For web sites and most other databases, the third normal form (or 3NF) is the form you want to get to.
The higher forms are not generally used in production databases and tend to be more theoretical than practical.

Those higher forms include:

  • The Boyce-Codd Normal form which usually follows third normal form
  • The fourth normal form, which is usually only needed with tables that have many-to-many relationships
  • The fifth normal form which is really more of a checker, as it requires that you be able to reconstruct the original table after you have finished normalization

Let's look at the first three forms in more detail.

Zero Normal Form

The term zero normal form is sometimes used to refer to a database that is not normalized all. A zero normal form database can usually be identified by its containing only one huge table with all the data shoved into it.

Table: instructorsschedule
This table is in zero normal form, with all of the data shoved into a single large table.
This table is in zero normal form, with all of the data shoved into a single large table.

1st Normal Form

The first normal form requires that related data be identified by a primary key (unique identifier), and that there are no repeating or multi-valued groups of data (such as author1, author2, author3) in a table.

Notice in our instructorsschedule table above, that while we do have a primary key, we also have several fields doing the name1 convention. Let's fix that and get that table into first normal form.

Table: instructors

The instructors information has been placed in a table called instructors, with the instructorid as its primarykey.
Table: classes

All of the data that was repeating in the instructorschedule table is now in its own table called classes. Notice there are no more fields such as course1, course2, etc. The instructorid in this table relates to the instructorid primary key in our new instructors table.

2nd Normal Form

The second normal form requires that all tables be in the first normal form. It then requires that in a table which has a primary key made of multiple fields, all fields in the table must be related to ALL of the primary key fields.

Our instructors table is fine, however in the case of the classes table, we need to do some fixing. Our primary key is made up of five fields, and the departmentaddress, departmenttelephone, etc, fields only relate to one of those fields.

Table: departments

First we'll move the department info its own table called departments, with a departmentid as its primary key.

Table: classes

In the classes table, we remove all the department fields except the one that is part of the primary key. Instead of being the department name, it is now a foreign key that refers back to the departments table's primary key.
Table: instructors

By having department info in its own table, we can now add a departmentid field to the instructors table to indicate which department that instructor is a part of.

3rd Normal Form

The third normal form requires that all tables be in second normal form and that all data in a table depend directly on the primary key and not each other.

In the case of our tables above, instructors is in 3rd normal form, as is departments, but the departmentid field in classes only relates to the coursename and none of the other fields. And what happens if we want to add more details about the courses?

Table: classes

Instead of using a coursename field, we now refer to a courseid that relates to our new courses table. Notice we also got rid of the multiple primary keys in favor of a single classid, so it will be easier to refer this table to others if we expand the database further.
Table: courses

Our new courses table includes the course name, which department the course is part of, and some new information such as the number of credits it provides and a course description.

And now our database is in 3rd normal form and ready for use on a website or as a standalone database!