Link Details

Link 135126 thumbnail
User 327967 avatar

By selikoff
via selikoff.net
Published: Nov 20 2008 / 06:21

As someone with a long history in database optimization and who even did their Master’s project on Database normalization, I’m probably the last person in the world to argue against database normalization. From a theoretical standpoint, database normalization is a wonderful thing, helping to organize your data into easy to manage and understand parts. For this article, I will play devil’s advocate and argue why too much normalization can be a bad thing
  • 18
  • 6
  • 1628
  • 642

Comments

Add your comment
User 327526 avatar

chase.saunders replied ago:

0 votes Vote down Vote up Reply

misinformation

User 306525 avatar

newton_dave replied ago:

0 votes Vote down Vote up Reply

Useless comment: how about a rebuttal with content? Which point(s) did you disagree with and why?

User 327526 avatar

chase.saunders replied ago:

0 votes Vote down Vote up Reply

Sorry newton_dave, I should have replied to your comment but posted started a new thread by mistake (below). In the mean time, I've blogged this: http://ramthemdown.wordpress.com/2008/11/21/denormalization-is-bad-umkay/

User 327967 avatar

selikoff replied ago:

0 votes Vote down Vote up Reply

I replied in your blog but I'll repost it here for others to comment:

"My work on relational database theory has strongly convinced me..."

That's just it, your working from theory. In practical systems, you may encounter cases where denormalized data, such as materialized views, are a good thing. My argument wasn't that we should get rid of normalization, but that there is such a thing as excessive normalization in some real-world systems.

"If denormalizing speeds up your stuff ... you need a better database."

That's just silly. Denormalization will almost always be faster. Think about it... what if all the data you needed for a given query was in a single record of a table? Then, no query would perform any joins and given proper search indexing, you'd have an extremely fast system for Reads, far faster than any fully normalized system. That doesn't mean we should denormalize the entire database of course, since we know that denormalization leads to update/maintenance issues as well as potential storage nightmares. You should really fully understand why denormalization can be bad, before taking the position we should get rid of it all together.

User 327526 avatar

chase.saunders replied ago:

0 votes Vote down Vote up Reply

Scott, I approved your comment in my blog and replied there. Even though you always delete mine :)

http://ramthemdown.wordpress.com/2008/11/21/denormalization-is-bad-umkay/#comment-11

User 327526 avatar

chase.saunders replied ago:

0 votes Vote down Vote up Reply

Fair enough... as Date and Darwen have shown with extensive benchmarking and analysis, only poorly optimized databases show a speed increase with denormalized data. I have never seen a test that showed a speed increase in Oracle, SQL Server, or Postgres. Have heard that this can work in MySQL, but as that tool matures it will surely stop working. The reason is fairly obvious... there is less data to read when data is normalized, and an efficient query analyzer can set up a wonderfully efficient join using pointers and such.

Furthermore, denormalized data introduces what are known as the insert anomaly, the delete anomaly, and the update anomaly. For example, if you delete a category in a denormalized system you have to delete it from potentially hundreds or thousands of records. This might not happen very often, so it's inefficiency might be outweighed by faster reads (if that was really the case). However, it's very difficult to patch up all the data in a complex system... we've all seen examples of the mess that results in, I imagine.

Most egregious is the author's suggestion that normalization 'causes' problems in optimizing queries and such. This seems so off base as to not require debunking... it's on the author's burden to provide examples.

I have heard that in certain, massively distributed systems, there is a parallelism argument for denormalizing. I am skeptical of that, but not qualified to counter it at this point. I will note, however, that I have spend a lot of time reading literature from the high-end data warehouse and OLAP providers, and haven't heard this technique mentioned. They all want you to normalize (with star configuration or something) and then optimize their joins for that scenario.

All of these arguments went down in the 60's and early 70's. The relational tools weren't nearly as advanced in that time, and the doubters made similar arguments. Relational won anyway, despite the fact that most systems at that time were high-volume ones with programmers sitting around ready to optimize.

This "denormalization is good" meme is popping up all over the place lately, almost always by PHP/MySQL programmers who haven't made any effort to learn the history, or more importantly, the current research on this. (Though in this case it's apparently a Java /MySQL guy. Not that I'm dissing on MySQL, I have used it here and there.)

It's a meme that spreads not because it's true, but because it sounds true and gives it's proponents a sense that they understand and control an esoteric tradeoff. It's much less palatable to say, "a good RDBMS is the most highly optimized piece of software on the planet... the best way to optimize performance is to normalize your database and trust the awesomeness of the uber-elite dudez who wrote it." But I think it's the truth.

To me, it's extremely dangerous. Most programmers and DBA's aren't there aren't going to dig deep enough to get at the facts... in fact for many newer folks this advice would probably be harmful even it were true, because the tradeoff decisions required would be very complicated to make correctly. Fair?

User 327967 avatar

selikoff replied ago:

0 votes Vote down Vote up Reply

I know all about history chase and anyone who says there is no place for denormalization hasn't done real database and application development. There are times, such as materialized views, where denormalized data is extremely helpful. Granted not all systems offer materialized views so develops may need to rely on other, denormalized tables.

As for your comments about the 60s and 70s, was that when 10 megabytes of disk space was huge and multiprocessing didn't exist in most environments? Keep up with the times man... things have substantially changed.

User 327967 avatar

selikoff replied ago:

0 votes Vote down Vote up Reply

I added a second article in the series, feel free to check it out:
http://www.selikoff.net/blog/2008/11/23/denormalized-databases-a-better-example/

Add your comment


Html tags not supported. Reply is editable for 5 minutes. Use [code lang="java|ruby|sql|css|xml"][/code] to post code snippets.