When I first looked at the database of an application for one of my employers, I was bit surprised with the database design that was made by previous software engineers/analysts. Two things that most surprised me were no integrity constraints and mapping tables between tables having one-to-many relationship. I’ll focus on the first one here but second one will also be part of it because it is also related to data integrity. The reason I got for not using integrity constraints was performance. Before commenting on it and going further into the details let me quote from few resources about not having integrity constraints.
“Q) When not to use referential integrity?
Ans) The short answer is Never. The designer of an application has a fiduciary responsibility to his employer/client and needs to ensure that data is as acurate as possible. To not enforce referential integrity is to tempt fate. Employees get fired for building systems that contain bad data leading to bad business decisions. Consultants get sued.”
“So if these rules are being examined for each and every database transaction, what is that doing to my system performance and response time? The answer is that it depends. Several things such as the volume of transactions and the types of constraints defined will affect performance. If you define cascading deletes across nine related tables, you are going to see a lag in response time while the database determines how many rows must be deleted from each table. This will also multiply the performance hit of other database features, such as journaling. So keep in mind that, yes, there is a cost for referential integrity in terms of system performance.
On the other hand, you might experience an improvement in performance, since the rules and relationships are enforced at the system level in the database. Instructions executed at this level run more efficiently than similar logic placed in a high-level language. Just as you would weigh the pros and cons of creating additional indices over your database, you should also consider the factors associated with adding referential integrity. “
“Why Disable Constraints?
During day-to-day operations, constraints should always be enabled. In certain situations, temporarily disabling the integrity constraints of a table makes sense for performance reasons. For example:
- When loading large amounts of data into a table using SQL*Loader
- When performing batch operations that make massive changes to a table (such as changing each employee number by adding 1000 to the existing number)
- When importing or exporting one table at a time
Temporarily turning off integrity constraints can speed up these operations.”
And the most important one:
The whole article is worth reading. It explains data modeling mistakes and the number one is not having referential integrity. I’ll quote its conclusion here:
Well, hopefully I’ve convinced you to avoid the urge to be a lazy data modeler, design for the future, use a data modeling tool, and drop constraints during bulk load operations. In short, always use referential integrity. But if not, hopefully you’ll at least understand when people curse your name several years from now. :)”
Well, these give you a fine idea whether one should use referential integrity constraints (RIC) or not. Even if it is a performance reason, one should use RIC since ultimately data integrity has to be checked either at application level or system level and as said above instructions executed at system level are much faster and efficient. And, if there is no data integrity check then your data is at risk, as in the above mentioned case where on production system, a record in one table was mapped to multiple due to mapping table and no integrity constraints. We had several issues on production system due to this including deadlocks and the one just mentioned. There are several bugs posted on the bugzilla for that project which are examples of this.