Referential integrity

Referential integrity is a property of data stating that all its references are valid. In the context of relational databases, it requires that if a value of one attribute (column) of a relation (table) references a value of another attribute (either in the same or a different relation), then the referenced value must exist.^[1]

For referential integrity to hold in a relational database, any column in a base table that is declared a foreign key can only contain either null values or values from a parent table's primary key or a candidate key.^[2] In other words, when a foreign key value is used it must reference a valid, existing primary key in the parent table. For instance, deleting a record that contains a value referred to by a foreign key in another table would break referential integrity. Some relational database management systems (RDBMS) can enforce referential integrity, normally either by deleting the foreign key rows as well to maintain integrity, or by returning an error and not performing the delete. Which method is used may be determined by a referential integrity constraint defined in a data dictionary.

The adjective 'referential' describes the action that a foreign key performs, 'referring' to a linked column in another table. In simple terms, 'referential integrity' guarantees that the target 'referred' to will be found. A lack of referential integrity in a database can lead relational databases to return incomplete data, usually with no indication of an error.

YouTube Encyclopedic

1/5
Views:
26 541
8 882
10 661
130 094
85 090

Transcription

This video introduces a very important type of constraint known as referential integrity. As a reminder, integrity constraints in a database restrict the allowable data beyond what's already restricted by the structure and types of the database. Now the term referential integrity refers to integrity of references that appear in the database. In a relational database, a reference from one couple to another occurs through specifying values, and integrity, referential integrity says that those values are valid. It's sort of equivalent to having no dangling pointers if we used pointers in the database; we'll see a number of examples. So let's look at our simple example database with the students, colleges and students applying to colleges, that we've been using for all our demos. Let's suppose for example, that we had a student, 123, who had applied to Stanford, for some major, let's say CS, with a decision of yes. When we have this tuple, the value in the SID field here of 123 presumably refers to a value here of an actual student. So maybe 123 is student Mary with some GPA in some high school. And furthermore, the Stanford value in the C name or college name attribute is presumably referring to the college name in the college table. So we would expect to have a Stanford value here. Referential integrity talks about these values here, referencing the corresponding values in the other tables. Specifically if we have referential integrity from a attribute A say of a relation R to an attribute B of another relation S. What that's saying is that every value that appears in the A column of relation R must have some corresponding value in the B column of relation S. So if we take a look at our example, we would say then that we would have referential integrity from the SID column of apply, to the SID column of students. We would expect every value that appears in this column to also have a value in the other column, and similarly we all have referential integrity from the college name attribute been applied to the college name attribute, in college. Again we want every value that appears in this column, to also appear in this column. Now we might have a violation, for example, if we had a, say, 555 applying to Stanford for some major and some decision. If we have no 555 over in the student table, then this here would be considered a referential integrity violation. Similarly we might have 1 2 3, which is valid because we have a 1 2 3 student here, but if 1 2 3 is applying to Yale and we don't have Yale over here, then again we have a referential integrity violation. Now let me mention that referential integrity is directional, so we talk about, this SID here, referencing the SID in the student table. We could have referential integrity in the other direction, but that's saying something different. That would be saying that every student must apply somewhere, every value of her student ID must appear in the apply table. And this particular example, we probably would not have that be the case, we would probably be able to have students who hadn't yet applied anywhere, or colleges where no one had applied yet. And the most sensible direction for this referential integrity in this case is from the apply to the student and the apply to the college. But again, we could have it in both directions if we so wished.

Formalization

An inclusion dependency over two (possibly identical) predicates $R$ and $S$ from a schema is written $R[A_{1},...,A_{n}]\subseteq S[B_{1},...,B_{n}]$ , where the $A_{i}$ , $B_{i}$ are distinct attributes (column names) of $R$ and $S$ . It implies that the tuples of values appearing in columns $A_{1},...,A_{n}$ for facts of $R$ must also appear as a tuple of values in columns $B_{1},...,B_{n}$ for some fact of $S$ .

Such constraint is a particular form of tuple-generating dependency (TGD) where in both the sides of the rule there is only one relational atom.^[3] In first-order logic it is expressible as $\forall {\vec {x}},{\vec {y}}.(R({\vec {x}},{\vec {y}})\rightarrow \exists {\vec {z}}.S({\vec {x}},{\vec {z}}))$ , where ${\vec {x}}$ is the vector (whose size is $n$ ) of variables shared by $R$ and $S$ , and no variable appears multiple times neither in the TGD's body nor in its head.

Logical implication between inclusion dependencies can be axiomatized by inference rules^[4]^: 193 and can be decided by a PSPACE algorithm. The problem can be shown to be PSPACE-complete by reduction from the acceptance problem for a linear bounded automaton.^[4]^: 196 However, logical implication between dependencies that can be inclusion dependencies or functional dependencies is undecidable by reduction from the word problem for monoids.^[4]^: 199

Declarative referential integrity

Declarative referential integrity (DRI) is one of the techniques in the SQL database programming language to ensure data integrity.

Meaning in SQL

A table (called the referencing table) can refer to a column (or a group of columns) in another table (the referenced table) by using a foreign key. The referenced column(s) in the referenced table must be under a unique constraint, such as a primary key. Also, self-references are possible (not fully implemented in MS SQL Server though^[5]). On inserting a new row into the referencing table, the relational database management system (RDBMS) checks if the entered key value exists in the referenced table. If not, no insert is possible. It is also possible to specify DRI actions on UPDATE and DELETE, such as CASCADE (forwards a change/delete in the referenced table to the referencing tables), NO ACTION (if the specific row is referenced, changing the key is not allowed) or SET NULL / SET DEFAULT (a changed/deleted key in the referenced table results in setting the referencing values to NULL or to the DEFAULT value if one is specified).^[6]

Product-specific meaning

In Microsoft SQL Server the term DRI also applies to the assigning of permissions to users on a database object. Giving DRI permission to a database user allows them to add foreign key constraints on a table.^[7]

References

^ Chapple, Mike. "Referential Integrity". About.com. Retrieved 2011-03-20. Definition: Referential integrity is a database concept that ensures that relationships between tables remain consistent. When one table has a foreign key to another table, the concept of referential integrity states that you may not add a record to the table that contains the foreign key unless there is a corresponding record in the linked table.
^ Coronel et al. (2013). Database Systems 10th ed. Cengage Learning, ISBN 978-1-111-96960-8
^ Kolaitis, Phokion G. "A Tutorial on Database Dependencies" (PDF). University of California Santa Cruz & IBM Research - Almaden. Retrieved 2021-12-10.
^ ^a ^b ^c Abiteboul, Serge; Hull, Richard B.; Vianu, Victor (1994). "9. Inclusion Dependency". Foundations of Databases. Addison-Wesley. pp. 192–199.
^ Microsoft Support (2007-02-11). "Error message 1785 occurs when you create a FOREIGN KEY constraint that may cause multiple cascade paths". microsoft.com. Retrieved 2009-01-24.
^
ANSI/ISO/IEC 9075-1:2003, Information technology—Database languages—SQL
- Part 1: Framework (SQL/Framework)
- Part 2: Foundation (SQL/Foundation)
^ Chigrik, Alexander (2003-08-13). "Managing Users Permissions on SQL Server". Database Journal. Retrieved 2006-12-17.

External links

DRI versus Triggers (archived)

v t e Database management systems
Types	Object-oriented comparison Relational list comparison Key–value Column-oriented list Document-oriented Wide-column store Graph NoSQL NewSQL In-memory list Multi-model comparison Cloud Blockchain-based database
Concepts	Database ACID Armstrong's axioms Codd's 12 rules CAP theorem CRUD Null Candidate key Foreign key Superkey Surrogate key Unique key
Objects	Relation table column row View Transaction Transaction log Trigger Index Stored procedure Cursor Partition
Components	Concurrency control Data dictionary JDBC XQJ ODBC Query language Query optimizer Query rewriting system Query plan
Functions	Administration Query optimization Replication Sharding
Related topics	Database models Database normalization Database storage Distributed database Federated database system Referential integrity Relational algebra Relational calculus Relational model Object–relational database Transaction processing
Category Outline WikiProject

This page was last edited on 24 March 2024, at 09:42

From Wikipedia, the free encyclopedia