Normalization means "splitting tables into smaller tables containing fewer attributes in such a way that the table design does not contain any problems of insertion, deletion, or updating anomalies, and guarantees no redundancy."
The evolution of normalization theories or steps of normalization, which result in different normal forms, is illustrated below:
1.First Normal Form (1NF)
2.Second Normal Form (2NF)
3.Third Normal Form
4.Boyce-Codd Normal Form (BCNF)
5.Fourth Normal Form (4NF)
6.Fifth Normal Form (5NF)
NOTE Points:⇒ 1NF is a mandatory normal form; the remaining are optional.
⇒ If you construct E-R diagrams into tables, then 4NF and 5NF need not be applied to the tables.
⇒ In practice, normalization is usually applied up to 3NF, and it is very rare to go beyond that.
⇒ 2NF deals with partial dependencies, and 3NF deals with transitive dependencies.
A relation is said to be in 1NF if it is already in unnormalized form and it satisfies the following conditions, rules, or qualifications:
1.Each attribute name must be unique.
2.Each attribute value must be single or atomic, i.e., single-valued attributes.
3.Each row/record must be unique.
4.There must be no repeating groups.
Example: How do we bring an un-normalized table into first normal form? Consider the following relation:
Solution:
The given table is not in the first normal form because the [Color] column can contain multiple values. For example, the first row includes values "red" and "green." To bring this table to the first normal form, we need to split the table into two tables.
Here are the resulting tables:
A relation is said to be in 2NF if it is already in 1NF and has no partial dependency. This means no non-prime attribute is dependent on only a part of the candidate key.
Alternatively, a relation is in second normal form if it satisfies the following conditions:
⇒ It is in first normal form.
⇒ All non-key attributes are fully functionally dependent on the primary key.
Note:⇒ Partial Functional Dependency: If a non-prime attribute of the relation is derived by only a part of the candidate key, then such a dependency is known as partial dependency.
Example:Consider the following relation
This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is [Purchase Location]. In this case, [Purchase Location] only depends on [Store ID], which is only part of the primary key. Therefore, this table does not satisfy the second normal form.
To bring this table to the second normal form, we break the table into two tables, resulting in the following:
Ex. Given relation R(ABCD) and F:{AB→C, B→D} Decompose in into 2NF.
from the given FDs determine primary key. Necessary attributes to include in the key are A, B (because this attributes are not in RHS of FD).
Find the closure set of AB
AB + = ABC
= ABCD (∵ B→ D)
AB is a primary key.
From the FDs B → D is partially depending on AB. So decompose the table. (D is a non-prime attribute derived by a part of the key)
A database is in third normal form (3NF) if it satisfies the following conditions:
⇒ It is in 2NF.
⇒ There is no transitive functional dependency.
By transitive functional dependency, we mean the following relationships in the table: A is functionally dependent on B, and B is functionally dependent on C. In this case, C is transitively dependent on A via B. Specifically, a non-key attribute should not be dependent on another non-key attribute.
To summarize, a relation is in 3NF if:
⇒ It is in 2NF.
⇒ No non-key attribute is transitively dependent on a candidate key.
Example: Consider the following relation.
In the table above, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type]. Therefore, [Book ID] determines [Genre Type] via [Genre ID], creating a transitive functional dependency. This structure does not satisfy third normal form (3NF).
To bring this table to third normal form, we split the table into two as follows:
A relation is said to be in BCNF if and only if every determinant is a candidate key.
⇒ BCNF is an advanced version of 3NF and is stricter than 3NF.
⇒ A table is in 3NF if for every functional dependency X→Y , X is the super key of the table.
⇒ For BCNF, the table should be in 3NF, and for every functional dependency, the left-hand side (LHS) must be a super key.
Example: Let's assume there is a company where employees work in more than one department. EMPLOYEE table:
➔ In the above table Functional dependencies are as follows:
EMP_ID → EMP_COUNTRY and EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO} Candidate key: {EMP-ID, EMP-DEPT}
➔ The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys. To convert the given table into BCNF, we decompose it into three tables:
A relation is said to be in 4NF if it is in Boyce-Codd Normal Form (BCNF) and has no multi-valued dependencies.
⇒ For a dependency A→B , if for a single value of A, multiple values of B exist, then the relation has a multi-valued dependency.
Multi-Valued Dependency (MVD)
A table is said to have a multi-valued dependency if the following conditions are true:
2.The table must have at least 3 columns for it to have a multi-valued dependency.
3.For a relation R (A, B, C), if there is a multi-valued dependency between A and B, then B and C should be independent of each other.
1.For a dependency A→B, if for a single value of A, multiple values of B exist, then the table has a multi-valued dependency.
If all these conditions are true for any relation (table), it is said to have a multi-valued dependency.
Example:
⇒The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entities. Hence, there is no relationship between COURSE and HOBBY. In the STUDENT relation, a student with STU_ID 21 contains two courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
⇒To make the above table into 4NF, we can decompose it into two tables: STUDENT_COURSE and STUDENT_HOBBY.
Concept of Surrogate Key:
⇒A surrogate key is an alternative to a primary key that allows for the unique identification of records without using natural or composite keys. It is an artificial key used to uniquely identify data in a database.
⇒ Characteristics of a surrogate key:
1.Uniqueness: The value is never reused and is unique within the system.
2.System Generated: It is generated by the system and is usually an integer.
3.Non-Manipulable: The value cannot be manipulated by users or applications.
4.Simple: The value is a single, numeric key, not a combination of values from multiple domains.
⇒Generation Methods: Surrogate keys can be generated in various ways, and most databases offer mechanisms to generate them:
1.Oracle: Uses SEQUENCE
2.MySQL: Uses AUTO_INCREMENT
3.SQL Server: Uses IDENTITY
Decomposition of a relation is necessary when a relation in the relational model is not in the appropriate normal form. Relation RRR is decomposed into two or more relations if the decomposition is both lossless join and dependency preserving.
Types of Decomposition:
1.Lossy Decomposition: If we decompose a relation RRR into relations R1R_1R1 and R2R_2R2, the decomposition is lossy if R1∩R2>RR_1 \cap R_2 > RR1∩R2>R. This means the join of R1R_1R1 and R2R_2R2 does not yield the original relation RRR. One disadvantage of decomposition into two or more relations is that it can cause issues with certain attributes, such as name and department, being misaligned or lost.
This process ensures that the resulting relations maintain the original data without redundancy and adhere to the necessary functional dependencies.
This relation is decomposed into two relation no_name and name_dept:
In lossy decomposition, spurious tuples are generated when a natural join is applied to the relations in the decomposition.
The above decomposition is a bad decomposition or Lossy decomposition.
2.Lossless Join Decomposition: Decomposition is lossless if R1 ∪ R2 = R.
⇒ This is also referred to as non-additive decomposition. The lossless-join decomposition is always defined with respect to a specific set F of dependencies. To check for lossless join decomposition using the FD set, the following conditions must hold:
1.The union of attributes of R1 and R2 must be equal to the attributes of R. Each attribute of R must be either in R1 or R2.
Att(R1) ∪ Att(R2) = Att(R)
2.The intersection of attributes of R1 and R2 must not be NULL.
Att(R1) ∩ Att(R2) ≠ ∅
3.The common attribute must be a key for at least one relation (R1 or R2).
Att(R1) ∩ Att(R2) → Att(R1)
or
Att(R1) ∩ Att(R2) → Att(R2)
Example 1: A relation R(A,B,C,D) with FD set (A→BC) is decomposed into R1(ABC) and R2(AD), which is a lossless join decomposition as:
1.The first condition holds true as Att(R1) ∪ Att(R2) = (ABC)∪(AD) = (ABCD) = Att(R)
2.The second condition holds true as Att(R1) ∩ Att(R2) = (ABC)∩(AD) ≠ ∅
3.The third condition holds true as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because A → BC .
If we decompose a relation R into relations R1 and R2, all dependencies of R must either be a part of R1 or R2 or must be derivable from the combination of FDs of R1 and R2.
The dependency-preserving decomposition is another property of a decomposed relational database schema D in which each functional dependency X→Y specified in F either appears directly in one of the relation schemas Ri in the decomposed D or can be inferred from the dependencies that appear in some Ri.
Decomposition D={R1,R2,…,Rn} of R is said to be dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F. In other words, the join of R1,R2,…,Rn over X preserves the dependencies because each dependency in F represents a constraint on the database. If decomposition is not dependency-preserving, some dependency is lost in the decomposition.
Example 1: A relation R(A,B,C,D) with FD set (A→BC) is decomposed into R1(ABC) and R2(AD) , which is dependency-preserving because the FD A → BC is a part of R1(ABC) .
Example 2: Let a relation R(A,B,C,D) and a set of FDs F ={A→B,A→C,C→D} be given. The relation R is decomposed into:
⇒ R1 = (A,B,C) with FDs F1={A→B,A→C}
⇒ R2 = (C,D) with FDs F2 = {C→D}
F′ = F1∪F2 ={A→B, A→C, C→D}
So, F′ = F .
Thus, the decomposition is a dependency-preserving decomposition.