Schema refinement is the process of improving a database schema so that it stores data without redundancy or anomalies. The most widely used technique for schema refinement is decomposition.
Normalization means "splitting the tables into smaller tables with fewer attributes in such a way that the table design does not contain any insertion, deletion, or update anomalies and guarantees no redundancy."
In other words, normalization (schema refinement) is a systematic technique for organizing the data in a database: tables are decomposed step by step to eliminate data redundancy and undesirable characteristics such as insertion, update, and deletion anomalies.
⇒ Redundancy: the repetition of the same data, i.e., duplicate copies of the same data stored in different locations.
⇒ Anomalies: the problems that arise in poorly planned, unnormalized databases where all the data is stored in one table, sometimes called a flat-file database.
Let us consider such a schema: a single table holding student details (S_ID, S_name) together with course details (C_ID, FEE). Because all the data is stored in one table, it is redundant: a student's S_ID and S_name are repeated for every course the student takes, and a course's C_ID and FEE are repeated for every enrolled student. Let us discuss the resulting anomalies one by one.
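A minimal sketch of such a flat table, assuming the column names used in this section's examples (the sample rows and fee values are illustrative):

```sql
-- A flat, unnormalized table: student and course facts mixed together.
-- Column names follow this section's examples; sample data is illustrative.
CREATE TABLE student_course (
    s_id   VARCHAR(10),  -- student ID, e.g. 'S1'
    s_name VARCHAR(50),  -- student name, repeated for every course the student takes
    c_id   VARCHAR(10),  -- course ID, e.g. 'C1'
    fee    INT           -- course fee, repeated for every enrolled student
);

INSERT INTO student_course VALUES
    ('S1', 'Anil',  'C1', 5000),
    ('S2', 'Bina',  'C1', 5000),   -- C1's fee is stored a second time: redundancy
    ('S3', 'Chand', 'C2', 8000);   -- S3 is the only student enrolled in C2
```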
Due to the redundancy of data, we may encounter the following problems:
1. Insertion Anomalies: It may not be possible to store some information unless some other information is stored as well.
2. Redundant Storage: Some information is stored repeatedly.
3. Update Anomalies: If one copy of redundant data is updated, inconsistency is created unless all redundant copies of data are updated.
4. Deletion Anomalies: It may not be possible to delete some information without losing some other information as well.
Update Anomaly: if the fee of a course changes from 5000 to 7000, the FEE column must be updated in every row that stores that course; if even one copy is missed, the data becomes inconsistent.
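Against the illustrative flat table above, the fee change must touch every duplicated row:

```sql
-- Every copy of C1's fee must change together;
-- missing any row leaves the table inconsistent.
UPDATE student_course
SET    fee = 7000
WHERE  c_id = 'C1';
```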
These anomalies exist only because of redundancy; once the redundancy is removed, they disappear.
Insertion Anomaly: when a new course, say C4, is introduced but no student has enrolled in it yet, the course cannot be recorded on its own. Inserting one piece of data forces the insertion of other, dummy data (see the sketch below).
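A hedged sketch against the flat table above (the fee value 6000 is assumed for illustration):

```sql
-- To record course C4 before anyone enrolls, we must either invent a
-- dummy student or leave the student columns NULL. (Fee 6000 is illustrative.)
INSERT INTO student_course (s_id, s_name, c_id, fee)
VALUES (NULL, NULL, 'C4', 6000);
```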
Deletion Anomaly: if S3 is the only student enrolled in a course, deleting the S3 student also deletes that course. Deleting one piece of data forces the loss of other, useful data (see the sketch below).
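Against the sample rows above, where S3 is the only student in course C2:

```sql
-- Deleting student S3 also destroys the only record of course C2
-- and its fee: useful data is lost as a side effect.
DELETE FROM student_course
WHERE  s_id = 'S3';
```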
Decomposition of Tables – Schema Refinement
Decomposing a table during schema refinement serves the following goals (a decomposition sketch follows this list):
⇒ Minimize redundancy in data.
⇒ Remove insert, update, and delete anomalies during database activities.
⇒ Reduce the need to reorganize the data when it is modified or extended.
⇒ Normalization reduces a complex user view to a set of small, stable subgroups of fields or relations. This process helps in designing a logical data model, known as the conceptual data model.
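As a sketch under the assumptions of the illustrative flat table above, decomposition splits it into student, course, and enrollment tables, removing the redundancy behind all three anomalies (table and column names are illustrative):

```sql
-- Each fact is now stored exactly once.
CREATE TABLE student (
    s_id   VARCHAR(10) PRIMARY KEY,
    s_name VARCHAR(50)
);

CREATE TABLE course (
    c_id VARCHAR(10) PRIMARY KEY,
    fee  INT              -- stored once per course: a fee change is a one-row update
);

CREATE TABLE enrollment (
    s_id VARCHAR(10) REFERENCES student(s_id),
    c_id VARCHAR(10) REFERENCES course(c_id),
    PRIMARY KEY (s_id, c_id)
);

-- A new course can be recorded with no enrolled students (no insertion anomaly),
-- and deleting a student's enrollments no longer destroys course data.
INSERT INTO course VALUES ('C4', 6000);
```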
Advantages of Normalization:
1. Greater overall database organization is achieved.
2. The amount of unnecessary redundant data is reduced.
3. Data integrity is easily maintained within the database.
4. The database and application design processes are much more flexible.
5. Security is easier to maintain or manage.
Disadvantages of Normalization:
1. Normalization produces many tables, each with a relatively small number of columns. These tables then have to be joined back together using their primary key/foreign key relationships.
2. This join-heavy design has two disadvantages (illustrated by the sketch after this list):
⇒ Performance: All the joins required to merge data slow down processing and place additional stress on your hardware.
⇒ Complex Queries: Developers have to code complex queries to merge data from different tables.
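For instance, reconstructing the original flat view from the illustrative decomposed tables above already requires a three-way join:

```sql
-- Rebuilding the flat student/course view now takes a three-table join.
SELECT s.s_id, s.s_name, c.c_id, c.fee
FROM   student s
JOIN   enrollment e ON e.s_id = s.s_id
JOIN   course     c ON c.c_id = e.c_id;
```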