DBMS

CHAPTER-1

INTRODUCTION TO DBMS





System

A system is an integration of entities, alternatively called components, which interact with one another.

For example, consider a particular department in a college or university. The entities of the department are students, non-teaching staff, infrastructure and so on. These entities interact with one another.

          The information system of an organization can be either a conventional file processing system or a database management system.

Explain the Conventional File Processing System:

          In the conventional file processing system, each and every sub-system of the information system has its own set of files. As a result, there is duplication of data across the various sub-systems.

          The concept of Conventional File Processing System is shown below.

(Figure: conventional file processing system — each application sub-system maintains its own set of files.)

   The system above consists of three sub-systems, namely Application-X, Application-Y and Application-Z.

    It is clear that some of the files are duplicated in the different sub-systems of the conventional file processing system. This in turn increases data redundancy.

Example of Conventional File Processing System:

    Consider the example of a hospital system.

    Patients come to the hospital from society. Upon arrival, a preliminary registration is done by collecting information about the patient. Then, depending on the type of illness, the patient will be treated either as an out-patient or as an in-patient. In some cases a patient will initially be treated as an out-patient and then be admitted as an in-patient if necessary. Finally, the bills are to be paid before the patient is discharged. In this system we are using four files. The files are

Patient File: at the registration section.

In-patient Treatment File: at the in-patient section.

Out-patient Treatment File: at the out-patient section.

Invoice File: at the accounts section.

    Since the files are maintained in different sections of the hospital in a decentralized manner, certain data items (patient number, patient name, patient address) will be duplicated. This has some undesirable results. Some of the difficulties of the conventional file processing system of the hospital are the following.

There may be duplication of data in different sections of the hospital, which leads to inconsistency of data.

A lot of paperwork and telephone calls would be required to keep the files synchronized.

The system cannot provide answers to complex queries.

What are the Drawbacks of Conventional File Processing System ?

A list of drawbacks of the Conventional File Processing System is presented below.

Uncontrolled Redundancy of data.

Inconsistency of data.

Inflexibility

Lack of backup and recovery

Limited data sharing.

Poor enforcement of standards.

Unable to represent Relationships among data.

Excessive program maintenance and low programming productivity.

Uncontrolled Redundancy of data

    Since each sub-system of an organization maintains its own set of files without data sharing, the same data will be available in different files. This results in increased disk space usage, increased data-entry time and inconsistency of data.

Inconsistency of data

    The uncontrolled redundancy of data permits the system to have the same data in different files. As a result, a particular data element, such as the patient name, may be entered differently in different files, which is nothing but inconsistency of that data element. While performing basic data operations such as update, deletion and retrieval, this inconsistency gives misleading results.

Inflexibility

In the conventional file processing system, a top-down approach is generally followed in file design. In this approach a set of meaningful entities of the proposed system is identified, along with their attributes, to create the files. The actual set of reports required from the system is not considered in this approach. As a result, it may not be possible to meet the requirements of the users fully. Also, if there are changes in the user requirements in future, the conventional file processing system is not flexible enough to provide the required results.

Lack of Backup and recovery

    In the conventional file processing system there is no implicit facility for backup and recovery from system failure. This means that if an application program fails in the middle of updating a file, there is no built-in way to restore the file to its previous consistent state.

Limited data sharing

    In the conventional file processing system the data is stored in a decentralized manner; hence sharing of the data among applications is difficult.

Poor enforcement of standards

    Since different applications and their respective files are developed by different groups, each group follows its own standards for field names, field widths, field types and so on. This creates serious difficulty when programs and data structures have to be modified by other groups of users, which leads to low programmer productivity.

Unable to represent relationships among data.

    In the conventional file processing system there is no implicit facility to represent relationships among data stored in different files of a single system.

Excessive program maintenance & Low program productivity

    Since the different applications are developed differently by different groups in terms of file specifications and program specifications, it is very difficult for another group to modify the programs and data structures at a later stage. Many program variables may be defined differently in different programs. All these difficulties lead to excessive maintenance.

    Programmer productivity is a measure of time taken to develop an application. If the time taken to develop an application is low then the programmer productivity is high and vice versa.

What is DATABASE?

Database is a collection of related data (or) files.

Data means known facts that can be recorded and that have implicit meaning.

For example, consider the names, telephone numbers and addresses of people. We may have recorded this data in an address book, or we may have stored it in a file on the hard disk using a computer and software such as Microsoft Excel or MS Access. This collection of inter-related data is a database. Generally a database contains one data file or a large number of data files. The database is organized in such a way that application programs can quickly retrieve the required data.

What is Database Management System (DBMS)?

    A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates defining, constructing, manipulating and sharing databases among various users and applications.

Defining a database involves specifying the data types, structures, and constraints for the data to be stored in the database.

Constructing the database is the process of storing the data itself on some storage medium that is controlled by the DBMS.

Manipulating a database includes such functions as querying the database to retrieve specific data, updating the database to reflect changes in the database, and generating reports from the data.

Sharing a database allows multiple users and programs to access the database simultaneously. Another important function provided by the DBMS is protection: unauthorized users cannot access the database.
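These four functions can be illustrated with a short, hedged SQL sketch; the PATIENT table, its columns and the user name below are hypothetical names chosen only for illustration.

    -- Defining: specify data types, structures and constraints
    CREATE TABLE patient (
        patient_no   INTEGER      PRIMARY KEY,
        patient_name VARCHAR(50)  NOT NULL,
        patient_addr VARCHAR(100)
    );

    -- Constructing: store the actual data under DBMS control
    INSERT INTO patient VALUES (1, 'Priya', 'Hyderabad');

    -- Manipulating: query and update the stored data, generate reports
    SELECT patient_name FROM patient WHERE patient_no = 1;
    UPDATE patient SET patient_addr = 'Chennai' WHERE patient_no = 1;

    -- Sharing / protection: only authorized users may read the table
    GRANT SELECT ON patient TO reception_user;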

 

Example of Database Management System

Consider the example of the hospital system, which deals with in-patients as well as out-patients. The hospital system is shown below.

(Figure: hospital database system — the in-patient, out-patient and accounts applications share a single common database.)

 In the Conventional File Processing System there will be a separate system for in-patients, Out-patient and Accounts. But under the database approach, all the necessary files are included in a single database, which minimizes redundancy of data and facilitates sharing. As a result all the drawbacks of Conventional File Processing System are eliminated in the Database System.

What are the Advantages of Database Management System:

 The Advantages of Database Management System are

Controlled Redundancy of data.

Consistency of data.

Flexibility

Backup and recovery

Enhanced Data sharing.

Better enforcement of standards.

Relationships among data.

Reduced  program maintenance and Increased  programmer productivity.

 

1. Controlled Redundancy of data

The database approach uses a centralized store for data, hence the amount of data redundancy is minimized. Data redundancy can be further reduced by applying the normalization process during database design.

 

2. Consistency of data

The controlled redundancy minimizes the presence of the same data in different files. This leads to consistency of data by avoiding multiple inserts and updates of the same data at different places.

3. Flexibility

          In the database approach, the database design is based on a bottom-up approach. In this approach all the reports that are currently used, and the various reports expected in future, are taken into account while designing the database. When changes in report requirements occur, the database can be revised with only minor changes.

4. Backup and Recovery

          In a DBMS, if a transaction fails in the middle of its execution due to a system failure, the DBMS brings the data back to the consistent state it was in before the transaction started.

5. Enhanced Data Sharing

The database approach uses a centralized database, hence the same data can be shared by different applications or users simultaneously under concurrency control. In a DBMS multiple users can access the same data and make changes to it.

6. Better enforcement of standards

          Since the different files of the database are designed together rather than separately by different sub-systems, standards for field names, field widths, field types and so on can be enforced better.

7. Relationships among data

          We can define relationships among data to improve the performance of applications and the consistency (correctness) of data. For example, a foreign key constraint establishes a relationship between Employee and Department information.

8. Reduced program maintenance and increase of programming productivity.

          Different applications are developed under the coordination of the database administrator. As a result, there is an integrated effort among the different groups in terms of file design and program design. This reduces program maintenance.

Programmer productivity is a measure of the time taken to develop an application. In the database approach, data is separated from programs, and many fourth-generation languages are available to access and manipulate the database.

Because of the advanced capabilities of fourth-generation languages, the time taken to develop an application is less than the time taken to develop it using a conventional file processing system.

What are the Application of DBMS ?

         The applications of databases include:

Banking: For customer information, accounts, loans, Bank transactions.

Airlines: For reservations and scheduled information.

Universities: For student information, course registration and grading.

Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid cards and storing information about the communication network.

What are the Disadvantages of DBMS ?

        The following are the disadvantages of using DBMS.

Increased complexity.

Requirement of new and specialized manpower.

Large size of DBMS.

Increased installation and maintenance cost.

Conversion cost.

Increased complexity

A multi-user DBMS becomes a complex piece of software because of the functionalities expected from it. It becomes necessary for database designers, developers, database administrators and end users to understand these functionalities; failure to understand them can lead to bad design decisions.

Requirement of new and specialized manpower

Because of rapid changes in database technology, organizations need to train manpower on a regular basis to design and implement databases and to provide database administration services; therefore an organization needs to employ and retain people with specialized skills.

Large size of DBMS  

The DBMS occupies many Giga Bytes of storage space and requires more amount of main memory to run efficiently.

Increased installation and maintenance cost

The DBMS software has a high initial cost. It requires trained personnel to install and operate it, and it also has significant annual maintenance costs. Installing such software may also require upgrades to the hardware and other software.

Conversion cost

The conversion cost from old database technology to modern database environment is high.

Explain the DBMS Architecture 

The main aim of a database system is to provide an abstract view of data, hiding certain details of how the data is stored and manipulated. To satisfy this need, an architecture for the database system is developed.

In the early days the whole DBMS was a single, monolithic package, whereas a modern DBMS is based on a client-server architecture.

Under the client-server architecture the database does not reside on the client machine; instead, the client machine connects to the database server over a network.

There are two types of DBMS architecture as shown below.

(Figure: Two-tier architecture — the client communicates directly with the database server. Three-tier architecture — the client communicates with an application server, which in turn communicates with the database server.)

In a two-tier architecture, the application is a component that resides on the client machine and communicates with the database server through query language statements.

In a three-tier architecture, the client machine does not communicate directly with the database server; instead, the client communicates with an application server, and the application server communicates with the database system to access data. Three-tier architectures are more suitable for large applications.

Explain the Database Architecture

        A database architecture based on the three-level approach is shown below. It consists of three levels.

Internal level

       The internal schema defines the internal level. The internal level is the lowest level of data abstraction. This level describes how the data will be stored in the database, the file structures and data structures used, and the access methods to be used by the database.

Conceptual level

The conceptual schema defines the conceptual level. The conceptual level is the middle level of data abstraction. This level describes entities, attributes, and the relationships between entities and attributes.

External level

The external schema defines the external level. The external level is the highest level of data abstraction. This level describes the part of the database that is relevant to a particular user.

 

 

 

 

What is Data independency ?

        Data independence is a characteristic of a database: the ability to change the schema at one level without having to change the schema at the next higher level. There are two types of data independence, as described below.

 

 

Physical independency

In physical data independence, changes to the internal schema, such as the file structures, access methods and devices used to store the data, must be possible without changing the conceptual schema or the external schemas.

Logical independency

In logical data independence, changes to the conceptual schema, such as the addition or deletion of entities, attributes or relationships, must be possible without changing the external schemas.
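As a rough illustration of logical data independence, an external schema can be defined as a view over the conceptual schema; the EMPLOYEE table, its columns and the view name below are assumptions made only for this example.

    -- External view used by application programs
    CREATE VIEW emp_public AS
        SELECT emp_no, emp_name FROM employee;

    -- The conceptual schema can later change (a new attribute is added)
    -- without any change to the view or to the programs that query it
    ALTER TABLE employee ADD COLUMN emp_dob DATE;

    SELECT emp_no, emp_name FROM emp_public;   -- still works unchanged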

 

What are the Functions or services of DBMS ?

The functions and services of DBMS are

Data storage Management

The DBMS creates the structure for the database on the physical storage devices. It provides a mechanism for permanent storage of data.

Data Manipulation Management

     The DBMS provides ability to add new data into the database (or) retrieve, update and delete existing data in the database.

Data Definition Management

The DBMS creates the structure of data in which the data is stored.

Data dictionary

The DBMS provides a data dictionary in which the descriptions of data items are stored.

Authorization

     The DBMS protects the database against unauthorized access either intentional (or) accidental.

Backup and recovery

     The DBMS provides a mechanism for Backup data periodically and recovery from different types of failures.

Concurrency control  

The DBMS supports sharing of data among multiple users. The DBMS provides a mechanism for concurrent access to the database.

Transaction Management

A transaction is a series of database operations that access or change the content of the database. Ensuring that transactions execute reliably is done by transaction management.

Data Independency Service

The DBMS supports the independence of programs from the structure of the database.

Integrity Service

The DBMS provides an integrity service to ensure that data stored in the database, and changes made to that data, follow certain rules.

 

What are the Functions database Administrator ?

         Database administrator is an individual person with an overview of one (or) more databases and also controls the design and use of database.

          Functions and responsibilities of DBA are

Defining conceptual schema and database creation

The DBA creates the conceptual schema such as defining entities and attributes, deleting entities and attributes and modifying entities and attributes etc. The DBA also creates the structure of the database.

storage structure and access method definition

The DBA defines the storage structure of the data. And access methods of the database.

Granting authorization to the user

The DBA grants users access to the database. The authorization information is kept in a system structure that the database system consults whenever someone attempts to access the data in the system.

Routine maintenance

The DBA maintains periodically backups of the database either on hard disc (or) CD to prevent loss of data in case of failure.

Job monitoring

The DBA monitors the jobs running on the database and ensures that its performance does not degrade.

What are the various components of Database Systems  ?

       The database system is composed of the five major components.

Hardware

Software

People

Procedures

Data

 

 

Hardware: Hardware refers to physical components of the system.

      E.g. storage devices, printers etc.

Software: Software is a set of programs. To make the database system function fully, three types of software are needed. They are

      a.  Operating System Software

      b.   DBMS software.

      c.   Application Programs and utility software.

       a. Operating system software:  operating system manages all hardware components and run other software’s on the computer.

      E.g. WINDOWS, LINUX

      b. DBMS Software:   manages the database within the database system.

      E.g.   Oracle, SQL, MS Access

      c. Application programs and utility software: Application programs are used to access and manipulate data, to generate reports and to help in making decisions.

Utilities are the software tools used to help manage the database system's computer components.

People: There are five types of users in the database system.

System Administrator: oversees the database system's general operations.

Database Administrator: ensures that the database is functioning properly.

Database designers: design the database structure.

System analysts and programmers: design and implement the application programs.

End users: use the application programs to run the organization's daily operations.

Procedures: Procedures are the rules that govern the design and use of the database.

Data: Data are the facts stored in the database; data are the raw material from which information is generated.

What is Data?

          A system consists of interrelated entities, and each entity has a set of attributes; the recorded values of these attributes are the data of the system.

What is Information ?

          Information is nothing but processed data.

Define Meta data

          Meta data is the data about the data i.e., information for accessing the data.

Explain the Terminology of a file

Field: A field is the lowest level of data item of an entity which is alternatively called as an attribute of that entity.

        Emp               Empno        Empname         Empaddress

Record: Record is the collection of fields (or) attributes of an entity.

                               Empno        Empname         Empaddress

                                   1               sweaty              banglore          

File: File is a collection of records having same set of fields arranged in the same sequence.

                               Empno        Empname           Empaddress

                                    1             manimala             Chennai

                                    2                priya                Hyderabad

Key field (or) Primary key: A field is said to be a key field or primary key if it can identify a record uniquely in a file.

 

e.g.         student no in student file

               emp no in emp file.

Non key field (or) secondary key: A field is said to be Non key field (or) secondary key if it cannot identify a record uniquely in a file.

e.g.           student name in student file.

                 Emp name in emp file.

Schema: It is the overall view of all the files in the database.

Subschema: A portion of the schema of a database is called as subschema.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

CHAPTER-2

Data Models

Data Modeling and Data Models:

Data model:

A data model is a collection of concepts that can be used to describe the structure of a database

Data modeling, the first step in designing a database, refers to the process of creating a specific data model for a specific problem.

        A model is an abstraction of a real-world object. A data model represents data structures and their characteristics, relations, constraints and transactions.

        Data modeling is an iterative process: we start with a simple understanding of the problem, refine the model as our understanding of the problem increases, and finally design a database using a specific database model.

Importance of Data Models:

Data Model can facilitate interaction among the designer, the application programmer and the end user.

Applications are used to transform data into information. But data are viewed in different ways by different people.

For example, a manager and a clerk may work in the same company, but the manager has a wider view of the company's data than the clerk.

A company president has universal view of data.

Different Managers views data differently in a company. The inventory manager is more concerned about inventory levels, while purchasing manager concerned about items and supplies.

Application programmers have another view of data i.e., concerned with data locations and formatting.

A house is a collection of rooms; if someone is going to build a house, they need an overall view, which is provided by the blueprint. Similarly, a sound data environment requires an overall database blueprint based on an appropriate data model.

When a good database blueprint is available, it does not matter that the application programmer's view of the data is different from that of the managers and end users. When a good database blueprint is not available, problems are likely to ensue.

Data Model basic building blocks:   

         The basic building blocks of data models are entities, attributes, relationships and constraints. An entity represents a real world object person (or) place.

 For example, a CUSTOMER entity set contains the different customer occurrences.

 An attribute is a characteristic of an entity.

 For example, a CUSTOMER entity has attributes such as customer_no, customer_name and customer_address. A relationship describes an association between entities. Data models use three types of associations: one-to-many, many-to-many and one-to-one.

One-to-many (1:M, 1..*): A painter paints many different paintings. Therefore, the database designer labels the relationship PAINTER paints PAINTING as one-to-many.

Many-to-many (M:N, *..*): An employee may learn many job skills, and each job skill may be learned by many employees. Therefore, the database designer labels the relationship EMPLOYEE learns SKILL as many-to-many (M:N).

One-to-one (1:1, 1..1): Each store manager manages only a single store. Therefore, the database designer labels the relationship MANAGER manages STORE as one-to-one (1:1).

  By using the business rules we can properly identify entities, attributes, relationships and constraints.

Business rules:

  A business rule is a description of a policy, procedure (Or) principle within a business organization. Examples of business rule.

A customer may generate many invoices.

A training session cannot be scheduled for fewer than 10 employees or for more than 30 employees.

These business rules establish entities, relationships and constraints.

       The first business rule establishes two entities (CUSTOMER, INVOICE) and a one-to-many relationship between them.

            The second business rule establishes a constraint (no fewer than 10 and no more than 30 employees), two entities (EMPLOYEE, TRAINING) and a relationship between EMPLOYEE and TRAINING.

Discovering business rules:

      The main sources of business rules are company managers, policy makers, department managers and written documents such as the company's procedures, standards or operations manuals. A faster and more direct source of business rules is direct interviews with the people concerned.

Translating business rules into Data Model:

 General rule: A noun in a business rule will translate into an entity in the model, and a verb associating nouns will translate into a relationship among the entities.

For example, the business rule "A customer may generate many invoices"

       contains two nouns (customer and invoices) and a verb (generate) that associates the nouns.

       To properly identify the type of relationship, remember that relationships are bi-directional and should be examined in both directions.

       For example, in the business rule "A customer may generate many invoices", the relationship is one-to-many (1:M, 1..*): customer is the "1" side and invoice is the "many" side.

Evolution of Data Models: 

Generation   Time                   Model                                    Example
First        1960-70                File system                              VMS
Second       1970                   Hierarchical and network data models     IMS, FOCUS, IDMS
Third        Mid-1970s to present   Relational data model                    MS Access, Oracle
Fourth       Mid-1980s to present   Object-oriented and extended             Versant, Objectivity
                                    relational models
Fifth        Present to future      XML                                      Oracle 10g
 

Hierarchical Data Model: 

The hierarchical data model is the oldest type of data model, developed by IBM in 1968. This data model organizes the data in a tree-like structure, in which each child node (also known as dependents) can have only one parent node. The database based on the hierarchical data model comprises a set of records connected to one another through links. The link is an association between two or more records. The top of the tree structure consists of a single node that does not have any parent and is called the root node.

The hierarchical data model consists of a set of nested relationships one-to-many and one-to-one association.

In hierarchical data model the relations are presented in the form of tree-structure in which the root segment is kept at the top and further branches emanate downwards from the root segment.

In this model the type of association can be one-to-one or one-to-many. This means that a many-to-one association is not permitted, which is equivalent to saying that multiple parentage for a child segment is not permitted.

    The conceptual data model can be mapped into the hierarchical model in either of two ways, as described below.

       In alternative 1, the student file is kept at the root segment of the tree and the faculty file is kept at the bottom of the tree. On mapping the conceptual data model into the hierarchical data model, the following facts are observed.

The association from student to enrollment is one-to-many. This mapped without any modifications.

The association from enrollment to subject is many-to-one. This is not permitted in hierarchical data model. Hence it is modified into one-to-one association.

The association from subject to faculty in many-to-one. This is not permitted in hierarchical data model. Hence it is modified into one-to-one association.

              In alternative1 while mapping the conceptual data model into hierarchical data model, the many-to-one association presents at two levels are modified into one-to-one association. These modifications will increase the data redundancy.

              In alternative2 the faculty file is kept at the root of the tree and student file is kept at the bottom of the tree. While mapping the conceptual data model the following facts are observed.

The association from faculty to subject file is one-to-many. So it is mapped without any modifications.

The association from subject to enrollment is many-to-one. This is not permitted in the hierarchical data model. Hence it is modified into a one-to-one association.

The association from enrollment to student is many-to one. This is not permitted in hierarchical data model. Hence it is modified into one-to-one association.

         Finally which alternative has less redundancy should be selected for implementation.

         In alternative 2, only the association between enrollment and student has to be changed; that is, only one association type is modified. When we compare alternative 2 with alternative 1, alternative 2 has less redundancy, so it is selected for implementation.

Advantages:

1. It promotes data sharing.

2. Parent/Child relationship promotes conceptual simplicity.

3. Database security is provided and enforced by DBMS.

4. Parent/Child relationship promotes data integrity.

5. It is efficient with 1:M relationships.

Disadvantages:

Complex implementation requires knowledge of physical data storage characteristics.

Changes in structure require changes in all application programs.

There are implementation limitations (no multi parent or M:N relationships).

There is no data definition or data manipulation language in the DBMS.

There is a lack of standards.

 

Network Data Model:

      A Network data model consists of a set of pair wise association between the entities.

      The Network data model was created to improve database performance, database standards and represent complex relationships effectively than the hierarchical data model.

       To establish database standards, the Conference on Data Systems Languages (CODASYL) created the Database Task Group (DBTG). The DBTG defined standard specifications for database creation and data manipulation.

Schema: The schema provides overall view of the database to the administrator.

Sub Schema:  The sub schema which defines the portion of the database seen by the application programs.

Data Management Language: defines the environment in which data can be managed. The DBTG specified three language components.

A schema data Definition Language (DDL), which enables the data base administrator to create the database.

A subschema DDL, which allows the application programs to define database component that will be used by the application.

A Data Manipulation Language, to manipulate the data in the database.

              In the network data model, the database is viewed as a collection of records in one-to-many relationships; unlike the hierarchical model, it allows a record to have more than one parent.

              In a network database, a relationship is called a set. Each set contains two record types: one is the owner and the other is the member.

 

 

 

Sets:

Set Name     Owner       Member
Ordered      Customer    Sales
Buyers       Book        Sales
Contribute   Author      Book
Catalog      Publisher   Book

 

Advantages:

1. Conceptual simplicity is at least equal to that of the hierarchical model.

2. It handles more relationship types, such as M:N and multi-parent.

3. Data access is more flexible than in hierarchical and file system models.

4. Data Owner/Member relationship promotes data integrity.

5. There is conformance to standards.

6. It includes data definition language (DDL) and data manipulation language (DML) in DBMS

Disadvantages:

1. System complexity limits efficiency; it is still a navigational system.

2. Navigational system yields complex implementation, application development, and management.

3. Structural changes require changes in all application programs.

Relational data model: 

          The relational model was introduced in 1970 by E. F. Codd. The foundation of the model is a mathematical concept known as a relation. A relation is composed of intersecting rows and columns: each row in a relation represents a tuple, and each column represents an attribute.

           The relational data model is implemented through a Relational Database Management System (RDBMS).

            Tables are related through the sharing of common attribute. For e.g. the table agent and customer as shown below.

Agent

Agent_Code   Agent_Name   Agent_Address   Agent_PhoneNo   Agent_AreaCode

Customer

Cust_No   Cust_Name   Cust_Address   Phone_No   Agent_Code

By matching the Agent_Code in the customer table with Agent_Code in the Agent table we can find Agent details of that customer.

       The relationship types one-to-one, one-to-many and many-to-many have in a relational data model.

      A relationship diagram is a representation of entities, the attributes within the entities and the relationship between the entities.

AGENT (Agent_Code, Agent_Name, Agent_Address, Agent_PhoneNo, Agent_AreaCode)
   1 --- ∞
CUSTOMER (Cust_No, Cust_Name, Cust_Address, Phone_No, Agent_Code)

In the above diagram, the relationship is one-to-many. The symbol ∞ indicates many.

           The customer represents “many” sides, because an AGENT can have many CUSTOMERS.

           The AGENT represents the “1” side because each CUSTOMER has only one AGENT.

            The query languages supported by the relational data model are powerful and flexible; because of this the relational data model is popular. FoxPro, dBase, MS Access and SQL-based systems are examples of relational database software. This software allows the user to specify what must be done without specifying how it must be done.

          SQL-based database applications involve three parts: 1. the end-user interface, 2. a set of tables stored in the database, and 3. the SQL engine.

 

 

 

Advantages:

1. Structural independence is promoted by the use of independent tables. Changes in a table's structure do not affect data access or application programs.

2. Tabular view substantially improves conceptual simplicity, thereby promoting easier database design, implementation, management, and use.

3. Ad hoc query capability is based on SQL.

4. Powerful RDBMS isolates the end user from physical-level details and improves implementation and management simplicity.

Disadvantages:

1. The RDBMS requires substantial hardware and system software overhead.

2. Conceptual simplicity allows relatively untrained people to use a good system poorly and, if unchecked, this may produce the same data anomalies found in file systems.

3. It may promote "islands of information" problems, as individuals and departments can easily develop their own applications.

Entity relation model:

          Peter Chen first introduced the ER data model in 1976; it is a graphical representation of entities and their relationships in a database structure.

           E.R. models are normally represented in an entity relationship diagram.

 The E.R.Model is based on the following components.

Entity: Entities are real-world objects. An entity is represented by a rectangle.

e.g. painter, employee, skill (the nouns in a business rule).

Attribute: Attributes are the characteristics of entities.

e.g.   Empno, Empname, Empaddress etc.

Relationships: A relationship describes association among the entities. There are three types of relationships, one-to-many, many-to-many and one-to-one.

There are two types of ER notations.

Chen notation

Crow’s foot notation.

              The two notations represent the different relationship types as shown below.

(Figure: Chen notation and Crow's Foot notation for the 1:1, 1:M and M:N relationship types.)

        In Chen notation, entities are represented by rectangles, and entity names are written in capital letters at the centre of the rectangle. Relationships are represented by diamonds connected to the related entities by relationship lines; the relationship name is written inside the diamond.

        In the Crow's Foot notation, the name "crow's foot" is derived from the three-pronged symbol used to represent the "many" side of a relationship. In this notation, "one" is represented by a short line segment and "many" by the crow's foot. The relationship name is written above the relationship line. Relationships may also be drawn vertically.

 

 Advantages:

1. Visual modeling yields exceptional conceptual simplicity.

2. Visual representation makes it an effective communication tool.

3. It is integrated with dominant relational model.

Disadvantages:

1. There is limited constraint representation.

2. There is limited relationship representation.

3. There is no data manipulation language.

4. Loss of information content occurs when attributes are removed from entities to avoid crowded displays. (This limitation has been addressed in subsequent graphical versions)

 

Object Oriented Model:

         In the Object Oriented Data Model (OODM) both data and their relationships are contained in a single structure known as an Object.

         In the Object Oriented Data Model, an object contains, along with its data, the operations that can be performed on it, such as changing data values, finding a specific data value, and printing data values.

The OODM is based on the following components.

An object is an abstraction of a real world entity.

Attributes describes the properties of an object.

E.g.   Person object contains the attribute name, social security number, date of birth etc.

A collection of similar objects contains attributes and methods. By using those methods change data values, find data values and print data values in the objects.

Classes are organized in a class hierarchy. The class hierarchy resembles an upside-down tree in which each class has only one parent.

One of the properties of object oriented data model is inheritance. By using the inheritance we can inherit attributes and methods from super class to sub classes.

E.g. customer and employee sub classes of the person super class. Customer and employee will inherit all attributes and methods from person.

Object oriented data models are drawn using Unified Modeling Language (UML) class diagrams. The UML class diagrams are used to represent data and their relationships.

For E.g. Let us use invoice program. In this case the invoices are generated by customer, each invoice contains one (or) more lines, and each line represents an item purchased by customer.

    The following figure shows the object representation, the UML class diagram and the ER model for the invoice problem.

(Figure: object representation, UML class diagram and ER model of the invoice problem.)

The object representation of the invoice includes all the related objects within the same object. The 1 next to the CUSTOMER object indicates that each invoice is related to one customer. The M next to the LINE object indicates that each invoice contains many lines.

       The UML class diagram uses 3 separate classes (customer, invoice and line) and two relationships to represent this problem.

       The E.R Model also uses the 3 separate entities and two relationships to represent the invoice problem.

Advantages:

1. Semantic content is added.

2. Visual representation includes semantic content.

3. Inheritance promotes data integrity.

Disadvantages:

1. Slow development of standards caused vendors to supply their own enhancements, thus eliminating a widely accepted standard.

2. It is a complex navigational system.

3. There is a steep learning curve.

4. High system overhead slows transactions

 

 

 

 

 

 

 

CHAPTER-3

Relational Database Model

 

In the relational database model, records are stored in tables. The relational data model is easier to understand than the hierarchical and network data models. The relational data model provides a logical view of the data and of its relationships with other data.

Tables and Characteristics:

          A Table is composed of rows and columns. Each row of the table is called as tuple. Each column of the table is called as attribute. A table is also called as Relation. A table contains a group of related entities.

Characteristics of a Table:

A table is composed of rows and columns.

Each row of the table represents one entity (tuple) in the entity set.

Each column represents an attribute and each column has distinct name.

Each cell represents a single value.

All values in a column must have same data format.

Each column has a specified range of values which is called domain.

The order of the rows and columns is immaterial to the DBMS.

Each table must have an attribute or group of attributes that uniquely identifies each row.

The following Student table shows above characteristics.

 

 

The STUDENT table is composed of 8 tuples (rows) and 6 attributes (columns).

In the STUDENT table the primary key is STU_NUM (student number); by using this attribute we can identify a record uniquely in the table.

Keys

 

Key field or Primary Field:

    A key consists of one or more attributes that determines other attributes. For example, an invoice number identifies attributes such as invoice date, customer details, items details and amount.

          The statement "A determines B" indicates that if we know the value of attribute A, then we can determine the value of B.

Eg: In the STUDENT table, if we know the value of the student number, then we can determine the student's last name, first name and initial. This can be represented in the following way.

STU_NUM → STU_LNAME, STU_FNAME, STU_INIT

Functional Dependency:

    The attribute B is functionally dependent on A ("A determines B") if each value in column A determines one and only one value in column B.

 

Eg: STU_NUM functionally determines STU_LNAME (STU_LNAME is functionally dependent on STU_NUM), but STU_LNAME does not functionally determine STU_NUM.

STU_NUM → STU_LNAME

 

Composite Key or Fully Functional Dependency:

 

    A key may be composed of more than one attribute; such a multi attribute key is known as Composite key.

Eg: STU_LNAME, STU_FNAME, STU_INIT, PHONE_NO → STU_CLASS, STU_HOURS

    The combination of STU_LNAME, STU_FNAME, STU_INIT and PHONE_NO determines STU_CLASS and STU_HOURS. If an attribute B is functionally dependent on a composite key A but not on any subset of A, then B is said to be fully functionally dependent on A.

 

Super Key :A sub set of Attributes that uniquely identifies a tuple(row) in a relation(table).

Eno   Ename    Salary   Dept_no   Voter_Id
1     Raju     20000    10        V12345
2     Prasad   40000    10        V12222
3     Raju     20000    20        V45666

{Eno } :No two rows have same Eno ( Eno uniquely identifies a tuple(row) in a relation)

{Ename } : Two employee’s may have same name.

{Voter_id} :No two rows have same Voter_id (Voter_id uniquely identifies a tuple(row) in a relation)

{Eno, Ename }: Eno itself uniquely identifies a tuple(row) in a relation, hence combination of Eno and Ename also uniquely identifies a tuple(row) in a relation

Eg : In a STUDENT table

STUDENT{STU_NUM,STU_LNAME,STU_FNAME,STU_INIT,STU_DOB,STU_HRS,STU_CLASS}

Super keys can be identified as follows.

{STU_NUM}

{STU_NUM, STU_LNAME}

{STU_NUM, STU_LNAME, STU_INIT}

*Minimal Super Key (Key) :

Definition : A Minimal Super Key (Key) K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

{Eno } : is Minimal Super Key  ( A Super Key which have only one attribute is Minimal Super Key)

{Voter_id} : is Minimal Super Key

{Eno, Ename}: is not a minimal super key (removing Ename from {Eno, Ename} leaves {Eno}, which is still a super key); hence {Eno, Ename} is not a minimal super key.

Candidate Key :

Definition :  If a relation schema has more than one key (Minimal Super Key) then each of them is called as candidate key.

One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys(or Alternative key).

A key formed by combining at least two or more columns is called composite key

Primary Key :

Definition : Set of attributes of a relation which uniquely identifies a tuple in a relation.

Note :

A Relation(table) can have many Superkeys, and also many Minimal Superkeys.

If a Relation(table) has more than on Minimal Superkeys each can be called as Candidate Keys.

One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys(or Alternative key).

    Key Hierarchy

A primary key does not allow duplicate or NULL values.
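A minimal SQL sketch of the EMP example above, assuming standard SQL syntax: Eno is designated the primary key, while Voter_Id remains a candidate (alternate) key enforced with UNIQUE.

    CREATE TABLE emp (
        eno      INTEGER       PRIMARY KEY,   -- no duplicates, no NULLs
        ename    VARCHAR(40),
        salary   DECIMAL(10,2),
        dept_no  INTEGER,
        voter_id VARCHAR(10)   UNIQUE         -- alternate candidate key
    );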

 

 

 

 

 

 

Foreign Key:

    A foreign key is an attribute whose values match the primary key values in the related table.

Foreign Key (Referential Integrity Constraint) :

Referential Integrity Constraint is used to specify interdependencies between relations. This constraint specifies a column or list of columns as a foreign key of the referencing table.

A foreign key means the values of a column in one table must also appear in a column in another table. The foreign key in the child table will generally reference a primary key in the parent table. The referencing table is called the child table & referenced table is called the parent table

Self Referential Integrity Constraint mean a column in one table references the primary key column(s) in the same table.

EMP (referencing relation)

Eno   Ename   Salary   Dept_no (FK)   Voter_Id
1     Raju    20000    10             V12345
2     Ravi    40000    10             V12222
3     Raju    25000    20             V45666

In EMP table Eno is Primary Key. (Duplicates and Null values are not allowed in Eno)

In the EMP table Dept_no is a foreign key which references the Dept_no column of the DEPT table. (A value for Dept_no in the EMP table is accepted only if it exists in the Dept_no column of the DEPT table.)

 

 

DEPT (referenced relation)

Dept_no (PK)   Dname   Dloc
10             MTech   BVRM
20             MBA     HYD
30             MCA     BVRM

In DEPT table Dept_no is Primary key.
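Continuing the hedged EMP sketch given earlier, the foreign key of this example can be declared as follows; the constraint name is invented for illustration.

    CREATE TABLE dept (
        dept_no INTEGER PRIMARY KEY,
        dname   VARCHAR(20),
        dloc    VARCHAR(20)
    );

    -- Dept_no in EMP references the parent (referenced) table DEPT
    ALTER TABLE emp
        ADD CONSTRAINT emp_dept_fk
        FOREIGN KEY (dept_no) REFERENCES dept (dept_no);

    -- Accepted: department 10 exists in DEPT
    INSERT INTO dept VALUES (10, 'MTech', 'BVRM');
    INSERT INTO emp  VALUES (4, 'Kiran', 30000, 10, 'V99999');

    -- Rejected by referential integrity: department 99 does not exist in DEPT
    INSERT INTO emp  VALUES (5, 'Mani', 30000, 99, 'V88888');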

Secondary Key:

The secondary key is defined as a key that is used strictly for data retrieval purposes.

Example: In the customer table the data retrieval can be facilitated when CUST_LAST and CUST_PHONE number are used.

Integrity Rules:

Integrity rules are used in the database design.

Entity Integrity: All primary key entries are unique and no part of the primary key may be NULL.

Example: In Agent table the agent_code is primary key and this column is free from null values.

Referential Integrity: A foreign key is an attribute whose values match the primary key values in the related table.

Example: The vendor_code is the primary key in the VENDOR table and occurs as a foreign key in a related table.

NOT NULL: A NOT NULL constraint placed on a column ensures that, when a row is inserted, that column must have a value.

UNIQUE: A UNIQUE constraint placed on a column ensures that every value in that column is unique (no duplication).

Relational Set Operators:

The relational set operators are SELECT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT, DIVIDE.

UNION:

The UNION operator combines all rows from two tables, excluding duplicate rows. The tables must have the same structure.

          Query: PRODUCT1 UNION PRODUCT2

 

INTERSECT:

The INTERSECT operator gives only the rows that appear in both tables. The tables must also have the same structure.

DIFFERENCE:

The DIFFERENCE operator gives all rows in one table that are not found in other table.

  STUDENT  EMPLOYEE

 

QUERY: STUDENT MINUS EMPLOYEE
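A hedged SQL sketch of the three set operators, assuming two union-compatible tables STUDENT and EMPLOYEE that both have a NAME column (standard SQL spells the difference operator EXCEPT; Oracle uses MINUS):

    SELECT name FROM student
    UNION
    SELECT name FROM employee;     -- all rows from both tables, duplicates removed

    SELECT name FROM student
    INTERSECT
    SELECT name FROM employee;     -- only the rows that appear in both tables

    SELECT name FROM student
    EXCEPT                         -- MINUS in Oracle
    SELECT name FROM employee;     -- rows in STUDENT that are not in EMPLOYEE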

 

PRODUCT:

The PRODUCT operator gives all possible pair of rows from two tables.

SELECT:

The SELECT operator gives all rows that satisfy a given condition.

         

 

                    SQL> Select product where P_CODE=311452.

 

PROJECT:

The PROJECT operator gives all values for selected attributes; in other words, PROJECT gives a vertical subset of a table.

                  

 

Project  P_DESCRIPT Product;
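In SQL, the SELECT (restriction) and PROJECT operators correspond to the WHERE clause and the column list, and PRODUCT corresponds to a CROSS JOIN; a minimal sketch assuming PRODUCT and VENDOR tables like those used later in this chapter:

    -- SELECT (restriction): only the rows that satisfy the condition
    SELECT * FROM product WHERE p_code = 311452;

    -- PROJECT: a vertical subset, i.e. only the chosen column(s)
    SELECT DISTINCT p_descript FROM product;

    -- PRODUCT: every possible pairing of rows from the two tables
    SELECT * FROM product CROSS JOIN vendor;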

                  

 

JOIN: The JOIN operator combines rows from 2 or more tables. There are several types of joins.

 

Natural Join

A Natural Join joins tables by selecting the rows with common values in their common attributes. A natural join is the result of a three-stage process:

1) First, a PRODUCT of the two tables is created.

 

 

 

   Query: SQL> Product NATURAL JOIN Vendor

 

 

 

2) Second, a SELECT is performed on the output of Step 1 to yield only the rows for which the values of the common attribute are equal. The common columns are referred to as the join columns.

3) Third, a PROJECT is performed on the result of Step 2 to yield a single copy of each attribute, thereby eliminating the duplicate column.

 

The final outcome of a natural join yields a table that does not include unmatched pairs and provides only the copies of the matches.
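A hedged SQL sketch of the natural join described above, using the PRODUCT and VENDOR tables of the next example (column names follow that example):

    -- Natural join: rows are matched on the common column (VEND_CODE)
    -- and only one copy of the common column is kept
    SELECT * FROM product NATURAL JOIN vendor;

    -- The same result written as an explicit equijoin
    SELECT p.prod_code, p.prod_descp, p.vend_code, v.vend_name
    FROM   product p
    JOIN   vendor  v ON p.vend_code = v.vend_code;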

Equijoin:

An equijoin links tables on the basis of an equality condition that compares specified columns of each table; the comparison operator "equal to" is used in the join condition.

                                             Or

Inner join produces only the set of records that match in both Table A and Table B.

Product:

Prod_code   Prod_Descp   Vend_code
123         Tyres        V101
124         Tubes        V102
125         Bolts        ----

Vendor:

Vend_code   Vend_name
V101        Ravi
V102        Ram
V103        Krishna

Result:

Prod_code   Prod_Descp   Vend_code   Vend_name
123         Tyres        V101        Ravi
124         Tubes        V102        Ram

 

 

  Outer Join:

In an outer join, the matched pairs of records are returned, and any unmatched values from the other table are left NULL.

 

Left Outer Join:

A left outer join returns all rows from the left table together with the matching rows from the right table; where there is no match, the right-table values are NULL.

 

 

Right Outer Join:

A right outer join returns all rows from the right table together with the matching rows from the left table; where there is no match, the left-table values are NULL.

 

Full Outer Join:

Full outer join produces the set of all records in Table A and Table B, with matching records from both sides where available. If there is no match, the missing side will contain null.

 

Product:

Prod_code   Prod_Descp   Vend_code
123         Tyres        V101
124         Tubes        V102
125         Bolts        ----

Vendor:

Vend_code   Vend_name
V101        Ravi
V102        Ram
V103        Krishna

Result:

Prod_code   Prod_Descp   Vend_code   Vend_name
123         Tyres        V101        Ravi
124         Tubes        V102        Ram
125         Bolts        ----        ----
----        ----         V103        Krishna
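A hedged SQL sketch of the three outer joins on the same PRODUCT and VENDOR tables, using standard SQL syntax:

    -- Left outer join: every product, with vendor details where they exist
    SELECT p.prod_code, p.prod_descp, v.vend_code, v.vend_name
    FROM   product p LEFT OUTER JOIN vendor v ON p.vend_code = v.vend_code;

    -- Right outer join: every vendor, with product details where they exist
    SELECT p.prod_code, p.prod_descp, v.vend_code, v.vend_name
    FROM   product p RIGHT OUTER JOIN vendor v ON p.vend_code = v.vend_code;

    -- Full outer join: all rows from both tables, NULLs where there is no match
    SELECT p.prod_code, p.prod_descp, v.vend_code, v.vend_name
    FROM   product p FULL OUTER JOIN vendor v ON p.vend_code = v.vend_code;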

 

 

DIVIDE:

The DIVIDE operator uses a single-column table as the divisor and a two-column table as the dividend. The output of the DIVIDE operator is a single column containing the values of the first dividend column for those rows that are associated with every value in the divisor table.

Data Dictionary:

The Data Dictionary provides a description of all tables in the database. The Data Dictionary contains attribute names and characteristics of each table in the system. The data dictionary contains meta data.

Example:

System Catalog:

    Like the Data Dictionary, the System Catalog contains metadata. The system catalog describes table names, the table creator, the creation date, the number of columns in each table, the data types of the columns, authorized users and access privileges.
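Many relational DBMSs expose the catalog through the standard INFORMATION_SCHEMA views; a hedged example (the exact views available and the case of identifiers vary by product):

    -- List the columns and data types of the CUSTOMER table from the catalog
    SELECT table_name, column_name, data_type
    FROM   information_schema.columns
    WHERE  table_name = 'CUSTOMER';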

Relationships within Relational Databases

The relationships are classified into One-to-Many, One-to-One and Many-to-Many.

One-to-Many (1:M) relationship

          One entity of one table is associated with a number of entities of another table.

Consider the “Painter Paints Paintings” the ER Model and implementations are shown below.

 

         

One-to-One

In this relationship one entity of one table is associated with one entity of other table and vice-versa.

Consider the PROFESSOR AND DEPARTMENT. The ER data model and implementation are shown in below.

Many-to-Many (M:N) relationship

    In this relationship each and every entity of one table is associated with one or more entities of another table and vice versa. This relationship is not directly supported in the relational environment; a many-to-many relationship has to be converted into two one-to-many relationships.

Consider the student and subjects example. The ER model and implementation are shown below.

 

In the above, each and every entity of the STUDENT table is associated with one or more entities of the SUBJECT table, because each student will opt for one or more subjects in a semester.

Each and every entity of the SUBJECT table is associated with one or more entities of the STUDENT table, because each subject will be opted for by more than one student in a semester. Many-to-many associations are not supported directly.

Hence the relationship is converted into two one-to-many associations by introducing an intermediate table, in which the common data of the original tables are stored, as sketched below.
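A hedged SQL sketch of the intermediate (composite) table for the student/subject example; the table and column names are illustrative only.

    CREATE TABLE student (
        stu_num  INTEGER PRIMARY KEY,
        stu_name VARCHAR(40)
    );

    CREATE TABLE subject (
        sub_code INTEGER PRIMARY KEY,
        sub_name VARCHAR(40)
    );

    -- ENROLL converts the M:N relationship into two 1:M relationships:
    -- STUDENT 1:M ENROLL and SUBJECT 1:M ENROLL
    CREATE TABLE enroll (
        stu_num  INTEGER REFERENCES student (stu_num),
        sub_code INTEGER REFERENCES subject (sub_code),
        PRIMARY KEY (stu_num, sub_code)
    );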

Index:

          An index is composed of an index key and a set of pointers. Each key points to the location of the data identified by the key.

Example: Suppose we want to look up all of the paintings created by a given painter. Without an index, we must read each row in the PAINTING table. If we index the PAINTING table using the painter number as the index key, we only have to look up the appropriate painter number in the index and follow the matching pointers.

             DBMSs use indexes for many different purposes. You just learned that an index can be used to retrieve data more efficiently. But indexes can also be used by a DBMS to retrieve data ordered by a specific attribute or attributes.

             For example, creating an index on a customer's last name will allow you to retrieve the customers in alphabetical order by last name.

 Indexes also play an important role in the DBMS's implementation of primary keys. When we define a table's primary key, the DBMS automatically creates a unique index on the primary key column.

When we declare customer_code to be the primary key of the CUSTOMER table, the DBMS automatically creates a unique index on that attribute. A unique index is an index in which each index key can have only one associated pointer.
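A hedged SQL sketch of the indexes discussed above; the table, column and index names are illustrative:

    -- Index used to find all paintings by a given painter quickly
    CREATE INDEX painting_painter_ndx ON painting (painter_num);

    -- Index supporting retrieval of customers ordered by last name
    CREATE INDEX cus_lname_ndx ON customer (cus_lname);

    -- Unique index, like the one a DBMS creates automatically for a primary key
    CREATE UNIQUE INDEX cus_code_ndx ON customer (cus_code);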

Codd`s Relational Database Rules

          Dr E F Codd published a list of 12 rules to define a relational database.

Rule 1: Information:

All information in a relational database must be logically represented as column values in rows within tables.

Rule 2: Guaranteed Access:

Every value in a table is guaranteed to be accessible through a combination of table name, primary key and column name.

Rule 3: Systematic Treatment of NULLs:

NULL must be represented and treated in a systematic way (Independent of data type).

Rule 4: Dynamic online catalog based on the relation:

The metadata must be stored as ordinary data in a table within the database. Such data must be available to authorized users.

Rule 5: Comprehensive data sub language:

The relational database may support many languages. However it must support data definition, view definition, data manipulation integrity constraints, authorizations and transaction management.

Rule 6: View Update:

Any view that is theoretically updatable must be updatable through the system.

Rule 7:  High level Insert, Update and Delete:

The database must support insert, update and delete.

Rule 8: Physical data independency:

Application programs are logically unaffected when storage structures are changed.

Rule 9: Logical data independency:

Application programs are logically unaffected when changes are made to the table structures.

Rule 10: Integrity Independency:

All relational integrity constraints must be definable in the relational language and stored in the system catalogs.

Rule 11: Distribution Independency:

The end users and application programs are unaffected by the data locations.

Rule 12: Non Sub Version:

If the system supports low-level access to the data, there must not be a way to bypass the integrity rules of the database.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

UNIT-2    CHAPTER-4

Entity Relationship Modeling

Entity relationship modeling results in an entity relationship diagram (ERD). The ERD represents the conceptual database. The entity relationship diagram contains mainly three components: entity, attribute and relationship.

Entity: An Entity represents a real world object.

Eg: Student, Customer, Employee, Subject, Faculty, and Product

In the Chen and Crow's Foot notations, an entity is represented by a rectangle containing the entity's name. Entity names are usually written in capital letters.

Attributes: An attribute represents a characteristic or property of an entity.

Eg: Stu_Name, Stu_Address

In the Chen notation, attributes are represented by ovals connected to the entity rectangle with a line. Each oval contains the name of the attribute.

 

 

In the Crow's Foot notation, the attributes are written in an attribute box below the entity rectangle.

Required and Optional Attribute:

          A required attribute is an attribute that must have a value. An optional attribute is an attribute that may be left without a value.

In the Crow's Foot notation, required attributes are shown in boldface.

Domain: Attributes have a domain. A domain is the set of possible values for a given attribute.

Eg: The domain for the gender attribute consists of only two possibilities namely male and female.

Identifiers: An identifier consists of one or more attributes that uniquely identify an entity instance in the entity set. In the relational model, such identifiers are mapped to primary keys of tables. Identifiers are underlined in the ER diagram.

For Eg.  STUDENT(Stu_no, Stu_fname, Stu_Lname, Stu_email);

Composite Identifier: A composite identifier is a primary key composed of more than one attribute.

For Eg. Each student entity could be identified by a composite primary key composed of the combination of Stu_Lname and Stu_Fname instead of Stu_no.
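A minimal SQL sketch of the two options (data types are assumed):

    CREATE TABLE STUDENT (
        STU_NO     INTEGER PRIMARY KEY,   -- single-attribute identifier
        STU_FNAME  VARCHAR(25),
        STU_LNAME  VARCHAR(25),
        STU_EMAIL  VARCHAR(50)
    );

    -- Alternative composite identifier (only workable if the combination is unique):
    --   PRIMARY KEY (STU_LNAME, STU_FNAME)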

Simple and Composite Attributes: Attributes are classified as simple or composite.

A simple attribute is an attribute that cannot be subdivided.

Eg: Age, F_name , Gender.

A composite attribute is an attribute that can be further subdivided to yield additional attributes.

Eg: Address can be sub divided into street , city and pincode.

Single-Valued Attribute: A single-valued attribute is an attribute that can have only one value.

Eg: STU_Number in the STUDENT TABLE.

Multivalued Attribute: A multivalued attribute is an attribute that can have many values.

Eg: A car's colour may be subdivided into several colours (body, trim, and so on).

In the Chen notation, multivalued attributes are shown by a double line connecting the attribute to the entity.

 

Note:  The crows foot notation does not identify multivalued attributes.

         

Derived Attribute: A Derived Attribute is an attribute whose values are calculated from other Attributes.

Eg: An employee's age can be derived by computing the difference between the current date and the employee's date of birth.

A derived attribute is indicated in the Chen notation by a dashed line connecting the attribute to the entity.

Relationship: The Relationship is an Association between entities.

Eg: A professor teaches a class.

There are three different types of relationships

One- One

One- Many

Many-Many

 Connectivity and cardinality:

          Cardinality expresses the minimum and maximum number of entity occurrences associated with one occurrence of the related entity; connectivity describes the relationship classification (1:1, 1:M, or M:N).

          In the ERD the cardinality is indicated by placing the appropriate numbers beside the entities using the format(x, y).

          The first value represents the minimum number of associated entities. The second value represents the maximum number of associated entities.

For Eg:

 

The cardinality (1,1) indicates that each class is taught by only one professor.

          The cardinality (1,4) indicates that one entity occurrence in the PROFESSOR relation is associated with at least one and not more than four entity occurrences in the CLASS relation.

Existence Dependency: An entity is said to be existence-dependent when it can exist only if it is associated with another related entity occurrence.

Weak Relationship

A weak (non-identifying) relationship exists if the primary key of the related entity does not contain a primary key component of the parent entity.

COURSE(CRS_CODE, DEPT_CODE, CRS_DESCRIPTION, CRS_CREDIT)

CLASS(CLASS_CODE, CRS_CODE, CLASS_SECTION, CLASS_TIME, ROOM_CODE, PROF_NUM)

          The relationship between COURSE and CLASS is weak because CLASS_CODE alone is the primary key of the CLASS entity, while CRS_CODE in CLASS is only a foreign key. In this example, the CLASS primary key does not inherit a primary key component from the COURSE entity.

Strong Relationship           

 A strong (identifying) relationship exists if the primary key of the related entity contains a primary key component of the parent entity. The relationship between COURSE and CLASS is strong when the CLASS entity's composite key is composed of CLASS_CODE + CRS_CODE, because the CLASS primary key then inherits a primary key component from the COURSE entity. In the strong relationship, an 'O' symbol may be written next to the entity.
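A hedged SQL sketch of the two designs (data types assumed). In the weak form, CLASS_CODE alone identifies a class; in the strong form, the COURSE key becomes part of the CLASS key:

    CREATE TABLE COURSE (
        CRS_CODE        VARCHAR(10) PRIMARY KEY,
        DEPT_CODE       VARCHAR(10),
        CRS_DESCRIPTION VARCHAR(50),
        CRS_CREDIT      INTEGER
    );

    -- Weak (non-identifying) relationship: CRS_CODE is only a foreign key
    CREATE TABLE CLASS (
        CLASS_CODE     INTEGER PRIMARY KEY,
        CRS_CODE       VARCHAR(10) REFERENCES COURSE (CRS_CODE),
        CLASS_SECTION  VARCHAR(5),
        CLASS_TIME     VARCHAR(25),
        ROOM_CODE      VARCHAR(10),
        PROF_NUM       INTEGER
    );

    -- Strong (identifying) relationship: the parent key is part of the child key
    --   PRIMARY KEY (CLASS_CODE, CRS_CODE)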

Weak Entity

A weak entity is an entity whose primary key is partially or totally derived from the parent entity in the relationship. For example, the DEPENDENT entity's key is inherited from the EMPLOYEE entity, as shown below.

Employee (Emp_no, Emp_LName, Emp_FName, Emp_Init, Emp_dob)

Dependent (Emp_no, Dep_no, Dep_LName, Dep_FName, Dep_dob)

In the Chen notation, a weak entity is represented by a double-walled rectangle.
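A minimal SQL sketch of this weak entity (data types assumed): the DEPENDENT key combines the inherited EMP_NO with a local dependent number.

    CREATE TABLE EMPLOYEE (
        EMP_NO     INTEGER PRIMARY KEY,
        EMP_LNAME  VARCHAR(25),
        EMP_FNAME  VARCHAR(25),
        EMP_INIT   CHAR(1),
        EMP_DOB    DATE
    );

    CREATE TABLE DEPENDENT (
        EMP_NO     INTEGER REFERENCES EMPLOYEE (EMP_NO),
        DEP_NO     INTEGER,
        DEP_LNAME  VARCHAR(25),
        DEP_FNAME  VARCHAR(25),
        DEP_DOB    DATE,
        PRIMARY KEY (EMP_NO, DEP_NO)   -- partially derived from the parent key
    );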

Strong Entity

A strong entity is an entity whose primary key is not partially or totally derived from the parent entity in the relationship. For example, in the COURSE and CLASS relationship, if the CLASS table's primary key is CLASS_CODE alone (not derived from the COURSE parent key), then the CLASS entity is a strong entity.

Class(class_code, course_code, class_desc, class_time, prof_code)

If, instead, the CLASS primary key in the COURSE and CLASS relationship is composed of class_code and course_code, where course_code is derived from the COURSE parent entity, then the CLASS entity is a weak entity.

Class(class_code, course_code, class_desc, class_time, prof_code)

Relationship Participation:

Relationship participation between entities is either optional or mandatory.

Optional participation

Optional participation means that one entity occurrence does not require a corresponding entity occurrence in a particular relationship.

Eg: In the "COURSE generates CLASS" relationship, an entity occurrence in the COURSE table does not necessarily require a corresponding entity occurrence in the CLASS table.

In the Crow's Foot notation, an optional relationship between entities is shown by drawing a small circle on the side of the optional entity. The minimum cardinality is zero for the optional entity.

   

(Here CLASS is optional to COURSE)

Mandatory participation means that one entity occurrence requires a corresponding entity occurrence in a particular relationship.

Eg 1: In the "COURSE generates CLASS" relationship, an entity occurrence in the COURSE table necessarily requires a corresponding entity occurrence in the CLASS table.

In the Crow's Foot notation there is no separate symbol for the mandatory entity. The minimum cardinality is one for the mandatory entity.

          (Here the COURSE and CLASS relationship is mandatory)

Eg 2: In the "PROFESSOR teaches CLASS" relationship, it is possible for a professor not to teach any class; therefore CLASS is optional to PROFESSOR. On the other hand, a class must be taught by a professor; therefore PROFESSOR is mandatory to CLASS.

The cardinality (1,1) indicates that each class is taught by one and only one professor. The cardinality (0,3) indicates that a professor may teach no classes or up to three classes.

The following table shows various cardinalities that are supported by crow’s foot notation.

Relationship Degree

A Relationship Degree indicates the number of entities associated with a relationship.

A Unary Relationship: A Unary Relationship exists when an association is maintained within a single entity.

Eg: An employee within the EMPLOYEE entity may be the manager of one or more other employees within that same entity.

Binary Relationship: A binary relationship exists when two entities are associated.

Eg: A professor teaches one or more classes.

Ternary Relationship: A ternary relationship exists when three entities are associated.  

Eg:    A doctor writes one or more prescriptions.

         A patient may receive one or more prescriptions.

         A drug may appear in one or more prescriptions.


Recursive Relationship: A recursive relationship is one in which a relationship exists between occurrences of the same entity set.

There are three types of Recursive Relationships

One- One

One- Many

Many-Many

One to One: A one-to-one unary relationship may be expressed by "an employee may be married to one and only one other employee."

One to Many: A one-to-many unary relationship may be expressed by "an employee may manage many other employees."

Many to Many: A many-to-many unary relationship may be expressed by "a course may be a prerequisite to many other courses."

 

Associative Entities: An associative (bridge) entity is used to implement a many-to-many relationship between other entities. The associative entity's key is composed of the primary keys of the entities it connects.

Example: In the Crow's Foot notation, the relationship between each parent entity and the associative (child) entity is shown as a strong relationship.
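A hedged SQL sketch of such a bridge table (names and types assumed, reusing the STUDENT and CLASS tables sketched earlier), implementing a many-to-many enrolment:

    CREATE TABLE ENROLL (
        STU_NO        INTEGER REFERENCES STUDENT (STU_NO),
        CLASS_CODE    INTEGER REFERENCES CLASS (CLASS_CODE),
        ENROLL_GRADE  CHAR(1),
        PRIMARY KEY (STU_NO, CLASS_CODE)   -- composed of the keys of the connected entities
    );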

 

 

Developing an ER diagram:

          The process of database design is an Iterative process rather than a sequential process. Building an ER diagram, usually involves the following Activities.

Create a detailed description of operation of the organization.

Identify the business rules from the description.

Identify entities, Attributes and relationship from the business rules.

Developing the initial ERD.

Identify the attributes and primary keys.

Revise and review the ERD

During the review process, additional objects, attributes, and relationships will be uncovered. Therefore the basic ERD will be modified to incorporate the newly discovered components.

Eg: Let us start with initial interviews with the Tiny College administrators; the interview process yields the following business rules.

Tiny College is divided into several schools: a school of business, a school of arts and sciences, a school of education, and a school of applied sciences. Each school is administered by a dean who is a professor.

Each school is composed of several departments.

Each department may offer several courses.

Each course may operate several classes.

Each department may have many professors; one and only one of those professors chairs the department, and no professor is required to accept the chair position.

Each professor may teach several classes; a professor may also teach no classes at all.

A student may enroll in several classes, and each class contains several students; STUDENT is optional to CLASS in this many-to-many relationship. The many-to-many relationship must be divided into two one-to-many relationships through an ENROLL associative entity.

Each department has several students.

Each student has an advisor in his or her department; each advisor advises several students. An advisor is also a professor, but not all professors advise students.

 

         

A building contains rooms, and a room is used for classes.

 

The following diagram shows the Crow's Foot ERD for Tiny College.

 

 

 

 

 

 

Database Design Challenges: Conflicting Goals

Database designers often must make design compromises that are triggered by conflicting goals, such as adherence to design standards (design elegance), processing speed, and information requirements.

Design standards:
The database design must conform to design standards. Such standards have guided you in developing logical structures that minimize data redundancies.

In short, design standards allow you to work with well-defined components and to evaluate the interaction of those components with some precision.

Processing speed:
In many organizations, particularly those generating large numbers of transactions, high processing speeds are often a top priority in database design. High processing speed means minimal access time. If the focus is on data-retrieval speed, you might also be forced to include derived attributes in the design.

Information requirements:
The quest for timely information might be the focus of database design. Complex information requirements may dictate data transformations, and they may expand the number of entities and attributes within the design.

 Therefore, the database may have to sacrifice some of its “clean” design structures and/or some of its high transaction speed to ensure maximum information generation.

A design that meets all logical requirements and design conventions is an important goal.

However, if this perfect design fails to meet the customer’s transaction speed and/or information requirements, the designer will not have done a proper job from the end user’s point of view.

Compromises are a fact of life in the real world of database design.
Even while focusing on the entities, attributes, relationships, and constraints, the designer should begin thinking about end-user requirements such as performance, security, shared access, and data integrity.

Finally, prepare the document! Put all design activities in writing. Then review what you’ve written.


Chapter-5

Normalization of Database Tables

Database Tables & Normalization:

In Database designed process, the table is the basic building block.

The ER model gives a good table structure, but it is still possible to create poor table structures even within a good database design.

Def:

          Normalization is the analysis of functional dependencies between the attributes of a relation. It reduces a complex user view into a set of small, stable subgroups of fields.

          The normalization process is used to create a good table structure that minimizes data redundancy.

Normalization works through a series of stages called normal forms.

The first three stages are

First Normal Form(1NF)

Second Normal Form(2NF)

Third Normal Form(3NF)

Business Databases: For business databases it is usually sufficient to normalize to 2NF or 3NF. The higher stages are

Boyce Code Normal Form (BCNF)

Fourth Normal Form(4NF)

Fifth Normal Form (5NF)

          Normalization is very important in database design. Generally, the higher the normal form, the more relational join operations are required to produce a specific output. Therefore we occasionally accept denormalizing some portions of the database to increase efficiency.

          Denormalization produces a lower normal form; i.e., 3NF is converted to 2NF, or 2NF is converted to 1NF.

The need for Normalization:  Consider the database activities of a construction company that manages several building projects. Each project has its own project number, project name, employees assigned to it, and so on. Each employee has an employee number, employee name, job classification, and so on.

 

PROJ_NUM  PROJ_NAME     EMP_NUM  EMP_NAME          JOB_CLASS              CHG_HOUR  HOURS  TOTAL_CHARGE
15        Evergreen     103      June E. Arbough   Elect. Engineer        84.50     23.8   2011.1
                        101      Swetha            Database Designer      105.00    19.4   2037
                        105      Lakshmi           Database Designer      105.00    35.7   3748.5
                        106      Durga             Programmer             35.75     12.6   450.45
                        102      Ram               Systems Analyst        96.75     20.8   2012.4
                                                                          Subtotal         10259.45
18        Amber Wave    114      Harika            Applications Designer  48.10     25.6   1231.36
                        118      Ganesh            General Support        18.36     45.3   831.708
                        104      Sri               Systems Analyst        96.75     32.4   3134.7
                        112      Hari              DSS Analyst            45.95     44.0   2021.8
                                                                          Subtotal         7219.568
22        Rolling Tide  105      Sruthi            Database Designer      105.00    64.7   6793.5
                        104      Raju              Systems Analyst        26.75     48.4   1294.7
                        113      Ravi              Application Designer   48.10     23.6   1135.16
                        111      Ramesh            Clerical Support       26.87     22.0   591.14
                        106      Rao               Programmer             35.75     12.8   457.6
                                                                          Subtotal         10272.1
25        Starflight    107      Rekha             Programmer             35.75     24.6   879.45
                        115      Rani              Systems Analyst        96.75     45.8   4431.15
                        101      John              Database Designer      105.00    56.3   5911.5
                        114      Manikanta         Applications Designer  48.10     33.1   1592.11
                        108      Nalini            Systems Analyst        96.75     23.6   2283.3
                        118      James             General Secretary      18.36     30.5   559.98
                        112      P J               DSS Analyst            45.95     41.4   1902.33
                                                                          Subtotal         17559.82
                                                                          Total Amount     45310.94

 

The easiest way to generate the required report is to create a table whose fields correspond to the fields of the report.

Table_Name : Construction_Company

An employee can be assigned to more than one project.

          For example, employee number 104 has been assigned to two projects. Knowing Proj_No and Emp_No, we can find the job classification and hours worked; therefore Proj_No and Emp_No together are taken as the primary key.

The above table structure has the following deficiencies:

The Proj_No is part of the primary key, but it contains null values (it is entered only once for each project group).

The table entries invite data inconsistency; for example, the job classification value "Elect. Engineer" might be entered as "Elec_Engi" or "EE".

The table displays data redundancy.

Update Anomalies: Modifying the job class for employee number 105 requires many alterations, one for each row in which that employee appears.

Insertion Anomalies: To complete a row definition, a new employee must be assigned to a project. If the employee is not yet assigned, a dummy project must be created to complete the row.

Deletion Anomalies: Suppose only one employee is associated with a project; if that employee leaves the company and the employee data are deleted, the project information will also be deleted.

          Although the above table structure appears to work, its deficiencies mean that reports may give different results depending on how the data are arranged.

         

Normalization Process: The most common normal forms and their characteristics are:

First Normal Form (1NF): A relation is said to be in first normal form if it has no repeating groups.

Second Normal Form (2NF): A relation is said to be in second normal form if it is already in first normal form and it has no partial dependency.

Third Normal Form (3NF): A relation is said to be in third normal form if it is already in second normal form and it has no transitive dependency.

Boyce-Codd Normal Form (BCNF): A relation is said to be in Boyce-Codd normal form if it is already in third normal form and every determinant is a candidate key.

Fourth Normal Form (4NF): A relation is said to be in fourth normal form if it is already in Boyce-Codd normal form and it has no multivalued dependency.

Fifth Normal Form (5NF): A relation is said to be in fifth normal form if it is already in fourth normal form and it has no join dependency (it cannot be decomposed further without loss).

Eg: Normalization of construction company Report


 

The construction company report shown above is represented in the form of a relation. The relation, named CONSTRUCTION_COMPANY, is in unnormalized form as shown below:

CONSTRUCTION_COMPANY(Proj_No, Proj_Name, (Emp_No, Emp_Name, Job_Classification, Charge_Per_Hour, Hours_Billed)) ----> (1)

The fields Total_Charge, Sub Total, and Grand Total are not included in the relation because they are derived attributes.

First Normal Form:

          In relation (1), the fields within the innermost set of parentheses constitute a repeating group, which results in redundancy of the project data. Hence relation (1) is subdivided into two relations to remove the repeating group:

PROJECT(Proj_No, Proj_Name) ----------> (2)

PROJECT_EMP(Proj_No, Emp_No, Emp_Name, Job_Class, Charge_Per_Hour, Hours_Billed) ----------> (3)

Now relations (2) and (3) are in 1NF. In relation (3), Proj_No and Emp_No jointly serve as the key.

Second Normal Form:

Definition of  Partial Dependency:

            When a non-key attribute depends on only a part of the composite primary key, the dependency is said to be a partial dependency.

          In relation (2) there is only one key field, so there is no scope for partial dependency; the absence of partial dependency takes relation (2) to 2NF without any modification.

The dependency diagram of relation 3 is shown below

          In the above diagram, Hours_Billed depends on both Proj_No and Emp_No, but the remaining non-key fields (Emp_Name, Job_Class, Charge_Per_Hour) depend only on Emp_No. This partial dependency violates 2NF, and hence relation (3) is divided into two relations:

Assignment(Proj_No, Emp_No, Hours_Billed) ----------> (4)

Emp_Job(Emp_No, Emp_Name, Job_Class, Charge_Per_Hour) ----------> (5)

The Relations (4) & (5) are in 2NF

Third Normal Form:

Transitive Dependency:

            If one non-prime attribute determines another non-prime attribute, the dependency is called a transitive dependency.

          In relation (2) there is only one non-key field, which means it has no transitive dependency; hence relation (2) can be treated as 3NF without any modification. Similarly, in relation (4) there is only one non-key field, so it has no transitive dependency and can also be treated as 3NF without any modification.

In relation (5), Charge_Per_Hour depends on Job_Class; this means that relation (5) has a transitive dependency. The dependency diagram for relation (5) is shown below.

Diagram

Hence relation (5) is subdivided into two relations, relation (6) and relation (7), as shown below.

Job(Job_Class, Charge_Per_Hour) ----------> (6)

Emp(Emp_No, Emp_Name, Job_Class) ----------> (7)

For practical applications it is sufficient to normalize up to either 2NF or 3NF.

Hence the normalization process is stopped here, and the final 3NF relations of the construction company are shown below:

Project(Proj_No, Proj_Name) ----------------------------> (1)

Assignment(Proj_No, Emp_No, Hours_Billed) --------------> (2)

Emp(Emp_No, Emp_Name, Job_Classification) --------------> (3)

Job(Job_Classification, Charge_Per_Hour) ---------------> (4)
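As a rough SQL sketch of this final 3NF design (data types are assumed):

    CREATE TABLE PROJECT (
        PROJ_NO    INTEGER PRIMARY KEY,
        PROJ_NAME  VARCHAR(30)
    );

    CREATE TABLE JOB (
        JOB_CLASSIFICATION  VARCHAR(30) PRIMARY KEY,
        CHARGE_PER_HOUR     DECIMAL(8,2)
    );

    CREATE TABLE EMP (
        EMP_NO              INTEGER PRIMARY KEY,
        EMP_NAME            VARCHAR(40),
        JOB_CLASSIFICATION  VARCHAR(30) REFERENCES JOB (JOB_CLASSIFICATION)
    );

    CREATE TABLE ASSIGNMENT (
        PROJ_NO       INTEGER REFERENCES PROJECT (PROJ_NO),
        EMP_NO        INTEGER REFERENCES EMP (EMP_NO),
        HOURS_BILLED  DECIMAL(6,1),
        PRIMARY KEY (PROJ_NO, EMP_NO)
    );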

Improving Design:

How to improve the design of the database?

Evaluate primary key Assignment:

Each time a new employee is entered into the EMP table, a job class value must be entered. Unfortunately, entering a long job classification by hand is error-prone and can lead to referential integrity violations; for example, entering "DB Designer" instead of "Database Designer" for the Job_Class attribute in the EMP table will trigger such a violation. Therefore it is better to add a Job_Code attribute to both the JOB relation and the EMP relation.

 

 

Emp(Emp_No,Emp_Name,Job_Code)

Job(Job_Code,Job_Classification,Charge_Per_Hour)

 

Evaluate Naming Conventions:

In the JOB relation, Job_Classification is changed to Job_Description and Charge_Per_Hour is changed to Job_Chg_Hour. In the ASSIGNMENT relation, Hours_Billed is changed to Assign_Hours_Billed.

 

Job(Job_Code,Job_Description,Job_Chg_Hour)

Assignment(Proj_No,Emp_No,Assign_Hours_Billed)

 

Refine Attribute Atomicity:

An atomic attribute is one that cannot be further subdivided; such an attribute is said to display atomicity. In the EMP table, the attribute Emp_Name is not atomic because it can be subdivided into Emp_LName, Emp_FName, and Emp_Init. These attributes replace Emp_Name in the EMP table.

 

Emp(Emp_No,Emp_LName,Emp_FName,Emp_Init,Job_Code)

 

Identify New Attribute:

If the EMP table is used in a real-world environment, several other attributes will be added. For example, Social_Security_Number, Date_of_Joining, Date_of_Birth, Hire_Date, Gross_Salary, Net_Salary, and so on may be added to improve the relation.

 

Emp(Emp_No,Emp_LName,Emp_FName,Emp_Init,Hire_Data,Gross_Salary,Job_Code)

 

Identify New Relationship:

 

A new relationship between tables should be created only when it is needed and does not produce unnecessary duplication of data; when that is the case, we create the new relationship.

 

Refine primary Keys:

 

The combination of Proj_No and Emp_No is the primary key of the ASSIGNMENT table. However, if an employee's assigned hours for a particular project are recorded more than once, the primary key constraint is violated.

To avoid the violation, an additional attribute Assign_Date can be added to the ASSIGNMENT table. But if assigned hours for the same project must be recorded more than once on the same day, the primary key constraint is violated again. The same data entry causes no problem when a surrogate Assign_No is used as the primary key of the ASSIGNMENT relation.

 

Assignment(Assign_No, Assign_Date, Proj_No, Emp_No, Assign_Hours_Billed)

 

To Maintain Historical Accuracy:

 

It is assumed that Job_Chg_Hour will change over time. If the charges for each project are billed by multiplying the hours worked (in the ASSIGNMENT table) by the Job_Chg_Hour (in the JOB table), those charges will always reflect the current charge per hour stored in the JOB table rather than the charge per hour that was in effect at the time of the assignment. Because of that, we add an attribute Assign_Chg_Hour to the ASSIGNMENT table.

 

Assignment(Assign_No, Assign_Date, Proj_No, Emp_No, Assign_Hours_Billed, Assign_Chg_Hour)

 

Evaluate using Derived Attribute:

 

The derived attribute Assign_Charge is added to the ASSIGNMENT relation; its value is obtained by multiplying Assign_Chg_Hour by Assign_Hours_Billed. Storing the derived attribute Assign_Charge in the ASSIGNMENT table makes it easy to write reports or invoices.
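A minimal sketch of how the derived value could be populated (column names as introduced above):

    UPDATE ASSIGNMENT
    SET    ASSIGN_CHARGE = ASSIGN_CHG_HOUR * ASSIGN_HOURS_BILLED;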

Boyce code Normal Form(BCNF):

BCNF can be violated only when the table contains more than one candidate key.

Candidate key:

A candidate key is a minimal superkey, i.e., a superkey that does not contain a proper subset of attributes that is itself a superkey.

Consider a table with primary key A + B and the functional dependencies shown below:

A + B ----> C, D

C ----> B

The table structure has no partial dependency and no transitive dependency. But the condition C ----> B indicates that a non-key attribute determines part of the primary key, which causes the table to fail the BCNF requirement.

To convert the table structure from 3NF to BCNF, first change the primary key to A + C. The dependency C ----> B means that C is, in effect, a determinant of B. The table is then decomposed to produce the results shown below.
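A plausible decomposition (attribute names as above) that satisfies BCNF is:

TABLE_1(A, C, D)   with primary key A + C

TABLE_2(C, B)      with primary key C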

 

Example for BCNF:

Each Class_Code identifies a class uniquely. A student can take many classes, earning a grade in each. A staff member can teach many classes, but each class is taught by only one staff member.

Stu_Id + Staff_Id ----> Class_Code, Enroll_Grade

Class_Code ----> Staff_Id

The above table contains more than one candidate key and violates BCNF, because the determinant Class_Code is not a candidate key. The table must therefore be decomposed so that every determinant is a candidate key.
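A plausible decomposition of this example (relation names assumed) that satisfies BCNF:

ENROLL(Stu_Id, Class_Code, Enroll_Grade)   with primary key Stu_Id + Class_Code

CLASS_STAFF(Class_Code, Staff_Id)          with primary key Class_Code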

 

Fourth Normal Form (4NF):

          Consider an employee who can have multiple assignments: the employee works as a volunteer in service organizations and also works on different projects, as shown below.

 

The above table contains two sets of independent multivalued dependencies (on Org_Code and Proj_Code). If the VOLUNTEER_1 or VOLUNTEER_2 table layouts are implemented, the tables contain null values.

          The VOLUNTEER_3 table has a primary key, but it is composed of all the attributes of the table; a design like this produces many redundancies.

          The multivalued dependencies are eliminated by creating separate ASSIGNMENT and SERVICE tables, as shown below.

 

The ASSIGNMENT table and the SERVICE table do not contain multivalued dependencies.
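A plausible shape for those two tables (attribute names assumed), each recording one independent multivalued fact about an employee:

ASSIGNMENT(Emp_Num, Proj_Code)   with primary key Emp_Num + Proj_Code

SERVICE(Emp_Num, Org_Code)       with primary key Emp_Num + Org_Code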

Denormalization:

          Normalization is very important in database design. Generally, the higher the normal form, the more relational join operations are required to produce a specific output. A successful design must also consider end-user requirements for fast performance. Therefore we occasionally need to denormalize some portions of the database design in order to meet performance requirements.

          Denormalization produces lower normal forms; i.e., 3NF is converted to 2NF, or 2NF is converted to 1NF.

Eg: A need for denormalization arises when generating a faculty evaluation report in which each row lists the mean scores a faculty member obtained during the last four semesters taught.

Faculty Evaluation Report:

Instructor   Dept.   Sem-1 Mean   Sem-2 Mean   Sem-3 Mean   Sem-4 Mean   Last_Sem Avg
(one row per instructor)

We could generate the above report easily, but a problem arises: the data are stored in a normalized table in which each row represents a different mean score for a given faculty member in a given semester.

EVALDATA:

ID   Instructor   Dept.   Mean   Semester
(one row per instructor per semester)

It is somewhat difficult to generate the faculty evaluation report directly from this normalized table.

The other table, FACHIST (faculty history), contains the last four semester means for each faculty member. The faculty history table is a temporary table created from EVALDATA, as shown below.

Instructor   Dept.   Sem-1 Mean   Sem-2 Mean   Sem-3 Mean   Sem-4 Mean   Last_Sem Avg
(one row per instructor)

FACHIST is an unnormalized (denormalized) table; using it, we can generate the faculty evaluation report very quickly. After the report is generated, the temporary FACHIST table is deleted. By working this way, we increase the performance of the database.
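A hedged sketch of how the temporary FACHIST table might be built from EVALDATA (column names, the semester coding, and the CREATE TABLE ... AS SELECT syntax vary by DBMS):

    CREATE TABLE FACHIST AS
    SELECT INSTRUCTOR,
           DEPT,
           MAX(CASE WHEN SEMESTER = 'SEM-1' THEN MEAN END) AS SEM1_MEAN,
           MAX(CASE WHEN SEMESTER = 'SEM-2' THEN MEAN END) AS SEM2_MEAN,
           MAX(CASE WHEN SEMESTER = 'SEM-3' THEN MEAN END) AS SEM3_MEAN,
           MAX(CASE WHEN SEMESTER = 'SEM-4' THEN MEAN END) AS SEM4_MEAN
    FROM   EVALDATA
    GROUP BY INSTRUCTOR, DEPT;

    -- After the report is produced, the temporary table is removed:
    DROP TABLE FACHIST;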


Chapter – 6

Advanced Data Modeling

Advanced Data Model:

          The Extended Entity Relationship Model (EERM), sometimes referred to as the enhanced entity relationship model, adds more semantic constructs to the original entity relationship model. The added constructs are entity supertypes, entity subtypes, and entity clustering.

Entity Super types, Entity Sub types:

          An organization contains different types of employees, and not all employees have the same attributes. If we create a single table to store the information for all employees, many columns will contain null values.

 For example, a pilot shares certain characteristics with other employees, such as employee number, name, address, and hire date. But other pilot characteristics, such as the pilot's license and rating, are not shared by other employees and would generate nulls for employees who are not pilots.

          The PILOT entity therefore stores only the attributes that are unique to pilots, and the EMPLOYEE entity stores the attributes that are common to all employees.

          We can conclude that PILOT is a subtype of EMPLOYEE, and EMPLOYEE is a supertype of PILOT. An entity supertype is a generic entity type that is related to one or more entity subtypes: the entity supertype contains the common characteristics, and the entity subtypes contain the unique characteristics.
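A minimal SQL sketch of this arrangement (attribute names and types assumed): the common attributes live in the supertype table and the pilot-specific attributes in the subtype table, linked one-to-one through the shared key.

    CREATE TABLE EMPLOYEE (
        EMP_NO         INTEGER PRIMARY KEY,
        EMP_NAME       VARCHAR(40),
        EMP_ADDRESS    VARCHAR(60),
        EMP_HIRE_DATE  DATE,
        EMP_TYPE       CHAR(1)      -- subtype discriminator, e.g. 'P', 'M', 'A'
    );

    CREATE TABLE PILOT (
        EMP_NO       INTEGER PRIMARY KEY REFERENCES EMPLOYEE (EMP_NO),
        PLT_LICENSE  VARCHAR(25),
        PLT_RATING   VARCHAR(25)
    );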

Specialization Hierarchy:

          Entity super types and sub types are organized in a hierarchy which describes the higher level entity super types (parent entity) and lower level entity (child entity) sub types.

The following diagram shows specialization hierarchy by an employee super type and three entity sub types.

Pilot

 Mechanic

 Accountant.

The specialization hierarchy reflects one-to-one relationship between super entity type and sub entity type.

Eg:

A pilot sub type is related to one instance of employee super type.

A mechanic sub type is related to one instance of employee super type.

An Account sub type is related to one instance of employee super type.

 

Inheritance:

          The property of inheritance enables an entity subtype, to inherit the attributes and relationships of the super type.

In the above example, the EMPLOYEE entity supertype participates in a one-to-many relationship with a DEPENDENT entity; through inheritance, all the subtypes also participate in that relationship.

Subtype Discriminator:

          A subtype discriminator is the attribute in the supertype entity that determines to which subtype each supertype occurrence is related.

In the above example, the subtype discriminator is Emp_Type.

If Emp_Type has the value 'P', the supertype occurrence is related to the PILOT subtype.

If Emp_Type has the value 'A', the supertype occurrence is related to the ACCOUNTANT subtype.

If Emp_Type has the value 'M', the supertype occurrence is related to the MECHANIC subtype.

Disjoint & Overlapping Constraints:

          An entity supertype can have disjoint or overlapping entity subtypes.

          Disjoint subtypes are subtypes that contain unique (non-overlapping) subsets of the supertype entity set.

The disjoint subtypes are indicated by the letter 'd' inside the category shape. With disjoint subtypes, each supertype occurrence is related to only one subtype.

Eg: An employee who is a pilot can appear only in the PILOT subtype, not in any of the other subtypes.

          Overlapping subtypes are subtypes that contain non-unique subsets of the supertype entity set.

          The overlapping subtypes are indicated by the letter 'o' inside the category shape. With overlapping subtypes, a supertype occurrence may be related to more than one subtype.

Eg: An employee may be a professor as well as administrator.

The Administrator and professor overlap the super type entity employee.

Completeness constraint:

          The completeness constraint can be partial or total.

          Partial completeness means that not every supertype occurrence is a member of a subtype. A single horizontal line under the circle represents the partial completeness constraint.

Total completeness means that every supertype occurrence must be a member of at least one subtype. A double horizontal line under the circle represents the total completeness constraint.

Specialization and Generalization:

          We can use various approaches to develop entity super types and sub types.

Specialization is the top down process of identifying lower level entity sub type from a higher level entity super type.

Eg: Specialization is used to identify multiple entity subtypes (PILOT, MECHANIC, ACCOUNTANT) from the supertype entity EMPLOYEE.

Generalization is the bottom-up process of identifying higher level entity super types from a lower level entity sub types.

Eg: The Generalization is used to identify entity super type from the sub type (Pilot, mechanic, Accountant).

Entity Clustering:

          Generally the data modeler develops an initial ERD containing a few entities. As the design approaches completion, the ERD may contain hundreds of entities and relationships. In those cases, we can use entity clusters to minimize the number of entities shown in the ERD.

           An entity cluster is a virtual entity type used to represent multiple entities and relationships in the ERD.

          An entity cluster is formed by combining multiple interrelated entities into a single entity object. An entity cluster is considered virtual in the sense that it is not actually an entity in the final ERD.

 

Entity Integrity:

          The most important characteristic of an entity is its primary key, which uniquely identifies each entity instance. The primary key and foreign key work together to implement relationships between tables in the relational data model.

Natural keys & Primary keys:

          Unique identifiers are commonly encountered in the real world: for example, a class number used to register for classes, an invoice number that identifies a particular invoice, an account number that identifies a credit card, and so on. These are examples of natural keys.

          The natural keys class_no, invoice_no, and account_no are used to uniquely identify real-world objects.

          If an entity has a natural identifier, a data modeler uses that natural key as the primary key of the entity.

Primary key Guidelines:

          The primary key's main function is to uniquely identify an entity: for a given primary key value, the relational model can determine the values of all dependent attributes.

          The second function is that the primary key and foreign key together implement relationships between tables or entities.

Characteristics of Primary Key:

A primary key contains unique values and does not accept null values.

The primary key should be permanent and unchangeable.

A primary key should have the minimum number of attributes.

Unique values are easier to manage when primary keys are numeric.

When to use composite primary keys:

The composite primary keys are particularly useful in two cases.

As Identifier of composite entities.

As Identifier of weak entities.

          In the first case, assume that we have a STUDENT entity and a CLASS entity, and the relationship between them is many-to-many via an ENROLL entity. The ENROLL entity contains the key fields of the STUDENT and CLASS entities, which together identify each entity instance in ENROLL.

In the second case, a composite key is normally used for a weak entity that is in a strong (identifying) relationship with its parent entity.

Eg: The key field of the EMPLOYEE entity is used as one of the key fields of the DEPENDENT entity.

Emp(Emp_no, Emp_Fname,Emp_Lname,email)

Dependent(Emp_no, Depn_no, Depn_Fname, Depn_Lname, Depn_Addr)

When to use surrogate primary keys?

There are some instances when a natural primary key does not exist for a real-world object,

                                      (or)

when the existing natural key is not suitable as a primary key.

For example, consider a facility that rents rooms for small parties. The manager of the facility keeps track of all the events in the following table format.

Date   Time_Start   Time_End   Room   Event_Name   Party_Of
(one row per event)

The above table can be represented as Event entity

EVENT(Date, Time_start, Time_End, Room, Event_name, Party-of)

In the above entity there is no simple natural key that can be used as the primary key of the model.

Based on the concept of a primary key, we might suggest one of these options: (Date, Time_Start, Room) or (Date, Time_End, Room).

When the data model is implemented, such a composite primary key in the EVENT entity complicates both the design and the coding.

The solution to the problem is to use a single numeric attribute as a surrogate primary key.

          EVENT(Event_Id, Date, Time_Start, Time_End, Room, Event_Name, Party_Of)
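A rough SQL sketch of the surrogate-key design (column names and types assumed; the mechanism for auto-numbering the surrogate varies by DBMS):

    CREATE TABLE EVENT (
        EVENT_ID    INTEGER PRIMARY KEY,   -- surrogate key, e.g. filled from a sequence or identity column
        EVENT_DATE  DATE,
        TIME_START  TIME,
        TIME_END    TIME,
        ROOM        VARCHAR(10),
        EVENT_NAME  VARCHAR(40),
        PARTY_OF    INTEGER,
        UNIQUE (EVENT_DATE, TIME_START, ROOM)   -- the natural candidate key can still be enforced
    );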


CHAPTER - 7

Transaction Control and Concurrency Control

Transaction:

          A Transaction is a series of actions to be performed on the database such that either all actions are performed or none.

         

Suppose that we sell a product to a customer; our sales transaction consists of at least the following parts.

We must write a new customer invoice.

We must reduce the quantity on hand in the product entities.

We must update the Acc_Transaction.

We must update the customer_bal.

In database terms, a transaction is any action that reads from and/or writes to a database. A transaction may consist of a single SQL statement, a series of related UPDATE or INSERT statements, or a combination of SELECT, UPDATE, and INSERT statements.

A transaction is a logical unit of work that must be entirely completed or entirely aborted; no intermediate states are accepted. All of the SQL statements in a transaction must complete successfully; if any of the SQL statements fail, the entire transaction is rolled back.

A successful transaction changes the database from one consistent state to another. A consistent database state is one in which all data integrity constraints are satisfied.

Each database request generates several I/O operations that read from or write to the physical storage medium.

Transaction Properties

          The Transaction has the following properties.

Atomicity.

Consistency.

Isolation.

Durability.

Atomicity: All operations of the transaction must be completed; if not, the transaction is aborted.

Example: If a transaction T1 has four SQL requests, all four requests must complete successfully; otherwise the entire transaction is aborted.

Consistency: When a transaction is completed, the database must be in a consistent state; that is, it must satisfy all integrity constraints. Otherwise the entire transaction is aborted.

Isolation: The data used during the execution of a transaction cannot be used by a second transaction until the first transaction is completed.

Durability: Once the transaction's changes are committed, they cannot be undone or lost, even in the event of a system failure.

Transaction Management in SQL

          When a transaction sequence is initiated by a user, the sequence must continue through the SQL statements until one of the following four events occurs.

A COMMIT statement is reached, in which case all changes are permanently recorded in the database.

A ROLLBACK statement is reached, in which case all changes are aborted and the database is rolled back to its previous consistent state.

The end of the transaction is successfully reached, in which case all changes are permanently recorded in the database. This action is equivalent to COMMIT.

The transaction is abnormally terminated, in which case the changes made to the database are aborted and the database is rolled back to its previous consistent state. This action is equivalent to ROLLBACK.
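A hedged sketch of the sales transaction described earlier (table names, column names, and values are illustrative only):

    UPDATE PRODUCT
    SET    PROD_QOH = PROD_QOH - 1
    WHERE  PROD_CODE = 'P1001';

    UPDATE CUSTOMER
    SET    CUST_BALANCE = CUST_BALANCE + 500
    WHERE  CUST_NO = 'C1234';

    COMMIT;   -- both changes become permanent; on any failure, ROLLBACK would undo both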

The Transaction Log

The DBMS uses a transaction log file to keep track of all transactions that update the database.

The information stored in the log file is used by the DBMS to recover the database after a ROLLBACK, an abnormal termination, or a system failure.

While DBMS executes transactions that modify the database, it also automatically updates the transaction log.

The transaction log stores

A record for the beginning of transaction.

For each transaction component (SQL statement):

The type of operation being performed (update, delete, insert).

The name of the object affected by the transaction (the name of the table).

The "before" and "after" values for the fields being updated.

Pointers to the previous and next transaction log entries for the same transaction.

The end (COMMIT) of the transaction.

Eg:  UPDATE PRODUCT
        SET PROD_QOH = PROD_QOH - 2
        WHERE PROD_CODE = 'P1001';

       UPDATE CUSTOMER
        SET CUST_BALANCE = CUST_BALANCE + 8000
        WHERE CUST_NO = 'C1234';

 

 

 

The transaction log for the above transaction is,

TRL_ID  TRX_NUM  PREV_PTR  NEXT_PTR  OPERATION  TABLE     ROW_ID  ATTRIBUTE     BEFORE VALUE  AFTER VALUE
341     101      Null      352       START      *Start of transaction*
352     101      341       363       UPDATE     PRODUCT   1011    PROD_QOH      25            23
363     101      352       365       UPDATE     CUSTOMER  363     CUST_BALANCE  525.15        615.75
365     101      363       Null      COMMIT     *End of transaction*

 

 

 

 

 

          If a system failure occurs, the DBMS will examine the transaction log for incomplete transactions and roll the database back to its previous consistent state.

Q. Explain Concurrency control in transaction management.

          Concurrency control is important because the simultaneous execution of transactions over a shared database can create several data integrity and consistency problems:

Lost updates.

Uncommitted data

Data inconsistency.

Lost Updates:

          The lost update problem occurs when two concurrent transactions T1 and T2 update the same data element and one of the updates is lost.

Eg: Two concurrent transactions, T1 and T2, update the PROD_QOH value for the same item in the PRODUCT table. Assume that the current PROD_QOH value is 35.

                   Transaction                                     Computation

T1: Purchase 100 units                     PROD_QOH = PROD_QOH + 100

T2: Sale 30 units                          PROD_QOH = PROD_QOH - 30
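Expressed as SQL (the product code is assumed for illustration), each transaction would run an update such as:

    -- T1: purchase of 100 units
    UPDATE PRODUCT SET PROD_QOH = PROD_QOH + 100 WHERE PROD_CODE = 'P1001';

    -- T2: sale of 30 units
    UPDATE PRODUCT SET PROD_QOH = PROD_QOH - 30  WHERE PROD_CODE = 'P1001';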

          The following table shows that, under normal circumstances, the serial execution of these transactions gives the correct answer.

Time  Transaction  Step                  Stored Value
1     T1           READ PROD_QOH         35
2     T1           PROD_QOH = 35 + 100
3     T1           WRITE PROD_QOH        135
4     T2           READ PROD_QOH         135
5     T2           PROD_QOH = 135 - 30
6     T2           WRITE PROD_QOH        105

 

          Suppose, however, that a transaction reads the PROD_QOH value from the table before the previous transaction has committed its change. The sequence in the following table shows how the lost update problem can occur.

Time  Transaction  Step                  Stored Value
1     T1           READ PROD_QOH         35
2     T2           READ PROD_QOH         35
3     T1           PROD_QOH = 35 + 100
4     T2           PROD_QOH = 35 - 30
5     T1           WRITE PROD_QOH        135
6     T2           WRITE PROD_QOH        5

 

Uncommitted Data

          The uncommitted data problem occurs when two transactions T1 and T2 are executed concurrently and the first transaction (T1) is rolled back after the second transaction (T2) has already accessed the uncommitted data.

Eg: Two concurrent transactions, T1 and T2, update the PROD_QOH value for the same item in the PRODUCT table. Assume that the current PROD_QOH value is 35.

                             Transaction                                     Computation

                   T1: Purchase 100 units                     PROD_QOH = PROD_QOH + 100

                   T2: Sale 30 units                          PROD_QOH = PROD_QOH - 30

          The following table shows that, under normal circumstances, the serial execution of these transactions (with T1 rolled back) gives the correct result.

 

Time  Transaction  Step                  Stored Value
1     T1           READ PROD_QOH         35
2     T1           PROD_QOH = 35 + 100
3     T1           WRITE PROD_QOH        135
4     T1           ROLLBACK              35
5     T2           READ PROD_QOH         35
6     T2           PROD_QOH = 35 - 30
7     T2           WRITE PROD_QOH        5

 

The following table shows how the uncommitted data problem can arise when the ROLLBACK is completed after T2 has begun its execution.

Time

Transaction

Step

Stored Value

1

T1

READ PROD_QOH

35

2

T1

PROD_QOH=35+100

 

3

T1

WRITE PROD_QOH

135

4

T2