Extended Entity-Relationship Model

27 Jan 2019

In order to do data modeling, we need data models. The Extended Entity-Relationship Model is particularly good at helping us to fix and represent a perception of reality.

Basics

A database is a model of structures of reality.
The Extended Entity-Relationship Model is a language that allows us to capture models of reality that do not vary over time, and structural aspects of reality that are relevant for the purpose at hand.
A model is like a mirror of reality. It changes to reflect reality over time.

Entity Type and Entity Surrogates

Entities – represented by squares.
- Within one ERM, entity type names must be unique.
Properties – represented by ellipses.
- Single-Line ellipsis indicates it is a single-value property.
- Property values are:
  - Lexical, visible, audible
  - they are things that name other things

Instances of User, shown above, are surrogates, representing an actual user.
Values of Email are not only a thing by itself, but it also names something, an email.
Identifying property – underlined.
- For each identifying property value there is at most one instance of the identified entity. Every entity must be uniquely referenceable.

Composite Properties – Composite properties are composed from other properties.

Multi-valued Properties – Double-circled. These can be lists of items with one or more elements.

Relationship Types

Relationships are represented by diamonds. Relationships have cardinality, which are the number of each entity that may/must participate in the relationship.
The names of multiple relationship types between the same two entity types must be unique.
1-1 Relationship Types
- Partial Function (in both ways) – a partial function is where not all of one entity maps to not all of the other.
  - Called “Function” because of one-to-one entity correspondence.
  - Called “Partial” because not all entities are mapped.

1-N (“1-to-Many”) Relationship Types
- Each of the first entity can map to 0, 1, or many of the other entity.
- Partial Function from the Many side to the One side.

Mandatory 1-N Relationship Type
- Signified by a bold, solid line or double line. In the example below, RegularUser must mandatorily participate in the Current Job relationship with Employer. This means that all RegularUsers have a relationship with an Employer.
- Total Function from N-side to 1-side.

N-M Relationship Type
- Not a Function at all. A mathematical “Relationship”

N-ary Relationship Type
- Shown below is a “Ternary” relationship.
- “A particular RegularUser, on a particular Team, participates in one particular Event.” Implication is that all three are required to identify one EventTeamMember relationship.

For example, the following diagram has a much “broader” meaning than the narrowly prescribed relationship shown above. There are three distinct facts in the diagram below, compared to one single more constraining fact in the one above.

Identifying Relationships / Weak Entity Types
- StatusUpdate is called weak because it cannot exist without RegularUser
- DateAndTime is called a Partial Identifier
- In example below, Email is an identifying property type. DateAndTime is another property type.
- A particular StatusUpdate is identified by an Email and DateAndTime.
  - StatusUpdate cannot exist without RegularUser. StatusUpdate are not uniquely identified by only a DateAndTime.
  - StatusUpdate cannot be identified without RegularUser
  - (Email, DateAndTime) identifies StatusUpdate

Identifier Examples for Weak Entities – Office Hour 1/24/19
- Test – Weak entity type. Identified by CourseID and testID
- Section – weak entity type. Identified by combination of SectionID, testID and coursed.

Wrong Answer – Weak entity type. Identified by questionID combined with ID.

Recursive Relationship Types
- Called Recursive because it describes a relationship to itself.
- In example below, need to add directionality to the relationship, and label the relationship types.

Supertypes and Subtypes (“is-a” relationship types)
- Disjointness Constraint – the two subtypes cannot overlap. Indicated by the circled d.
- Overlap Constraint – the two subtypes can overlap. Indicated by the circled o.

Inheritance
- Similar conceptually to the concept of inheritance in object-oriented programming.
  - Every user has an email and a password.
  - Male, Female, RegularUsers, AdminUsers inherit the email and password properties from User
  - Users do not have a Birthday and CurrentCity.

Union entity type
- Two rules for Union entity types
  - Employer is a subset of Company union GovtAgency. So, there are not going to be any Employers who are not either a Company or a GovtAgency.
  - The intersection of Company and GovtAgency is the empty set.

Some Clarifications

How do we know whether something is a thing? Or a Relationship? Or a Property?
Is the EER supporting the fundamental types of abstraction? Classification, Aggregation, Generalization?
What would the type of a query on the EER model be?

Are relationships entities? Or just glue?

Relationships may have attributes.

For 1-N and (1-1) relationships, attributes may be moved to the entity on the “many-side” either size. CurrentJobSinceDate could be moved to the RegularUser entity in the example above.

In the example above, the many-to-many relationship between RegularUser and School is “objectified” and transformed into the architecture shown. The functionality is unchanged.

The decision of whether something is a entity, relationship, or property depends upon the context within which the database will operate. This is part of the process of “Fixing and Representing a Model of Reality,” which is the central task of creating an EER in the first place.

What can the EER do?

There are three types of abstraction that people agree are important when fixing a perception of reality:

Classification – clearly supported by EER
Generalization – Super/Sub type relationship is included to support generalization
Aggregation – Not supported by EER

Problem with Aggregation

Consider the example of a car’s drivetrain. An instance of a drivetrain is composed of instances of its component parts, which include things like an engine, clutch, transmission, driveshaft, differential, axle, and shock.
This sort of thing cannot be easily modeled in the EER model.
- You can fudge it, but it is like playing object-oriented programming in C. You can force it, but it is not the ideal tool.

What is the Result Type of a Query?

Recall: A data model consists of formalisms to express data structures, constraints, and operations.
Consider a query on the following EER diagram. What would the result of the query be, if, for example, we wanted to capture Email, FirstName, LastName, and SchoolName

The result of that query would be a simple list of properties. But, a list of properties is not a type, not a relationship type, not a super/sub type, etc. Since the result does not have a type, there is no way that we can take that result and continue to operate on it using a query language.
Conclusion is that, since the result did not have a type, the query was not performed using a “Closed Query Language.” Using a “Closed Query Language” is the only way that we can formulate high level ideas and ask high-level questions.
The foregoing is the reason there are no EER Database products. Almost all commercially-available database systems use the Relational model.

The next section will explain how to map EER diagrams to Relations. First, we need to define what a Relation is.

The Relational Model – Theoretical Foundation

Three primary considerations for a relational model:

Data Structures
Constraints
Operations (two fundamental notations for expressing operations in relational databases are shown below)

Algebra
Calculus
- Tuple Calculus (SQL) – tuples of relations are variables
- Domain Calculus (QBE) – cells of domains are variables

Data Structures

EER has several: entity types, property types, relationship types, super/sub types, etc.
Relational model, by contrast, has only one structure: Relations

Definitions

A domain, D, is a set of atomic values (a type).
- Atomic values, from standpoint of database management system, has no meaning inside them. Pure values.
A relation, R, is a subset of the set of ordered n-tuples, where the set elements are taken from some domain, D.
- Set of sets of elements of a single type, IE, a set of columns, where each column is of a single type. IE, a table.
An attribute, A, is a unique name given to a domain in a relation helping us interpret domain values.
- These are names of columns that allow us to avoid referring to columns only by column number.
- An important side-effect of this is that the “value” of a relation is independent of attribute order and tuple order! IE, column order and row order do not matter.

Relations are illustrated by Tables

Other less critical definitions are shown in the image below.

Constraints

Keys
Primary Keys – defining primary keys have two important consequences:
- Entity Integrity
- Referential Integrity
In table below, User defines the set of users that exist, about whom we are collecting information. Email is the primary key of the table, therefore:
- No value of Email can be null.
- When Email is used in another table, the set of emails used in that table must be a subset of the Emails that exist in the User table. For example, emails in RegularUser must be a subset of the emails in User

This content is taken from notes I took while taking the Database Systems Concepts and Design (CS 6400) course offered at Georgia Tech in the Spring, 2019 semester. The course content was created and presented by Dr. Leo Mark, an Associate Dean of Academic Programs and Student Affairs at Georgia Tech and Associate Professor in Georgia Tech's College of Computing.