About the Specialization and Course
The course objectives follow:
Describe the structure of relational databases.
Interpret and create entity-relationship diagrams and relational schemas that describe the contents of specific databases.
Write queries that retrieve and sort data that meet specific criteria, and retrieve such data from MySQL and Teradata databases that contain over 1 million rows of data.
Execute practices that limit the impact of your queries on other coworkers.
Summarize rows of data using aggregate functions, and segment aggregations according to specified variables.
Combine and manipulate data from multiple tables across a database.
Retrieve records and compute calculations that are dependent on dynamic data features, and translate data analysis questions into SQL queries.
What You Will Learn
The most common way to pronounce ‘SQL’ are ‘ess-que-ell’ and ‘sequel’, but there is no consensus. MySQL documentation says the official way is ess-que-ell. Oracle documentation says ‘sequel
A Survey by the Teradata University Network asked 400 recruiters from technical companies to answer this question: “When I recruit for business intelligence, business analyst roles, it is important that the students have the following coursework or knowledge.” The answers:
- Communication Skills
- SQL and Query Skills
- Basic Analytics
SQL is the language used to retrieve data from specific types of databases called relational databases. Relational databases are the standard way businesses store highly organized and structured business data.
MySQL is used in this course for the same reasons Box uses MySQL, namely: “It’s reliable, it’s mature, it’s open source, and well understood.” Facebook, YouTube, Yelp, Dropbox, GitHub, LinkedIn, Etsy, Twitter, Booking.com, Square, Pinterest, Yahoo also all use MySQL.
Teradata databases are the other type of relational database that will be taught. It is the leader in data warehousing. A ‘Data Warehouse’ is a copy of company data that is set up explicitly for querying and reporting. The warehouses are used to create dashboards for executives, among other things. Sam’s Club, Verizon, Apple, Bank of America, Walmart, and AT&T all use Teradata
Understanding Relational Databases
Problems Databases Solve
Database solutions need to be:
- easily retrievable,
- easy to update,
- accessible and modifiable by multiple people simultaneously,
- consistent across all stored records
- space efficient,
- fast, and
How Relational Databases Solve These Problems
The fundamental concept behind relational databases is that they separate data sets into individual pieces or subsets of data. The various subsets have a theme that logically binds the data records. This way, the system only needs to interact with the subset of the database that is pertinent to a given query, and not the entire database. As an added benefit, it also results in the smallest size database.
In practice, what relational databases do is organize data sets into smaller tables that each have their own unified theme. A critical component of database design is that each table needs to have a column that is used to link that table to other tables. That column needs to be unique.
A further advantage of relational databases is that they enable the computer to perform its operations in mathematically optimal (and therefore, the fastest possible) manner. Finally, they enable robust data-integrity checking, which means that certain columns will only accept data of a certain type.
Database Design Tools that Will Help You Learn SQL Faster
The way analysts ask a database to give them data is to write and execute a “query.” The code is SQL code. Almost all database management systems (DBMS) use SQL or a SQL-like language, because SQL is so ubiquitous and intuitive.
The best way to learn SQL is by first studying Entity-Relationship Diagrams (or, ER diagrams). These are pictures that represent how the data in a database are supposed to be connected or related to one another.
Relational schemas are another type of tool that are similar. These are maps of the database. Sometimes, relational schemas are missing important data that are contained in the ER diagram.
The tool that will be utilized to create these diagrams is erdplus.com.
How Entity-Relationship Diagrams Work
ER Diagrams are used to communicate which tables are connected to which other tables, and how. The tables are linked together by one or more columns with the same values.
Entity Instance: a single occurrence of an entity type (Database structure: a row)
Unique Attribute: an attribute with a unique value in each entity instance (AKA: “unique key”, or “unique identifiers,” database structure: column that links tables)
|Boxes||Entities||Categories of similar, but unique, measurements||Student, College||Tables|
|Ovals||Attributes||Unique measurements within a category||CollegeStartDate, CollegeLocation||Columns|
|Underlined Entity Name||Unique Attribute||Indicates key||StudentID, CollegeName||May be Primary Key|
|Diamond and lines||Relationship||Relationship between entities||Student “attends” College||Type of Join used to combine tables|
A dashed oval around an attribute indicates that attribute is derived from other attributes.
Cardinality Constraints: Limitations on the relationships between two entities. These are represented by the symbols located near the entity boxes. The symbol nearest the box is the maximum number of instances of that box’s entity to which that relationship can refer. The other symbol is the minimum number of instances.
For example, a student may attend a maximum of one, and minimum of one, college. Conversely, a college is attended by a minimum of one, and maximum of many, students. The cardinality constraints are represented by the following geometries.
|M or N||Infinite|
|O (circle)||Optional, or zero|
Specific numbers in parentheses may also be provided. In this case, the numbers in parentheses take precedence over symbols, and the numbers are always written with the minimum number on the left, and the maximum on the right.
Database Structures Represented by Entity-Relationship Diagrams
Some of the more complex ER Diagrams can be interpreted as follows.
|Parentheses around Entity Name||Composite Attribute||Important measurements that can be completely reconstructed using other entities. These are usually not combined into their own column in the database.||ClassroomID, which is a composite of Building and RoomNumber|
|Double Rectangle||Weak Entity||An entity with a partial key||An apartment that has a unique identifier only when AptNo is combined with BuildingID|
|Dashed Underline||Partial Key||A key that, on its own, is not unique. It can become unique if it is joined to the primary key of the entity to which its weak entity is joined.||AptNo is not unique in the database, but it is unique within each BuildingID|
|Double Diamond||…||Relationship between a weak and a strong entity||…|
The fundamental takeaway is that when the ER Diagram has any of these special symbols, joins must be performed carefully to ensure the keys are unique.
ER Diagrams are distinct from Relational Schemas. Whereas ER Diagrams are used to represent the concepts that database architects implement, they do not represent how a database is actually organized.
To represent how a database is actually organized, a relational schema is used. The critical components of relational schema are:
- Primary Keys, and
- Foreign Keys.
The technical term for tables in a relational schema are “relations.” A relation is the logical idea that serves as the organization for the physical basis for the database. This term comes from set theory.
|Common Name||Technical Term|
Primary Key: Column (or set of columns) whose value is unique for every row in a table. Only 1 column per table can be a primary key, unless multiple columns are required to identify each row.
Unique Columns: Identified with a “(U)” beside it in the schema.
Foreign Key: Column that refers to the primary key of another table. Sometimes indicated by (FK) next to the name of the column in the table. Note that the names of foreign keys do not have to match the names of primary keys.
Generally, arrows are drawn from the foreign key of one table to the primary key of another. Note that the cardinality constraints from the ER Diagrams can also be included in the relational schema, instead of arrows.
Weak entities are represented in a relational schema by multiple items being underlined in the schema.
“Multivalued” attributes are when an entity can have multiple values for the same attribute. As an example, one class could have different classroom numbers over the course of a semester. This is indicated by a double circle around the attribute.
The arrow goes from the table with the foreign key to the table where the key comes from. The arrow therefore points to the primary key.
If two columns in a table are underlined, both are required to be used in conjunction to uniquely identify different rows.
Attributes in ER diagrams can be unique, but they do not get assigned to be primary or foreign keys until the relational schema stage of modeling a database.
Whenever there is a 1:Many relationship, the entity on the Many side should get the foreign key. The rationale for this is that in the opposite situation, there would be duplicate values of the unique identifier on the 1 side of the table.
In 1:1 relationships, it doesn’t matter which table has the primary or foreign key.
If a database has a many-to-many relationship, then the relational schema will have an entity with two foreign keys in it, that correspond to the primary keys in the other two tables. The table with two foreign keys is referred to as a “mapping table” or “linking table.” The two foreign keys are actually a single composite primary key. The reason for this is there is no way to follow the rule that primary keys must not be duplicated and still include a foreign key in the same table as the primary key.
- Each doctor works at between 2 and 4 locations
- Each location has between 3 and 10 doctors on-site
- For each doctor, store a unique doctor ID, name, and graduation year
- For each location, store a unique location ID and name
- Each location has at least one exam room
- Each exam room is at only one location
- For each location, store a unique location ID and a name
- For each exam room, store a room number and size
- The exam room number is unique for each location
- Each company has at least one mutual fund
- Each mutual fund is owned by only one company
- Each mutual fund consists of at least one security
- Each security can be owned by many mutual funds, or by none at all
- For each company, store a unique company ID, location, and name
- For each mutual fund, store a unique ID, name, and inception date
- For each security, store a unique ID, name, and type
- For each security acquisition by a mutual fund, store an amount
- For each designer, store a unique ID, unique SSN, and name
- For each outfit, store a unique ID, price, and completion date
- For each show, store a unique ID, date, and location
- For each technician, store a unique SSN and name
- For each customer, store a unique ID, name, and multiple phone numbers
- Shows feature 1 or 2 designers
- Designers can be featured in many shows, or zero
- Outfits are designed by one designer
- Designers design at least one outfit
- Technicians work on at least one outfit
- Outfits are worked on by one or many technicians
- Store the date each technician begins working on each outfit
- Customers purchase one or many outfits
- Each outfits is sold to one customer
This is an example of a relational schema that includes cardinality constraints.
TRNSACTcan be linked by using
- Within the
seqare all needed in combination to uniquely identify transactions
- Single line symbol is interpreted as “must have a minimum of, but can have no more than, one”
Content for this note is taken from the Coursera course “Managing Big Data with MySQL.”