SQL #7: Joining

3 minute read

This is the seventh in a series of SQL notes I made during the Kubrick Data Engineering training course. The others are:
#1: Database Design
#2: SQL Design
#3: DQL
#4: DDL
#5: DML
#6: Sorting
#8: Temporary Tables
#9: Programmable Objects
#10: Indexing
#11: Operation & Optimization

All the posts in this series are my personal note taking and may be updated as the course progresses.


JOIN

When using 3NF databases we have many seperated tables, a JOIN combines two or more of these tables horizontally (add columns). The only conditions for a join is that there is some common key, which is given after the ON keyword.

INNER JOIN

JOIN aka INNER JOIN only joins where values exist in both columns. The syntax involves a normal FROM detailing the first table with a surrogate, followed by the JOIN of the second table and the ON selecting the common column. The following query will output a table containing product names and colours from Production.Product along with an orderID and corresponding order quantity from SalesOrderDetail where the product ID is matched in both tables.

SELECT
	 d.salesorderID
	,p.name
	,p.color
	,d.OrderQty
FROM sales.SalesOrderDetail d
INNER JOIN production.product p
	ON d.ProductID = p.ProductID

LEFT OUTER JOIN

LEFT JOIN aka LEFT OUTER JOIN will include all distict values in the LEFT table listed as well as the common values. Often LEFT JOIN is combined with INNER JOIN to make a string of joins to connect multiple tables.

The following script uses multiple joins to connect otherwise unconnected tables. For example Person.Person is connected to Person.BusinessEntityAddress via the BusinessEntityID key. Person.Address is also joined with Person.BusinessEntityAddress on the AddressID key. Therefor a direct connection can be made between Person.Address and Person.Person which previously did not exist.

SELECT 
		 isnull(pp.[Title] + ' ','') + ' ' + pp.[FirstName] + ' ' + isnull(pp.[MiddleName] + ' ','') + pp.[LastName] AS FullName
		,isnull(a.AddressLine1 + ', ' + isnull(a.[AddressLine2] + ' ','') + a.City + ', ' + a.PostalCode, 'N/A') AS [FullAddress]
	FROM person.person pp
	LEFT JOIN person.EmailAddress e
		ON e.BusinessEntityID = pp.BusinessEntityID
	INNER JOIN Person.BusinessEntityAddress be
		ON pp.BusinessEntityID = be.BusinessEntityID
	INNER JOIN Person.[Address] a
		ON be.AddressID = a.AddressID
WHERE PersonID IS NOT NULL

FULL OUTER JOIN

FULL OUTER JOIN joins every value from both tables, regardless of whether they match or not.

CROSS JOIN

CROSS JOIN joins every row from one table to every one of the other and should be used with caution as it can explode on even medium sized tables. An example would be creating a deck of cards from the suit types and numbers:

SELECT
     s.suit
    ,v.value
FROM suit s
CROSS JOIN [value] v

SELF JOIN

SELF JOIN is a relationship between two columns that exists within a single table. A common example of self joins might be an Employee / Manager hierarchy table. A manager is also an employee. An employee has a primary key employeeID. The managerID foreign key would reference the employeeID as a self join within the same table.


Set Operations

Set operations are similar to JOIN operations but they’re vertical (add rows). For a set operation to be successful there must be the same number of columns in each table with matching data types. If this is true the operation can be placed between two SELECT statements.
UNION - Combines tables by row, removing duplicates (requires sorting)
UNION ALL - Combines tables by row, keeping duplicates (doesn’t require sorting)
INTERSECT - Returns rows that are common between tables
EXCEPT - Returns rows that do appear in the first table and do not appear in the second

SELECT 
	 [CustomerID]
	,[Name]
	,[Phone]
FROM Person.Person
UNION ALL
SELECT
	 462875
	,'Joe Bloggs'
	,'07777777777'

Updated: