SQL #7: Joining
This is the seventh in a series of SQL notes I made during the Kubrick Data Engineering training course. The others are:
#1: Database Design
#2: SQL Design
#3: DQL
#4: DDL
#5: DML
#6: Sorting
#8: Temporary Tables
#9: Programmable Objects
#10: Indexing
#11: Operation & Optimization
All the posts in this series are my personal note taking and may be updated as the course progresses.
JOIN
When using 3NF databases we have many seperated tables, a JOIN
combines two or more of these tables horizontally (add columns). The only conditions for a join is that there is some common key, which is given after the ON
keyword.
INNER JOIN
JOIN
aka INNER JOIN
only joins where values exist in both columns. The syntax involves a normal FROM
detailing the first table with a surrogate, followed by the JOIN
of the second table and the ON
selecting the common column. The following query will output a table containing product names and colours from Production.Product
along with an orderID and corresponding order quantity from SalesOrderDetail
where the product ID is matched in both tables.
SELECT
d.salesorderID
,p.name
,p.color
,d.OrderQty
FROM sales.SalesOrderDetail d
INNER JOIN production.product p
ON d.ProductID = p.ProductID
LEFT OUTER JOIN
LEFT JOIN
aka LEFT OUTER JOIN
will include all distict values in the LEFT
table listed as well as the common values. Often LEFT JOIN
is combined with INNER JOIN
to make a string of joins to connect multiple tables.
The following script uses multiple joins to connect otherwise unconnected tables. For example Person.Person
is connected to Person.BusinessEntityAddress
via the BusinessEntityID
key. Person.Address
is also joined with Person.BusinessEntityAddress
on the AddressID
key. Therefor a direct connection can be made between Person.Address
and Person.Person
which previously did not exist.
SELECT
isnull(pp.[Title] + ' ','') + ' ' + pp.[FirstName] + ' ' + isnull(pp.[MiddleName] + ' ','') + pp.[LastName] AS FullName
,isnull(a.AddressLine1 + ', ' + isnull(a.[AddressLine2] + ' ','') + a.City + ', ' + a.PostalCode, 'N/A') AS [FullAddress]
FROM person.person pp
LEFT JOIN person.EmailAddress e
ON e.BusinessEntityID = pp.BusinessEntityID
INNER JOIN Person.BusinessEntityAddress be
ON pp.BusinessEntityID = be.BusinessEntityID
INNER JOIN Person.[Address] a
ON be.AddressID = a.AddressID
WHERE PersonID IS NOT NULL
FULL OUTER JOIN
FULL OUTER JOIN
joins every value from both tables, regardless of whether they match or not.
CROSS JOIN
CROSS JOIN
joins every row from one table to every one of the other and should be used with caution as it can explode on even medium sized tables. An example would be creating a deck of cards from the suit types and numbers:
SELECT
s.suit
,v.value
FROM suit s
CROSS JOIN [value] v
SELF JOIN
SELF JOIN
is a relationship between two columns that exists within a single table. A common example of self joins might be an Employee / Manager hierarchy table. A manager is also an employee. An employee has a primary key employeeID. The managerID foreign key would reference the employeeID as a self join within the same table.
Set Operations
Set operations are similar to JOIN
operations but they’re vertical (add rows). For a set operation to be successful there must be the same number of columns in each table with matching data types. If this is true the operation can be placed between two SELECT
statements.
UNION
- Combines tables by row, removing duplicates (requires sorting)
UNION ALL
- Combines tables by row, keeping duplicates (doesn’t require sorting)
INTERSECT
- Returns rows that are common between tables
EXCEPT
- Returns rows that do appear in the first table and do not appear in the second
SELECT
[CustomerID]
,[Name]
,[Phone]
FROM Person.Person
UNION ALL
SELECT
462875
,'Joe Bloggs'
,'07777777777'