SQL #3: DQL
This is the third in a series of SQL notes I made during the Kubrick Data Engineering training course. The others are:
#1: Database Design
#2: SQL Design
#4: DDL
#5: DML
#6: Sorting
#7: Joining
#8: Temporary Tables
#9: Programmable Objects
#10: Indexing
#11: Operation & Optimization
All the posts in this series are my personal note taking and may be updated as the course progresses.
Data query language consists of SELECT
and it’s associated options. It is used to find and pull out data from a table. While SELECT
is simple in itself, it can be combined with many other key words and functions to form very complex queries. (All of the examples in this series use data from the AdventureWorks dataset).
Single SELECT
SELECT
is the core syntax of SQL queries, a SELECT
statement in it’s simplest form queries a table with no filters:
SELECT *
FROM Production.Product
FROM
is used to select the table to query from.
WHERE
A more specific query would state the column names and use WHERE
to apply a logical condition on a column using an equality operator. In SQL, the precursor to comments is --
. The text qualifier is the square bracket [ ]
.
--VERTICAL FILTER (Columns):
SELECT
[Name]
,[Color]
,[Partno]
FROM Production.Product
--HORIZONTAL FILTER (Rows):
WHERE [Cost] <= 500
The ISO standard equality operators are =,<,>,>=,<=,<>, SQL also has BETWEEN
, LIKE
, IN
, and EXISTS
.
BETWEEN
specifies an inclusive number range:
SELECT * FROM Production.Product
WHERE ListPrict BETWEEN 50 AND 100
LIKE
matches strings using REGEX (_
is ANY single character, %
is ANY number of ANY character):
SELECT * FROM Production.Product
WHERE NAME LIKE 'A%'
IN
match against a chosen set of options:
SELECT *
FROM Production.Product
WHERE colour IN (‘silver’,’blue’,’green’)
EXISTS
matches where some row of a subquery exists:
SELECT ListPrice
FROM Production.Product
WHERE EXISTS
(SELECT SalesID FROM Sales.Sales WHERE OrderQty > 50);
This EXISTS
example shows an important feature of WHERE
and other filters; they can use a subquery as an arguement.
HAVING
HAVING
is similar to WHERE
in that it filters rows. However HAVING
is always used after a GROUP BY
statement and is applied to the grouped rows.
The following query shows products grouped by color, then by size which have an individual cost of < 1000 and a grouped cost of < 1000. AS
can optionally be used to set an alias for a column or table.
SELECT
[color]
,[size]
,SUM(standardcost) AS sumcost
,AVG(standardcost) AS avcost
FROM Production.Product
WHERE standardcost < 1000
GROUP BY [color], [size]
HAVING SUM(standardcost) < 1000
While SQL server is case insensitive, other systems are not so it’s also a good idea to have consistent casing (either upper or lower).
The user can also specify DISTINCT
to return only distinct values. TOP n
is used to return the top n values or TOP n PERCENT
to return a percentage from the top. If the query is ordered and two values are tied then TOP 1 WITH TIES
will return all the first tied values for example.
The INTO
command (before FROM
) is used to save the query into a new table.
CASE
CASE
is used to SELECT
a different value for a column depending on some condition, this can be useful for ‘binning’ an attribute, price for example.
SELECT
CASE
WHEN ListPrice >= 2000 THEN 'Expensive'
WHEN ListPrice >= 500 THEN 'Moderate'
WHEN ListPrice > 0 THEN 'Cheap'
WHEN ListPrice <= 0 THEN 'Free'
ELSE 'Unknown'
END price_bin
FROM Production.Product
The above script will create a new column called price_bin
containing labels based on the ListPrice
column values.
Order of Operations
SQL queries have a defined logical order of operations which is the order in which filters and groupings are applied. It’s important to understand this order as it affects every aspect of how a SELECT
statement is written. Confusingly, the lexical (written) order is different to the logical order, which is also different to the actual order of operations (apparently).
The logical order of operations is:
FROM
: Choose tableWHERE
: Filter rowsGROUP BY
: Group filtered rows- Aggregations: Summarize rows
HAVING
: Filter groupsOVER
(Window): Aggregate over groupsSELECT
: Filter columnsDISTINCT
: Remove duplicatesUNION
,INTERSECT
,EXCEPT
: CombineSELECT
statementsORDER BY
: Choose orderingLIMIT
,FETCH
,TOP
: Filter ordered rows
Each statement can only use arguements that are defined before it in the list. This means that for example;
WHERE
can not include aggregations but window functions (OVER
) canORDER BY
is the only statement that can use aliases defined bySELECT
There are a few ways to get around the limitations of the single SELECT
. The simplest is a nested SELECT
.
Nested SELECT
A nested SELECT
is an (inefficient) method for circumventing the logical order of operations and form a more complex query.
Any example would be, to find the discontinued products from Production.Product
we would need to search Sales.SalesOrderDetail
for the most recent order. If the last time the product was modified was in the same year then it’s probably still stocked. An EXCEPT
statement can be used to find the products which are not in this list.
SELECT ProductID FROM Production.Product
except
SELECT ProductID FROM Production.Product
WHERE ProductID IN(
SELECT ProductID FROM sales.SalesOrderDetail
WHERE year(ModifiedDate) IN(
SELECT
max(year(orderdate)) lastorder
FROM sales.SalesOrderHeader ))