SQL #9: Programmable objects
This is the ninth in a series of SQL notes I made during the Kubrick Data Engineering training course. The others are:
#1: Database Design
#2: SQL Design
#3: DQL
#4: DDL
#5: DML
#6: Sorting
#7: Joining
#8: Temporary Tables
#10: Indexing
#11: Operation & Optimization
All the posts in this series are my personal note taking and may be updated as the course progresses.
Functions
As well as default system functions users can define new functions, the category of function is either:
- Scalar - Returns a single value, like the system functions above (
SELECT <fnc>
) - In Line tabled value - Returns a table from a
SELECT
statement, can take an arguement as input (SELECT FROM <fnc>
) - Multi statement table valued - Returns a table from multiple
SELECT
statements, can take input (SELECT FROM <fnc>
)
Scalar
A scalar functions act on a single input variable and outputs a single value. They can be computationally inefficient when they involve IF functions that cycle through the table row by row.
CAST
CAST(<val> AS DATE)
- Treats the supplied value as some other formatTRY_CAST(<val> AS DATE)
- Same asCAST
but enters NULL if there’s an errorCONVERT(DATE, <val>, <optn>)
- SQL server specific version of cast which allows for extra options
Missing Data
NULL
in SQL is an annoying value as testing for it will always return UNKOWN, even if the query is clearly asking for a character that is not null, or asking for a null. The answer is to use IS NULL
or IS NOT NULL
which will return TRUE or FALSE. For many reasons it’s a good idea to avoid the use of NULL
entirely.
ISNULL(n1,n2)
- Return n1 if it is NOTNULL
, otherwise return n2NULLIF(n1, n2)
- ReturnNULL
if n1=n2, otherwise return n1COALESCE(n1,n2,..)
- Gives first value from list that isNOT NULL
Maths
Most of the mathematical aggregation functions are self-explanatory. They are MAX()
, MIN()
, COUNT()
, SUM()
, AVG()
, STDEV()
, and VAR()
. These functions are often combined with other aggregation methods to draw insight. Other functions are:
CEILING()
- Round upFLOOR()
- Round downROUND(,D)
- Round to D decimal places -DECIMAL(,P,S)
- Round to P total digits and S significant figures -CHECKSUM()
- Computes a hash as a number -NEWID()
- Computes a random ID stringRAND()
- Random number betweeen 0-1
String functions
LEN()
- LengthLTRIM()
,RTRIM()
- Trims blank space either sideSUBSTRING()
- Splits string upCHARINDEX()
- Index of a character in stringCONCAT()
- Combines stringsDIFFERENCE()
- Returns a rating of how different 2 strings are (1=diff, 4=same)
The following statement will split an email into the username and the domain name
DECLARE @email VARCHAR(100) = 'joe@gmail.com'
SELECT SUBSTRING(@email, 1, CHARINDEX('@', @email)-1) username
,SUBSTRING(@email, CHARINDEX('@', @email)+1, LEN(@email)) domainname
Date Functions
There are many date functions in excel that can be used on the DATE
datatype fields. The following script can be used to extract information extra from date columns using system date functions, this is the basis of a dimensional date table.
SELECT
SalesOrderID
,OrderDate
,ShipDate
,GETDATE() AS TodaysDate
,DATEDIFF(dd,OrderDate,ShipDate) AS ProcessDays
,CAST(DATEDIFF(dd,ShipDate,GETDATE()) / 365. AS DECIMAL(4,1)) AS YearsToToday
,CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, OrderDate), 0) AS DATE) AS StartOfMonth
,EOMONTH(OrderDate) AS EndOfMonth
,DATENAME(dw,OrderDate) AS OrderDayName
,DATEPART(dw,OrderDate) AS OrderdayNo
FROM Sales.SalesOrderHeader
User defined functions
A user can create functions using CREATE FUNCTION
, a scalar function requires an input and an output and must have BEGIN
and END
keywords denoting where the function operation happens. The following script creates a function that does the same as ISNULL()
.
CREATE FUNCTION dbo.ufnisnull(@col SQL_VARIANT, @rep SQL_VARIANT)
RETURNS SQL_VARIANT
AS
BEGIN
DECLARE @answer SQL_VARIANT
-- Calculation:
IF @col IS NULL
SET @answer = @rep
ELSE SET @answer = @col
RETURN @answer
END
Table Valued Functions
These functions return a table variable as a result, the function itself takes the form of a select statement that references an input variable and does not need BEGIN
or END
. A simple function of this type just returns the values from a table for some condition.
CREATE FUNCTION dbo.fn_productbyminprice(@minprice MONEY)
RETURNS TABLE -- TVF definition
AS
RETURN(
SELECT *
FROM Production.Product
WHERE ListPrice > @minprice
)
Multi Statement Table Valued Functions
These functions are the same as table valued functions except they contain multiple SELECT
queries. For example, a function could return a table with values populated from multiple SELECT
statements. The following script returns a table.
CREATE FUNCTION dbo.lastShipped(@CustomerID INT)
RETURNS @CustomerOrder TABLE
(SaleOrderID INT NOT NULL,
CustomerID INT NOT NULL,
OrderDate DATETIME NOT NULL,
OrderQty INT NOT NULL)
AS
BEGIN
DECLARE @MaxDate DATETIME
SELECT @MaxDate = MAX(OrderDate)
FROM Sales.SalesOrderHeader
WHERE CustomerID = @CustomerID
INSERT @CustomerOrder
SELECT a.SalesOrderID, a.CustomerID, a.OrderDate, b.OrderQty
FROM Sales.SalesOrderHeader a
INNER JOIN Sales.SalesOrderHeader b
ON a.SalesOrderID = b.SalesOrderID
INNER JOIN Production.Product c
ON b.ProductID = c.ProductID
WHERE a.OrderDate = @MaxDate
AND a.CustomerID = @CustomerID
RETURN
END
Views
Views are like a CTE but without containing anything other than the query. Accessing a view will just run the query that is stored in that view making them very lightweight to store but not to run.
CREATE VIEW vw_price
AS
(
SELECT ListPrice
FROM Production.Product
)
vw_price
can then be used in any query and it will run the section of code in the brackets.
Stored Procedures
Procs are used in db’s to allow analysts to query data without doing anything dodgy. Often the analyst will be denied any access EXCEPT use of procs.
System defined stored procedures
Sp_who
– who’s connected to systemSp_who2
– Detailed version of sp_whoSp_who2 n
– Detailed info on user ‘n’Sp_attach_db
– Attach a DB file to server using filepath
User defined stored procedures
SP to add a row to table:
-- DEFINE/EDIT SP --
GO
CREATE proc usp_addrow
@productID VARCHAR(50) = NULL,
@cat VARCHAR(50) = NULL,
@subcat VARCHAR(50) = NULL,
AS
-- integrity --
IF left(@productID, 1) NOT LIKE '[A-Z]'
BEGIN
PRINT 'SP FAILED, productID should start with A-Z'
RETURN 99 -- Returns no. to front end and quits proc
END
-- check if already exists --
DECLARE @cnt INT
SELECT @cnt = count(*) FROM Product WHERE productID = @productID
IF @cnt > 0
BEGIN
PRINT 'Product code already in table '
RETURN
END
-- proc --
INSERT INTO Product(productID, cat, subcat, [desc])
SELECT @productID, @cat, @subcat, @desc
RETURN 0
GO
-- EXECUTION --
DECLARE @retvalue INT
EXEC @retvalue = usp_addrow '5AA', 'AAA', 'A' 'TEST CHAPTER'
SELECT @retvalue -- 99 if fail, 0 if pass
SP to get Metadata about a table:
CREATE PROC usp_tablesyntax
@tabname NVARCHAR(255)
AS
SELECT 'Create table ' + @tabname + ' ('
UNION ALL
SELECT
c.name
+ ' ' + tp.name
+ ' (' + CAST(c.max_length AS VARCHAR(10)) + '),'
FROM sys.tables t
INNER JOIN sys.columns c
ON t.object_id = c.object_id
INNER JOIN sys.types tp
ON c.user_type_id = tp.user_type_id
WHERE t.name = '' + @tabname + ''
UNION ALL
SELECT ')'
GO
-- EXECUTION --
EXEC usp_tablesyntax Production.Product
GO
Triggers
Triggers are scripts that are set to run when a specific condition is fullfilled. This is useful for automating a workflow or blocking certain edits.
The following trigger will update modifieddate
to todays date when listprice
is changed.
CREATE trigger tr_scd2_pricechange
ON Production.Product
FOR UPDATE
AS
SET NOCOUNT ON -- Does not return 'count of rows changed'
UPDATE p
SET p.ModifiedDate = getdate() -- Update modified date to todays date
FROM Production.Product p
JOIN inserted i -- Join on new values
ON p.ProductID = i.ProductID
JOIN deleted d -- Join on old values
ON p.ProductID = d.ProductID
WHERE i.ListPrice <> d.ListPrice -- Where price has changed
GO
-- TESTING --
UPDATE Production.Product
SET ListPrice = 19.99
WHERE ProductID = 5
Geographic functions
SQL has some special functions for converting and displaying geographic data. To use these functions a variable must be defined as geographic type using GEOGRAPHY
.
The following script declares a geographic variable and inputs the coordinates for Oxford, UK. A SELECT
statement then gets address data from Adventureworks, displays the distance from Oxford and orders by this distance. STAsText()
returns the coordinates for SpatialLocation
and STDistance
returns the distance (in millimetres) between SpatialLocation
and ox
.
DECLARE @ox GEOGRAPHY
SELECT @ox = GEOGRAPHY::STGeomFromText('POINT(-1.258289 51.753475)', 4326) -- Oxford co-ordinates
SELECT
AddressLine1
,City
,PostalCode
,SpatialLocation.STAsText() AS latlong
,SpatialLocation.STDistance(@ox) /1000000. km_from_ox
FROM Person.Address
ORDER BY km_from_ox ASC