SQL #2: SQL Design

3 minute read

This is the second in a series of SQL notes I made during the Kubrick Data Engineering training course. The others are:
#1: Database Design
#3: DQL
#4: DDL
#5: DML
#6: Sorting
#7: Joining
#8: Temporary Tables
#9: Programmable Objects
#10: Indexing
#11: Operation & Optimization

All the posts in this series are my personal note taking and may be updated as the course progresses.

In SQL#1, I laid out some of the concepts and structures of the modern databases that SQL interacts with. This post will follow on from that by introducing SQL as a language.

SQL Constructs

Within the T-SQL language there are multiple constructs which are almost mini languages, each with a unique function.

Data Query Language (DQL)

Queries are requests to the database to retrieve some data in a particular format. This is done with the SELECT function.

Data Definition Language (DDL)

Data can be directly edited using the CREATE, DROP, and ALTER functions.

Data Manipulation Language (DML)

Data is manipulated and changed using INSERT, UPDATE, and DELETE.

Data Control Language (DCL)

Finally, user access to a database can be controlled using GRANT, DENY, and REVOKE.

Table Organisation

Object Names

Objects in SQL have the following naming convention: SERVERNAME.DBNAME.SCHEMA.OBJECT where the shema represents the entity for which the object is an attribute.

Data Fields

The data fields for a table usually go in order (columns left to right). The order of the columns to be selected is arbritary unless exporting to Python or R when a particular order may be preferred. The following order is a general standard but different organisations may have their own style guide.

Local Keys
- Surrogate
- Business
Foreign Keys
Dates
Attributes (strings)
Measures (numeric)
Metadata

Data Types

SQL accepts a number of datatypes for variables. The common INT and CHAR types are listed along with their attributes.

Data type	Size	Features
`TINYINT`	1 byte	Unsigned (+)
`SMALLINT`	2 byte	Signed (+/-)
`INT`	4 byte	Signed (+/-)
`BIGINT`	8 byte	Signed (+/-)
`VARCHAR`	1 byte/char + 2 byte	Non-UNICODE, Variable width, maxchar=8000
`CHAR`	1 byte/char	Non-UNICODE, Fixed width, maxchar=8000
`NVARCHAR`	2 byte/char + 2 byte	UNICODE, Variable width, maxchar=4000
`NCHAR`	2 byte/char	UNICODE, Fixed width, maxchar=4000
`VARBINARY`	1 byte/int + 2	Store ‘blobs’ of binary, maxchar=4000
`DECIMAL(p,s)/NUMERIC(p,s)`	4 byte	Variable precision & scale
`FLOAT`	4-8 byte	Signed (+/-)

Other datatypes available in SQL are MONEY, DATE, TIME, DATETIME(to milliseconds) , DATETIME2 (to nanoseconds), DATETIMEOFFSET (with timezone). SQL_VARIANT is a special data type which can hold any type of value.

Note: SQL performs implicit conversion from strings to ints when needed but not ints to strings. For the statement WHERE ListPrice > '40' the '40' will be converted to 40. However stating WHERE PhoneNumber = 07734827429 will have to convert each PhoneNumber in the table to int to do the comparison which would be costly.

Environment Set-Up

When using SQL client side software a connection must be made to a relevant SQL server. If the server is running on your own computer you can use the address . or 127.0.0.1 which both refer to ‘self’. Once connected, a new query can be initiated for that server.

A database is selected using the USE keyword;

USE AdventureWorks
GO

Once this is done the client knows the SERVERNAME and the DBNAME so any reference to a table only requires SCHEMA.OBJECT. The default Schema name is dbo so it is possible to just referenct the object name, if that object is the default dbo.OBJECT.
GO is the syntax used to seperate transactions, everything between GO statements will operate as a transaction before anything afterwards.

It is good practice to use schemas as staging areas. For example STAGE for pre-processed tables and TRASH for tables to be dropped.

CREATE SCHEMA trash
GO

ALTER SCHEMA trash
TRANSFER [Purchasing].[PurchaseOrders]