SQL #2: SQL Design
This is the second in a series of SQL notes I made during the Kubrick Data Engineering training course. The others are:
#1: Database Design
#3: DQL
#4: DDL
#5: DML
#6: Sorting
#7: Joining
#8: Temporary Tables
#9: Programmable Objects
#10: Indexing
#11: Operation & Optimization
All the posts in this series are my personal note taking and may be updated as the course progresses.
In SQL#1, I laid out some of the concepts and structures of the modern databases that SQL interacts with. This post will follow on from that by introducing SQL as a language.
SQL Constructs
Within the T-SQL language there are multiple constructs which are almost mini languages, each with a unique function.
Data Query Language (DQL)
Queries are requests to the database to retrieve some data in a particular format. This is done with the SELECT
function.
Data Definition Language (DDL)
Data can be directly edited using the CREATE
, DROP
, and ALTER
functions.
Data Manipulation Language (DML)
Data is manipulated and changed using INSERT
, UPDATE
, and DELETE
.
Data Control Language (DCL)
Finally, user access to a database can be controlled using GRANT
, DENY
, and REVOKE
.
Table Organisation
Object Names
Objects in SQL have the following naming convention: SERVERNAME.DBNAME.SCHEMA.OBJECT
where the shema represents the entity for which the object is an attribute.
Data Fields
The data fields for a table usually go in order (columns left to right). The order of the columns to be selected is arbritary unless exporting to Python or R when a particular order may be preferred. The following order is a general standard but different organisations may have their own style guide.
-
Local Keys
- Surrogate
- Business
-
Foreign Keys
-
Dates
-
Attributes (strings)
-
Measures (numeric)
-
Metadata
Data Types
SQL accepts a number of datatypes for variables. The common INT
and CHAR
types are listed along with their attributes.
Data type | Size | Features |
---|---|---|
TINYINT |
1 byte | Unsigned (+) |
SMALLINT |
2 byte | Signed (+/-) |
INT |
4 byte | Signed (+/-) |
BIGINT |
8 byte | Signed (+/-) |
VARCHAR |
1 byte/char + 2 byte | Non-UNICODE, Variable width, maxchar=8000 |
CHAR |
1 byte/char | Non-UNICODE, Fixed width, maxchar=8000 |
NVARCHAR |
2 byte/char + 2 byte | UNICODE, Variable width, maxchar=4000 |
NCHAR |
2 byte/char | UNICODE, Fixed width, maxchar=4000 |
VARBINARY |
1 byte/int + 2 | Store ‘blobs’ of binary, maxchar=4000 |
DECIMAL(p,s)/NUMERIC(p,s) |
4 byte | Variable precision & scale |
FLOAT |
4-8 byte | Signed (+/-) |
Other datatypes available in SQL are MONEY
, DATE
, TIME
, DATETIME
(to milliseconds) , DATETIME2
(to nanoseconds), DATETIMEOFFSET
(with timezone). SQL_VARIANT
is a special data type which can hold any type of value.
Note: SQL performs implicit conversion from strings to ints when needed but not ints to strings. For the statement WHERE ListPrice > '40'
the '40'
will be converted to 40
. However stating WHERE PhoneNumber = 07734827429
will have to convert each PhoneNumber in the table to int to do the comparison which would be costly.
Environment Set-Up
When using SQL client side software a connection must be made to a relevant SQL server. If the server is running on your own computer you can use the address .
or 127.0.0.1
which both refer to ‘self’. Once connected, a new query can be initiated for that server.
A database is selected using the USE
keyword;
USE AdventureWorks
GO
Once this is done the client knows the SERVERNAME
and the DBNAME
so any reference to a table only requires SCHEMA.OBJECT
. The default Schema name is dbo
so it is possible to just referenct the object name, if that object is the default dbo.OBJECT
.
GO
is the syntax used to seperate transactions, everything between GO
statements will operate as a transaction before anything afterwards.
It is good practice to use schemas as staging areas. For example STAGE
for pre-processed tables and TRASH
for tables to be dropped.
CREATE SCHEMA trash
GO
ALTER SCHEMA trash
TRANSFER [Purchasing].[PurchaseOrders]