Python #2: Advanced Functions
This is the second in a series of Python notes I made during the Kubrick Data Engineering training course.
#1: Basics
#2: Advanced
#3: Scraping
#4: Pandas
#5: Matplotlib
Advanced Functions
- Do not use mutable objects as default values in a function (y=[]): the default object is created once, when the function is defined, so changes made to it in one call carry over to every later call.
- Functions that mutate their input values or change the state of other parts of the program are said to have side effects. This is generally discouraged as side effects can cause bugs that are hard to trace.
- A variable assigned inside a function is local by default; it can be explicitly declared as global using global <var>.
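The mutable default problem is easiest to see by running it. The sketch below is illustrative only: append_bad and append_good are hypothetical names, and using None as the default is one common workaround rather than the only one.

```python
def append_bad(item, bucket=[]):
    # The default list is built once, at definition time, and then reused
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    # None is a safe sentinel: a fresh list is built on every call
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

bad_first = append_bad(1)
bad_second = append_bad(2)
print(bad_second)               # [1, 2]
print(bad_first is bad_second)  # True, both calls returned the same list

good_first = append_good(1)
good_second = append_good(2)
print(good_first, good_second)  # [1] [2]
```

The second call to append_bad still sees the 1 from the first call, which is exactly the kind of side effect the notes above warn about.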
*Args & **Kwargs
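*args and **kwargs let a function accept a variable number of inputs: *args collects any extra positional arguments into a tuple and **kwargs collects any extra keyword arguments into a dictionary. They can be combined with normal input variables.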
def my_func_mix(x, *args, y=5, **kwargs):
    print(args)
    print(kwargs)

my_func_mix(5, 1, 2, 3, y=10, z=4)
# (1, 2, 3)
# {'z': 4}
Error Handling
Error handling, or exception control, means dealing with errors that may occur in a program without crashing the program itself. An error can be raised deliberately using if <condition>: raise Exception, but it is more common to catch errors with try ... except.
try:
    x = 4/0
    print('Calculation successful')
except ZeroDivisionError:
    print('Zero Division Error')
except IndexError:
    print('Index Error')
except Exception as e:
    print('{}'.format(e))
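The raise form mentioned above has no example of its own, so here is a minimal sketch of raising an error in one function and catching it in another. safe_divide and try_divide are hypothetical helper names, and the else clause (which runs only when no exception occurred) is an addition not covered in the notes above.

```python
def safe_divide(a, b):
    """Hypothetical helper: divide a by b, raising a clear error if b is 0."""
    if b == 0:
        raise ValueError('b must not be zero')
    return a / b

def try_divide(a, b):
    try:
        result = safe_divide(a, b)
    except ValueError as e:
        # Runs only if safe_divide raised a ValueError
        return 'Error: {}'.format(e)
    else:
        # Runs only if the try block finished without an exception
        return 'Result: {}'.format(result)

print(try_divide(10, 2))  # Result: 5.0
print(try_divide(1, 0))   # Error: b must not be zero
```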
Decorators
Decorators are functions whose primary purpose is to wrap another function. They are applied by writing @<decorator> on the line above the function definition.
import datetime as dt

def wrapper(f):
    def callf(*args, **kwargs):
        start = dt.datetime.now()
        r = f(*args, **kwargs)
        end = dt.datetime.now()
        print(end - start)  # elapsed time of the wrapped call
        return r
    return callf

@wrapper
def add_five(x):
    return x+5

add_five(10)
# 0:00:00.000002  (elapsed time, varies between runs)
# 15
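It can help to see that the @ line is only shorthand for passing the function through the decorator and rebinding the name. The sketch below uses a hypothetical shout decorator to show both forms; functools.wraps is an extra from the standard library, not part of the notes above, that keeps the wrapped function's name and docstring.

```python
import functools

def shout(f):
    """Hypothetical decorator: upper-case whatever string f returns."""
    @functools.wraps(f)  # copy f's name and docstring onto the wrapper
    def inner(*args, **kwargs):
        return f(*args, **kwargs).upper()
    return inner

@shout
def greet(name):
    """Return a greeting."""
    return 'hello ' + name

def greet_plain(name):
    return 'hello ' + name

# Equivalent to decorating greet_plain with @shout
greet_manual = shout(greet_plain)

print(greet('bob'))         # HELLO BOB
print(greet_manual('bob'))  # HELLO BOB
print(greet.__name__)       # greet
```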
Generators
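If a function uses the yield keyword then it is a generator function. yield hands a value back to the caller like return does, but the function is paused rather than ended and resumes from the same point on the next request. Calling a generator function returns a generator object, which has a __next__ method that returns the next yielded value when called (the built-in next(g) does the same). Generators can be iterated over like lists but are more memory efficient as they generate values only when needed.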
def countdown(n):
    print('Counting down...')
    while n >= 0:
        print(n)
        yield n # Like a return
        n -= 1

g = countdown(10)
g.__next__()
# Counting down...
# 10
Calling it a second time resumes the function where it left off, so the output changes:
g.__next__()
# 9
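In practice a generator is rarely stepped through by hand; a for loop requests the next value until the generator is exhausted. The sketch below uses a hypothetical squares generator to show this, along with a rough check of the memory claim using sys.getsizeof (which measures only the container object itself, so treat it as an indication rather than a full measurement).

```python
import sys

def squares(n):
    """Yield the squares of 0..n-1 one at a time."""
    for i in range(n):
        yield i * i

# A for loop asks for the next value until the generator runs out
total = 0
for value in squares(5):
    total += value
print(total)  # 30  (0 + 1 + 4 + 9 + 16)

# next(g) is the usual way to call g.__next__()
g = squares(3)
first = next(g)   # 0
second = next(g)  # 1

# The generator object stays small however many values it can produce
big_list = [i * i for i in range(100000)]
big_gen = squares(100000)
print(sys.getsizeof(big_gen) < sys.getsizeof(big_list))  # True
```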
Comprehensions
List and generator comprehensions are compact ways to apply an expression to every value in an iterable, optionally filtering the values with an if clause.
List comprehension:
my_list = [1,2,3,4,5]

def timestwo(x):
    return x*2

[timestwo(i) for i in my_list if i % 2 == 0]
# [4, 8]
Generator comprehension:
g = (2*i for i in my_list)
list(g) # Consume the generator into a list to see its values
# [2, 4, 6, 8, 10]
Lambda
Lambda is the Python keyword for an in-line (anonymous) function. The syntax is lambda <input> : <output>.
square = lambda x: x**2
square(5)
# 25
Maps
Maps allow us to apply a function to every item in an iterable object. map returns a lazy iterator, so it is wrapped in list() to see the results:
my_list = [1,2,3,4,5]
list(map(lambda x: x+3, my_list))
# [4, 5, 6, 7, 8]
Zip
Zip pairs up the items of two (or more) iterables into tuples, which can then be turned into a list or a dictionary:
keys = ['Sea', 'Soil', 'Grass']
vals = ['blue', 'brown', 'green']
list(zip(keys, vals))
# [('Sea', 'blue'), ('Soil', 'brown'), ('Grass', 'green')]
dict(zip(keys, vals))
# {'Sea': 'blue', 'Soil': 'brown', 'Grass': 'green'}
File I/O
import os
# Current directory
os.getcwd()
# 'C:\\Users\\admin\\code\\Python'
# List files in current directory
os.listdir()
# ['.ipynb_checkpoints', 'test_script.py.txt', 'Untitled.ipynb']
# List all jupyter notebooks in cwd
for file in os.listdir():
    if file.endswith('.ipynb'):
        print('{} is a Jupyter notebook'.format(file))
# Untitled.ipynb is a Jupyter notebook
Reading and writing to files can be done line wise or all at once:
# Read file all in one ('r')
filepath = r'C:\Users\admin\Documents\test.txt'
with open(filepath, 'r') as f:
    lines = f.readlines()
print(lines)  # the with block has already closed the file
# ['The quick brown fox\n', 'jumped over the lazy dog']
# Read file line wise
file_path = r'C:\Users\admin\Documents\test.txt'
f = open(file_path, 'r')
for line in f:
    print(line, end='')  # each line already ends with its own newline
f.close()
# The quick brown fox
# jumped over the lazy dog
# Create or overwrite file ('w')
file_path = r'C:\Users\admin\Documents\test2.txt'
f = open(file_path, 'w')
f.write('hello\n')
f.write('goodbye')
f.close()
# Append to file ('a')
file_path = r'C:\Users\admin\Documents\test.txt'
f = open(file_path, 'a')
f.write('and went to market\n')
f.close()  # close the file so the appended text is flushed to disk
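The write, append and read modes above can be tried together without touching real files. This is a minimal sketch that assumes a temporary directory is acceptable as a scratch area; it uses with blocks throughout so every file is closed automatically.

```python
import os
import tempfile

# A temporary directory keeps the example away from real files
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'test.txt')

    # 'w' creates the file (or overwrites an existing one)
    with open(file_path, 'w') as f:
        f.write('The quick brown fox\n')

    # 'a' adds to the end of the existing file
    with open(file_path, 'a') as f:
        f.write('jumped over the lazy dog\n')

    # Read everything at once as a single string
    with open(file_path, 'r') as f:
        whole_text = f.read()

    # Read line wise, stripping the trailing newline from each line
    with open(file_path, 'r') as f:
        lines = [line.rstrip('\n') for line in f]

print(lines)
# ['The quick brown fox', 'jumped over the lazy dog']
```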
Documentation / Metadata
Program structure
Built-in functions can be used to inspect the structure of the program and its objects
import os

# Minimal Dog class, assumed here so that the example runs
class Dog:
    def bark(self):
        print('Woof')

# Check whether object is callable:
callable(len)
# True
# Check type (returns function as it is not bound to an instance of Dog)
type(Dog.bark)
# function
# Returns method as d is an instance of Dog
d = Dog()
type(d.bark)
# method
type(os)
# module
Documentation string
When a function is defined, the first statement in its body is often a string, which is called the documentation string (docstring).
def add_five(x):
    '''
    This is the documentation string
    '''
    return x + 5

help(add_five)
# Help on function add_five in module __main__:
# add_five(x)
#     This is the documentation string
Reference counting
References to an object can be counted using sys.getrefcount(<var>). If an object has no references it will be collected by the garbage collector and freed from memory. An object with a single real reference will typically show a reference count of 2 or more, as passing it to the getrefcount function itself creates a temporary reference.
import sys
my_str = 'ggg'
sys.getrefcount(my_str)
# 2  (the exact number varies between Python versions and environments)
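Because the absolute count depends on the interpreter, it is more reliable to look at how the count changes. A small sketch, assuming CPython (reference counts are an implementation detail) and using a freshly created list so that nothing else refers to it:

```python
import sys

data = [1, 2, 3]                  # a fresh object, referenced only by 'data'
before = sys.getrefcount(data)

alias = data                      # a second name bound to the same list
after_alias = sys.getrefcount(data)

del alias                         # remove the extra reference again
after_del = sys.getrefcount(data)

print(after_alias - before)       # 1
print(after_del - before)         # 0
```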
dir(my_object) shows all attributes and methods available for my_object.
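The full output of dir() is long because it includes the special double-underscore names. A small sketch that filters these out, using a hypothetical minimal Dog class:

```python
class Dog:
    """Hypothetical class used only for this illustration."""
    def bark(self):
        return 'Woof'

d = Dog()

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(d) if not name.startswith('_')]
print(public_names)                   # ['bark']

# getattr fetches an attribute by name, so it can be checked with callable
print(callable(getattr(d, 'bark')))   # True
```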
Regular Expressions
Regular Expressions (RegEx) are special codes used to search a string for matching characters or patterns. They can contain literal text as well as special characters that stand for a general class of character. The special characters are:
General Matching
----------------
\d: digit
\w: word character (letter, digit or underscore)
\s: whitespace (space, tab or newline)
\D: Not digit
\W: Not word character
\S: Not whitespace
Quantifiers
-----------
* : 0 or more
+ : 1 or more
? : 0 or 1
{n}: Exactly n
Wildcards
---------
. : Matches any single character except a newline
Character sets
--------------
[ab] : Match a or b
[a-z] : Match any character in the range a to z
An example for finding phone numbers of the form 07392-244-112 or 07221 222 456:
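import re
# Sample text to search (made up for illustration)
document = 'Call 07392-244-112 or 07221 222 456 after 5pm'
# Define RegEx
pattern = re.compile(r'\d{5}[-\s]\d{3}[-\s]\d{3}')
# Print list of matched strings from finditer()
[m.group() for m in pattern.finditer(document)]
# ['07392-244-112', '07221 222 456']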
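The character classes and quantifiers listed above can also be tried one at a time. The strings below are made up for illustration; re.findall returns every non-overlapping match as a list, which makes it convenient for quick experiments.

```python
import re

# \d+ : one or more digits
print(re.findall(r'\d+', 'Room 12, floor 3'))    # ['12', '3']

# \d{2} : exactly two digits
print(re.findall(r'\d{2}', 'Room 12, floor 3'))  # ['12']

# ? : the preceding character is optional
print(re.findall(r'colou?r', 'color colour'))    # ['color', 'colour']

# . : any single character (here including a space)
print(re.findall(r'b.t', 'bat bit b t'))         # ['bat', 'bit', 'b t']

# [a-c] : any one character in the range a to c
print(re.findall(r'[a-c]', 'abcdef'))            # ['a', 'b', 'c']

# \s+ : split on runs of whitespace (spaces or tabs)
print(re.split(r'\s+', 'a  b\tc'))               # ['a', 'b', 'c']

# \w+ : letters, digits and underscores count as word characters
print(re.findall(r'\w+', 'snake_case v2!'))      # ['snake_case', 'v2']
```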
Modules
pyodbc
pyodbc is a module that allows communication with a SQL Server database (or any other ODBC data source) through Python. The following is a short intro with the basic commands to set up a connection and send queries to the server.
import pyodbc

# Create connection function
def connect():
    conn = pyodbc.connect(Trusted_Connection = 'yes',
                          driver = '{SQL Server}',
                          server = '.',
                          database = 'main')
    return conn

# Define connection alias
conn = connect()
Once the connection is established any number of queries can be sent using a cursor. A cursor is a temporary working area to hold the query results.
# Define cursor for query
cursor = conn.cursor()
# Write SQL query as a Python string
query = '''
create table dbo.monsters(
name varchar(20),
home varchar(20),
strength int)
'''
# Execute query
cursor.execute(query)
# Commit to server
conn.commit()
Data can be inserted into a table from a Python list of tuples (for example) using the ? placeholder, which pyodbc fills in with the values passed to execute.
cursor = conn.cursor()
data = [('King Rat', 'London Underground', 8),
        ('Slenderman', 'Behind you', 1)]
# ? is used as placeholder
query = '''insert into dbo.monsters values (?,?,?)'''
for d in data:
    cursor.execute(query, d[0], d[1], d[2])
conn.commit()
To retrieve data from a table we must use either cursor.fetchall(), which returns all remaining rows as a list, or cursor.fetchone(), which returns a single row.
When finished with a connection we should use conn.close() to close the connection.
cursor = conn.cursor()
query = '''select * from dbo.monsters'''
cursor.execute(query)
# Fetch all results
results = cursor.fetchall()
# Close connection (no commit is needed, as a select does not change any data)
conn.close()
conn.close()
# Print results
results
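The pyodbc examples need a running SQL Server, so they cannot be tried in isolation. As a stand-in, the sketch below runs the same create, insert and select steps against an in-memory SQLite database using sqlite3 from the standard library. This is a swapped-in library, not pyodbc: it is used here only because both follow the Python DB-API and both accept ? placeholders. The table name drops the dbo. schema prefix, which SQLite does not use.

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

cursor.execute('''
    create table monsters(
        name varchar(20),
        home varchar(20),
        strength int)
''')

data = [('King Rat', 'London Underground', 8),
        ('Slenderman', 'Behind you', 1)]

# executemany runs the same statement once per tuple in data
cursor.executemany('insert into monsters values (?,?,?)', data)
conn.commit()

# fetchone returns a single row as a tuple
cursor.execute('select name, strength from monsters where strength > ?', (5,))
strongest = cursor.fetchone()

# fetchall returns every remaining row as a list of tuples
cursor.execute('select * from monsters order by strength')
results = cursor.fetchall()

conn.close()

print(strongest)  # ('King Rat', 8)
print(results)
# [('Slenderman', 'Behind you', 1), ('King Rat', 'London Underground', 8)]
```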