Space and Time Complexity
Space complexity refers to the amount of memory an algorithm uses to complete its execution, expressed as a function of the input size. It can be affected by factors such as the size of the input data, the data structures used, the number and size of temporary variables, and the recursion depth. Time complexity refers to the amount of time an algorithm takes to run as the input size grows. Both are usually expressed in "Big O" notation, which describes an upper bound on an algorithm's growth rate.
Why do you think a programmer should care about space and time complexity?
- A programmer should care about space and time complexity because these factors play a crucial role in determining the efficiency of a program. Time complexity refers to the amount of time it takes for a program to run, given a particular input size. This is important because programs that take too long to run can be a major performance bottleneck and lead to poor user experiences. Moreover, as the size of the input grows, programs with poor time complexity can become completely impractical or infeasible to run. Space complexity refers to the amount of memory a program uses to store data and execute code. Inefficient use of memory can lead to issues like memory leaks, slow performance, and even crashes. This is especially important in the context of embedded systems or other devices with limited memory resources.
Take a look at our Lassen Volcano example from the data compression tech talk. The first code block displays the original image. In the second code block, change baseWidth to rescale the image.
from IPython.display import Image, display
from pathlib import Path

# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

def image_display(images):
    for image in images:
        display(Image(filename=image['filename']))

if __name__ == "__main__":
    lassen_volcano = image_data(images=[{'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}])
    image_display(lassen_volcano)
from IPython.display import HTML, display
from pathlib import Path
from PIL import Image as pilImage
from io import BytesIO
import base64

# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images
def scale_image(img):
    #baseWidth = 625
    #baseWidth = 1250
    #baseWidth = 2500
    baseWidth = 20000  # see the effect of doubling or halving the baseWidth
    #baseWidth = 10000
    #baseWidth = 20000
    #baseWidth = 40000
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)
def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()
def image_management(image):  # collects metadata and builds a scaled view of the image
    # Image open returns a PIL image object
    img = pilImage.open(image['filename'])

    # Python Image Library operations
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    image['width'], image['height'] = img.size
    image['pixels'] = image['width'] * image['height']

    # Scale the Image
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    image['scaled_width'], image['scaled_height'] = img.size
    image['scaled_pixels'] = image['scaled_width'] * image['scaled_height']

    # Scaled HTML
    image['html'] = '<img src="data:image/png;base64,%s">' % image_to_base64(image['pil'], image['format'])
if __name__ == "__main__":
    # prepare the image data for the Lassen Volcano example
    images = image_data(images=[{'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}])

    # Display meta data and the scaled view for each image
    for image in images:
        image_management(image)
        print("---- meta data -----")
        print(image['label'])
        print(image['source'])
        print(image['format'])
        print(image['mode'])
        print("Original size: ", image['size'], " pixels: ", f"{image['pixels']:,}")
        print("Scaled size: ", image['scaled_size'], " pixels: ", f"{image['scaled_pixels']:,}")

        print("-- scaled image --")
        display(HTML(image['html']))
numbers = list(range(1000))
print(numbers)
print(numbers[263])
ncaa_bb_ranks = {1:"Alabama",2:"Houston", 3:"Purdue", 4:"Kansas"}
#look up a value in a dictionary given a key
print(ncaa_bb_ranks[1])
Space
This function takes two number inputs and returns their sum. The function does not create any additional data structures or variables that are dependent on the input size, so its space complexity is constant, or O(1). Regardless of how large the input numbers are, the function will always require the same amount of memory to execute.
def sum(a, b):
    return a + b

print(sum(90, 88))
print(sum(.9, .88))
Time
An example of a linear time algorithm is traversing a list or an array. When the size of the list or array increases, the time taken to traverse it also increases linearly with the size. Hence, the time complexity of this operation is O(n), where n is the size of the list or array being traversed.
for i in numbers:
    print(i)
Space
This function takes a list of elements arr as input and returns a new list with the elements in reverse order. The function creates a new list reversed_arr of the same size as arr to store the reversed elements. The size of reversed_arr depends on the size of the input arr, so the space complexity of this function is O(n). As the input size increases, the amount of memory required to execute the function also increases linearly.
def reverse_list(arr):
    n = len(arr)
    reversed_arr = [None] * n  # create a list of None values with the same length as arr
    for i in range(n):
        reversed_arr[n-i-1] = arr[i]  # copy arr[i] into the mirrored position of reversed_arr
    return reversed_arr

print(numbers)
print(reverse_list(numbers))
Time
An example of a quadratic time algorithm is nested loops. When there are two nested loops that both iterate over the same collection, the time taken to complete the algorithm grows quadratically with the size of the collection. Hence, the time complexity of this operation is O(n^2), where n is the size of the collection being iterated over.
for i in numbers:
    for j in numbers:
        print(i, j)
Space
This function takes two matrices matrix1 and matrix2 as input and returns their product as a new matrix. The function creates a new matrix result with dimensions m by n to store the product of the input matrices. The size of result depends on the dimensions of the input matrices, so the space complexity of this function is O(m*n), which is O(n^2) for square matrices. As the size of the input matrices increases, the amount of memory required to execute the function increases quadratically.
def multiply_matrices(matrix1, matrix2):
    m = len(matrix1)
    n = len(matrix2[0])
    result = [[0] * n for _ in range(m)]  # build m independent rows of n zeros (a plain [[0]*n]*m would alias the same row)
    for i in range(m):
        for j in range(n):
            for k in range(len(matrix2)):
                result[i][j] += matrix1[i][k] * matrix2[k][j]
    return result

print(multiply_matrices([[1,2],[3,4]], [[3,4],[1,2]]))
Time
An example of a log time algorithm is binary search. Binary search is an algorithm that searches for a specific element in a sorted list by repeatedly dividing the search interval in half. As a result, the time taken to complete the search grows logarithmically with the size of the list. Hence, the time complexity of this operation is O(log n), where n is the size of the list being searched.
def binary_search(arr, low, high, target):
    while low <= high:
        mid = (low + high) // 2  # integer division
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1  # target not found in arr

target = 263
result = binary_search(numbers, 0, len(numbers) - 1, target)
print(result)
Space
A recursive implementation of the same algorithm has O(log n) space complexity. The function takes an array arr, its lower and upper bounds low and high, and a target value target, and searches for target by repeatedly dividing the search space in half until the target is found or the search space is empty. The iterative version above does not create any new data structures that depend on the size of arr, so it needs only O(1) extra space. A recursive version, however, uses the call stack to keep track of its recursive calls; since the maximum depth of those calls is O(log n), where n is the size of arr, its space complexity is O(log n). As the size of arr increases, the amount of memory the recursive version requires grows logarithmically.
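For reference, here is a minimal recursive sketch of the same search (the name binary_search_recursive is used only for this illustration and is not part of the code above). It shows where the O(log n) stack space comes from: each recursive call adds a frame to the call stack, and the recursion depth is at most log n.

def binary_search_recursive(arr, low, high, target):
    # each call adds one stack frame; the recursion depth is at most O(log n)
    if low > high:
        return -1  # target not found
    mid = (low + high) // 2
    if arr[mid] == target:
        return mid
    elif arr[mid] < target:
        return binary_search_recursive(arr, mid + 1, high, target)
    else:
        return binary_search_recursive(arr, low, mid - 1, target)

print(binary_search_recursive(numbers, 0, len(numbers) - 1, 263))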
Time
An example of an O(2^n) algorithm is the recursive implementation of the Fibonacci sequence. The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones, starting from 0 and 1. The recursive implementation of the Fibonacci sequence calculates each number by recursively calling itself with the two preceding numbers until it reaches the base case (i.e., the first or second number in the sequence). The algorithm takes O(2^n) time in the worst case because it has to calculate each number in the sequence by making two recursive calls.
def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)

#print(fibonacci(5))
#print(fibonacci(10))
#print(fibonacci(20))
#print(fibonacci(30))
print(fibonacci(40))
Space
This function takes a set s as input and generates all possible subsets of s. The function does this by recursively generating the subsets of the set without the first element, and then adding the first element to each of those subsets to generate the subsets that include the first element. The function creates a new list for each recursive call that stores the subsets, and each element in the list is a new list that represents a subset. The number of subsets that can be generated from a set of size n is 2^n, so the space complexity of this function is O(2^n). As the size of the input set increases, the amount of memory required to execute the function grows exponentially.
def generate_subsets(s):
    if not s:
        return [[]]
    subsets = generate_subsets(s[1:])
    return [[s[0]] + subset for subset in subsets] + subsets

print(generate_subsets([1,2,3]))
#print(generate_subsets(numbers))
Using the time library, we are able to see the difference in time it takes to calculate the fibonacci function above.
- Based on what is known about the other time complexities, hypothesize the resulting elapsed time if the function is replaced.
import time
start_time = time.time()
print(fibonacci(34))
end_time = time.time()
total_time = end_time - start_time
print("Time taken:", total_time, "seconds")
start_time = time.time()
print(fibonacci(37))
end_time = time.time()
total_time = end_time - start_time
print("Time taken:", total_time, "seconds")
Hacks
- Record your findings when testing the time elapsed of the different algorithms.
Although we will go more in depth later, time complexity is a key concept that relates to the different sorting algorithms. Do some basic research on the different types of sorting algorithms and their time complexity.
Shell Sort: Shell Sort is a variant of the Insertion Sort algorithm that improves performance by comparing distant elements and gradually decreasing the gap between them. Its time complexity is O(n log n) in the best case and O(n^2) in the worst case.
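A minimal sketch of Shell Sort, assuming the original gap sequence of n/2, n/4, ..., 1 (the function name shell_sort is used only for illustration):

def shell_sort(arr):
    gap = len(arr) // 2
    while gap > 0:
        # perform a gapped insertion sort for this gap size
        for i in range(gap, len(arr)):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        gap //= 2
    return arr

print(shell_sort([5, 2, 9, 1, 7, 3]))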
Counting Sort: This algorithm operates by calculating, for every individual input element, the quantity of elements that have a smaller value than it. Using this data, the algorithm arranges each element in its appropriate position within the output list. Counting Sort exhibits a time complexity of O(n+k), wherein k refers to the range of potential input values.
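A minimal sketch of Counting Sort, assuming non-negative integer input (the name counting_sort is used only for illustration); the counts array is what gives the O(n+k) behavior:

def counting_sort(arr):
    k = max(arr) + 1               # k is the range of possible input values
    counts = [0] * k
    for value in arr:
        counts[value] += 1
    for v in range(1, k):          # prefix sums: counts[v] = number of elements <= v
        counts[v] += counts[v - 1]
    output = [0] * len(arr)
    for value in reversed(arr):    # walking backwards keeps the sort stable
        counts[value] -= 1
        output[counts[value]] = value
    return output

print(counting_sort([4, 2, 2, 8, 3, 3, 1]))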
Bucket Sort: Bucket Sort is a distribution sort related to Counting Sort that splits the input into multiple buckets, sorts each bucket individually (often with another sorting algorithm such as Insertion Sort), and finally concatenates the sorted buckets. If the input is evenly distributed across the range of values, Bucket Sort can achieve an average time complexity of O(n).
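A minimal sketch of Bucket Sort, assuming the input values are floats spread evenly over [0, 1) (the name bucket_sort and the choice of ten buckets are only for illustration):

def bucket_sort(arr, num_buckets=10):
    buckets = [[] for _ in range(num_buckets)]
    for value in arr:
        buckets[int(value * num_buckets)].append(value)  # scatter values into buckets
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # sort each bucket, then concatenate
    return result

print(bucket_sort([0.42, 0.32, 0.77, 0.12, 0.59, 0.90]))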
Why is time and space complexity important when choosing an algorithm?
- As a programmer, it is crucial to pay attention to the space and time complexity of code, as these factors can significantly impact its performance and scalability. When working on large-scale projects, the ideal code is one that uses a minimal amount of memory and takes the least amount of time to execute, so optimizing for these two factors should be a priority for any programmer.
Should you always use a constant time algorithm / Should you never use an exponential time algorithm? Explain?
- You should not always insist on a constant time algorithm, nor must you always avoid an exponential time algorithm; it depends on the problem at hand, the size of the input data, and the resources available to you. When a constant time algorithm exists for a problem, it is usually the best choice, but many problems simply cannot be solved in constant time, so you may have to accept an algorithm with higher time complexity. For small inputs, even an exponential time algorithm can finish quickly enough to be acceptable.
Exponential time algorithms should be avoided whenever the input size is relatively large, as they can take an impractically long time to execute. For example, if you are building a web application that needs to process large amounts of data in real-time, using an exponential time algorithm to perform this task would not be a suitable choice, as it would cause unacceptable delays and negatively affect user experience. In such cases, it would be better to use an algorithm with a lower time complexity, such as a polynomial time algorithm, that can handle larger input sizes efficiently.
What are some general patterns that you noticed to determine each algorithm's time and space complexity?
- When an algorithm has nested loops, its time complexity is typically O(n^2), O(n^3), or another polynomial time complexity, where n denotes the size of the input data. Recursive algorithms have their time complexity expressed using a recurrence relation and are often related to the number of recursive calls and the size of the processed data. Sorting and searching algorithms' time complexity is usually expressed by the number of elements being sorted or searched. For instance, quicksort and merge sort have a time complexity of O(n log n), whereas linear search has a time complexity of O(n). If an algorithm employs data structures such as arrays, lists, or trees, the space complexity is typically proportional to the size of the stored data. For example, an algorithm that creates an array of size n has a space complexity of O(n).
Complete the Time and Space Complexity analysis questions linked below (Practice).
- Answer is option 3
- In this case, the first loop has a time complexity of O(N), while the second loop has a time complexity of O(M). Since N and M are independent variables, neither term dominates the other, so the overall time complexity is expressed as O(N+M). The space complexity of an algorithm refers to the amount of memory it uses to solve a problem. For this problem, the space complexity is constant, or O(1), because the memory used does not depend on the size of the input: the algorithm stores the sum of N random numbers in a single variable, a, so that one variable is the only space being used.
- Answer is option 4
- In this case, the algorithm contains two nested loops that both iterate over the same collection. This means that as the size of the collection increases, the time taken for the algorithm to complete grows quadratically; that is, the time complexity increases at a rate proportional to the square of the input size.
- Answer is option 2
- The given algorithm has two for statements, the first iterating N times and the second iterating log N times. The variable j represents the step size, which increases with each iteration of the second for statement until 2^j equals N; at that point j equals log base 2 of N and the inner loop stops. Based on this, the time complexity of the algorithm is N multiplied by log N, or O(N log N).
- Answer is option 2
- The worst-case time complexity of an algorithm is the maximum time it takes to run over all inputs of a given size. When we say that one algorithm is more efficient than another, we mean that the first algorithm has a smaller worst-case time complexity than the second. This means that as the input size gets larger and larger, the first algorithm will eventually become faster than the second for big enough inputs.
- Answer is option 4
- In simpler terms, by examining how the code functions, I can see that every time it loops, "i" is divided in half, added, and assigned to the variable "a". As "i" gets halved repeatedly, it eventually gets close to 0, which means that "N" decreases rapidly at first but gradually slows down, following a logarithmic pattern.
- Answer is option 3
- Programmers need to think about both time and memory when deciding how efficient an algorithm is. They must make sure that the algorithm doesn't use too much memory, while also ensuring that the program runs fast enough.
- Answer is option 2
- To determine the time complexity of an algorithm, we first identify the operations that the algorithm performs, and then we count the number of times each operation is executed as a function of the input size n.
- Answer is option 3
- Answer is option 3
- The first for statement runs n times, and the second runs (n-1) times because "i" takes on values from 0 to n-1. Thus the overall time complexity is n*(n-1), which is O(n^2).
- Answer is FALSE
- Having a better worst-case running time means that as the input size grows toward infinity, algorithm B will eventually become faster than A for large enough inputs. For smaller inputs, however, algorithm A may still be faster, so B does not always run faster than A.