Convolutions and Backpropagations

We all know that the forward pass of a Convolutional layer uses Convolutions. But the backward pass during Backpropagation also uses Convolutions!

So, let us dig in and start with understanding the intuition behind Backpropagation. (And for this, we are going to rely on Andrej Karpathy’s amazing CS231n lecture — https://www.youtube.com/watch?v=i94OvYb6noo).

Understanding the Chain Rule in Backpropagation:

Consider this equation

f(x,y,z) = (x + y)z

To make it simpler, let us split it into two equations: q = x + y and f = q·z.

The computational graph now has two gates: an add gate that outputs q, and a multiply gate that outputs f. Their local gradients are easy to compute directly:

∂f/∂q = z, ∂f/∂z = q, ∂q/∂x = 1, ∂q/∂y = 1

But how do we find ∂f/∂x and ∂f/∂y? This is where the Chain rule of Differentiation comes in:

∂f/∂x = (∂f/∂q) · (∂q/∂x) = z · 1 = z
∂f/∂y = (∂f/∂q) · (∂q/∂y) = z · 1 = z

So in the backward pass of the computational graph, every gate takes the gradient flowing in from above, multiplies it by its local gradients, and passes the results down to its inputs.
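Here is a minimal sketch of this backward pass in plain Python, using the example values from the CS231n lecture (x = -2, y = 5, z = -4); the variable names are my own, just for illustration:

```python
x, y, z = -2.0, 5.0, -4.0

# Forward pass
q = x + y            # q = 3
f = q * z            # f = -12

# Backward pass: start from df/df = 1 and apply the chain rule
df_dz = q            # local gradient of the multiply gate w.r.t. z
df_dq = z            # local gradient of the multiply gate w.r.t. q
df_dx = df_dq * 1.0  # dq/dx = 1, so the add gate just passes the gradient on
df_dy = df_dq * 1.0  # dq/dy = 1

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```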

Chain Rule in a Convolutional Layer

Now that we have worked through a simple computational graph, we can imagine a CNN as one massive computational graph. Let us say we have a gate f in that graph which takes inputs x and y and outputs z.

The local gradients ∂z/∂x and ∂z/∂y can be computed directly from the function f. During Backpropagation, the gate also receives the gradient of the final loss L with respect to its own output, ∂L/∂z, and this needs to be propagated back to the other layers. The chain rule gives us the loss gradients for x and y:

∂L/∂x = (∂L/∂z) · (∂z/∂x)
∂L/∂y = (∂L/∂z) · (∂z/∂y)

Now, take a simple Convolutional layer: say, a 3×3 Input X convolved with a 2×2 Filter F, giving a 2×2 Output O.

O = Conv(X, F)

Writing the convolution operation out gives us the values of the Output O:

O₁₁ = X₁₁F₁₁ + X₁₂F₁₂ + X₂₁F₂₁ + X₂₂F₂₂
O₁₂ = X₁₂F₁₁ + X₁₃F₁₂ + X₂₂F₂₁ + X₂₃F₂₂
O₂₁ = X₂₁F₁₁ + X₂₂F₁₂ + X₃₁F₂₁ + X₃₂F₂₂
O₂₂ = X₂₂F₁₁ + X₂₃F₁₂ + X₃₂F₂₁ + X₃₃F₂₂    (Equation A)

During the backward pass, this convolution function receives the loss gradient ∂L/∂O from the next layer.
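As a quick sanity check, here is a minimal NumPy sketch of this forward pass; the values of X and F are made up for illustration:

```python
import numpy as np

# 3x3 input X and 2x2 filter F (values made up for illustration)
X = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
F = np.array([[1., 0.],
              [0., -1.]])

# Stride-1, no-padding convolution: each O[i, j] is the sum of the
# elementwise product of F with the 2x2 patch of X under it (Equation A)
O = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        O[i, j] = np.sum(X[i:i+2, j:j+2] * F)

print(O)  # [[-4. -4.]
          #  [-4. -4.]]
```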

Well, but why do we need to find ∂L/∂X and ∂L/∂F?

We need ∂L/∂F to update the weights of the Filter F with gradient descent, and we need ∂L/∂X because the Input X of this layer is the Output of the previous layer, so ∂L/∂X is exactly the loss gradient that gets passed back to that layer.

So let’s find the gradients for X and F — ∂L/∂X and ∂L/∂F

Finding ∂L/∂F

This has two steps, just as before:

  • Find the local gradient ∂O/∂F
  • Find ∂L/∂F using chain rule
To find the partial derivative of the Loss (a scalar) with respect to the Filter matrix F, the chain rule sums over every output element that F touches:

∂L/∂F₁₁ = (∂L/∂O₁₁)(∂O₁₁/∂F₁₁) + (∂L/∂O₁₂)(∂O₁₂/∂F₁₁) + (∂L/∂O₂₁)(∂O₂₁/∂F₁₁) + (∂L/∂O₂₂)(∂O₂₂/∂F₁₁)

and similarly for the other elements of F. From Equation A, the local gradients are just the corresponding input values: ∂O₁₁/∂F₁₁ = X₁₁, ∂O₁₂/∂F₁₁ = X₁₂, ∂O₂₁/∂F₁₁ = X₂₁, ∂O₂₂/∂F₁₁ = X₂₂, and so on. Substituting these values in:

∂L/∂F₁₁ = (∂L/∂O₁₁)X₁₁ + (∂L/∂O₁₂)X₁₂ + (∂L/∂O₂₁)X₂₁ + (∂L/∂O₂₂)X₂₂
∂L/∂F₁₂ = (∂L/∂O₁₁)X₁₂ + (∂L/∂O₁₂)X₁₃ + (∂L/∂O₂₁)X₂₂ + (∂L/∂O₂₂)X₂₃
∂L/∂F₂₁ = (∂L/∂O₁₁)X₂₁ + (∂L/∂O₁₂)X₂₂ + (∂L/∂O₂₁)X₃₁ + (∂L/∂O₂₂)X₃₂
∂L/∂F₂₂ = (∂L/∂O₁₁)X₂₂ + (∂L/∂O₁₂)X₂₃ + (∂L/∂O₂₁)X₃₂ + (∂L/∂O₂₂)X₃₃

Look closely at the pattern: this is exactly the convolution of the input matrix X with the loss gradient ∂L/∂O.

∂L/∂F = Conv(X, ∂L/∂O)

∂L/∂F is nothing but the convolution between the Input X and the loss gradient from the next layer, ∂L/∂O.
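A minimal sketch of this result, assuming a made-up loss gradient ∂L/∂O arriving from the next layer:

```python
import numpy as np

X = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
# Made-up loss gradient from the next layer, same shape as O
dL_dO = np.array([[0.1, -0.2],
                  [0.3, 0.4]])

# dL/dF = convolution of X with dL/dO: slide the 2x2 gradient over X
dL_dF = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        dL_dF[i, j] = np.sum(X[i:i+2, j:j+2] * dL_dO)

print(dL_dF)
```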

Finding ∂L/∂X:

Step 1: Finding the local gradient — ∂O/∂X:

From Equation A, each element of X contributes to the output elements whose filter window covers it, and the local gradients are the corresponding filter values. For example, X₁₁ appears only in O₁₁ (so ∂O₁₁/∂X₁₁ = F₁₁), while X₂₂ appears in all four output elements.

Step 2: Using these local gradients in the chain rule gives the derivatives of L with respect to X:

∂L/∂X₁₁ = (∂L/∂O₁₁)F₁₁
∂L/∂X₁₂ = (∂L/∂O₁₁)F₁₂ + (∂L/∂O₁₂)F₁₁
∂L/∂X₁₃ = (∂L/∂O₁₂)F₁₂
∂L/∂X₂₁ = (∂L/∂O₁₁)F₂₁ + (∂L/∂O₂₁)F₁₁
∂L/∂X₂₂ = (∂L/∂O₁₁)F₂₂ + (∂L/∂O₁₂)F₂₁ + (∂L/∂O₂₁)F₁₂ + (∂L/∂O₂₂)F₁₁
∂L/∂X₂₃ = (∂L/∂O₁₂)F₂₂ + (∂L/∂O₂₂)F₁₂
∂L/∂X₃₁ = (∂L/∂O₂₁)F₂₁
∂L/∂X₃₂ = (∂L/∂O₂₁)F₂₂ + (∂L/∂O₂₂)F₂₁
∂L/∂X₃₃ = (∂L/∂O₂₂)F₂₂

Now flip Filter F by 180 degrees, i.e. flip it both vertically and horizontally. The equations above are exactly a 'full' convolution between this flipped Filter and the loss gradient ∂L/∂O: a convolution where the flipped filter also slides over the border positions where it only partially overlaps ∂L/∂O (equivalently, where ∂L/∂O is zero-padded).

∂L/∂X = FullConv(180°-rotated F, ∂L/∂O)
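A minimal sketch of this full convolution, reusing the made-up F and ∂L/∂O from above:

```python
import numpy as np

F = np.array([[1., 0.],
              [0., -1.]])
dL_dO = np.array([[0.1, -0.2],
                  [0.3, 0.4]])

F_rot = np.rot90(F, 2)                      # flip vertically and horizontally
padded = np.pad(dL_dO, 1, mode='constant')  # zero-pad for the 'full' convolution

# Slide the flipped filter over the padded gradient: 'full' convolution
dL_dX = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        dL_dX[i, j] = np.sum(padded[i:i+2, j:j+2] * F_rot)

print(dL_dX)
# Matches scipy.signal.convolve2d(dL_dO, F, mode='full')
```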

Both the Forward pass and the Backpropagation of a Convolutional layer are Convolutions

Summing it up:

∂L/∂F = Convolution of the Input X with the loss gradient ∂L/∂O
∂L/∂X = Full Convolution of the 180°-flipped Filter F with the loss gradient ∂L/∂O
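To tie it all together, here is a small self-contained sketch that checks both formulas numerically against a finite-difference estimate, using a made-up loss L = 0.5·Σ O² (so that ∂L/∂O = O):

```python
import numpy as np

def conv2d(a, k):
    """Valid cross-correlation, i.e. the 'convolution' used in CNNs."""
    h, w = a.shape[0] - k.shape[0] + 1, a.shape[1] - k.shape[1] + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(a[i:i+k.shape[0], j:j+k.shape[1]] * k)
    return out

X = np.arange(9, dtype=float).reshape(3, 3)   # made-up input
F = np.array([[1., 0.], [0., -1.]])           # made-up filter

O = conv2d(X, F)
dL_dO = O                                     # because L = 0.5 * sum(O**2)

dL_dF = conv2d(X, dL_dO)                                           # formula 1
dL_dX = conv2d(np.pad(dL_dO, 1, mode='constant'), np.rot90(F, 2))  # formula 2

# Finite-difference check of dL/dF
eps = 1e-6
fd = np.zeros_like(F)
for i in range(2):
    for j in range(2):
        Fp = F.copy()
        Fp[i, j] += eps
        fd[i, j] = (0.5 * np.sum(conv2d(X, Fp)**2) - 0.5 * np.sum(O**2)) / eps

print(np.allclose(dL_dF, fd, atol=1e-4))      # True
```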
