
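Environment

import numpy as np
from matplotlib_inline import backend_inline  # inline figure backend for Jupyter notebooks
from d2l import torch as d2l  # Dive into Deep Learning utilities (PyTorch version)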

Partial derivatives
Given a function \( y = f(x_1, x_2, \ldots, x_n) \) of \( n \) variables, the partial derivative of \( y \) with respect to its \( i^\mathrm{th} \) variable \( x_i \) is defined as:
 \( \frac{\partial y}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \ldots, x_{i-1}, x_i + h, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h} \)
In other words, treat \( x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n \) as constants and compute the ordinary derivative of \( y \) with respect to \( x_i \).

For \( y = f(x_1, x_2, \ldots, x_n) \), the following notations for the partial derivative are equivalent:
\( \frac{\partial y}{\partial x_i} = \frac{\partial f}{\partial x_i} = f_{x_i} = f_i = D_i f = D_{x_i} f \)
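The definition can be checked numerically by replacing the limit with a small but finite \( h \). The sketch below is a minimal illustration in plain Python. The function `f` and the helper `partial_derivative` are hypothetical names chosen for this example. The helper uses the same forward difference as the limit definition: it shifts only \( x_i \) and holds every other input constant.

```python
def f(x1, x2):
    # Illustrative function (an assumption for this example): f(x1, x2) = x1^2 * x2 + 3 * x2
    return x1 ** 2 * x2 + 3 * x2

def partial_derivative(func, xs, i, h=1e-6):
    """Forward-difference estimate of d func / d xs[i], mirroring the limit definition.

    Only coordinate i is shifted by h; all other inputs are held constant.
    """
    shifted = list(xs)
    shifted[i] += h
    return (func(*shifted) - func(*xs)) / h

point = (2.0, 5.0)
print(partial_derivative(f, point, 0))  # close to 2 * x1 * x2 = 20
print(partial_derivative(f, point, 1))  # close to x1^2 + 3 = 7
```

Because \( h \) is finite, the estimate for \( x_1 \) is off by roughly \( h \cdot x_2 \). This is far below the printed precision that matters here.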

Gradient
Let \( f: \mathbb{R}^n \to \mathbb{R} \) be a function whose input is an \( n \)-dimensional vector \( \mathbf{x} = [x_1, x_2, \ldots, x_n]^\top \) and whose output is a scalar.
The gradient of \( f \) with respect to \( \mathbf{x} \) is the vector of its \( n \) partial derivatives:
 \( \nabla_\mathbf{x} f(\mathbf{x}) = \left[ \frac{\partial f(\mathbf{x})}{\partial x_1}, \frac{\partial f(\mathbf{x})}{\partial x_2}, \ldots, \frac{\partial f(\mathbf{x})}{\partial x_n} \right]^\top \)
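A small worked example makes the definition concrete. The function below is chosen purely for illustration. Its gradient stacks the two partial derivatives into a column vector:

```latex
f(\mathbf{x}) = x_1^2 x_2 + 3 x_2, \qquad \mathbf{x} = [x_1, x_2]^\top \\
\frac{\partial f(\mathbf{x})}{\partial x_1} = 2 x_1 x_2, \qquad \frac{\partial f(\mathbf{x})}{\partial x_2} = x_1^2 + 3 \\
\nabla_\mathbf{x} f(\mathbf{x}) = \left[ 2 x_1 x_2,\; x_1^2 + 3 \right]^\top, \qquad \nabla_\mathbf{x} f\left([2, 5]^\top\right) = [20, 7]^\top
```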

If \(\mathbf{x}\) is an \( n \)-dimensional vector, then:

For all \( \mathbf{A} \in \mathbb{R}^{m \times n} \), \( \nabla_\mathbf{x} \mathbf{A}\mathbf{x} = \mathbf{A}^\top \)
For all \( \mathbf{A} \in \mathbb{R}^{n \times m} \), \( \nabla_\mathbf{x} \mathbf{x}^\top\mathbf{A} = \mathbf{A} \)
For all \( \mathbf{A} \in \mathbb{R}^{n \times n} \), \( \nabla_\mathbf{x} \mathbf{x}^\top\mathbf{A}\mathbf{x} = (\mathbf{A} + \mathbf{A}^\top)\mathbf{x} \)
\( \nabla_\mathbf{x} \|\mathbf{x}\|^2 = \nabla_\mathbf{x} \mathbf{x}^\top\mathbf{x} = 2\mathbf{x} \)
For any matrix \( \mathbf{X} \), \( \nabla_\mathbf{X} \|\mathbf{X}\|^{2}_F = 2\mathbf{X} \), where \( \|\cdot\|_F \) is the Frobenius norm
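The identities above can be spot-checked numerically. The sketch below assumes NumPy. It uses a hypothetical helper `numerical_grad`, which is a central-difference approximation and not a NumPy or d2l function. The script compares finite-difference gradients against each closed form on random inputs.

For a vector-valued function such as \( \mathbf{A}\mathbf{x} \), the helper places one row per input coordinate. Under that layout convention the gradient of \( \mathbf{A}\mathbf{x} \) equals \( \mathbf{A}^\top \).

```python
import numpy as np

def numerical_grad(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at x (illustrative helper).

    If f returns a scalar, the result has the same shape as x.
    If f returns a vector of length m and x has length n, the result has
    shape (n, m): row i holds the partial derivatives of every output
    with respect to x[i].
    """
    x = np.asarray(x, dtype=float)
    rows = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h  # perturb one coordinate, hold the rest constant
        rows.append((np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * h))
    return np.array(rows).reshape(x.shape + np.shape(f(x)))

rng = np.random.default_rng(0)
n, m = 4, 3
x = rng.normal(size=n)
A_mn = rng.normal(size=(m, n))  # used in Ax
A_nm = rng.normal(size=(n, m))  # used in x^T A
A_nn = rng.normal(size=(n, n))  # used in x^T A x
X = rng.normal(size=(2, 5))     # a matrix for the Frobenius-norm rule

def close(a, b):
    # allow for floating-point rounding in the finite differences
    return np.allclose(a, b, atol=1e-6)

checks = {
    "grad Ax = A^T": close(numerical_grad(lambda v: A_mn @ v, x), A_mn.T),
    "grad x^T A = A": close(numerical_grad(lambda v: v @ A_nm, x), A_nm),
    "grad x^T A x = (A + A^T) x": close(
        numerical_grad(lambda v: v @ A_nn @ v, x), (A_nn + A_nn.T) @ x),
    "grad ||x||^2 = 2x": close(numerical_grad(lambda v: v @ v, x), 2 * x),
    "grad ||X||_F^2 = 2X": close(numerical_grad(lambda M: np.sum(M ** 2), X), 2 * X),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")  # every line should report True
```

Central differences are used here because their error shrinks like \( h^2 \) rather than \( h \). For the linear and quadratic functions in these rules, they are exact apart from floating-point rounding.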

Chain rule
Suppose that the functions \( y = f(u) \) and \( u = g(x) \) are both differentiable. Then
 \( \frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx} \)
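A minimal numeric check of the single-variable chain rule follows. It assumes the illustrative choices \( u = g(x) = x^2 \) and \( y = f(u) = \sin u \), so that \( \frac{dy}{dx} = \cos(x^2) \cdot 2x \). The helper names are hypothetical. The chain-rule product is compared against a central-difference estimate of the composed function \( f(g(x)) \).

```python
import math

def g(x):
    # inner function u = g(x) = x^2 (illustrative choice)
    return x ** 2

def f(u):
    # outer function y = f(u) = sin(u) (illustrative choice)
    return math.sin(u)

def chain_rule_derivative(x):
    # dy/dx = dy/du * du/dx, with dy/du evaluated at u = g(x)
    u = g(x)
    dy_du = math.cos(u)
    du_dx = 2 * x
    return dy_du * du_dx

def finite_difference(x, h=1e-6):
    # central-difference estimate of d f(g(x)) / dx, for comparison
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)

x = 1.3
print(chain_rule_derivative(x), finite_difference(x))  # the two values agree closely
```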
More generally, suppose \( y = f(u_1, u_2, \ldots, u_m) \) is differentiable in \( u_1, \ldots, u_m \). Suppose also that each \( u_j = g_j(x_1, x_2, \ldots, x_n) \), for \( j = 1, \ldots, m \), is differentiable in \( x_1, \ldots, x_n \). Then, for \( i = 1, 2, \ldots, n \):
 \( \frac{\partial y}{\partial x_i} = \frac{\partial y}{\partial u_1} \frac{\partial u_1}{\partial x_i} + \frac{\partial y}{\partial u_2} \frac{\partial u_2}{\partial x_i} + \ldots + \frac{\partial y}{\partial u_m} \frac{\partial u_m}{\partial x_i} \)
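The multivariate rule sums one product per intermediate variable \( u_j \). The sketch below uses plain Python with \( m = 2 \) and \( n = 3 \). The illustrative choices are \( u_1 = x_1 + 2x_2 \), \( u_2 = x_2 x_3 \), and \( y = u_1 u_2 + u_2^2 \). All function names are hypothetical. The code evaluates the sum term by term and compares it with finite differences of the composed function.

```python
def inner(x):
    # u_j = g_j(x_1, x_2, x_3), illustrative choices:
    # u_1 = x_1 + 2 x_2,  u_2 = x_2 x_3
    x1, x2, x3 = x
    return [x1 + 2 * x2, x2 * x3]

def outer(u):
    # y = f(u_1, u_2) = u_1 u_2 + u_2^2 (illustrative)
    u1, u2 = u
    return u1 * u2 + u2 ** 2

def dy_du(u):
    # partial derivatives of the outer function: [dy/du_1, dy/du_2]
    u1, u2 = u
    return [u2, u1 + 2 * u2]

def du_dx(x):
    # du_dx(x)[j][i] = partial u_j / partial x_i
    _, x2, x3 = x
    return [[1.0, 2.0, 0.0],
            [0.0, x3, x2]]

def chain_rule_grad(x):
    # dy/dx_i = sum over j of dy/du_j * du_j/dx_i, for i = 1..n
    u = inner(x)
    dyu = dy_du(u)
    dux = du_dx(x)
    m, n = len(u), len(x)
    return [sum(dyu[j] * dux[j][i] for j in range(m)) for i in range(n)]

def numeric_grad(x, h=1e-6):
    # central differences on the composed function y = f(g(x)), for comparison
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((outer(inner(xp)) - outer(inner(xm))) / (2 * h))
    return grad

x = [0.5, -1.0, 2.0]
print(chain_rule_grad(x))  # [-2.0, -15.0, 5.5]
print(numeric_grad(x))     # approximately the same values
```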