torch.nn.init#
Warning
All the functions in this module are intended to be used to initialize neural network parameters, so they all run in torch.no_grad() mode and will not be taken into account by autograd.
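For example, a typical pattern is to overwrite a freshly constructed module's parameters in place; a minimal sketch (the layer sizes and the choice of init functions are illustrative, not prescribed):
>>> import torch
>>> import torch.nn as nn
>>> layer = nn.Linear(128, 64)  # hypothetical layer, for illustration only
>>> nn.init.xavier_uniform_(layer.weight)  # runs under torch.no_grad()
>>> nn.init.zeros_(layer.bias)
>>> layer.weight.requires_grad  # only the values changed; the autograd flag is untouched
True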
- torch.nn.init.calculate_gain(nonlinearity, param=None)[source]#
Return the recommended gain value for the given nonlinearity function.
The values are as follows:
nonlinearity          gain
Linear / Identity     1
Conv{1,2,3}D          1
Sigmoid               1
Tanh                  5/3
ReLU                  sqrt(2)
Leaky ReLU            sqrt(2 / (1 + negative_slope^2))
SELU                  3/4
Warning
In order to implement Self-Normalizing Neural Networks, you should use nonlinearity='linear' instead of nonlinearity='selu'. This gives the initial weights a variance of 1 / N, which is necessary to induce a stable fixed point in the forward pass. In contrast, the default gain for SELU sacrifices the normalization effect for more stable gradient flow in rectangular layers.
- Parameters
nonlinearity (Literal['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d', 'sigmoid', 'tanh', 'relu', 'leaky_relu', 'selu']) – the non-linear function (nn.functional name)
param (Optional[Union[int, float]]) – optional parameter for the non-linear function
- Return type
float
Examples
>>> gain = nn.init.calculate_gain(
...     "leaky_relu", 0.2
... )  # leaky_relu with negative_slope=0.2
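The gain values in the table above can be reproduced numerically; a minimal sketch (purely illustrative):
>>> import math
>>> math.isclose(nn.init.calculate_gain("relu"), math.sqrt(2.0))  # ReLU: sqrt(2)
True
>>> slope = 0.2
>>> expected = math.sqrt(2.0 / (1.0 + slope**2))  # Leaky ReLU formula from the table
>>> math.isclose(nn.init.calculate_gain("leaky_relu", slope), expected)
True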
- torch.nn.init.uniform_(tensor, a=0.0, b=1.0, generator=None)[source]#
Fill the input Tensor with values drawn from the uniform distribution U(a, b).
- Parameters
tensor (Tensor) – an n-dimensional torch.Tensor
a (float) – the lower bound of the uniform distribution
b (float) – the upper bound of the uniform distribution
generator (Optional[Generator]) – the torch Generator to sample from (default: None)
- Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.uniform_(w)
- torch.nn.init.normal_(tensor, mean=0.0, std=1.0, generator=None)[source]#
Fill the input Tensor with values drawn from the normal distribution N(mean, std²).
- Parameters
tensor (Tensor) – an n-dimensional torch.Tensor
mean (float) – the mean of the normal distribution
std (float) – the standard deviation of the normal distribution
generator (Optional[Generator]) – the torch Generator to sample from (default: None)
- Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.normal_(w)
- torch.nn.init.constant_(tensor, val)[source]#
Fill the input Tensor with the value val.
- Parameters
tensor (Tensor) – an n-dimensional torch.Tensor
val (float) – the value to fill the tensor with
- Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.constant_(w, 0.3)
- torch.nn.init.ones_(tensor)[source]#
Fill the input Tensor with the scalar value 1.
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.ones_(w)
- torch.nn.init.zeros_(tensor)[source]#
Fill the input Tensor with the scalar value 0.
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.zeros_(w)
- torch.nn.init.eye_(tensor)[source]#
Fill the 2-dimensional input Tensor with the identity matrix.
Preserves the identity of the inputs in Linear layers, where as many inputs are preserved as possible.
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.eye_(w)
- torch.nn.init.dirac_(tensor, groups=1)[source]#
Fill the {3, 4, 5}-dimensional input Tensor with the Dirac delta function.
Preserves the identity of the inputs in Convolutional layers, where as many input channels are preserved as possible. In case of groups > 1, each group of channels preserves identity.
- Parameters
tensor (Tensor) – a {3, 4, 5}-dimensional torch.Tensor
groups (int, optional) – the number of groups in the conv layer (default: 1)
- Return type
Tensor
Examples
>>> w = torch.empty(3, 16, 5, 5)
>>> nn.init.dirac_(w)
>>> w = torch.empty(3, 24, 5, 5)
>>> nn.init.dirac_(w, 3)
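A minimal sketch of the identity-preserving behaviour (the channel count and kernel size are illustrative): a Dirac-initialized convolution with matching in/out channels and same-size padding reproduces its input.
>>> conv = nn.Conv2d(8, 8, kernel_size=3, padding=1, bias=False)
>>> nn.init.dirac_(conv.weight)
>>> x = torch.randn(1, 8, 16, 16)
>>> torch.allclose(conv(x), x)  # each output channel copies the matching input channel
True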
- torch.nn.init.xavier_uniform_(tensor, gain=1.0, generator=None)[source]#
Fill the input Tensor with values using a Xavier uniform distribution.
The method is described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010). The resulting tensor will have values sampled from U(−a, a) where
a = gain × sqrt(6 / (fan_in + fan_out))
Also known as Glorot initialization. A numerical check of this bound is sketched after the example below.
- Parameters
tensor (Tensor) – an n-dimensional torch.Tensor
gain (float) – an optional scaling factor
generator (Optional[Generator]) – the torch Generator to sample from (default: None)
- Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain("relu"))
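A minimal sketch checking the bound above (the 3×5 shape is illustrative; for a 2-D weight, fan_in is the number of columns and fan_out the number of rows):
>>> import math
>>> w = torch.empty(3, 5)  # fan_out = 3, fan_in = 5
>>> nn.init.xavier_uniform_(w)  # default gain = 1.0
>>> bound = 1.0 * math.sqrt(6.0 / (5 + 3))  # a = gain * sqrt(6 / (fan_in + fan_out))
>>> bool(w.abs().max() <= bound)  # every sample lies in U(-a, a)
True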
- torch.nn.init.xavier_normal_(tensor, gain=1.0, generator=None)[source]#
Fill the input Tensor with values using a Xavier normal distribution.
The method is described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010). The resulting tensor will have values sampled from N(0, std²) where
std = gain × sqrt(2 / (fan_in + fan_out))
Also known as Glorot initialization.
- Parameters
tensor (Tensor) – an n-dimensional torch.Tensor
gain (float) – an optional scaling factor
generator (Optional[Generator]) – the torch Generator to sample from (default: None)
- Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.xavier_normal_(w)
- torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu', generator=None)[source]#
Fill the input Tensor with values using a Kaiming uniform distribution.
The method is described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015). The resulting tensor will have values sampled from U(−bound, bound) where
bound = gain × sqrt(3 / fan_mode)
Also known as He initialization.
- Parameters
tensor (Tensor) – an n-dimensional torch.Tensor
a (float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')
mode (Literal['fan_in', 'fan_out']) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.
nonlinearity (Literal['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d', 'sigmoid', 'tanh', 'relu', 'leaky_relu', 'selu']) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
generator (Optional[Generator]) – the torch Generator to sample from (default: None)
- Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_uniform_(w, mode="fan_in", nonlinearity="relu")
Note
Be aware that fan_in and fan_out are calculated assuming that the weight matrix is used in a transposed manner (i.e., x @ w.T in Linear layers, where w.shape = [fan_out, fan_in]). This is important for correct initialization. If you plan to use x @ w, where w.shape = [fan_in, fan_out], pass in a transposed weight matrix, i.e. nn.init.kaiming_uniform_(w.T, ...).
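A minimal sketch of the transposed-weight pattern mentioned in the note (the shapes are illustrative):
>>> w = torch.empty(128, 64)  # stored as [fan_in, fan_out], used as x @ w
>>> nn.init.kaiming_uniform_(w.T, mode="fan_in", nonlinearity="relu")  # init through the transposed view
>>> x = torch.randn(32, 128)
>>> y = x @ w  # forward pass keeps the [fan_in, fan_out] layout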
- torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu', generator=None)[source]#
Fill the input Tensor with values using a Kaiming normal distribution.
The method is described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015). The resulting tensor will have values sampled from N(0, std²) where
std = gain / sqrt(fan_mode)
Also known as He initialization.
- Parameters
tensor (Tensor) – an n-dimensional torch.Tensor
a (float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')
mode (Literal['fan_in', 'fan_out']) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.
nonlinearity (Literal['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d', 'sigmoid', 'tanh', 'relu', 'leaky_relu', 'selu']) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
generator (Optional[Generator]) – the torch Generator to sample from (default: None)
- Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_normal_(w, mode="fan_out", nonlinearity="relu")
Note
Be aware that fan_in and fan_out are calculated assuming that the weight matrix is used in a transposed manner (i.e., x @ w.T in Linear layers, where w.shape = [fan_out, fan_in]). This is important for correct initialization. If you plan to use x @ w, where w.shape = [fan_in, fan_out], pass in a transposed weight matrix, i.e. nn.init.kaiming_normal_(w.T, ...).
- torch.nn.init.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0, generator=None)[source]#
Fill the input Tensor with values drawn from a truncated normal distribution.
The values are effectively drawn from the normal distribution N(mean, std²) with values outside [a, b] redrawn until they are within the bounds. The method used for generating the random values works best when a ≤ mean ≤ b.
- Parameters
tensor (Tensor) – an n-dimensional torch.Tensor
mean (float) – the mean of the normal distribution
std (float) – the standard deviation of the normal distribution
a (float) – the minimum cutoff value
b (float) – the maximum cutoff value
generator (Optional[Generator]) – the torch Generator to sample from (default: None)
- Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.trunc_normal_(w)
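A minimal sketch checking the default cutoffs (purely illustrative):
>>> w = torch.empty(3, 5)
>>> nn.init.trunc_normal_(w)  # defaults: mean=0.0, std=1.0, a=-2.0, b=2.0
>>> bool(((w >= -2.0) & (w <= 2.0)).all())  # values outside [a, b] are redrawn
True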
- torch.nn.init.orthogonal_(tensor, gain=1, generator=None)[source]#
Fill the input Tensor with a (semi) orthogonal matrix.
Described in Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened.
- Parameters
tensor (Tensor) – an n-dimensional torch.Tensor, where n ≥ 2
gain (float) – optional scaling factor
generator (Optional[Generator]) – the torch Generator to sample from (default: None)
- Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.orthogonal_(w)
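A minimal sketch verifying (semi-)orthogonality for a wide 2-D tensor (the shape is illustrative):
>>> w = torch.empty(3, 5)
>>> nn.init.orthogonal_(w)
>>> torch.allclose(w @ w.T, torch.eye(3), atol=1e-6)  # rows are orthonormal
True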
- torch.nn.init.sparse_(tensor, sparsity, std=0.01, generator=None)[source]#
Fill the 2D input Tensor as a sparse matrix.
The non-zero elements will be drawn from the normal distribution N(0,0.01), as described in Deep learning via Hessian-free optimization - Martens, J. (2010).
- Parameters
tensor (Tensor) – an n-dimensional torch.Tensor
sparsity (float) – The fraction of elements in each column to be set to zero
std (float) – the standard deviation of the normal distribution used to generate the non-zero values
generator (Optional[Generator]) – the torch Generator to sample from (default: None)
- Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.sparse_(w, sparsity=0.1)