pretrained.demucs
Implementation of the Demucs model architecture.
From the paper Real Time Speech Enhancement in the Waveform Domain, which has an accompanying project page.
This model is a relatively straightforward autoencoder, similar to a U-Net but with an RNN between the encoder and decoder. The original model was trained for denoising, a natural fit for this architecture, since the task amounts to removing part of the input waveform.
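A minimal end-to-end sketch (the mono in/out configuration and the use of valid_length to pick an input size are assumptions based on the API documented below, not prescribed usage):

```python
import torch

from pretrained.demucs import Demucs

model = Demucs(in_channels=1, out_channels=1)
length = model.valid_length(16000)  # pick a length the conv stack handles cleanly
noisy = torch.randn(1, 1, length)   # (batch, channels, time)
clean = model(noisy)                # enhanced waveform
```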
- pretrained.demucs.kernel_downsample2(device: device, dtype: dtype, zeros: int = 56) → Tensor [source]
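Judging by the name, this builds the low-pass filter kernel used when resampling a signal by a factor of two. For intuition, a windowed-sinc kernel of this kind can be sketched as below; this mirrors the construction in the reference Demucs code, but the exact windowing and normalization are assumptions, not necessarily this function's verbatim implementation:

```python
import torch

def sinc_kernel_sketch(zeros: int = 56) -> torch.Tensor:
    # Hann-windowed sinc evaluated at half-sample offsets: when resampling
    # by 2, this filter estimates the in-between samples from their
    # neighbors. `zeros` sets the number of zero crossings, i.e. the
    # kernel's length and sharpness.
    window = torch.hann_window(4 * zeros + 1, periodic=False)[1::2]
    t = torch.linspace(-zeros + 0.5, zeros - 0.5, 2 * zeros)
    return (torch.sinc(t) * window).view(1, 1, -1)
```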
- class pretrained.demucs.RNN(dim: int, layers: int = 2, bi: bool = True)[source]
Bases: Module
- forward(x: Tensor, hidden: Tensor | None = None) → tuple[torch.Tensor, torch.Tensor] [source]
Defines the computation performed at every call. Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
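A minimal usage sketch; the (batch, time, dim) input layout is an assumption inferred from the signature, not confirmed by the docs:

```python
import torch

from pretrained.demucs import RNN

rnn = RNN(dim=96, layers=2, bi=True)
x = torch.randn(1, 100, 96)  # assumed (batch, time, dim) layout
y, hidden = rnn(x)           # output sequence plus final hidden state
```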
- class pretrained.demucs.Encoder(in_channels: int, out_channels: int, kernel_size: int, stride: int, act: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigmoid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'relu')[source]
Bases: Module
- forward(x: Tensor) → Tensor [source]
Defines the computation performed at every call. Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
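A shape sketch, assuming the time axis is reduced by a single strided convolution so the usual Conv1d length formula applies:

```python
import torch

from pretrained.demucs import Encoder

enc = Encoder(in_channels=1, out_channels=48, kernel_size=8, stride=4)
x = torch.randn(1, 1, 1024)  # (batch, channels, time)
y = enc(x)                   # time axis shrinks by ~stride: (1024 - 8) // 4 + 1 = 255
```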
- class pretrained.demucs.Decoder(in_channels: int, out_channels: int, kernel_size: int, stride: int, act: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigmoid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'relu')[source]
Bases: Module
- forward(x: Tensor) → Tensor [source]
Defines the computation performed at every call. Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
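The decoder mirrors the encoder, so a matching shape sketch (assuming a single transposed convolution sets the output length):

```python
import torch

from pretrained.demucs import Decoder

dec = Decoder(in_channels=48, out_channels=1, kernel_size=8, stride=4)
y = torch.randn(1, 48, 255)  # features at the encoder's resolution
x = dec(y)                   # transposed conv undoes the stride: (255 - 1) * 4 + 8 = 1024
```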
- class pretrained.demucs.Demucs(in_channels: int, out_channels: int, hidden: int = 48, depth: int = 5, kernel_size: int = 8, stride: int = 4, causal: bool = True, resample: int = 4, growth: float = 2, max_hidden: int = 10000, normalize: bool = True, rescale: float = 0.1, floor: float = 0.001, sample_rate: int = 16000)[source]
Bases: Module
Demucs speech enhancement model.
- Parameters:
in_channels – Number of input channels.
out_channels – Number of output channels.
hidden – Number of initial hidden channels.
depth – Number of layers.
kernel_size – Kernel size for each layer.
stride – Stride for each layer.
causal – If False, uses a bidirectional LSTM instead of a unidirectional one.
resample – Amount of resampling to apply to the input/output. Can be one of 1, 2 or 4.
growth – Factor by which the number of channels is multiplied at every layer (see the sketch after this list).
max_hidden – Maximum number of channels. Useful for controlling the size and speed of the model.
normalize – If True, normalizes the input.
rescale – Controls custom weight initialization.
floor – Floor value for normalization.
sample_rate – Sample rate used for training the model.
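To make the interplay of hidden, growth, and max_hidden concrete, the per-layer channel widths follow this pattern (a standalone illustration, not library code):

```python
hidden, growth, max_hidden, depth = 48, 2.0, 10_000, 5

channels, ch = [], hidden
for _ in range(depth):
    channels.append(ch)
    ch = min(int(growth * ch), max_hidden)  # growth is capped at max_hidden

print(channels)  # [48, 96, 192, 384, 768]
```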
- valid_length(length: int) → int [source]
Returns the nearest valid length to use with the model.
Specifically, the returned length ensures that no time steps are left over in any convolution, i.e. for every layer, (input_length - kernel_size) % stride == 0.
If the mixture has a valid length, the estimated sources will have exactly the same length.
- Parameters:
length – Length of the input.
- Returns:
The nearest valid length.
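A typical padding pattern for arbitrary-length inputs (a sketch; the mono configuration is an assumption):

```python
import torch
import torch.nn.functional as F

from pretrained.demucs import Demucs

model = Demucs(in_channels=1, out_channels=1)
wav = torch.randn(1, 1, 16000)  # one second of noisy audio at 16 kHz
length = wav.shape[-1]
padded = F.pad(wav, (0, model.valid_length(length) - length))
enhanced = model(padded)[..., :length]  # trim back to the original length
```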
- property total_stride: int
- forward(mix: Tensor) → Tensor [source]
Defines the computation performed at every call. Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- streamer(*, dry: float = 0.0, num_frames: int = 1, resample_lookahead: int = 64, resample_buffer: int = 256, device: base_device | None = None) → DemucsStreamer [source]
Gets a streamer for the current model.
- Parameters:
dry – Fraction of the unaltered (dry) signal to preserve, between 0 and 1.
num_frames – Number of frames to process at once. Higher values increase overall latency but improve the real-time factor.
resample_lookahead – Extra lookahead used for the resampling.
resample_buffer – Size of the buffer of previous inputs/outputs kept for resampling.
device – The device to use for predictions. If None, will use the device returned by detect_device().
- Returns:
A streamer for streaming from the current model.
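A streaming sketch. The feed/flush interface below is an assumption modeled on the reference Demucs denoiser streamer; the actual DemucsStreamer method names in this package may differ:

```python
import torch

from pretrained.demucs import Demucs

model = Demucs(in_channels=1, out_channels=1)
streamer = model.streamer(dry=0.04, num_frames=1)

# Hypothetical online loop: feed audio chunk by chunk as it arrives.
for _ in range(10):
    chunk = torch.randn(1, 1024)  # incoming mono audio (channels, time)
    out = streamer.feed(chunk)    # assumed API: returns enhanced audio so far
tail = streamer.flush()           # assumed API: drains any buffered audio
```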