pretrained.wav_codec
Defines a simple API for an audio quantizer model that runs on waveforms.
from pretrained.wav_codec import pretrained_wav_codec
model = pretrained_mel_codec("wav-codec-small")
quantizer, dequantizer = model.quantizer(), model.dequantizer()
# Convert some audio to a quantized representation.
quantized = quantizer(audio)
# Convert the quantized representation back to audio.
audio = dequantizer(quantized)
- pretrained.wav_codec.cast_pretrained_mel_codec_type(s: str | Literal['base']) Literal['base'] [source]
- pretrained.wav_codec.split_waveform(waveform: Tensor, stride_length: int, waveform_prev: Tensor | None = None) tuple[torch.Tensor, torch.Tensor] [source]
- class pretrained.wav_codec.CBR(in_channels: int, out_channels: int, kernel_size: int)[source]
Bases:
Module
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x: Tensor, state: tuple[torch.Tensor, int] | None = None) tuple[torch.Tensor, tuple[torch.Tensor, int]] [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class pretrained.wav_codec.Encoder(stride_length: int, d_model: int, kernel_size: int = 5)[source]
Bases:
Module
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(waveform: Tensor, state: tuple[torch.Tensor, list[tuple[torch.Tensor, int]]] | None = None) tuple[torch.Tensor, tuple[torch.Tensor, list[tuple[torch.Tensor, int]]]] [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class pretrained.wav_codec.Decoder(stride_length: int, hidden_size: int, num_layers: int)[source]
Bases:
Module
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(toks: Tensor, waveform: Tensor, state: tuple[torch.Tensor, torch.Tensor] | None = None) tuple[torch.Tensor, tuple[torch.Tensor, torch.Tensor]] [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class pretrained.wav_codec.WavCodec(stride_length: int, hidden_size: int, num_layers: int, codebook_size: int, num_quantizers: int)[source]
Bases:
Module
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(waveform: Tensor) tuple[torch.Tensor, torch.Tensor] [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- encode(waveform: Tensor, waveform_prev: Tensor | None = None) tuple[torch.Tensor, torch.Tensor] [source]
- decode(toks: Tensor, state: tuple[torch.Tensor, torch.Tensor] | None = None) tuple[torch.Tensor, tuple[torch.Tensor, torch.Tensor]] [source]
- quantizer() WavCodecQuantizer [source]
- dequantizer() WavCodecDequantizer [source]
- class pretrained.wav_codec.WavCodecQuantizer(model: WavCodec)[source]
Bases:
Module
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- encode(waveform: Tensor, state: tuple[torch.Tensor, list[tuple[torch.Tensor, int]]] | None = None) tuple[torch.Tensor, tuple[torch.Tensor, list[tuple[torch.Tensor, int]]]] [source]
Converts a waveform into a set of tokens.
- Parameters:
waveform – The single-channel input waveform, with shape
(B, T)
This should be at 16000 Hz.state – The encoder state from the previous step.
- Returns:
The quantized tokens, with shape
(N, B, Tq)
, along with the encoder state to pass to the next step.
- forward(waveform: Tensor, state: tuple[torch.Tensor, list[tuple[torch.Tensor, int]]] | None = None) tuple[torch.Tensor, tuple[torch.Tensor, list[tuple[torch.Tensor, int]]]] [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class pretrained.wav_codec.WavCodecDequantizer(model: WavCodec)[source]
Bases:
Module
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- decode(toks: Tensor, state: tuple[torch.Tensor, torch.Tensor] | None = None) tuple[torch.Tensor, tuple[torch.Tensor, torch.Tensor]] [source]
Converts a set of tokens into a waveform.
- Parameters:
toks – The quantized tokens, with shape
(N, B, Tq)
state – The previous state of the decoder.
- Returns:
The single-channel output waveform, with shape
(B, T)
, along with the new state of the decoder.
- forward(toks: Tensor, state: tuple[torch.Tensor, torch.Tensor] | None = None) tuple[torch.Tensor, tuple[torch.Tensor, torch.Tensor]] [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.