shiftPitch

Shift audio pitch

Syntax

audioOut = shiftPitch(audioIn,nsemitones)

audioOut = shiftPitch(audioIn,nsemitones,Name,Value)

Description

audioOut = shiftPitch(audioIn,nsemitones) shifts the pitch of the audio input by the specified number of semitones, nsemitones.

audioOut = shiftPitch(audioIn,nsemitones,Name,Value) specifies options using one or more Name,Value pair arguments.

Examples

Apply Pitch-Shifting to Time-Domain Audio

Read in an audio file and listen to it.

[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
sound(audioIn,fs)

Increase the pitch by 3 semitones and listen to the result.

nsemitones = 3;
audioOut = shiftPitch(audioIn,nsemitones);
sound(audioOut,fs)

Decrease the pitch of the original audio by 3 semitones and listen to the result.

nsemitones = -3;
audioOut = shiftPitch(audioIn,nsemitones);
sound(audioOut,fs)

Apply Pitch-Shifting to Frequency-Domain Audio

Read in an audio file and listen to it.

[audioIn,fs] = audioread("SpeechDFT-16-8-mono-5secs.wav");
sound(audioIn,fs)

Convert the audio signal to a time-frequency representation using stft. Use a 512-point kbdwin with 75% overlap.

win = kbdwin(512);
overlapLength = 0.75*numel(win);

S = stft(audioIn, ...
    "Window",win, ...
    "OverlapLength",overlapLength, ...
    "Centered",false);

Increase the pitch by 8 semitones and listen to the result. Specify the window and overlap length you used to compute the STFT.

nsemitones = 8;
lockPhase = false;
audioOut = shiftPitch(S,nsemitones, ...
                     "Window",win, ...
                     "OverlapLength",overlapLength, ...
                     "LockPhase",lockPhase);

sound(audioOut,fs)

Decrease the pitch of the original audio by 8 semitones and listen to the result. Specify the window and overlap length you used to compute the STFT.

nsemitones = -8;
lockPhase = false;
audioOut = shiftPitch(S,nsemitones, ...
                     "Window",win, ...
                     "OverlapLength",overlapLength, ...
                     "LockPhase",lockPhase);

sound(audioOut,fs)

Increase Fidelity Using Phase Locking

Read in an audio file and listen to it.

[audioIn,fs] = audioread('FemaleSpeech-16-8-mono-3secs.wav');
sound(audioIn,fs)

Increase the pitch by 6 semitones and listen to the result.

nsemitones = 6;
lockPhase = false;
audioOut = shiftPitch(audioIn,nsemitones, ...
                     'LockPhase',lockPhase);
sound(audioOut,fs)

To increase fidelity, set LockPhase to true. Apply pitch shifting, and listen to the results.

lockPhase = true;
audioOut = shiftPitch(audioIn,nsemitones, ...
                     'LockPhase',lockPhase);
sound(audioOut,fs)

Increase Fidelity Using Formant Preservation

Read in the first 11.5 seconds of an audio file and listen to it.

[audioIn,fs] = audioread('Rainbow-16-8-mono-114secs.wav',[1,8e3*11.5]);
sound(audioIn,fs)

Increase the pitch by 4 semitones and apply phase locking. Listen to the results. The resulting audio has a "chipmunk effect" that sounds unnatural.

nsemitones = 4;
lockPhase = true;
audioOut = shiftPitch(audioIn,nsemitones, ...
    "LockPhase",lockPhase);

sound(audioOut,fs)

To increase fidelity, set PreserveFormants to true. Use the default cepstral order of 30. Listen to the result.

cepstralOrder = 30;
audioOut = shiftPitch(audioIn,nsemitones, ...
    "LockPhase",lockPhase, ...
    "PreserveFormants",true, ...
    "CepstralOrder",cepstralOrder);

sound(audioOut,fs)

Input Arguments

`audioIn` — Input signal
column vector | matrix | 3-D array

Input signal, specified as a column vector, matrix, or 3-D array. How the function interprets audioIn depends on whether audioIn is real or complex:

If audioIn is real, audioIn is interpreted as a time-domain signal. In this case, audioIn must be a column vector or matrix. Columns are interpreted as individual channels.
If audioIn is complex, audioIn is interpreted as a frequency-domain signal. In this case, audioIn must be an L-by-M-by-N array, where L is the FFT length, M is the number of individual spectra, and N is the number of channels.

Data Types: single | double
Complex Number Support: Yes
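
The two interpretations of audioIn can be compared side by side. The following is a minimal sketch, assuming a synthetic two-channel test tone (the tone frequencies, duration, and variable names are illustrative and not part of the original examples). It passes the same audio once as a real matrix and once as a complex 3-D STFT array.

```matlab
% Synthetic two-channel signal (illustrative assumption, not a shipped file)
fs = 16e3;
t = (0:2*fs-1)'/fs;                             % 2 seconds of samples
audioIn = [sin(2*pi*220*t), sin(2*pi*330*t)];   % columns are channels

% Real input: interpreted as a time-domain signal
audioOutTime = shiftPitch(audioIn,2);

% Complex input: interpreted as a frequency-domain signal.
% Window and OverlapLength must match the ones used to compute the STFT.
win = kbdwin(512);
overlapLength = 0.75*numel(win);
S = stft(audioIn, ...
    "Window",win, ...
    "OverlapLength",overlapLength, ...
    "Centered",false);                          % L-by-M-by-N complex array
audioOutFreq = shiftPitch(S,2, ...
    "Window",win, ...
    "OverlapLength",overlapLength);
```

Here isreal(audioIn) is true and isreal(S) is false, which is the property that selects the interpretation.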

`nsemitones` — Number of semitones to shift audio by
real scalar

Number of semitones to shift the audio by, specified as a real scalar.

The range of nsemitones depends on the window length (numel(Window)) and the overlap length (OverlapLength):

-12*log2(numel(Window)-OverlapLength) ≤ nsemitones ≤ -12*log2((numel(Window)-OverlapLength)/numel(Window))

Data Types: single | double
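
As a quick check of the range formula for nsemitones, the bounds can be evaluated for specific settings. This sketch uses plain arithmetic only; the variable names are illustrative. It covers the default window length (1024) with the default overlap, and the 512-point window with 75% overlap used in the frequency-domain example.

```matlab
% Default settings: numel(Window) = 1024, OverlapLength = round(0.75*1024)
winLength = 1024;
overlapLength = round(0.75*winLength);                        % 768
minShift = -12*log2(winLength - overlapLength);               % -96
maxShift = -12*log2((winLength - overlapLength)/winLength);   % 24

% 512-point window with 75% overlap
winLength2 = 512;
overlapLength2 = 0.75*winLength2;                             % 384
minShift2 = -12*log2(winLength2 - overlapLength2);            % -84
maxShift2 = -12*log2((winLength2 - overlapLength2)/winLength2); % 24
```

With 75% overlap the upper bound is 24 semitones regardless of the window length, because the bound depends only on the ratio of hop length to window length.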

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Window',kbdwin(512)
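
The two calling conventions described above can be written as follows. This is a small sketch using a synthetic 440 Hz tone as a stand-in input (an assumption for illustration); the first call requires R2021a or later.

```matlab
% Stand-in input signal (illustrative assumption)
fs = 16e3;
t = (0:fs-1)'/fs;
audioIn = sin(2*pi*440*t);

% R2021a and later: Name=Value syntax
audioOutA = shiftPitch(audioIn,3,LockPhase=true);

% Earlier releases: comma-separated pairs with the name in quotes
audioOutB = shiftPitch(audioIn,3,'LockPhase',true);
```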

`Window` — Window applied in time domain
`sqrt(hann(1024,'periodic'))` (default) | real vector

Window applied in the time domain, specified as the comma-separated pair consisting of 'Window' and a real vector. The number of elements in the vector must be in the range [1, size(audioIn,1)] and must be greater than OverlapLength.

Note

If using shiftPitch with frequency-domain input, you must specify Window as the same window used to transform audioIn to the frequency domain.

Data Types: single | double

`OverlapLength` — Number of samples overlapped between adjacent windows
`round(0.75*numel(Window))` (default) | scalar in the range [0, `numel(Window)`)

Number of samples overlapped between adjacent windows, specified as the comma-separated pair consisting of 'OverlapLength' and an integer in the range [0, numel(Window)).

Note

If using shiftPitch with frequency-domain input, you must specify OverlapLength as the same overlap length used to transform audioIn to a time-frequency representation.

Data Types: single | double

`LockPhase` — Apply identity phase locking
`false` (default) | `true`

Apply identity phase locking, specified as the comma-separated pair consisting of 'LockPhase' and false or true.

Data Types: logical

`PreserveFormants` — Preserve formants
`false` (default) | `true`

Preserve formants, specified as the comma-separated pair consisting of 'PreserveFormants' and false or true. Formant preservation is attempted using spectral envelope estimation with cepstral analysis.

Data Types: logical

`CepstralOrder` — Cepstral order used for formant preservation
30 (default) | nonnegative integer

Cepstral order used for formant preservation, specified as the comma-separated pair consisting of 'CepstralOrder' and a nonnegative integer.

Dependencies

To enable this name-value pair argument, set PreserveFormants to true.

Data Types: single | double

Output Arguments

`audioOut` — Pitch-shifted audio
column vector | matrix

Pitch-shifted audio, returned as a column vector or matrix of independent channels.

Algorithms

To apply pitch shifting, shiftPitch modifies the time scale of the audio using a phase vocoder and then resamples the modified audio. The time-scale modification algorithm is based on [1] and [2] and is implemented as in stretchAudio.

After time-scale modification, shiftPitch performs sample rate conversion using an interpolation factor equal to the analysis hop length and a decimation factor equal to the synthesis hop length. These factors are determined as follows:

The analysis hop length is analysisHopLength = numel(Window)-OverlapLength.
The shiftPitch function assumes that there are 12 semitones in an octave, so the speedup factor used to stretch the audio is speedupFactor = 2^(-nsemitones/12).
The speedup factor and analysis hop length determine the synthesis hop length for time-scale modification as synthesisHopLength = round((1/speedupFactor)*analysisHopLength).
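
The hop-length relationships can be worked through numerically. The sketch below evaluates the formulas above for a 3-semitone shift with the default window length and overlap. It is plain arithmetic rather than a call into shiftPitch, and effectiveShift is an illustrative name for the shift implied by the rounded hop lengths.

```matlab
winLength = 1024;                        % numel(Window) for the default window
overlapLength = round(0.75*winLength);   % default OverlapLength, 768
nsemitones = 3;

analysisHopLength = winLength - overlapLength;                     % 256
speedupFactor = 2^(-nsemitones/12);                                % about 0.8409
synthesisHopLength = round((1/speedupFactor)*analysisHopLength);   % 304

% Because the synthesis hop length is rounded to an integer, the ratio of
% hop lengths corresponds to a shift slightly different from the request.
effectiveShift = -12*log2(analysisHopLength/synthesisHopLength);   % about 2.975
```

A larger analysis hop length makes the rounding step proportionally smaller, so the implied shift lands closer to the requested value.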

The achievable pitch shift is determined by the window length (numel(Window)) and OverlapLength. To see the relationship, note that the equation for the speedup factor can be rewritten as nsemitones = -12*log2(speedupFactor), and the equation for the synthesis hop length can be rewritten as speedupFactor = analysisHopLength/synthesisHopLength. Substituting one into the other gives nsemitones = -12*log2(analysisHopLength/synthesisHopLength). The practical range of a synthesis hop length is [1, numel(Window)]. The range of achievable pitch shifts is:

Max number of semitones lowered: -12*log2(numel(Window)-OverlapLength)
Max number of semitones raised: -12*log2((numel(Window)-OverlapLength)/numel(Window))
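
The two bounds follow from substituting the endpoints of the synthesis hop range into the expression for nsemitones. In the sketch below, the symbols are shorthand introduced here for illustration: $N$ is numel(Window), $H_a$ is analysisHopLength, $H_s$ is synthesisHopLength, and $n$ is nsemitones.

```latex
\begin{aligned}
n &= -12\log_2\!\left(\frac{H_a}{H_s}\right), \qquad 1 \le H_s \le N, \qquad H_a = N - \mathrm{OverlapLength} \\
H_s = 1 &\;\Rightarrow\; n_{\min} = -12\log_2\left(H_a\right) \\
H_s = N &\;\Rightarrow\; n_{\max} = -12\log_2\!\left(\frac{H_a}{N}\right)
\end{aligned}
```

Because $n$ increases with $H_s$, the smallest synthesis hop gives the largest downward shift and the largest synthesis hop gives the largest upward shift.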

Formant Preservation

Pitch shifting can alter the spectral envelope of the pitch-shifted signal. To diminish this effect, you can set PreserveFormants to true. If PreserveFormants is set to true, the algorithm attempts to estimate the spectral envelope using an iterative procedure in the cepstral domain, as described in [3] and [4]. For both the original spectrum, X, and the pitch-shifted spectrum, Y, the algorithm estimates the spectral envelope as follows.

For the first iteration, EnvX_a is set to X. Then, the algorithm repeats these two steps in a loop:

Lowpass filters the cepstral representation of EnvX_a to get a new estimate, EnvX_b. The CepstralOrder parameter controls the quefrency bandwidth.
To update the current best fit, the algorithm takes the element-by-element maximum of the current spectral envelope estimate and the previous spectral envelope estimate:

$\mathrm{EnvX}_a = \max(\mathrm{EnvX}_a, \mathrm{EnvX}_b).$

The loop ends if either a maximum number of iterations (100) is reached, or if all bins of the estimated log envelope are within a given tolerance of the original log spectrum. The tolerance is set to log(10^(1/20)).

Finally, the algorithm scales the spectrum of the pitch-shifted audio by the ratio of estimated envelopes, element-wise:

$Y = Y \times \left(\frac{\mathrm{EnvX}_b}{\mathrm{EnvY}_b}\right).$
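
The envelope estimation loop and the final scaling step can be sketched for a single pair of spectra. This is an illustration under stated assumptions, not the shiftPitch source code: the helper name estimateEnvelope, the synthetic test frames, the rectangular cepstral lifter, and the exact form of the stopping test are all choices made here for the sketch.

```matlab
% Stand-in frames (assumed for illustration): an "original" frame x and a
% "pitch-shifted" frame y with harmonics at 1.5 times the frequency.
N = 1024;
n = (0:N-1)';
win = hann(N,'periodic');
x = sin(2*pi*0.02*n) + 0.5*sin(2*pi*0.04*n);
y = sin(2*pi*0.03*n) + 0.5*sin(2*pi*0.06*n);
X = fft(win.*x);
Y = fft(win.*y);

cepstralOrder = 30;
envX = estimateEnvelope(X,cepstralOrder);
envY = estimateEnvelope(Y,cepstralOrder);

% Scale the shifted spectrum by the ratio of envelopes, element-wise
Ycorrected = Y .* (envX ./ envY);

function env = estimateEnvelope(S,cepstralOrder)
    % Hypothetical helper: iterative cepstral smoothing of a two-sided spectrum
    logS = log(abs(S) + eps);               % log-magnitude spectrum
    N = numel(logS);
    lifter = zeros(N,1);                    % symmetric lowpass lifter (assumed shape)
    lifter([1:cepstralOrder+1, N-cepstralOrder+1:N]) = 1;
    tol = log(10^(1/20));                   % tolerance stated in the text
    envA = logS;                            % first iteration starts from the spectrum
    for iter = 1:100                        % maximum number of iterations
        % Lowpass filter the cepstral representation of the current estimate
        envB = real(fft(lifter .* real(ifft(envA))));
        % Assumed stopping test: envelope is no more than tol below the spectrum
        if all(logS - envB <= tol)
            break
        end
        envA = max(envA,envB);              % element-by-element maximum
    end
    env = exp(envB);                        % back to linear magnitude
end
```

A smaller cepstralOrder keeps fewer low-quefrency coefficients and therefore yields a smoother envelope; a larger value lets the envelope follow individual harmonics more closely.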

References

[1] Driedger, Jonathan, and Meinard Müller. "A Review of Time-Scale Modification of Music Signals." Applied Sciences. Vol. 6, Issue 2, 2016.

[2] Driedger, Jonathan. "Time-Scale Modification Algorithms for Music Audio Signals." Master's Thesis. Saarland University, Saarbrücken, Germany, 2011.

[3] Roebel, Axel, and Xavier Rodet. "Efficient Spectral Envelope Estimation and Its Application to Pitch Shifting and Envelope Preservation." International Conference on Digital Audio Effects, pp. 30–35. Madrid, Spain, September 2005. hal-01161334

[4] Imai, S., and Y. Abe. "Spectral Envelope Extraction by Improved Cepstral Method." Electron. and Commun. in Japan. Vol. 62-A, Issue 4, 1979, pp. 10–17.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

LockPhase must be set to false.
Using gpuArray (Parallel Computing Toolbox) input with shiftPitch is only recommended for a GPU with compute capability 7.0 ("Volta") or above. Other hardware might not offer any performance advantage. To check your GPU compute capability, see ComputeCapability in the output from the gpuDevice (Parallel Computing Toolbox) function. For more information, see GPU Computing Requirements (Parallel Computing Toolbox).

For an overview of GPU usage in MATLAB®, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
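
The GPU usage notes can be put together as follows. This is a minimal sketch, assuming Parallel Computing Toolbox and a supported GPU are available; the synthetic input tone and the warning text are illustrative.

```matlab
% Check the compute capability of the selected GPU
d = gpuDevice;
if str2double(d.ComputeCapability) < 7
    warning("GPU compute capability is below 7.0; no speedup may be observed.")
end

% Stand-in input signal (illustrative assumption)
fs = 16e3;
t = (0:2*fs-1)'/fs;
audioIn = sin(2*pi*440*t);

% Move the data to the GPU, shift the pitch with LockPhase set to false
% (required for gpuArray input), and bring the result back to the CPU.
audioInGPU = gpuArray(audioIn);
audioOutGPU = shiftPitch(audioInGPU,3,"LockPhase",false);
audioOut = gather(audioOutGPU);
```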

Version History

Introduced in R2019b
