For certain input audio lengths, pitch calculation and MFCC calculation will produce different length results. See below for a minimal repro. Here is the output of this script: output.txt
#include "feat/feature-mfcc.h"
#include "feat/pitch-functions.h"

int main() {
  using namespace kaldi;

  // Default options for both extractors.
  PitchExtractionOptions pitch_options;
  ProcessPitchOptions opp;
  MfccOptions mfcc_options;
  Mfcc mfcc(mfcc_options);

  // Try every input length from 400 to 3999 samples (all-zero waveform).
  for (int i = 400; i < 4000; i++) {
    Vector<BaseFloat> waveform(i);
    Matrix<BaseFloat> m1, m2;
    ComputeAndProcessKaldiPitch(pitch_options, opp, waveform, &m1);
    mfcc.Compute(waveform, 1.0, &m2);  // vtln_warp = 1.0, i.e. no warping

    // Report only the lengths where the two frame counts disagree.
    if (m1.NumRows() != m2.NumRows()) {
      KALDI_LOG << "I: " << i << " Pitch " << m1.NumRows() << " MFCC " << m2.NumRows();
    }
  }
  return 0;
}
Note that the mismatch shows up as the input length approaches 600, 720, 880. I think the pattern is (len(input) - frame_size) % frame_shift > 156,
i.e. (len(input) - 400) % 160 > 156,
i.e. in the last 3 samples of each frame calculation window (?)
Assuming these values are right, I think that the MFCC lengths are as expected, and pitch extraction sometimes returns one more frame than expected
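To make the arithmetic behind that guess concrete, here is a small standalone sketch with no Kaldi dependency. It assumes the usual edge-snipping frame count, 1 + (num_samples - frame_length) / frame_shift with integer division, and the values used above (frame length 400, frame shift 160). The helper names ExpectedNumFrames and InSuspectedRange are made up for illustration and are not Kaldi functions; the 156 threshold is only my observed pattern, not something taken from the pitch code.

```cpp
#include <cstdint>

// Hypothetical helper (not a Kaldi API): expected number of frames when only
// windows that fit entirely inside the signal are counted.
int64_t ExpectedNumFrames(int64_t num_samples, int64_t frame_length,
                          int64_t frame_shift) {
  if (num_samples < frame_length) return 0;  // not even one full window
  return 1 + (num_samples - frame_length) / frame_shift;
}

// Hypothetical helper: true when the input length falls in the range where
// the mismatch was observed, i.e. (len - frame_length) % frame_shift > 156.
// The threshold 156 is an empirical guess from the repro output.
bool InSuspectedRange(int64_t num_samples, int64_t frame_length,
                      int64_t frame_shift) {
  if (num_samples < frame_length) return false;
  return (num_samples - frame_length) % frame_shift > 156;
}
```

Under these assumptions, ExpectedNumFrames(559, 400, 160) is 1 and ExpectedNumFrames(560, 400, 160) is 2, and InSuspectedRange flags exactly the 3 lengths just before each new frame would start.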
Doesn't seem like a major bug, but caused some headaches for me when using the MFA library (ref)