Convert audio stereo to audio byte
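I'm trying to do some audio processing, and I'm really confused by stereo-to-mono conversion. I have read about stereo-to-mono conversion on the internet.
As far as I know, I can take the left channel and the right channel, sum them, and divide by 2. But when I dump the result to a WAV file again, I get a lot of noise. I know that noise can be introduced while processing the data, for example through overflow in the byte variables.
This is the class I use to retrieve byte[] chunks of data from an MP3 file: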
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class InputSoundDecoder {

    private int BUFFER_SIZE = 128000;
    private String _inputFileName;
    private File _soundFile;
    private AudioInputStream _audioInputStream;
    private AudioFormat _audioInputFormat;
    private AudioFormat _decodedFormat;
    private AudioInputStream _audioInputDecodedStream;

    public InputSoundDecoder(String fileName) throws UnsuportedSampleRateException {
        this._inputFileName = fileName;
        this._soundFile = new File(this._inputFileName);
        try {
            this._audioInputStream = AudioSystem.getAudioInputStream(this._soundFile);
        } catch (Exception e) {
            e.printStackTrace();
            System.err.println("Could not open file:" + this._inputFileName);
            System.exit(1);
        }

        this._audioInputFormat = this._audioInputStream.getFormat();

        this._decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 2, 1, 44100, false);
        this._audioInputDecodedStream = AudioSystem.getAudioInputStream(this._decodedFormat, this._audioInputStream);

        /** Supported sample rates */
        switch ((int) this._audioInputFormat.getSampleRate()) {
            case 22050:
                this.BUFFER_SIZE = 2304;
                break;
            case 44100:
                this.BUFFER_SIZE = 4608;
                break;
            default:
                throw new UnsuportedSampleRateException((int) this._audioInputFormat.getSampleRate());
        }

        System.out.println("# Channels:" + this._decodedFormat.getChannels());
        System.out.println("Sample size (bits):" + this._decodedFormat.getSampleSizeInBits());
        System.out.println("Frame size:" + this._decodedFormat.getFrameSize());
        System.out.println("Frame rate:" + this._decodedFormat.getFrameRate());
    }

    public byte[] getSamples() {
        byte[] abData = new byte[this.BUFFER_SIZE];
        int bytesRead = 0;

        try {
            bytesRead = this._audioInputDecodedStream.read(abData, 0, abData.length);
        } catch (Exception e) {
            e.printStackTrace();
            System.err.println("Error getting samples from file:" + this._inputFileName);
            System.exit(1);
        }

        if (bytesRead > 0)
            return abData;
        else
            return null;
    }
}
This means that every time I call getSamples, it returns an array like this:
buff = {Lchannel,Rchannel,Lchannel,Rchannel,Lchannel,Rchannel,Lchannel,Rchannel ...}
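To make that layout concrete: with 16-bit samples, each Lchannel or Rchannel entry occupies two bytes, so one stereo frame is four bytes, and the order of the two bytes inside a sample depends on the endianness of the stream. The following is a minimal sketch of how one sample could be read in either byte order. The class and method names (PcmFrameLayout, readSampleLE, readSampleBE) are hypothetical and are not part of the code in the question.

```java
final class PcmFrameLayout {

    /** Reads one signed 16-bit little-endian sample (low byte first) at the given offset. */
    static int readSampleLE(byte[] buf, int offset) {
        // The high byte keeps its sign; the low byte is masked so that
        // sign extension does not overwrite the high bits.
        return (buf[offset + 1] << 8) | (buf[offset] & 0xff);
    }

    /** Reads one signed 16-bit big-endian sample (high byte first) at the given offset. */
    static int readSampleBE(byte[] buf, int offset) {
        return (buf[offset] << 8) | (buf[offset + 1] & 0xff);
    }
}
```

For a stereo frame starting at byte index f, the left sample would be read at offset f and the right sample at offset f + 2.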
The processing routine that converts to mono is:
byte[] buff = null;
while ((buff = _input.getSamples()) != null) {

    /** Convert to mono */
    byte[] mono = new byte[buff.length / 2];

    for (int i = 0; i < mono.length / 2; ++i) {
        int left = (buff[i * 4] << 8) | (buff[i * 4 + 1] & 0xff);
        int right = (buff[i * 4 + 2] << 8) | (buff[i * 4 + 3] & 0xff);
        int avg = (left + right) / 2;
        short m = (short) avg; /* Mono is an average between 2 channels (stereo) */
        mono[i * 2] = (byte) ((short) (m >> 8));
        mono[i * 2 + 1] = (byte) (m & 0xff);
    }
}
and I write the result to a WAV file using:
public static void writeWav(byte[] theResult, int samplerate, File outfile) {
    // now convert theResult into a wav file
    // probably should use a file if samplecount is too big!
    int theSize = theResult.length;

    InputStream is = new ByteArrayInputStream(theResult);
    //Short2InputStream sis = new Short2InputStream(theResult);

    AudioFormat audioF = new AudioFormat(
            AudioFormat.Encoding.PCM_SIGNED,
            samplerate,
            16,
            1, // channels
            2, // framesize
            samplerate,
            false
    );

    AudioInputStream ais = new AudioInputStream(is, audioF, theSize);

    try {
        AudioSystem.write(ais, AudioFileFormat.Type.WAVE, outfile);
    } catch (IOException ioe) {
        System.err.println("IO Exception; probably just done with file");
        return;
    }
}
with 44100 as the sample rate.
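One detail that may be worth checking in the writer: the third argument of the AudioInputStream constructor is a length in sample frames, not in bytes. With 16-bit mono data a frame is 2 bytes, so the frame count is half the byte count. The sketch below is one possible variant under that assumption. The class name MonoWavWriterSketch and its methods are hypothetical, and it is not claimed that this alone accounts for the noise.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

final class MonoWavWriterSketch {

    /** Converts a byte count into a frame count for the given format. */
    static long frameCount(int byteLength, AudioFormat format) {
        return byteLength / format.getFrameSize();
    }

    /** Writes 16-bit signed little-endian mono PCM to a WAV file. */
    static void writeMonoWav(byte[] pcm, float sampleRate, File outFile) {
        AudioFormat format = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                sampleRate,
                16,          // bits per sample
                1,           // channels
                2,           // frame size in bytes (1 channel * 2 bytes)
                sampleRate,  // frame rate
                false);      // little-endian

        // The stream length is given in frames, so divide the byte count by the frame size.
        try (AudioInputStream ais = new AudioInputStream(
                new ByteArrayInputStream(pcm), format, frameCount(pcm.length, format))) {
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, outFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as MonoWavWriterSketch.writeMonoWav(mono, 44100f, new File("out.wav")) would then write the whole mono buffer in one go.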
Keep in mind that the byte[] array I have is already PCM; the MP3 -> PCM conversion is done by specifying:
this._decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 2, 1, 44100, false);
this._audioInputDecodedStream = AudioSystem.getAudioInputStream(this._decodedFormat, this._audioInputStream);
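Two arguments in that constructor call matter for the byte handling later on. The fifth argument is the frame size in bytes, which for 16-bit stereo would normally be 2 channels * 2 bytes = 4 rather than 1. The last argument is bigEndian, and false requests little-endian samples. The sketch below shows one way such a format could be built so the frame size follows from the channel count. The class DecodedFormatSketch and its methods are hypothetical helpers, not part of the original code.

```java
import javax.sound.sampled.AudioFormat;

final class DecodedFormatSketch {

    /** Builds a signed 16-bit PCM format whose frame size is derived from the channel count. */
    static AudioFormat pcm16(float sampleRate, int channels, boolean bigEndian) {
        int bytesPerSample = 2;                     // 16 bits
        int frameSize = channels * bytesPerSample;  // bytes per frame
        return new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                sampleRate,
                16,
                channels,
                frameSize,
                sampleRate,  // frame rate equals sample rate for PCM
                bigEndian);
    }

    /** Names the byte order that decoding code has to assume for this format. */
    static String byteOrder(AudioFormat format) {
        return format.isBigEndian() ? "big-endian" : "little-endian";
    }
}
```

Printing byteOrder(_decodedFormat) next to the other format diagnostics would show which byte comes first in each sample.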
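As I said, I get a lot of noise when writing to the WAV file. I intend to apply an FFT to each block of bytes, but I think the results are incorrect because the sound is noisy.
I am taking two songs, one of which is a 20-second crop of the other, and when I compare the FFT results of the crop with those of the matching 20-second subset of the original, they don't match at all.
I think the incorrect stereo -> mono conversion is the cause.
I hope someone knows something about this.
Thanks.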
As pointed out in the comments, the endianness may be wrong. Also, converting to a signed short and then shifting may cause the first byte to be 0xFF.
Try:
int HI = 0; int LO = 1;
int left = (buff[i * 4 + HI] << 8) | (buff[i * 4 + LO] & 0xff);
int right = (buff[i * 4 + 2 + HI] << 8) | (buff[i * 4 + 2 + LO] & 0xff);
int avg = (left + right) / 2;
mono[i * 2 + HI] = (byte)((avg >> 8) & 0xff);
mono[i * 2 + LO] = (byte)(avg & 0xff);
Then swap the values of HI and LO to see if it gets better.
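Since the decoded format in the question is created with bigEndian set to false, the low byte of each sample should come first, which corresponds to HI = 1 and LO = 0 above. A self-contained sketch of the whole down-mix under that assumption follows. The class StereoToMonoSketch, the method downmix16LE, and the validBytes parameter are hypothetical. The validBytes parameter stands for the count returned by read(), so that a partially filled buffer is not averaged past its end.

```java
final class StereoToMonoSketch {

    /**
     * Averages interleaved 16-bit signed little-endian stereo frames into mono.
     * Only the first validBytes bytes of the input are used; a trailing partial frame is dropped.
     */
    static byte[] downmix16LE(byte[] stereo, int validBytes) {
        int frames = validBytes / 4;          // 4 bytes per stereo frame
        byte[] mono = new byte[frames * 2];   // 2 bytes per mono frame

        for (int i = 0; i < frames; i++) {
            // Low byte first; mask the low byte to avoid sign extension.
            int left  = (stereo[i * 4 + 1] << 8) | (stereo[i * 4]     & 0xff);
            int right = (stereo[i * 4 + 3] << 8) | (stereo[i * 4 + 2] & 0xff);

            // The sum of two 16-bit values fits easily in an int, so no overflow here.
            int avg = (left + right) / 2;

            mono[i * 2]     = (byte) (avg & 0xff);         // low byte
            mono[i * 2 + 1] = (byte) ((avg >> 8) & 0xff);  // high byte
        }
        return mono;
    }
}
```

For example, a frame with left = 1000 and right = -2000 averages to -500, which is stored as the bytes 0x0C, 0xFE.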