Using ICA on Sound Files

Let’s begin by looking at the dataset we have. We have three WAVE files, each of which is a mix. If you haven’t worked with audio files in Python before, they basically boil down to lists of numbers.

import numpy as np
import wave

# Read the wave file
mix_1_wave = wave.open('ica-on-sound-files/ICA mix 1.wav','r')

Let’s peek at the parameters of the wave file to learn more about it:

mix_1_wave.getparams()
_wave_params(nchannels=1, sampwidth=2, framerate=44100, nframes=264515, comptype='NONE', compname='not compressed')

So this file has only one channel (so it’s mono sound). It has a frame rate of 44100, which means each second of sound is represented by 44100 integers (integers because the file is in the common PCM 16-bit format). The file has a total of 264515 integers/frames, which means its length in seconds is:

264515/44100
5.998072562358277
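
The same arithmetic can be wrapped in a small helper that reads the frame count and frame rate straight from the file. This is only a sketch: the function name wav_duration_seconds is made up for illustration and is not part of the original notebook.

```python
import wave

def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds (frames / frame rate).

    Hypothetical helper, not part of the original notebook.
    """
    with wave.open(path, 'rb') as wav:
        return wav.getnframes() / wav.getframerate()

# The same division as above, using the values reported by getparams()
print(264515 / 44100)
```

Calling wav_duration_seconds('ica-on-sound-files/ICA mix 1.wav') should give the same value as the manual division, assuming the file is at that path.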

Let’s extract the frames of the wave file, which will be a part of the dataset we’ll run ICA against:

# Extract Raw Audio from Wav File
signal_1_raw = mix_1_wave.readframes(-1)
signal_1 = np.frombuffer(signal_1_raw, dtype=np.int16)

signal_1 is now a NumPy array of 16-bit integers representing the sound contained in the first file.

'length: ', len(signal_1) , 'first 100 elements: ',signal_1[:100]
('length: ',
 264515,
 'first 100 elements: ',
 array([  879,  1268,  1460,  1756,  1943,  2216,  2407,  2668,  2866,
         3106,  3308,  3546,  3752,  3981,  4175,  4395,  4588,  4790,
         4966,  5146,  5292,  5436,  5550,  5643,  5717,  5759,  5790,
         5798,  5789,  5756,  5713,  5649,  5576,  5478,  5381,  5267,
         5146,  4999,  4856,  4682,  4502,  4308,  4097,  3875,  3637,
         3380,  3107,  2825,  2514,  2194,  1847,  1472,  1087,   671,
          227,  -219,  -691, -1176, -1666, -2167, -2669, -3179, -3668,
        -4170, -4643, -5116, -5559, -5985, -6380, -6765, -7105, -7422,
        -7706, -7955, -8163, -8339, -8470, -8557, -8600, -8618, -8585,
        -8524, -8425, -8298, -8129, -7947, -7720, -7475, -7205, -6916,
        -6606, -6266, -5922, -5556, -5165, -4774, -4353, -3922, -3476,
        -3021], dtype=int16))

If we plot this array as a line graph, we’ll get the familiar waveform representation:

import matplotlib.pyplot as plt

fs = mix_1_wave.getframerate()
timing = np.linspace(0, len(signal_1)/fs, num=len(signal_1))

plt.figure(figsize=(12,2))
plt.title('Recording 1')
plt.plot(timing,signal_1, c="#3ABFE7")
plt.ylim(-35000, 35000);

[Figure: waveform of Recording 1]

The other two recordings are loaded and plotted the same way:

mix_2_wave = wave.open('ica-on-sound-files/ICA mix 2.wav','r')

# Extract raw audio from the WAV file
signal_raw_2 = mix_2_wave.readframes(-1)
signal_2 = np.frombuffer(signal_raw_2, dtype=np.int16)

mix_3_wave = wave.open('ica-on-sound-files/ICA mix 3.wav','r')

# Extract raw audio from the WAV file
signal_raw_3 = mix_3_wave.readframes(-1)
signal_3 = np.frombuffer(signal_raw_3, dtype=np.int16)

plt.figure(figsize=(12,2))
plt.title('Recording 2')
plt.plot(timing,signal_2, c="#3ABFE7")
plt.ylim(-35000, 35000);

plt.figure(figsize=(12,2))
plt.title('Recording 3')
plt.plot(timing,signal_3, c="#3ABFE7")
plt.ylim(-35000, 35000);

[Figure: waveform of Recording 2]

[Figure: waveform of Recording 3]
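
Since the three files are read with the same steps, the loading could be folded into one function. The sketch below assumes mono 16-bit PCM input, which is what getparams() reported for the first file. The name read_mono_int16 is hypothetical, and the explicit little-endian dtype '<i2' is a choice made here, not something the original code does.

```python
import wave
import numpy as np

def read_mono_int16(path):
    """Read a mono 16-bit PCM WAV file and return (samples, frame_rate).

    Hypothetical helper, not part of the original notebook.
    """
    with wave.open(path, 'rb') as wav:
        # Refuse anything that is not mono, 2 bytes per sample
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError('expected mono 16-bit PCM audio')
        frame_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV PCM data is little-endian, so '<i2' is little-endian int16
    return np.frombuffer(raw, dtype='<i2'), frame_rate

# Intended use (paths as in the text above):
# signal_1, fs = read_mono_int16('ica-on-sound-files/ICA mix 1.wav')
```

Using a with block also closes the file once the frames are read, which the step-by-step version above leaves open.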

Now that we’ve read all three files, we’re ready to zip them to create our dataset.

  • Create dataset X by zipping signal_1, signal_2, and signal_3 into a single list
X = list(zip(signal_1, signal_2, signal_3))

X[:10]
[(879, 157, 879),
 (1268, 899, 2584),
 (1460, 1610, 4081),
 (1756, 2325, 5682),
 (1943, 3022, 7147),
 (2216, 3718, 8689),
 (2407, 4386, 10114),
 (2668, 5056, 11583),
 (2866, 5690, 12954),
 (3106, 6318, 14327)]
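
Each row of X holds the three recordings' values at one instant. An alternative to zipping into a list of tuples is np.column_stack, which builds the same rows directly as a 2-D array. The sketch below uses three short stand-in arrays (the first few values shown above) rather than the full signals.

```python
import numpy as np

# Short stand-ins for signal_1, signal_2 and signal_3 (first three values of each)
s1 = np.array([879, 1268, 1460], dtype=np.int16)
s2 = np.array([157, 899, 1610], dtype=np.int16)
s3 = np.array([879, 2584, 4081], dtype=np.int16)

# One row per time step, one column per recording
X_demo = np.column_stack((s1, s2, s3))
print(X_demo.shape)
print(X_demo)
```

Either form can be passed to fit_transform; the array version simply avoids building hundreds of thousands of Python tuples first.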

We are now ready to run ICA to try to retrieve the original signals.

  • Import sklearn’s FastICA class
  • Initialize FastICA to look for three components
  • Run the FastICA algorithm using fit_transform on dataset X
# Import FastICA
from sklearn.decomposition import FastICA

# Initialize FastICA with n_components=3
ica = FastICA(n_components=3)

# Run the FastICA algorithm using fit_transform on dataset X
ica_result = ica.fit_transform(X)

ica_result now contains the result of FastICA, which we hope are the original signals. Its shape is:

ica_result.shape
(264515, 3)
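
To build some intuition for what fit_transform returns, here is a sanity check on synthetic data where the true sources are known. Everything in it is made up for illustration: the three source signals, the noise level, and the mixing matrix are assumptions, not the data from the WAVE files. ICA does not preserve the order, sign, or scale of the sources, so the check compares absolute correlations rather than raw values.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Three made-up source signals, one per column
sources = np.column_stack((
    np.sin(2 * t),            # sine wave
    np.sign(np.sin(3 * t)),   # square wave
    2 * (t % 1) - 1,          # sawtooth
))
sources += 0.2 * rng.standard_normal(sources.shape)  # a little noise

# Made-up mixing matrix: each "recording" is a weighted sum of the sources
mixing = np.array([[1.0, 1.0, 1.0],
                   [0.5, 2.0, 1.0],
                   [1.5, 1.0, 2.0]])
mixed = sources @ mixing.T

recovered = FastICA(n_components=3, random_state=0).fit_transform(mixed)

# Cross-correlation between each true source (rows) and each component (columns)
corr = np.corrcoef(sources.T, recovered.T)[:3, 3:]
best_match = np.abs(corr).max(axis=1)
print(best_match)
```

A best_match value close to 1 for every source means each one shows up, possibly flipped and rescaled, as one of the recovered components.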

Let’s split the result into separate signals and look at them:

result_signal_1 = ica_result[:,0]
result_signal_2 = ica_result[:,1]
result_signal_3 = ica_result[:,2]

Let’s plot them to see how the waveforms look:

# Plot Independent Component #1
plt.figure(figsize=(12,2))
plt.title('Independent Component #1')
plt.plot(result_signal_1, c="#df8efd")
plt.ylim(-0.010, 0.010)
plt.show()

# Plot Independent Component #2
plt.figure(figsize=(12,2))
plt.title('Independent Component #2')
plt.plot(result_signal_2, c="#87de72")
plt.ylim(-0.010, 0.010)
plt.show()

# Plot Independent Component #3
plt.figure(figsize=(12,2))
plt.title('Independent Component #3')
plt.plot(result_signal_3, c="#f65e97")
plt.ylim(-0.010, 0.010)
plt.show()

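[Figure: waveform of Independent Component #1]

[Figure: waveform of Independent Component #2]

[Figure: waveform of Independent Component #3]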

Now, let’s save the signals as WAVE files. This requires:

  • Converting them to integers, so we can save them as PCM 16-bit WAVE files; otherwise only some media players would be able to play them
  • Mapping the values to the appropriate range for int16 audio, which is -32768 to +32767. A basic mapping can be done by multiplying by 32767.
  • Increasing the volume, since the sounds will be a little faint. We can do that by multiplying by a value like 100.
from scipy.io import wavfile

# Convert to int, map the appropriate range, and increase the volume a little bit
result_signal_1_int = np.int16(result_signal_1*32767*100)
result_signal_2_int = np.int16(result_signal_2*32767*100)
result_signal_3_int = np.int16(result_signal_3*32767*100)

# Write wave files
wavfile.write("ica-on-sound-files/result_signal_1.wav", fs, result_signal_1_int)
wavfile.write("ica-on-sound-files/result_signal_2.wav", fs, result_signal_2_int)
wavfile.write("ica-on-sound-files/result_signal_3.wav", fs, result_signal_3_int)
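
The fixed factor of 32767*100 works when the components are as small as the plots above suggest, but the scale of FastICA’s output depends on its whitening settings and may differ between scikit-learn versions. Values that land outside the int16 range would not convert cleanly. One alternative, sketched below, is to rescale each signal by its own peak before converting. The helper name to_int16 and the 0.9 headroom factor are assumptions made for this example.

```python
import numpy as np

def to_int16(signal, peak=0.9):
    """Rescale a float signal so its largest sample sits at `peak` of the
    int16 range, then convert to int16.

    Hypothetical helper, not part of the original notebook.
    """
    signal = np.asarray(signal, dtype=np.float64)
    max_abs = np.max(np.abs(signal))
    if max_abs == 0:
        # Silent signal: nothing to rescale
        return np.zeros(signal.shape, dtype=np.int16)
    scaled = signal / max_abs * peak * 32767
    return np.round(scaled).astype(np.int16)

# Small made-up values on the same order as the components plotted above
demo = to_int16(np.array([0.0, 0.004, -0.008]))
print(demo.dtype, np.abs(demo).max())
```

With this approach, to_int16(result_signal_1) could be passed to wavfile.write in place of result_signal_1_int, and the loudness no longer depends on guessing a multiplier.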