Coding the Unknown - Generate MIDI from a Whistle
I decided to write an article about how I made a MIDI generator that works from a recorded whistle, because I think it is a great example for teaching both tech and non-tech people how to think like a real programmer.
The interesting thing about this project is that I had no idea how to make a whistle-to-MIDI generator when I started it. Even though I did not know how to do it, I knew I could make it anyway. My approach is similar in many areas of my programming work.
Let's start with what I know. I know what MIDI is: it contains separate tracks of notes, just like sheet music. Each track is associated with an instrument, so the same notes can be played with any instrument. Your MIDI synthesizer actually plays this MIDI stream, so it is not like an MP3 (it can sound different on different sound cards). I know this because I worked with a lot of MIDI files in the '90s, when I was creating patches to crack software.
So let's define our steps.
Split the Problem into Smaller Problems
OK, so I have no idea what I am doing, which means I have a big problem. But I know how to split a big problem into smaller ones. So what don't I know?
- I do not know how to record audio from a microphone.
- I do not know how to convert one second of constant whistling into the corresponding MIDI note.
- I cannot do the previous step, because I do not know what data type sound is stored in.
- I am no musician; I do not even know what a note is.
- Even if I knew what a note is and could detect one in a whistle, I would not know how to put it into a MIDI file.
Now I have more problems, but all of them are easier.
Recording the Microphone
My first step is easy because recording audio is very common, and I am sure I can find it on Stack Overflow or Google. And I found it. It also solved my third problem: now I see that the sound from the microphone arrives continuously, as packets of byte arrays, while recording.
Now let's learn some music. I search Wikipedia to learn what a musical note is, and now I see some formulas. There is a formula that converts the frequency of a sound to an integer that, I think, represents the piano key index. Great. But recording my microphone only gives me a buffer of 8000 bytes every few milliseconds, and it does not look like a frequency; it is just a set of integers.
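Those integers are audio samples: the amplitude of the sound wave measured many times per second. A minimal sketch of decoding such a buffer, assuming the recorder delivers signed 16-bit little-endian mono PCM (a common default, but check the format settings of whatever recording API you use). Under that assumption an 8000-byte buffer holds 4000 samples. The function name is hypothetical.

```python
import struct
from typing import List


def pcm16_to_samples(buffer: bytes) -> List[float]:
    """Decode signed 16-bit little-endian mono PCM into floats in [-1.0, 1.0).

    Assumption: two bytes per sample, one channel. A trailing odd byte is ignored.
    """
    count = len(buffer) // 2
    ints = struct.unpack("<%dh" % count, buffer[: count * 2])
    # Scale by 2^15 so the most negative value (-32768) maps to exactly -1.0.
    return [value / 32768.0 for value in ints]
```

For example, `pcm16_to_samples(b"\x00\x00\x00\x80")` decodes to `[0.0, -1.0]`.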
Now I have even more problems.
Problem 6: If I want to get a note out of this byte array, I clearly need to convert this wave data to a frequency.
Converting the wave buffer to frequency
How the f*** am I supposed to do that? Thanks to search engines, I learn that it is done with the Fourier transform. Of course we learned about this stuff in engineering classes, but I cannot learn anything unless it solves a real-world problem of mine, so I regularly used high-tech gadgets to cheat in most of my classes; even then I was a bad student. Schools are not for smart people anyway, but that is a topic for another article.
Now that I know the exact name of the technique, I am sure someone has already written a library for it. I knew I never needed to learn this in engineering school. All those wasted hours of my life, for a few lines of code I can easily copy and paste. Most computer engineers will never need this formula in their lifetimes anyway. Even if this code were not available somewhere, I could split this problem into smaller parts and solve it.
This step added a few more problems to my list, but they were relatively easy to figure out. For example, my whistle often contains multiple frequencies, so I just take the frequency with the maximum power.
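For illustration, here is a minimal pure-Python sketch of this step: a textbook recursive radix-2 Cooley-Tukey FFT, followed by the "take the frequency with maximum power" rule. The function names and the zero-padding to a power of two are assumptions made for this sketch; in practice a library routine such as NumPy's `numpy.fft.rfft` does the same job much faster.

```python
import cmath
import math
from typing import List, Sequence


def fft(values: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])  # transform of even-indexed samples
    odd = fft(values[1::2])   # transform of odd-indexed samples
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def dominant_frequency(samples: Sequence[float], sample_rate: int) -> float:
    """Return the frequency in Hz of the FFT bin with the most power (DC ignored)."""
    size = 1
    while size < len(samples):
        size *= 2
    # Zero-pad so the radix-2 FFT can handle any buffer length.
    padded = list(samples) + [0.0] * (size - len(samples))
    spectrum = fft(padded)
    # For real input only bins 1..size/2 carry distinct information; bin 0 is DC.
    power = [abs(spectrum[k]) ** 2 for k in range(1, size // 2 + 1)]
    peak_bin = 1 + power.index(max(power))
    # Bin k corresponds to k * sample_rate / size Hz.
    return peak_bin * sample_rate / size
```

With a sample rate of 8000 Hz and a 1024-sample buffer, each bin is 8000 / 1024 = 7.8125 Hz wide, so a pure 1000 Hz tone lands exactly in bin 128.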
Converting Hertz to Note
Now I have the power to convert my byte array to a frequency using the Fast Fourier Transform (FFT). So let's look at the musical note formula.
Musical Note Formula
Starting at any note with frequency $f_0$, the frequency $f$ of any other note may be calculated by:
$$f = f_0 \times 2^{N/12}$$
where $N$ is the number of notes (semitones) away from the starting note. $N$ may be positive, negative or zero.
For example, starting at D (146.84 Hz), the frequency of the next higher F is:
$$146.84 \times 2^{3/12} = 174.62 \text{ Hz}$$
since F is three semitones above. The frequency of A in the next lower octave is:
$$146.84 \times 2^{-17/12} = 55 \text{ Hz}$$
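Running the formula in the other direction gives the frequency-to-note step. A minimal sketch, assuming the common MIDI convention that A4 = 440 Hz is note number 69 (so the 88-key piano index mentioned earlier is the MIDI number minus 20) and that middle C (MIDI 60) is named C4. The function names are illustrative.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_frequency(start_hz: float, steps: int) -> float:
    """Frequency of the note `steps` semitones away: f = f0 * 2^(N/12)."""
    return start_hz * 2 ** (steps / 12)


def frequency_to_midi(freq_hz: float) -> int:
    """Nearest MIDI note number, solving f = 440 * 2^((n - 69)/12) for n."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def midi_to_name(note: int) -> str:
    """Note name with octave, e.g. 69 -> 'A4', using the C4 = 60 convention."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"
```

For the worked example: `note_frequency(146.84, 3)` is about 174.62, and `midi_to_name(frequency_to_midi(146.84))` gives `"D3"`.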
Do not call me a cheater because I got the formula from a website. I could just press the piano keys one by one, record the byte arrays, convert them to hertz using the FFT, write the frequencies down in a CSV file, and solve my problem with a lookup table. But I will always buy a wheel for my car from the store, even if I am able to make one at home.
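The lookup-table alternative is only a few lines as well. A minimal sketch, assuming the table is a list of (frequency in Hz, MIDI note) pairs, for example loaded from such a CSV file; the function names are hypothetical. Matching is done by the closest distance in Hz.

```python
import bisect
from typing import List, Tuple


def build_lookup(pairs: List[Tuple[float, int]]) -> Tuple[List[float], List[int]]:
    """Sort (frequency_hz, midi_note) pairs by frequency so they can be binary-searched."""
    ordered = sorted(pairs)
    return [f for f, _ in ordered], [n for _, n in ordered]


def nearest_note(freqs: List[float], notes: List[int], measured_hz: float) -> int:
    """Return the note whose table frequency is closest (in Hz) to measured_hz."""
    i = bisect.bisect_left(freqs, measured_hz)
    # The closest entry is either just below or just above the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(freqs)]
    best = min(candidates, key=lambda j: abs(freqs[j] - measured_hz))
    return notes[best]
```

As a stand-in for real recordings, the table can be filled from the formula for the 88 piano keys (MIDI 21 to 108): `build_lookup([(440 * 2 ** ((n - 69) / 12), n) for n in range(21, 109)])`.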
The point is, I am able to split a problem into infinitely many steps and solve them one by one, so nothing is difficult in my eyes. This skill of splitting a problem into infinite parts makes me the greatest programmer alive. If you learn this skill, you can be the second greatest.
Getting Audio Samples from a Whistle
This part is easy: just add a timer that reads a buffer every 0.5 seconds. (I know this is not the best way, since some notes have to be played longer than others, but diving deep into that problem would defeat the purpose of this article.) Run an FFT on each buffer, convert the result to a note, and store it in a generic list to save to a MIDI file later.
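The same loop can be sketched offline on an already-recorded list of samples: cut it into 0.5-second windows and produce one note per window. In this sketch `detect_note` is a hypothetical callback standing in for the FFT and Hz-to-note steps, and the RMS silence threshold that turns quiet windows into rests (`None`) is an added assumption, not part of the original project.

```python
import math
from typing import Callable, List, Optional, Sequence


def notes_from_stream(
    samples: Sequence[float],
    sample_rate: int,
    detect_note: Callable[[Sequence[float], int], int],
    window_seconds: float = 0.5,
    silence_rms: float = 0.01,
) -> List[Optional[int]]:
    """Split samples into fixed windows and return one note (or None) per window.

    A window whose RMS level is below silence_rms is treated as a rest (None).
    A trailing partial window is dropped.
    """
    window = int(sample_rate * window_seconds)
    notes: List[Optional[int]] = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / window)
        notes.append(detect_note(chunk, sample_rate) if rms >= silence_rms else None)
    return notes
```

For example, one second of tone followed by half a second of silence at 8000 Hz yields two notes and one rest.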
Creating a MIDI Output File
This is also an easy step: just find a MIDI package in your package manager that lets you work with streams, and you are ready to go. If you want to suffer, read the MIDI specification, or choose an easier, more primitive format that can be converted to MIDI.
I did not include the full source code in this article because it was a quick-and-dirty project for my own use; the code is long and shitty but it works, and the article is long enough already. To test this skill, you should of course choose another impossible project and start coding.
This is the final output: