milq
milq is a system for automatically learning the emotional qualities of music.
It works by being trained on several thousand songs (in MP3 form). Each song is labeled with a set of styles and adjectives, like Melancholy, Ironic, Techno, and Indie Pop. There are 100 different label words in total.
milq looks at the audio features of the songs (there are 646 features in all), which generally translate into things like beat structure and regularity, instrumentation, changes between loud and soft sections, and so on. Because it is given examples of what is and is not Melancholy, milq learns (using Logistic Discriminative Networks for each label) which features the songs labeled Melancholy have in common that make them different from songs that aren't labeled Melancholy.
Then, when a previously unseen song is given to the system, milq is able to predict the probability that each of the 100 label words applies. For example, for Nirvana's 'Polly', milq determines that the probability of the label Bleak being applicable is 0.999 (that is, 99.9% likely), Passionate is 0.622 (62.2% likely) and Jazz is 0.018 (1.8% likely).
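The train-then-predict loop described above can be sketched in a few lines. This is only an illustration, not milq's actual code: it uses plain logistic regression as a stand-in for the Logistic Discriminative Networks, a handful of random numbers in place of the 646 audio features, and two made-up labeling rules. The names `train_label_model`, `train_all`, and `predict_proba` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_label_model(X, y, lr=0.5, epochs=500, l2=1e-3):
    """Fit one logistic-regression model for a single label.

    X is (n_songs, n_features); y holds 1 where the label applies, else 0.
    Plain gradient descent with a small L2 penalty (an assumption, chosen
    for brevity rather than taken from milq).
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y) / n + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def train_all(X, Y):
    """Train one independent model per label column of Y."""
    return [train_label_model(X, Y[:, j]) for j in range(Y.shape[1])]

def predict_proba(models, x):
    """Probability that each label applies to one unseen feature vector x."""
    return np.array([sigmoid(x @ w + b) for w, b in models])

# Toy stand-in data: 200 "songs", 5 features, 2 labels with invented rules.
LABELS = ["Bleak", "Jazz"]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = np.column_stack([X[:, 0] > 0, X[:, 1] + X[:, 2] > 0]).astype(float)

models = train_all(X, Y)
unseen_song = np.array([2.0, -2.0, -2.0, 0.0, 0.0])
probs = predict_proba(models, unseen_song)
for label, p in zip(LABELS, probs):
    print(f"{label}: {p:.3f}")
```

Each label gets its own model, so the output is one independent probability per label word rather than a single best guess.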
capturing culture
However, it is not easy for any Machine Learning algorithm to learn subtle emotional keywords, especially ones that even the humans who labeled the songs might well disagree about. Every label I used worked better than chance, so there's definitely something that sounds different for, say, Wry songs. But milq is much better at detecting Electronica than Wry.
To deal with this, milq also looks for sets of labels that are 'informative' of each other. These could be labels that very often occur together, such as Ironic, Wry and Indie Rock, or labels that rarely, if ever, occur together, like Manic, Boisterous and Mellow.
These labels are used to modify each other -- if a song is Ironic and Indie Rock, it is made more likely that it will be Wry. The entire set of labels, in fact, is organized into a network of 'informativeness', with every label informing and being informed by several others.
This is intuitively similar to how people often apply labels to things -- when in doubt, most people would be more likely to assign a label of Ironic to a College Rock song than a House one. This is useful, because the goal is not so much to get the labels 'correct' as to get a set of labels that seems plausible to the users. This system reduces things like songs that are labeled both Tense and Soothing, instead choosing one or the other.
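One rough way to picture the informativeness idea is sketched below. Everything in it is an assumption made for illustration: the tiny label matrix is invented, correlation between label columns stands in for whatever measure of informativeness milq really uses, and the single-pass log-odds nudge in the hypothetical `adjust` function is not the network described in the thesis.

```python
import numpy as np

LABELS = ["Ironic", "Indie Rock", "Wry", "Tense", "Soothing"]

# Invented training labels: one row per song, 1 where the label was applied.
Y = np.array([
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0],
], dtype=float)

def informativeness(Y, threshold=0.4):
    """Correlation between label columns, keeping only the strong links.

    Positive entries mean two labels tend to co-occur; negative entries
    mean they tend to exclude each other. Weak links and the diagonal
    are set to zero.
    """
    W = np.corrcoef(Y, rowvar=False)
    W[np.abs(W) < threshold] = 0.0
    np.fill_diagonal(W, 0.0)
    return W

def adjust(probs, W, strength=2.0):
    """Nudge each label's log-odds by the evidence from linked labels."""
    p = np.clip(probs, 1e-6, 1 - 1e-6)
    logit = np.log(p / (1 - p))
    evidence = W @ (2 * p - 1)  # each neighbour contributes a value in [-1, 1]
    return 1.0 / (1.0 + np.exp(-(logit + strength * evidence)))

W = informativeness(Y)
raw = np.array([0.90, 0.85, 0.50, 0.70, 0.60])  # made-up classifier outputs
adjusted = adjust(raw, W)
for label, before, after in zip(LABELS, raw, adjusted):
    print(f"{label}: {before:.2f} -> {after:.2f}")
```

In this toy run the undecided Wry score is pulled upward by the confident Ironic and Indie Rock scores, while Tense and Soothing, which never co-occur in the invented data, are pushed apart instead of both staying moderately likely.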
[Figure: example results for the song 'Djed' by Tortoise. The higher a word appears, the more likely milq has determined it to apply to the song.]
more
- this rather large movie (about 110 MB; I'll fix that someday if enough people complain) shows a simple demo of the software
- the gory details can be found in my thesis
- my thesis talk is shorter and more readable, and has some examples
- the second half of the talk I gave at the Banff New Media Centre is even shorter and more readable
