Music Mood Classification: Relevance to Music Therapy
Introduction
Music, an art form that transcends language and culture, possesses a remarkable ability to evoke a vast range of emotions in its listeners. From the euphoria of a catchy pop tune to the contemplative depths of a classical symphony, music has a profound impact on our emotional and psychological well-being. But have you ever wondered why certain melodies make you feel a rush of joy, while others transport you to a world of melancholy reflection that goes beyond lyrical understanding? This project embarks on a journey to unravel the intricacies of music theory and how its various elements contribute to the emotional tapestry woven by every note and rhythm.
This interdisciplinary effort brings together the realms of music theory, psychology, and Music Information Retrieval (MIR) to dissect the emotional language of music. Our mission is clear: to comprehensively study the multifaceted elements of music, including melody, rhythm, key, mode, meter, cadence, pitch, duration, form, and texture. Each of these components plays a vital role in shaping the emotional responses of audiences, and through rigorous examination, we aim to decipher their influence.
The “Emotional Connections” project takes our exploration a step further, striving to forge links between specific musical elements and the emotional responses they trigger. We delve deep into chord progressions, rhythms, and other musical features to understand how they can conjure up emotions ranging from joy to extreme sadness to serene relaxation. We meticulously collect and categorize data based on sensory descriptors, setting the stage for machine-learning algorithms to perform mood-based music segmentation.
Drawing inspiration from Music Information Retrieval (MIR), we employ feature extraction techniques to capture the fundamental essence of music, from spectral characteristics to rhythmic patterns and harmonic content. With these musical fingerprints in hand, we develop advanced machine learning models, meticulously trained on labeled datasets, to accurately classify and segment music into distinct emotional states.
Our journey also explores how strategic combinations of musical elements can be wielded to evoke precise emotional responses. We’ll unravel the secrets behind the melancholic allure of a minor key, the tranquil embrace of a slower tempo, and the rhythmic patterns that can tug at the heartstrings. Beyond the realm of theory, the practical applications of this research are vast, spanning fields like music therapy, entertainment, and marketing. It empowers us to create emotionally tailored music repertoires and products that resonate with individuals profoundly. Join us on this melodic expedition as we unlock the emotional power of music and delve into the fascinating world of mood classification and emotional connections.
Related Work
1. The Neurochemistry of Music by Chanda and Levitin
The study “The Neurochemistry of Music” by Chanda ML and Levitin DJ, published in Trends in Cognitive Sciences in April 2013, delves into the intricate relationship between music and the brain’s neurochemistry.
In this comprehensive review, the authors explore how music impacts various neural processes and neurotransmitter systems. They discuss how listening to music can trigger the release of neurotransmitters like dopamine and serotonin, which are associated with pleasure and mood regulation. Additionally, the study examines the role of the brain’s reward pathway, the basal ganglia, in music processing and the emotional responses it elicits.
Furthermore, the paper discusses the neural mechanisms behind music’s ability to modulate stress and reduce anxiety, highlighting the involvement of the anterior cingulate gyrus and the anterior insula. The authors emphasize the potential therapeutic applications of music, particularly in stress management and improving mental well-being.
Overall, this study sheds light on the neurochemical underpinnings of music perception and its profound impact on human emotions and cognition. It provides valuable insights into how music can be harnessed for therapeutic purposes, offering a deeper understanding of the brain’s response to this universal art form.
2. Music Mood Classification by Michael Nuzzolo
The classification of music based on mood can be challenging due to the subjective nature of emotional reactions among listeners. Sound sources with simpler harmonic content are associated with ‘darker’ timbres that tend to evoke specific emotions. Higher-energy moods such as happiness and excitement are characterized by greater intensity, brighter timbre, higher pitch, and faster rhythm than lower-energy moods such as calmness and sadness. Various digital signal processing (DSP) methods can be employed to analyze musical components like rhythm, pitch, intensity, and timbre. Acoustic analysis works with digital samples of the analog signal that represents the vibration of air molecules: on playback, these samples become voltage variations on a sound card, which, after amplification, drive the speaker cones. Analyzing the samples themselves makes it possible to extract these musical elements.
Drum hits produce prominent amplitude peaks that can be used to identify patterns such as the accented 1 and 3 beats in a 4/4 rhythm, which in turn allows a song’s BPM to be calculated. The number of harmonics and the degree of tone saturation can also indicate mood, with larger values suggesting more saturated tones. Accuracy can improve through further experimentation and data collection across a wider variety of songs. Experiments compare and contrast audio features in songs across different moods, employing algorithms to measure intensity, timbre, pitch, and rhythm and comparing them against preset thresholds for mood classification.
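Nuzzolo’s article does not include code, but as a hedged illustration, a comparable tempo estimate can be obtained with librosa’s onset-strength-based beat tracker (the file path below is hypothetical):

import librosa

# Load about 30 seconds of audio (hypothetical path); y is the sample array, sr the sample rate.
y, sr = librosa.load('example_track.mp3', duration=30.0)

# Estimate tempo (BPM) and beat positions from onset-strength peaks,
# which roughly correspond to prominent drum hits.
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
print(f'Estimated tempo: {float(tempo):.1f} BPM, {len(beat_frames)} beats detected')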
While the technology for identifying a song’s mood exists, it has not been widely implemented in commercial music stores. Projects like the Music Genome Project aim to personalize music recommendations based on these classifications, enhancing the radio experience. In conclusion, breaking down songs into quantifiable components like rhythm, harmony, and timbre enables categorization based on expected data for different moods. While not perfect, this classification system finds utility in technical applications such as suggesting similar-mood songs for online radio or automatically organizing extensive music catalogs like iTunes. The effectiveness of such applications depends on the accuracy of results and the speed of classification algorithms.
3. An Efficient Classification Algorithm for Music Mood Detection in Western and Hindi Music Using Audio Feature Extraction
This research addresses the challenging task of automatically detecting the mood or emotional content of music, a critical aspect of music recommendation systems and content organization. The study is particularly significant because it targets two diverse music genres, Western and Hindi, which reflect distinct musical styles and emotional expressions.
The primary approach employed in this research is audio feature extraction, a common technique in music analysis. The authors utilize various audio features to capture essential characteristics of music tracks, such as rhythm, pitch, and timbre. These features are then used as inputs to a classification algorithm.
One notable aspect of this study is its efficiency in dealing with a large dataset of music tracks, which is essential for real-world applications. The algorithm’s efficiency is crucial for timely and accurate mood detection, especially in systems with high user demand.
The research contributes to the field by achieving significant accuracy in music mood detection across Western and Hindi music genres. The authors demonstrate that the proposed algorithm can effectively distinguish between different mood categories, providing valuable insights into the emotional content of music.
Additionally, the paper provides insights into the potential applications of mood detection, such as personalized music recommendations based on the listener’s emotional preferences and mood-based music playlists.
In conclusion, the paper by A. S. Bhat, V. S. Amith, N. S. Prasad, and D. M. Mohan presents an efficient classification algorithm for music mood detection in Western and Hindi music genres using audio feature extraction. The research offers a valuable contribution to the field of music analysis and has the potential to enhance user experiences in music-related applications by accurately identifying the emotional content of music tracks.
4. When Lyrics Outperform Audio For Music Mood Classification: A Feature Analysis
The paper titled “When Lyrics Outperform Audio for Music Mood Classification: A Feature Analysis” by Xiao Hu and J. Stephen Downie explores the effectiveness of using different types of features, including lyrics and audio, for music mood classification. The study is based on a dataset of 5,296 songs categorized into 18 mood categories derived from user-generated tags.
The key objectives of the paper are to determine which source, lyrics or audio, is more useful for music mood classification; to identify mood categories where audio is more useful; and to analyze the association between lyric features and different mood categories.
The findings reveal that the effectiveness of lyrics versus audio features depends on the specific feature set used and the mood category being considered. In several categories such as “romantic,” “angry,” and “hopeful,” lyric features significantly outperform audio, highlighting the semantic relevance of certain words in capturing mood. Conversely, in the “calm” category, audio spectral features excel.
The paper also ranks the top features in various categories where lyric features outperformed audio. These features provide valuable insights into the connection between specific words and moods.
Additionally, the authors discuss the use of psycholinguistic resources like General Inquirer and Affective Norms for English Words (ANEW) to enhance lyric feature analysis.
In summary, the paper emphasizes that the effectiveness of lyrics versus audio for music mood classification varies depending on the feature type and the specific mood category. It highlights the importance of considering both sources for a comprehensive understanding of how sound and text interact to establish music mood.
Examples
Before we dive deeper into the intricacies of our “Music Mood Classification” project, it’s essential to explore some of the inspiring work being done in the realm of understanding the emotional dimensions of music.
A few noteworthy examples are:
Thayer’s Emotional Model: One of the classic approaches in this field is Thayer’s Emotional Model, which describes mood along two dimensions, energy and stress, and seeks to understand how music influences human affect. It explores the interplay between musical elements and emotional states, shedding light on the profound connection between music and our moods.
Expert Annotation: Some projects have relied on the expertise of music and emotion psychologists. These experts listen to music and annotate it with detailed emotional descriptions. Such annotated datasets provide valuable insights into the emotional nuances of music.
Classification via Tags: Music mood classification often involves tagging songs with emotional descriptors such as “happy,” “sad,” “energetic,” and “relaxing.” Machine learning models can then be trained on these tags to automatically classify songs based on their emotional content.
Visualizations: To better understand the emotional impact of music, various visual representations have been employed. Two-dimensional grids and emotion wheels, such as Plutchik’s and the Basic Emotion Wheel, are used to map the emotional landscape of songs.
Feature Extraction and Machine Learning: Many researchers use techniques from Music Information Retrieval (MIR) to extract features from music, including spectral characteristics, rhythmic patterns, and harmonic content. Advanced machine learning models are then applied to classify music based on mood.
Real-world Applications: Music mood classification isn’t just an academic pursuit; it has tangible real-world applications. It has been utilized in music therapy to create playlists that cater to specific emotional needs. In entertainment and marketing, it informs the selection of music that resonates with audiences on a deep emotional level.
The inclusion of various examples, from Thayer’s Emotional Model to expert annotation and feature extraction, adds depth to the understanding of music mood classification. These examples represent just a glimpse of the vibrant landscape of research and projects that explore the emotional dimensions of music. As we embark on our own journey of music mood classification, we draw inspiration from these endeavors, recognizing the profound impact music has on our emotional well-being and the boundless possibilities it presents.
[Video and diagram omitted: an example video implementation of music mood classification and a visualization of the implementation pipeline.]
Datasets
Relevant datasets, such as FMA, MSD, GTZAN Genre Collection, EmoReact, and DEAM, are essential for researchers and data enthusiasts who might want to delve into this field.
Free Music Archive (FMA):
FMA provides a large collection of audio tracks with extensive metadata and precomputed audio features, including MFCCs (Mel-frequency cepstral coefficients) and chroma features; a subset also carries Echonest attributes such as valence and energy that can serve as mood annotations.
Website: https://paperswithcode.com/paper/fma-a-dataset-for-music-analysis
Dataset: https://github.com/mdeff/fma
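Where precomputed features are not available, comparable MFCC and chroma descriptors can be computed directly from the audio. Below is a minimal sketch using librosa; the file path is hypothetical and the mean/standard-deviation aggregation is our own choice:

import numpy as np
import librosa

# Load a 30-second excerpt (hypothetical path).
y, sr = librosa.load('fma_track.mp3', duration=30.0)

# Frame-level features: 20 MFCCs and a 12-bin chromagram.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

# Summarize each feature over time to obtain a fixed-length vector per track.
feature_vector = np.concatenate([
    mfcc.mean(axis=1), mfcc.std(axis=1),
    chroma.mean(axis=1), chroma.std(axis=1),
])
print(feature_vector.shape)  # (64,) = 20*2 + 12*2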
Million Song Dataset (MSD):
MSD is a comprehensive dataset with audio and metadata for a million songs. It includes features like timbre, chroma, and more.
Website: http://millionsongdataset.com/
Dataset: http://millionsongdataset.com/pages/getting-dataset/
GTZAN Genre Collection:
Though primarily used for genre classification, you can also perform mood classification on this dataset. It contains 30-second audio clips across various genres. Features like MFCC can be extracted for mood analysis.
Dataset: http://marsyas.info/downloads/datasets.html/
EmoReact:
This dataset contains audio and video clips labeled with emotions. You can extract mood-related features from the audio portion for mood classification.
Dataset: https://github.com/fabioperez/emotion_classification/tree/master/dataset/EmReact/
DEAM (Database for Emotional Analysis of Music)
DEAM provides a collection of music with annotations for arousal and valence. Arousal and valence can be used as proxies for mood.
Dataset: https://cvml.unige.ch/databases/DEAM/
Demo Projects
A project that demonstrates results similar to those we are trying to achieve is linked below:
https://github.com/veerryait/Mood-Therapy-using-music-and-emotion-detection/tree/main/
System Description
Exploratory Data Analysis (EDA):
For the purpose of this project, we’ve used the FMA dataset.
During the preliminary exploration of data, the following was derived:
The acousticness values of most songs are concentrated toward either end of the spectrum, i.e., close to 0 or 1.
The tempo values are concentrated toward the center of the range, with two peaks in the distribution: one around 80–100 BPM and another around 130–135 BPM.
The danceability values are centered and broadly distributed rather than being concentrated at a few points.
We also calculated the correlation between different music features to understand how each feature would affect the others while training the model.
The graph shows a correlation matrix between different features of audio from the FMA dataset.
The correlation matrix shows how strongly each pair of features is correlated. A correlation of 1 means that the two features are perfectly correlated, while a correlation of -1 means that they are perfectly anti-correlated. A correlation of 0 means that the two features are not correlated at all.
The graph shows that some features are more strongly correlated than others. For example, the energy feature is highly correlated with the acousticness feature.
Other features are less strongly or even negatively correlated. For example, the tempo feature is negatively correlated with the liveness feature.
Overall, the correlation matrix provides a useful overview of the relationships between different features of audio. This information can be used for a variety of tasks, such as audio classification, music recommendation, and audio restoration.
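As a minimal sketch of how such a matrix can be produced, assuming the per-track audio features discussed above are available in a single CSV (the file name and column list are illustrative):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load per-track audio features (illustrative file name and columns).
features = pd.read_csv('fma_audio_features.csv')
cols = ['acousticness', 'danceability', 'energy', 'liveness', 'tempo', 'valence']

# Pairwise Pearson correlations between the selected features.
corr = features[cols].corr()

# Annotated heatmap of the correlation matrix.
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation between audio features')
plt.tight_layout()
plt.show()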
Machine Learning Models:
In this section, we discuss the machine learning models employed to classify music tracks into different mood categories. Our approach involves the utilization of Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN), each tailored to extract and process features from audio data efficiently.
5.1. Deep Neural Networks (DNN):
- Deep Neural Networks (DNNs), implemented here as fully connected feedforward networks, are a fundamental component of our mood classification system.
- DNNs consist of multiple layers, including input, hidden, and output layers, which are interconnected via weighted connections. These connections are adjusted during the training process.
- We employ DNNs for the classification of audio features extracted from music tracks. These features include spectral information, pitch, and timbre, which are critical in determining the emotional content of the music.
- Our DNN architecture includes fully connected layers with non-linear activation functions, such as ReLU (Rectified Linear Unit), and output layers with softmax activation to predict the mood class probabilities.
- The DNN model is trained using backpropagation and gradient descent techniques. Hyperparameter tuning and cross-validation are performed to optimize model performance.
Model Training:
After experimenting with different numbers of layers, activation functions, and learning rates, we settled on the final model below. Note that, rather than ending in a softmax classifier, the final network ends in a two-unit linear output layer that predicts continuous valence and energy values, which are later thresholded into mood quadrants:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout

# Final architecture after experimenting with layer counts, activation functions, and learning rates.
Music_mood_model = Sequential([
    Input(shape=(525,)),              # 525 audio features per track
    Dense(1024, activation='tanh'),
    Dense(512, activation='tanh'),
    Dense(256, activation='tanh'),
    Dense(256, activation='tanh'),
    Dense(512, activation='tanh'),
    Dense(24, activation='relu'),
    Dropout(0.2),                     # regularization to reduce overfitting
    Dense(8, activation='relu'),
    Dense(8, activation='tanh'),
    Dense(8, activation='tanh'),
    Dense(2, activation='linear')     # two continuous outputs: valence and energy
])
# Best observed run: accuracy = 71.4% with learning_rate = 0.000001 and batch size = 256
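The training call itself is not shown above; the following is a minimal sketch of how the model could be compiled and trained with the noted learning rate and batch size, assuming a mean-squared-error loss for the two continuous outputs (X_train, y_train, X_val, and y_val are placeholder arrays):

from tensorflow.keras.optimizers import Adam

# Regression setup: minimize MSE between predicted and annotated (valence, energy) pairs.
Music_mood_model.compile(
    optimizer=Adam(learning_rate=0.000001),
    loss='mse',
    metrics=['mae'],
)

# Placeholder arrays: X_* holds the 525-dimensional feature vectors, y_* the (valence, energy) targets.
history = Music_mood_model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,        # illustrative value; tuned via cross-validation as described earlier
    batch_size=256,
)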
Classification Methodology and Evaluation Metrics:
We’ve developed a method that classifies music into four distinct quadrants, each representing a unique emotional space. This approach allows us to place music tracks into different categories based on their predicted valence and energy values. Here’s how we do it:
1. Happy and Energetic:
- Music falling into this quadrant is characterized by high valence (positive emotions) and high energy levels. It’s the kind of music that makes you want to dance, filled with joy and vitality.
2. Happy but Calm:
- In this quadrant, music still carries a sense of happiness, with a high valence, but it’s on the calmer side in terms of energy. It might exude a content and soothing vibe, perfect for relaxation.
3. Sad but Energetic:
- This quadrant features music with a lower valence, evoking sadder or more introspective emotions, but it’s paired with high energy. It can be emotionally intense and thought-provoking.
4. Sad and Calm:
- Finally, music in this quadrant conveys a sense of sadness or melancholy, with both valence and energy on the lower end of the spectrum. It’s the kind of music that might be best suited for introspection or relaxation.
def classify_mood(valence, energy):
    """Map predicted valence and energy values onto one of four mood quadrants."""
    if valence >= 0.5 and energy >= 0.5:
        return "Happy and Energetic"
    elif valence >= 0.5 and energy < 0.5:
        return "Happy but Calm"
    elif valence < 0.5 and energy >= 0.5:
        return "Sad but Energetic"
    else:
        return "Sad and Calm"
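As a brief usage sketch (variable names are placeholders), the quadrant label for each track can then be derived from the model’s two outputs:

# predictions has shape (n_tracks, 2): column 0 is valence, column 1 is energy.
predictions = Music_mood_model.predict(X_test)
mood_labels = [classify_mood(valence, energy) for valence, energy in predictions]

print(classify_mood(0.8, 0.3))  # -> "Happy but Calm"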
Why Quadrants Matter
By classifying music into these quadrants, we aim to provide a more nuanced understanding of its emotional qualities. Instead of oversimplifying music’s emotional depth, we can visualize its diverse range of feelings in a structured and insightful way.
This approach not only enhances our ability to understand music but also improves the visualization of the data. It helps us create a more comprehensive emotional map of music, one that captures the intricacies of joy, calm, energy, and sadness. And while the model outputs only two continuous values, valence and energy, the quadrant system turns them into an interpretable, multi-dimensional perspective on music emotion, making it easier to navigate and appreciate the emotional richness of the songs we love.
Results:
Discussion:
1. System Performance
The effectiveness of the Music Mood Classification System (MMCS) in classifying music into different mood categories is a crucial aspect of its evaluation. Our results demonstrate that the system achieves a high level of accuracy and consistency in mood classification. However, it is important to acknowledge that the performance of the MMCS can vary depending on the dataset used and the specific mood categories defined.
Moreover, the choice of evaluation metrics is essential. While accuracy, precision, recall, and F1 score are commonly used metrics in classification tasks, they may not fully capture the nuances of mood classification. Future work could explore more holistic evaluation methods that consider factors such as user satisfaction and the ability to capture subtle mood transitions in music.
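For reference, here is a minimal sketch of how these standard metrics could be computed over the quadrant labels with scikit-learn (true_moods and predicted_moods are placeholder lists of labels such as "Happy and Energetic"):

from sklearn.metrics import accuracy_score, classification_report

# Placeholder label lists derived from annotated and predicted (valence, energy) pairs.
print('Accuracy:', accuracy_score(true_moods, predicted_moods))
print(classification_report(true_moods, predicted_moods))  # per-class precision, recall, F1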
2. Applications
The MMCS has a wide range of potential applications in the fields of music recommendation, content personalization, and music therapy. It can enhance the user experience by providing tailored music playlists based on mood preferences. Additionally, it can be utilized in the film and gaming industry to synchronize music with the emotional context of the content.
Furthermore, the MMCS can play a significant role in music therapy and mental health applications. It can be used to curate playlists that help individuals manage their emotional states and improve their well-being. This highlights the potential societal impact of the system.
3. Challenges and Limitations
Despite the promising performance of the MMCS, there are several challenges and limitations that need to be addressed:
- Subjectivity: Mood is a highly subjective concept, and different individuals may perceive and interpret the same piece of music differently. The system may not always align with the user’s emotional state, and this subjectivity should be acknowledged.
- Data Bias: The performance of the system heavily relies on the quality and diversity of the training data. Biases in the training data, such as cultural or genre biases, can result in misclassifications or a limited scope of moods recognized.
- Context and Time Sensitivity: Mood is context-dependent and can change over time. The MMCS may struggle to adapt to dynamic and evolving emotional states in a real-time setting, such as in the context of live events or interactive applications.
- Interdisciplinary Collaboration: To enhance the system’s performance and address these challenges, interdisciplinary collaboration between computer scientists, musicologists, and psychologists is essential. This can lead to a more nuanced understanding of music and emotions, improving the accuracy of mood classification.
Conclusion:
In conclusion, the Music Mood Classification System represents a significant advancement in the field of music information retrieval and has the potential to positively impact various industries and individuals. As we continue to refine and expand its capabilities, we must remain cognizant of the system’s limitations and work collaboratively to overcome them, making the world of music more emotionally resonant and accessible.

Another important use of the system could be in music therapy, enhancing the emotional and therapeutic impact of music sessions. By accurately categorizing music based on mood, it enables music therapists to tailor their playlists to match the emotional needs of individual clients. This personalized approach can help individuals explore and manage their emotions, reduce stress, and improve their overall well-being. MMCS empowers music therapists to curate therapeutic experiences that resonate with the unique emotional states and preferences of their clients, making music therapy sessions more effective and meaningful in promoting emotional healing and personal growth.
Future Work:
As with any research project, there are several avenues for future work and enhancements that can further improve the accuracy and applicability of our music mood classification system:
1. Broadened Dataset:
- Expanding the dataset to include a wider variety of music genres, languages, and cultural contexts. A more diverse dataset will make the model more robust and applicable to a broader audience.
- Including a more extensive range of historical and contemporary music to capture evolving trends and changing musical expressions.
2. Compatibility with Audio Files Beyond Spotify Links:
- Developing a system that accepts audio files in various formats (e.g., MP3, WAV) rather than relying solely on Spotify links. This would allow users to analyze their own music libraries.
- Exploring audio file conversion and processing techniques to make the system more versatile and accessible.
3. User-Uploaded Audio File Feature Extraction:
- Implementing a feature extraction module that can process audio files uploaded by users directly. This feature would enable individuals to analyze their personal music collections, further enhancing the system’s practicality.
- Extending feature extraction to include not only audio features but also user-generated metadata, such as song titles, artist information, and user-provided mood labels.
4. Enhanced User Experience:
- Developing a user-friendly web or mobile application that integrates the mood classification system, making it easily accessible to a broader audience.
- Implementing personalized recommendations based on the user’s mood preferences, which would require user profiling and continuous learning.
5. Real-Time Mood Detection:
- Exploring the feasibility of real-time mood detection during music playback. This would enhance user experiences by adapting music playlists based on their current mood.
6. Multimodal Mood Classification:
- Combining audio analysis with text analysis of song lyrics to provide a more comprehensive mood classification. Lyrics often contain valuable emotional context that can complement audio-based analysis.
7. User Feedback Integration:
- Incorporating user feedback and crowd-sourced mood annotations to improve model performance over time. Users could confirm or correct mood classifications, helping the model learn and adapt.
8. Cross-Platform Integration:
- Ensuring that the system can be integrated with popular music streaming platforms (e.g., Spotify, Apple Music) and music player applications.
9. Ethical Considerations:
- Addressing ethical concerns related to user privacy, data security, and the potential for mood classifications to be misused. Developing transparent and responsible guidelines for system usage and data handling.
By pursuing these future work directions, we aim to create a more robust, versatile, and user-centered music mood classification system that can benefit a broader user base and adapt to evolving music trends and technologies. These enhancements will contribute to a more comprehensive understanding of the emotional aspects of music and its impact on people’s lives.
References: