
Speech Emotion

ABSTRACT:-

 

The main goal of this work is to create a simple, efficient, and practical model that uses machine learning techniques at its core and that can be trusted to provide accurate, error-free results. Designing and running the system is straightforward with the help of Librosa, a Python library for audio analysis, and the implementation depends heavily on Python. Besides the code itself, several other parts of the work are explored in depth, including adding new functions and features to the existing execution process. The proposed system produces two outcomes: a preliminary result and a final result. The preliminary result is simply the answer obtained by matching an existing cue to the given input. The system builds on this response by collecting more samples, analyzing them, and generating a pattern-based response; the final model is built on that pattern. To extract the pattern, or features, from the audio, we parse it with Librosa, which takes the tonal information from the audio, transforms it using the Fourier transform, and plots the extracted values and patterns onto graphs for visual inspection. We feed the system random inputs to test its functionality, response generation, and stability, and to measure the efficiency of the model. To classify speech, we use the deep learning models Inception V3 and Exception B7 and test how well they perform. The overall accuracy of comparable systems ranges from 60% to 75%, whereas our proposed system reaches 85-90%; if accuracy rises above 95%, we reduce some parameter values to address overfitting. Inception V3 and Exception B7 are trained on the SAVEE dataset, available on Kaggle, which contains audio files sampled at 16,000 Hz; the features representing the audio are extracted with Librosa. After training the base model, we combine it with a few additional neural network layers, add dropout layers that randomly drop 20% of the units, and use ReLU activation in the hidden layers.
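As a rough, hedged illustration of the Librosa-based extraction step described above, the following Python sketch loads one audio clip, computes a short-time Fourier transform and MFCC features, and plots them; the file path, MFCC count, and figure layout are assumptions made for illustration and are not taken from the original implementation.

    import numpy as np
    import librosa
    import librosa.display
    import matplotlib.pyplot as plt

    # Hypothetical path to a single SAVEE-style clip; adjust to the actual dataset layout.
    AUDIO_PATH = "savee/DC_a01.wav"

    # Load the clip at the 16,000 Hz sampling rate mentioned for the SAVEE files.
    y, sr = librosa.load(AUDIO_PATH, sr=16000)

    # Short-time Fourier transform: turns the waveform into tonal (frequency) information.
    stft = librosa.stft(y)
    spectrogram_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

    # MFCCs summarise the spectral envelope and are common features for emotion models.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)

    # Plot the extracted values and patterns, as described in the abstract.
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6))
    librosa.display.specshow(spectrogram_db, sr=sr, x_axis="time", y_axis="log", ax=ax1)
    ax1.set_title("Log-frequency spectrogram")
    librosa.display.specshow(mfcc, sr=sr, x_axis="time", ax=ax2)
    ax2.set_title("MFCC features")
    plt.tight_layout()
    plt.show()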

 

SYSTEM:-

 

The Speech Emotion project measures a speaker’s emotions based on their speech. The system uses advanced learning techniques to analyze speech patterns and detect emotions such as happiness, sadness, anger, fear, and conflict.

 

The system consists of the following:

 

  • Speech Recognition Module: This module is responsible for converting speech to text. It uses speech recognition techniques such as Hidden Markov Models (HMM), Deep Neural Networks (DNN), or Convolutional Neural Networks (CNN).

 

  • Feature Extraction Module: This module extracts relevant features such as volume, intensity, and frequency from the speech. It uses techniques such as Mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC), or perceptual linear prediction (PLP).

 

  • Emotion Classification Module: This module is responsible for classifying the speaker’s emotions based on the extracted features, using machine learning algorithms such as support vector machines (SVM), random forests, or deep neural networks (DNN); a minimal sketch of this step appears after this list.

 

  • Application Integration: The system integrates with applications such as call centers, voice assistants, and customer service centers, and uses the detected emotional state to provide personalized responses to the speaker.
  • Feedback Module: The feedback module receives feedback from the speaker about the accuracy of the emotion recognition; this feedback is used to improve the performance of the system.
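A minimal, self-contained sketch of how the feature extraction and emotion classification modules could fit together, assuming mean MFCC vectors as features and an SVM classifier; the file paths and labels below are placeholders, not part of the original system.

    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def extract_features(path, sr=16000, n_mfcc=40):
        # Feature Extraction Module: load the clip and summarise it with mean MFCCs.
        signal, _ = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)

    # Placeholder labelled clips; in practice these would come from a corpus such as SAVEE.
    wav_paths = ["savee/DC_a01.wav", "savee/DC_h01.wav"]
    labels = ["anger", "happiness"]

    X = np.array([extract_features(p) for p in wav_paths])
    y = np.array(labels)

    # Emotion Classification Module: train an SVM on the extracted features.
    clf = SVC(kernel="rbf")
    clf.fit(X, y)

    # Predict the emotion of a new, unseen clip (placeholder path).
    print(clf.predict([extract_features("savee/DC_sa01.wav")]))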

The Speech Emotion System is a powerful tool that can be used in a variety of applications such as call centers, psychological diagnosis, and speech therapy. The system provides accurate feedback based on the speaker’s voice, enabling personalized responses and better communication.

 

PROPOSED SYSTEM:-

 

The proposed system is a simple, efficient, and practical model built on machine learning techniques that can be trusted to provide accurate, error-free results. The implementation relies on Python and the Librosa library: Librosa parses the audio, extracts its tonal information, transforms it using the Fourier transform, and plots the extracted values and patterns onto graphs for visual inspection. The system produces two outcomes, a preliminary and a final result: the preliminary result is obtained by matching an existing cue to the given input, after which the system collects more samples, analyzes them, and generates a pattern-based response on which the final model is built. We feed the system random inputs to test its functionality, response generation, and stability, and to measure the efficiency of the model. For classification we use the deep learning models Inception V3 and Exception B7, trained on the SAVEE dataset from Kaggle, which contains audio files sampled at 16,000 Hz. Comparable systems reach 60-75% accuracy, whereas the proposed system reaches 85-90%; if accuracy rises above 95%, we reduce some parameter values to counter overfitting. After training the base model, we add a few neural network layers on top, insert dropout layers that randomly drop 20% of the units, and use ReLU activation in the hidden layers.
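A minimal sketch of the transfer-learning head described above, assuming mel-spectrograms rendered as 299x299 RGB images are fed to a Keras InceptionV3 backbone; the hidden-layer size and the use of the seven SAVEE emotion classes are illustrative assumptions, and the Exception B7 branch is omitted here.

    import tensorflow as tf

    NUM_CLASSES = 7  # SAVEE emotions: anger, disgust, fear, happiness, neutral, sadness, surprise

    # Pre-trained InceptionV3 backbone; spectrogram "images" are assumed as input.
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights="imagenet", input_shape=(299, 299, 3)
    )
    base.trainable = False  # keep the pre-trained weights fixed while the new head trains

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(256, activation="relu"),   # ReLU activation in the hidden layer
        tf.keras.layers.Dropout(0.2),                    # randomly drop 20% of the units
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()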

 

MODULES:-

 

The speech emotion classification module is responsible for determining the speaker’s emotion from the speaker’s speech. This module takes the speech signal as input and outputs the emotional state.

 

This module includes the following steps:

 

  • Speech Preprocessing: The speech signal is preprocessed to remove noise and other undesirable effects. This is done using techniques such as filtering, normalization, and signal enhancement.

 

  • Feature Extraction: Relevant features are extracted from the speech signal. These can be temporal features such as pitch, energy, and duration, or spectral features such as Mel-frequency cepstral coefficients (MFCC).

 

  • Feature Selection: The extracted features are then processed to select the most informative features for classification. This is done using techniques such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), or sequential feature selection; a sketch of this step appears after this list.

 

  • Emotion Classification: The selected features are used to classify the speaker’s emotional state. This is done using machine learning algorithms such as support vector machines (SVM), random forests, or deep neural networks (DNN).
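To make the preprocessing and feature selection steps concrete, here is a hedged sketch that normalizes the signal, applies a simple pre-emphasis filter, reduces mean-MFCC features with PCA, and trains a random forest on them; the pre-emphasis coefficient, component count, file paths, and labels are illustrative assumptions, not part of the original system.

    import numpy as np
    import librosa
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier

    def preprocess(signal, coef=0.97):
        # Speech preprocessing: peak-normalize, then apply a pre-emphasis filter
        # as a simple form of signal enhancement.
        signal = signal / (np.max(np.abs(signal)) + 1e-9)
        return np.append(signal[0], signal[1:] - coef * signal[:-1])

    def features(path, sr=16000, n_mfcc=40):
        signal, _ = librosa.load(path, sr=sr)
        signal = preprocess(signal)
        return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

    # Placeholder labelled clips; a real run would use a full corpus such as SAVEE.
    X_raw = np.array([features(p) for p in ["savee/DC_f01.wav", "savee/DC_n01.wav"]])
    y = np.array(["fear", "neutral"])

    # Feature selection: keep the principal components that explain most of the variance.
    pca = PCA(n_components=min(10, len(X_raw)))
    X = pca.fit_transform(X_raw)

    # Emotion classification on the selected features.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, y)
    print(clf.predict(pca.transform([features("savee/DC_sa01.wav")])))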

 

 

 

APPLICATION:-

 

The Speech Emotion Analyzer is an application that detects the speaker’s emotions from their speech. It analyzes the speaker’s audio signal and displays the detected emotional state.

 

The application has the following features:

 

  • Real-time Emotion Detection: The application can monitor the speaker’s emotional state in real time, allowing it to provide instant feedback and responses based on the detected emotion.

 

  • Emotion Visualization: The app can visualize emotions using graphs and charts, allowing the user to understand the speaker’s emotional state at a glance (a sketch of this appears after this list).

 

  • Emotional History: The app tracks the speaker’s emotional history over time, allowing users to follow the speaker’s emotional state and identify changes in it.

 

  • Emotional Feedback: The app collects feedback on the correctness of the detected emotional state, allowing users to improve the app’s performance by flagging incorrect predictions.
  • Integration with Other Applications: The application can be integrated with other applications such as call centers, customer service centers, and mental health tools, allowing it to provide personalized responses and better communication based on the detected emotion.
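As an illustration of the visualization feature, the sketch below plots a bar chart of hypothetical per-emotion probabilities returned by a classifier; the emotion set and the probability values are placeholders, not real output.

    import matplotlib.pyplot as plt

    # Placeholder per-emotion probabilities, e.g. from a softmax output or predict_proba().
    emotions = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]
    probabilities = [0.05, 0.03, 0.07, 0.55, 0.15, 0.10, 0.05]

    plt.figure(figsize=(8, 4))
    plt.bar(emotions, probabilities, color="steelblue")
    plt.ylabel("Probability")
    plt.ylim(0, 1)
    plt.title("Detected emotional state")
    plt.tight_layout()
    plt.show()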

 

The Speech Emotion Analyzer is a powerful tool that can be used in a variety of applications such as call centers, customer service, and psychological testing. The app enables personalized responses and better communication by providing accurate feedback based on the speaker’s voice.

 

HARDWARE AND SOFTWARE REQUIREMENTS:-

 

HARDWARE:-
  • Processor: Intel Core i3 or higher.
  • RAM: 4 GB or more.
  • Hard disk: 250 GB or more.
  • Web camera
SOFTWARE:-
  • Operating System: Windows 7, 8, or 10.
  • Python
  • Anaconda
  • Spyder, Jupyter Notebook, Flask.
  • Ganache
