
Statistics for AIML - Regression Metrics - Overview Tutorial

Total Absolute Error = |Y1 – Ŷ1| + |Y2 – Ŷ2| + … + |Yn – Ŷn|, where Yi is the actual value and Ŷi is the predicted value.

Squared Error / Loss Function

A loss function is a way of evaluating how well your machine learning algorithm models the given data set. In other words, loss functions measure how good your model is at predicting the expected outcome.

 

Loss Function Vs Cost Function

A function that calculates the loss for a single data point is called a loss function, e.g. squared error:

Squared Error = (Yi – Ŷi)²

A function that calculates the loss over the entire data set is called a cost function, e.g. mean squared error (MSE):

MSE = (1/n) Σ (Yi – Ŷi)²
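As a minimal sketch of this distinction (the arrays of actual and predicted values below are hypothetical, just for illustration), the loss is computed for one data point while the cost averages it over the whole dataset:

import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # hypothetical actual values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # hypothetical predicted values

# Loss function: squared error for a single data point
loss_point_0 = (y_true[0] - y_pred[0]) ** 2

# Cost function: mean squared error over the entire dataset
mse = np.mean((y_true - y_pred) ** 2)

print(loss_point_0, mse)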

 

Lower values of MAE, MSE, and RMSE imply higher accuracy of a regression model.

Error (Cost Function)

  1. MAE – Mean Absolute Error

Mean Absolute Error represents the average of the absolute differences between the actual and predicted values in the dataset. It measures the average magnitude of the residuals.

MAE = (1/n) Σ |Yi – Ŷi|

Advantage –

Easy to understand

Same unit as the target variable

Robust to outliers

Disadvantage –

The graph is not differentiable at 0
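A minimal sketch of MAE on the same kind of hypothetical arrays, both by hand with NumPy and via scikit-learn's mean_absolute_error:

import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # hypothetical actual values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # hypothetical predicted values

# MAE = average of the absolute residuals
mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)   # both 0.75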

 

  2. MSE – Mean Squared Error

Mean Squared Error represents the average of the squared differences between the actual and predicted values in the data set. It measures the variance of the residuals.

MSE = (1/n) Σ (Yi – Ŷi)²

Advantage –

Often used as a loss function, because the graph is differentiable at all points

Easy to interpret

Has only one local minimum

Disadvantage –

Units are squared (not the same unit as the target variable)

Not robust to outliers (if there are outliers, the model will not perform well)
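A minimal sketch of MSE with the same hypothetical arrays, both by hand and via scikit-learn's mean_squared_error:

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MSE = average of the squared residuals (note: units are squared)
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)   # both 0.875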


 

  3. RMSE – Root Mean Squared Error

Root Mean Squared Error is the square root of Mean Squared Error. It measures the standard deviation of the residuals.

RMSE = √( (1/n) Σ (Yi – Ŷi)² )

 

Advantage – Same unit as the target variable; the graph is differentiable at all points

Disadvantage – Not robust to outliers
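A minimal sketch of RMSE, again on hypothetical data, taking the square root of the MSE so the result is back in the original units:

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# RMSE = square root of MSE
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(rmse)   # about 0.935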


 

  4. Huber Loss

We know that MSE is strongly influenced by outliers, while MAE largely ignores them. But what about something in the middle?

For example, if around 25% of the points in a dataset are abnormal, we cannot simply treat them all as outliers and discard them; Huber loss is used to handle such cases.

For residuals smaller than a threshold δ, Huber loss behaves like mean squared error; for larger residuals (i.e. outliers) it behaves like mean absolute error, so outliers are penalized less heavily.


 

Advantage – Differentiable at all points; robust to outliers

Disadvantage – The threshold δ (delta) is a hyperparameter that has to be tuned
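A minimal sketch of Huber loss, assuming a hypothetical threshold delta = 1.0: residuals at or below delta are penalized quadratically (MSE-like), larger ones linearly (MAE-like):

import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    residual = np.abs(y_true - y_pred)
    # Quadratic (MSE-like) term for small residuals
    quadratic = 0.5 * residual ** 2
    # Linear (MAE-like) term for large residuals
    linear = delta * residual - 0.5 * delta ** 2
    return np.mean(np.where(residual <= delta, quadratic, linear))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(huber_loss(y_true, y_pred))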

  5. Binary Cross Entropy Loss (Log Loss)

It is used for binary classification problems, where there are only two classes, e.g. 'YES' or 'NO'.

Advantage – Graph is differentiable at all points

Disadvantage – Multiple local minima (the loss surface is not convex when used with non-linear models such as neural networks)
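A minimal sketch of binary cross entropy on hypothetical 0/1 labels and predicted probabilities (scikit-learn's log_loss computes the same quantity):

import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Clip probabilities to avoid log(0)
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])          # hypothetical binary labels
p_pred = np.array([0.9, 0.2, 0.6, 0.8])  # predicted probabilities of class 1
print(binary_cross_entropy(y_true, p_pred))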

 

  6. Categorical Cross Entropy [Used in Softmax Regression] [Deep Learning]

It is used for multiclass classification problems, where there are more than two classes, e.g. Morning, Evening, Night.

The activation function used in the output layer is softmax (for example when training a multilayer perceptron), and the number of outputs should equal the number of classes.

 

What is the difference between Sparse Categorical Cross Entropy and Categorical Cross Entropy?

Both compute the same loss; the only difference is that Sparse Categorical Cross Entropy is faster, because the labels are supplied as integer class indices and there is no need to one-hot encode them.

Formula for the cost function:

Cost = –(1/n) Σi Σj yij · log(ŷij)

where yij is 1 if sample i belongs to class j and 0 otherwise, and ŷij is the predicted probability that sample i belongs to class j.
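A minimal sketch contrasting the two, with hypothetical softmax outputs for three classes (Morning, Evening, Night): categorical cross entropy takes one-hot labels, sparse categorical cross entropy takes the same labels as integer indices, and both give the same value:

import numpy as np

# Hypothetical softmax outputs for 3 classes: Morning, Evening, Night
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])

one_hot = np.array([[1, 0, 0],      # labels as one-hot vectors
                    [0, 1, 0],
                    [0, 0, 1]])
sparse = np.array([0, 1, 2])        # the same labels as integer class indices

# Categorical cross entropy (one-hot labels)
cce = -np.mean(np.sum(one_hot * np.log(probs), axis=1))

# Sparse categorical cross entropy (integer labels, no one-hot encoding needed)
scce = -np.mean(np.log(probs[np.arange(len(sparse)), sparse]))

print(cce, scce)   # identical values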

 

Accuracy

  1. R2 Score – Coefficient Of Determination

The coefficient of determination, or R-squared, represents the proportion of the variance in the dependent variable that is explained by the independent variable(s). The R2 score typically ranges from 0 to 1.

OR

The R2 score measures how much better the linear regression line fits the data than the mean line.

A higher R2 score is better than a lower one.

SSr = sum of squared errors around the regression line

SSm = sum of squared errors around the mean line

R2 = 1 – (SSr / SSm)

 

R2 = 1 – [ Σ (yi – ŷi)² / Σ (yi – ȳ)² ]

Disadvantage – R2 does not account for the number of input features; adding more inputs (even irrelevant ones) never decreases it, so it can be misleading for multiple-input models.

 

Here, yi is the observed target value, ŷi (y-hat) is the predicted value, ȳ (y-bar) is the mean of the observed values, and m represents the total number of observations.

 

The R2 score can also be negative, when the regression line fits the data worse than the mean line.
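A minimal sketch of R2 as a comparison between the regression line and the mean line, on the same hypothetical arrays; sklearn.metrics.r2_score gives the same result:

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

ss_r = np.sum((y_true - y_pred) ** 2)          # squared error around the regression line
ss_m = np.sum((y_true - y_true.mean()) ** 2)   # squared error around the mean line

r2_manual = 1 - ss_r / ss_m
print(r2_manual, r2_score(y_true, y_pred))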

 

  2. Adjusted R2 Score

Adjusted R-squared is a modified version of R-squared that is adjusted for the number of independent variables in the model; it will always be less than or equal to R². In the formula below, n is the number of observations in the data and k is the number of independent variables.

Adjusted R2 = 1 – [ (1 – R2) (n – 1) / (n – k – 1) ]

 

Why does the Adjusted R2 score come into the picture if we already have the R2 score?

Suppose we add an irrelevant column, e.g. temperature, to a cgpa–package dataset. Then k increases by 1 and the denominator (n – k – 1) decreases, while R2 barely changes, so the Adjusted R2 decreases, which is the desired behaviour; the plain R2 score, however, would stay the same or even increase. That is why the Adjusted R2 score is used.

But if we add a relevant column, e.g. IQ, to the cgpa–package dataset, then k again increases by 1 and the denominator decreases, but (1 – R2) also decreases, and it decreases much faster than the denominator. Hence the Adjusted R2 score increases in this case.
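A minimal sketch of the adjustment, assuming a hypothetical R2 of 0.85 from a model with n = 50 observations and k = 3 independent variables:

def adjusted_r2(r2, n, k):
    # Penalize R2 for the number of independent variables k
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(r2=0.85, n=50, k=3))   # slightly below 0.85
# Adding an irrelevant feature (k -> 4) with no change in R2 lowers the adjusted score further
print(adjusted_r2(r2=0.85, n=50, k=4))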

