Machine Learning - Supervised Learning - Information Gain Tutorial

Information gain is a measure of how much information a feature (node) provides about a class. It helps determine the order of attributes in the nodes of a decision tree.

Information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain.

Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]

Written out, Gain(S, A) = Entropy(S) − Σv (|Sv| / |S|) × Entropy(Sv), where Sv is the subset of S in which attribute A takes the value v.

 

Example:
Now, let's draw a decision tree for the following data using information gain.

Training set: 3 features (X, Y, Z) and 2 classes (I and II)

X | Y | Z | C
--+---+---+---
1 | 1 | 1 | I
1 | 1 | 0 | I
0 | 0 | 1 | II
1 | 0 | 0 | II


To build a decision tree using information gain, we take each feature in turn and calculate the information gain obtained by splitting on it.

 

Split on feature X

https://media.geeksforgeeks.org/wp-content/uploads/tr4.png

Split on feature Y

https://media.geeksforgeeks.org/wp-content/cdn-uploads/20210317184559/y-attribute.png

Split on feature Z

https://media.geeksforgeeks.org/wp-content/cdn-uploads/20210317184631/z-attribute.png
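
If the linked images don't render, the three splits can be checked numerically. Below is a minimal Python sketch of the calculation; the entropy and information_gain helpers are our own, written just for this toy dataset:

from collections import Counter
from math import log2

# The four training rows from the table above: (X, Y, Z, class)
data = [
    (1, 1, 1, "I"),
    (1, 1, 0, "I"),
    (0, 0, 1, "II"),
    (1, 0, 0, "II"),
]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature_index):
    """Entropy of the full set minus the weighted entropy after the split."""
    labels = [row[-1] for row in rows]
    weighted = 0.0
    for value in set(row[feature_index] for row in rows):
        subset = [row[-1] for row in rows if row[feature_index] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - weighted

for i, name in enumerate("XYZ"):
    print(f"IG(split on {name}) = {information_gain(data, i):.3f}")
# IG(split on X) = 0.311
# IG(split on Y) = 1.000
# IG(split on Z) = 0.000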

 

From the images and the calculation above, the information gain is maximum when we split on feature Y, so feature Y is the best-suited feature for the root node. After splitting the dataset on Y, each child node contains a pure subset of the target variable, so no further splitting is needed.

The final tree for the above dataset looks like this:
https://media.geeksforgeeks.org/wp-content/uploads/tr6.png

 

  • Gini index/impurity

For binary classification it always lies between 0 and 0.5, where 0 means all samples in the node belong to the same class, and 0.5 means the samples are split exactly evenly between the two classes.

  • The Gini index is a measure of impurity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
  • An attribute with a low Gini index should be preferred over one with a high Gini index.
  • CART creates only binary splits, and it uses only the Gini index to choose them.
  • The Gini index can be calculated using the formula below:

Gini Index = 1 − (Py² + Pn²)

where Py and Pn are the proportions of positive ("yes") and negative ("no") samples in the node.
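
As a quick sanity check of the formula, here is a minimal Python sketch (the gini helper name is ours):

def gini(p_yes, p_no):
    """Gini index of a binary node: 1 - (Py^2 + Pn^2)."""
    return 1 - (p_yes ** 2 + p_no ** 2)

print(gini(1.0, 0.0))  # 0.0 -> pure node: all samples in one class
print(gini(0.5, 0.5))  # 0.5 -> maximally impure: an exact 50/50 split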

 

 

  • Pruning: Getting an Optimal Decision Tree

 

Pruning is the process of deleting unnecessary nodes from a tree, without reducing accuracy, in order to obtain an optimal decision tree.

Pruning helps reduce the risk of overfitting a large tree by shrinking it into a smaller tree of the most important features.

Pruning can occur in:

  • Top-down fashion: traverses nodes and trims subtrees starting at the root
  • Bottom-up fashion: begins at the leaf nodes

There are two main pruning techniques:

  • Cost Complexity Pruning
  • Reduced Error Pruning

A popular pruning algorithm is reduced error pruning, in which:

  • Starting at the leaves, each node is replaced with its most popular class
  • If the prediction accuracy is not affected, the change is kept
  • The approach has the advantage of simplicity and speed
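
In practice, cost complexity pruning (the first technique above) is available out of the box in scikit-learn's CART implementation via the ccp_alpha parameter. Here is a minimal sketch, using the Iris dataset purely for illustration:

# A minimal sketch of cost complexity pruning with scikit-learn's CART trees.
# The dataset and the alphas explored are illustrative, not from this tutorial.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compute the effective alphas at which subtrees get pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Train one tree per alpha; a larger alpha means more aggressive pruning.
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={tree.get_n_leaves()}  "
          f"test accuracy={tree.score(X_test, y_test):.3f}")

A common choice is the alpha that maximizes test (or cross-validated) accuracy, which usually yields a much smaller tree than the unpruned one.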

 

 
