Machine Learning - Unsupervised Learning - K-Means Clustering Tutorial

K-Means clustering is one of the most commonly used unsupervised learning algorithms for finding groups in unlabeled data. Here K is the number of groups, or clusters, and the process of forming these groups is called clustering.

The algorithm is iterative: its steps are repeated until it converges, that is, until the clusters stop changing.

Figure: Inner working of K-Means

Steps to achieve clustering-

Step 1] - Decide the number of clusters you want to identify in your data. This is the K in K-Means clustering.

Step 2] - Randomly initialize K points, the centroids, in the dataset (one per cluster). A common choice is to pick K data points at random as the starting centroids.

Step 3] - Assign each data point to the cluster whose centroid is nearest to it (usually measured by Euclidean distance).

Step 4] - Recompute each centroid as the mean (average) of the points assigned to its cluster in step 3. The centroids usually move somewhat from their earlier positions.

If any centroid moved, repeat steps 3 and 4 until the algorithm converges (the centroids stop moving).

Step 5] - Once the centroids stop moving, the final clusters and their centroids are formed.
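The five steps above can be sketched in Python with NumPy. This is a minimal illustration of the standard iteration, assuming Euclidean distance and initialization from K randomly chosen data points; the function name `kmeans` and its parameters `max_iters` and `seed` are hypothetical, not part of the original tutorial.

```python
import numpy as np


def kmeans(X, k, max_iters=100, seed=0):
    """Minimal K-Means sketch (hypothetical helper). Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 2: use k distinct, randomly chosen data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        # Step 3: distance from every point to every centroid, shape (n_points, k);
        # each point joins the cluster of its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its old centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Step 1: choose K; the made-up data below has two visible groups, so K = 2
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

Because the same array code handles any number of columns, this sketch works unchanged for data of any dimension. Different seeds can give different starting centroids, which is why practical implementations usually run K-Means several times and keep the best result.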

How to choose the number of clusters?

To choose the number of clusters, we can use-

1] Visualization-

If we have domain knowledge and a proper understanding of the data, we can make a more informed decision about K.

In low-dimensional data (2-D or 3-D), some clusters are visible to the naked eye in a scatter plot.

2] Elbow method

The elbow method is a widely used method for finding the number of clusters in a dataset. It involves running K-Means clustering on the dataset several times.

It works by plotting the number of clusters on the x-axis against the WSS (within-cluster sum of squared distances), also called inertia, on the y-axis.

We run the K-Means algorithm in a loop with an increasing number of clusters (say from 1 to 10), record the WSS of each run, and plot WSS against the number of clusters.

For one cluster (one centroid), WSS is the sum of the squared Euclidean distances between each point and the centroid.

For example, with 1 centroid and 50 data points, $\text{WSS} = d_1^2 + d_2^2 + d_3^2 + \dots + d_{50}^2$, where $d_i$ is the distance from the $i$-th point to the centroid.
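For K clusters, the same idea generalizes: every point contributes the squared distance to the centroid of its own cluster. In the formula below, $C_j$ denotes the $j$-th cluster and $\mu_j$ its centroid (symbols introduced here for illustration). Setting $K = 1$ gives back $d_1^2 + \dots + d_{50}^2$ with $d_i = \lVert x_i - \mu_1 \rVert$.

```latex
\text{WSS} = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```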

With 50 centroids and 50 data points, WSS = 0, because every centroid is itself a data point, so each point's distance to its centroid is zero.

So WSS keeps decreasing as the number of clusters increases, which is why the smallest WSS alone cannot be used to choose K.
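The two extreme cases above can be checked numerically. The sketch below assumes 50 synthetic 2-D points drawn at random (not data from the tutorial) and a hypothetical helper `wss`: with one centroid at the mean the WSS is positive, and with every point acting as its own centroid it is exactly 0.

```python
import numpy as np


def wss(X, labels, centroids):
    """Within-cluster sum of squared Euclidean distances (hypothetical helper)."""
    # centroids[labels] pairs each point with the centroid of its own cluster
    return float(((X - centroids[labels]) ** 2).sum())


rng = np.random.default_rng(42)
X = rng.normal(size=(50, 2))  # 50 synthetic 2-D data points

# K = 1: a single centroid at the mean, WSS = d1^2 + d2^2 + ... + d50^2
one_label = np.zeros(len(X), dtype=int)
wss_1 = wss(X, one_label, X.mean(axis=0, keepdims=True))

# K = 50: every point is its own centroid, so every distance is 0
own_labels = np.arange(len(X))
wss_50 = wss(X, own_labels, X)

print(wss_1, wss_50)
```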

After a certain point, the curve flattens out and bends like an elbow; the number of clusters at this elbow point is chosen as K. In the graph below, the number of clusters can be determined as 3.

Figure: K-Means elbow method (WSS vs. number of clusters)
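The elbow procedure can be sketched with scikit-learn, whose `KMeans` estimator exposes the WSS of a fitted model as `inertia_`. The synthetic three-group dataset from `make_blobs` and the parameter choices (`cluster_std=0.6`, `n_init=10`, `random_state=0`) are assumptions for illustration only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 well-separated groups (illustrative assumption)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

ks = range(1, 11)  # try K = 1 to 10, as described in the text
wss = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss.append(model.inertia_)  # inertia_ is the WSS of the fitted clustering

for k, w in zip(ks, wss):
    print(f"K={k:2d}  WSS={w:10.2f}")
```

Plotting `wss` against `ks` (for example with matplotlib's `plt.plot(ks, wss)`) gives the elbow curve; on data like this the drop in WSS should level off around K = 3.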

However, sometimes the plot has no clear elbow point, and in such cases it is very hard to finalize the number of clusters.

See video number 103

Advantages-

  • One of the simplest algorithms to understand, and computationally efficient
  • It works on data of any dimension with the same piece of code.
  • Gives better results when the clusters overlap less

Disadvantages-

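  • The number of clusters must be specified by the user
  • Doesn’t work well when clusters overlap
  • Sensitive to noisy data and outliers, because each centroid is the mean of its cluster
  • Performs poorly on data whose clusters are not linearly separable, such as non-convex shapes like concentric rings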
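The sensitivity to outliers follows from step 4: a centroid is the mean of its cluster, and a mean is pulled strongly by a single extreme value. A tiny numeric sketch with made-up 1-D values:

```python
import numpy as np

cluster = np.array([1.0, 2.0, 3.0])       # a compact 1-D cluster (made-up values)
with_outlier = np.append(cluster, 100.0)  # the same cluster plus one outlier

print(cluster.mean())       # centroid without the outlier -> 2.0
print(with_outlier.mean())  # centroid dragged toward the outlier -> 26.5
```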

Compare K-Means and KNN Algorithms.

K-Means

  • K-Means is unsupervised
  • K-Means is a clustering algorithm
  • K represents the number of clusters
  • The points in each cluster are similar to each other, and each cluster is different from its neighboring clusters

KNN

  • KNN is supervised in nature
  • KNN is a classification algorithm
  • K represents the number of nearest neighbors
  • It classifies an unlabeled observation based on the labels of its K nearest neighbors (K can be any positive number)
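The difference shows up directly in how the two are trained. In the scikit-learn sketch below (the toy points and the labels "small"/"large" are made up for illustration), `KMeans` is fitted on the features alone and invents its own cluster ids, while `KNeighborsClassifier` needs the true labels `y` and predicts one of them for a new point by a majority vote of its K nearest neighbors.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
y = np.array(["small", "small", "small", "large", "large", "large"])  # used by KNN only

# K-Means: unsupervised, K = number of clusters; fit() sees only X
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # cluster ids 0/1; which group gets which id is arbitrary

# KNN: supervised, K = number of nearest neighbors; fit() needs X and y
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[1.1, 0.9], [7.9, 8.0]]))
```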