
K-Nearest Neighbors — A Complete Guide

Harshit Maheshwari
4 min read · Apr 4, 2021

K-Nearest Neighbors (KNN) is a simple, easy-to-use supervised machine learning algorithm. It can be used for both classification and regression problems; the difference lies in the nature of the dependent variable. In classification, the dependent variable is categorical, whereas in regression it is continuous. We will look at both in detail with an example.

KNN is a lazy algorithm: it performs no training when the data is passed in, it simply stores that data, and the real work happens only when a query arrives. KNN works by measuring the distance between the query and every point in the stored data, selecting the specified number (k) of points closest to the query, and then either taking the most frequent label among them (in the case of classification) or averaging their labels (in the case of regression).

It is also a non-parametric algorithm, that is, it makes no assumptions about the underlying data distribution.
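Because KNN stores the training data and defers all work to query time, the whole procedure fits in a few lines. Here is a minimal NumPy sketch covering both modes; the function name knn_predict and the toy data are illustrative, not from the article.

import numpy as np

def knn_predict(X_train, y_train, query, k=3, task="classification"):
    # Euclidean distance from the query to every stored training point
    distances = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k training points closest to the query
    nearest = np.argsort(distances)[:k]
    neighbor_labels = y_train[nearest]
    if task == "classification":
        # Majority vote among the k nearest labels
        values, counts = np.unique(neighbor_labels, return_counts=True)
        return values[np.argmax(counts)]
    # Regression: average the k nearest labels
    return neighbor_labels.mean()

# Toy example: two clusters labelled 0 and 1
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.1, 0.9])))  # prints 0

Notice that all the cost is paid at prediction time: each call scans the entire training set, which is exactly what makes KNN "lazy".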

KNN Classifier:
We will use the Breast Cancer Wisconsin dataset. This is a classification problem where the aim is to classify each instance as either malignant or benign based on the following 10 features (a scikit-learn sketch follows the list):

  1. radius (mean of distances from center to points on the perimeter)
  2. texture (standard deviation of gray-scale values)
  3. perimeter
  4. area
  5. smoothness (local variation in radius lengths)
  6. compactness (perimeter^2 / area - 1.0)
  7. concavity (severity of concave portions of the contour)
  8. concave points (number of concave portions of the contour)
  9. symmetry
  10. fractal dimension ("coastline approximation" - 1)
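A convenient way to run this experiment is with scikit-learn, which bundles the dataset. The sketch below is one possible setup; the 80/20 split, n_neighbors=5, and random_state=42 are illustrative choices rather than the article's exact settings.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Load the Breast Cancer Wisconsin dataset bundled with scikit-learn
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale the features: KNN is distance-based, so features with large
# raw ranges (e.g. area) would otherwise dominate the distance
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Fit the classifier (k=5 neighbors) and evaluate on the held-out set
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(f"Test accuracy: {knn.score(X_test, y_test):.3f}")

Scaling matters here because the distance metric treats every feature equally; without it, area (in the hundreds) would swamp smoothness (well below 1).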

