K-Nearest Neighbors — A Complete Guide

K-Nearest Neighbors is a simple and easy-to-use supervised machine learning algorithm. It can be used for both classification and regression problems; the difference lies in the dependent variable. In classification KNN the dependent variable is categorical, whereas in regression KNN it is continuous. We will look at both of these in detail with an example.
KNN is a lazy algorithm: it does not perform any training when the data is passed in, it simply stores the data, and the real work is done when a query arrives. KNN works by measuring the distance between the query and every point in the training data, selecting the specified number of closest points, and then predicting either the most frequent label among them (in the case of classification) or the average of their labels (in the case of regression).
It is also a non-parametric algorithm, that is, it makes no assumptions about the underlying data distribution.
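The lazy-learning procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration of the idea, not the library implementation; `knn_predict` and its parameter names are made up for this sketch:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k, task="classification"):
    # Distance from the query to every stored training point (Euclidean)
    distances = [math.dist(x, query) for x in train_X]
    # Indices of the k closest training points
    nearest = sorted(range(len(train_X)), key=lambda i: distances[i])[:k]
    labels = [train_y[i] for i in nearest]
    if task == "classification":
        # Majority vote among the k nearest labels
        return Counter(labels).most_common(1)[0][0]
    # Regression: average of the k nearest labels
    return sum(labels) / k

# Two well-separated clusters as toy training data
train_X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # → benign
```

Note that "training" here amounts to holding references to `train_X` and `train_y`; all computation happens at query time, which is exactly what makes KNN lazy.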
KNN Classifier:
We will use the breast cancer Wisconsin dataset. This is a classification problem where the aim is to classify instances as either malignant or benign based on the following 10 features:
- radius (mean of distances from center to points on the perimeter)
- texture (standard deviation of gray-scale values)