KNN vs. K-Means
When you look at
the names of the KNN and k-means algorithms, you might ask whether k-means is
related to the k-nearest neighbors algorithm. One could easily make the mistake of
saying they're related: after all, they both have a "k" in their
names, and they're both machine learning algorithms that find
ways to label things, even if not the same kinds of things. But
the two k's refer to completely different things. The "k" in
k-means has absolutely nothing to do with the "k" in KNN.
K-Nearest Neighbors:
The k-nearest-neighbors
algorithm is a classification algorithm, and it is supervised:
it takes a set of labeled points and uses them to learn how to
label other points. To label a new point, it looks at the labeled points
closest to that new point (those are its nearest neighbors) and has those
neighbors vote: whichever label most of the neighbors have becomes the label
for the new point. The "k" is the number of neighbors it checks. It is
supervised because you are classifying a point based on the known
classification of other points. For example:
If I have a dataset of
soccer players with their positions and measurements, and I want to assign
positions to soccer players in a new dataset where I have measurements but no
positions, I might use k-nearest neighbors.
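To make the voting concrete, here is a minimal sketch of k-nearest neighbors in plain Python. The player measurements and position labels are made-up numbers, and `knn_predict` is a hypothetical helper name, not a real library function:

```python
from collections import Counter
import math

def knn_predict(labeled_points, new_point, k=3):
    """Label new_point by majority vote of its k nearest labeled neighbors."""
    # Sort labeled points by Euclidean distance to the new point, keep the k closest
    neighbors = sorted(labeled_points, key=lambda p: math.dist(p[0], new_point))[:k]
    # Majority vote among the k nearest neighbors' labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data: (height_cm, weight_kg) -> position (invented for illustration)
players = [
    ((185, 80), "goalkeeper"),
    ((188, 85), "goalkeeper"),
    ((170, 65), "midfielder"),
    ((172, 68), "midfielder"),
    ((168, 64), "midfielder"),
]

print(knn_predict(players, (171, 66), k=3))  # prints "midfielder"
```

The three players closest to the new measurements are all midfielders, so the vote is unanimous and the new player is labeled "midfielder".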
K-means:
The k-means algorithm is a clustering algorithm, and it is unsupervised: it takes a set of unlabeled points and tries to group them into clusters. The "k" is the number of clusters.
It is unsupervised
because the points have no external classification.
The k in k-means is
the number of clusters I want to have in the end. If k = 5, I will have 5
clusters, or distinct groups, of soccer players after I run the algorithm on my
dataset.
For example, if I have a dataset of soccer players who need to be grouped into k distinct groups based on similarity, I might use k-means.
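The grouping step can be sketched with the classic Lloyd's iteration: assign each point to its nearest center, then move each center to the mean of its assigned points. The measurements below are made-up numbers, and `kmeans` is a hypothetical helper, not a real library function:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Group unlabeled points into k clusters (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k random points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
    return centers, clusters

# Toy data: (height_cm, weight_kg) measurements forming two obvious groups
points = [(170, 65), (172, 68), (168, 64), (186, 82), (188, 85), (185, 80)]
centers, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # prints [3, 3]
```

Note that the algorithm never sees any labels: it recovers the two groups purely from how close the measurements are to each other.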
Correspondingly, the k in each case means something different! In k-nearest neighbors, the k represents the number of neighbors who get a vote in determining a new player's position. Take the example where k = 4: if I have a new soccer player who needs a position, I take the 4 players in my dataset whose measurements are closest to the new player's, and I have them vote on the position I should assign.
In summary, they are two different algorithms with two very different end results, but the fact that they both use k can be very confusing!