Topics Learnt Today:
1: K-Means:
- Strengths:
K-means excels at finding clusters of data points that lie close to each other in space. It partitions the data into a predetermined number of clusters, k, each represented by a centroid. The algorithm alternates between assigning every point to its nearest centroid and moving each centroid to the mean of the points assigned to it. It works best when clusters are of roughly uniform size and shape, which makes it well suited to data whose clusters are roughly spherical or share similar geometries.
- Weaknesses:
K-means struggles with irregularly shaped clusters because it implicitly assumes that clusters are spherical and of similar size. When clusters are elongated or differ greatly in size, K-means can produce suboptimal partitions, assigning points to the wrong cluster.
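The assign-and-update loop described above can be sketched in plain Python (Lloyd's algorithm). This is a minimal illustration under stated assumptions: points are tuples of floats, distance is Euclidean, and initial centroids are k randomly sampled input points. `kmeans` and its parameters are hypothetical names, not a library API; real projects would typically use a library implementation with a smarter initialization such as k-means++.

```python
import math
import random

# Hypothetical helper for illustration; not a library API.
def kmeans(points, k, iters=100, seed=0):
    """Cluster `points` into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: simple random initialization from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids


# Usage: two compact, well-separated groups (the case K-means handles well).
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Which group gets label 0 depends on the random initialization, so only the grouping itself is meaningful.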
2: DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
- Strengths:
DBSCAN excels at finding clusters of varying shapes and sizes. It grows clusters outward from regions where points are densely packed, so a cluster can take any shape as long as its points are connected through dense neighborhoods.
DBSCAN is also particularly adept at identifying outliers: points that do not lie in, or next to, a sufficiently dense region are labeled as noise instead of being forced into a cluster.
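The density-based idea can be sketched with a brute-force (O(n²)) pure-Python DBSCAN. Assumptions: Euclidean distance, parameter names `eps` (neighborhood radius) and `min_samples` (points needed inside that radius, counting the point itself), and -1 as the noise label. `dbscan` and `region_query` are hypothetical helpers for illustration, not a library API.

```python
import math

NOISE = -1

# Hypothetical helper: indices of all points within eps of points[i], including i.
def region_query(points, i, eps):
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

# Hypothetical helper for illustration; not a library API.
def dbscan(points, eps, min_samples):
    labels = [None] * len(points)  # None = not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        to_visit = list(neighbors)
        while to_visit:
            j = to_visit.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:  # j is also core: keep growing
                to_visit.extend(j_neighbors)
    return labels


# Usage: two dense squares plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
print(dbscan(points, eps=1.5, min_samples=3))
# → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point ends up labeled -1 (noise) rather than being attached to the nearest cluster, which is the behavior K-means cannot provide.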
- Weaknesses:
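DBSCAN may require careful tuning of its two parameters to yield good results: the radius of the neighborhood around each point (often called eps) and the minimum number of points that must fall within that radius for a region to count as dense (often called MinPts or min_samples). Inappropriate choices can lead to under-segmentation (distinct clusters merged into one when eps is too large) or over-segmentation (clusters split apart, or many points labeled as noise, when eps is too small).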
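One common heuristic for choosing the radius is a "k-distance" curve: compute each point's distance to its k-th nearest neighbor, sort those distances, and pick a radius near the point where the curve bends sharply upward. The sketch below assumes Euclidean distance and a k tied to the minimum-points setting; `k_distances` is a hypothetical helper, and the heuristic only suggests a starting value, it does not guarantee a good one.

```python
import math

# Hypothetical helper: sorted distances from each point to its k-th nearest other point.
def k_distances(points, k):
    dists = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dists.append(others[k - 1])  # k-th nearest neighbor, excluding the point itself
    return sorted(dists)


# Usage: two dense squares plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
print([round(d, 2) for d in k_distances(points, k=2)])
# → [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 10.82]
```

Here the curve jumps from 1.0 to about 10.8, so a radius a little above 1.0 keeps the two dense groups intact while leaving the isolated point as noise.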