Unsupervised Machine Learning
Overview
Definition
: Unsupervised machine learning involves training algorithms on data without labeled responses.
Goal
: Discover underlying patterns, structures, and anomalies in the data.
Applications
:
Exploratory data analysis.
Preprocessing step for supervised learning.
Clustering
What is Clustering?
Definition
: Clustering is the task of dividing a set of data points into groups (clusters) such that points in the same group are more similar to each other than to those in other groups.
Applications
: Market segmentation, document clustering, image segmentation, etc.
K-Means Clustering
Algorithm
:
1. Initialize K cluster centroids randomly.
2. Assign each data point to the nearest centroid.
3. Recompute each centroid as the mean of all points assigned to its cluster.
4. Repeat steps 2-3 until convergence (the assignments no longer change).
Objective Function
:
$$J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$
where $\mu_k$ is the centroid of cluster $C_k$.
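The algorithm and its objective can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `assign` and `kmeans`, the `n_iters` and `seed` parameters, and the choice to initialize centroids from randomly picked data points are all assumptions made for the example.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-Means sketch. Returns (centroids, labels, J)."""
    rng = np.random.default_rng(seed)
    # Initialize: pick k distinct data points at random (one common choice).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assign each data point to the nearest centroid.
        labels = assign(X, centroids)
        # Recompute each centroid as the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Stop once the centroids no longer move.
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    labels = assign(X, centroids)
    # Objective: sum of squared distances from each point to its centroid.
    J = float(((X - centroids[labels]) ** 2).sum())
    return centroids, labels, J

# Usage on two synthetic, well-separated blobs (made-up data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
centroids, labels, J = kmeans(X, k=2)
print(centroids, J)
```

Because the result depends on the random initialization, a common practice is to run the procedure several times with different seeds and keep the run with the lowest objective value.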
Anomaly Detection
What is Anomaly Detection?
Definition
: Identifying rare items, events, or observations that raise suspicions by differing significantly from the majority of the data.
Applications
: Fraud detection, network security, fault detection in systems.
New Methods
Machine Learning Methods
Isolation Forest
:
Repeatedly select a random feature and a random split value to partition the data.
Anomalies are isolated in fewer splits than normal points.
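The isolation idea can be illustrated with a simplified sketch in plain Python. It only counts how many random splits are needed to isolate a point and averages that over many trials; a full Isolation Forest also subsamples the data and converts path lengths into a normalized score, which is omitted here. The names `isolation_depth` and `mean_isolation_depth`, and the parameters `max_depth`, `n_trees`, and `seed`, are hypothetical.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Number of random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Randomly select a feature, then a split value between its min and max.
    feature = rng.randrange(len(point))
    lo = min(row[feature] for row in data)
    hi = max(row[feature] for row in data)
    if lo == hi:
        return depth  # this feature cannot separate the remaining rows
    split = rng.uniform(lo, hi)
    # Keep only the rows that fall on the same side of the split as `point`.
    same_side = [row for row in data
                 if (row[feature] < split) == (point[feature] < split)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)

def mean_isolation_depth(point, data, n_trees=200, seed=0):
    """Average isolation depth over many random trials (lower = more anomalous)."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees

# Usage: 100 made-up points around the origin plus one far-away point.
gen = random.Random(1)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(100)] + [(8.0, 8.0)]
print(mean_isolation_depth((8.0, 8.0), data))  # far-away point
print(mean_isolation_depth(data[0], data))     # one of the clustered points
```

The far-away point should need noticeably fewer splits on average than a point inside the dense region, which is the signal an Isolation Forest uses.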
Clustering-Based Methods
DBSCAN
(Density-Based Spatial Clustering of Applications with Noise):
Points in low-density regions that belong to no cluster are labeled as noise and treated as anomalies.
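A compact sketch of the density idea in NumPy follows. It assumes the usual DBSCAN definitions: a core point has at least `min_samples` points (itself included) within distance `eps`, clusters grow outward from core points, and anything left unlabeled is noise, marked `-1`. The function name `dbscan` and the toy grid data are made up for the example, and the pairwise distance matrix makes it suitable only for small datasets.

```python
import numpy as np

def dbscan(X, eps, min_samples):
    """Return one cluster label per point; -1 marks noise (candidate anomalies)."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Neighborhood of each point (includes the point itself).
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from this unlabeled core point.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    frontier.append(q)
        cluster += 1
    return labels

# Usage: a dense 5x5 grid plus one isolated point far from it.
grid = np.array([[0.5 * i, 0.5 * j] for i in range(5) for j in range(5)])
X = np.vstack([grid, [[10.0, 10.0]]])
labels = dbscan(X, eps=0.6, min_samples=4)
print(np.flatnonzero(labels == -1))  # indices flagged as noise
```

In this toy setup only the isolated point should be flagged; in practice the result is sensitive to `eps` and `min_samples`, which usually need tuning per dataset.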
Principal Component Analysis (PCA)
What is PCA?
Definition
: PCA is a dimensionality reduction technique that transforms data into a new coordinate system whose axes (the principal components) are ordered by the amount of variance they capture.
Goal
: Reduce the number of dimensions while preserving as much variance as possible.
PCA Steps
Standardize the Data
:
Ensure each feature has zero mean and unit variance.
Compute the Covariance Matrix
:
Measure how features vary with respect to each other.
Compute Eigenvalues and Eigenvectors
:
The eigenvectors give the directions of maximum variance; the eigenvalues give the variance along each direction.
Project the Data
:
Project the data onto the new axes defined by the eigenvectors with the largest eigenvalues.
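The four steps map closely onto a few NumPy calls. The sketch below is one way to write them, assuming no feature is constant (otherwise standardization would divide by zero); the function name `pca` and the synthetic data are illustrative only.

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA sketch. Returns (scores, eigenvalues, components)."""
    # Standardize: zero mean and unit variance per feature.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance matrix of the standardized features.
    cov = np.cov(Z, rowvar=False)
    # Eigen-decomposition; eigh is meant for symmetric matrices and
    # returns eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project onto the eigenvectors with the largest eigenvalues.
    components = eigvecs[:, :n_components]
    return Z @ components, eigvals, components

# Usage: three made-up features, two of them strongly correlated.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a,
                     2 * a + rng.normal(scale=0.1, size=200),
                     rng.normal(size=200)])
scores, eigvals, components = pca(X, n_components=2)
print(eigvals / eigvals.sum())  # share of variance captured by each component
```

The ratio printed at the end is a common way to decide how many components to keep: components are added until the cumulative share reaches a chosen threshold.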
Applications of PCA
Visualization
:
Reducing data to 2 or 3 dimensions for plotting.
Noise Reduction
:
Discarding components with low variance, which often capture mostly noise.
Feature Extraction
:
Combining original features into a smaller set of uncorrelated features.
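Noise reduction and feature extraction can be illustrated together. The sketch below uses the singular value decomposition of the centered data, a standard alternative route to the principal directions, and skips standardization for brevity; the function name `pca_denoise` and the synthetic line data are assumptions made for the example.

```python
import numpy as np

def pca_denoise(X, n_components):
    """Reconstruct X from its top principal components only.

    Returns (reconstruction, scores)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:n_components]
    scores = Xc @ top.T  # extracted features, uncorrelated with each other
    return scores @ top + mean, scores

# Usage: points on the line y = 2x with a little added noise (made-up data).
rng = np.random.default_rng(0)
t = rng.uniform(-3, 3, size=200)
clean = np.column_stack([t, 2 * t])
noisy = clean + rng.normal(scale=0.1, size=clean.shape)
denoised, scores = pca_denoise(noisy, n_components=1)
print(((noisy - clean) ** 2).sum(), ((denoised - clean) ** 2).sum())
```

Keeping one component discards the low-variance direction perpendicular to the line, so the reconstruction should sit closer to the clean data than the noisy input does.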
Summary
Clustering
groups similar data points together.
Anomaly Detection
identifies data points that differ significantly from the majority.
PCA
reduces the dimensionality of data while preserving variance.
Conclusion
Unsupervised machine learning techniques are powerful tools for discovering hidden patterns and structures in data.
Clustering, anomaly detection, and PCA each have unique applications and methodologies.
We can pair unsupervised learning with supervised learning to build more robust models.
Exercise
https://shorturl.at/J9hGc