Unsupervised learning is a branch of machine learning that deals with unlabeled data. Without any human supervision, unsupervised learning tries to find patterns, structures, and relationships in the input data.
- Imagine having a large dataset of customer transactions without any predefined customer segments. You start grouping the customers based on their purchasing habits.
How It Works
Unsupervised learning works by analyzing unlabeled data to identify patterns and relationships. Because the data is not labeled with any predefined categories or outcomes, the algorithm must find these patterns and relationships on its own.
Different Techniques Under Unsupervised Learning
There are three key types of unsupervised learning.
- Clustering
- Dimensionality Reduction
- Association Rule Mining
Clustering
Clustering is the most commonly used technique under unsupervised learning. It entails grouping data points into clusters based on their similarities and differences. When two instances appear in different groups, we can infer they have dissimilar properties.
- Imagine as a parent, you give your child a set of toy building blocks in different shapes. Without giving any instructions, you ask them to arrange the blocks however they like. Left to explore on their own, the child might group the blocks by color, by shape, or even invent their own system. There’s no right or wrong way; the child is simply finding patterns based on what they observe.
Clustering In Customer Segmentation Or Anomaly Detection
Just like the child grouped blocks based on patterns they noticed without being told what those patterns should be, clustering algorithms group data points based on similarity.
In customer segmentation, businesses use clustering to group customers with similar behaviors, such as spending habits, browsing history, or product preferences. For example, one cluster might include high-spending customers who shop frequently, while another might consist of budget-conscious shoppers who only buy during sales. These insights help companies tailor marketing strategies to each group.
In anomaly detection, clustering helps identify data points that don’t fit well into any cluster, just as if the child found a strange block that didn’t match any shape or color and set it aside.
In a banking context, this could be a transaction that doesn’t resemble any known customer behavior, flagging it as potentially fraudulent.
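The banking scenario above can be sketched with a toy distance-based check: a transaction that sits far from the bulk of the data gets flagged. The transactions, the `(amount, frequency)` features, and the 2x-mean-distance cutoff below are all invented for illustration, not a production fraud rule.

```python
# Toy anomaly check: flag transactions far from the overall centroid.
# Data and threshold are made up for illustration.

def centroid(points):
    """Mean point of a list of (amount, frequency) pairs."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

def distance(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

transactions = [(20, 5), (25, 4), (22, 6), (18, 5), (500, 1)]  # last one is unusual
c = centroid(transactions)
dists = [distance(t, c) for t in transactions]
cutoff = 2 * sum(dists) / len(dists)  # flag points more than 2x the mean distance away
anomalies = [t for t, d in zip(transactions, dists) if d > cutoff]
print(anomalies)  # the (500, 1) transaction stands out
```

Real systems would use a proper clustering or density model (e.g. DBSCAN) rather than a single centroid, but the idea is the same: points that resemble no group are suspicious.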
Common Clustering Algorithms
| Algorithm | Description |
| --- | --- |
| K-means | Groups data into K clusters based on how close the points are to each other |
| Hierarchical | Creates clusters by merging or splitting groups to build a tree of clusters, called a dendrogram |
| Spectral | Groups data by analyzing connections between points using graphs |
| Mean-Shift | Discovers clusters by moving points towards the most crowded areas |
| Density-Based (DBSCAN) | Identifies clusters in dense areas and treats scattered points as noise |
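To make the first row of the table concrete, here is a minimal K-means sketch in plain Python: it alternates between assigning each point to its nearest center and moving each center to the mean of its cluster. The points and starting centers are invented; a real implementation (e.g. scikit-learn's `KMeans`) would also handle initialization and convergence checks.

```python
# Minimal K-means sketch (K=2) on made-up 2-D points.

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            clusters[i].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            for c in clusters if c
        ]
    return centers, clusters

points = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 9), (8, 9)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
print(sorted(len(c) for c in clusters))  # prints [3, 3]: two groups of three
```

Notice that no labels were given anywhere: the two groups emerge purely from how close the points are to each other.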
Dimensionality Reduction
Dimensionality reduction entails reducing the number of features in a dataset while preserving as much information as possible. Techniques under dimensionality reduction are typically deployed during exploratory data analysis (EDA) or data processing to prepare the data for modeling.
Imagine you have a dataset with 100 different characteristics about students such as height, weight, grades, and more. To simplify the analysis, you narrow it down to just two important features: height and grades. This makes it easier to visualize patterns or gain insights from the data.
Popular Techniques
- The most popular algorithm used for dimensionality reduction is principal component analysis (PCA). It reduces many variables into fewer components that still capture most of the information.
- Imagine you have a dataset with dozens of features about students—things like height, weight, grades in multiple subjects, attendance, and extracurricular activities. Analyzing all these dimensions at once can be overwhelming and hard to visualize. Principal Component Analysis (PCA) helps by finding the most important patterns in the data and reducing it to just a few key dimensions that still capture most of the original information.
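A tiny PCA sketch of the student example above, using NumPy's SVD on centered data. The five `(height, grade)` rows are invented and deliberately lie on a straight line, so a single component captures essentially all the variance; real data would spread across several components.

```python
import numpy as np

# PCA sketch: project made-up 2-D student data (height, grade) onto
# its first principal component via SVD.
X = np.array([[160.0, 70], [170, 80], [180, 90], [165, 75], [175, 85]])

Xc = X - X.mean(axis=0)             # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                         # direction of greatest variance
scores = Xc @ pc1                   # 1-D coordinates along that direction
explained = S[0] ** 2 / (S ** 2).sum()
print(round(explained, 3))          # share of variance kept by one component
```

Because the toy data is perfectly correlated, `explained` comes out as 1.0; the same recipe (center, decompose, project) works for the 100-feature case described above.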
Others Include:
Linear Discriminant Analysis (LDA): Reduces dimensions while maximizing class separability for classification tasks.
Non-negative Matrix Factorization (NMF): Breaks data into non-negative parts to simplify representation.
Locally Linear Embedding (LLE): Reduces dimensions while preserving the relationships between nearby points.
Isomap: Captures the global structure of the data by preserving distances along a manifold.
Association Rule Mining
It is a rule-based unsupervised learning method aimed at discovering relationships and associations between different variables in large-scale datasets. It works by using a measure of interest to identify strong rules within a dataset. The rules describe how often certain items occur together and how strong or weak the connections between them are.
Analogy
Association rule mining is used for market analysis. It is a data mining technique retailers use to gain a better understanding of customer purchasing patterns based on the relationships between various products.
Association Rule Examples
| Use Case | Description |
| --- | --- |
Recommender Systems | Association rules analyze shopping baskets to find products often bought together, helping businesses improve cross-selling, up-selling, and product recommendations. |
Target Marketing | Association rules can help uncover patterns that improve targeted marketing strategies. For example, a streaming service might analyze viewing history and user preferences to identify which types of content appeal to specific age groups, enabling them to tailor recommendations or promotions more effectively. |
Common Association Rule Learning algorithms:
Apriori Algorithm: Finds patterns by exploring frequent item combinations step-by-step.
FP-Growth Algorithm (Frequent Pattern Growth): An efficient alternative to Apriori that quickly identifies frequent patterns without generating candidate sets.
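The first step of an Apriori-style scan can be sketched in a few lines: count how often item pairs appear together across baskets and keep only the "frequent" ones. The baskets and the support threshold below are invented for illustration; libraries such as mlxtend implement the full algorithm, including rule generation from these frequent itemsets.

```python
from collections import Counter
from itertools import combinations

# Toy market-basket data (invented): each basket is a set of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "eggs"},
]
min_support = 3  # a pair must appear together in at least 3 baskets

# Count every item pair within each basket (sorted so (a, b) == (b, a)).
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)
frequent = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent)  # prints {('bread', 'milk'): 3}
```

A full Apriori run would extend these frequent pairs to larger itemsets step by step, then turn them into rules like "bread → milk" with confidence scores.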
Shortcomings of Unsupervised Learning
- Unsupervised learning models may produce less accurate results since the data lacks labeled answers.
- Human or expert validation is often needed to assess the quality of the output.
- The training process can be time-consuming, as the algorithm must explore and evaluate many possible patterns.
- These models typically handle large datasets, which increases computational complexity.
Final Thoughts
Unsupervised learning algorithms explore data to uncover hidden patterns or structures using only the input features provided. Unlike supervised learning, they don’t rely on labeled outcomes to guide them or correct mistakes. Instead, these models independently group, organize, or reduce data by identifying natural relationships and similarities within the dataset.