ML 1.3 Unsupervised Learning

 Unsupervised Learning:

  • Unsupervised learning is a type of machine learning in which a model is trained on data without labelled examples and learns to find patterns in that data on its own.
  • There is no “right answer” given to the system. Instead, the algorithm explores the data and discovers hidden patterns, structures, or relationships on its own.
  • In unsupervised learning, the dataset does not have labels or target values. The algorithm must learn to identify inherent patterns without guidance on what the “correct” answers are.
  • The main aim of an unsupervised learning algorithm is to group or categorize an unsorted dataset according to its similarities, patterns, and differences. The machine is left to find the hidden patterns in the input dataset.
  • How unsupervised learning works:

                    * Input data is unlabeled

                    * No predefined categories or target values

                    * The model groups, organizes, or summarizes data

                    * Useful when you do not know what patterns exist beforehand



Customer segmentation in marketing is a typical example: customers are grouped by similar behaviour without any predefined segment labels.

Some popular unsupervised learning algorithms:

  • K-means clustering
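  • KNN (k-nearest neighbours); strictly, KNN classification and regression are supervised, and only the nearest-neighbour search itself works without labels
  • Hierarchical clustering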
  • Anomaly detection
  • Neural networks (for example, autoencoders)
  • Principal Component Analysis
  • Independent Component Analysis
  • Apriori algorithm
  • Singular value decomposition

 

There are three types of unsupervised learning:

1) Clustering: - 

  • The algorithm groups similar data points together.
  • Clustering is a method of unsupervised machine learning, which is the process of grouping the unlabelled objects or data points into clusters based on their similarities.
  • The goal of clustering is to identify characteristics like patterns and relationships in the data without any prior knowledge of the data’s meaning.
  • Cluster analysis finds commonalities between data points, so that objects with the most similarities stay in one group and have few or no similarities with the objects of another group.




Some popular clustering algorithms include K-means, Mean-shift, Hierarchical clustering, DBSCAN, and Spectral clustering.

  1. K-means Clustering: Groups data into K clusters based on how close the points are to each other.
  2. Mean-Shift Clustering: Discovers clusters by moving points toward the most crowded areas.
  3. Hierarchical Clustering: Creates clusters by building a tree step-by-step, either merging or splitting groups.
  4. Density-Based Clustering (DBSCAN): Finds clusters in dense areas and treats scattered points as noise.
  5. Spectral Clustering: Groups data by analysing connections between points using graphs.
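To make the K-means idea concrete, here is a minimal sketch in plain Python. It assumes Euclidean distance, random starting centroids picked from the data, and a small made-up 2-D dataset; the function names `nearest` and `kmeans` are illustrative and not taken from any library.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, k, max_iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for index in range(k):
            members = [p for p, label in zip(points, labels) if label == index]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[index])  # empty cluster: keep old centroid
        if new_centroids == centroids:  # nothing moved, so the loop has converged
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical data: two well-separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)     # one cluster index per point
print(centroids)  # one centre per cluster
```

The first three points should end up sharing one label and the last three the other. Which group gets index 0 depends on the random start, so only the grouping is meaningful, not the label numbers.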

 

2) Dimensionality reduction: -

  • Dimensionality reduction algorithms reduce the number of input features or variables in a dataset while preserving as much of the original information as possible.
  • This is useful for reducing the complexity of a dataset and making it easier to visualize and analyze.
  • It can also improve the performance of machine learning algorithms, because they have fewer input features to process.





Some popular dimensionality reduction algorithms include Principal Component Analysis (PCA), t-SNE, and Autoencoders.
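As a rough illustration of PCA, the sketch below reduces 2-D points to a single number per point. It assumes power iteration to find the direction of greatest variance (a simple stand-in for a full eigendecomposition), and the function name and data are made up for the example.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return the top principal direction and each row's 1-D projection onto it."""
    n, d = len(rows), len(rows[0])
    # Centre the data so every feature has mean zero.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and renormalise.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centred row onto the direction: d features become 1.
    projected = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projected

# Hypothetical data lying close to the line y = x.
rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
direction, projected = first_principal_component(rows)
print(direction)  # roughly equal components, since the data follows y = x
print(projected)  # one value per point instead of two
```

Because the points lie almost on a straight line, one coordinate along that line keeps most of the information, which is what "preserving as much of the original information as possible" means in practice.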



3) Association: -

  • Association rule learning is an unsupervised learning method used for finding relationships between variables in large datasets.
  • It determines which sets of items occur together in the dataset.
  • This technique is mainly used for market basket analysis, which helps to understand the relationships between different products and makes marketing strategy more effective. For example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam).
  • Association rule learning works on the concept of if-then statements, such as: if A, then B.
  • Typical applications of association rules include market basket analysis, web usage mining, and continuous production.
  • For example, if a customer buys bread, they are also likely to buy butter, eggs, or milk, so these products are placed on the same shelf or close to each other.
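The strength of an "if bread, then butter" rule is usually measured with support, confidence, and lift. These measures are not named in the notes above; they are the standard ones, and the small sketch below computes them on a made-up list of shopping baskets.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is anyway; above 1 suggests a positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
rule_lift = lift({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

In this toy data, bread appears in four of the five baskets and butter appears in three of those four, so the rule holds for three quarters of bread buyers.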



Association rule learning can be divided into three types of algorithms:

a) Apriori

b) Eclat

c) F-P Growth Algorithm

 

a) Apriori Algorithm: -

  • This algorithm uses frequent itemsets to generate association rules. It is designed to work on databases that contain transactions. It uses a breadth-first search and a hash tree to count candidate itemsets efficiently.
  • It is mainly used for market basket analysis and helps to understand the products that can be bought together.
  • It can also be used in healthcare to find adverse drug reactions in patients.
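The level-by-level (breadth-first) search can be sketched as follows. This is a simplified version under stated assumptions: it counts candidates with plain scans of the transaction list instead of a hash tree, it takes the minimum support as an absolute count (`min_count`), and the baskets are made up.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least min_count transactions."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # level 1: single items
    frequent = {}
    k = 1
    while current:
        # Count each candidate of size k with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: combine frequent k-itemsets into candidates of size k + 1.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k - 1)-subsets are frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
frequent_itemsets = apriori(transactions, min_count=2)
for itemset, n in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The prune step is what gives Apriori its name: an itemset can only be frequent if every smaller itemset inside it is already known to be frequent.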





b) Eclat Algorithm: -

  • Eclat algorithm stands for Equivalence Class Transformation.
  • This algorithm uses a depth-first search technique to find frequent item sets in a transaction database.
  • It is often faster than the Apriori algorithm, because it intersects lists of transaction IDs instead of scanning the whole database repeatedly.
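A minimal sketch of the depth-first idea is shown below. It assumes the usual vertical layout, where each item maps to the set of transaction IDs that contain it, and takes the threshold as an absolute count; the function names and baskets are illustrative only.

```python
def eclat(transactions, min_count):
    """Return {itemset: count} using a depth-first search over transaction-ID sets."""
    # Vertical layout: item -> set of IDs of the transactions containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: list of (item, tidset) pairs that are frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the remaining items to go one level deeper.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    suffix.append((other, shared))
            extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent

# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
frequent_itemsets = eclat(transactions, min_count=2)
print({tuple(sorted(k)): v for k, v in frequent_itemsets.items()})
```

The count of an itemset is simply the size of the intersected ID set, so no pass over the original transactions is needed once the vertical layout has been built.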

 

c) F-P Growth Algorithm: -

  • FP in FP-growth stands for Frequent Pattern. The algorithm is an improved version of the Apriori algorithm.
  • It represents the database as a tree structure known as a frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent patterns.
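The tree-building step can be sketched like this. It covers only the construction of the tree, not the pattern-mining pass that follows it, and it assumes items are ordered by descending frequency with ties broken alphabetically; the class name `FPNode` and the baskets are made up for the example.

```python
from collections import Counter

class FPNode:
    """One node of the tree: an item, how many transactions pass through it, and its children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # First pass: count every item and keep only the frequent ones.
    counts = Counter(item for t in transactions for item in t)
    frequent = {item: n for item, n in counts.items() if n >= min_count}
    root = FPNode(None)
    # Second pass: insert each transaction as a path, most frequent items first,
    # so that transactions sharing items also share the start of their path.
    for t in transactions:
        ordered = sorted((i for i in t if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root

def show(node, depth=0):
    """Print the tree with indentation, one 'item: count' line per node."""
    for child in sorted(node.children.values(), key=lambda c: c.item):
        print("  " * depth + f"{child.item}: {child.count}")
        show(child, depth + 1)

# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
root = build_fp_tree(transactions, min_count=2)
show(root)
```

Baskets that start with the same frequent items share a branch, so the tree is a compressed copy of the database, and the counts along each branch are what the mining step reads off later.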


