In unsupervised learning you have only input variables and no corresponding output variables. In other words, the dataset is unlabeled (or every example carries the same label), unlike in supervised learning. The goal of an unsupervised machine learning technique is to find similarities and patterns among the data points and to group similar data points together. For example, you could group a crowd by the color of the shirts they are wearing, or by skin color, hair type, or gender. Unsupervised learning is generally considered less accurate than supervised learning: there are no labels to check the result against, so the output is only a summary of how the input dataset groups together.
As Andrew Ng puts it: “Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don’t necessarily know the effect of the variables.”
Unsupervised machine learning problems can be further grouped into clustering and association.
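The process of gathering similar entities or items into groups is called clustering. It is the most common technique for finding hidden patterns in unlabeled data and drawing conclusions from them. The most common clustering algorithms are K-means clustering and hierarchical clustering. Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) are often listed alongside them; strictly speaking they are dimensionality-reduction techniques, but they are also used to expose structure in unlabeled data.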
Examples include Google News (which groups articles into cohesive news stories), organizing computing clusters, social network analysis, market segmentation, and astronomical data analysis.
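To make the idea of clustering concrete, here is a minimal sketch of K-means in plain Python. The sample points, the choice of k=2, and the shortcut of using the first k points as starting centroids are all assumptions made for illustration; real libraries usually pick starting centroids more carefully.

```python
import math

def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters with a basic K-means loop."""
    # Assumption: start from the first k points so the run is deterministic.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids

# Hypothetical unlabeled data: two visually separate groups of points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Notice that no labels were given to the function. The group numbers 0 and 1 come purely from how close the points are to each other, which is exactly the "find similar points and group them" idea described above.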
The process of finding interesting relationships or dependencies among the items in a large dataset is called association. One of the most intuitive applications of association is market-basket analysis, in which a customer's buying pattern is analyzed based on their previous purchases. For example, a customer who buys milk and bread together is also likely to buy eggs. Another example is Amazon suggesting related products based on your search history.
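The milk-and-bread example can be sketched in a few lines of Python. The baskets below are made-up data, and the two measures used, support and confidence, are the usual way of scoring an association rule; they are introduced here as an illustration and are not taken from the video.

```python
# Hypothetical purchase history: each set is one customer's basket.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the share that also contain consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

# Rule under test: customers who buy milk and bread also buy eggs.
rule_support = support({"milk", "bread", "eggs"}, baskets)
rule_confidence = confidence({"milk", "bread"}, {"eggs"}, baskets)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, two of the five baskets contain all three items, and two of the three baskets with milk and bread also contain eggs. A rule with high confidence is the kind of relationship a store could use for product suggestions.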
I have taken some of these examples from a video by Andrew Ng. If you want to know more, you can watch the 14-minute video below.
If you want to test your knowledge, try this question before watching the video. (The question is taken from the video.)
Of the following examples, which would you address using an unsupervised learning algorithm? (Choose all that apply.)
a. Given emails labeled as spam / not spam, learn a spam filter.
b. Given a set of articles found on the web, group them into sets of articles about the same story.
c. Given a database of customer data, automatically discover market segments and group customers into different market segments.
d. Given a dataset of patients diagnosed as either having a tumor or not, learn to classify new patients as having a tumor or not.