Labeling of text data using autoencoders

Machine learning has come a long way in solving business use cases that have long been a nightmare for humans. Today machines learn from data in ways similar to humans, and machine learning has matured to the point where, given the correct data, it can address a wide range of problems. Among the learning techniques available in the current ML world, supervised learning is a popular one in which the model learns from a labeled dataset: it tries to learn the patterns in the data and relate the independent variables to the dependent variable. The challenge in real-world settings is that labeled data is rarely readily available, and this applies to unstructured text as well. Given the volume of text data and the many sources it comes from, labeling it manually would take an enormous effort. This has led to the rise of many unsupervised techniques for learning from data. However, in spite of numerous improvements in unsupervised learning, supervised learning continues to be one of the preferred techniques for training machines. The objective of this paper is to use autoencoders combined with a clustering technique to label unlabeled text training data when the number of classes in the dataset is known.
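
The labeling idea stated in the objective can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the text representation, the autoencoder architecture, or the clustering algorithm, so the binary bag-of-words vectors, the single tanh hidden layer, the k-means clustering, and all function names below (bag_of_words, train_autoencoder, kmeans) are assumptions chosen only to keep the example small and dependency-free.

```python
import math
import random


def bag_of_words(docs):
    """Turn each document into a binary word-presence vector (assumed representation)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in d.lower().split():
            v[index[w]] = 1.0
        vectors.append(v)
    return vectors


def train_autoencoder(X, hidden=2, epochs=200, lr=0.05, seed=0):
    """Train a tiny autoencoder (tanh encoder, linear decoder) with SGD
    on squared reconstruction error; return the encoder function."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(hidden)]
    W2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(n_in)]

    def encode(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]

    for _ in range(epochs):
        for x in X:
            h = encode(x)
            y = [sum(w * hj for w, hj in zip(row, h)) for row in W2]
            err = [yi - xi for yi, xi in zip(y, x)]
            # Backpropagate the error through the decoder and the tanh.
            grad_h = [
                sum(err[i] * W2[i][j] for i in range(n_in)) * (1.0 - h[j] ** 2)
                for j in range(hidden)
            ]
            for i in range(n_in):
                for j in range(hidden):
                    W2[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                for i in range(n_in):
                    W1[j][i] -= lr * grad_h[j] * x[i]
    return encode


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; k is the known number of classes."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: squared_distance(pt, centers[c]))
            for pt in points
        ]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus with two assumed classes.
docs = [
    "patient fever cough",
    "patient cough medicine",
    "fever medicine patient",
    "stock market price",
    "market price trade",
    "stock trade market",
]
X = bag_of_words(docs)
encode = train_autoencoder(X, hidden=2)
codes = [encode(x) for x in X]   # compressed representations
labels = kmeans(codes, k=2)      # cluster ids used as pseudo-labels
```

The cluster ids are arbitrary integers, so in practice each cluster would still need to be mapped to a class name, for example by inspecting a few documents per cluster.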

Authors:

Praveen Thenraj Gunasekaran, Selvakuberan Karuppasamy and Subhashini Lakshminarayanan

Journal Area:

Health Sciences