Machine Learning Approaches for Text Mining and Spam E-mail Filtering: Industry 4.0 Perspective

Pp: 25-52 (28)


Abstract

Industry 4.0 will affect nearly everyone's life, directly or indirectly. Many complex new applications that are difficult to foresee today will be developed in the coming years. Machine learning approaches and intelligent IoT devices will relieve people of much of the redundant work they perform today. Industry 4.0 has become a significant catalyst for innovation and development across industrial sectors, bringing greater flexibility to areas such as production processes and quality improvement. This chapter applies machine learning algorithms to spam detection, classifying e-mails as legitimate or spam. Seven classification models are applied: Decision Trees, Random Forest, Artificial Neural Network (ANN), Gradient Boosting Machines, AdaBoost, Naïve Bayes, and Support Vector Machines. The experiments use three benchmark spam datasets drawn from standard repositories. The chapter also presents a quantitative performance analysis. The experimental results show that the ensemble methods Gradient Boosting and AdaBoost outperformed the other methods, with overall accuracies of 98.70% and 98.18%, respectively. The ensemble models are effective on large datasets with more extensive feature sets. The non-ensemble methods ANN and Naïve Bayes proved a viable alternative on large datasets, with overall accuracies of 98.38% and 97.63%, respectively, on test data.

Keywords: Cross-validation, Industrial revolution, Machine learning methods, Parameter optimization, Performance measurement, Preprocessing techniques.
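
As a rough illustration of the kind of classifier the abstract describes, the sketch below implements a multinomial Naïve Bayes spam filter with Laplace smoothing using only the Python standard library. It is not the chapter's implementation: the toy corpus `TRAIN`, the whitespace tokenizer, and the function names `tokenize`, `train`, and `predict` are all assumptions made for this example, and the chapter's benchmark datasets and preprocessing are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy corpus for illustration only; not the chapter's datasets.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("limited offer win cash now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"),
    ("lunch on monday with the team", "ham"),
]


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def train(examples):
    # Count documents per class and word occurrences per class.
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    # Return the class with the highest log-posterior score.
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothed log likelihood of the token.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
```

Calling `predict(model, "claim your free prize")` classifies the message against the toy corpus. A realistic evaluation would hold out test data and use cross-validation, as the keywords suggest, rather than a six-message corpus.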
