Logistic Regression for Intelligent Email Spam Detection: A Practical Approach

Authors

  • K. Srikanth, Department of Data Science, Malla Reddy University, Telangana, India

DOI:

https://doi.org/10.9734/bpi/mcsru/v2/3819

Keywords:

Spam, ROC curve, logistic, UCI data

Abstract

This paper presents an experiment on spam filtering with logistic regression, in which the filter's effectiveness depends on the characteristics of the token frequency distribution. The discussion focuses on the importance of data cleaning before model development, in particular the need to exclude inconsistent features before they enter the model. The experiment uses the UCI dataset, which records token frequencies in each email as percentages. The model's discriminative performance is evaluated with an ROC curve. The results give insight into how token frequencies influence spam classification, and the ROC analysis gives a clear view of the model's discriminative power.
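
As a rough illustration of the workflow the abstract describes (clean the features, fit a logistic regression, evaluate discrimination with ROC), a minimal sketch follows. It is not the chapter's code. The data are synthetic stand-ins for token-frequency percentages, the token names ("free", "money", "meeting") are hypothetical, and "inconsistent feature" is interpreted here, as an assumption, as a column that never varies. The ROC curve is summarized by its area (AUC), computed by pairwise ranking.

```python
import math
import random


def make_synthetic_emails(n=400, seed=0):
    """Toy stand-in for token-frequency data (NOT the UCI dataset)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        is_spam = rng.random() < 0.4
        # Hypothetical tokens: spam uses "free"/"money" more, "meeting" less.
        free = max(0.0, rng.gauss(1.5 if is_spam else 0.3, 0.5))
        money = max(0.0, rng.gauss(1.0 if is_spam else 0.2, 0.5))
        meeting = max(0.0, rng.gauss(0.1 if is_spam else 0.8, 0.4))
        rows.append([free, money, meeting, 0.0])  # last column never varies
        labels.append(1 if is_spam else 0)
    return rows, labels


def drop_constant_columns(rows):
    """Cleaning step (an assumption): remove features with a single value."""
    keep = [j for j in range(len(rows[0])) if len({r[j] for r in rows}) > 1]
    return [[r[j] for j in keep] for r in rows], keep


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(rows, w, b):
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in rows]


def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a random spam email
    scores higher than a random non-spam email (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rows, labels = make_synthetic_emails()
clean_rows, kept = drop_constant_columns(rows)
split = 300  # first 300 emails for training, the rest held out
w, b = fit_logistic(clean_rows[:split], labels[:split])
auc = roc_auc(labels[split:], predict_proba(clean_rows[split:], w, b))
print(f"kept columns: {kept}, held-out AUC: {auc:.3f}")
```

An AUC near 0.5 would indicate a filter no better than chance, while values near 1.0 indicate strong separation of spam from legitimate mail. On real data the cleaning step would likely involve more than dropping constant columns.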

Published

2025-01-25

How to Cite

K. Srikanth. (2025). Logistic Regression for Intelligent Email Spam Detection: A Practical Approach. Mathematics and Computer Science: Research Updates Vol. 2, 44–57. https://doi.org/10.9734/bpi/mcsru/v2/3819