Spam classifier in Go using Naive Bayes


A Naive Bayes spam classifier implemented in Go: a small text classification library that uses the Naive Bayes algorithm with Laplace smoothing to label messages as spam or not spam.

  • Naive Bayes Classification: Uses probabilistic classification based on Bayes' theorem with naive independence assumptions
  • Laplace Smoothing: Implements additive smoothing to handle zero probabilities for unseen words
  • Training & Classification: Simple API for training on labeled datasets and classifying new messages
  • Real Dataset Testing: Includes tests with actual spam/ham email datasets
Install:

```shell
go get github.com/igomez10/nspammer
```
```go
package main

import (
	"fmt"

	"github.com/igomez10/nspammer"
)

func main() {
	// Create training dataset (map[string]bool where true = spam, false = not spam)
	trainingData := map[string]bool{
		"buy viagra now":        true,
		"get rich quick":        true,
		"meeting at 3pm":        false,
		"project update report": false,
	}

	// Create and train classifier
	classifier := nspammer.NewSpamClassifier(trainingData)

	// Classify new messages
	isSpam := classifier.Classify("buy now")
	fmt.Printf("Is spam: %v\n", isSpam)
}
```

NewSpamClassifier(dataset map[string]bool) *SpamClassifier

Creates a new spam classifier and trains it on the provided dataset. The dataset is a map where keys are text messages and values indicate whether the message is spam (true) or not spam (false).

(*SpamClassifier).Classify(input string) bool

Classifies the input text as spam (true) or not spam (false) based on the trained model.

The classifier uses the Naive Bayes algorithm:

  1. Training Phase:

    • Calculates prior probabilities: P(spam) and P(not spam)
    • Builds a vocabulary from all training messages
    • Counts word occurrences in spam and non-spam messages
    • Stores word frequencies for likelihood calculations
  2. Classification Phase:

    • Calculates log probabilities to avoid numerical underflow
    • Computes: log(P(spam)) + Σ log(P(word|spam))
    • Computes: log(P(not spam)) + Σ log(P(word|not spam))
    • Returns true (spam) if the spam score is higher
  3. Laplace Smoothing:

    • Adds a smoothing constant to avoid zero probabilities for unseen words
    • Formula: P(word|class) = (count + α) / (total + α × vocabulary_size)
    • Default α = 1.0
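The phases above can be sketched in plain Go. This is a minimal re-implementation for illustration only; the actual nspammer internals, type names, and tokenization may differ:

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// classifier holds the statistics computed during the training phase.
type classifier struct {
	priorSpam, priorHam   float64             // log prior probabilities
	spamCounts, hamCounts map[string]int      // per-class word counts
	spamTotal, hamTotal   int                 // total words per class
	vocab                 map[string]struct{} // vocabulary across all messages
	alpha                 float64             // Laplace smoothing constant
}

// train builds priors, vocabulary, and word counts from labeled messages.
func train(data map[string]bool) *classifier {
	c := &classifier{
		spamCounts: map[string]int{},
		hamCounts:  map[string]int{},
		vocab:      map[string]struct{}{},
		alpha:      1.0,
	}
	spamMsgs, hamMsgs := 0, 0
	for msg, isSpam := range data {
		for _, w := range strings.Fields(strings.ToLower(msg)) {
			c.vocab[w] = struct{}{}
			if isSpam {
				c.spamCounts[w]++
				c.spamTotal++
			} else {
				c.hamCounts[w]++
				c.hamTotal++
			}
		}
		if isSpam {
			spamMsgs++
		} else {
			hamMsgs++
		}
	}
	total := float64(spamMsgs + hamMsgs)
	c.priorSpam = math.Log(float64(spamMsgs) / total)
	c.priorHam = math.Log(float64(hamMsgs) / total)
	return c
}

// classify sums log priors and Laplace-smoothed log likelihoods,
// returning true when the spam score is higher.
func (c *classifier) classify(input string) bool {
	v := float64(len(c.vocab))
	spamScore, hamScore := c.priorSpam, c.priorHam
	for _, w := range strings.Fields(strings.ToLower(input)) {
		// P(word|class) = (count + α) / (total + α × vocabulary_size)
		spamScore += math.Log((float64(c.spamCounts[w]) + c.alpha) / (float64(c.spamTotal) + c.alpha*v))
		hamScore += math.Log((float64(c.hamCounts[w]) + c.alpha) / (float64(c.hamTotal) + c.alpha*v))
	}
	return spamScore > hamScore
}

func main() {
	c := train(map[string]bool{
		"buy viagra now":        true,
		"get rich quick":        true,
		"meeting at 3pm":        false,
		"project update report": false,
	})
	fmt.Println(c.classify("buy now"))        // true
	fmt.Println(c.classify("meeting at 3pm")) // false
}
```

Working in log space, as the sketch does, keeps the product of many small likelihoods from underflowing to zero.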

The project includes support for the Kaggle Spam Mails Dataset. To download it:

This script requires the Kaggle CLI to be installed and configured.
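With the Kaggle CLI configured, the download typically follows this pattern (the dataset slug below is a placeholder, not taken from the project):

```shell
# Download and unpack a Kaggle dataset into the current directory.
# Replace <owner/dataset-slug> with the actual Spam Mails Dataset slug.
kaggle datasets download -d <owner/dataset-slug>
unzip '*.zip'
```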

Run the test suite:
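For a Go module, the standard invocation (run from the repository root) is:

```shell
# Run all tests in the module
go test ./...

# Verbose output with coverage summary
go test -v -cover ./...
```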

The tests include:

  • Simple classification examples
  • Real-world email dataset evaluation
  • Accuracy measurements on train/test splits