From Email Overload to Efficiency: A Transformer-Based LLM Solution for SAS Tech Support

SAS Tech Support recently developed an AI-driven email classification system using SAS Viya’s textClassifier, paving the way for a more efficient future in customer communication. Rigorous testing achieved high validation accuracy (about 96%) in distinguishing between legitimate customer queries, spam and misdirected emails. Key achievements during the development phase include efficient processing, near-perfect identification of legitimate customer emails (fewer than 0.2% misclassified as airline emails and about 1.5% as spam), remarkably fast model training using GPU acceleration and successful validation on data from the ServiceNow/CSM platform. This model is poised to significantly improve email handling efficiency upon deployment.

Introduction

At SAS Tech Support, efficient handling of customer communication is paramount. However, we face an overwhelming influx of emails, many of them spam or messages intended for Scandinavian Airlines System (SAS), which diverts our agents from addressing genuine customer concerns. To address this challenge, we developed an AI-driven email classification system using SAS Viya’s textClassifier. A key objective of this work is a model that can accurately categorize incoming emails into three groups: legitimate customer inquiries, spam and misdirected emails intended for Scandinavian Airlines System. This will enable us to flag the misdirected ‘SAS airlines’ emails and the spam emails so agents can take the appropriate actions. This case study details the development of this transformer-based spam detection model.

Data Privacy and Security

Given the sensitivity of customer data and the growing importance of responsible AI (as highlighted in Exploring generative AI’s impact on ethics, data privacy and collaboration), data security was a primary concern. SAS Viya, deployed in a secure Azure cloud environment, provided the necessary security and the scalability to process our extensive dataset (104,000+ emails) while complying with regulations like GDPR.

Data Collection and Preparation

The dataset comprised 18 months of Sirius v2 customer tracks, spanning from 2022 to July 2023. Content was extracted exclusively from the initial incoming email of each track, omitting values in the From and To fields. Email topics were divided into three categories: ‘SAS airlines,’ ‘other’ and ‘spam.’ With over 104,000 documents, training a large language model required careful planning. To balance costs and effectiveness, two samples were created, one with 20% and another with 30% of the original data, allowing for a comparison of model performance. Stratified sampling ensured the training data remained representative, particularly for the rare ‘Scandinavian Airlines System (SAS)’ cases.

Legitimate emails are significantly better represented in the dataset
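
To illustrate the stratified sampling described above, here is a minimal sketch in plain Python. It is not the code used in the project: the helper name stratified_sample, the fixed seed and the label counts are all hypothetical, chosen only to show how drawing the same fraction from each category keeps a rare class such as ‘SAS airlines’ represented in both the 20% and 30% samples.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Return sorted indices of a sample that keeps each label's share."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    chosen = []
    for indices in by_label.values():
        # Keep at least one example so a rare class never disappears.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical label counts, for illustration only (not the real distribution).
labels = ["other"] * 800 + ["spam"] * 150 + ["SAS airlines"] * 50

sample_20 = stratified_sample(labels, 0.20)
sample_30 = stratified_sample(labels, 0.30)

print(Counter(labels[i] for i in sample_20))
print(Counter(labels[i] for i in sample_30))
```

Because each category is sampled separately, the class proportions in both samples match those of the full dataset, which a simple random draw would not guarantee for the smallest class.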

Methodology

Several text classification models were considered: a BERT-based approach using SAS Viya’s textClassifier action, a SAS BOOLRULE classifier, and a topic modeling approach combined with machine learning. While BOOLRULE offered interpretability, its rule-based nature lacked the contextual understanding needed for the nuances of our email data. Topic modeling, though powerful, proved less efficient and scalable for our large dataset (104,000+ emails) because of the iterative nature of topic discovery and computationally intensive text parsing.

The BERT-based textClassifier model (Devlin et al., 2019) was ultimately chosen for its superior contextual understanding, adaptability to large datasets, and efficiency. Its transformer-based architecture provided high-quality classification with minimal preprocessing and manual effort, making it the most suitable choice for this project.

Model Training

Leveraging the computational power of an NVIDIA A100 GPU proved crucial for efficient model training. The model, trained on a 30% subset of the data, finished training in approximately 42 minutes, a significant improvement over initial estimates. Anticipating a lengthy training process after work, I decided to take a walk, only to return and find that the model had already finished training! Welcome as this unexpected speed was, the experience also underscored the need to save the trained model proactively, a critical lesson learned when a system shutdown resulted in data loss and required retraining. Importantly, the model performed well without requiring traditional text preprocessing steps (such as stemming or stop word removal), highlighting the robustness of the transformer architecture. The iteration history and a detailed visualization of the training progress are provided in Appendix A. For a deeper dive into training considerations and memory management for the trainTextClassifier action, refer to the SAS Viya documentation. Training costs were also closely tracked throughout the process, allowing us to maintain cost-effectiveness while achieving the desired performance.

Model Evaluation and Results

The model’s performance exceeded expectations, particularly considering the single-shot training without hyperparameter tuning. While the models achieved overall misclassification rates of approximately 3.38% and 3.43% for the 30% and 20% data subsets respectively, the key performance indicator (KPI) was minimizing the misclassification of legitimate customer emails (‘other’) as either ‘SAS airlines’ or ‘spam,’ a critical factor in maintaining efficient customer service. The misclassification rates for the ‘other’ class were exceptionally low: less than 0.2% for ‘other’ misclassified as ‘SAS airlines’ and around 1.5% for ‘other’ misclassified as ‘spam,’ for both models. (See Appendix B for the detailed model evaluation metrics.)
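
The following sketch shows one way such rates can be derived from a confusion matrix. The counts are hypothetical and were picked only so the arithmetic is easy to follow; they are not the project’s results. It also assumes that each per-class rate is expressed as a share of the emails whose true label is that class, a convention this post does not state explicitly.

```python
def misclassification_rates(confusion):
    """Compute overall and per-pair error rates.

    `confusion[true_label][predicted_label]` holds a count of emails.
    """
    total = sum(sum(row.values()) for row in confusion.values())
    wrong = sum(
        count
        for true, row in confusion.items()
        for pred, count in row.items()
        if pred != true
    )
    overall = wrong / total

    per_pair = {}
    for true, row in confusion.items():
        row_total = sum(row.values())
        for pred, count in row.items():
            if pred != true:
                # Share of emails with this true label that got this wrong label.
                per_pair[(true, pred)] = count / row_total
    return overall, per_pair


# Hypothetical counts, for illustration only.
confusion = {
    "other": {"other": 983, "SAS airlines": 2, "spam": 15},
    "SAS airlines": {"other": 6, "SAS airlines": 92, "spam": 2},
    "spam": {"other": 10, "SAS airlines": 1, "spam": 189},
}

overall, per_pair = misclassification_rates(confusion)
print(f"overall: {overall:.2%}")
print(f"other -> SAS airlines: {per_pair[('other', 'SAS airlines')]:.2%}")
print(f"other -> spam: {per_pair[('other', 'spam')]:.2%}")
```

Reporting the ‘other’ row separately matters because the overall rate mixes all error types, while the KPI concerns only legitimate customer emails that would be wrongly flagged.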

Addressing Data Quality Issues and Model Robustness

While our model achieved low misclassification rates on the holdout data, we identified instances of mislabeling during our detailed analysis. For example, as shown in the accompanying image, some emails from the holdout data originally labeled as ‘other’ but containing references to flights or booking changes were correctly classified by the model as ‘SAS airlines’. This raises the possibility that the original training data contained similar labeling errors and that, with perfect training labels, the misclassification rates might have been lower. It highlights both the model’s ability to overcome these data quality limitations and the potential for improving model performance through better data labeling.

Evidence of mislabeled entries in the original dataset

Further analysis using data from a new source (the ServiceNow/CSM platform) reinforced these findings, confirming the model’s high accuracy and ability to identify mislabeled data, even with CPU processing.

Conclusion and Future Work

This project marked an important first step in developing a robust and scalable spam detection system for SAS Tech Support. By leveraging SAS Viya’s BERT-based textClassifier action, we were able to efficiently process a large dataset while maintaining a high level of accuracy. Crucially, this system was developed while prioritizing data privacy and security, using SAS Viya deployed in a secure Azure environment. The model’s ability to handle large datasets efficiently while achieving exceptionally low misclassification rates for legitimate customer emails (‘other’) has created clear potential for improving support operations.

Future efforts will focus on continuously improving model performance through regular data updates and user feedback. We’re also exploring the possibility of saving the trained model as an Astore, which would streamline its deployment to environments such as SAS Micro Analytic Service (MAS) or SAS Container Runtime (SCR), after registration in SAS Model Manager.

We’re committed to sharing future updates as we move closer to full deployment and integration of this powerful classification solution.

Learn More

Read about another transformer-based classification use case

Learn how a similar approach can perform sentiment analysis

Explore various types of NLP with SAS

Appendix A: Model Training Log and Performance

This appendix summarizes the key metrics obtained during the training of the BERT-based text classification model. The training was conducted using an NVIDIA A100 GPU, completing in approximately 42 minutes of real (wall-clock) time on a 30% subset of the data. The chart below visualizes the training progress:

Key Observations:

Rapid Convergence: The model achieved high validation accuracy (96%) within the first two epochs, demonstrating efficient learning, with optimal performance observed at epoch 2.
Minimal Preprocessing: High accuracy was obtained without traditional text preprocessing steps, showcasing the robustness of the transformer architecture.
Early Stopping Potential: The marginal improvement in validation accuracy beyond epoch 2, coupled with the fact that validation loss reached a minimum at epoch 1 and increased after that, indicates that training beyond the second epoch did not yield any significant performance benefits. It is important to note that the trainTextClassifier action, as of the writing of this post, does not support the direct use of early stopping or step-based training. Should trainTextClassifier implement step-based training in the future, it would be wise to train for only a few epochs at a time, assessing the performance benefits against the computational costs of training more epochs.
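
As a rough illustration of the early stopping observation, the sketch below applies a patience-based stopping rule to a logged validation-loss history after the fact. It does not call trainTextClassifier (which, as noted, does not support early stopping), and the loss values are hypothetical, shaped only to mirror the pattern described above: a minimum at epoch 1 followed by a steady increase. The function names and the patience parameter are illustrative assumptions.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    idx = min(range(len(val_loss)), key=val_loss.__getitem__)
    return idx + 1


def stop_epoch(val_loss, patience=1):
    """Return the epoch at which a patience-based rule would have stopped.

    Training stops once the loss has failed to improve on its best value
    for `patience` consecutive epochs.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_loss, start=1):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_loss)


# Hypothetical validation losses per epoch, for illustration only.
val_loss = [0.130, 0.142, 0.155, 0.171, 0.190]

print("lowest validation loss at epoch", best_epoch(val_loss))
print("patience=1 would stop after epoch", stop_epoch(val_loss, patience=1))
print("patience=2 would stop after epoch", stop_epoch(val_loss, patience=2))
```

Applied to a real iteration history, a check like this gives a quick estimate of how many epochs, and how much GPU time, a shorter training run could have saved.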

Appendix B: Key Model Evaluation Metrics

This appendix summarizes key evaluation metrics for the models trained on 20% and 30% of the data. The primary focus is on the misclassification rates of legitimate customer emails (‘other’) as either ‘SAS airlines’ or ‘spam’.