Part II: Topic Extraction with BERTopic
In the first part of this series, I introduced you to my artificially created friend John, who was nice enough to provide us with his chats with five of the closest people in his life. We used just the metadata, such as who sent messages at what time, to visualize when John met his girlfriend, when he had fights with one of his best friends, and which family members he should write to more often. If you didn't read the first part of the series, you can find it here.
What we haven't covered yet, but will dive deeper into now, is an analysis of the actual messages. To do that, we will use the chat between John and Maria to identify the topics they discuss. And of course, we won't go through the messages one by one and classify them. Instead, we will use the Python library BERTopic to extract the topics that the chats revolve around.
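As a rough preview of where this is heading, the following is a minimal sketch of fitting BERTopic on a list of chat messages. It is not the exact code used later in this series. The helper names `prepare_messages` and `extract_topics`, the `min_words` threshold, and the variable `chat_messages` are all hypothetical, and the sketch assumes the messages are in English and that there are enough of them (at least a few hundred) for the clustering step to work.

```python
from typing import List


def prepare_messages(raw_messages: List[str], min_words: int = 3) -> List[str]:
    """Strip whitespace and drop messages too short to carry a topic.

    The min_words threshold is an illustrative assumption, not a BERTopic rule.
    """
    cleaned = [m.strip() for m in raw_messages if m and m.strip()]
    return [m for m in cleaned if len(m.split()) >= min_words]


def extract_topics(messages: List[str], min_topic_size: int = 10):
    """Fit BERTopic on the messages and return (model, topic id per message)."""
    # Imported lazily so the helper above works without bertopic installed.
    from bertopic import BERTopic

    # language="english" is an assumption about the chat data.
    topic_model = BERTopic(language="english", min_topic_size=min_topic_size)
    topics, _probabilities = topic_model.fit_transform(messages)
    return topic_model, topics


# Hypothetical usage, assuming `chat_messages` holds the message texts
# from the chat between John and Maria:
#
#   docs = prepare_messages(chat_messages)
#   model, topics = extract_topics(docs)
#   print(model.get_topic_info().head())
```

Filtering very short messages first is a design choice for chat data, where replies like "ok" carry little topical signal. Whether it helps depends on the data set.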
What is BERTopic?
BERTopic is a topic modeling technique introduced by Maarten Grootendorst that uses transformer-based embeddings, specifically BERT embeddings, to generate coherent and interpretable topics from large collections of documents. It was designed to overcome the limitations of traditional topic modeling approaches like LDA (Latent Dirichlet Allocation), which often struggle to handle short…