How Mythic League uses AI to help identify toxicity.
Mythic League is a premium CS2 league/clan on FaceIT. As some people know, Counter-Strike isn't exactly known for having the most wholesome and friendly community. The community has so many wonderful people, and I've made several lifelong friends through CS, but there are also plenty of bad actors, and it can get pretty toxic at times.
So what do we do about it? We gather data! User reports are only good for flagging potential problem users; without data, we'd have no way to verify that there was actually an issue. So to track toxicity we record voice communications, both in game and in the mandatory Discord chats.
Let's take a moment to talk about the scale of this. Counter-Strike is a fairly popular game [Steam Charts]. But narrowing that down to people who play on FaceIT, in North America, and are willing to pay a monthly sub for Mythic League cuts the number of players considerably. Still, depending on the month, Mythic may have a few hundred matches or a few thousand. With 10 players each communicating at least once a round, and games going 20-22 rounds on average, even slow months easily produce 100,000 audio files. We keep these available for at least a year, but prefer to store them for 3 or more years.
At the time of writing we have a couple million audio recordings on file. Each and every file is stored in our own privately hosted S3-compatible (MinIO) bucket and is also backed up offsite.
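As a rough sanity check on those numbers, here is a minimal back-of-the-envelope sketch. The function name and the example match counts are made up for illustration; 21 rounds is just the midpoint of the 20-22 average, and one clip per player per round is the "at least once a round" floor, so real totals would be higher.

```python
def clips_per_month(matches, players=10, rounds=21, clips_per_player_round=1):
    """Lower-bound estimate of audio clips produced in a month.

    Assumes every player produces `clips_per_player_round` clips in
    every round, which is the minimum described in the post.
    """
    return matches * players * rounds * clips_per_player_round


# Example figures only; these are not actual Mythic League match counts.
slow_month = clips_per_month(500)    # 500 * 10 * 21 = 105,000
busy_month = clips_per_month(3000)   # 3000 * 10 * 21 = 630,000
```

So roughly 500 matches in a month is already enough to pass the 100,000 file mark, even at the minimum of one clip per player per round.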
How do we make this all searchable and usable? Well, proper organization, a database entry for each recording, and... AI.
You might be wondering what sort of AI we're using here. We use OpenAI's Whisper to transcribe every single audio recording. That transcription is stored in the same database record that keeps the metadata for the file: which user the recording is from, what match it was recorded during, the round in that match, the timestamp, and some other metadata. This allows us to search for a user, a match, or even specific words spoken, and find the exact audio clip we need.
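To make the idea concrete, here is a minimal sketch of one record per recording with the transcript stored next to the metadata. This is not Mythic League's actual schema or stack: SQLite, the table and column names, and every helper function here are assumptions chosen for illustration. The transcription helpers assume the open-source `openai-whisper` Python package.

```python
import sqlite3

# Hypothetical schema: one row per audio clip, transcript stored with metadata.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recordings (
    id           INTEGER PRIMARY KEY,
    user_id      TEXT NOT NULL,
    match_id     TEXT NOT NULL,
    round_number INTEGER NOT NULL,
    recorded_at  TEXT NOT NULL,   -- ISO 8601 timestamp
    object_key   TEXT NOT NULL,   -- where the audio file lives in the bucket
    transcript   TEXT NOT NULL
);
"""


def load_whisper_model(name="base"):
    """Load a Whisper model (assumes `pip install openai-whisper`)."""
    import whisper  # imported lazily so the database code runs without it
    return whisper.load_model(name)


def transcribe(model, audio_path):
    """Return the transcript text for one audio clip."""
    return model.transcribe(audio_path)["text"].strip()


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_recording(conn, user_id, match_id, round_number,
                  recorded_at, object_key, transcript):
    """Insert one recording's metadata and transcript."""
    conn.execute(
        "INSERT INTO recordings (user_id, match_id, round_number,"
        " recorded_at, object_key, transcript) VALUES (?, ?, ?, ?, ?, ?)",
        (user_id, match_id, round_number, recorded_at, object_key, transcript),
    )
    conn.commit()


def search(conn, user_id=None, match_id=None, phrase=None):
    """Find clips by user, match, and/or words spoken (any combination)."""
    clauses, params = [], []
    if user_id is not None:
        clauses.append("user_id = ?")
        params.append(user_id)
    if match_id is not None:
        clauses.append("match_id = ?")
        params.append(match_id)
    if phrase is not None:
        # SQLite's LIKE is case-insensitive for ASCII text by default.
        clauses.append("transcript LIKE ?")
        params.append(f"%{phrase}%")
    where = " AND ".join(clauses) or "1=1"
    sql = ("SELECT object_key, user_id, match_id, round_number, transcript"
           f" FROM recordings WHERE {where} ORDER BY recorded_at")
    return conn.execute(sql, params).fetchall()
```

With something like this in place, checking a report comes down to a call such as `search(conn, user_id=..., match_id=...)`, or a phrase search across everything. At a couple million rows you would likely want indexes or a real full-text search feature rather than `LIKE`, but the shape of the record stays the same.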
The transcriptions and additional metadata make the recordings searchable, and having them all searchable makes closing reports a much simpler task. It allows us to catch and punish toxicity and bad actors faster than we could otherwise.
How do you pull voice data from a CS2 match, you might wonder? Well, if it's a FaceIT match, that can be done with this repository: https://github.com/DandrewsDev/CS2VoiceData
This is used not only by Mythic League, but also by other tools and vendors. It was provided by me and is maintained by a small handful of contributors. Luckily it's a very simple bit of code and doesn't need to be updated that often.
