The researchers mined 1 million tweets and refined them down to 5000 tweets containing three abusive key words – rape, whore and slut – and used a base dictionary to develop the algorithm’s broader understanding of social media’s particular form of language.
The algorithm retains previous understanding of terminology and changes its model over time to increase its grasp of context and semantics, so it doesn’t automatically wipe out tweets that may use key words in non-abusive conversations.
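The keyword-filtering step described above can be sketched in a few lines of Python. The sample tweets, the word-matching logic and the function names here are illustrative assumptions, not the researchers' actual pipeline:

```python
# Illustrative sketch: from a large pool of tweets, keep only those
# containing one of the three abusive key words, so they can be sent
# for manual labelling. The sample tweets are invented.
KEY_WORDS = {"rape", "whore", "slut"}

def contains_key_word(tweet: str) -> bool:
    """Return True if the tweet contains any of the target key words."""
    return any(w.strip(".,!?#@") in KEY_WORDS for w in tweet.lower().split())

tweets = [
    "what a lovely day in Brisbane",
    "she is such a slut",               # kept for manual labelling
    "new article on rape law reform",   # kept, but may be labelled non-abusive
]

candidates = [t for t in tweets if contains_key_word(t)]
print(len(candidates))  # 2
```

Note that the third tweet shows why manual labelling in context matters: a key word alone does not make a tweet misogynistic.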
Professor Nayak, a machine learning researcher, said her past projects had focused on teaching algorithms to extract language-based data, so she knew the deep-learning algorithm could learn Twitter’s unique and complex language model.
She said the most important aspect was giving the algorithm quality data to learn from, so the researchers generated the training data by selecting tweets that included those three key words and manually labelling each tweet as misogynistic or not, in context.
The algorithm was later tested on the phrase “go back to the kitchen”, which it successfully identified as misogynistic. It can now identify misogynistic content with 75 per cent accuracy.
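Accuracy here is simply the share of test tweets the model labels the same way a human would. A minimal illustration, with made-up predictions and labels:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the human-assigned labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

preds  = [1, 0, 1, 1]   # model outputs: 1 = misogynistic, 0 = not
labels = [1, 0, 0, 1]   # human labels for the same tweets
print(accuracy(preds, labels))  # 0.75
```

On this toy batch the model gets three of four tweets right, matching the 75 per cent figure reported for the real system.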
At present, Twitter requires users to manually report misogynistic content through a series of steps.
An online poll by Amnesty International in 2017 found just under half of the women who responded had been subjected to misogynistic or sexist abuse.
In 2019, former Democratic presidential candidate Hillary Clinton was one of many prominent women who spoke about the issue of targeted harassment against women, while Australia’s eSafety Commissioner has an entire section dedicated to women’s safety online.
If a social media site implemented a trial of the algorithm, Professor Nayak said, it could be tested through several options, such as flagging abusive comments for manual review.
“It would still require some manual input, but not too much,” she said.
“That’s the beauty of machine learning, that it’s … progressive learning. So we could see during that trial period what patterns are missing, or what patterns are false positives, and then we can better train the model, and the more we train it the better it would become.
“Once it happens over time I think it would become a completely automatic process.”
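The flag-review-retrain loop Professor Nayak describes might be organised roughly as follows. The classifier stand-in, the reviewer interface and the retraining step are all hypothetical, kept deliberately simple to show the feedback cycle:

```python
class KeywordModel:
    """Toy stand-in for the trained classifier (illustrative only)."""
    def __init__(self):
        self.learned = set()
    def predict(self, tweet):
        words = set(tweet.lower().split())
        return 1.0 if ({"slut", "whore"} | self.learned) & words else 0.0
    def retrain(self, corrections):
        # Fold reviewer corrections back in: learn words from missed abuse.
        for tweet, label in corrections:
            if label:
                self.learned |= set(tweet.lower().split())

def trial_loop(model, tweets, reviewer, threshold=0.5):
    """Flag tweets, compare against manual review, collect the mistakes."""
    corrections = []
    for t in tweets:
        flagged = model.predict(t) >= threshold
        label = reviewer(t)                  # human judgment during the trial
        if flagged != bool(label):
            corrections.append((t, label))   # false positive or missed pattern
    model.retrain(corrections)
    return corrections

model = KeywordModel()
reviewer = lambda t: 1 if "kitchen" in t else 0   # toy human reviewer
missed = trial_loop(model, ["go back to the kitchen"], reviewer)
print(len(missed))  # 1 missed pattern fed back into training
```

After one pass the toy model has learned from its miss, which is the "progressive learning" the quote refers to: each trial round surfaces false positives and missed patterns that become new training data.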
The researchers’ final paper – Regularising LSTM classifier by transfer learning for detecting misogynistic tweets with small training set – was recently published by Springer.
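The paper's title names the core ingredients: an LSTM classifier, regularised by transfer learning so it can cope with a small labelled training set. A minimal NumPy sketch of that kind of classifier is below; the layer sizes, random weights and embedding setup are illustrative assumptions, not the authors' published model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step over token vector x with state (h, c)."""
    n = h.shape[0]
    z = W @ x + U @ h + b                          # stacked gate pre-activations
    i, f = sigmoid(z[:n]), sigmoid(z[n:2*n])       # input / forget gates
    g, o = np.tanh(z[2*n:3*n]), sigmoid(z[3*n:])   # candidate / output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def classify(token_ids, E, W, U, b, w_out, b_out):
    """Run the LSTM over a tweet's tokens; return P(misogynistic)."""
    n = U.shape[1]
    h = c = np.zeros(n)
    for t in token_ids:
        h, c = lstm_step(E[t], h, c, W, U, b)
    return sigmoid(w_out @ h + b_out)

rng = np.random.default_rng(0)
vocab, d, n = 50, 8, 4          # vocab size, embedding dim, hidden units
# In a transfer-learning setup, the embedding matrix E would be copied
# from a model pretrained on a large unlabelled tweet corpus and kept
# fixed, regularising the classifier trained on the small labelled set.
# Here it is random for illustration.
E = rng.normal(size=(vocab, d))
W = rng.normal(size=(4 * n, d))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
w_out, b_out = rng.normal(size=n), 0.0

p = classify([3, 17, 42], E, W, U, b, w_out, b_out)
print(0.0 < p < 1.0)  # True: a probability-like score for the tweet
```

The design point worth noting is the one the title highlights: with only a few thousand labelled tweets, reusing representations learned elsewhere keeps the small classifier from overfitting.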
Professor Nayak said it might also influence user behaviour: if a tweet is blocked from publication because it contains misogynistic language, users may be discouraged from attempting such behaviour again.
She said she hoped social media platforms would consider the algorithm as a way to help curb such behaviour on their domains.
Lucy is the urban affairs reporter for the Brisbane Times, with a special interest in Brisbane City Council.