
How to Detect Social Bots That Drain Your Marketing Budget

Cyber Monday turned out to be the biggest online shopping day ever in the US, with over $3 billion in digital sales. To drive those sales, marketing professionals engaged their target audiences before, during, and after the event. The spend on the supporting marketing activities? Likely over $300 million for this day alone.

I'm sure many of you feel pretty good about campaign performance, as sales numbers were skyrocketing. As performance marketers, our goal is to maximize the return on every dollar spent. One area with opportunity for cost savings is the quality and confidence of digital audience identities. According to the IAB, the cost of an untrustworthy supply chain in the US digital advertising industry is alarmingly high: $8.2 billion a year. Roughly half of this is attributed to "non-human traffic" creating digital exhaust and corrupting the ecosystem.

Know your audience

If over $4 billion could be saved by eliminating "non-human traffic," where can we start? One essential element of people-based marketing is identifying and knowing the consumer. You get to know consumers by developing a keen understanding of the data, which starts with core knowledge of the organizations and individuals in the market space, manifested in attributes like names and addresses.

Over the past 30 years, marketers have used customer data integration (CDI) to make sense of first-, second-, and third-party data. Sophisticated processing filters out and cleans up fictitious organizations and individuals, and manages keys by creating, consolidating, and splitting records of actual individuals and organizations. The techniques for performing these critical steps and determining confidence in marketing entities and the relationships between them are mature.
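To make the consolidation step concrete, here is a minimal sketch of grouping raw records that refer to the same real-world entity under one key. The field names and the crude whitespace/case normalization rule are my own illustration, not a production CDI pipeline, which would use far richer matching logic:

```python
from collections import defaultdict

def normalize(record):
    """Crude match key: lowercase and collapse whitespace in name and address."""
    return (
        " ".join(record["name"].lower().split()),
        " ".join(record["address"].lower().split()),
    )

def consolidate(records):
    """Group records sharing a normalized (name, address) key into one entity."""
    entities = defaultdict(list)
    for rec in records:
        entities[normalize(rec)].append(rec)
    return dict(entities)

records = [
    {"name": "Jane Doe",   "address": "1 Main St"},
    {"name": "jane  doe",  "address": "1 Main  St"},  # same person, messy formatting
    {"name": "John Smith", "address": "9 Oak Ave"},
]
merged = consolidate(records)
print(len(merged))  # 2 distinct entities: the two Jane Doe records collapse into one
```

Real-world matching also handles nicknames, typos, and address standardization, and tracks key lineage so consolidated records can later be split if the merge was wrong.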

Many of the same techniques can be applied in the digital space, but person-based identity management is more complex there. The digital space has many diverse, volatile identity spaces managed by a range of actors, and marketers have limited ways to verify that people and organizations are who they say they are. This leaves organizations exposed to unknowingly targeting social bots and potentially reacting to astroturfing.

For this blog entry I used Twitter to analyze audiences, though results would likely trend similarly on other social platforms. The basic concept of Twitter is to engage with a potentially very large audience, including those who have opted in to follow you, and to have conversations with users through tactics like hashtags. You can also use Twitter Ads to get your messages in front of users as part of your social media mix. Sounds wonderful, right? What if I told you that a third of your Twitter followers are not human?

I looked at a sample set of Twitter followers of some Fortune 100 brands with active Twitter marketing programs, then ran these users through a tool that detects features and common patterns of social bots. 34% of all followers had at least a 50% probability of being a social bot.
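The threshold math behind a finding like this is simple to reproduce. A minimal sketch, assuming per-account bot-probability scores have already been obtained from a detection tool (the scores below are made up for illustration):

```python
def share_likely_bots(scores, threshold=0.5):
    """Fraction of accounts whose bot probability meets or exceeds the threshold."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)

# Hypothetical bot-probability scores for six follower accounts
scores = [0.9, 0.2, 0.6, 0.1, 0.05, 0.55]
print(round(share_likely_bots(scores), 2))  # 0.5 (three of six accounts at >= 0.5)
```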

My next test was to analyze conversations around a specific hashtag. During a two-hour window on Cyber Monday, I ran a program that recorded every tweet containing #cybermonday. Brands and users were immensely active during this time. Fortunately, Merkle has invested in large-scale Hadoop clusters, exactly the type of technology needed to analyze the roughly 10 tweets per second in near real time.

What were the results? Roughly 50,000 users tweeted during the period. 2% of them sent more than five tweets, which is one indication of suspicious activity. I examined these flagged users with the same method as in the first analysis: 31% were likely social bots.
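The flagging step above can be sketched with a simple per-user counter. The user names and the five-tweet threshold below are illustrative:

```python
from collections import Counter

def flag_heavy_tweeters(tweets, max_tweets=5):
    """Return users who sent more than max_tweets tweets in the observation window."""
    counts = Counter(t["user"] for t in tweets)
    return {user for user, n in counts.items() if n > max_tweets}

# Hypothetical tweet stream from a two-hour window
tweets = (
    [{"user": "bot_like"}] * 7   # seven tweets in the window: suspicious
    + [{"user": "shopper"}] * 2  # two tweets: normal activity
)
print(flag_heavy_tweeters(tweets))  # {'bot_like'}
```

At scale this same aggregation runs as a group-by-and-count over the full tweet set, which is the kind of job Spark on Hadoop handles naturally.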

The technology I used during the tests was a standard ETL tool, some custom coding, and Spark on Hadoop. I also integrated with BotOrNot, an application funded through the US Department of Defense. It inspects Twitter users and their tweets to determine whether they look like a real person or a social bot. The result is a "bot probability score," determined by looking at sentiment, linguistic cues, message timing, and network statistics.
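BotOrNot's actual model is more sophisticated, but conceptually, a score that blends several signal categories into one probability can be sketched as a weighted average. The categories mirror those mentioned above; the weights and feature values are invented for illustration and are not BotOrNot's:

```python
def bot_probability(features, weights=None):
    """
    Combine per-signal scores (each in [0, 1]) into a single bot probability
    via a weighted average. Weights are illustrative, not BotOrNot's.
    """
    weights = weights or {
        "sentiment": 0.2, "linguistic": 0.3, "timing": 0.3, "network": 0.2,
    }
    total = sum(weights.values())
    return sum(weights[k] * features[k] for k in weights) / total

# Hypothetical per-category scores for one Twitter account
features = {"sentiment": 0.4, "linguistic": 0.9, "timing": 0.8, "network": 0.7}
score = bot_probability(features)
print(round(score, 2))  # 0.73: likely a social bot at a 0.5 threshold
```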