Flowchart of decisions to make when running a bot

  Posted Monday, August 10th, 2020

Curating a Twitter bot

Regardless of the how, if you’re planning on making a bot that will produce content you’d like to share on social media, then it’s worthwhile to consider the spectrum of effort required before you start. Some key decisions will invariably lead to more or less work. I’ve been running a bot account on Twitter for a year now, and I’ve converted some of the less obvious pitfalls into questions that may guide you before starting.

For context, my bot scrapes game reviews from Steam written by the players with the highest playtimes. I’ve built a curation system that acts as a layer between the bot and Twitter, where I can cherry-pick interesting, inoffensive reviews to post.

For the sake of this article, ‘content’ describes anything that may be found in the body of a social media post, such as a Tweet on Twitter. ‘Curation’ is the process of finding and organising quality information, whereas ‘moderation’ is the process of monitoring and revising information so it complies with certain rules.

Can the bot produce content independently?

Depending on the type of content the bot will produce, it may require human intervention.

From a technological perspective, a bot such as a maze generator might produce content entirely algorithmically. In such situations, you are at an advantage as a creator because you can rely on prior academic work. Additionally, producing something wholly encapsulated means it can be re-used elsewhere – perhaps the function of incorporating social media into your plan is to advertise the capabilities of your bot itself. The benefit here is that the bot can be written once and need never change. This simplest form of a generative bot is the easiest to maintain. It may, however, lack a sort of ‘human’ charm and creativity – it depends on what you are trying to achieve. Bot creation is open-ended, and novel ideas might inspire further work – a maze algorithm itself could receive dynamic content such as people’s names to feature in the walls. Letting your mind drift forward to consider feature creep can help you plan.

"Tweet me directions pic.twitter.com/j7llwjhAMG"

— Maze Bot (@mazingbot) June 5, 2017

By contrast, a clickbait news generator may require seeding with some interesting example news items – you may produce your own list of verbs and nouns for it to create sentences with, or attempt to generate comprehensible natural language with AI. In this example, there’s either going to be an initial load of manual content creation, or a steep learning curve prone to bugs. With a formulaic post structure, the content may come to feel stilted or limited, whereas using AI risks sacrificing quality control if it produces something illegible.

"These paediatricians looked thoroughly unimpressed when Travis Scott lectured them about second-wave feminism."

— Clickbait Bot (@ParodyClickBait) August 10, 2020

Also consider external resources – will the bot be making use of any APIs or databases? If you rely on a third party to produce your content at run-time, then that third party must be available. For example, if your DnD character-sheet generator needs names, and you elect to use a random name API, what happens if it goes down? It may be possible to save some content from a third party and cache it locally before running the bot. Using multiple sources may increase the quality of the bot and lighten the burden of producing your own content, but it may put you at the mercy of other services.
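One way to soften that dependency is to fall back to a local cache whenever the live source fails. The sketch below is only an illustration of the idea: `fetch_name` stands in for whatever (hypothetical) name API call you use, and `CACHED_NAMES` is an assumed list of names saved ahead of time.

```python
import random
from typing import Callable, Optional

# Assumption: a handful of names saved from the third party ahead of time.
CACHED_NAMES = ["Aldric", "Brienne", "Cassius", "Delphine"]


def pick_name(fetch_name: Callable[[], str],
              rng: Optional[random.Random] = None) -> str:
    """Try the live source first; fall back to the local cache if it fails."""
    rng = rng if rng is not None else random.Random()
    try:
        name = fetch_name()
        if name:  # treat an empty response as a failure too
            return name
    except Exception:
        # Timeouts, connection errors and bad responses all end up here.
        pass
    return rng.choice(CACHED_NAMES)


def broken_api() -> str:
    # Hypothetical stand-in for a third-party API that is currently down.
    raise ConnectionError("name service unavailable")


print(pick_name(lambda: "Eowyn"))  # the live source works, so its name is used
print(pick_name(broken_api))       # the live source fails, so a cached name is used
```

Passing the fetcher in as a function keeps the fallback logic easy to test without touching the network.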

If the source of content for your bot is inconsistent in quality, it may be worth automatically detecting and filtering out the unwanted parts, a process referred to as ‘wrangling’ or ‘munging’: a bot that tweets song lyrics may want to remove glyphs such as the brackets surrounding “[Chorus]”; a bot that stylises screenshots from Google Maps may want to ignore uninteresting areas; a bot that shares Kickstarter products may want to remove hyperlinks from product descriptions.
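As a rough sketch of what such wrangling can look like for text, the two helpers below use regular expressions. The function names are made up for illustration, the choice to drop the whole “[Chorus]” marker rather than just its brackets is an assumption, and the URL pattern is deliberately simple.

```python
import re

# Section markers such as "[Chorus]" or "[Verse 2]" found on lyric sheets.
SECTION_MARKER = re.compile(r"\[[^\]]*\]")
# A deliberately simple URL pattern; real-world links can be messier.
URL = re.compile(r"https?://\S+")


def clean_lyrics(text: str) -> str:
    """Drop bracketed section markers and any lines left empty by that."""
    lines = (SECTION_MARKER.sub("", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def strip_links(text: str) -> str:
    """Remove hyperlinks and collapse the gaps they leave behind."""
    return " ".join(URL.sub("", text).split())


print(clean_lyrics("[Chorus]\nLa la la\n\n[Verse 2]\nDa da"))
print(strip_links("Back us at https://example.com/x today"))
```

Filtering images (such as uninteresting map areas) needs a different approach entirely, so this only covers the textual cases.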

Lastly, regardless of the source of your content, you may need to consider its safety. A bot that lets a community influence its content may be prone to tampering. An AI could be led into mimicking hate speech, a Simon-Says bot could parrot doxxing information, and a user-review bot may post pornographic content in ASCII (theoretically, a maze could even contain a repeating Swastika pattern). These may break the terms of service of your social media platform. Ultimately, if you are monitoring your bot’s output, it may not be an issue for you to manually remove unwanted posts after the fact.

Takeaway: The more aspects of the bot you can isolate and automate, the less maintenance it will need. However, if you are going to make the bot a closed system with full control over producing its own content, then you must trust it to produce quality content.

Can the bot produce an infinite amount of content?

Where does the bot’s content come from? If it’s purely algorithmic, then there’s a high likelihood that it’s an essentially infinite pool. A bot that shuffles a deck of cards could do so 52! (roughly 8.07e+67) times before running out of content. Bots that rely on community-generated content such as other social media posts are unlikely to run out of content unless the platform they’re relying on becomes unpopular or unavailable (see Myspace, Vine) – new content should outpace your post cadence (although new desirable content might not!). Bots that rely on content you’ve provided them yourself may run out of original content.
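To get a feel for how large a purely algorithmic pool can be, the card-shuffling figure is easy to check. This short sketch uses only Python’s standard library; the one-post-per-day cadence is an assumption made for the sake of the arithmetic.

```python
import math

# Number of distinct orderings of a standard 52-card deck.
orderings = math.factorial(52)
print(f"{orderings:.3e}")  # 8.066e+67

# Assumption: one post per day, ignoring leap years.
years_of_content = orderings // 365
print(f"{years_of_content:.1e} years of daily posts")
```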

In some cases, interactive bots relying on a gimmick may remain feasible despite this – a rock-paper-scissors bot that can be challenged by users may produce the same kind of content indefinitely without breaking its appeal, as the content itself is mechanical. A bot that Tweets a random word from the dictionary at the end of every month might be allowed to Tweet the same word if the fun itself is to be found in other people contextualising the word with current events.

"fuck virus"

— Fuck Every Word 2.0 (@fckeveryword) July 13, 2020

A bot that Tweets a new Olympic sport once a day will run out of content in 33 days as of 2020. Bots that remain dormant until they receive stimulus from the world around them, such as one that emails you when a new Olympic sport has been ratified, might be more future-proof, but less consistent. The mileage here can be extended with new ideas – a bot simulating the Game of Life could periodically change its initial conditions or grid size. You may elect to change the type of content produced entirely, from textual to visual. Ultimately, your interest in the bot will be bolstered if it can produce content that surprises or otherwise engages you as if you were a follower.

Knowing whether it would be OK for your bot to re-use content will help inform whether you need to track what content it posts. You can only prevent duplicates if you keep track of its posts somewhere – the logistics of this may vary depending on whether you let the bot post on a timer or do so yourself manually. You could keep a database of previous posts, or call your social media platform’s API every time to check (although this becomes less efficient the more posts you make). If you want to store the content your bot produces to select from later, then there is a cost of storage to reckon with in addition to the cost of running your bot – this can be done on your own computer, though I’m operating within Heroku and Firebase’s free tiers.
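A minimal sketch of the ‘database of previous posts’ option follows, using SQLite from Python’s standard library. The table name, the helper names and the choice to normalise case and whitespace before hashing are all assumptions for illustration; this is not how my own bot stores its posts.

```python
import hashlib
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the table of fingerprints for posts already made."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS posted (fingerprint TEXT PRIMARY KEY)")
    return conn


def fingerprint(text: str) -> str:
    # Normalise case and whitespace so trivial variations count as duplicates.
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def mark_if_new(conn: sqlite3.Connection, text: str) -> bool:
    """Record the post and return True, or return False if it was seen before."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO posted (fingerprint) VALUES (?)",
        (fingerprint(text),),
    )
    conn.commit()
    return cursor.rowcount == 1


conn = open_store()  # pass a file path instead to survive restarts
print(mark_if_new(conn, "Great game, 10/10"))    # True: first time seen
print(mark_if_new(conn, "great  game, 10/10 "))  # False: treated as a duplicate
```

Storing only a fingerprint keeps the table small; if you want to curate from stored content later, you would need to keep the full text as well.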

Takeaway: You may get bored of seeing the same things, and you may seek to prevent this. Being realistic about the quality and quantity of content your bot can produce will inform you of its lifespan, and whether you need to consider storing information in order to actively curate it.

What kind of media should the bot produce?

All platforms impose limitations on content – from character to file-size limits, to more subtle things: Twitter uses AI to crop image previews for images that don’t obey its aspect-ratio guidelines, meaning that at a glance, in people’s feeds, your image may appear to be missing something. If your bot fails to provide valid content, the platform may reject the request to post or, worse, change the output unpredictably. You should be aware of these limitations, which are usually described in API documentation.

"😮 open-mouth + 😓 cold-sweat = pic.twitter.com/mMSUVqCsL6"

— Emoji Mashup Bot (@EmojiMashupBot) August 10, 2020

When met with a particularly large number of panels, a bot posting automatic transcriptions of webcomics may need to understand how to create a thread, or render the text as an image. A bot posting emoji mashups it creates using scalable-vector-graphics may have to first convert them to PNGs. Any bot producing artwork in general may want to incorporate a watermark to prevent theft.
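For the threading case, a rough sketch of splitting long text into numbered posts is shown below. The 280-character limit is an assumption based on Twitter’s limit at the time of writing, and plain `len()` only approximates how the platform counts characters (links and some emoji are weighted differently), so check the API documentation before relying on it.

```python
import textwrap
from typing import List

TWEET_LIMIT = 280  # assumption: check your platform's current limit


def split_into_thread(text: str, limit: int = TWEET_LIMIT) -> List[str]:
    """Break long text into numbered posts that each fit within the limit."""
    # Reserve room for a suffix like " (12/34)"; assumes fewer than 100 posts.
    suffix_room = len(" (99/99)")
    chunks = textwrap.wrap(text, width=limit - suffix_room)
    if len(chunks) <= 1:
        return chunks  # short text needs no numbering
    total = len(chunks)
    return [f"{chunk} ({i}/{total})" for i, chunk in enumerate(chunks, start=1)]


for post in split_into_thread("Panel one: a very long transcription. " * 20):
    print(len(post), post[:40])
```

Actually posting the thread means replying to each previous post in turn, which depends on the platform’s API and is left out here.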

Takeaway: Plan with the nature and constraints of your social media platform in mind.

What do you care about?

Bot makers who maintain their project in the longer term tend to pin a post to their page explaining how they operate: the motivation and purpose behind their bot, and the way in which they will differentiate their own posts from their bot’s (e.g. wrapping their own thoughts in brackets or preceding them with an emoji). Most are left well enough alone, as people following do so primarily for the bot’s content itself and nothing more. I found polls useful when tweaking things about my bot that I wasn’t sure about. Some accounts attract ire when deciding to monetise their work; others lose their following when their bot is left in a broken state for too long.

"👨‍💻 Do you prefer seeing positive or negative super fan reviews?"

- GameReviews.txt (@gamereviews_txt) July 18, 2020

While I still find enjoyment in simply curating it, building an audience of any size can create a positive feedback loop, and it’s extra fun to see your work appreciated. Most bot makers are hobbyists, and the need to take every precaution is low. Creating my own curation system took a significant amount of time, and there are yet more questions to answer: how do I securely share that curation system?

Takeaway: Explain how you intend your bot to operate to potential viewers and be mindful of their motivations. Breaking and changing things is a useful learning experience that cannot be replaced.

Useful Resources

Example Case Study

Steven wants to create a bot to generate new rules for a drinking game. Every turn, players get to choose a new rule to add to a list – if anybody breaks any so far, they drink! But his group aren’t so creative when under-the-influence, so he finds the idea of letting a bot decide to be quite funny.

Thinking about the shape of the data first, he notices that a pattern in the rules is “When X, then Z” – “When somebody laughs, then stick your tongue out”. So, he writes a list of ‘when’s and ‘then’s, and a function to stick two random ones together.
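Steven’s first version might look something like the sketch below. The list entries and the function name are invented for illustration, since the story doesn’t give his actual lists.

```python
import random

# Illustrative entries only; Steven's real lists are not given in the story.
WHENS = ["somebody laughs", "somebody drinks", "a phone rings"]
THENS = ["stick your tongue out", "drink twice", "swap seats"]


def make_rule(rng: random.Random) -> str:
    """Stick one random 'when' and one random 'then' together."""
    return f"When {rng.choice(WHENS)}, then {rng.choice(THENS)}"


print(make_rule(random.Random()))
```

With three entries in each list there are only 3 × 3 = 9 possible rules, which hints at how quickly a hand-written pool can be exhausted.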

It works, but it’s losing its appeal – he already knows all of the combinations. At first, he realises it’s going to be difficult to find a reliable source of information for ‘when’s and ‘then’s that would fit the context of a house party, so he asks his friends to help fill his list. It works for his goals, though he wonders if he should host an online form to let people anonymously submit scenarios – most people wouldn’t know how to submit a pull request on his GitHub repository.

After trying it out, he has an idea, and inserts a Y – “When X, if Y, then Z” – “When somebody drinks, if they’re Male, drink twice”. This is more interesting, because parts of his ‘if’s can be dynamic – they can use a random number generator to choose between True/False, Male/Female/Other, or an age.

He makes a fun little app to act as a GUI to his bot. During his next party, when his friend spins the wheel, they get the same rule twice in a row – it kinda kills the mood. He realises that he needs a way to prevent that from happening. He makes the bot store a list in-memory of each rule’s ‘when’, ‘if’ and ‘then’; every time it generates a new one, it checks that it’s not the same as an old one. Whenever his app crashes, he notices that it loses knowledge of the previous rules, but accepts it.
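A sketch of that in-memory duplicate check is below. The class name, the list entries and the decision to return `None` once every combination has been used are assumptions for illustration; the story only says the bot remembers each rule’s ‘when’, ‘if’ and ‘then’.

```python
import random
from typing import Optional, Set, Tuple

# Illustrative entries only; the story does not list Steven's real ones.
WHENS = ["somebody laughs", "somebody drinks", "a phone rings"]
IFS = ["the coin flip lands heads", "you are the youngest player"]
THENS = ["stick your tongue out", "drink twice", "swap seats"]

Rule = Tuple[str, str, str]


class RuleBot:
    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self.rng = rng if rng is not None else random.Random()
        # Held in memory only, so it is lost whenever the app crashes.
        self.seen: Set[Rule] = set()

    def next_rule(self) -> Optional[str]:
        """Return a rule not produced before, or None when none are left."""
        total = len(WHENS) * len(IFS) * len(THENS)
        if len(self.seen) >= total:
            return None
        while True:  # re-roll until an unused combination comes up
            rule = (self.rng.choice(WHENS),
                    self.rng.choice(IFS),
                    self.rng.choice(THENS))
            if rule not in self.seen:
                self.seen.add(rule)
                when, cond, then = rule
                return f"When {when}, if {cond}, then {then}"


bot = RuleBot()
print(bot.next_rule())
print(bot.next_rule())  # never the same as the first
```

Re-rolling is fine for a small pool like this one; a much larger pool that is nearly used up would be better served by shuffling all combinations once and dealing them out in order.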

Finally, wanting to spread the joy, Steven realises that if he were to post his rules on Twitter, a collection of assorted Tweets over time wouldn’t be very useful. In considering how people might want to interact with it, he has an idea: if you Tweet at the bot and include the word ‘game’ or ‘rule’, it will generate 100 rules for you – more than enough for a couple of games. Occasionally, he updates the scenarios and makes them slightly more dynamic – “When word W is mentioned…”. As he sees people using it, one asks if he could personalise the bot somehow, and he considers extending it to take a list of names – “When Josh burps, if you are the last person to touch your nose, then drink”.

In this example, Steven improved his bot over time. He tested it, and made decisions that prevented him from needing to do any complex database work – the final bot is stateless, could function forever, and responds to users via text on-demand. However, it may eventually become repetitive, and initially he had to be quite imaginative.