Posted Thursday, July 19th, 2018 • One Year Project
My undergraduate dissertation explored how insights derived from data-mining video game analytics could improve decision-making in the early-access development lifecycle.
This was my final-year dissertation project at university. Having just spent a year helping to develop games, I wanted to understand the factors influencing the success of early-access products, whose living development cycles rely on user feedback. It was an opportunity to learn more about data science and to combine my passion for gaming with web development.
My hypothesis was that early-access games with a higher cadence of updates achieve higher review scores and player counts prior to full commercial release, creating a positive feedback loop of common success metrics. By validating this hypothesis, I could help improve decision-making during development.
Doing the project gave me experience:
“[My] paper covers the construction of a data-mining architecture sufficient to explore the software development lifecycle using engagement data, and furthermore quantifies some relationships between metrics during this time period.
Research is conducted towards … the practical application of business strategy in the context of the video games industry.
Through statistical analysis it is outlined that the data … can be used to both classify the type of ‘story’ a specific product has, and tune the process of active development to maximise the success criteria of a project.
The insights derived from analysis are placed into an industrial context through an evaluative survey, and intended to be used as a catalyst to provide teams undertaking rapid prototyping with justification and context for their decisions, with further application in predictive modelling.”
I built a data-ingestion pipeline using Python (Flask), Couchbase and RabbitMQ, with which I could scrape information about Steam games tagged as “early-access”. The pipeline drew on a number of public APIs to retrieve information about concurrent users (CCU), user reviews and content updates. Once this was deployed (on DigitalOcean), I used Jupyter to explore and graph the data for my paper.
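To give a rough feel for the shape of such a pipeline, here is a minimal sketch, not the code from the project. It assumes a simple queue-then-store flow and swaps in standard-library stand-ins so it runs without any services: a `queue.Queue` in place of RabbitMQ and a plain dict in place of Couchbase. The function names, the `game::<id>` key format and the record fields are all hypothetical, and `fetch_game_snapshot` is a placeholder that makes no network calls.

```python
import json
import queue

# Stand-ins (assumptions): a Queue instead of a RabbitMQ broker,
# and a dict instead of a Couchbase bucket.
work_queue = queue.Queue()
document_store = {}


def fetch_game_snapshot(app_id):
    """Placeholder for the public API calls; returns an empty record, no real data."""
    return {"app_id": app_id, "ccu": None, "reviews": [], "updates": []}


def enqueue_games(app_ids):
    """Publish one JSON message per game to the work queue."""
    for app_id in app_ids:
        work_queue.put(json.dumps({"app_id": app_id}))


def drain_queue():
    """Consume every queued message, fetch a snapshot and store it by key."""
    processed = 0
    while not work_queue.empty():
        message = json.loads(work_queue.get())
        snapshot = fetch_game_snapshot(message["app_id"])
        # Hypothetical key format for the stored document.
        document_store[f"game::{snapshot['app_id']}"] = snapshot
        work_queue.task_done()
        processed += 1
    return processed
```

Calling `enqueue_games([1, 2, 3])` followed by `drain_queue()` leaves three documents in `document_store`. Putting a queue between scheduling and fetching means the two steps can be scaled or retried independently, which is one common reason to put a message broker in front of a scraper.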
Using this pipeline, I retrieved information about 977 games, with a total of 1,533,188 reviews. In short, although some ambiguities were noted, developers with an evidence-based understanding of the factors that influenced the success of previous ‘early access’ products could use the results to plan their own release strategy. This may remove uncertainties in the process, for example:
On average, 8% of reviews per product are submitted by users who have received the product for free, with the software and casual game genres receiving the highest proportion of free copies (6% and 4% respectively).
While a higher total of updates is not necessary for achieving a positive user score, products that have a larger number of updates logged receive more positive recommendations. All products with more than 125 updates have a user score higher than 50 (positive), and conversely, products scored negatively rarely reach more than 100 updates.
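The two findings above rest on simple per-product calculations, which can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code from the paper: it assumes the user score is the percentage of positive recommendations, and the field names (`voted_up`, `received_for_free`, `updates`, `reviews`) are hypothetical.

```python
from statistics import mean


def user_score(reviews):
    """Percentage of positive recommendations (assumed definition); None without reviews."""
    if not reviews:
        return None
    positive = sum(1 for review in reviews if review["voted_up"])
    return 100.0 * positive / len(reviews)


def free_review_share(reviews):
    """Percentage of a product's reviews written by users who got it for free."""
    if not reviews:
        return None
    free = sum(1 for review in reviews if review["received_for_free"])
    return 100.0 * free / len(reviews)


def mean_free_share(products):
    """Average of the per-product shares, skipping products that have no reviews."""
    shares = [free_review_share(product["reviews"]) for product in products]
    shares = [share for share in shares if share is not None]
    return mean(shares) if shares else None


def all_positive_above(products, min_updates, cutoff=50.0):
    """True if every product with more than min_updates updates scores above cutoff.

    Products without reviews count as failures; an empty selection returns True.
    """
    scores = [
        user_score(product["reviews"])
        for product in products
        if product["updates"] > min_updates
    ]
    return all(score is not None and score > cutoff for score in scores)
```

With a list of product records loaded, `mean_free_share(products)` gives the "per product" average and `all_positive_above(products, 125)` checks the update threshold claim. Averaging per product rather than pooling all reviews stops a handful of heavily reviewed games from dominating the figure.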
At the end of the project, I presented my findings at a fair and sought feedback from a senior data scientist in the games industry:
“For the first time this large dataset could provide answers to some … questions.
The findings are in many cases a validation of my pre-existing expectations, which is valuable both as a reassurance that our existing and less formal or objective domain knowledge that we have been using to make decisions is accurate, as well as providing a methodology to maintain an objective view on the continued truth of this.
We could use [the] findings to make fundamentally different decisions about the path of product development/release based on early metrics.”
To conclude, I met all of my objectives and gained some valuable insights. The project appears to offer a good springboard for future investigations.