Predicting content popularity in fanfiction communities
In this post, Eurecat’s Computational Social Science team presents a model for predicting the popularity of content in fanfiction communities, developed as part of the work carried out in the Möbius project.
From customer intelligence to prosumer intelligence
One of the aims of the Möbius project is to guide the publishing sector in dealing with the emerging prosumer paradigm. Publishers' current practices are still largely based on a view of the consumer as a passive actor who will simply buy or not buy a product, and they rely on more or less traditional marketing and recommendation approaches to maximize sales. The potential of prosumers as co-creators of content, and as participants in every step of the process, is not fully taken into account, and the enormous wealth of content and interactions generated by their online communities still remains untapped.
Taking advantage of the wealth of data created by prosumers, in the form of original content, reviews, feedback and interactions, requires being able to mine this data, make sense of it and extract actionable knowledge. To this end, the Eurecat team developed computational methods to analyse content and interactions in prosumer communities, focusing on established platforms that give access to large amounts of data.
One of the needs identified in conversations and focus groups with publishers is predicting content popularity, in order to detect trends and identify content with publishing potential.
Data scraping from a fanfiction platform: Archive of Our Own
To address this need, the Eurecat team developed a model for predicting the popularity of works in fanfiction communities and applied it to Archive of Our Own (AO3), an open-source fanfiction platform developed and maintained by fans, where users can publish works and review each other’s works. The website is one of the main fanfiction references worldwide: as reported on its home page, it currently hosts almost 9 million user-created works, organized into over 40 thousand fandoms, and has over 4 million registered users.

The dataset scraped from AO3 includes the complete interaction data for works from seven communities: Marvel, Harry Potter, Sherlock Holmes, Lord of the Rings, Percy Jackson, Twilight, and Warriors. For each of these communities, all the comments, replies, bookmarks and kudos (similar to likes) were retrieved.
Model for predicting the popularity of content
Among the other analyses performed on this data, the Eurecat team developed a model to predict which works will become popular in the near future, based on their feedback history and current growth rate.
The popularity of a work is defined as the number of distinct users who have written at least one comment on it. This is a more reliable measure than the overall number of comments, which can be inflated by the activity of just a few very active users. The aim is to identify works that will be in the top 1% according to this measure after 30 or 60 days.
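The popularity measure and the top-1% target can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the team's code: the `Comment` record, its integer `day` field and the helper names `popularity` and `top_fraction_labels` are hypothetical, and ties at the 1% cut-off are simply broken by input order.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    work_id: str
    user_id: str
    day: int  # hypothetical: day index of the comment (e.g. days since data collection began)


def popularity(comments, until_day, work_ids=()):
    """Distinct commenters per work, counting only comments up to `until_day` (inclusive).

    `work_ids` lets works with no comments at all be included with a score of 0,
    which matters when computing the top-1% cut-off over a whole community.
    """
    users = {w: set() for w in work_ids}
    for c in comments:
        if c.day <= until_day:
            users.setdefault(c.work_id, set()).add(c.user_id)
    return {w: len(u) for w, u in users.items()}


def top_fraction_labels(scores, fraction=0.01):
    """Label the top `fraction` of works by score as popular (1) and the rest as 0."""
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)  # stable: ties keep input order
    popular = set(ranked[:k])
    return {w: int(w in popular) for w in scores}


# Toy usage: alice's two comments on w1 count only once.
comments = [
    Comment("w1", "alice", 1),
    Comment("w1", "alice", 2),
    Comment("w1", "bob", 3),
    Comment("w2", "carol", 1),
]
pop = popularity(comments, until_day=3)  # {'w1': 2, 'w2': 1}
labels = top_fraction_labels(pop)        # {'w1': 1, 'w2': 0}
```

Computing the labels at `current_day + 30` or `current_day + 60` would give the two prediction horizons, while the features for the classifier would only use data up to `current_day`.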
The model uses logistic regression to predict whether a work will become popular, based on the total feedback it has acquired up to the current time and the variation in feedback over the last few days. The prediction performs well, with a precision of 0.79, a recall of 0.90, and an F1 score of 0.84.
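A self-contained sketch of this kind of classifier follows. The post does not specify the exact features, window or training setup, so everything here is an assumption: feedback is taken as a list of daily counts, the two features are the log-scaled total and the log-scaled amount gained over a hypothetical 7-day window, and the logistic regression is fitted with plain batch gradient descent instead of a library. The toy series are invented and are not AO3 data.

```python
import math


def features(daily_feedback, window=7):
    """Hypothetical features from a work's daily feedback counts up to the current day:
    log-scaled total feedback so far and log-scaled feedback gained in the last `window` days."""
    total = sum(daily_feedback)
    recent = sum(daily_feedback[-window:])
    return [math.log1p(total), math.log1p(recent)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """1 if the predicted probability of becoming popular reaches `threshold`, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy usage with invented 8-day feedback series (label 1 = ended up in the top 1%).
series = [
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [5, 5, 5, 10, 10, 20, 20, 30],
    [2, 3, 5, 8, 13, 21, 34, 55],
]
y = [0, 0, 0, 1, 1]
X = [features(s) for s in series]
w, b = fit_logistic(X, y)
y_pred = [predict(w, b, x) for x in X]  # scored on the training data, for illustration only
print(precision_recall_f1(y, y_pred))
```

In practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used; the hand-written version only makes the model explicit. Since popular works are just 1% of the data, the 0.5 decision threshold would likely need tuning to balance precision against recall.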
The model was then adapted to study tags instead of works, with the popularity of a tag derived from the (normalized) popularity of the works to which it is assigned. For tags, performance is even higher: a precision of 0.85, a recall of 0.91, and an F1 score of 0.87.
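The tag-level variant can be illustrated in the same hedged way. The post does not say how work popularity is normalized, so this sketch assumes, hypothetically, that each work's score is divided by the highest score in its own community and that a tag's popularity is the sum of the normalized scores of the works carrying it. The function names and toy data are invented for illustration.

```python
from collections import defaultdict


def normalized_work_popularity(work_pop, community_of):
    """Hypothetical normalization: divide each work's popularity by the highest
    popularity in its own community (fandom), so small and large fandoms are comparable."""
    max_by_community = defaultdict(int)
    for work, pop in work_pop.items():
        community = community_of[work]
        max_by_community[community] = max(max_by_community[community], pop)
    normalized = {}
    for work, pop in work_pop.items():
        top = max_by_community[community_of[work]]
        normalized[work] = pop / top if top else 0.0  # avoid dividing by zero
    return normalized


def tag_popularity(normalized, tags_of):
    """Tag popularity as the sum of the normalized popularity of the works carrying the tag."""
    scores = defaultdict(float)
    for work, p in normalized.items():
        for tag in tags_of.get(work, ()):
            scores[tag] += p
    return dict(scores)


# Toy usage with invented works and tags.
work_pop = {"w1": 10, "w2": 5, "w3": 2, "w4": 4}
community_of = {"w1": "Marvel", "w2": "Marvel", "w3": "Twilight", "w4": "Twilight"}
tags_of = {"w1": ["fluff", "angst"], "w2": ["angst"], "w3": ["fluff"], "w4": ["au"]}
print(tag_popularity(normalized_work_popularity(work_pop, community_of), tags_of))
```

Summing favours tags attached to many works, whereas averaging would favour tags whose works are individually popular. Either aggregate, tracked day by day, could feed the same total-plus-recent-growth features used for works.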
Conclusion and next steps: Prosumer Intelligence Toolkit
With the first model, one can identify works that are very likely to become popular and could therefore, for example, be good candidates for publication. With the second model, one can identify trending topics, i.e. keywords, categories, terms, topics, genres or subgenres that are growing in popularity and therefore represent promising areas to explore.
This is one of several analyses being carried out on data from established fanfiction communities. The next step is to develop a Prosumer Intelligence Toolkit with interactive dashboards that show the potential of these kinds of metrics to extract actionable knowledge and foster cooperation between prosumers and the publishing sector.
This work was carried out by the Computational Social Science team at Eurecat, whose work focuses on studying social interactions and collective behaviour on online platforms.