With the holiday season approaching, as Google Trends also reflects (Figure 1), many of us are probably getting ready for a few relaxing movie nights. But where to start? Here I analyze the movies on the Internet Movie Database (IMDb) that users have associated with Christmas.
For this, I collected every movie on IMDb that the platform's users tagged with the keyword Christmas and that has received at least 10 votes since its release, which gave a dataset of 7,512 films.
First, I compared the movies by popularity (number of votes on IMDb) and listed the Top 20 in Figure 2. The figure shows that the IMDb community certainly considers “Die Hard” a Christmas movie, with its ~740k votes, while the old-time classic “Home Alone” only takes 5th place on the list. Surprisingly, none of the top three movies won an Oscar (although they were nominated). Out of the 20 movies, 11 were nominated for the prestigious award, and 4 won.
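The selection and ranking steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, titles, and vote counts below are made up, and the function name `christmas_ranking` is hypothetical. Only the "Christmas" keyword, the 10-vote minimum, and the Top 20 cutoff come from the post.

```python
# Hypothetical records: (title, keywords, votes). These are NOT real IMDb data.
movies = [
    ("Movie A", {"christmas", "santa-claus"}, 1200),
    ("Movie B", {"christmas", "train"}, 8),
    ("Movie C", {"winter", "snow"}, 5000),
    ("Movie D", {"christmas", "reindeer"}, 300),
]

MIN_VOTES = 10  # minimum number of votes mentioned in the post


def christmas_ranking(records, keyword="christmas", min_votes=MIN_VOTES, top_n=20):
    """Keep movies tagged with `keyword` that have at least `min_votes` votes,
    then return the `top_n` most-voted ones in descending order."""
    selected = [r for r in records if keyword in r[1] and r[2] >= min_votes]
    selected.sort(key=lambda r: r[2], reverse=True)
    return selected[:top_n]


ranking = christmas_ranking(movies)
for title, _, votes in ranking:
    print(f"{title}: {votes} votes")
```

In this toy data, "Movie B" is dropped for having too few votes and "Movie C" for lacking the keyword, no matter how popular it is.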
Next, I extracted the Top 5 keywords of each Christmas movie and constructed their co-occurrence network (Figure 3), in which two keywords are connected if they appear together on the same movie. After applying a backbone filtering algorithm (Coscia and Neffke, 2017), the giant component consists of XXX keywords. It lets us explore the typical topics, centered on keywords such as the general themes of Christmas Eve, Santa Claus, and winter, or different means of transportation, like reindeer and train. Less expected topics emerged as well, like talking animals or nudity.
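To make the construction concrete, here is a minimal sketch of a keyword co-occurrence network and its giant component, using only the Python standard library. The movie titles and keyword lists are made up, and the helper names `cooccurrence_edges` and `giant_component` are hypothetical. The sketch also skips the backbone filtering step entirely, so it is not the pipeline behind Figure 3, only the general idea.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical top-keyword lists per movie (the post uses the Top 5 per movie).
movie_keywords = {
    "Movie A": ["christmas-eve", "santa-claus", "reindeer"],
    "Movie B": ["santa-claus", "reindeer", "winter"],
    "Movie C": ["winter", "train"],
    "Movie D": ["talking-animal", "farm"],
}


def cooccurrence_edges(keywords_by_movie):
    """Count, for each unordered keyword pair, how many movies carry both."""
    edges = Counter()
    for keywords in keywords_by_movie.values():
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def giant_component(edges):
    """Return the largest connected set of keywords via a simple graph traversal."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in neighbors[node] - component:
                component.add(nxt)
                stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


edges = cooccurrence_edges(movie_keywords)
core = giant_component(edges)
print(sorted(core))
```

The edge weight is the number of movies in which two keywords co-occur. A backbone filter would then use these weights to decide which edges are kept before the giant component is extracted.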
Finally, I used these keywords to link the movies to each other based on the number of keywords they share. This resulted in a core network of 2,181 movies (Figure 4), in which the stronger the connection between two movies, the more similar they are. Consequently, the most central nodes in this network are the most general in terms of topics: they are the ones most similar to the rest of their neighborhood. These most “general” movies are the largest nodes in the figure, and while some of them sound familiar (e.g. Rocky), most are fairly unknown (e.g. The Christmas Goose), yet they promise a good mixture of Christmas-related topics. Therefore, a good strategy for mapping out the Christmas movie landscape is to pick the largest nodes, look at their topics, and then go for some of their highly voted neighbors. Please find a searchable PDF movie map here (or see the file attached below the blog post; you can also click on Figure 4 to access it).
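The movie-to-movie network can be sketched in the same spirit. The data below is made up, and the helper names `similarity_edges` and `weighted_degree` are hypothetical. The post does not say which centrality measure sets the node sizes, so weighted degree (the sum of shared-keyword counts over all neighbors) is used here purely as an assumption.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical keyword sets per movie. These are NOT real IMDb data.
movie_keywords = {
    "Movie A": {"christmas-eve", "santa-claus", "reindeer"},
    "Movie B": {"santa-claus", "reindeer", "winter"},
    "Movie C": {"winter", "train", "santa-claus"},
    "Movie D": {"talking-animal", "farm"},
}


def similarity_edges(keywords_by_movie):
    """Connect two movies when they share keywords; weight = number shared."""
    edges = {}
    for m1, m2 in combinations(sorted(keywords_by_movie), 2):
        shared = len(keywords_by_movie[m1] & keywords_by_movie[m2])
        if shared > 0:
            edges[(m1, m2)] = shared
    return edges


def weighted_degree(edges):
    """Sum the edge weights around each movie (assumed centrality measure)."""
    strength = defaultdict(int)
    for (m1, m2), weight in edges.items():
        strength[m1] += weight
        strength[m2] += weight
    return dict(strength)


edges = similarity_edges(movie_keywords)
strength = weighted_degree(edges)
most_general = max(strength, key=strength.get)
print(most_general, strength[most_general])
```

In this toy example, "Movie B" comes out as the most "general" movie because it shares the most keywords with its neighbors, while "Movie D" shares none and falls outside the connected core.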
Coscia, Michele and Neffke, Frank M. H. (2017). Network backboning with noisy data.
Blog post by Milán Janosov