Christmas Movies

December 16, 2019

As the holiday season approaches, which is also visible in Google Trends (Figure 1), many of us are probably planning a few relaxed movie nights. But where to start? Here I analyze movies from the Internet Movie Database (IMDb [1]) that users have tagged as related to Christmas.


Figure 1: The popularity of the keyword Christmas in Google searches (Source: Google Trends, https://trends.google.com/trends/explore?geo=US&q=Christmas).

For this, I collected all movies on IMDb that the platform's users have tagged with the keyword Christmas and that have received at least 10 votes since their release. The resulting dataset comprises 7,512 films.
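The selection step can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the post: the record layout (`title`, `votes`, `keywords`) and the function name `select_tagged_movies` are assumptions, and the records are toy data rather than real IMDb entries.

```python
def select_tagged_movies(records, keyword="christmas", min_votes=10):
    """Return records tagged with `keyword` that have at least `min_votes` votes."""
    keyword = keyword.lower()
    return [
        movie
        for movie in records
        if movie["votes"] >= min_votes
        and keyword in (k.lower() for k in movie["keywords"])
    ]


# Toy records for illustration only; field names are hypothetical.
toy_records = [
    {"title": "Toy Movie A", "votes": 1200, "keywords": ["Christmas", "snow"]},
    {"title": "Toy Movie B", "votes": 4, "keywords": ["christmas", "train"]},
    {"title": "Toy Movie C", "votes": 900, "keywords": ["heist", "train"]},
]

selected = select_tagged_movies(toy_records)
print([movie["title"] for movie in selected])
```

In this toy example only the first record passes both conditions: the second has too few votes and the third lacks the keyword.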

First, I ranked the movies by popularity (number of votes on IMDb) and listed the Top 20 in Figure 2. The figure shows that the IMDb community clearly considers “Die Hard” a Christmas movie, with its ~740k votes, while the old-time classic “Home Alone” only reaches 5th place. Surprisingly, none of the top three movies won an Oscar (although they were nominated). Of the 20 movies, 11 were nominated for the prestigious award, and 4 won.


Figure 2: The Top 20 most-voted movies on IMDb tagged with the keyword Christmas. The colors encode whether a movie won an Oscar, was nominated, or neither.
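A ranking like the one in Figure 2 amounts to sorting by vote count and tallying the award status of the top entries. The sketch below assumes each record carries a hypothetical `oscar` field with the values `"won"`, `"nominated"`, or `"none"`; the data are toy values, not the numbers behind the figure.

```python
from collections import Counter


def top_by_votes(records, n=20):
    """Return the n records with the most votes, highest first."""
    return sorted(records, key=lambda movie: movie["votes"], reverse=True)[:n]


# Toy records for illustration only; the `oscar` field is an assumption.
toy_ranking = [
    {"title": "Toy Movie A", "votes": 500, "oscar": "nominated"},
    {"title": "Toy Movie B", "votes": 900, "oscar": "none"},
    {"title": "Toy Movie C", "votes": 700, "oscar": "won"},
    {"title": "Toy Movie D", "votes": 100, "oscar": "none"},
]

top3 = top_by_votes(toy_ranking, n=3)
status_counts = Counter(movie["oscar"] for movie in top3)

for rank, movie in enumerate(top3, start=1):
    print(rank, movie["title"], movie["votes"], movie["oscar"])
print(dict(status_counts))
```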

Next, I extracted the Top 5 keywords of each Christmas movie and constructed their co-occurrence network (Figure 3). After applying a backbone filtering algorithm [2], the giant component consists of XXX keywords. It lets us explore typical topics centered on keywords such as the general themes of Christmas Eve, Santa Claus, and winter, or different modes of transportation, like reindeer and train. Less expected topics emerged as well, like talking animals or nudity.


Figure 3: Keyword co-occurrence network. Node size is proportional to the number of movies each keyword is associated with, while the colors encode the total popularity of the movies in which it occurred.
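To make the construction concrete, here is a minimal sketch of a keyword co-occurrence network: two keywords are linked whenever they appear on the same movie, and the link weight counts how many movies they share. The function name and the toy keyword lists are assumptions. Note that the final filtering step is a plain weight threshold used as a simple stand-in; it is not the noise-corrected backbone method of [2].

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(keyword_lists):
    """Count keyword frequencies and pairwise co-occurrences across movies."""
    node_freq = Counter()    # keyword -> number of movies it appears in
    edge_weight = Counter()  # (keyword_a, keyword_b) -> number of shared movies
    for keywords in keyword_lists:
        unique = sorted(set(keywords))  # sort so each pair has one canonical order
        node_freq.update(unique)
        edge_weight.update(combinations(unique, 2))
    return node_freq, edge_weight


# Toy keyword lists (one list per movie), for illustration only.
toy_keywords = [
    ["christmas eve", "santa claus", "reindeer"],
    ["santa claus", "reindeer", "winter"],
    ["christmas eve", "train", "winter"],
]

node_freq, edge_weight = keyword_cooccurrence(toy_keywords)

# Stand-in filter: keep links seen in at least two movies (NOT the method of [2]).
strong_edges = {pair: w for pair, w in edge_weight.items() if w >= 2}
print(strong_edges)
```

Here `node_freq` corresponds to the node sizes in Figure 3, while the edge weights are what a backbone filter would operate on.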

Finally, I used these keywords to link movies based on the number of keywords they share. This resulted in a core network of 2,181 movies (Figure 4), where the stronger the connection between two movies, the more similar they are. Consequently, the most central nodes in this network are the most general in terms of topics: they are the ones most similar to the rest of their neighborhood. These most “general” movies are the largest nodes in the figure. While some of them sound familiar (e.g. Rocky), most are fairly unknown (e.g. The Christmas Goose), yet they promise a good mixture of Christmas-related topics. Therefore, a good strategy for mapping out the Christmas movie landscape is to pick the largest nodes, evaluate their topics, and then go for some of their highly voted neighbors. Please find a searchable PDF movie map here (or see the file attached below the blog post; you can also click on Figure 4 to access it).

Figure 4: Each node represents a Christmas movie. Node size is proportional to how “general” the movie's topic (its set of keywords) is, and the color, from red to green, indicates increasing popularity.
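The movie network can be sketched along the same lines: two movies are linked with a weight equal to the number of keywords they share, and a centrality score then ranks how “general” each movie is. The post does not state which centrality measure was used, so the weighted degree (the sum of a node's link weights) below is an assumption, as are the function names and the toy data.

```python
from itertools import combinations


def shared_keyword_edges(movie_keywords):
    """Link every pair of movies by the number of keywords they share."""
    edges = {}
    for a, b in combinations(sorted(movie_keywords), 2):
        shared = len(set(movie_keywords[a]) & set(movie_keywords[b]))
        if shared:
            edges[(a, b)] = shared
    return edges


def weighted_degree(edges):
    """Sum the link weights per node (an assumed proxy for centrality)."""
    totals = {}
    for (a, b), weight in edges.items():
        totals[a] = totals.get(a, 0) + weight
        totals[b] = totals.get(b, 0) + weight
    return totals


# Toy data for illustration only.
toy_movies = {
    "Toy Movie A": ["santa claus", "reindeer", "winter"],
    "Toy Movie B": ["santa claus", "reindeer"],
    "Toy Movie C": ["winter", "snow"],
}

edges = shared_keyword_edges(toy_movies)
centrality = weighted_degree(edges)
most_general = max(centrality, key=centrality.get)
print(most_general, centrality)
```

Comparing all pairs scales quadratically with the number of movies, which is fine for a sketch but may call for an inverted keyword index on larger datasets.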

[1] www.imdb.com

[2] Coscia, Michele and Neffke, Frank M. H. Network backboning with noisy data. 2017.

Blog post by Milán Janosov

Attachment: 
