There has been a lot of discussion about movies at the box office in the last 2-3 years. The movie theater industry naturally took a hit during 2020, as many theaters were closed or limited in capacity due to the COVID-19 pandemic.
In the post-COVID era, many tried to anticipate how the industry would recover. So far, we have seen websites like x and y report that certain genres of movies, like dramas and biographies, have not been doing well.
Some have drawn attention to superhero movies that have bombed, such as The Marvels and The Flash. Lastly, some have pointed out that several Disney movies in general have also been shaky at the box office, like The Little Mermaid and Ant-Man and the Wasp: Quantumania.
So, what are the major trends? Let’s do some analysis.
Quick Analysis
Using data from the-numbers.com (top-grossing movies of 2023) and an all-time movie budget list, we can chart some of the top-grossing movies this year to get some quick insight. (I calculated a rough profit estimate by taking the domestic gross * 0.55 and the international gross * 0.40 to reflect how much of the revenue the movie studios take in over the life of the movie, as the rest goes to the theaters.)
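The rough estimate described above can be sketched in a few lines of Python. The 0.55 and 0.40 shares come from the text; the function name, the example figures, and the step of subtracting the production budget are assumptions for illustration, since the text does not spell out how costs enter the estimate.

```python
# Studio revenue shares quoted in the text.
DOMESTIC_SHARE = 0.55
INTERNATIONAL_SHARE = 0.40


def estimate_profit(domestic_gross, international_gross, production_budget):
    """Rough studio profit estimate (hypothetical helper).

    Assumption: the production budget is subtracted from the studio's
    share of the gross; marketing and other costs are ignored.
    """
    studio_revenue = (domestic_gross * DOMESTIC_SHARE
                      + international_gross * INTERNATIONAL_SHARE)
    return studio_revenue - production_budget


# Made-up figures, not taken from the dataset.
example = estimate_profit(
    domestic_gross=200_000_000,
    international_gross=300_000_000,
    production_budget=250_000_000,
)
print(f"Estimated profit: ${example:,.0f}")
```

With these made-up numbers, a film that grosses half a billion dollars worldwide still ends up below its budget under this estimate, which is the kind of gap the profit charts are meant to expose.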
I was able to compile and analyze 25 of the more notable movies released this year (based on the US market) by combining the aforementioned gross list and budget list and joining them with a fuzzy match operation in Excel.
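The join itself was done with Excel's fuzzy match, but the same idea can be sketched in Python with the standard-library `difflib` module. The titles, the `cutoff` value, and the `fuzzy_join` helper are hypothetical and are not the settings used for the actual table.

```python
from difflib import get_close_matches


def fuzzy_join(gross_rows, budget_rows, cutoff=0.8):
    """Attach a budget to each gross row by approximate title match."""
    # Map lowercased budget titles to their budgets for lookup.
    budgets = {title.lower(): budget for title, budget in budget_rows}
    joined = []
    for title, gross in gross_rows:
        # Best candidate at or above the similarity cutoff, if any.
        match = get_close_matches(title.lower(), list(budgets), n=1, cutoff=cutoff)
        budget = budgets[match[0]] if match else None
        joined.append((title, gross, budget))
    return joined


# Placeholder titles and numbers for illustration only.
gross_rows = [("Space Quest: Part One", 500), ("The Quiet Harbor", 120)]
budget_rows = [("Space Quest Part One", 200), ("Unrelated Film", 50)]

joined = fuzzy_join(gross_rows, budget_rows)
for row in joined:
    print(row)
```

Titles that differ only in punctuation are paired, while a title with no close counterpart keeps a budget of `None`, so unmatched rows can be reviewed by hand rather than silently mismatched.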
The dataset I created from the above sources accounts for the distributor (Disney, Warner Bros, etc.), the genre of the movie (Action, Comedy, Adventure, Drama, etc.), and the gross figures.
This is a snapshot of the Excel table that I compiled by appending and merging tables.
As we can see (and this table will be updated every few weeks with new box office totals), so far it looks like a lot of the box office failures (from a profit standpoint) belong to Disney, and many of the highest-grossing pictures are non-Disney productions. Let’s look at some charts I created to discover some other things.
As expected, the highest-grossing stretches of the year have come during the spring and summer. Early April and mid-July are the peaks, and the gross during the fall has been noticeably lower.
Right away we can see that it has not been a great year for superhero movies at the box office. Even adventure movies have struggled. Is this due to a decline in quality, or possibly waning interest in superhero movies? Who knows.
Action movies have grossed a lot.
Dramas have been successful, especially because of Oppenheimer.
Sony has been having a successful year, relatively speaking, and Universal as well, especially internationally. However, Disney has been comparatively mediocre, especially by its usual standards.
Let’s look at profit.
It seems that adventure movies, superhero movies, thrillers, etc. have not only struggled at the box office, but have also turned in financial losses. Additionally, Disney, Paramount, and 20th Century Studios seem to be in the red (figuratively and literally) on the year.
So from our preliminary analyses, we see that adventure and superhero movies are not faring well at the box office and are losing money, and that among distributors, Disney and Paramount are responsible for many of the biggest losses this year. This sticks out more for Disney.
Let’s do some data science to see if we can find even more interesting tidbits.
Exploratory Data Analysis: Segmentation
We can already get some basic insight from the charts above. Some genres and distributors seem to be incurring financial losses despite their films earning many hundreds of millions at the box office, and some of their box office earnings are even lower than those of others.
Let’s use some machine learning and exploratory data analysis to find out what’s going on. This won’t be perfect, as the sample of notable film releases for a single year will never produce a large dataset (and because there does not seem to be a clear consensus on when the post-COVID film era started), but it can be good enough.
I used K-Means clustering, an unsupervised algorithm that groups similar rows, to segment the data into clusters that can give us some more useful information. I used Python and libraries such as pandas, sklearn, and matplotlib. (A Google Colab notebook containing the full segmentation process can be found here.)
I created a CSV table based on the Excel table shown earlier.
The columns are rather straightforward: the Genre is the story type, the Distributor is the film company that produced, financed, and distributed the movie, the Gross is how much revenue the movie earned at the box office, etc.
Preprocessing Data
In order to make this work, I have to account for columns that are not necessary or that need to be converted. For instance, I decided to drop the Release Date and Movie columns. As mentioned before, we already know that the general seasonal trends of the box office have held (movies earning more in the summer), and the Movie column is just the name of the movie, which in this case is irrelevant. (Each movie has a unique name, and this dataset does not include other columns or multiple years to account for the power of a movie title’s brand name or sequels.)
Additionally, the dataset has a “Column1” column which is just the original rank that was derived from the original tables, and a “Top Grossing Movies of 2023.Genre” column, which is similar to the “Genre” column but does not differentiate between regular adventure movies and superhero movies.
df = df.drop(['Column1', 'Movie', 'Release\r\nDate', 'Top Grossing Movies of 2023.Genre'],axis=1)
Here I am going to create a heatmap to see which variables have some sort of association with each other.
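import matplotlib.pyplot as plt
import seaborn as sns
# Correlate numeric columns only; Genre and Distributor are still text at this point
plt.figure(figsize=(12,8))
cmap = sns.cubehelix_palette(as_cmap=True)
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt='0.2f', cmap=cmap)
plt.show()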
As we can see, and expect, there is:
* A strong positive correlation between domestic gross totals and worldwide gross totals. The better a movie does in the USA, the likelier it is to be successful worldwide.
* The correlation with worldwide gross is *slightly* higher for international gross. Obviously, a movie has to be strong internationally to have a chance of grossing a large amount at the box office. Even so, it says a lot that success in a single region (North America) is almost as influential on the worldwide box office figures as every other region abroad combined.
* As one would also figure, there is a strong correlation between domestic gross and profit, more so than between profit and international or worldwide gross. This is because, as mentioned before, studios collect a larger percentage of domestic gross revenues. So it is financially preferable for a movie to perform well domestically rather than internationally/worldwide.
Lastly, I am going to convert the categorical variables (like Genre) into numerical variables to make the clustering process smoother:
#creating list of dummy columns
cat_cols_dummies=['Genre', 'Top Grossing Movies of 2023.Distributor']
#creating dummy variables and reassigning the data frame
df = pd.get_dummies(data=df, columns=cat_cols_dummies, drop_first=True)
K-Means Clustering
We are going to use K-means clustering for a segmentation model. This will group the data into clusters to help see which major trends/groups are worth focusing on.
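A minimal sketch of this step with scikit-learn might look like the following. The toy numbers, the column names, and the choice to standardize the features first are assumptions for illustration; the full process is in the notebook mentioned earlier.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the preprocessed table (values are made up, in $ millions).
toy = pd.DataFrame({
    "Production Budget": [250, 200, 220, 100, 90, 20],
    "Domestic Gross": [100, 80, 150, 300, 500, 130],
    "Worldwide Gross": [300, 200, 400, 900, 1300, 280],
})

# Standardize so that no single large-valued column dominates the distances.
scaled = StandardScaler().fit_transform(toy)

# Fit two clusters; a fixed random_state keeps the run reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
toy["Cluster"] = kmeans.fit_predict(scaled)

print(toy)
```

K-means relies on distances between rows, so columns measured in hundreds of millions would otherwise outweigh the 0/1 dummy columns. Whether to scale, and how many clusters to request, are judgment calls worth revisiting on the real data.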
The following is a collection of box plots that we have created based on two clusters:
When analyzing these plots, we can see that the clusters can be grouped as such:
Cluster 0: Movies defined by higher worldwide gross, higher domestic gross, higher profit, higher ticket sales, lower production budget, and distribution from Lionsgate.
Cluster 1: Movies (mostly distributed by Disney) defined by fewer tickets sold, lower worldwide gross, lower domestic gross, lower profit, higher production budget, and a lot of adventure movies.
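Descriptions like the two above come from comparing each feature across clusters. One way to get the same comparison in numbers is a per-cluster mean. The table below is a made-up stand-in with hypothetical column names, not the real dataset.

```python
import pandas as pd

# Made-up rows that already carry a cluster label (values in $ millions).
labeled = pd.DataFrame({
    "Cluster": [0, 0, 0, 1, 1, 1],
    "Production Budget": [100, 90, 20, 250, 200, 220],
    "Worldwide Gross": [900, 1300, 280, 300, 200, 400],
    "Profit": [300, 450, 90, -60, -110, -20],
})

# Mean of every feature within each cluster.
profile = labeled.groupby("Cluster").mean()
print(profile.round(1))

# Share of rows per cluster, useful for spotting very small clusters.
shares = labeled["Cluster"].value_counts(normalize=True)
print(shares)
```

Reading the rows of `profile` side by side gives the same kind of summary as the box plots, and the cluster shares are a quick check that neither cluster consists of only one or two movies.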
Conclusion
So with our analyses, we can paint a picture. The 2023 box office has largely been the story of high-budget Disney blockbusters, especially superhero and adventure movies, losing money, with other distributors being relatively healthy in comparison. This exploratory data analysis tracks with much of what has been written about the movie industry and box office this year.
Disney clearly needs to work on lowering the production costs of its movies. Many of its big bombs this year, including ones not listed in this dataset, were marred by bloated budgets and troubled development full of reshoots, reliance on CGI in post-production, and other late additions. Examples of this include the troubled pre-production of The Marvels and Ant-Man 3. Disney should narrow its ambitions and do a better job of planning its stories in advance and creating more cohesion.