Movie Analytics by Dominique Haughton, Mark-David McLaughlin, Kevin Mentzer & Changan Zhang

Author: Dominique Haughton, Mark-David McLaughlin, Kevin Mentzer & Changan Zhang
Language: eng
Format: epub
Publisher: Springer International Publishing, Cham


Fig. 3 Weekly attendance distribution per year and theater

4 Evolution of the Weekly Number of Movie Goers

To better visualize the evolution over time, every observation has to be associated with a date. To do so, we used the R package lubridate, which makes dates and times easy to handle. After a number indicating the week of the year was added to every row of the data frame df.all, the functions ymd and weeks were used to set the date of every row: ymd transforms the year into a date (the first day of the corresponding year) and weeks adds the proper number of weeks to this starting date.

library(ggplot2)    # ggplot, geom_line, geom_smooth
library(lubridate)  # ymd, weeks
library(scales)     # date_breaks, date_format

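# add a number indicating the week of the year (from 1 to 53);
# this assumes the rows come in complete, ordered blocks of 53 weeks
df.all$week <- rep(1:53, nrow(df.all) / 53)

# set up dates for every row: first day of the year plus (week - 1) weeks;
# tz = "UTC" makes ymd return date-times, which scale_x_datetime expects
df.all$date <- ymd(paste0(as.numeric(as.character(df.all$year)), "0101"), tz = "UTC") +
  weeks(df.all$week - 1)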

# attendance evolution per movie theater
ggplot(df.all, aes(x = date, group = theater, y = value, colour = theater)) +
  geom_line() + xlab("date") + ylab("weekly attendance") +
  scale_x_datetime(breaks = date_breaks("1 year"), labels = date_format("%Y")) +
  labs(colour = "movie theater")

# smoothed attendance evolution per movie theater
span.param <- c(0.05, 0.1, 0.2, 0.5)
p <- list()
for (ind in seq_along(span.param)) {
  # method = "loess" is set explicitly so that span is always honored
  p[[ind]] <- ggplot(df.all, aes(x = date, group = theater, y = value, colour = theater)) +
    geom_smooth(method = "loess", span = span.param[ind]) + xlab("date") +
    ylab("weekly attendance") + labs(colour = "movie theater") +
    ggtitle(paste("smoothing parameter:", span.param[ind]))
}

# use the function at
# http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_%28ggplot2%29
# to display several ggplot in a single figure
multiplot(plotlist = p, cols = 2)

The first dates obtained are:

> head(df.all$date)

[1] "2002-01-01 UTC" "2002-01-08 UTC" "2002-01-15 UTC" "2002-01-22 UTC" "2002-01-29 UTC" "2002-02-05 UTC"
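The date construction can be checked on a small example. The sketch below is only an illustration: the data frame toy and its two years are hypothetical, and passing tz = "UTC" to ymd is an assumption made here so that the result is a date-time, as in the output above. It also shows where week 53 ends up: the first day of the year plus 52 weeks, which is December 31 in a non-leap year.

```r
library(lubridate)

# Hypothetical toy data: two years with 53 weekly rows each
toy <- data.frame(year = factor(rep(c(2002, 2003), each = 53)))
toy$week <- rep(1:53, nrow(toy) / 53)

# First day of each year (as a UTC date-time) plus (week - 1) whole weeks
toy$date <- ymd(paste0(as.character(toy$year), "0101"), tz = "UTC") +
  weeks(toy$week - 1)

# First three dates of the first year
head(toy$date, 3)

# Dates assigned to week 53 of each year
toy$date[toy$week == 53]
```

Because every year is given 53 weekly slots, the last slot of one year and the first slot of the next are only one or two days apart, which is worth keeping in mind when reading the time axis.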

Then, using the R package scales to improve the display of dates on the x axis, the evolution was drawn with geom_line (package ggplot2): the data were grouped by theater, each one associated with a given color, as shown in Fig. 4. However, this graph is still not very informative, since weekly attendance in movie theaters is highly variable and depends on the release of appealing new movies. Theater G seems to be the theater with the highest number of movie goers, but it is sometimes beaten by theaters A and C.

Fig. 4Evolution of the weekly attendance per movie theater
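To get a feeling for what the smoothing parameter does to such a noisy series, the following minimal sketch fits a local regression with base R's loess for the same four span values. The series is synthetic (a hypothetical seasonal signal plus random noise, not the attendance data), and the roughness measure, the sum of squared second differences of the fitted curve, is just one simple choice made here for illustration. By default, geom_smooth relies on loess for groups of fewer than 1,000 observations, so the same span argument is at work in both cases.

```r
# Synthetic noisy weekly series (hypothetical data, not the book's)
set.seed(1)
week  <- 1:265
value <- 1000 + 200 * sin(2 * pi * week / 53) + rnorm(length(week), sd = 150)

# Fit a loess curve for each span and measure how wiggly the fit is
spans <- c(0.05, 0.1, 0.2, 0.5)
roughness <- sapply(spans, function(s) {
  fit <- loess(value ~ week, span = s)
  # sum of squared second differences: larger means a less smooth curve
  sum(diff(fitted(fit), differences = 2)^2)
})
names(roughness) <- spans
roughness
```

A small span lets the curve follow week-to-week fluctuations, while a large span averages over a wide neighborhood and keeps only the broad trend, so the roughness should be expected to drop as the span grows.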
