Rockbuster Stealth LLC

1. Project Goals

Motivation
Rockbuster Stealth LLC is a fictional movie rental company that used to have stores around the world. Facing stiff competition from streaming services, the Rockbuster Stealth management team is planning to use its existing movie licenses to launch an online video rental service to stay competitive.

Objective
Extract key insights from RockBuster’s operational database to help with the launch strategy for the new online video service.

Zoomed view of Entity Relationship Diagram — Zoomed view of ERD with fact tables, payment and rental

2. Key Questions

RockBuster Stealth LLC management had some questions that wanted answers for:

Which movies contributed the most/least to revenue gain?
What was the average rental duration for all videos?
Which countries are Rockbuster customers based in?
Where are customers with a high lifetime value based?
Do sales figures vary between geographic regions?

Besides these, and after some exploratory data analysis, I raised a set of extra questions:

How is the revenue distributed across the film catalog? What does this tell us?
Is there any film rating and/or category that stands out and deserves special marketing attention?
What’s our customers effective rental pattern in terms of duration? Do they comply with the rentals as per catalog?
How does each rental rate perform in relative terms? What else does this analysis uncovers?

3. Process

Rockbuster provided the Dataset with their operational data. Therefore, with exception to sourcing data, I developed a full life cycle of data analysis, from data cleaning to final presentation to stakeholders, including documenting the database itself, producing a data dictionary (link).
Using SQL for this process, I started from scratch installing Postrgesql - a relational database management system - and pgadmin4 which acts as user interface to deal with the database itself. There, I made all actions related with cleaning and summarizing data, and on a later stage developed the queries that were needed to answer the relevant business questions, using techniques such as table joins, subqueries, common table expressions and window functions.

Pgadmin4 web interface — Pgadmin4 view with a Query using CTEs and Window functions

All the queries and corresponding outputs were compiled in an excel file (insert link), to be shared with technical peers.

4. Key Insights from Data

Distribution of revenue across the film catalog

Distribution curve films vs revenue — This is probably one of the most interesting insights of the project!

Rental rates performance

Revenue Distribution per rental rate Rental rates legend On the left it's shown the revenue distribution for each of the rental rates defined in the catalog.
Rate in this context is simply the price at which the films are rented for the agreed rental duration. Since the number of rentals on each rate are well balanced, the average revenue per rental follows a very similar distribution. But the immediate question was:

With a similar number of rentals, shouldn’t a rate of $4.99 produce a revenue approximately 5 times better than $0.99?

Normalized Average Revenue In fact, if we normalize the average revenue per rental, i.e., if we divide the average revenue per rental by the value of the rental rate itself, we can quickly see that 0.99$ is the one that performs better.

But why does this happen?

The secret lies in the penalty policy. The customers keep their films in completely different patterns from what would be expectable just by looking at what is defined in the catalog, and they incur in penalties in case of delays. Looking deeper into the data, I’ve discovered that the penalties are a fixed amount (1$) per day of delay, which in turn influences more the lowest rental rate.

Payments versus delays — How amounts paid evolve with the days in delay

Ocupancy Ratio

Another metric developed was the Occupancy Ratio. It was measured the timespan between the first rental date and the last return date on the database, serving as the denominator for the ratio (100% of time). Then, I measured for each existing copy of the films, how much time they were effectively on rental (or on the other hand they were catching dust in our stores shelves). Finally, I've computed an average time of the several copies of each film. The division of the latter by the denominator gave us the film's Occupancy Ratio.

Occupancy Ratio distribution — The movies could be rented way more time than they are

Geographical Distributions

Several distribution maps were created to show:

Customers Distribution
Revenue Distribution
Revenue per customer Distribution
Top 5% spender customers Locations

Customers Distribution Map

Revenue Distribution Map

Revenue per Customer Distribution Map

Top 5% spenders Distribution Map

5. Results

Recommendations to the online video service launch strategy

Remove variable rental duration concept and apply one single duration for all rentals. Recommended 48H.
Eliminate or review entirely the penalty policy:
- On online service, physical return don't exist. At best, an extension of digital rental could be in place
- As demonstrated, it makes us lose money in higher rental rate films
Focus promotional efforts on the top 30% of films. They represent more than half of the revenue.
Expand the film catalog with films based in our top countries (e.g. Bollywood films, as India is the top country in customers and revenue)
Monitor dynamically occupancy ratio to:
- decide when to purchase additional licenses for top requested films
- propose discounted rentals on films with lower occupancy ratio
Focus geographical marketing efforts in the top revenue countries as well as in the top revenue per customer

Project Final Presentation (web view)

The full project documentation can be found on Github Repository, including both pdf and ppt versions of the presentation.

Rockbuster Stealth LLC Project

A data-driven approach to help with the
launch strategy of online video rental service