Analyzing Movie Ratings via SVD

In this notebook we're going to be working with a subset of the MovieLens25M dataset. The original dataset (https://grouplens.org/datasets/movielens/25m/) contains roughly 25 million movie ratings, as its name suggests.
The dataset was generated on November 21st, 2019, so it is pretty current. In this activity, we're going to be using a reduced subset of this data that only includes popular movies (that have been rated at least 1000 times) and users that have rated lots of movies (at least 500). This leaves us with 9663 users and 3790 movies.
The goals of this activity are threefold.
  1. To work with a different type of data than images or temperatures (here we will be working with ratings). Applying the tools you have learned in this module to different domains will help solidify your learning, help you see connections, and potentially get you excited for your module 1 project.
  2. To see how SVD can be used to examine the important trends in your data (since we had lots of practice with using the EVD on the overnight).
  3. To have some fun!
To get started, we're going to load the data and display a little bit of the data. Please see the comments in the code for some more information.
load('movielens25m.mat');
sizeOfMovies = size(movies)
sizeOfMovies = 1×2
3790 3
% the cell array `movies` is 3790 by 3. Each of the 3790 rows corresponds to a
% particular movie, and along the second dimension the entries correspond to the movie ID,
% the movie title, and the movie genre
%
% Here we extract the information about the first movie in the dataset
[movieId, movieTitle, movieGenre] = movies{1,:}
movieId = int64 1
movieTitle = 'Toy Story (1995)'
movieGenre = 'Adventure|Animation|Children|Comedy|Fantasy'
ratingsSize = size(ratings)
ratingsSize = 1×2
9663 3790
% the matrix `ratings` is 9663 by 3790 and encodes the rating that a
% particular user (row) gave to a particular movie (column). The ratings
% range from 0.5 to 5 stars in half-star steps, or take the special value
% NaN (not a number) if the user didn't rate that particular movie.
%
% Let's look at the ratings that were given to the first movie in the
% dataset, which as we saw is Toy Story. We can do this using the histc
% function (we'll ignore missing values in this analysis)
possibleRatings = [0.5:0.5:5];
nRatings = histc(ratings(:,1), possibleRatings);  % column 1 holds every user's rating of movie 1
figure;
bar(possibleRatings, nRatings);
xlabel('Rating');
ylabel('Number of Users');
title(['Ratings for ', movieTitle])
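To make the "ignore missing values" step concrete, here is a minimal sketch on a made-up column of ratings. The variable toyColumn is a hypothetical stand-in for one column of ratings, not part of the dataset. The sketch drops the NaN entries explicitly before counting, so the result does not depend on how histc treats NaN.

```matlab
% Hypothetical ratings of one movie by five users; NaN marks "not rated"
toyColumn = [0.5; NaN; 5; 5; 3];
possibleRatings = 0.5:0.5:5;            % the ten possible star values

% Keep only the users who actually rated the movie, then count each value
rated = toyColumn(~isnan(toyColumn));
counts = histc(rated, possibleRatings);

disp([possibleRatings(:), counts(:)]);  % each row: star value, number of users
```

Only four of the five users are counted, since the one who did not rate the movie is excluded.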
Okay, yeah that was a pretty great movie. Let's check out a less good movie, Anaconda. Highly recommended!! Look at this cast https://www.imdb.com/title/tt0118615/fullcredits !!!
anacondaIndex = 850;
[movieId, movieTitle, movieGenre] = movies{anacondaIndex,:}
movieId = int64 1499
movieTitle = 'Anaconda (1997)'
movieGenre = 'Action|Adventure|Thriller'
nRatings = histc(ratings(:,anacondaIndex), possibleRatings);  % movies are columns, so index the column
figure;
bar(possibleRatings, nRatings);
xlabel('Rating');
ylabel('Number of Users');
title(['Ratings for ', movieTitle])

Cleaning up the Data

As you probably guessed, we're going to be applying SVD to this data. Before we start analyzing this data, we're going to do a few things to make the problem a bit easier to handle. First we're going to have to deal with the fact that we have a bunch of missing values in our ratings matrix (i.e., movies that particular users did not rate). The step of filling in missing values is called data imputation. There are many ways to do this, but we've chosen a particularly easy strategy of simply replacing any missing rating with the average rating of that particular movie (e.g., if a user didn't rate Toy Story, we would fill it in with the average rating of Toy Story based on the other users in the dataset who actually rated that movie).
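% fill each NaN with its movie's (column's) mean; mean(..., 'omitnan') matches nanmean without needing a toolbox
ratingsFilled = fillmissing(ratings, 'constant', mean(ratings, 'omitnan'));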
As a final data cleaning step, we're going to subtract out the mean of each row. This will control for the fact that users vary considerably in the numerical scores they assign to movies (e.g., one user's 3 may be more comparable to another user's 1).
ratingsMeanCentered = ratingsFilled - mean(ratingsFilled,2);
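To see what these two cleaning steps do, here is a small sketch on a made-up 3-user by 4-movie matrix. The names R, Rfilled, and Rcentered are hypothetical and used only for illustration. The imputation is written as an explicit loop so each step is visible; it is meant to mirror, not replace, the fillmissing call above.

```matlab
% Toy ratings: rows are users, columns are movies, NaN means "not rated"
R = [5   NaN 3   1;
     4   2   NaN 1;
     NaN 4   5   3];

% Step 1: fill each missing rating with the mean of that movie's known ratings
Rfilled = R;
for j = 1:size(R, 2)
    missing = isnan(R(:, j));
    Rfilled(missing, j) = mean(R(~missing, j));
end

% Step 2: subtract each user's mean rating from that user's row
Rcentered = Rfilled - mean(Rfilled, 2);

disp(Rfilled);
disp(Rcentered);
```

After the second step every row of Rcentered averages to zero, so a positive entry means "this user liked the movie more than they like movies on average".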

Framing the Problem Using SVD

Next, let's think about how SVD might help us to analyze this dataset. Suppose we compute the SVD of the matrix ratingsMeanCentered. Let's use u_1 to refer to the first left singular vector, v_1 to refer to the first right singular vector, and σ_1 to refer to the first singular value (let's assume that the first pair of singular vectors has the largest singular value).

Exercise

Before running any other code in this notebook, answer the following questions regarding the first pair of singular vectors.
  1. What are the sizes of u_1 and v_1? What does each of the dimensions of u_1 correspond to? How about each dimension of v_1?
  2. In 15.3.8 we talked about compressing the original matrix down to m + n + 1 values. If we think of u_1, v_1, and σ_1 as the compressed version of the ratings data, how would we reconstruct the ratings data using u_1, v_1, and σ_1? (You essentially did this already; we're hoping you can recall this fact from earlier and apply it here.)
  3. We can think of v_1 as encoding the dominant trend that explains the ratings of each movie. For this dataset, what might this correspond to?
  4. We can think of u_1 as encoding the dominant trend that explains the ratings by each user. For this dataset, what might this correspond to? Keep in mind we have already subtracted out the mean of each user. It might be helpful to expand your formula from problem 2 to see how u_1 and v_1 interact with each other.
Now we're going to compute the SVD. We'll just compute the 10 pairs of left and right singular vectors with the largest singular values.
[U, Sigma, V] = svds(ratingsMeanCentered, 10);
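As a rough sanity check on what a truncated SVD returns, the sketch below runs svds on a small deterministic toy matrix and measures how much of the matrix the leading k components capture. The matrix A and the choice k = 5 are assumptions made for illustration; they are not the ratings data.

```matlab
% Toy stand-in for ratingsMeanCentered: 50 "users" by 30 "movies"
A = sin((1:50)' * (1:30) / 7);
k = 5;

% U is 50-by-k (one row per user), V is 30-by-k (one row per movie),
% and Sigma is a k-by-k diagonal matrix of singular values
[U, Sigma, V] = svds(A, k);

% Rank-k approximation built from the k singular triplets
Ak = U * Sigma * V';

% Relative error of the approximation, and the fraction of the squared
% Frobenius norm ("energy") captured by the k components
relErr = norm(A - Ak, 'fro') / norm(A, 'fro');
energy = sum(diag(Sigma).^2) / norm(A, 'fro')^2;

disp(size(U)); disp(size(Sigma)); disp(size(V));
disp([relErr, energy]);
```

The same two quantities could be computed for ratingsMeanCentered to judge how much of the rating variation the 10 components explain.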

Examining the Right Singular Vectors

Now that we've computed our singular vectors, let's see if we can make sense of them. It turns out that the right singular vectors (the ones that have to do with movies) are generally more interpretable than the left singular vectors (the ones that have to do with users). We'll start out by looking at each right singular vector.

Exercise

Before running the code, think through the following question with your table-mates.
What might you do in order to make sense of what a particular right singular vector represents? Consider things like examining small or large values, looking for correlations, etc. There's not only one right answer, so throw out some ideas and try to think through what examining a particular aspect of the vector might tell you.
(we'll leave a little space to make it easier not to look at what we did)

Looking at Large and Small Values

One simple way to understand the right singular vectors is to look at the largest and smallest components of each vector. This will tell us which movies are most strongly positively and most strongly negatively associated with this component. In the code below, we'll print out the title, genre, and component of the 10 movies that are most positively and most negatively associated with each right singular vector. Exercise: Based on these outputs, can you tell a story about what the singular vector represents?
for i = 1 : 10
disp(['Component ', num2str(i)]);
getHighAndLowMovies(V(:,i), movies)
end
Component 1
ans = 20×3 cell
      1    2    3
1    'Usual Suspects, The (1995)'    'Crime|Mystery|Thriller'    0.0309
2    '12 Angry Men (1957)'    'Drama'    0.0310
3    'Seven Samurai (Shichinin no samurai) (1954)'    'Action|Adventure|Drama'    0.0312
4    'Pulp Fiction (1994)'    'Comedy|Crime|Drama|Thriller'    0.0318
5    'Godfather: Part II, The (1974)'    'Crime|Drama'    0.0318
6    'Band of Brothers (2001)'    'Action|Drama|War'    0.0349
7    'Godfather, The (1972)'    'Crime|Drama'    0.0353
8    'Shawshank Redemption, The (1994)'    'Crime|Drama'    0.0353
9    'Planet Earth (2006)'    'Documentary'    0.0370
10    'Planet Earth II (2016)'    'Documentary'    0.0375
11    'Epic Movie (2007)'    'Adventure|Comedy'    -0.0626
12    'Kazaam (1996)'    'Children|Comedy|Fantasy'    -0.0599
13    'Battlefield Earth (2000)'    'Action|Sci-Fi'    -0.0598
14    'Baby Geniuses (1999)'    'Comedy'    -0.0597
15    'Dumb and Dumberer: When Harry Met Lloyd (2003)'    'Comedy'    -0.0554
16    'Mighty Morphin Power Rangers: The Movie (1995)'    'Action|Children'    -0.0542
17    'Lawnmower Man 2: Beyond Cyberspace (1996)'    'Action|Sci-Fi|Thriller'    -0.0538
18    'Police Academy 6: City Under Siege (1989)'    'Comedy|Crime'    -0.0527
19    'Problem Child 2 (1991)'    'Comedy'    -0.0526
20    'Home Alone 3 (1997)'    'Children|Comedy'    -0.0523
Component 2
ans = 20×3 cell
      1    2    3
1    'Matrix Reloaded, The (2003)'    'Action|Adventure|Sci-Fi|Thriller|IMAX'    0.0741
2    'Jurassic Park (1993)'    'Action|Adventure|Sci-Fi|Thriller'    0.0749
3    'Star Wars: Episode II - Attack of the Clones (2002)'    'Action|Adventure|Sci-Fi|IMAX'    0.0765
4    'Titanic (1997)'    'Drama|Romance'    0.0766
5    'Shrek (2001)'    'Adventure|Animation|Children|Comedy|Fantasy|Romance'    0.0770
6    'Armageddon (1998)'    'Action|Romance|Sci-Fi|Thriller'    0.0775
7    'Men in Black (a.k.a. MIB) (1997)'    'Action|Comedy|Sci-Fi'    0.0783
8    'Forrest Gump (1994)'    'Comedy|Drama|Romance|War'    0.0789
9    'Star Wars: Episode I - The Phantom Menace (1999)'    'Action|Adventure|Sci-Fi'    0.0858
10    'Independence Day (a.k.a. ID4) (1996)'    'Action|Adventure|Sci-Fi|Thriller'    0.0959
11    'Solaris (Solyaris) (1972)'    'Drama|Mystery|Sci-Fi'    -0.0167
12    'Stuart Saves His Family (1995)'    'Comedy'    -0.0167
13    'Halloween III: Season of the Witch (1982)'    'Horror'    -0.0166
14    'Pink Flamingos (1972)'    'Comedy'    -0.0165
15    'Dead Ringers (1988)'    'Drama|Horror|Thriller'    -0.0164
16    'Even Cowgirls Get the Blues (1993)'    'Comedy|Romance'    -0.0163
17    'Mr. Wrong (1996)'    'Comedy'    -0.0163
18    'Alphaville (Alphaville, une étrange aventure de Lemmy Caution) (1965)'    'Drama|Mystery|Romance|Sci-Fi|Thriller'    -0.0163
19    'Grand Illusion (La grande illusion) (1937)'    'Drama|War'    -0.0162
20    'Girl 6 (1996)'    'Comedy|Drama'    -0.0161
Component 3
ans = 20×3 cell
      1    2    3
1    'Thor (2011)'    'Action|Adventure|Drama|Fantasy|IMAX'    0.0568
2    'Iron Man (2008)'    'Action|Adventure|Sci-Fi'    0.0572
3    'Pirates of the Caribbean: Dead Man's Chest (2006)'    'Action|Adventure|Fantasy'    0.0583
4    'X-Men: The Last Stand (2006)'    'Action|Sci-Fi|Thriller'    0.0616
5    'Pirates of the Caribbean: At World's End (2007)'    'Action|Adventure|Comedy|Fantasy'    0.0619
6    'Avengers, The (2012)'    'Action|Adventure|Sci-Fi|IMAX'    0.0636
7    'Avatar (2009)'    'Action|Adventure|Sci-Fi|IMAX'    0.0639
8    'Iron Man 2 (2010)'    'Action|Adventure|Sci-Fi|Thriller|IMAX'    0.0640
9    'X-Men Origins: Wolverine (2009)'    'Action|Sci-Fi|Thriller'    0.0656
10    'Transformers (2007)'    'Action|Sci-Fi|Thriller|IMAX'    0.0741
11    'E.T. the Extra-Terrestrial (1982)'    'Children|Drama|Sci-Fi'    -0.0559
12    'Who Framed Roger Rabbit? (1988)'    'Adventure|Animation|Children|Comedy|Crime|Fantasy|Mystery'    -0.0541
13    'Big (1988)'    'Comedy|Drama|Fantasy|Romance'    -0.0497
14    'Jaws (1975)'    'Action|Horror'    -0.0489
15    'Ghostbusters (a.k.a. Ghost Busters) (1984)'    'Action|Comedy|Sci-Fi'    -0.0488
16    'Honey, I Shrunk the Kids (1989)'    'Adventure|Children|Comedy|Fantasy|Sci-Fi'    -0.0465
17    'Ghost (1990)'    'Comedy|Drama|Fantasy|Romance|Thriller'    -0.0455
18    'Dances with Wolves (1990)'    'Adventure|Drama|Western'    -0.0452
19    'Gremlins (1984)'    'Comedy|Horror'    -0.0447
20    'Beetlejuice (1988)'    'Comedy|Fantasy'    -0.0444
Component 4
ans = 20×3 cell
      1    2    3
1    'Air Force One (1997)'    'Action|Thriller'    0.0539
2    'Sleepless in Seattle (1993)'    'Comedy|Drama|Romance'    0.0542
3    'Pearl Harbor (2001)'    'Action|Drama|Romance|War'    0.0543
4    'You've Got Mail (1998)'    'Comedy|Romance'    0.0565
5    'Twister (1996)'    'Action|Adventure|Romance|Thriller'    0.0594
6    'Top Gun (1986)'    'Action|Romance'    0.0599
7    'Ghost (1990)'    'Comedy|Drama|Fantasy|Romance|Thriller'    0.0600
8    'Independence Day (a.k.a. ID4) (1996)'    'Action|Adventure|Sci-Fi|Thriller'    0.0714
9    'Pretty Woman (1990)'    'Comedy|Romance'    0.0731
10    'Armageddon (1998)'    'Action|Romance|Sci-Fi|Thriller'    0.0805
11    'Clockwork Orange, A (1971)'    'Crime|Drama|Sci-Fi|Thriller'    -0.1208
12    'Pulp Fiction (1994)'    'Comedy|Crime|Drama|Thriller'    -0.1203
13    '2001: A Space Odyssey (1968)'    'Adventure|Drama|Sci-Fi'    -0.1052
14    'Big Lebowski, The (1998)'    'Comedy|Crime'    -0.0980
15    'Being John Malkovich (1999)'    'Comedy|Drama|Fantasy'    -0.0946
16    'Shining, The (1980)'    'Horror'    -0.0927
17    'Royal Tenenbaums, The (2001)'    'Comedy|Drama'    -0.0899
18    'Fargo (1996)'    'Comedy|Crime|Drama|Thriller'    -0.0896
19    'Kill Bill: Vol. 1 (2003)'    'Action|Crime|Thriller'    -0.0886
20    'Taxi Driver (1976)'    'Crime|Drama|Thriller'    -0.0853
Component 5
ans = 20×3 cell
      1    2    3
1    'Matrix, The (1999)'    'Action|Sci-Fi|Thriller'    0.0634
2    'Fight Club (1999)'    'Action|Crime|Drama|Thriller'    0.0639
3    'Rock, The (1996)'    'Action|Adventure|Thriller'    0.0645
4    'Con Air (1997)'    'Action|Adventure|Thriller'    0.0646
5    'The Devil's Advocate (1997)'    'Drama|Mystery|Thriller'    0.0651
6    'Predator (1987)'    'Action|Sci-Fi|Thriller'    0.0660
7    'Die Hard: With a Vengeance (1995)'    'Action|Crime|Thriller'    0.0698
8    'Fifth Element, The (1997)'    'Action|Adventure|Comedy|Sci-Fi'    0.0710
9    'From Dusk Till Dawn (1996)'    'Action|Comedy|Horror|Thriller'    0.0732
10    'Starship Troopers (1997)'    'Action|Sci-Fi'    0.0743
11    'Babe (1995)'    'Children|Drama'    -0.1251
12    'Beauty and the Beast (1991)'    'Animation|Children|Fantasy|Musical|Romance|IMAX'    -0.1108
13    'Wizard of Oz, The (1939)'    'Adventure|Children|Fantasy|Musical'    -0.0969
14    'Toy Story 2 (1999)'    'Adventure|Animation|Children|Comedy|Fantasy'    -0.0902
15    'Toy Story (1995)'    'Adventure|Animation|Children|Comedy|Fantasy'    -0.0892
16    'Snow White and the Seven Dwarfs (1937)'    'Animation|Children|Drama|Fantasy|Musical'    -0.0871
17    'Little Mermaid, The (1989)'    'Animation|Children|Comedy|Musical|Romance'    -0.0849
18    'Chicken Run (2000)'    'Animation|Children|Comedy'    -0.0827
19    'Finding Nemo (2003)'    'Adventure|Animation|Children|Comedy'    -0.0814
20    'Mary Poppins (1964)'    'Children|Comedy|Fantasy|Musical'    -0.0805
Component 6
ans = 20×3 cell
      1    2    3
1    'Predator (1987)'    'Action|Sci-Fi|Thriller'    0.0670
2    'Back to the Future (1985)'    'Adventure|Comedy|Sci-Fi'    0.0699
3    'Die Hard (1988)'    'Action|Crime|Thriller'    0.0713
4    'Terminator, The (1984)'    'Action|Sci-Fi|Thriller'    0.0728
5    'Terminator 2: Judgment Day (1991)'    'Action|Sci-Fi'    0.0764
6    'Star Wars: Episode IV - A New Hope (1977)'    'Action|Adventure|Sci-Fi'    0.0777
7    'Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)'    'Action|Adventure'    0.0851
8    'Star Wars: Episode V - The Empire Strikes Back (1980)'    'Action|Adventure|Sci-Fi'    0.0869
9    'RoboCop (1987)'    'Action|Crime|Drama|Sci-Fi|Thriller'    0.0870
10    'Ghostbusters (a.k.a. Ghost Busters) (1984)'    'Action|Comedy|Sci-Fi'    0.0904
11    'American Beauty (1999)'    'Drama|Romance'    -0.1038
12    'Beautiful Mind, A (2001)'    'Drama|Romance'    -0.0863
13    'Crash (2004)'    'Crime|Drama'    -0.0782
14    'Good Will Hunting (1997)'    'Drama|Romance'    -0.0740
15    'Vanilla Sky (2001)'    'Mystery|Romance|Sci-Fi|Thriller'    -0.0713
16    'Forrest Gump (1994)'    'Comedy|Drama|Romance|War'    -0.0709
17    'American History X (1998)'    'Crime|Drama'    -0.0691
18    'As Good as It Gets (1997)'    'Comedy|Drama|Romance'    -0.0678
19    'Erin Brockovich (2000)'    'Drama'    -0.0658
20    'Dead Poets Society (1989)'    'Drama'    -0.0658
Component 7
ans = 20×3 cell
      1    2    3
1    'Billy Madison (1995)'    'Comedy'    0.0849
2    'Austin Powers in Goldmember (2002)'    'Comedy'    0.0850
3    'Scary Movie (2000)'    'Comedy|Horror'    0.0876
4    'Zoolander (2001)'    'Comedy'    0.0877
5    'Austin Powers: The Spy Who Shagged Me (1999)'    'Action|Adventure|Comedy'    0.0932
6    'Ace Ventura: When Nature Calls (1995)'    'Comedy'    0.0937
7    'Dumb & Dumber (Dumb and Dumber) (1994)'    'Adventure|Comedy'    0.0948
8    'Austin Powers: International Man of Mystery (1997)'    'Action|Adventure|Comedy'    0.0962
9    'Happy Gilmore (1996)'    'Comedy'    0.1004
10    'Ace Ventura: Pet Detective (1994)'    'Comedy'    0.1009
11    'Titanic (1997)'    'Drama|Romance'    -0.0953
12    'Saving Private Ryan (1998)'    'Action|Drama|War'    -0.0876
13    'Star Wars: Episode I - The Phantom Menace (1999)'    'Action|Adventure|Sci-Fi'    -0.0788
14    'Dances with Wolves (1990)'    'Adventure|Drama|Western'    -0.0777
15    'Braveheart (1995)'    'Action|Drama|War'    -0.0751
16    'E.T. the Extra-Terrestrial (1982)'    'Children|Drama|Sci-Fi'    -0.0749
17    'Schindler's List (1993)'    'Drama|War'    -0.0745
18    'Star Wars: Episode IV - A New Hope (1977)'    'Action|Adventure|Sci-Fi'    -0.0730
19    'Jaws (1975)'    'Action|Horror'    -0.0707
20    'Jurassic Park (1993)'    'Action|Adventure|Sci-Fi|Thriller'    -0.0695
Component 8
ans = 20×3 cell
      1    2    3
1    'Mrs. Doubtfire (1993)'    'Comedy|Drama'    0.0615
2    'Braveheart (1995)'    'Action|Drama|War'    0.0620
3    'Toy Story (1995)'    'Adventure|Animation|Children|Comedy|Fantasy'    0.0649
4    'Shawshank Redemption, The (1994)'    'Crime|Drama'    0.0703
5    'Titanic (1997)'    'Drama|Romance'    0.0760
6    'Home Alone (1990)'    'Children|Comedy'    0.0766
7    'Lion King, The (1994)'    'Adventure|Animation|Children|Drama|Musical|IMAX'    0.0769
8    'Jurassic Park (1993)'    'Action|Adventure|Sci-Fi|Thriller'    0.0861
9    'Back to the Future (1985)'    'Adventure|Comedy|Sci-Fi'    0.0873
10    'Forrest Gump (1994)'    'Comedy|Drama|Romance|War'    0.1346
11    'Lara Croft: Tomb Raider (2001)'    'Action|Adventure'    -0.0782
12    'Matrix Revolutions, The (2003)'    'Action|Adventure|Sci-Fi|Thriller|IMAX'    -0.0741
13    'Daredevil (2003)'    'Action|Crime'    -0.0719
14    'Charlie's Angels (2000)'    'Action|Comedy'    -0.0700
15    'Van Helsing (2004)'    'Action|Adventure|Fantasy|Horror'    -0.0672
16    'League of Extraordinary Gentlemen, The (a.k.a. LXG) (2003)'    'Action|Fantasy|Sci-Fi'    -0.0670
17    'Star Wars: Episode II - Attack of the Clones (2002)'    'Action|Adventure|Sci-Fi|IMAX'    -0.0661
18    'Fantastic Four (2005)'    'Action|Adventure|Sci-Fi'    -0.0654
19    'Matrix Reloaded, The (2003)'    'Action|Adventure|Sci-Fi|Thriller|IMAX'    -0.0634
20    'xXx (2002)'    'Action|Crime|Thriller'    -0.0629
Component 9
ans = 20×3 cell
      1    2    3
1    'Harry Potter and the Prisoner of Azkaban (2004)'    'Adventure|Fantasy|IMAX'    0.0696
2    'Spirited Away (Sen to Chihiro no kamikakushi) (2001)'    'Adventure|Animation|Fantasy'    0.0696
3    'Harry Potter and the Chamber of Secrets (2002)'    'Adventure|Fantasy'    0.0773
4    'Rocky Horror Picture Show, The (1975)'    'Comedy|Horror|Musical|Sci-Fi'    0.0799
5    'Nightmare Before Christmas, The (1993)'    'Animation|Children|Fantasy|Musical'    0.0814
6    'Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)'    'Adventure|Children|Fantasy'    0.0850
7    'Lord of the Rings: The Return of the King, The (2003)'    'Action|Adventure|Drama|Fantasy'    0.0897
8    'Lord of the Rings: The Two Towers, The (2002)'    'Adventure|Fantasy'    0.0947
9    'Lord of the Rings: The Fellowship of the Ring, The (2001)'    'Adventure|Fantasy'    0.1043
10    'Fifth Element, The (1997)'    'Action|Adventure|Comedy|Sci-Fi'    0.1046
11    'Dumb & Dumber (Dumb and Dumber) (1994)'    'Adventure|Comedy'    -0.1263
12    'There's Something About Mary (1998)'    'Comedy|Romance'    -0.1177
13    'American Pie (1999)'    'Comedy|Romance'    -0.1175
14    'Meet the Parents (2000)'    'Comedy'    -0.0956
15    'Austin Powers: International Man of Mystery (1997)'    'Action|Adventure|Comedy'    -0.0907
16    'Jaws (1975)'    'Action|Horror'    -0.0817
17    'Happy Gilmore (1996)'    'Comedy'    -0.0767
18    'Austin Powers: The Spy Who Shagged Me (1999)'    'Action|Adventure|Comedy'    -0.0750
19    'Rocky (1976)'    'Drama'    -0.0738
20    'Ace Ventura: Pet Detective (1994)'    'Comedy'    -0.0735
Component 10
ans = 20×3 cell
      1    2    3
1    'Princess Bride, The (1987)'    'Action|Adventure|Comedy|Fantasy|Romance'    0.0837
2    'Star Wars: Episode III - Revenge of the Sith (2005)'    'Action|Adventure|Sci-Fi'    0.0949
3    'Star Wars: Episode I - The Phantom Menace (1999)'    'Action|Adventure|Sci-Fi'    0.0965
4    'Star Wars: Episode II - Attack of the Clones (2002)'    'Action|Adventure|Sci-Fi|IMAX'    0.1111
5    'Lord of the Rings: The Return of the King, The (2003)'    'Action|Adventure|Drama|Fantasy'    0.1140
6    'Lord of the Rings: The Two Towers, The (2002)'    'Adventure|Fantasy'    0.1245
7    'Star Wars: Episode V - The Empire Strikes Back (1980)'    'Action|Adventure|Sci-Fi'    0.1296
8    'Lord of the Rings: The Fellowship of the Ring, The (2001)'    'Adventure|Fantasy'    0.1328
9    'Star Wars: Episode VI - Return of the Jedi (1983)'    'Action|Adventure|Sci-Fi'    0.1422
10    'Star Wars: Episode IV - A New Hope (1977)'    'Action|Adventure|Sci-Fi'    0.1425
11    'Titanic (1997)'    'Drama|Romance'    -0.1113
12    'Eyes Wide Shut (1999)'    'Drama|Mystery|Thriller'    -0.0798
13    'Home Alone (1990)'    'Children|Comedy'    -0.0791
14    'Home Alone 2: Lost in New York (1992)'    'Children|Comedy'    -0.0784
15    'Speed (1994)'    'Action|Romance|Thriller'    -0.0714
16    'Jumanji (1995)'    'Adventure|Children|Fantasy'    -0.0708
17    'Fly, The (1986)'    'Drama|Horror|Sci-Fi|Thriller'    -0.0675
18    'Face/Off (1997)'    'Action|Crime|Drama|Thriller'    -0.0653
19    'Final Destination (2000)'    'Drama|Thriller'    -0.0644
20    'American Psycho (2000)'    'Crime|Horror|Mystery|Thriller'    -0.0639
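
The helper getHighAndLowMovies called above is not listed in this notebook. As a rough guess at what such a helper might do, the sketch below reproduces the ordering seen in the tables (largest component in row 10, most negative component in row 11) on made-up data. The names toyMovies, v, k, picked, and result are all hypothetical, and the real helper may well be implemented differently.

```matlab
% Hypothetical stand-in for `movies`: columns are ID, title, genre
toyMovies = {1, 'Movie A', 'Drama';
             2, 'Movie B', 'Comedy';
             3, 'Movie C', 'Action';
             4, 'Movie D', 'Horror';
             5, 'Movie E', 'Drama';
             6, 'Movie F', 'Comedy'};

% Hypothetical right singular vector: one component per movie
v = [0.3; -0.5; 0.1; -0.2; 0.4; 0.0];
k = 2;   % how many movies to show at each extreme (the notebook shows 10)

% Sort ascending, then take the k largest followed by the k smallest
[~, order] = sort(v, 'ascend');
picked = [order(end-k+1:end); order(1:k)];

% Title, genre, and component for each selected movie
result = [toyMovies(picked, 2:3), num2cell(v(picked))];
disp(result);
```

With k = 2 this selects Movie A and Movie E as the most positive pair, followed by Movie B and Movie D as the most negative pair.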

Examining the Left Singular Vectors

Now we're going to check out the left singular vectors.

Exercise

Before running the code, think through the following question with your table-mates.
What might you do in order to make sense of what a particular left singular vector represents? Consider things like examining small or large values, looking for correlations, etc. There's not only one right answer, so throw out some ideas and try to think through what examining a particular aspect of the vector might tell you.
(we'll leave a little space to make it easier not to look at what we did)

Looking at Large and Small Values

Similarly to what we did for the right singular vectors, let's take a look at large (positive) and small (negative) components of each singular vector. Instead of looking at the top 10 and bottom 10, we're going to look at the single highest and single lowest component (each of which corresponds to a user). For that user, we're going to show a sampling of movies that the user rated (focusing on the top 10 and bottom 10 ratings for that particular user). Exercise: Given what you know about the corresponding right singular vector, try to make sense of the users that are at either extreme of the left singular vectors.
for i = 1 : 10
[~, highestUserIndex] = max(U(:,i));
[~, lowestUserIndex] = min(U(:,i));
disp('');
disp(['Component ', num2str(i)]);
disp('The user with the largest component rated the following movies as high and low');
getHighAndLowUserRatings(highestUserIndex, movies, ratings)
disp('The user with the smallest (probably negative) component rated the following movies as high and low');
getHighAndLowUserRatings(lowestUserIndex, movies, ratings)
end
Component 1
The user with the largest component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'Thor: Ragnarok (2017)'    5
2    'Louis C.K.: Live at The Comedy Store (2015)'    5
3    'Tomorrowland (2015)'    5
4    'Inside Out (2015)'    5
5    'Creed (2015)'    5
6    'Hunt for the Wilderpeople (2016)'    5
7    'Planet Earth (2006)'    5
8    'The Lego Batman Movie (2017)'    5
9    'Baby Driver (2017)'    5
10    'The Shape of Water (2017)'    5
11    'Dracula: Dead and Loving It (1995)'    0.5000
12    'Cutthroat Island (1995)'    0.5000
13    'Four Rooms (1995)'    0.5000
14    'Mortal Kombat (1995)'    0.5000
15    'Don't Be a Menace to South Central While Drinking Your Juice in the Hood (1996)'    0.5000
16    'Two if by Sea (1996)'    0.5000
17    'Bio-Dome (1996)'    0.5000
18    'Lawnmower Man 2: Beyond Cyberspace (1996)'    0.5000
19    'Fair Game (1995)'    0.5000
20    'Mary Reilly (1996)'    0.5000
The user with the smallest (probably negative) component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'Law Abiding Citizen (2009)'    5
2    'Avatar (2009)'    5
3    'Sherlock Holmes (2009)'    5
4    'Shutter Island (2010)'    5
5    'Inception (2010)'    5
6    'Expendables, The (2010)'    5
7    '127 Hours (2010)'    5
8    'Iron Man 3 (2013)'    5
9    'Interstellar (2014)'    5
10    'Furious 7 (2015)'    5
11    'Father of the Bride Part II (1995)'    0.5000
12    'Heat (1995)'    0.5000
13    'Sudden Death (1995)'    0.5000
14    'GoldenEye (1995)'    0.5000
15    'Dracula: Dead and Loving It (1995)'    0.5000
16    'Cutthroat Island (1995)'    0.5000
17    'Get Shorty (1995)'    0.5000
18    'Babe (1995)'    0.5000
19    'Bed of Roses (1996)'    0.5000
20    'Hate (Haine, La) (1995)'    0.5000
Component 2
The user with the largest component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'The Intern (2015)'    5
2    'Straight Outta Compton (2015)'    5
3    'Everest (2015)'    5
4    'Hotel Transylvania 2 (2015)'    5
5    'Creed (2015)'    5
6    'Finding Dory (2016)'    5
7    'Captain Fantastic (2016)'    5
8    'Hacksaw Ridge (2016)'    5
9    'Arrival (2016)'    5
10    'Rogue One: A Star Wars Story (2016)'    5
11    'Doom (2005)'    0.5000
12    'Diving Bell and the Butterfly, The (Scaphandre et le papillon, Le) (2007)'    0.5000
13    'M*A*S*H (a.k.a. MASH) (1970)'    1.5000
14    'Blair Witch Project, The (1999)'    2
15    'Get Him to the Greek (2010)'    2
16    'Puss in Boots (2011)'    2
17    'Cat in the Hat, The (2003)'    3
18    'Terminal, The (2004)'    3
19    'Charlie and the Chocolate Factory (2005)'    3
20    'Bewitched (2005)'    3
The user with the smallest (probably negative) component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'Matrix, The (1999)'    5
2    'Eyes Wide Shut (1999)'    5
3    'Lord of the Rings: The Fellowship of the Ring, The (2001)'    5
4    'Kill Bill: Vol. 1 (2003)'    5
5    'Kill Bill: Vol. 2 (2004)'    5
6    'No Country for Old Men (2007)'    5
7    'There Will Be Blood (2007)'    5
8    'Grand Budapest Hotel, The (2014)'    5
9    'Interstellar (2014)'    5
10    'Dunkirk (2017)'    5
11    'Grumpier Old Men (1995)'    0.5000
12    'Babe (1995)'    0.5000
13    'Mortal Kombat (1995)'    0.5000
14    'Batman Forever (1995)'    0.5000
15    'Casper (1995)'    0.5000
16    'Congo (1995)'    0.5000
17    'Judge Dredd (1995)'    0.5000
18    'Demolition Man (1993)'    0.5000
19    'Last Action Hero (1993)'    0.5000
20    'Dead Man (1995)'    0.5000
Component 3
The user with the largest component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'Star Wars: Episode V - The Empire Strikes Back (1980)'    5
2    'Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)'    5
3    'Lawrence of Arabia (1962)'    5
4    'Star Wars: Episode VI - Return of the Jedi (1983)'    5
5    'Groundhog Day (1993)'    5
6    'Ben-Hur (1959)'    5
7    'Hunt for Red October, The (1990)'    5
8    'Saving Private Ryan (1998)'    5
9    'Ronin (1998)'    5
10    'Goldfinger (1964)'    5
11    'Grumpier Old Men (1995)'    1
12    'Waiting to Exhale (1995)'    1
13    'Father of the Bride Part II (1995)'    1
14    'Heat (1995)'    1
15    'Sudden Death (1995)'    1
16    'Dracula: Dead and Loving It (1995)'    1
17    'Cutthroat Island (1995)'    1
18    'Money Train (1995)'    1
19    'Get Shorty (1995)'    1
20    'Copycat (1995)'    1
The user with the smallest (probably negative) component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    '2046 (2004)'    5
2    'Old Boy (2003)'    5
3    'Apocalypto (2006)'    5
4    'Pan's Labyrinth (Laberinto del fauno, El) (2006)'    5
5    'There Will Be Blood (2007)'    5
6    'Let the Right One In (Låt den rätte komma in) (2008)'    5
7    'I Saw the Devil (Akmareul boatda) (2010)'    5
8    'Mission: Impossible - Ghost Protocol (2011)'    5
9    'Mission: Impossible - Rogue Nation (2015)'    5
10    'Mad Max: Fury Road (2015)'    5
11    'Congo (1995)'    0.5000
12    'Coneheads (1993)'    0.5000
13    'Demolition Man (1993)'    0.5000
14    'RoboCop 3 (1993)'    0.5000
15    'Barb Wire (1996)'    0.5000
16    'Jack (1996)'    0.5000
17    'Nutty Professor, The (1996)'    0.5000
18    'Batman & Robin (1997)'    0.5000
19    'Spawn (1997)'    0.5000
20    'Flubber (1997)'    0.5000
Component 4
The user with the largest component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'Mission: Impossible - Ghost Protocol (2011)'    5
2    'Impossible, The (Imposible, Lo) (2012)'    5
3    'Star Trek Into Darkness (2013)'    5
4    'Man of Steel (2013)'    5
5    'Godzilla (2014)'    5
6    'Guardians of the Galaxy (2014)'    5
7    'Star Wars: Episode VII - The Force Awakens (2015)'    5
8    'The Age of Adaline (2015)'    5
9    'The Man from U.N.C.L.E. (2015)'    5
10    'Dunkirk (2017)'    5
11    'Heat (1995)'    0.5000
12    'Nixon (1995)'    0.5000
13    'Casino (1995)'    0.5000
14    'Get Shorty (1995)'    0.5000
15    'Copycat (1995)'    0.5000
16    'To Die For (1995)'    0.5000
17    'Seven (a.k.a. Se7en) (1995)'    0.5000
18    'Usual Suspects, The (1995)'    0.5000
19    'Mighty Aphrodite (1995)'    0.5000
20    'From Dusk Till Dawn (1996)'    0.5000
The user with the smallest (probably negative) component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'Primer (2004)'    5
2    'Sideways (2004)'    5
3    'Incredibles, The (2004)'    5
4    'Battle of Algiers, The (La battaglia di Algeri) (1966)'    5
5    'Aviator, The (2004)'    5
6    'Sin City (2005)'    5
7    'Batman Begins (2005)'    5
8    'Constant Gardener, The (2005)'    5
9    'Lord of War (2005)'    5
10    'Weather Man, The (2005)'    5
11    'Dracula: Dead and Loving It (1995)'    0.5000
12    'Cutthroat Island (1995)'    0.5000
13    'Mr. Holland's Opus (1995)'    0.5000
14    'Bio-Dome (1996)'    0.5000
15    'Screamers (1995)'    0.5000
16    'Happy Gilmore (1996)'    0.5000
17    'Muppet Treasure Island (1996)'    0.5000
18    'Braveheart (1995)'    0.5000
19    'Down Periscope (1996)'    0.5000
20    'Bad Boys (1995)'    0.5000
Component 5
The user with the largest component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'Wolf of Wall Street, The (2013)'    5
2    'American Hustle (2013)'    5
3    'Interstellar (2014)'    5
4    'The Expendables 3 (2014)'    5
5    'John Wick (2014)'    5
6    'Nightcrawler (2014)'    5
7    'Mad Max: Fury Road (2015)'    5
8    'The Hateful Eight (2015)'    5
9    'Big Short, The (2015)'    5
10    'John Wick: Chapter Two (2017)'    5
11    'Toy Story (1995)'    0.5000
12    'Grumpier Old Men (1995)'    0.5000
13    'Father of the Bride Part II (1995)'    0.5000
14    'Sabrina (1995)'    0.5000
15    'Nixon (1995)'    0.5000
16    'Get Shorty (1995)'    0.5000
17    'Babe (1995)'    0.5000
18    'Pocahontas (1995)'    0.5000
19    'Mr. Holland's Opus (1995)'    0.5000
20    'Bio-Dome (1996)'    0.5000
The user with the smallest (probably negative) component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'Shrek 2 (2004)'    5
2    'Before Sunset (2004)'    5
3    'Finding Neverland (2004)'    5
4    'Charlie Brown Christmas, A (1965)'    5
5    'Million Dollar Baby (2004)'    5
6    'Hotel Rwanda (2004)'    5
7    'Notes on a Scandal (2006)'    5
8    'How the Grinch Stole Christmas! (1966)'    5
9    'Juno (2007)'    5
10    'Toy Story 3 (2010)'    5
11    'Natural Born Killers (1994)'    0.5000
12    'Stargate (1994)'    0.5000
13    'Ace Ventura: Pet Detective (1994)'    0.5000
14    'Mask, The (1994)'    0.5000
15    'Threesome (1994)'    0.5000
16    'Bad Taste (1987)'    0.5000
17    'Beneath the Planet of the Apes (1970)'    0.5000
18    'Hellbound: Hellraiser II (1988)'    0.5000
19    'Legend of Drunken Master, The (Jui kuen II) (1994)'    0.5000
20    'My Neighbor Totoro (Tonari no Totoro) (1988)'    0.5000
Component 6
The user with the largest component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'Grand Budapest Hotel, The (2014)'    5
2    'Captain America: The Winter Soldier (2014)'    5
3    'Predestination (2014)'    5
4    'John Wick (2014)'    5
5    'Big Hero 6 (2014)'    5
6    'Kingsman: The Secret Service (2015)'    5
7    'Deadpool (2016)'    5
8    'Tomorrowland (2015)'    5
9    'Inside Out (2015)'    5
10    'Sicario (2015)'    5
11    'Jumanji (1995)'    0.5000
12    'Grumpier Old Men (1995)'    0.5000
13    'Waiting to Exhale (1995)'    0.5000
14    'Tom and Huck (1995)'    0.5000
15    'Sudden Death (1995)'    0.5000
16    'Nixon (1995)'    0.5000
17    'Cutthroat Island (1995)'    0.5000
18    'Ace Ventura: When Nature Calls (1995)'    0.5000
19    'Othello (1995)'    0.5000
20    'Now and Then (1995)'    0.5000
The user with the smallest (probably negative) component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'Seven (a.k.a. Se7en) (1995)'    5
2    'Silence of the Lambs, The (1991)'    5
3    'Pretty Woman (1990)'    5
4    'E.T. the Extra-Terrestrial (1982)'    5
5    'Blair Witch Project, The (1999)'    5
6    'Boys Don't Cry (1999)'    5
7    'Fight Club (1999)'    5
8    'Coyote Ugly (2000)'    5
9    'Memento (2000)'    5
10    'Donnie Darko (2001)'    5
11    'Judge Dredd (1995)'    0.5000
12    'Naked Gun 33 1/3: The Final Insult (1994)'    0.5000
13    'Hot Shots! Part Deux (1993)'    0.5000
14    'Kingpin (1996)'    0.5000
15    'Maltese Falcon, The (1941)'    0.5000
16    'Ninotchka (1939)'    0.5000
17    'Jean de Florette (1986)'    0.5000
18    'Willow (1988)'    0.5000
19    'Airplane! (1980)'    0.5000
20    'Airplane II: The Sequel (1982)'    0.5000
Component 7
The user with the largest component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'Big Hero 6 (2014)'    5
2    'The Hobbit: The Battle of the Five Armies (2014)'    5
3    'Kingsman: The Secret Service (2015)'    5
4    'Mad Max: Fury Road (2015)'    5
5    'Star Wars: Episode VII - The Force Awakens (2015)'    5
6    'Avengers: Age of Ultron (2015)'    5
7    'Furious 7 (2015)'    5
8    'Kung Fury (2015)'    5
9    'Spectre (2015)'    5
10    'The Man from U.N.C.L.E. (2015)'    5
11    'There Will Be Blood (2007)'    2.5000
12    'Beautician and the Beast, The (1997)'    3
13    'Spice World (1997)'    3
14    'Cube (1997)'    3
15    'Blame It on Rio (1984)'    3
16    'Go (1999)'    3
17    'Adaptation (2002)'    3
18    '25th Hour (2002)'    3
19    'Barton Fink (1991)'    3
20    'Pieces of April (2003)'    3
The user with the smallest (probably negative) component rated the following movies as high and low
ans = 20×2 cell
      1    2
1    'Grand Budapest Hotel, The (2014)'    5
2    'X-Men: Days of Future Past (2014)'    5
3    'Ex Machina (2015)'    5
4    'Avengers: Age of Ultron (2015)'    5
5    'Avengers: Infinity War - Part I (2018)'    5
6    'Avengers: Infinity War - Part II (2019)'    5
7    'X-Men: Apocalypse (2016)'    5
8    'The Hateful Eight (2015)'    5
9    'The Handmaiden (2016)'    5
10    'Mission: Impossible - Fallout (2018)'    5
11    'Toy Story (1995)'    0.5000
12    'Sense and Sensibility (1995)'    0.5000
13    'Ace Ventura: When Nature Calls (1995)'    0.5000
14    'Get Shorty (1995)'    0.5000
15    'Babe (1995)'    0.5000
16    'Clueless (1995)'    0.5000
17    'To Die For (1995)'    0.5000
18    'Pocahontas (1995)'    0.5000
19    'Mighty Aphrodite (1995)'    0.5000
20    'Batman Forever (1995)'    0.5000
Component 8
The user with the largest component rated the following movies as high and low
ans = 20×2 cell
      1    2
 1    'Wolf of Wall Street, The (2013)'    4.5000
 2    'Whiplash (2014)'    4.5000
 3    'The Revenant (2015)'    4.5000
 4    'Rogue One: A Star Wars Story (2016)'    4.5000
 5    'Casino (1995)'    5
 6    'Star Wars: Episode IV - A New Hope (1977)'    5
 7    'Carlito's Way (1993)'    5
 8    'Star Wars: Episode V - The Empire Strikes Back (1980)'    5
 9    'Goodfellas (1990)'    5
10    'Donnie Brasco (1997)'    5
11    'Toy Story (1995)'    0.5000
12    'Grumpier Old Men (1995)'    0.5000
13    'Father of the Bride Part II (1995)'    0.5000
14    'Sabrina (1995)'    0.5000
15    'Dracula: Dead and Loving It (1995)'    0.5000
16    'Sense and Sensibility (1995)'    0.5000
17    'Get Shorty (1995)'    0.5000
18    'Babe (1995)'    0.5000
19    'Mortal Kombat (1995)'    0.5000
20    'To Die For (1995)'    0.5000
The user with the smallest (probably negative) component rated the following movies as high and low
ans = 20×2 cell
      1    2
 1    'Thor: Ragnarok (2017)'    5
 2    'Guardians of the Galaxy 2 (2017)'    5
 3    'Captain America: Civil War (2016)'    5
 4    'Doctor Strange (2016)'    5
 5    'X-Men: Apocalypse (2016)'    5
 6    'Untitled Spider-Man Reboot (2017)'    5
 7    'Batman v Superman: Dawn of Justice (2016)'    5
 8    'The Man from U.N.C.L.E. (2015)'    5
 9    'Wonder Woman (2017)'    5
10    'Incredibles 2 (2018)'    5
11    'Bullets Over Broadway (1994)'    0.5000
12    'Trainspotting (1996)'    0.5000
13    'Perfect Storm, The (2000)'    0.5000
14    'Requiem for a Dream (2000)'    0.5000
15    'Holiday, The (2006)'    0.5000
16    'Bridge to Terabithia (2007)'    0.5000
17    'Road, The (2009)'    0.5000
18    '(500) Days of Summer (2009)'    0.5000
19    'Time Traveler's Wife, The (2009)'    0.5000
20    'Kids Are All Right, The (2010)'    0.5000
Component 9
The user with the largest component rated the following movies as high and low
ans = 20×2 cell
      1    2
 1    'Twilight Saga: New Moon, The (2009)'    5
 2    'Princess and the Frog, The (2009)'    5
 3    'Avatar (2009)'    5
 4    'How to Train Your Dragon (2010)'    5
 5    'Clash of the Titans (2010)'    5
 6    'Inception (2010)'    5
 7    'Tron: Legacy (2010)'    5
 8    'Source Code (2011)'    5
 9    'Puss in Boots (2011)'    5
10    'Prometheus (2012)'    5
11    'Usual Suspects, The (1995)'    0.5000
12    'Fair Game (1995)'    0.5000
13    'Young Poisoner's Handbook, The (1995)'    0.5000
14    'Jury Duty (1995)'    0.5000
15    'One Flew Over the Cuckoo's Nest (1975)'    0.5000
16    'Dead Poets Society (1989)'    0.5000
17    'Shall We Dance? (Shall We Dansu?) (1996)'    0.5000
18    'Out of the Past (1947)'    0.5000
19    'Police Academy 3: Back in Training (1986)'    0.5000
20    'Police Academy 4: Citizens on Patrol (1987)'    0.5000
The user with the smallest (probably negative) component rated the following movies as high and low
ans = 20×2 cell
      1    2
 1    'City Slickers (1991)'    5
 2    '48 Hrs. (1982)'    5
 3    'Smokey and the Bandit (1977)'    5
 4    'Mystic River (2003)'    5
 5    'Presumed Innocent (1990)'    5
 6    'History of Violence, A (2005)'    5
 7    'No Country for Old Men (2007)'    5
 8    'Taken (2008)'    5
 9    'True Grit (2010)'    5
10    'Hidden Figures (2016)'    5
11    'From Dusk Till Dawn (1996)'    0.5000
12    'Showgirls (1995)'    0.5000
13    'Tommy Boy (1995)'    0.5000
14    'Widows' Peak (1994)'    0.5000
15    'Last Emperor, The (1987)'    0.5000
16    'Lupin III: The Castle Of Cagliostro (Rupan sansei: Kariosutoro no shiro) (1979)'    0.5000
17    'Shoot 'Em Up (2007)'    0.5000
18    'Sorcerer's Apprentice, The (2010)'    0.5000
19    'Into the Woods (2014)'    0.5000
20    'Straight Outta Compton (2015)'    0.5000
Component 10
The user with the largest component rated the following movies as high and low
ans = 20×2 cell
      1    2
 1    'Lookout, The (2007)'    5
 2    'Transformers (2007)'    5
 3    'Stardust (2007)'    5
 4    'Bourne Ultimatum, The (2007)'    5
 5    'Superbad (2007)'    5
 6    'Dan in Real Life (2007)'    5
 7    'Enchanted (2007)'    5
 8    'Juno (2007)'    5
 9    'In Bruges (2008)'    5
10    'Forgetting Sarah Marshall (2008)'    5
11    'Ace Ventura: When Nature Calls (1995)'    0.5000
12    'Copycat (1995)'    0.5000
13    'Congo (1995)'    0.5000
14    'Beverly Hillbillies, The (1993)'    0.5000
15    'City Slickers II: The Legend of Curly's Gold (1994)'    0.5000
16    'Fatal Instinct (1993)'    0.5000
17    'RoboCop 3 (1993)'    0.5000
18    'Super Mario Bros. (1993)'    0.5000
19    'Space Jam (1996)'    0.5000
20    'Home Alone 2: Lost in New York (1992)'    0.5000
The user with the smallest (probably negative) component rated the following movies as high and low
ans = 20×2 cell
      1    2
 1    'X-Men: The Last Stand (2006)'    5
 2    'Pursuit of Happyness, The (2006)'    5
 3    'Step Up (2006)'    5
 4    'Illusionist, The (2006)'    5
 5    'Idiocracy (2006)'    5
 6    'Prestige, The (2006)'    5
 7    'Blood Diamond (2006)'    5
 8    'Shooter (2007)'    5
 9    'Fracture (2007)'    5
10    'Live Free or Die Hard (2007)'    5
11    'City of Lost Children, The (Cité des enfants perdus, La) (1995)'    0.5000
12    'Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)'    0.5000
13    'Postman, The (Postino, Il) (1994)'    0.5000
14    'French Twist (Gazon maudit) (1995)'    0.5000
15    'Misérables, Les (1995)'    0.5000
16    'Antonia's Line (Antonia) (1995)'    0.5000
17    'Hate (Haine, La) (1995)'    0.5000
18    'Rumble in the Bronx (Hont faan kui) (1995)'    0.5000
19    'Beauty of the Day (Belle de jour) (1967)'    0.5000
20    'Umbrellas of Cherbourg, The (Parapluies de Cherbourg, Les) (1964)'    0.5000

Next Steps

To give you a sense of where you might take this in a project, here are some things you might investigate next with this dataset.
  1. We didn't really look at how you would use the SVD to make recommendations. It turns out the SVD can be used to come up with good guesses for the missing values in the original ratings matrix (the NaNs), and you can then provide recommendations tailored to a particular user.
  2. We didn't quantify how well the SVD predicted the ratings. To do that, you could divide the ratings into a training set and a test set and see how well your SVD model predicts the test ratings (i.e., ratings that weren't used to compute the SVD).
  3. We filled in the missing values with the means of each movie, but there are variants of SVD that can handle the missing values directly (they do entail tradeoffs). You could investigate how one of those methods would work on this data.
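
As a starting point for the first two ideas, here is a minimal sketch rather than a definitive implementation. It does not use the MovieLens data: `ratingsDemo` is a small synthetic stand-in for `ratings`, and the matrix sizes, the rank `k = 3`, the 20% hold-out fraction, and the choice to subtract the movie means before taking the SVD are all assumptions made for illustration.

```matlab
% Synthetic stand-in for `ratings` (users x movies, NaN = unrated).
% Everything about this data is an assumption made for illustration.
rng(0);
nUsers = 200; nMovies = 50; k = 3;
lowRank = 0.5 * randn(nUsers, k) * randn(k, nMovies);
ratingsDemo = min(max(3 + lowRank, 0.5), 5);       % clip to the 0.5-5 star range
ratingsDemo(rand(nUsers, nMovies) < 0.3) = NaN;    % roughly 30% unrated

% Hold out 20% of the observed ratings as a test set.
observed = find(~isnan(ratingsDemo));
nTest = round(0.2 * numel(observed));
testIdx = observed(randperm(numel(observed), nTest));
trainRatings = ratingsDemo;
trainRatings(testIdx) = NaN;

% Fill missing entries with per-movie means computed from training data only.
missing = isnan(trainRatings);
zeroFilled = trainRatings;
zeroFilled(missing) = 0;
movieMeans = sum(zeroFilled, 1) ./ sum(~missing, 1);
meanMatrix = repmat(movieMeans, nUsers, 1);
filled = trainRatings;
filled(missing) = meanMatrix(missing);

% Rank-k approximation of the mean-centered matrix, then add the means back.
[U, S, V] = svd(filled - meanMatrix, 'econ');
approx = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)' + meanMatrix;
predicted = min(max(approx, 0.5), 5);

% Compare test error against the "always predict the movie mean" baseline.
rmseSvd = sqrt(mean((predicted(testIdx) - ratingsDemo(testIdx)).^2));
rmseMean = sqrt(mean((meanMatrix(testIdx) - ratingsDemo(testIdx)).^2));
fprintf('Test RMSE, rank-%d SVD: %.3f\n', k, rmseSvd);
fprintf('Test RMSE, movie means: %.3f\n', rmseMean);

% Recommend to one user the unrated movies with the highest predicted rating.
userIndex = 1;
candidates = find(isnan(ratingsDemo(userIndex, :)));
[~, order] = sort(predicted(userIndex, candidates), 'descend');
topPicks = candidates(order(1:min(5, numel(candidates))))
```

Because the synthetic ratings are built to be nearly low rank, the SVD predictions should beat the movie-mean baseline here; on the real data the gap, and the best choice of `k`, would need to be checked with the same kind of hold-out test.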
function movieExtremes = getHighAndLowMovies(v, movies)
% Return a (2*nHighLow)-by-3 cell array describing the movies with the most
% positive and most negative components of the right singular vector v.
% Columns: movie title, movie genre, component value.
nHighLow = 10;
movieExtremes = cell(nHighLow*2, 3);
% sort ascending, so the largest components are at the end of c
[c, indices] = sort(v);
% rows 1..nHighLow: the nHighLow most positive components (ascending order)
movieExtremes(1:nHighLow,1) = movies(indices(end-(nHighLow-1):end),2);
movieExtremes(1:nHighLow,2) = movies(indices(end-(nHighLow-1):end),3);
movieExtremes(1:nHighLow,3) = num2cell(c(end-(nHighLow-1):end));
% remaining rows: the nHighLow most negative components (ascending order)
movieExtremes(1+nHighLow:end,1) = movies(indices(1:nHighLow),2);
movieExtremes(1+nHighLow:end,2) = movies(indices(1:nHighLow),3);
movieExtremes(1+nHighLow:end,3) = num2cell(c(1:nHighLow));
end
function userRatings = getHighAndLowUserRatings(userIndex, movies, ratings)
% Return a (2*nHighLow)-by-2 cell array with the highest and lowest ratings
% given by the specified user. Columns: movie title, rating.
% Assumes the user has rated at least nHighLow movies.
nHighLow = 10;
userRatings = cell(nHighLow*2,2);
% sort ascending, so the highest ratings are at the end of r
[r, indices] = sort(ratings(userIndex,:));
% filter out NaNs (movies this user did not rate)
indices = indices(~isnan(r));
r = r(~isnan(r));
% rows 1..nHighLow: the user's nHighLow highest ratings
userRatings(1:nHighLow,1) = movies(indices(end-(nHighLow-1):end),2);
userRatings(1:nHighLow,2) = num2cell(r(end-(nHighLow-1):end));
% remaining rows: the user's nHighLow lowest ratings
userRatings(1+nHighLow:end,1) = movies(indices(1:nHighLow),2);
userRatings(1+nHighLow:end,2) = num2cell(r(1:nHighLow));
end