🎥 Netflix Visualized 🍿


How does my dashboard answer the questions?

  1. Your boss wants to know the number of titles per genre on Netflix.
  2. The bar graph at the top displays the number of titles (films and TV shows) in each genre on Netflix, with the length of each bar proportional to the number of titles. The bars are ordered from highest to lowest, so you can easily see how the genres rank and which genres are best represented on Netflix. The exact count is shown next to each bar for clarity.

  3. Your boss wants to understand the average runtime of movies by release year.
  4. I created a line graph/scatter plot hybrid that plots the average runtime of Netflix films released each year, ordered by release year, so you can assess how the average runtime has changed over time. Each dot represents the average runtime for a particular year and can be hovered over to see the exact value.

  5. We want to learn about the cast and directors.
  6. The network graph allows you to explore the connections between actors across all films on Netflix directed by a particular director. Without the director filter, the data set is far too large for a user to look through and learn from, so the filter lets you learn about the actors who starred in that director’s films, as well as all the connections between those actors (where a connection between actors A and B is defined as a film that A and B both starred in). You can hover over a link to learn about the connection, and hover over a node to learn about the actor and their films. Directors with only a few films were omitted for the sake of clarity.
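Each of the three views above depends on a small aggregation step before anything is drawn. The following is a minimal sketch of those steps in plain JavaScript, not the dashboard's actual code. The record shape (`genres`, `releaseYear`, `runtimeMinutes`, `cast`) and the sample rows are hypothetical, and the records passed to `coStarLinks` are assumed to be already filtered to one director.

```javascript
// Hypothetical records: field names and values are assumptions for illustration.
const titles = [
  { title: "A", genres: ["Dramas", "Comedies"], releaseYear: 2019, runtimeMinutes: 100, cast: ["Ann", "Bob", "Cy"] },
  { title: "B", genres: ["Dramas"], releaseYear: 2019, runtimeMinutes: 120, cast: ["Ann", "Bob"] },
  { title: "C", genres: ["Dramas", "Comedies"], releaseYear: 2020, runtimeMinutes: 90, cast: ["Cy", "Dee"] },
];

// Question 1: number of titles per genre, sorted from highest to lowest.
function countByGenre(records) {
  const counts = new Map();
  for (const r of records) {
    for (const g of r.genres) counts.set(g, (counts.get(g) || 0) + 1);
  }
  return [...counts]
    .map(([genre, count]) => ({ genre, count }))
    .sort((a, b) => b.count - a.count);
}

// Question 2: average runtime per release year, ordered by year.
function averageRuntimeByYear(records) {
  const sums = new Map();
  for (const r of records) {
    const s = sums.get(r.releaseYear) || { total: 0, n: 0 };
    s.total += r.runtimeMinutes;
    s.n += 1;
    sums.set(r.releaseYear, s);
  }
  return [...sums]
    .map(([year, s]) => ({ year, average: s.total / s.n }))
    .sort((a, b) => a.year - b.year);
}

// Question 3: one link per pair of actors who share at least one film.
// Films shared by the same pair are collected on a single link.
function coStarLinks(records) {
  const links = new Map();
  for (const r of records) {
    const cast = [...new Set(r.cast)].sort(); // dedupe, and fix the pair order
    for (let i = 0; i < cast.length; i++) {
      for (let j = i + 1; j < cast.length; j++) {
        const key = `${cast[i]}|${cast[j]}`; // assumes names never contain "|"
        if (!links.has(key)) {
          links.set(key, { source: cast[i], target: cast[j], films: [] });
        }
        links.get(key).films.push(r.title);
      }
    }
  }
  return [...links.values()];
}
```

Merging repeated pairings into one link with a `films` array is a design choice of this sketch. Drawing one link per shared film would match the definition above equally well.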

How did D3 improve these visualizations?

  1. Tooltips allow you to show additional information only when you want to see it, creating greater access to information with less-crowded and easier-to-read graphs.
  2. D3 allows the graphs to be interactive, meaning that the user can choose a director from the dropdown and have the graph update live to reflect that choice. This opens up many possibilities and allows your users to better engage with the data!
  3. D3 allows you to create scalings and special styling that not only make the graphs more aesthetically pleasing but also help display your data more effectively. For example, the sizes of the nodes on my network graph reflect the number of connections they have, giving users a quick sense of which actors have the most connections.
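To illustrate the node-sizing idea in item 3, here is a small sketch of one way to map a connection count to a circle radius. It uses a square-root mapping so that highly connected actors stand out without dwarfing everyone else. The function name `nodeRadius` and the radius bounds are assumptions for illustration, and the dashboard may well use a different scale. In D3 itself, a built-in scale such as `d3.scaleSqrt` can play a similar role.

```javascript
// Map a connection count to a circle radius using a square-root curve.
// minRadius and maxRadius are illustrative values, not the dashboard's settings.
function nodeRadius(connections, maxConnections, minRadius = 4, maxRadius = 20) {
  if (maxConnections <= 0) return minRadius; // no links at all: smallest size
  // Clamp the count into [0, maxConnections] before scaling.
  const clamped = Math.max(0, Math.min(connections, maxConnections));
  const t = Math.sqrt(clamped / maxConnections); // 0 for isolated, 1 for the top node
  return minRadius + t * (maxRadius - minRadius);
}
```

With these defaults, an actor with a quarter of the maximum connections gets a radius halfway between the smallest and largest node.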

Why would you choose not to use D3 for your visualization?

  1. Time. D3 takes much longer to code than quick matplotlib visualizations. When you quickly want to visualize your data to see what it looks like, or try a few different graphs to see what works, D3 won’t be the best option!
  2. Simplicity. D3 excels at interactive visualizations, but when you just need a simple graph that displays the gender breakdown of a student body or the distribution of test scores, you can create a perfectly clear and simple graph that represents your data well using Excel or matplotlib.
  3. Knowledge. D3 is difficult to both learn and master. If you aren’t familiar with JavaScript or coding in general, it will be easier and more efficient to use automatic graph generation tools such as Excel.

Give two different ways in which graphs may confuse or mislead viewers. What are ways to avoid this or fix these issues?

  1. Omitting the baseline will confuse viewers because it skews their sense of scale and exaggerates the differences between numbers. This can be fixed by restoring the baseline: for example, starting the axis at 0% rather than 60%, so viewers can see the entire 0-100% scale.
  2. Going against convention will mislead viewers because many people will not read the legend first; they will look at the graph and get an immediate, incorrect impression. People are used to certain conventions (like small circles on a map representing low numbers of COVID-19 cases), so if you do the opposite, it can lead to confusion. You can avoid this by researching common conventions ahead of time, and then using those conventions to the best of your abilities in your graphs.
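The baseline effect in item 1 can be checked with a little arithmetic. The sketch below compares how much longer one bar is drawn than another for a given axis start. The values 70% and 80% are made up for illustration, and `drawnRatio` is a hypothetical helper.

```javascript
// Ratio of drawn bar lengths for values a and b when the axis starts at `baseline`.
function drawnRatio(a, b, baseline = 0) {
  return (b - baseline) / (a - baseline);
}

const honest = drawnRatio(70, 80, 0);     // axis starts at 0%: 80/70, about 1.14
const truncated = drawnRatio(70, 80, 60); // axis starts at 60%: 20/10, exactly 2
```

With the axis starting at 60%, the 80% bar is drawn twice as long as the 70% bar, even though the underlying value is only about 14% larger.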

Example of good visualization

  • This visualization is excellent because it transforms a complicated topic (relationships between many Syrian groups) into a simple, intuitive table that A) lets you easily find the relationship between two groups, and B) gives you an overall impression of each group’s relationships (you can look at any column to see whether that group has more angry faces or smiley faces). This visualization benefits from its clean and simple grid view, its intuitive emojis, and its interactivity (you can hover over any emoji to highlight the two Syrian groups that it represents and learn a bit more about their history).

Example of bad visualization

  • This visualization is poor because it cuts the circle equally in half, giving readers an immediate impression of equality, when it actually intends to show the inequality in COVID-19 deaths between genders. Even after looking at the percentages and realizing that they are unequal, it is still hard to gauge what 63% vs 37% looks like (which is the point of data visualization!!). This visualization would have worked better as a simple pie chart that clearly shows the portion of COVID-19 deaths that were male vs the portion that were female.

Extra Credit

  1. I implemented a dynamic stats calculation for the network graph. I dynamically calculate both the median number of connections and the top number of connections for the set of nodes displayed in the network. When the user updates the selected director, the statistics are recalculated and displayed. These statistics are important to give users an overview of the data they are seeing, with context on the median and the top number.
  2. I implemented a flow-type network graph to address question 3.
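The statistics described in item 1 can be sketched as follows: a plain JavaScript function that counts each node's connections from the link list, then reports the median and the highest count. The names `connectionStats`, `nodeIds`, and the `source`/`target` link fields are assumptions for illustration, not the dashboard's actual code. Returning 0 for both values on an empty network is likewise a choice made for this sketch.

```javascript
// Compute the median and highest number of connections for the displayed nodes.
// nodeIds: array of node identifiers; links: array of { source, target } objects.
function connectionStats(nodeIds, links) {
  // Start every node at zero so isolated nodes still count toward the median.
  const degree = new Map(nodeIds.map((id) => [id, 0]));
  for (const { source, target } of links) {
    degree.set(source, (degree.get(source) || 0) + 1);
    degree.set(target, (degree.get(target) || 0) + 1);
  }
  const sorted = [...degree.values()].sort((a, b) => a - b);
  if (sorted.length === 0) return { median: 0, highest: 0 };
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return { median, highest: sorted[sorted.length - 1] };
}

// Hypothetical network: Ann, Bob, and Cy all co-starred; Dee only worked with Cy.
const stats = connectionStats(
  ["Ann", "Bob", "Cy", "Dee"],
  [
    { source: "Ann", target: "Bob" },
    { source: "Ann", target: "Cy" },
    { source: "Bob", target: "Cy" },
    { source: "Cy", target: "Dee" },
  ]
);
```

Calling this function again whenever the selected director changes, then writing the two values into the page, would give the live-updating behavior described above.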