FINAL- MMC4936


Problem:

The problem I chose for my final project in Visualization of Big Data was comparing campus safety (reported crime) at four major Florida universities against the number of students enrolled at each.

“Does the number of students enrolled have any impact on the crime rate for that school?”

Description:

The schools I chose to compare were:

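  • USF (University of South Florida, of course)
  • UCF (University of Central Florida)
  • UF (University of Florida)
  • FSU (Florida State University)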

I decided on these four schools because the first two are well-known local rivals, while the other two are very large, more widely recognized schools.

I went to data.gov and located the data set I needed; the compiled data was very informative. I found most of it on this page:

Screen Shot 2016-11-30 at 7.40.38 PM.png

I then compiled that data into a bar graph to make it easier to understand. Below is a bar graph comparing the number of students enrolled at each school.

screen-shot-2016-11-30-at-7-48-27-pm

Related Work:

After defining my problem and description and gathering all of the data I needed, I remembered our instruction on Plot.ly. Plot.ly became my main application for making the graphs in this final project.

Possible Outcomes:

  • The greater the number of students, the greater the number of crimes.
    • I label this “complementary” (in statistical terms, a positive correlation).
  • The greater the number of students, the fewer crimes committed.
    • I label this “substitutes” (in statistical terms, a negative correlation).
  • The number of students has no impact on the amount of crime committed.
    • I label this “independent” (in statistical terms, no correlation).
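The three outcomes line up with the sign of a correlation coefficient between enrollment and total crimes. Below is a minimal Python sketch. It assumes the enrollment and crime totals are typed in from the data.gov table as two lists in the same school order. The function names and the 0.3 cutoff for "independent" are my own hypothetical choices, not part of the original analysis. With only four schools, any coefficient is very fragile, so the label is descriptive only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the lists vary together, and each list's own spread (sums of squared deviations)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    return cov / sqrt(ss_x * ss_y)


def classify_outcome(enrollment, crimes, cutoff=0.3):
    """Map r onto the three possible outcomes.

    cutoff is a hypothetical threshold: when |r| is below it, the result
    is treated as "independent".
    """
    r = pearson_r(enrollment, crimes)
    if r >= cutoff:
        label = "complementary (positive correlation)"
    elif r <= -cutoff:
        label = "substitutes (negative correlation)"
    else:
        label = "independent (no clear correlation)"
    return r, label
```

To use it, fill one list with each school's enrollment and another with its total crimes, then call `classify_outcome(enrollment, crimes)`.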

Below is a graph I made from some of the data on that data.gov page.

screen-shot-2016-11-30-at-7-53-38-pm

The legend is along the right side of the graph. The graph shows the number of reported rapes, burglaries, motor vehicle thefts, and aggravated assaults at each school.

RESULT:

While the data is not conclusive, I feel it leans toward the “complementary” outcome rather than either of the others. UCF had the largest enrollment, by almost 10,000 students, yet FSU had the most total crimes, largely because it had the highest number of burglaries. USF’s numbers do fit the complementary pattern: it had the lowest total number of crimes and was second to last in enrollment, missing last place by only about 700 students.
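Enrollments differ by thousands of students, so comparing raw crime counts mixes school size into the result. One way to put the schools on equal footing is to divide each school's total crimes by its enrollment. Below is a short sketch. It assumes the totals are copied from the data.gov table. The dictionary layout, the function names and the per-1,000 scaling are hypothetical choices made for illustration.

```python
def crimes_per_1000(totals):
    """Return crimes per 1,000 enrolled students for each school.

    totals maps a school name to a (total_crimes, enrollment) pair.
    """
    rates = {}
    for school, (crimes, enrolled) in totals.items():
        if enrolled <= 0:
            raise ValueError(f"{school}: enrollment must be positive")
        rates[school] = crimes / enrolled * 1000
    return rates


def rank_by_rate(totals):
    """School names sorted from highest to lowest crime rate."""
    rates = crimes_per_1000(totals)
    return sorted(rates, key=rates.get, reverse=True)
```

A school with many crimes but a very large enrollment can end up with a lower rate than a smaller school. That is the distinction the research question ("crime rate") is really asking about.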

**Final screenshot of my written out work**

screen-shot-2016-11-30-at-8-13-47-pm

GGPLOT2


After experimenting with the code, I *FINALLY* figured out how to make a bar graph in ggplot2. By rearranging the code, I was able to control which side of the graph the Good and Bad data I had entered appeared on! Below is the final graph along with all of the code I wrote.

Screen Shot 2016-11-23 at 7.40.01 PM.png

 

Module 8- Plot.ly


I used Plot.ly to run statistical analysis on the data set we were provided. This was one of my first experiences trying to find relationships in data through statistical analysis, and I found it somewhat confusing.

After entering the data into Plot.ly, and after LOTS of experimenting with the examples, I ended up with this correlation:

columncorrelationgrades

After editing the chart so that I could get some kind of answer, I ran descriptive statistics on the grades column and then a column correlation between grades and popularity. I wanted to see whether there was a relationship between the number of people who answered that they cared most about grades and the number who cared most about popularity. The test returned a correlation coefficient of 0.64. I first read this as a “64% correlation,” but a correlation coefficient is not a percentage. A value of 0.64 indicates a moderate positive relationship, meaning the two columns tend to rise and fall together. It does not mean that 64% of people prefer popularity over grades. I ran the same kind of analysis on the rest of the data, and some columns showed a stronger relationship than others, but overall I’m not entirely sure what the relationships mean.
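One common way to read a coefficient like 0.64 is to square it. The result, r², is the share of the variation in one column that a straight-line fit on the other column accounts for. Below is a small sketch. The strength bands are a rough rule of thumb I am assuming (cutoffs differ between textbooks), and Plot.ly may not label results this way.

```python
def describe_correlation(r):
    """Return (r squared, rough strength label) for a Pearson coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)
    # Rule-of-thumb bands (an assumption; cutoffs differ between textbooks and fields)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.1:
        strength = "weak"
    else:
        return r * r, "negligible"
    direction = "positive" if r > 0 else "negative"
    return r * r, f"{strength} {direction}"
```

For example, `describe_correlation(0.64)` returns about `(0.41, "moderate positive")`. A linear fit on one column accounts for roughly 41% of the variation in the other, and neither number describes what share of respondents prefer popularity.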

Overall this module was a good exercise in starting to understand statistical analysis and running data through the Plot.ly program.

Module # 13 Visualization Design Schema


For this week’s assignment, I chose to revise a pie chart I had made earlier, “Facebook Followers of Celebs- 2016.”

screen-shot-2016-10-18-at-2-53-43-pm

I picked this one to revise because it was simple and already had some of the key components discussed in the Evergreen and Emery (2014) PDF. After reading about what makes a good visualization, I made several corrections to my graph. The first major correction was the color scheme. The most likely viewer of a graph like this, especially one about celebrities and Facebook, is a high school student, most likely a girl, so the first thing I did was make the graph more appealing to a younger female audience. I also changed the title from “Facebook Followers of Celebs- 2016” to “Visualization of Popular Celeb’s Facebook Followers in 2016”; the more descriptive title let me avoid adding a subtitle. Instead of keeping the pie whole, I broke up the “slices” with a little white space to signal that each slice is a different celebrity and that the slices are not all related. The chart is not a full representation of every celebrity on Facebook, just a select few. Leaving space between the slices also let the graph flow better with the background, something mentioned in our reading. Below is my corrected graph, “Visualization of Popular Celeb’s Facebook Followers in 2016”.

screen-shot-2016-11-13-at-12-06-24-pm

I also felt it was appropriate for this graph to label each slice with the celebrity’s name, number of followers, and percentage of the total for the celebrities included. I think the Evergreen and Emery approach to data visualization is brilliant. I like that their checklist uses a numeric score to judge whether a chart is good, which I think would help many people, myself included, when making a data visualization. I do not agree with every guideline, though. For example, I disagree with avoiding red and green for the sake of colorblind readers (I find red and green to be very strong colors), and I don’t agree with ruling out 3D data visualizations. Every visualization is different, but overall, I liked their checklist!
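The checklist's numeric score can be reproduced with a few lines of arithmetic. Below is a minimal sketch. It assumes each guideline is rated 2 (fully met), 1 (partly met) or 0 (not met), and that guidelines that do not apply are left out of the total. That scale is my reading of the Evergreen and Emery checklist, not a quote from it. The function name is hypothetical.

```python
def checklist_score(ratings):
    """Percentage of possible points earned on a visualization checklist.

    ratings maps guideline text to 2 (fully met), 1 (partly met), 0 (not met),
    or None when the guideline does not apply to this chart (assumed scale).
    """
    scored = [v for v in ratings.values() if v is not None]
    for v in scored:
        if v not in (0, 1, 2):
            raise ValueError(f"rating must be 0, 1, 2, or None, got {v!r}")
    if not scored:
        raise ValueError("no applicable guidelines to score")
    # Each applicable guideline is worth at most 2 points
    return 100 * sum(scored) / (2 * len(scored))
```

Leaving not-applicable items out of the denominator keeps a chart from being penalized for guidelines that cannot apply to it, such as axis rules on a pie chart.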

 

Module # 12 Animation and Shiny in R


I chose option #2 for this assignment. I searched Google for RStudio Shiny examples and was led to this one, an interactive animation someone built to show bus routes in Minneapolis.

screen-shot-2016-11-01-at-6-33-10-pm

In this example, you can scroll through the different routes on the right side of the page and choose to show only buses heading in certain directions (Northbound, Southbound, Eastbound, Westbound). A key at the bottom shows the color of each bus and where the stops are along the route. When you change the route, as in the example below, the map shows where those buses are located.

screen-shot-2016-11-01-at-6-38-23-pm

Another example I found was a graph of telephones by region (below).

screen-shot-2016-11-01-at-6-42-54-pm

This example has a menu that redraws the entire bar graph for whichever region you select. The right side of the page shows the code behind the app.

Screen Shot 2016-11-01 at 6.46.21 PM.png

After looking at these two examples, I can see how this kind of tool could play an essential role in real life. The bus route example could be built into a smartphone app so riders could see whether their bus is coming soon. The animation in RStudio Shiny is very intriguing and complex!

Module #9: Intro to RStudio


This was my first time ever using RStudio. After looking up tutorials online, I found it is really not that difficult to understand; it reminds me of the coding I used to do for Myspace. I chose to make both a pie chart and a bar graph. Here is a screenshot of the code I used to make the bar graph, along with a snapshot of the graph itself.

screen-shot-2016-11-01-at-12-03-35-pm

screen-shot-2016-11-01-at-12-03-42-pm

I had more fun making the pie chart, simply because I discovered there is a rainbow option! As you can see below, I used the argument col=rainbow(n), where n is the number of slices, to get the rainbow pattern.

screen-shot-2016-11-01-at-12-00-23-pm

screen-shot-2016-11-01-at-12-00-08-pm
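The rainbow pattern comes from spacing colors evenly around the color wheel. Below is a Python sketch of that idea using the standard library's `colorsys` module. It mirrors the behavior of R's `rainbow(n)`, which starts at red and spaces fully saturated hues evenly without wrapping back to red. It is an illustration, not R's own code, and the function name `rainbow_hex` is hypothetical.

```python
import colorsys


def rainbow_hex(n):
    """Return n evenly spaced, fully saturated hues as hex color strings.

    Hue i is i/n of the way around the color wheel, so the last color
    stops short of wrapping back to red.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 1.0, 1.0)
        # Convert 0-1 channel values to two-digit hex (rounding absorbs float noise)
        colors.append("#{:02X}{:02X}{:02X}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colors
```

For a six-slice pie, `rainbow_hex(6)` gives red, yellow, green, cyan, blue and magenta, one color per slice.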

Using RStudio has been a fun way for me to produce graphs! The code examples I found in the online “R is Not So Hard!” tutorials have proved very user-friendly.

Module #7- Plot.ly


This week our assignment was to explore the Plot.ly software. Overall, I thought it was very similar to the Google Fusion assignment, but I found Plot.ly much smoother and easier to use.

I was surprised how easy it was to find the mean, median, and standard deviation of the follower counts.

Mean: Facebook 55,303,632.375; Twitter 36,042,208.5

Median: Facebook 57,963,191; Twitter 37,133,201

Standard deviation: Facebook 15,979,901.48; Twitter 7,783,594.28
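The same three statistics can be checked outside Plot.ly with Python's built-in `statistics` module. One detail worth knowing is that there are two standard deviations. The sample version divides by n - 1 and the population version divides by n. It is not stated which one Plot.ly reported, so the sketch below returns both. The follower counts would need to be copied from the data set, since they are not listed here. The function name `summarize` is hypothetical.

```python
import statistics


def summarize(counts):
    """Mean, median, and both standard deviations for a list of follower counts."""
    if len(counts) < 2:
        raise ValueError("need at least two values for a standard deviation")
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "sample_sd": statistics.stdev(counts),       # divides by n - 1
        "population_sd": statistics.pstdev(counts),  # divides by n
    }
```

With a handful of celebrities, the two standard deviations can differ noticeably, so it helps to know which one a tool reports before comparing numbers.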

Screen Shot 2016-10-18 at 2.53.43 PM.png

Here is the link to the graph above of the celebrities’ Facebook followers: a pie graph showing each celebrity’s share of the total. Rihanna has the most Facebook followers, which surprised me. I would have thought Taylor Swift or Obama would beat her, although Rihanna is internationally popular.
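Each slice's percentage is its follower count divided by the total for the celebrities in the chart. Because the chart covers only a select few celebrities, the percentages are relative to that group, not to all of Facebook. Below is a minimal sketch of how slice labels with a name, follower count and percentage could be built. The function name and label format are hypothetical choices, and the counts would come from the data set.

```python
def pie_labels(followers):
    """Build "name: count (percent%)" labels for a pie chart.

    followers maps each celebrity's name to a follower count; percentages
    are relative to the total of the celebrities included.
    """
    total = sum(followers.values())
    if total <= 0:
        raise ValueError("follower counts must add up to more than zero")
    labels = []
    for name, count in followers.items():
        share = 100 * count / total
        labels.append(f"{name}: {count:,} ({share:.1f}%)")
    return labels
```

Rounding to one decimal place keeps the labels readable. The rounded shares may not add up to exactly 100%.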

Screen Shot 2016-10-18 at 3.02.28 PM.png

This bar graph (here is the link) shows that Justin Bieber has the most Twitter followers, followed closely by Katy Perry. This chart was much easier to make, and I even figured out how to customize the colors and fonts! I was SHOCKED to see Rihanna so far down in Twitter followers when she had the MOST Facebook followers. It goes to show that each social media platform can attract a different crowd of a celebrity’s fans; some followers may be on only one platform, either Twitter or Facebook. I personally rarely use Twitter except for my job. Facebook is my “go-to” social media platform.

I really enjoyed this project, and I can see myself using Plot.ly in the near future to present the online research I do at my job!