Big Data as Big Brother

Over the past two days, one of the biggest news items has been the Big Data of Big Brother.

The argument goes like this:

The U.S. government is aggregating the same kind of social media and online user data that private companies use to understand the sentiments of their customers (or potential customers), for the express purpose of counterterrorism, reporting on potential defense threats, and generally trying to figure out who the "bad guys" in the world are.

Using data from companies such as Apple, Google, Facebook, Microsoft, Skype, Yahoo, YouTube, and others, the National Security Agency is able to obtain all types of data (1). When one considers the total size of this data, it's clear that this leaves the world of statistical analysis and enters the world of Data Science and Big Data.

To give you an idea of the size of the data: "The amount of data in question is enormous. For example, the U.S. wireless-communications trade association C.T.I.A. estimates that as of December, 2012, there were over three hundred and twenty-six million wireless-subscriber connections, which use 2.3 trillion minutes of call time a year. Facebook has a billion users; eight hundred and thirty-three million of them are international. On average, over three hundred million photos are uploaded to Facebook’s servers per day. YouTube handles seventy-two hours of video uploads per minute." (1)
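
To make those numbers a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above; the variable names, the 365-day year, and the choice of what to divide by what are my own assumptions, so treat the results as orders of magnitude rather than measurements.

```python
# Figures taken from the quote above (as of December 2012).
# Names and the 365-day year are illustrative assumptions.
wireless_connections = 326_000_000      # "over" 326 million connections
call_minutes_per_year = 2.3e12          # 2.3 trillion minutes of call time
facebook_photos_per_day = 300_000_000   # "over" 300 million photos per day
youtube_hours_per_minute = 72           # hours of video uploaded per minute

MINUTES_PER_DAY = 60 * 24
DAYS_PER_YEAR = 365


def scale_summary():
    """Convert the quoted figures into per-day and per-year rates."""
    return {
        # Average call minutes per connection per day
        "call_minutes_per_connection_per_day": (
            call_minutes_per_year / wireless_connections / DAYS_PER_YEAR
        ),
        # Hours of new YouTube video arriving each day
        "youtube_hours_per_day": youtube_hours_per_minute * MINUTES_PER_DAY,
        # Lower bound on Facebook photo uploads per year
        "facebook_photos_per_year": facebook_photos_per_day * DAYS_PER_YEAR,
    }


if __name__ == "__main__":
    for name, value in scale_summary().items():
        print(f"{name}: {value:,.1f}")
```

By this arithmetic, YouTube alone takes in more than a hundred thousand hours of new video every day, which is far more than anyone could ever watch, let alone analyze by hand.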

Indeed, Big Brother is now in the business of Big Data.

The technical things the government is doing with Big Data include:

- Machine Learning
- Visualization
- Sentiment Analysis
- Cluster Analysis

I've talked about Sentiment Analysis before with regard to using Twitter's API, but what are these other technical things?

I'll leave that for next time.



