What is Big Data?

With the phrase ‘Big Data’ seemingly everywhere, it’s important to be aware of what it is and how it’s used.

Finding a definition to encompass the real meaning of Big Data requires a little more research than a passing glance at dictionary.com – every source describes it in their own way. However, the two core concepts behind the idea of Big Data remain unchanging across all of these perspectives: (1) volume and (2) utilization.

So, we’ve established that Big Data refers to a lot of data. But how much exactly is a lot? Philip Ashlock, Chief Architect of Data.gov, says Big Data consists of “… datasets with a byte size that seems fairly large relative to our frame of reference using files on a desktop PC (e.g., larger than a terabyte) …”

If you aren’t familiar with terms for data sizes, here is something that might help put things in perspective.

Figure 1. 4GB USB Drive

That little USB stick can hold 4 gigabytes (GB) of data. A hard drive in the average desktop computer may store anywhere between 500 GB and 1 terabyte (TB), where 1 TB is 1024 GB. To picture it, you would need 256 of those USB drives to match a single terabyte of digital storage.
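To make that arithmetic concrete, here is a minimal sketch in Python. The function name `usb_drives_needed` and the constant `GB_PER_TB` are made up for illustration, and the sketch assumes the same convention as above, where 1 terabyte is 1024 gigabytes.

```python
# Illustrative only: names here are hypothetical, not from any library.
GB_PER_TB = 1024  # assumes the binary convention used in this article


def usb_drives_needed(target_gb: int, drive_gb: int = 4) -> int:
    """Return how many drives of size drive_gb are needed to hold target_gb."""
    # Ceiling division: a partly filled drive still counts as a whole drive.
    return -(-target_gb // drive_gb)


print(usb_drives_needed(GB_PER_TB))  # 1 TB hard drive -> 256
print(usb_drives_needed(500))        # 500 GB hard drive -> 125
```

So even the smaller 500 GB hard drive would take 125 of the 4 GB sticks to match.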

And Big Data is greater than that still.

But that’s only one part of it.

A number of professionals in the tech industry argue that Big Data is not just about the size of the datasets, but their context as well. In 2001, Doug Laney described it in terms of the three V’s: Volume, Velocity, and Variety. This means that beyond containing a massive quantity of information, a dataset must also change rapidly and vary in format to measure up to the term ‘Big Data.’

Ultimately, Big Data is a collection of information that requires high-level programming abilities and specific methodology to glean anything of value from it. It’s still very much a growing industry, and many professionals in it are still devising new ways to analyze the data.

That raises the question: what can create such a magnitude of information that it needs special processing? The answer is interesting: it’s people and the world around us, all of the things that make this Earth our reality.

Hilary Mason, founder of Fast Forward Labs, said “Big data is just the ability to gather information and query it in such a way that we are able to learn things about the world that were previously inaccessible to us.”

From a business perspective, Big Data can be used to find the root causes of issues and defects in real time, generate coupons at the point of sale based on a customer’s shopping habits, recalculate entire risk portfolios in minutes, and detect fraudulent behaviour before it affects your organization.

It doesn’t stop there; Big Data has uses in almost every industry.

Financial institutions can use Big Data to understand customers and minimize risk and fraud by detecting behaviour patterns. Educators can analyze Big Data to keep track of students, catch those who are at risk, and possibly create a better system for evaluation. Insurance companies may analyze Big Data and create policy based upon the information about people that it provides. Healthcare organizations may use it to manage patient records, treatment plans, and prescription information. Retailers can find out better ways to market to consumers, and manufacturers can boost quality output while keeping down waste.

An interesting security use of Big Data is detecting biological warfare attacks in real time. For example, New York City monitors the data created by its healthcare systems, e.g. emergency room admissions, to evaluate the spread of disease outbreaks. Based on that data, the city can determine whether a significant number of people checking into emergency rooms with similar symptoms reflects the organic spread of a virus or something systematic and methodical in its targeting.

However, that isn’t to say Big Data is without its faults. Our Centry CTO, Dave Ehman, said that the nature of the information that Big Data collects highlights an issue with privacy. For example, Google might know information about you based on your searches and Facebook targets you with ads based on your purchases online.

Big Data can also present a significant risk if organizations do not take steps to secure and protect the massive amount of information they hold. It could even create problems with competition: if a company collected data that gave it insights into its competitors’ methods, it might use that data to get a leg up.

Big Data has the capacity to be immensely beneficial, hazardous, or almost irrelevant, and this is all dependent upon its handling. Proper analysis of Big Data can provide solutions to organizations and daily life, whereas pushing the envelope too much may result in compromised security. The information may even be overlooked if technological innovation cannot keep up with the demand to analyze the volume of the datasets. Above all, it represents an incredible resource to help institutions learn more about the world around them.

Sources

Centry CTO Dave Ehman

https://datascience.berkeley.edu/what-is-big-data/

https://www.sas.com/en_us/insights/big-data/what-is-big-data.html#

https://www.ibm.com/big-data/us/en/

This article was written by Kristina Weber, Content Supervisor of Centry Ltd.
