I just Googled “Big Data” and got
20,000,000 results. About two years ago there was virtually nothing, and now there is huge,
unprecedented hype.
Big data is not a single technology or a
shortlist of vendors. It’s a loose collection of evolving tools, techniques and
talent. In practice, big data can
be divided into three categories: storage, processing and analytics.
1) Storage
Large-scale data processing operations access data in a way that traditional file systems are not designed for. Data tends to be written and read in large batches, multiple megabytes at once. Efficiency is a higher priority than features like directories that help organize information in a user-friendly way.
“Cloud” is a very vague term, but there has been a real change in the availability of computing resources. Purchasing or long-term leasing of a physical machine used to be the norm; now it is much more common to rent computers that run as virtual instances. This makes it economical for the provider to offer very short-term rentals of flexible numbers of machines, which is ideal for a lot of data processing applications.
Enterprise data is traditionally stored in relational databases, which are structured in tables that can join with other tables in a carefully defined way. Big data strains this approach because there is too much data to fit easily into big enterprise databases, and many uses require faster processing and analysis. Big data storage differs significantly from relational databases because it stores data that has not been mapped to a particular format or structure. Because the data is not constrained by such a structure, it is available for use much more rapidly.
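To make the idea of storing data without a predefined structure a bit more concrete, here is a minimal sketch in Python. It assumes the raw data is kept as JSON lines exactly as it arrived, and that a structure is only applied when the data is read. The sample events, the field names and the `read_with_schema` helper are all hypothetical and only there for illustration.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no fixed schema).
raw_lines = [
    '{"user": "a1", "action": "click", "page": "/home"}',
    '{"user": "b2", "action": "purchase", "amount": 19.99}',
    '{"user": "a1", "device": {"os": "ios"}}',
]

def read_with_schema(lines, fields):
    """Apply a structure at read time: keep only the fields needed right now."""
    for line in lines:
        record = json.loads(line)
        # Fields missing from a record come back as None instead of failing.
        yield {name: record.get(name) for name in fields}

rows = list(read_with_schema(raw_lines, ["user", "action"]))
print(rows)
```

Nothing had to be decided about the shape of the data before it was written. A different analysis could read the same lines tomorrow with a different list of fields.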
2) Processing
Processing calls for mastering
the proper tools for efficient analysis under different conditions (different
data sets, varied business environments, etc.). Although current web analysts
are undoubtedly experts at leveraging web analytics tools, most lack
broader expertise in business intelligence and statistical analysis tools such
as Tableau, SAS and Cognos.
Processing big data means collecting and moving it into storage or other systems in an organized way. Big data needs to be distributed across a number of different hardware locations and is generally not in a predefined format, so it requires its own approach to processing.
Batch processing works with data that sits in a constellation of database clusters spread across hundreds or thousands of different pieces of hardware. A number of frameworks execute batch processing of data, including MapReduce and Spark.
Real-time processing works on data that is “in motion,” potentially at or near the point of data capture. Think of a marketer being able to process behavior data from a website visitor in the moment to serve that same visitor relevant ads, promotions or content throughout his or her site visit.
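For readers who have not seen the batch model before, here is a rough sketch of the map, shuffle and reduce steps behind MapReduce-style processing, written in plain Python on a single machine. It is not the Hadoop or Spark API. The function names and the two sample documents are made up, and a real framework would run each phase in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Hypothetical input; in practice each document could live on a different node.
documents = ["big data is big", "data in motion"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

The map and reduce functions only ever look at one document or one key at a time, which is why the work can be spread over hundreds or thousands of machines.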
3) Analytics
Analytics calls for developing
expertise in analyzing unstructured data such as social media, call center logs
and emails. From a processing perspective, the goal should be to identify
and master the most appropriate tools in this space, be it social media
sentiment analysis or more sophisticated platforms.
Adoption
of specialized big data tools is still growing. Yet many analytics techniques
can and do make use of big data stores, generally by transforming the data into
structured formats first. One area of growing interest in big data analysis is
machine learning, which uses software to find patterns within large amounts of
data in ways that don’t rely on explicit programming and can surpass human
capabilities. There is also a clear opportunity for digital analysts to develop
expertise in dashboarding and, more broadly, data visualization.
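As a small illustration of transforming unstructured data into a structured format before analyzing it, here is a sketch in Python that turns free-text call center notes into rows and then aggregates them. The notes, the dates and the tiny `NEGATIVE` and `POSITIVE` word lists are invented for the example. Real sentiment analysis tools are far more sophisticated than keyword matching.

```python
import re
from collections import Counter

# Hypothetical call center notes: a date followed by free text.
notes = [
    "2014-03-01 customer angry about late delivery",
    "2014-03-01 customer happy with refund",
    "2014-03-02 late delivery again, customer angry",
]

# Assumed keyword lists, purely for illustration.
NEGATIVE = {"angry", "late"}
POSITIVE = {"happy"}

def to_row(note):
    """Turn one free-text note into a structured row."""
    date, text = note.split(" ", 1)
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        "date": date,
        "negative": len(words & NEGATIVE),
        "positive": len(words & POSITIVE),
    }

rows = [to_row(note) for note in notes]

# Once the data is structured, ordinary aggregation works on it.
negative_by_day = Counter()
for row in rows:
    negative_by_day[row["date"]] += row["negative"]

print(rows)
print(dict(negative_by_day))
```

Rows like these are also the kind of structured input that a dashboard or a machine learning model can consume directly.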