The curse of big data

The technology marketing hype machine shouting "Big Data" has drowned out the fact that actionable, valuable insights are more likely to be found in small data sets than in large ones. There are several reasons for this, but a major one is the curse of big data. "Big data" means large data sets that have different properties from small data sets and that require special data science methods to separate signal from noise and extract meaning, as well as special compute systems and power.
The curse of big data is described by Vincent Granville here: http://www.analyticbridge.com/profiles/blogs/the-curse-of-big-data. Put simply, you will find more "statistically significant" relationships in larger data sets. "Statistically significant" refers to a statistical assessment of whether observations reflect a pattern rather than just chance; a significant result may or may not be meaningful. This happens because big data produces more correlations and patterns between data, yet also produces much more noise than signal. The number of false positives will rise significantly. In other words, there are more correlations without causation, leading to an illusion of reality.
If you are from the generation familiar with the terms megabyte, gigabyte, and terabyte, prepare yourselves for a whole new vocabulary, the likes of which your predecessors probably could not imagine. Soon terms like petabyte, exabyte, and zettabyte will be as common as the aforementioned, with benefits and difficulties to match their magnitude.
"Big Data" is not just a data silo; rather, the term applies when all the relevant data is used for a specific purpose. Although there should be clear boundaries between data segments that belong to specific objectives, this very concept is misleading and can undermine potential opportunities. For example, scientists working on human genome data might improve their analysis if they could take the entire content (publications) of Medline (or PubMed) and analyze it in conjunction with the human genome data. However, this requires natural language processing (semantic) technology combined with bioinformatics algorithms, which is an unusual coupling at best.
"Big Data" can be cursed by the fact that when you search for patterns in very, very large data sets with billions or trillions of data points and thousands of metrics, you are bound to identify coincidences that have no predictive power. Even worse, the strongest sources might be:
  • Peer to peer communication (text messaging, chat lines, digital phone calls)
  • Social Networking (Facebook, Twitter)
  • Data from scientific measurements and experiments: astronomy, physics, genetics
  • Business (e-commerce, stock markets, business intelligence, marketing, advertising)
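
The false-positive effect described above, where searching across many metrics is bound to turn up coincidences, can be illustrated with a small simulation. This is a minimal sketch, not Granville's own method: it generates metrics that are pure independent noise and counts how many pairs still look "significantly" correlated. The function names, the 20 observations per metric, and the cutoff of 0.444 (roughly the two-sided 5% critical value of Pearson's r for 20 observations) are all assumptions chosen for illustration.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(num_metrics, num_obs=20, r_crit=0.444, seed=0):
    """Count pairs of unrelated random metrics whose |r| exceeds r_crit.

    r_crit=0.444 is an assumed, approximate 5% two-sided cutoff for
    num_obs=20; it would need to change if num_obs changes.
    """
    rng = random.Random(seed)
    # Every metric is independent Gaussian noise: there is no real signal.
    metrics = [[rng.gauss(0, 1) for _ in range(num_obs)]
               for _ in range(num_metrics)]
    pairs = itertools.combinations(metrics, 2)
    return sum(1 for x, y in pairs if abs(pearson(x, y)) > r_crit)


for k in (10, 50, 200):
    total_pairs = k * (k - 1) // 2
    print(f"{k} metrics, {total_pairs} pairs, "
          f"{count_spurious(k)} look significant by chance")
```

Because every metric here is noise, every "significant" pair is a false positive. The number of pairs grows roughly with the square of the number of metrics, so the count of chance findings grows with it, even though the share of false positives stays near 5%.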



