I have had cause to listen to a number of
debates about the difference between Data Analytics, Data Analysis, Data
Mining, Data Science, Machine Learning, Data Engineering and Big Data. If I
had to summarize machine learning in one sentence, I would say it is a
collection of algorithms and techniques used to design systems that learn from
data.
Now, with the exception of big data, they're all subsets of mathematics.
The names just change to reflect the domain, or as a branding exercise to
claim they're doing something different when, at a fundamental level, they
mostly are not.
Again, the algorithms of ML are very general, in the
sense that they usually have a strong mathematical and statistical basis that
does not take into account domain knowledge or data pre-processing.
Here
are the key differences:
1. Software engineering.
There are many people in the technology
industry doing data science at scale who would simply call themselves software
engineers. There are also a significant number of engineers who transitioned
internally into data-focused roles, and have invested significant time
improving their statistics skills. I see quite a few people with this
background who have math degrees.
2. Quantitative advanced degrees.
Many data scientists transitioned from an MS or
PhD in Statistics, Electrical Engineering, Physics, Mechanical Engineering,
Bioinformatics, Chemical Engineering, or similar into a data science role.
These disciplines have strong common foundations and utilize many overlapping
techniques.
3. Data analysis.
A growing number of data scientists worked
first in data analysis and/or predictive modeling, and then picked up machine
learning and improved the software engineering skills required to move into a data
science role. These individuals tend to also have quantitative backgrounds.
Data Analysis, Data Mining, Machine Learning and Mathematical Modeling are
tools: means towards an end. Analytics, Business Intelligence, Econometrics and
Artificial Intelligence are application areas: domains that use the tools above
(and others) to produce results within their subject. Among them, Analytics is
probably the most generic term (i.e. not domain-specific).