Data Science: the art of enhancing data
Data centricity is becoming a priority for all companies. Information is now available anytime, anywhere; it is an asset of incalculable value that empowers whoever possesses it.
Data science is a cross-cutting discipline. It refers to the science of manipulating data using techniques and methods from several disciplines such as computer science, mathematics, and statistics, to transform a piece of data into useful information to achieve certain goals.
Data science was first mentioned when the amount of data produced by the information society became so large that new techniques and new methodologies for storage, processing and analysis were required.
The main object of Data Science is big data. Big data is everywhere: it is generated and exchanged by connected devices and related software (Internet of Things). Its importance lies not so much in its quantity as in the use that is made of it. Data are managed to obtain information that helps to:
- Reduce costs
- Shorten timelines
- Develop new products and optimize offerings
- Make informed decisions
According to data reported by the Big Data & Business Analytics (MIP) observatory, in Italy banks have been the leading sector in terms of analytics market share (28%), followed by manufacturing (24%), telco and media (14%), services (8%), large-scale retail trade (7.5%), insurance (7%), utilities (6.5%), and public administration (PA) and healthcare (5%).
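As a quick sanity check on the figures above, the shares can be put into a small data structure and summed. This is only an illustrative sketch: the names `market_share`, `total_share`, and `cumulative_share` are made up for this example, and the only data used are the percentages quoted in the text.

```python
# Analytics market share by sector in Italy, in percent.
# The figures are the ones quoted in the text; the variable and
# function names are illustrative assumptions, not from the source.
market_share = {
    "banks": 28.0,
    "manufacturing": 24.0,
    "telco and media": 14.0,
    "services": 8.0,
    "large-scale retail trade": 7.5,
    "insurance": 7.0,
    "utilities": 6.5,
    "PA and healthcare": 5.0,
}


def total_share(shares):
    """Return the sum of all sector shares, in percent."""
    return sum(shares.values())


def cumulative_share(shares, top_n):
    """Return the combined share of the top_n largest sectors, in percent."""
    ranked = sorted(shares.values(), reverse=True)
    return sum(ranked[:top_n])


if __name__ == "__main__":
    print(f"Total share covered: {total_share(market_share)}%")
    print(f"Top 3 sectors combined: {cumulative_share(market_share, 3)}%")
```

The eight sectors listed account for the whole market (the shares add up to 100%), and the first three alone (banks, manufacturing, telco and media) cover about two thirds of it.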
The growing trend of Big Data and the 3 V model
According to Doug Laney, the data growth model is three-dimensional: as time goes by, data increases in volume, velocity, and variety.
- Volume is the amount of structured or unstructured data generated by heterogeneous sources such as databases, sensors, email, and social media.
- Variety refers to the many different types of unstructured or semi-structured data, such as text, web server logs, images, video, audio, and computational elements, that have been added to the structured data contained in databases.
- Velocity is the speed at which new data are generated, which in turn calls for short collection and analysis times.
Subsequently, Laney's model was extended with two other important variables:
- Veracity, the trustworthiness of the data, which is needed to ensure the reliability of the analysis results that underlie business decisions.
- Variability, the wide range of formats and provenance, which can lead to errors when the data are interpreted.
Big Data is often called "the new oil". This expression highlights that data are an invaluable source of value, and some define Value as the sixth V of the model.
In order to extract value from Big Data, challenging projects are required for data collection and analysis. These projects should be preceded by an assessment of the actual value brought to the business.
During the pandemic, the "hunger for data" increased. This is why a strong data-driven culture will be required.