The three fundamental “Vs” of data have been taught for decades: volume, velocity, and variety. Volume is the easiest to understand: it is simply the amount of data, typically measured in multiples of bytes such as megabytes (MB), gigabytes (GB), terabytes (TB), and petabytes (PB) (EMC Education Services, 2015). I am currently engaged with a client that generates 1 TB of data per week for its logistics department, and nearly all of it needs to be analyzed. The solution presented to the client is to use Azure Data Factory to create pipelines that ingest all of the data into a blob storage account. From there, the data will be consumed by downstream services for further analysis; Power BI, Machine Learning, Data Warehouse, and Cognitive Services are examples of tools that can be applied to the data to extract insights.
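To make the volume figure concrete, the short Python sketch below projects how much data accumulates at a steady 1 TB per week. It is only an illustration under stated assumptions: the weekly rate stays constant, nothing is compressed or deleted, and decimal (SI) units are used. The function names are hypothetical and are not part of the client solution.

```python
# Decimal (SI) byte units; binary units (TiB, PiB) would give slightly different figures.
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to bytes."""
    return amount * UNITS[unit]


def projected_volume(weekly_tb, weeks):
    """Bytes accumulated after `weeks` at a constant weekly rate (assumption)."""
    return to_bytes(weekly_tb, "TB") * weeks


def humanize(num_bytes):
    """Format a byte count using the largest unit that fits."""
    for unit in ("PB", "TB", "GB", "MB"):
        if num_bytes >= UNITS[unit]:
            return f"{num_bytes / UNITS[unit]:.1f} {unit}"
    return f"{num_bytes} B"


# Assumed rate: 1 TB per week, as in the logistics example.
print(humanize(projected_volume(1, 52)))      # one year   -> 52.0 TB
print(humanize(projected_volume(1, 52 * 5)))  # five years -> 260.0 TB
```

Even at this modest weekly rate, the storage account would hold roughly a quarter of a petabyte after five years, which is why the ingestion and storage design matters from the start.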
Next, velocity is defined as the speed at which data needs to be processed for consumption or landing (EMC Education Services, 2015). The typical velocity terms are, from slowest to fastest: batch, periodic, near-real-time, and real-time. I have worked with all four. Most clients run batch processes overnight or periodic updates throughout the day (e.g., every hour) so that data is ready for consumption when the business day begins; in these scenarios, analysts are usually performing historical reporting. I have also worked with a handful of sales departments that require near-real-time analysis of sales figures throughout the day via dashboards and KPIs. As for real-time, with the number of IoT devices growing rapidly (a projected 50 billion or so by 2020), it is critical for organizations to react to sensor data within seconds (Srinivasan & Deepak, 2019).
Finally, variety refers to the different forms in which data can exist: tables, databases, photos, web pages, audio, video, documents, mobile data, and more (Rouse, 2013). My experience here has also spanned all of these forms. Analyzing tabular and database data is the easiest, since simple Excel formulas and SQL commands can return results directly from the data. Audio and video, however, are more complex, and I will share an example involving these sources in a later section of this paper.
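As a small illustration of why tabular data is the easiest case, the sketch below loads a few rows into an in-memory SQLite database and summarizes them with a single SQL statement. The table, column names, and values are hypothetical and chosen only for illustration; they are not taken from any client engagement.

```python
import sqlite3


def total_weight_by_region(rows):
    """Load (region, weight_kg) rows into SQLite and total the weight per region."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table used only for this example.
        conn.execute("CREATE TABLE shipments (region TEXT, weight_kg REAL)")
        conn.executemany("INSERT INTO shipments VALUES (?, ?)", rows)
        query = (
            "SELECT region, SUM(weight_kg) "
            "FROM shipments GROUP BY region ORDER BY region"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


sample = [("East", 120.0), ("West", 80.0), ("East", 30.0)]
print(total_weight_by_region(sample))  # [('East', 150.0), ('West', 80.0)]
```

No comparable one-line query exists for raw audio or video, which first has to be converted into a structured form (for example, transcripts or extracted features) before this kind of analysis is possible.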
Additional “Vs” have been added over time and are considered important factors for leaders of organizations. Veracity is the most common fourth “V” and is defined as data in doubt (Subramanian, 2014). Data in doubt means that uncertainty arises from inconsistencies in the data; this is evident in the finding that 1 in 3 leaders do not trust the data presented to them by their employees. I have seen this at nearly every organization I have worked with, and I would suggest that 1 in 3 is a conservative estimate.
While not as common, variability and value must also be considered key factors for analytics (Demirkan & Dul, 2014). Variability refers to the fact that data can be interpreted in a multitude of ways, as different situations may require different approaches to handling it. I will share an example from a bio-pharmaceutical client I worked with while at Deloitte. Because of the different biological samples, tests, models, and scientists involved, the ecosystem of questions was highly variable, and data scientists had to be aware of the range of scenarios for which they needed to prepare the data for consumption. Value refers to the richness that emerges when different, complex sources are combined to form deep analytic applications (Demirkan & Dul, 2014).
For example, social media data is very important for enterprises to collect and analyze in order to monitor the customer experience. By ingesting consumer data from a wide set of complex data sources, enterprises can add value by building a true picture of the customer experience and adapting quickly to improve it. This was a problem we worked on during a project at Qlik with many healthcare clients.
Resources
Demirkan, H., & Dul, B. (2014). The Data Economy: Why do so many analytics projects fail? Retrieved from http://analytics-magazine.org/the-data-economy-why-do-so-many-analytics-projects-fail/
EMC Education Services (Ed.). (2015). Big Data, Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data (1st ed.). Hoboken, NJ: John Wiley & Sons.
Rouse, M. (2013, February). What is 3Vs (volume, variety and velocity)? Definition from WhatIs.com. Retrieved from https://whatis.techtarget.com/definition/3Vs
Srinivasan, & Deepak. (2019, April 4). All You Need to Know About IoT in Real Time Analytics. Retrieved from https://www.latentview.com/blog/all-you-need-to-know-about-real-time-analytics-and-iot/
Subramanian, L. V. (2014). Big Data and Veracity Challenges. Kolkata, IN: IBM.