Unstructured data is data that does not fit a structured format such as a table of records with rows and columns. Examples include text files, multimedia files, financial data, real estate paperwork, and more (EMC Education Services, 2015). Companies gather more unstructured data points every day, and 80-90% of the total growth of data comes from these unstructured sources.
I have worked with clients across many industries, and each has had use cases for building a platform where unstructured data can provide meaningful value. Financial services and insurance companies in particular ingest a considerable volume of unstructured data, much of it in the form of text files. The Microsoft approach has been to land all of these files in blob storage with a defined hierarchical folder structure, which helps organize the unstructured data for downstream analytics.
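As a rough illustration of the hierarchical folder structure mentioned above, the Python sketch below builds a blob path from a few attributes of an incoming file. The function name `build_blob_path`, the `raw` top-level folder, and the source/type/date ordering are all assumptions chosen for illustration; they are not the layout used on any particular client project.

```python
from datetime import date
from pathlib import PurePosixPath


def build_blob_path(source_system: str, doc_type: str,
                    received: date, filename: str) -> str:
    """Return a hierarchical blob path for one incoming file.

    Hypothetical layout: raw/<source>/<type>/<yyyy>/<mm>/<dd>/<file>.
    Blob stores use forward slashes as virtual folder separators,
    so PurePosixPath is used regardless of the local operating system.
    """
    return str(PurePosixPath(
        "raw",
        source_system,
        doc_type,
        f"{received:%Y}",
        f"{received:%m}",
        f"{received:%d}",
        filename,
    ))


# Example: a text file received from a hypothetical "claims" system.
example_path = build_blob_path("claims", "text", date(2021, 3, 9), "claim_0001.txt")
print(example_path)
```

Partitioning by date at the end of the path is one common choice because downstream jobs can then read a single day or month of files without listing the whole container.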
On one past project, a leading telecommunications company received the majority of its unstructured data in the form of multimedia, text, and audio files. Deloitte Consulting was engaged to architect a solution that would move the company's on-premises infrastructure onto Amazon Web Services. We presented worst-case, expected, and best-case scenarios for the organization's 5-year total cost of ownership if it transitioned to the cloud and decommissioned its on-premises hardware and software. For the audio and video files, the company needed cognitive services to add accessibility features. Another example from this client is the audio generated when customers spoke commands to their remote control. Each audio file was small, but the number of files grew continually as more and more people spoke commands into the remote's microphone. These audio files were transcribed into text, which was saved as JSON files in blob storage for downstream processing.
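To make the last step concrete, here is a minimal Python sketch of how one transcribed voice command might be serialized as a JSON document before being written to blob storage. The class `VoiceCommandTranscript`, its field names, and the sample values are hypothetical; the transcription itself (done by a speech-to-text service) and the upload are not shown.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class VoiceCommandTranscript:
    """One transcribed remote-control command (hypothetical schema)."""
    audio_file: str    # name of the source audio file
    transcript: str    # text produced by the speech-to-text step
    recorded_at: str   # ISO 8601 timestamp of the recording


def to_json_document(record: VoiceCommandTranscript) -> str:
    """Serialize a transcript record to a JSON string with stable key order."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)


# Example record with made-up values.
record = VoiceCommandTranscript(
    audio_file="cmd_0001.wav",
    transcript="volume up",
    recorded_at="2021-03-09T18:30:00Z",
)
json_document = to_json_document(record)
print(json_document)
```

Keeping a pointer back to the source audio file in each JSON document lets downstream jobs work on the small text records while still being able to trace any record to its original recording.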
Reference
EMC Education Services (Ed.). (2015). Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data (1st ed.). John Wiley & Sons: Hoboken, NJ.