Tuesday, March 1, 2016

Structured & Unstructured Data/ Data Warehouse

Structured & Unstructured Data/ DW

Recent advances in business intelligence and analytics call for research into the highly complex domain of unstructured data, as compared with the conventional forms of structured data. To lay the foundation for this discussion, it helps to first define these two disparate yet integrated domains of data.


Structured data:
This is the most common form of data. It is organized in tables (rows and columns) and has well-defined data types (alphabetic, numeric, alphanumeric) without any ambiguity in precision. It can be loaded into analytic tools with little or no pre-processing, so analysis requires relatively little effort. The data is ready to be understood by the computer/machine.
The irony is that although this form has been used for decades, it makes up only about 20% of the data available for developing valuable business insights.
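
To make the "little or no pre-processing" point concrete, here is a minimal sketch in Python. The table, its column names, and the helper functions are made up for illustration; the only assumption is that every row follows the same columns and types.

```python
import csv
import io

# Hypothetical sales table: every row has the same columns and types.
RAW = """order_id,customer,amount
1001,Asha,25.50
1002,Ben,14.00
1003,Asha,9.25
"""


def load_orders(text):
    """Parse CSV text into typed rows (int, str, float)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "order_id": int(r["order_id"]),
            "customer": r["customer"],
            "amount": float(r["amount"]),
        }
        for r in reader
    ]


def total_by_customer(rows):
    """Sum the amount column per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


orders = load_orders(RAW)
totals = total_by_customer(orders)
print(totals)
```

Because the columns and types are known in advance, the only "pre-processing" is converting text to numbers, and the analysis itself is a few lines.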

Unstructured Data:
This form of data is nearly the opposite of structured data. It is unorganized and lacks the well-defined fields and data types that a machine/computer can readily comprehend. About 80% of valuable data is of this type. The most common examples are emails, Word documents, PDFs, social media data, etc. A lot of extract, transform, and load effort is required to make this data ready for analysis. It can be thought of as data that is readily comprehensible to humans but not machine ready. Recent developments in technology have opened up this unexplored domain for valuable analysis.
Summary:

·      Present state of data and data warehousing for analysis

The ‘industrial revolution of data’, as it has been called, is what organizations are witnessing globally today. The proactive ones have already started putting their resources to work in an effort to gain the maximum out of it. Big Data includes all forms of data, from unstructured and semi-structured to structured. Tools and techniques are developing in parallel with the unveiling of exabytes of data, and storage techniques are being challenged to keep up with the times. As an example, the retail giant Walmart handles more than one million customer transactions every hour, which are fed into databases storing more than 2.5 petabytes. Other technology endeavors such as IoT, cloud implementation, and security enhancements are complementing this abundance of data today. Yet, according to industry experts, this is just the beginning of an era that has yet to witness the full-blown power of analytics in assisting organizational decision-making.


Without a doubt, Data Warehousing is well suited to analyzing the kind of data available today. The fundamentals of the concept remain unaltered; however, research and development continue to combine the powerful abilities of the DW with other technology advancements. There are tools available today that can extract information even from unstructured data, based on a set of rules/logic, and feed it into the DW. A key example of the present state of DW is the way large retailers like Amazon are able to offer customized shopping recommendations based on the browsing patterns of their customers. Dynamism and real-time data are the gold coins, and the DW has the ability to dig them out efficiently.
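
As a rough illustration of what "a set of rules/logic" can look like, the sketch below pulls a few fields out of a free-text email and loads them into a table. Everything in it is an assumption made for the example: the email text, the regular-expression rules, the keyword list, and the `feedback` table. An in-memory SQLite database stands in for the warehouse. Real tools are far more sophisticated than this.

```python
import re
import sqlite3

# Hypothetical unstructured input: a customer email.
EMAIL = """From: jane.doe@example.com
Subject: Order 48213 arrived damaged

Hi, the blender from order 48213 arrived with a cracked lid.
Please send a replacement. Thanks, Jane
"""

# Illustrative extraction rules: one regular expression per target column.
RULES = {
    "sender": re.compile(r"^From:\s*(\S+@\S+)", re.MULTILINE),
    "order_id": re.compile(r"\border\s+(\d+)", re.IGNORECASE),
}

# Illustrative keyword rule for flagging complaints.
COMPLAINT_WORDS = ("damaged", "cracked", "broken", "refund")


def extract(text):
    """Apply the rules to free text and return one structured row."""
    row = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        row[field] = match.group(1) if match else None
    if row["order_id"] is not None:
        row["order_id"] = int(row["order_id"])
    lowered = text.lower()
    row["is_complaint"] = int(any(word in lowered for word in COMPLAINT_WORDS))
    return row


def load(conn, row):
    """Insert the structured row into the stand-in warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback "
        "(sender TEXT, order_id INTEGER, is_complaint INTEGER)"
    )
    conn.execute(
        "INSERT INTO feedback VALUES (:sender, :order_id, :is_complaint)", row
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
row = extract(EMAIL)
load(conn, row)
stored = conn.execute(
    "SELECT sender, order_id, is_complaint FROM feedback"
).fetchall()
print(stored)
```

Once the email has been reduced to a row with fixed columns, it can be queried like any other structured data. The hard part in practice is writing rules that hold up across millions of messy documents.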

·      Limitations of DW for different data types:
The DW may have evolved a great deal as far as structured data is concerned, but it still has limitations when it comes to unstructured data, the type with more potential. One of the key concerns in analyzing unstructured data from a DW is the cost/benefit ratio, given the enormous resources and effort required to pre-process the data to make it DW ready. Beyond the processing, the ballooning of the databases on account of the massive amounts of data to be handled is another concern. Managers do not know what results to expect after putting a massive amount of effort into handling these kinds of data, or whether it will drive profit for their organizations. The latency introduced by pre-processing might also hinder the ultimate objective of real-time analytics. A lot needs to be done before data warehousing capabilities can take full advantage of the raw data available, and this is a pattern witnessed in almost all the technologies built so far.

·      Future Trends in DW:

(1)    More Capable Data Warehouses
The burgeoning volume and variety of data call for fine-tuned DWs that are versatile enough to handle these advancements. As memory requirements on the warehouses increase, the cost factor will need to be watched. A new system whose cost rises in proportion to its memory requirements is hard to justify. Capability also needs to be enhanced in the areas of cloud deployment and mobile-enabled systems, which offer location-independent access to their most avid viewers, the managers of organizations, who want a sneak peek whenever and wherever required.

(2)    Physical and logical consolidation for cost control
As discussed above, to control the ballooning of cost, systems will require more logical than physical enhancements. The logical enhancements might come in the form of virtual systems and tightly synced databases that can reflect real-time situations without significant latency, in order to gain the most out of the data warehouses. Significant effort needs to be put in here.

(3)    Real- time analytics
These huge warehouses need to be capable of giving insights from real-time updates in the system. Time has always been an invaluable resource to any organization. Given the shrinking window available today for accommodating latencies in analysis and prediction, efforts to reduce system processing times will need to be pushed in an integrated manner.

(4)    Cloud Integration
The developments in cloud technology and its allied advantages call for a complementary effort on the data warehousing front as well. Given the advantages of cost reduction and mobile access to data from anywhere, the next step for DWs clearly lies in this direction. Moreover, the fundamentals of data warehousing stand to gain the most from this direction. Leveraging the cloud with the DW is both a win-win situation and the demand of the times.

(5)    Beyond dashboards and reports- integrating day-to-day activities
The dashboard and reporting abilities of the DW are instrumental in formulating strategy and driving the organization towards its business objectives, but they are far too limited to harness the full potential of DWs. The real advantage lies in the value the DW promises in guiding the everyday work of organizations. Things as small as budgeting for the organization's cafeterias could be tackled both qualitatively and quantitatively if the data is fed in an appropriate manner.

