Structured & Unstructured Data / DW
Recent advances in business intelligence and analytics call for research into
the highly complex domain of unstructured data, alongside the conventional,
structured forms of data. To lay the foundation for this discussion, it helps
to first define these two disparate yet integrated domains of data.
Structured data:
The most common form of data. It is organized in tables (rows and columns)
and has well-defined data types (alphabetic, numeric, alphanumeric) with no
ambiguity in precision. Such data can be loaded into analytic tools with
little or no pre-processing, so analysis takes comparatively little effort.
The data is ready to be understood by the machine.
The irony is that although structured data has been in use for decades, it
makes up only about 20% of the data available for developing valuable
business insights.
Unstructured Data:
This form of data is nearly the opposite of structured data. It is
unorganized and lacks the well-defined fields and data types that a machine
can comprehend directly. About 80% of valuable data is of this type. Common
examples are emails, Word documents, PDFs, and social media data. Considerable
extract, transform, and load (ETL) effort is needed to make this data ready
for analysis. It can be thought of as data that humans comprehend readily but
that is not machine ready. Recent developments in technology have opened up
this largely unexplored domain for valuable analysis.
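The contrast between the two definitions can be illustrated with a minimal
Python sketch. The sample CSV, the email text, and the function names
load_structured and extract_from_email are all hypothetical, chosen only for
illustration. The structured rows need nothing more than a type cast, while
the email needs hand-written patterns before any field can be used.

```python
import csv
import io
import re

# Structured input: columns and types are known up front (hypothetical sample).
STRUCTURED_CSV = """order_id,customer,amount
1001,alice,25.50
1002,bob,14.00
"""

# Unstructured input: similar facts buried in free text (hypothetical email).
EMAIL_TEXT = """Hi team,
Carol called today about order 1003. She was charged $42.75
and would like a receipt sent to her. Thanks!
"""

def load_structured(text):
    """Parse CSV rows and cast each column to its expected type."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"],
            "amount": float(row["amount"]),
        })
    return rows

def extract_from_email(text):
    """Pull an order id and an amount out of free text with simple patterns."""
    order = re.search(r"order\s+(\d+)", text, re.IGNORECASE)
    amount = re.search(r"\$(\d+(?:\.\d{2})?)", text)
    return {
        "order_id": int(order.group(1)) if order else None,
        "amount": float(amount.group(1)) if amount else None,
    }

structured_rows = load_structured(STRUCTURED_CSV)
email_row = extract_from_email(EMAIL_TEXT)
```

The patterns here only work for this one phrasing. Real text varies far more,
which is where most of the pre-processing effort goes.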
Summary:
· Present state of data and data warehousing for analysis
The ‘industrial revolution of data’, as it has been called, is what
organizations are witnessing globally today, and the proactive ones have
already started putting their resources to work to gain the maximum from it.
Big Data includes all forms of data, from unstructured and semi-structured to
structured. Tools and techniques are developing in parallel with the unveiling
of exabytes of data, and storage techniques are struggling to keep pace. As an
example, the retail giant Walmart handles more than one million customer
transactions every hour, which are fed into databases storing more than 2.5
petabytes. Other technology endeavors such as IoT, cloud implementation, and
security enhancements are complementing this abundance of data. Yet, according
to industry experts, this is just the beginning of an era that has yet to see
the full-blown caliber of analytics that can assist organizations in
decision-making.
Data warehousing remains a strong answer to the analysis of the kind of data
available today. The fundamentals of the concept are unaltered, but
development and research continue in order to combine the powerful abilities
of the DW with other technology advancements. Tools available today can
extract data even from unstructured sources, based on a set of rules or logic,
and feed it into the DW. A key example of the present state of DW is the way
large retailers like Amazon are able to offer customized shopping
recommendations from the browsing patterns of their customers. Dynamism and
real-time data are the gold coins, and the DW has the ability to dig them out
efficiently.
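As a rough sketch of what "a set of rules/logic" feeding a DW might look like,
the Python below tags free-text comments with topics and loads the result
into a table that can then be queried like any structured data. Everything in
it is an assumption made for illustration: the keyword rules, the sample
comments, the comment_topic table, and the use of an in-memory SQLite database
as a stand-in for a warehouse. Commercial tools apply far richer logic.

```python
import sqlite3

# Hypothetical rule set: keyword -> topic label.
RULES = {
    "refund": "billing",
    "charged": "billing",
    "late": "delivery",
    "damaged": "delivery",
}

# Hypothetical free-text customer comments standing in for unstructured data.
COMMENTS = [
    "I was charged twice, please refund me",
    "Package arrived late and damaged",
    "Great service, thank you",
]

def classify(comment):
    """Return the sorted topics whose keywords appear in the comment."""
    words = comment.lower().replace(",", " ").split()
    return sorted({topic for keyword, topic in RULES.items() if keyword in words})

def load_into_warehouse(conn, comments):
    """Store one (comment_id, topic) row per matched topic."""
    conn.execute("CREATE TABLE comment_topic (comment_id INTEGER, topic TEXT)")
    for comment_id, comment in enumerate(comments, start=1):
        # Comments that match no rule are kept under a catch-all label.
        for topic in classify(comment) or ["unclassified"]:
            conn.execute(
                "INSERT INTO comment_topic VALUES (?, ?)", (comment_id, topic)
            )
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real warehouse
load_into_warehouse(conn, COMMENTS)
topic_counts = dict(
    conn.execute("SELECT topic, COUNT(*) FROM comment_topic GROUP BY topic")
)
```

Once the text has been reduced to rows, the usual aggregate queries apply. The
quality of the answers depends entirely on the quality of the rules.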
· Limitations of DW for different data types:
The DW may have evolved a great deal as far as structured data is concerned,
but it still has limitations when it comes to unstructured data, the kind with
more potential. A key concern in analyzing unstructured data from a DW is the
cost/benefit ratio, given the enormous resources and effort required to
pre-process the data and make it DW ready. Beyond the processing, the
ballooning of databases on account of the massive amounts of data to be
handled is another concern. Managers do not know what results to expect after
putting massive effort into handling these kinds of data, or whether it will
drive profit for their organizations. The latency introduced by pre-processing
may also hinder the ultimate objective of real-time analytics. A lot remains
to be done to bring data warehousing capabilities to bear on the raw data
available and gain the full advantage, a pattern witnessed in almost all
technologies built so far.
· Future Trends in DW:
(1) More Capable Data Warehouses
The burgeoning of data volumes and types calls for fine-tuned DWs that are
versatile enough to handle these advancements. With memory requirements on the
warehouses increasing, the cost factor needs to be watched. A new system whose
cost rises in proportion to its memory requirements is hard to justify.
Capability also needs to be enhanced in the areas of cloud deployment and
mobile-enabled systems that offer location-independent access to their most
avid viewers, the managers of organizations, who want a sneak peek whenever
and wherever required.
(2) Physical and logical consolidation for cost control
As discussed above, to control the ballooning of cost, systems will require
logical enhancements more than physical ones. The logical enhancements might
come in the form of virtual systems and tightly synced databases that reflect
real-time situations without significant latency, so as to gain the most from
the data warehouses. Significant effort is needed in this regard.
(3) Real-time analytics
The huge warehouses need the caliber to give insights from real-time updates
in the system. Time has always been an invaluable resource to any
organization. Given the shrinking window for accommodating latency in analysis
and prediction today, efforts to reduce processing times need to be pushed in
an integrated manner.
(4) Cloud Integration
Developments in cloud technology and its allied advantages call for a
complementary effort on the data warehousing front as well. Given the
advantages of cost reduction and mobile access to data from anywhere, the next
step for DWs clearly lies in this direction. Moreover, the fundamentals of
data warehousing align well with these advantages. Leveraging the cloud with
DW is both a win-win situation and the demand of the times.
(5) Beyond dashboards and reports: integrating day-to-day activities
The dashboard and reporting abilities of the DW, which are instrumental in
formulating strategy and driving the organization towards its business
objectives, are far too limited to harness the full potential of DWs. The real
advantage lies in the promise the DW holds for guiding the everyday work of
organizations. Things as small as the budgeting of cafeterias in the
organization could be tackled both qualitatively and quantitatively if the
data is fed in an appropriate manner.