Data Warehousing — A Basic Understanding
--
Businesses succeed for many reasons: the quality of their products, their service, their efficiency and more. Today, though, one factor increasingly decides which route a business should take to increase profits and which areas it should focus on. That factor is data. Data is the foundation on which almost every successful international business stands.
Data matters this much because it tells a business what its customers or clients want: which items they prefer, what faults they see in the business, which items or services are being neglected, and so on. This information is essential for businesses today. Large datasets, however, almost always contain sections that are not useful, so they need cleaning. This is where Data Warehousing proves useful.
A Data Warehouse is formed by collecting data from multiple heterogeneous sources and integrating it. This is done with specific goals in mind, namely to make the data:
- Compatible for decision-making.
- Compatible with ad hoc queries.
- Easier to process.
- Easier to understand!
The data from the various sources is sent to the data warehouse for OLAP (online analytical processing) and structural analysis. Cleaning the data so that only the useful portions are kept makes storage more space- and cost-efficient, and it saves time for the analysts and programmers who will use the data later.
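To make the idea of analytical processing a little more concrete, here is a minimal sketch of a roll-up query, the kind of aggregation an OLAP workload typically runs. It uses an in-memory SQLite database only as a stand-in for a real warehouse; the `sales` table, its columns and the sample rows are assumptions made up for this illustration.

```python
import sqlite3

# Hypothetical, already-cleaned sales facts: (region, product, amount).
SALES = [
    ("north", "widget", 120.0),
    ("north", "gadget", 80.0),
    ("south", "widget", 200.0),
    ("south", "widget", 50.0),
]


def revenue_by_region(rows):
    """Roll individual sales up to one revenue total per region."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    totals = dict(cursor.fetchall())
    conn.close()
    return totals


print(revenue_by_region(SALES))  # {'north': 200.0, 'south': 250.0}
```

The analyst asks a question about the business as a whole (revenue per region) rather than about a single transaction, which is the main difference between analytical and day-to-day transactional queries.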
Building and maintaining a Data Warehouse involves the following functions:
- Extraction: Collecting data from various sources in order to store it. The sources are typically heterogeneous, meaning they differ in format and structure.
- Cleaning: Finding and correcting errors in the data, and removing unimportant or useless parts so that only the clean, useful segments remain.
- Transformation: Converting the data into the warehouse format. This makes it easier to mine the data for useful outputs.
- Loading: Sorting, summarizing and consolidating the data, checking its integrity and building partitions. This makes the data easier to process and put to use, and also saves time and memory.
- Refreshing: Regularly updating the warehouse with newly added data so that it stays in sync with the sources.
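The first four steps in the list above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions, not how any particular warehouse product works: the two sources (`CSV_SOURCE`, `JSON_SOURCE`), their field names and the `sales` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse. Refreshing is left out; in practice it would mean running the pipeline again on newly arrived records.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources in different formats (heterogeneous).
CSV_SOURCE = "customer,amount\nalice,10.50\nbob,\ncarol,7\n"
JSON_SOURCE = '[{"name": "Dave", "total": "3.25"}, {"name": "", "total": "9"}]'


def extract():
    """Extraction: read the raw records from every source."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def clean(csv_rows, json_rows):
    """Cleaning: drop records that lack a customer name or an amount."""
    csv_rows = [r for r in csv_rows if r["customer"] and r["amount"]]
    json_rows = [r for r in json_rows if r["name"] and r["total"]]
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transformation: map both layouts to one (customer, amount) format."""
    unified = [(r["customer"].lower(), float(r["amount"])) for r in csv_rows]
    unified += [(r["name"].lower(), float(r["total"])) for r in json_rows]
    return unified


def load(conn, rows):
    """Loading: sort the unified rows and store them in the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sorted(rows))
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(conn, transform(*clean(*extract())))
rows = conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
print(rows)  # [('alice', 10.5), ('carol', 7.0), ('dave', 3.25)]
```

The records for "bob" (no amount) and the unnamed JSON entry are removed during cleaning, and the remaining rows end up in a single format regardless of which source they came from.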
The key features of a Data Warehouse are:
- Volatility: A Data Warehouse should be non-volatile with regard to its data. This means that data already stored should not be changed or deleted; new data is only added. Otherwise, any system, algorithm or software that relies on the data in the warehouse may be affected.
- Time: Time orientation is important for data warehousing. The data warehouse must contain large volumes of data collected over long periods of time, so that significant patterns or trends can be identified and put to future use.
- Integration: The collected data must be structured and oriented to the subject or goal in mind. It should stay relevant and consistent across sources so that it serves the specific goals the warehouse was built for.
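As a rough illustration of non-volatility and time orientation, the sketch below models a store that only accepts new rows and tags each row with a date, so that trends can be read off later. The class `AppendOnlyWarehouse`, its methods and the sample data are hypothetical names invented for this example; a real warehouse enforces these properties through its loading process and schema design rather than a Python class.

```python
from datetime import date


class AppendOnlyWarehouse:
    """Toy store: rows can be added and read, never modified or removed."""

    def __init__(self):
        self._rows = []

    def append(self, day, product, units):
        # Every row carries the date it describes (time orientation).
        self._rows.append((day, product, units))

    def rows(self):
        # Hand out an immutable copy so callers cannot alter stored data.
        return tuple(self._rows)

    def units_per_month(self, product):
        """Trend over time: total units per (year, month) for one product."""
        totals = {}
        for day, name, units in self._rows:
            if name == product:
                key = (day.year, day.month)
                totals[key] = totals.get(key, 0) + units
        return dict(sorted(totals.items()))


warehouse = AppendOnlyWarehouse()
warehouse.append(date(2023, 1, 5), "widget", 10)
warehouse.append(date(2023, 1, 20), "widget", 5)
warehouse.append(date(2023, 2, 3), "widget", 30)
warehouse.append(date(2023, 2, 9), "gadget", 4)
print(warehouse.units_per_month("widget"))  # {(2023, 1): 15, (2023, 2): 30}
```

Because nothing is ever overwritten, the January figures remain available after the February data arrives, which is what makes month-over-month comparisons possible.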
These concepts are not too difficult to understand, but they are essential before diving deeper into the world of Data Mining, Data Warehousing or Data Engineering. Links are included below for a better, more hands-on understanding of Data Warehousing.
References: