DATAWAREHOUSING





Hi All ,

I am discussing about fundamentals of Data warehousing :

Any database which uses ER modelling database concept is called as OLTP system or Operational Database (OPDB). Here company business operations are insert/update/delete/select statement. Because of update and delete statement the database maintains only the current data.

Data Warehouse:
A Data warehouse is a database concept which maintains current data and historical data for reporting and analysis.
Analysis:
Report to report comparison is called as analysis. Analysis leads to planning.
If the plans are executed properly the company gets good profits.


Warehouse Rules (For Master data):
Dimensional modelling is the database theory which maintains current data and historical data using the following 6 rules.
MASTER DATA: uniquely identified data in any database is called as master data. Any table which maintains repeated data is called as TRANSACTION DATA. Master data is also called as possible changing data. A few properties of master data can be changed as the time changes. These changes can be captured in dimensional modelling theory.
RULE#1: Source primary key should not be primary key in warehouse.
RULE#2: Warehouse should be added with separate primary key as SID (Surrogate ID).
Surrogate key is not the actual key, it is the key used as instead of actual source primary key.
RULE#3: Warehouse should be added with a ‘FLAG’ column to indicate data is current or historical.
RULE#4: The dimensional table should be added with ‘VERSION’ column for versioning.
RULE#5: Target table should be added with ‘START_DATE’ column to indicate when data is loaded.
RULE#6: Target table should be added with ‘END_DATE’ column to indicate when data is modified.





About  ROLAP, MOLAP, DOLAP,OLTP,DWH..  :

1 .What is ROLAP, MOLAP, and DOLAP...?

         ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP), and DOLAP (Desktop OLAP). In these three OLAP architectures, the interface to the analytic layer is typically the same; what is quite different is how the data is physically stored.
         In MOLAP, the premise is that online analytical processing is best implemented by storing the data multidimensionally; that is, data must be stored multidimensionally in order to be viewed in a multidimensional manner.
         In ROLAP, architects believe to store the data in the relational model; for instance, OLAP capabilities are best provided against the relational database.
         DOLAP, is a variation that exists to provide portability for the OLAP user. It creates multidimensional datasets that can be transferred from server to desktop, requiring only the DOLAP software to exist on the target system. This provides significant advantages to portable computer users, such as salespeople who are frequently on the road and do not have direct access to their office server.

2 .What is an MDDB? and What is the difference between MDDBs and RDBMSs?

         Multidimensional Database There are two primary technologies that are used for storing the data used in OLAP applications.
         These two technologies are multidimensional databases (MDDB) and relational databases (RDBMS). The major difference
         between MDDBs and RDBMSs is in how they store data. Relational databases store their data in a series of tables and
         columns. Multidimensional databases, on the other hand, store their data in a large multidimensional arrays.
         For example, in an MDDB world, you might refer to a sales figure as Sales with Date, Product, and Location coordinates of
         12-1-2001, Car, and south, respectively.
        
         Advantages of MDDB:

         Retrieval is very fast because
·         The data corresponding to any combination of dimension members can be retrieved with a single I/O.
·         Data is clustered compactly in a multidimensional array.
·         Values are caluculated ahead of time.
·         The index is small and can therefore usually reside completely in memory.
         Storage is very efficient because
·         The blocks contain only data.
·         A single index locates the block corresponding to a combination of sparse dimension numbers.

3. What is Mapplet and how do u create Mapplet?

       A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and can contain as many transformations as you need. Create a mapplet when you want to use a standardized set of transformation logic in several mappings. For example, if you have a several fact tables that require a series of dimension keys, you can create a mapplet containing a series of Lookup transformations to find each dimension key. You can then use the mapplet in each fact table mapping, rather than recreate the same lookup logic in each mapping.

         To create a new mapplet:
         1. In the Mapplet Designer, choose Mapplets-Create Mapplet.
         2. Enter a descriptive mapplet name.
             The recommended naming convention for mapplets is mpltMappletName.
         3. Click OK.
             The Mapping Designer creates a new mapplet in the Mapplet Designer.
         4. Choose Repository-Save.

4 . What is the difference between OLTP & OLAP?

  OLTP stand for Online Transaction Processing. This is standard, normalized database structure. OLTP is designed for Transactions, which means that inserts, updates, and deletes must be fast. Imagine a call center that takes orders. Call takers are continually taking calls and entering orders that may contain numerous items. Each order and each item must be inserted into a database. Since the performance of database is critical, we want to maximize the speed of inserts (and updates and deletes). To maximize performance, we typically try to hold as few records in the database as possible.

OLAP stands for Online Analytical Processing. OLAP is a term that means many things to many people. Here, we will use the term OLAP and Star Schema pretty much interchangeably. We will assume that star schema database is an OLAP system.( This is not the same thing that Microsoft calls OLAP; they extend OLAP to mean the cube structures built using their product, OLAP Services). Here, we will assume that any system of read-only, historical, aggregated data is an OLAP system.

OLTP system is basically application orientation (eg, purchase order it is functionality of an application)
Where as in DWH concern is subject orient (subject in the sense custorer, product, item, time)
  
OLTP
·     Application Oriented
·     Used to run business
·     Detailed data
·     Current up to date
·     Isolated Data
·     Repetitive access
·     Clerical User
·     Performance Sensitive
·     Few Records accessed at a time (tens)
·     Read/Update Access
·     No data redundancy
·     Database Size 100MB-100 GB
DWH
·     Subject Oriented
·     Used to analyze business
·     Summarized and refined
·     Snapshot data
·     Integrated Data
·     Ad-hoc access
·     Knowledge User
·     Performance relaxed
·     Large volumes accessed at a time(millions)
·     Mostly Read (Batch Update)
·     Redundancy present
·     Database Size 100 GB - few terabytes


A data warehouse(or mart) is way of storing data for later retrieval. This retrieval is almost always used to support decision-making in the organization. That is why many data warehouses are considered to be DSS (Decision-Support Systems).

Both a data warehouse and a data mart are storage mechanisms for read-only, historical, aggregated data.

By read-only, we mean that the person looking at the data won’t be changing it. If a user wants at the sales yesterday for a certain product, they should not have the ability to change that number.

The “historical” part may just be a few minutes old, but usually it is at least a day old.A data warehouse usually holds data that goes back a certain period in time, such as five years. In contrast, standard OLTP systems usually only hold data as long as it is “current” or active. An order table, for example, may move orders to an archive table once they have been completed, shipped, and received by the customer.When we say that data warehouses and data marts hold aggregated data, we need to stress that there are many levels of aggregation in a typical data warehouse.
               
5 . If data source is in the form of Excel Spread sheet then how do use?

        PowerMart and PowerCenter treat a Microsoft Excel source as a relational database,  not a  flat file. Like relational sources,  
         the Designer uses ODBC to import a Microsoft Excel  source. You do not need database permissions to import Microsoft
         Excel sources.
          To import an Excel source definition, you need to complete the following tasks:
  • Install the Microsoft Excel ODBC driver on your system.
  • Create a Microsoft Excel ODBC data source for each source file in the ODBC 32-bit Administrator.
  • Prepare Microsoft Excel spreadsheets by defining ranges and formatting columns of numeric data.
  • Import the source definitions in the Designer.
  Once you define ranges and format cells, you can import the ranges in the Designer. Ranges display as source definitions   
  when you import the source.

5 . Which db is RDBMS and which is MDDB can u name them?

       MDDB ex. Oracle Express Server(OES), Essbase by Hyperion Software, Powerplay by Cognos and
         RDBMS ex. Oracle , SQL Server …etc.

6. What is difference between view and materialized view?
Views contains query whenever execute views it has read from base table
Where as M views loading or replicated takes place only once which gives you better query performance

Refresh m views 1.on commit and 2. on demand
(Complete, never, fast, force)

7. What is bitmap index why it’s used for DWH?
a bitmap for each key value replaces a list of rowids. Bitmap index more efficient for data warehousing because low cardinality, low updates, very efficient for where class

8. What is star schema? And what is snowflake schema?
The center of the star consists of a large fact table and the points of the star are the dimension tables.

Snowflake schemas normalized dimension tables to eliminate redundancy. That is, the
Dimension data has been grouped into multiple tables instead of one large table.

Star schema contains demoralized dimension tables and fact table, each primary key values in dimension table associated with foreign key of fact tables.
Here a fact table contains all business measures (normally numeric data) and foreign key values, and dimension tables has details about the subject area.

Snowflake schema basically a normalized dimension tables to reduce redundancy in the dimension tables

9 .Why need staging area database for DWH?
Staging area needs to clean operational data before loading into data warehouse.
Cleaning in the sense your merging data which comes from different source

10. What are the steps to create a database in manually?
create os service and create init file and start data base no mount stage then give create data base command.

11 .Why need data warehouse?

A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context.

A process of transforming data into information and making it available to users in a timely enough manner to make a difference Information

Technique for assembling and managing data from various sources for the purpose of answering business questions. Thus making decisions that were not previous possible


12. What is difference between data mart and data warehouse?

A data mart designed for a particular line of business, such as sales, marketing, or finance.

Where as data warehouse is enterprise-wide/organizational

The data flow of data warehouse depending on the approach

13 What is the significance of surrogate key?
Surrogate key used in slowly changing dimension table to track old and new values and it’s derived from primary key.

14 .What is slowly changing dimension. What kind of scd used in your project?
Dimension attribute values may change constantly over the time. (Say for example customer dimension has customer_id,name, and address) customer address may change over time.
How will you handle this situation?
There are 3 types, one is we can overwrite the existing record, second one is create additional new record at the time of change with the new attribute values.
Third one is create new field to keep new values in the original dimension table.

15 .What is difference between primary key and unique key constraints?
Primary key maintains uniqueness and not null values
Where as unique constrains maintain unique values and null values

16 .What are the types of index? And is the type of index used in your project?
Bitmap index, B-tree index, Function based index, reverse key and composite index.
We used Bitmap index in our project for better performance.

17. A table have 3 partitions but I want to update in 3rd partitions how will you do?

Specify partition name in the update statement. Say for example
Update employee partition(name) a, set a.empno=10 where ename=’Ashok’

18 .Write a query to find out 5th max salary? In Oracle, DB2, SQL Server

Select (list the columns you want) from (select salary from employee order by salary)
Where rownum<5

19 .When you give an update statement how undo/rollback segment will work/what are the steps?

Oracle keep old values in undo segment and new values in redo entries. When you say rollback it replace old values from undo segment. When you say commit erase the undo segment values and keep new vales in permanent


20.When you give an update statement how memory flow will happen and how oracles allocate memory for that?

Oracle first checks in Shared sql area whether same Sql statement is available if it is there it uses. Otherwise allocate memory in shared sql area and then create run time memory in Private sql area to create parse tree and execution plan. Once it completed stored in the shared sql area wherein previously allocated memory


                                                                                    



No comments:

Post a Comment