Hi All ,
I am discussing about fundamentals of Data
warehousing :
Any database which uses ER modelling database
concept is called as OLTP system or Operational Database (OPDB). Here company
business operations are insert/update/delete/select statement. Because of
update and delete statement the database maintains only the current data.
Data Warehouse:
A
Data warehouse is a database concept which maintains current data and
historical data for reporting and analysis.
Analysis:
Report
to report comparison is called as analysis. Analysis leads to planning.
If
the plans are executed properly the company gets good profits.
Warehouse
Rules (For Master data):
Dimensional
modelling is the database theory which maintains current data and historical
data using the following 6 rules.
MASTER
DATA: uniquely identified data in any database is called as master data. Any
table which maintains repeated data is called as TRANSACTION DATA. Master data
is also called as possible changing data. A few
properties of master data can be changed as the time changes. These changes can
be captured in dimensional modelling theory.
RULE#1: Source primary key should not be primary
key in warehouse.
RULE#2: Warehouse should be added with separate
primary key as SID (Surrogate ID).
Surrogate
key is not the actual key, it is the key
used as instead of actual source primary key.
RULE#3: Warehouse should be added with a ‘FLAG’
column to indicate data is current or historical.
RULE#4: The dimensional table should be added with
‘VERSION’ column for versioning.
RULE#5: Target table should be added with ‘START_DATE’
column to indicate when data is loaded.
RULE#6: Target table should be added with ‘END_DATE’ column to
indicate when data is modified.
About ROLAP, MOLAP, DOLAP,OLTP,DWH.. :
1 .What is ROLAP, MOLAP, and DOLAP...?
ROLAP (Relational
OLAP), MOLAP (Multidimensional OLAP), and DOLAP (Desktop
OLAP). In these three OLAP architectures, the interface to the analytic layer
is typically the same; what is quite different is how the data is physically
stored.
In MOLAP, the
premise is that online analytical processing is best implemented by storing the
data multidimensionally; that is, data must be stored multidimensionally in
order to be viewed in a multidimensional manner.
In ROLAP, architects
believe to store the data in the relational model; for instance, OLAP
capabilities are best provided against the relational database.
DOLAP, is a
variation that exists to provide portability for the OLAP user. It creates
multidimensional datasets that can be transferred from server to desktop,
requiring only the DOLAP software to exist on the target system. This provides
significant advantages to portable computer users, such as salespeople who are
frequently on the road and do not have direct access to their office server.
2 .What is an MDDB? and What is the difference between MDDBs and
RDBMSs?
Multidimensional Database There
are two primary technologies that are used for storing the data used in OLAP
applications.
These two
technologies are multidimensional databases (MDDB) and relational databases
(RDBMS). The major difference
between MDDBs and
RDBMSs is in how they store data. Relational databases store
their data in a series of tables and
columns. Multidimensional
databases, on the other hand, store their data in a large multidimensional
arrays.
For example,
in an MDDB world, you might refer to a sales figure as Sales with Date,
Product, and Location coordinates of
12-1-2001, Car,
and south, respectively.
Advantages of
MDDB:
Retrieval is very
fast because
· The
data corresponding to any combination of dimension members can be retrieved
with a single I/O.
· Data
is clustered compactly in a multidimensional array.
· Values
are caluculated ahead of time.
· The
index is small and can therefore usually reside completely in memory.
Storage is very efficient because
· The
blocks contain only data.
· A
single index locates the block corresponding to a combination of sparse
dimension numbers.
3. What is Mapplet and how do u create Mapplet?
A mapplet is a reusable object that represents a set of
transformations. It allows you to reuse transformation logic and can contain as
many transformations as you need. Create a mapplet when you want to use a
standardized set of transformation logic in several mappings. For example,
if you have a several fact tables that require a series of dimension keys, you
can create a mapplet containing a series of Lookup transformations to find each
dimension key. You can then use the mapplet in each fact table mapping, rather
than recreate the same lookup logic in each mapping.
To create a new mapplet:
1. In the Mapplet Designer, choose Mapplets-Create Mapplet.
2. Enter a descriptive mapplet name.
The recommended naming convention for mapplets is mpltMappletName.
3. Click OK.
The Mapping Designer creates a new mapplet in the Mapplet Designer.
4. Choose Repository-Save.
4 . What is the difference between OLTP &
OLAP?
OLTP stand for Online Transaction Processing. This is standard,
normalized database structure. OLTP is designed for Transactions, which means
that inserts, updates, and deletes must be fast. Imagine a call center that
takes orders. Call takers are continually taking calls and entering orders that
may contain numerous items. Each order and each item must be inserted into a
database. Since the performance of database is critical, we want to maximize
the speed of inserts (and updates and deletes). To maximize
performance, we typically try to hold as few records in the database as
possible.
OLAP stands for
Online Analytical Processing. OLAP is a term that means many things to many
people. Here, we will use the term OLAP and Star Schema pretty much
interchangeably. We will assume that star schema database is an OLAP
system.( This is not the same thing that Microsoft calls
OLAP; they extend OLAP to mean the cube structures built using their
product, OLAP Services). Here, we will assume that any system
of read-only, historical, aggregated data is an OLAP system.
OLTP system is basically application orientation
(eg, purchase order it is functionality of an application)
Where as in DWH concern is subject orient
(subject in the sense custorer, product, item, time)
OLTP
· Application Oriented
· Used to run business
· Detailed data
· Current up to date
· Isolated Data
· Repetitive access
· Clerical User
· Performance Sensitive
· Few Records accessed at
a time (tens)
· Read/Update Access
· No data redundancy
· Database Size 100MB-100
GB
DWH
· Subject Oriented
· Used to analyze
business
· Summarized and refined
· Snapshot data
· Integrated Data
· Ad-hoc access
· Knowledge User
· Performance relaxed
· Large volumes accessed
at a time(millions)
· Mostly Read (Batch
Update)
· Redundancy present
· Database Size 100 GB -
few terabytes
A data warehouse(or mart) is way of storing data
for later retrieval. This retrieval is almost always used to support
decision-making in the organization. That is why many data warehouses are
considered to be DSS (Decision-Support Systems).
Both a data warehouse and a data mart are storage
mechanisms for read-only, historical, aggregated data.
By read-only, we mean that the person
looking at the data won’t be changing it. If a user wants at the sales
yesterday for a certain product, they should not have the ability to change
that number.
The “historical” part may just
be a few minutes old, but usually it is at least a day old.A data warehouse
usually holds data that goes back a certain period in time, such as five years.
In contrast, standard OLTP systems usually only hold data as long as it is
“current” or active. An order table, for example, may move orders to an archive
table once they have been completed, shipped, and received by the customer.When
we say that data warehouses and data marts hold aggregated data, we
need to stress that there are many levels of aggregation in a
typical data warehouse.
5 . If data source is in the form of Excel Spread
sheet then how do use?
PowerMart and PowerCenter treat a Microsoft Excel
source as a relational database, not a flat file. Like relational
sources,
the Designer uses ODBC to import a Microsoft Excel source. You do not
need database permissions to import Microsoft
Excel sources.
To
import an Excel source definition, you need to complete the following tasks:
- Install the Microsoft Excel ODBC driver on your system.
- Create a Microsoft Excel ODBC data source for each
source file in the ODBC 32-bit Administrator.
- Prepare Microsoft Excel spreadsheets by defining ranges
and formatting columns of numeric data.
- Import the source definitions in the Designer.
Once you define ranges and format cells,
you can import the ranges in the Designer. Ranges display as source
definitions
when you import the source.
5 . Which db is RDBMS and which is MDDB can u
name them?
MDDB ex. Oracle Express
Server(OES), Essbase by Hyperion Software, Powerplay by Cognos and
RDBMS ex. Oracle
, SQL Server …etc.
6. What is difference between view and
materialized view?
Views contains query whenever execute views it
has read from base table
Where as M views loading or replicated takes
place only once which gives you better query performance
Refresh m views 1.on commit and 2. on demand
(Complete, never, fast, force)
7. What is bitmap index why it’s used for DWH?
a bitmap for each key value replaces a list of
rowids. Bitmap index more efficient for data warehousing because low
cardinality, low updates, very efficient for where class
8. What is star schema? And what is snowflake
schema?
The center of the star consists of a large fact
table and the points of the star are the dimension tables.
Snowflake schemas normalized dimension tables to
eliminate redundancy. That is, the
Dimension data has been grouped into multiple
tables instead of one large table.
Star schema contains demoralized dimension tables
and fact table, each primary key values in dimension table associated with
foreign key of fact tables.
Here a fact table contains all business measures
(normally numeric data) and foreign key values, and dimension tables has
details about the subject area.
Snowflake schema basically a normalized dimension
tables to reduce redundancy in the dimension tables
9 .Why need staging area database for DWH?
Staging area needs to clean operational data
before loading into data warehouse.
Cleaning in the sense your merging data which
comes from different source
10. What are the steps to create a database in
manually?
create os service and create init file and start
data base no mount stage then give create data base command.
11 .Why need data warehouse?
A single, complete and consistent store of data
obtained from a variety of different sources made available to end users in a
what they can understand and use in a business context.
A process of transforming data into information
and making it available to users in a timely enough manner to make a difference
Information
Technique for assembling and managing data from
various sources for the purpose of answering business questions. Thus making
decisions that were not previous possible
12. What is difference between data mart and data
warehouse?
A data mart designed for a particular line of
business, such as sales, marketing, or finance.
Where as data warehouse is enterprise-wide/organizational
The data flow of data warehouse depending on the
approach
13 What is the significance of surrogate key?
Surrogate key used in slowly changing dimension
table to track old and new values and it’s derived from primary key.
14 .What is slowly changing dimension. What kind
of scd used in your project?
Dimension attribute values may change constantly
over the time. (Say for example customer dimension has customer_id,name, and
address) customer address may change over time.
How will you handle this situation?
There are 3 types, one is we can overwrite the
existing record, second one is create additional new record at the time of
change with the new attribute values.
Third one is create new field to keep new values
in the original dimension table.
15 .What is difference between primary key and
unique key constraints?
Primary key maintains uniqueness and not null
values
Where as unique constrains maintain unique values
and null values
16 .What are the types of index? And is the type
of index used in your project?
Bitmap index, B-tree index, Function based index,
reverse key and composite index.
We used Bitmap index in our project for better
performance.
17. A table have 3 partitions but I want to
update in 3rd partitions how will you do?
Specify partition name in the update statement.
Say for example
Update employee partition(name) a, set a.empno=10
where ename=’Ashok’
18 .Write a query to find out 5th max salary? In
Oracle, DB2, SQL Server
Select (list the columns you want) from (select
salary from employee order by salary)
Where rownum<5
19 .When you give an update statement how
undo/rollback segment will work/what are the steps?
Oracle keep old values in undo segment and new
values in redo entries. When you say rollback it replace old values from undo
segment. When you say commit erase the undo segment values and keep new vales
in permanent
20.When you give an update statement how memory
flow will happen and how oracles allocate memory for that?
Oracle first checks in Shared sql area whether
same Sql statement is available if it is there it uses. Otherwise allocate
memory in shared sql area and then create run time memory in Private sql area
to create parse tree and execution plan. Once it completed stored in the shared
sql area wherein previously allocated memory
No comments:
Post a Comment