Thursday, 26 June 2014

Types of Schemas in dimensional modeling:


 Below are the types of schema in dimensional modeling with their advantages and disadvantages:-

Star schema :-
A logical structure that has a fact table containing factual data in the center, surrounded by dimension tables containing reference data, which can be denormalized

Advantage
Improve query performance for often-used data
Easy to define
Reduces number of physical joins

Disadvantage
High overhead of redundant data
Inappropriate when additional data is not accessed often.

Snowflake schema
A variant of the star schema where dimension tables do not contain denormalized data
It allows dimensions to have dimensions

Advantage
Provides best performance when queries involve aggregation

Disadvantage
Maintenance is complicated
Number of tables involve are more
More joins are needed

Starflake schema
A hybrid structure that contains a mixture of star and snowflake schema

What is Datamart ? why we need it? Types of Datamarts


Definition of datamart:–

Datamart is a subset of a Data Warehouse that supports the requirements of a particular department or business function.
Data Mart is specialized single-line of business warehouses. Data Mart collects data for a single department or a specific group of people.


Below are the reasons to create Data Marts :-

1. Provide users access to data the need to analyze most often
2. Provide data in a common format for a group of users
3. Improve end user response time
4. Provide appropriately structured data based on the requirements of end user access
    tools
5.Less complex implementation and setup
6.Reduced cost
7.Easier to support users


Types of Datamarts


There are three types of Datamarts mentioned below :-

1. Multidimensional Datamarts
2. Relational Datamarts
3. Networked Datamarts


Multidimensional Datamarts:-

Contain only numeric data
Maintain structure as data enters the framework
Only changed cells are updated and not the others
Contain sparsely populated matrices

Relational Data Marts :-

Contains numeric and textual both the data
Support star schemas/snow flake schema
Maintain structured data
Employ indexes while creating datamarts

Networked Data Marts:-

Data needs to be cleansed and standardized
Distributed data marts can be accessed from a single workstation
A standard metadata directory must exist in it
 

Data Mining


Data mining is a process of extracting hidden trends within a datawarehouse.
Example –An insurance datawarehouse can be used to mine data to identify the most high risk buildings to insure in a certain geographical area .
Data Mining also involves analyzing moderate to large amounts of detailed historical data to detect behavioral patterns that are then used to predict future behavior.
Using Data Mining, organization can able to find the answer of the questions like-Which customers are most likely to leave?

Advantages of Data Ware Housing


Below are the advantages of data warehousing:-
Increase customer profitability
Increase customer profitability
Cost effective decision making
Manage customer and business partner relationships
Manage risk, assets and liabilities
Integrate inventory, operations and manufacturing
Improving turnaround time for data access and reporting to locate, access, and analyze information ,for example –To link multiple locations and geographies
Identify developing trends and reduce time to market
Strategic advantage over competitors

Data Warehouse lifecycle


Below is the Data Warehouse project lifecycle:-
• Project Planning
• Business Requirement Definition(SRS)
• Technical Architecture Design
• Dimensional Modeling
• Physical Design (Data Base designing)
• Data Staging Design and Development (ETL)
• Analytic Application Specification Design and Development (OLAP)
• Testing and Production Deployment
• Maintenance and Support

Components of datawarehouse?

Below are the components of a data warehouse:-

Source databases/systems (Example - Legacy, relational, text or external sources)
Data extraction/transformation/load (ETL) tool (Example –Informatica, Datastage)
Data warehouse maintenance and administration tools
Data modeling tool or interface to external data models (Example –Erwin)
Warehouse databases (Example-Teradata, Oracle, and MySql)
End-user data access and analysis/reporting tools (Example –OBIEE, Business Objects)

What is datawarehousing ?

Data Warehousing is a process to properly assemble and manage data from various sources to answer business questions not previously possible or known. A Data Warehouse has a centrally located logical architecture which minimizes data synchronization hence provides a single view of the business. A Data Warehousing is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.

Global Temporary tables in teradata

About GLOBAL TEMPORARY: 

You are working in a Teradata database application. 
You have to CREATE a table and insert some records for performance analysis. 
The table has several columns and you have to DELETE the content once you have done with your job. 
But you don't want to drop the table since it is being used daily for the analysis. 

What can be done for this? 

Teradata gives GLOBAL TEMPORARY option with CREATE TABLE syntax. 

What special ? 

If you create a table with GLOBAL TEMPORARY option, the life of the CONTENTS of the table will be only for the current session. 
They will be automatically deleted by Teradata manager once session expires. (Only the content not definition) 

Syntax: 
  1. CREATE GLOBAL TEMPORARY TABLE table_1
  2. (
  3. column1 datatype,
  4. column2 datatype,
  5. .
  6. .
  7. columnN datatype
  8. );



  1. CREATE GLOBAL TEMPORARY TABLE employee
  2. (
  3. ID INT,
  4. Name VARCHAR(20)
  5. );


Interesting about GLOBAL TEMPORARY: 

Now, you are creating a GLOBAL TEMPORARY table and inserting some records successfully into it. 
SELECT * from the GLOBAL TEMPORARY table. Shocked ? ... 
Yes. You cannot see the records because Teradata becomes strict to GLOBAL TEMPORARY tables by incorporating ON COMMIT DELETE ROWS by default to add more meaning to TEMPORARY. 

so, if you want to see the records, you have to explicitly write ON COMMIT PRESERVE ROWS immediately after CREATE TABLE syntax. 

  1. CREATE GLOBAL TEMPORARY TABLE table_1
  2. (
  3. column1 datatype,
  4. column2 datatype,
  5. .
  6. .
  7. columnN datatype
  8. ) ON COMMIT PRESERVE ROWS;



More about GLOBAL TEMPORARY: 

1. You can have 2000 volatile tables in an active session. 
2. REFERENTIAL constraints are not allowed. 



Here GLOBAL TEMPORARY says, DO and GO.. i will PRESERVE and take care... 

..

Volatile table in teradata

You are working in a Teradata database application. You have to CREATE a table and insert some records for performance analysis. You have to DROP the table once you have done with your job. If this is the case, will you think a smart way for this?

Yeah. Teradata gives VOLATILE option with CREATE TABLE syntax. 

What special ?

If you create table with VOLATILE option, the life of the table will be only for the current session.
It will be automatically dropped by Teradata manager once session expires. (Both definition and content)

Syntax:
  1. /* Volatile Table Syntax - Teradata */
  2. CREATE VOLATILE TABLE table_1
  3. (
  4. column1 datatype,
  5. column2 datatype,
  6. .
  7. .
  8. columnN datatype
  9. );



  1. CREATE VOLATILE TABLE employee
  2. (
  3. ID INT,
  4. Name VARCHAR(20)
  5. );


Interesting about VOLATILE:

Now, you are creating a VOLATILE table and inserting some records successfully into it.
SELECT * from the volatile table. Shocked ? ...
Yes. You cannot see the records because Teradata becomes strict to VOLATILE tables by incorporating ON COMMIT DELETE ROWS by default to add more meaning to VOLATILE.

so, if you want to see the records you have to explicitly write ON COMMIT PRESERVE ROWS immediately after CREATE TABLE syntax.

  1. CREATE VOLATILE TABLE table_1
  2. (
  3. column1 datatype,
  4. column2 datatype,
  5. .
  6. .
  7. columnN datatype
  8. ) ON COMMIT PRESERVE ROWS;



More about VOLATILE:

1. You can have 1000 volatile tables in an active session.
2. CHECK and REFERENTIAL constraints are not allowed.
3. DEFAULT clause not allowed.


Here VOLATILE says, DO and GO.. i will take care...

COALESCE

COALESCE is used to check if the argument is NULL, if it is NULL then it takes the default value.

It will check for NOT NULL values sequentially in the list and it will return the first NOT NULL value.

Syntax:
  1. COALESCE(Argument list,['default value'])

Argument list - May be columns of a table or expressions

COALESCE returns NULL if all its arguments evaluate to null. 
Otherwise, it returns the value of the first non-null argument in the given expression list.
Each COALESCE function must have at least two operands. Either two columns or one column with one default value.

It is possible to have multiple columns under coalesce like below:
  1. COALESCE(col1, col2, col3, 0)

The above code says that if col1 is null then it will check col2.
If col2 is null then it will check for col3.
If col3 is also null, then 0 will be the output.

If coalesce encounters any of the column having not null value then the column value will be returned by the query. (Left to right priority)

Example : 1
If the country takes NULL value, 'US' will be supplied by Teradata
  1. SELECT COALESCE(Country,'US') as Output

Output
CountryOutput
ItalyItaly
NULLUS

Example: 2
The following example returns the home phone number of the named individual (if present), or office phone if HomePhone is null, or CellPhone if present and both home and office phone values are null. Returns NULL if all three values are null.

  1. SELECT
  2. Name,
  3. COALESCE (HomePhone, OfficePhone, cellPhone) FROM PhoneDirectory;

When a default value is provided, default value will be returned if all the input fields are null
  1. SELECT
  2. Name,
  3. COALESCE (HomePhone, OfficePhone, cellPhone,'No Phone') FROM PhoneDirectory;


Example:3
Whenever you combine COALESCE with non-comparison operators, such as || or * or +, you must enclose the entire COALESCE function and its operands in parentheses.

  1. SELECT (COALESCE(100,0)) * 5;

Identity Column in teradata

IDENTITY column can be set on Teradata table. It can be done while creating a table and you can not alter once you created.
You can not say that the numbers generated by IDENTITY will always be in sequence. This is because the identity column values are generated on an amp-local basis
Syntax:
  1. Column_name INTEGER GENERATED ALWAYS AS IDENTITY
  2. (START WITH value1
  3. INCREMENT BY value2
  4. MINVALUE value3
  5. MAXVALUE value4
  6. NO CYCLE)


Example:
  1. CREATE TABLE tbl_emp
  2. (id INTEGER GENERATED ALWAYS AS IDENTITY
  3. (START WITH 1
  4. INCREMENT BY 1
  5. MINVALUE 0
  6. MAXVALUE 1000000
  7. NO CYCLE),
  8. Name VARCHAR(20),
  9. Phone VARCHAR(10));

Sparse join index

SPARSE join index is being created in Teradata as a result of filtering the rows while joining with the multiple tables.
They are a type of Join Index which contains a WHERE clause that reduces the number of rows which would otherwise be included in the index. 
All types of join indexes, including single table, multitable, simple or aggregate can be sparse. 

Example code:
  1. CREATE JOIN INDEX ind_employee AS
  2. SELECT A.DEPTno , b.sal
  3. FROM
  4. dept A
  5. INNER JOIN
  6. employee B
  7. ON A.deptno=B.dept
  8. WHERE A.deptno=20 --- sparse index
  9. PRIMARY INDEX (sal);


Note:
Join Index never allows a Unique Index to be created.

Explanation:
--> Sal is explicitly mentioned as NUPI
--> A filter clause is applied on Dept.DeptNO = 20, which reduces the number of rows ( an Ideal case of Sparse Index )

how To find join indexes that are defined on your table

  1. SELECT * FROM dbc.indices
  2. WHERE indextype='J'
  3. AND databasename='your_database_name'
  4. AND tablename='your_table_name'
  5. ORDER BY indexname,columnposition;

Query to find Skew factor of particular table

  1. SELECT
  2. TABLENAME,
  3. SUM(CURRENTPERM) /(1024*1024) AS CURRENTPERM,
  4. (100 - (AVG(CURRENTPERM)/MAX(CURRENTPERM)*100)) AS SKEWFACTOR
  5. FROM
  6. DBC.TABLESIZE
  7. WHERE DATABASENAME= <DATABASENAME>
  8. AND
  9. TABLENAME =<TABLENAME>
  10. GROUP BY 1;

Monday, 10 March 2014

What are the different return codes(severity errors) in Teradata utilities?

Ans:

There are 3 basic return codes (severity errors) in teradata utilities.

0 - success
4 - Warning
8 - User error
12 - System error
16 - system error   


Please note that apart from this there are separate error codes for each of the error  returned from sql queries.