Wednesday, 16 July 2014

How to free some space in Teradata Database

We may come across situations in which we encounter errors caused by a shortage of storage space in Teradata. For example:
“NO MORE ROOM IN DATABASE”
or

“NO MORE SPOOL SPACE IN DATABASE”
So what should one do in such cases? We are talking about situations where we genuinely have a storage space shortage in our Teradata environment. Adding more physical storage is not always the best option. So what can we do to free some of the existing occupied space? Below are a few possible methods of releasing occupied space:
  1. Implementing Single-Value or Multi-Value Compression is considered a very good practice: the same data occupies much less space. In some cases it can yield huge space savings (see the sketch after this list).
  2. Dropping unwanted Secondary Indexes removes the corresponding Secondary Index subtables, releasing some much-needed storage space.
  3. Purging some DBC tables, such as the ACCESSLOG or DBQL tables, may free some space. These tables should not hold data indefinitely; a purging strategy should be in place for them.
  4. Try identifying intermediate work tables and deleting them. If work tables are used only for intermediate calculations, implement a purging strategy that removes their contents once the work is done.
  5. Deleting unwanted Journals may also save some space.
  6. If tables are created with the FALLBACK option and the data is not critical and can be rebuilt in the data warehouse, change them to NO FALLBACK. This can save a huge amount of space, since FALLBACK requires twice the table size.
  7. If you are facing issues due to non-availability of SPOOL space, try optimizing your SQL queries: remove product joins where possible and collect statistics on the columns participating in join conditions (see the sketch after this list). Implementing compression may also help in overcoming spool space errors.
  8. We sometimes create backups of tables before loading, in order to verify the new data against the old backup tables. Once the data is verified and the backup tables are no longer needed, they should be dropped.
  9. If some tables are not used anymore, try archiving them; move such tables to other, inexpensive disks, thereby saving space on the actual disk.
  10. If nothing works for you, consult your DBA about increasing the space allocated to the database.
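As a quick illustration of methods 1 and 7, here is a minimal sketch; the table and column names (sales_txn, store_id, country) are hypothetical, but the COMPRESS and COLLECT STATISTICS syntax is standard Teradata:

CREATE TABLE sales_txn
(
  txn_id   INTEGER NOT NULL,
  store_id INTEGER,
  country  VARCHAR(20) COMPRESS ('INDIA', 'USA', 'UK'), -- multi-value compression of frequent values
  status   CHAR(1) COMPRESS ('A'),                      -- single-value compression
  txn_amt  DECIMAL(12,2)
) PRIMARY INDEX (txn_id);

-- Statistics on join columns help the optimizer pick plans that spool less
COLLECT STATISTICS ON sales_txn COLUMN (store_id);

Compressed values are stored once in the table header rather than in every row, which is where the space saving comes from.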
Which method do you generally prefer for freeing up space in a Teradata database, and why? If I missed any method here, feel free to add it by leaving a comment.

Thursday, 10 July 2014

Real-time issue in Teradata

This error happens when you try to update a single target row with multiple rows from the source:
“Target row updated by multiple source rows”
If you have defined primary index fields for your target and use those fields in the update condition, this error is an obvious candidate.

For example,

UPDATE tbl_emp
FROM (SELECT id, name
      FROM tbl_dept) DEPT
SET name = DEPT.name
WHERE tbl_emp.id = DEPT.id;


In the above code, if tbl_dept has non-unique values for id, more than one source row matches the same target row and the error appears.
Solution:
You can add DISTINCT to the subquery so that duplicate source rows are eliminated.

UPDATE tbl_emp
FROM (SELECT DISTINCT id, name
      FROM tbl_dept) DEPT
SET name = DEPT.name
WHERE tbl_emp.id = DEPT.id;
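Note that DISTINCT helps only when the duplicate source rows are exact duplicates; if the same id appears with different names, the error persists. As an alternative, here is a hedged sketch using Teradata's MERGE statement, forcing one row per id (MAX is an arbitrary tie-breaker; the table names are from the example above):

MERGE INTO tbl_emp
USING (SELECT id, MAX(name) AS name
       FROM tbl_dept
       GROUP BY id) DEPT
ON tbl_emp.id = DEPT.id
WHEN MATCHED THEN
UPDATE SET name = DEPT.name;

GROUP BY id guarantees a single source row per target row, so the update can never be ambiguous.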

Thursday, 26 June 2014

Operational Data Store (ODS)


ODS stands for Operational Data Store.
An ODS is a database designed to integrate data from multiple sources for additional operations on the data, such as cleaning, resolving redundancy, and checking against business rules for integrity.
An operational data store is a real-time, integrated data store used in an organization to manage business processes and their results.
A Data Warehouse is used for strategic decision making, whereas an ODS is used for day-to-day decision making.

Fact tables

Fact - A fact table typically has two types of columns: foreign keys to dimension tables, and measures, which contain numeric facts. A fact table can contain fact data at a detail or aggregated level.
Types of Facts -
Additive:
Additive facts are facts that can be summed up across all of the dimensions in the fact table. A sales fact is a good example of an additive fact.
Semi-Additive:
Semi-additive facts are facts that can be summed up across some of the dimensions in the fact table, but not the others.
Eg: A daily balances fact can be summed up across the customer dimension but not across the time dimension (see the sketch below).
Non-Additive:
Non-additive facts are facts that cannot be summed up across any of the dimensions present in the fact table.
Eg: Facts that hold percentages or ratios.
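A minimal sketch of the semi-additive case, assuming a hypothetical daily_balance fact with customer_id, balance_date, and balance_amt columns:

-- Valid: balances are additive across customers within a single day
SELECT balance_date, SUM(balance_amt) AS total_balance
FROM daily_balance
GROUP BY balance_date;

-- Summing across days double-counts; take the balance on the latest day instead
SELECT customer_id, balance_amt
FROM daily_balance
WHERE balance_date = (SELECT MAX(balance_date) FROM daily_balance);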

Factless Fact Table:
In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called “Factless Fact tables”.
Eg: A fact table that has only a product key and a date key is a factless fact table. There are no measures in this table, but you can still get the number of products sold over a period of time (see the sketch below).
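A minimal sketch of querying such a factless fact table; the names (product_sold_fact, date_dim) are assumptions:

-- Counting rows is the typical use of a factless fact table
SELECT d.calendar_month, COUNT(*) AS products_sold
FROM product_sold_fact f
JOIN date_dim d
  ON f.date_key = d.date_key
GROUP BY d.calendar_month;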
Based on the above classifications, fact tables are categorized into two:
Cumulative:
This type of fact table describes what has happened over a period of time. For example, it may describe the total sales by product, by store, by day. The facts in this type of table are mostly additive; the sales fact above is a cumulative example.
Snapshot:
This type of fact table describes the state of things at a particular instant of time, and usually includes more semi-additive and non-additive facts. The daily balances fact above is a snapshot example.

Dimension tables in a data warehouse

Dimension -
A dimension table typically has two types of columns: a primary key that is referenced by the fact table's foreign keys, and textual/descriptive attributes.
Types of Dimensions -
Slowly Changing Dimensions:
Attributes of a dimension may undergo changes over time. Whether the history of changes to a particular attribute should be preserved in the data warehouse depends on the business requirement. Such an attribute is called a slowly changing attribute, and a dimension containing such an attribute is called a Slowly Changing Dimension.
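One common way (not the only one) of preserving such history is the Type 2 technique: a new row per version of the changing attribute. A minimal sketch, with all table and column names hypothetical:

CREATE TABLE customer_dim
(
  customer_sk  INTEGER NOT NULL, -- surrogate key, one per version
  customer_id  INTEGER,          -- natural/business key
  city         VARCHAR(30),      -- the slowly changing attribute
  eff_start_dt DATE,
  eff_end_dt   DATE,
  current_flag CHAR(1)
) PRIMARY INDEX (customer_sk);

-- When a customer's city changes: expire the current version...
UPDATE customer_dim
SET eff_end_dt = CURRENT_DATE, current_flag = 'N'
WHERE customer_id = 1001 AND current_flag = 'Y';

-- ...then insert the new version with a fresh surrogate key
INSERT INTO customer_dim
VALUES (90210, 1001, 'Pune', CURRENT_DATE, DATE '9999-12-31', 'Y');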

Rapidly Changing Dimensions:
A dimension attribute that changes frequently is a Rapidly Changing Attribute. If you don’t need to track the changes, the Rapidly Changing Attribute is no problem, but if you do need to track the changes, using a standard Slowly Changing Dimension technique can result in a huge inflation of the size of the dimension. One solution is to move the attribute to its own dimension, with a separate foreign key in the fact table. This new dimension is called a Rapidly Changing Dimension.

Junk Dimensions:
A junk dimension is a single table with a combination of different and unrelated attributes to avoid having a large number of foreign keys in the fact table. Junk dimensions are often created to manage the foreign keys created by Rapidly Changing Dimensions.
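A minimal sketch of a junk dimension, with assumed names; each row is one combination of the unrelated low-cardinality flags, and the fact table then carries a single flags_key instead of several flag columns:

CREATE TABLE order_flags_dim
(
  flags_key    INTEGER NOT NULL,
  payment_type VARCHAR(10), -- e.g. 'CASH' or 'CARD'
  is_gift      CHAR(1),     -- 'Y'/'N'
  is_expedited CHAR(1)      -- 'Y'/'N'
) UNIQUE PRIMARY INDEX (flags_key);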

Inferred Dimensions:
While loading fact records, a dimension record may not yet be ready. One solution is to generate a surrogate key with NULL for all the other attributes. This should technically be called an inferred member, but is often called an inferred dimension.

Conformed Dimensions:
A Dimension that is used in multiple locations is called a conformed dimension. A conformed dimension may be used with multiple fact tables in a single database, or across multiple data marts or data warehouses.

Degenerate Dimensions:
A degenerate dimension arises when a dimension attribute is stored as part of the fact table rather than in a separate dimension table. These are essentially dimension keys for which there are no other attributes. In a data warehouse, they are often used as the result of a drill-through query to analyze the source of an aggregated number in a report. You can use these values to trace back to transactions in the OLTP system.

Role Playing Dimensions:
A role-playing dimension is one where the same dimension key — along with its associated attributes — can be joined to more than one foreign key in the fact table. For example, a fact table may include foreign keys for both Ship Date and Delivery Date. But the same date dimension attributes apply to each foreign key, so you can join the same dimension table to both foreign keys. Here the date dimension is taking multiple roles to map ship date as well as delivery date, and hence the name of Role Playing dimension.
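A minimal sketch of the Ship Date / Delivery Date example, assuming an order_fact table and a date_dim keyed by date_key:

-- The same date_dim plays two roles through table aliases
SELECT f.order_id,
       ship_d.calendar_date  AS ship_date,
       deliv_d.calendar_date AS delivery_date
FROM order_fact f
JOIN date_dim ship_d
  ON f.ship_date_key = ship_d.date_key
JOIN date_dim deliv_d
  ON f.delivery_date_key = deliv_d.date_key;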

Shrunken Dimensions:
A shrunken dimension is a subset of another dimension. For example, the Orders fact table may include a foreign key for Product, but the Target fact table may include a foreign key only for ProductCategory, which is in the Product table, but much less granular. Creating a smaller dimension table, with ProductCategory as its primary key, is one way of dealing with this situation of heterogeneous grain. If the Product dimension is snowflaked, there is probably already a separate table for ProductCategory, which can serve as the Shrunken Dimension.

Static Dimensions:
Static dimensions are not extracted from the original data source, but are created within the context of the data warehouse. A static dimension can be loaded manually — for example with Status codes — or it can be generated by a procedure, such as a Date or Time dimension.

What is a staging area? Why do we need it? What is the purpose of a staging area?


A staging area is a place where we hold temporary tables on the data warehouse server.
Staging tables are connected to the work area or fact tables.
We basically need a staging area to hold the data for cleansing (removing spaces/nulls, converting to the expected date format, etc.)
and merging purposes, before loading the data into the data warehouse/data mart.

Types of Schemas in dimensional modeling:


 Below are the types of schema in dimensional modeling with their advantages and disadvantages:-

Star schema :-
A logical structure that has a fact table containing factual data in the center, surrounded by dimension tables containing reference data, which can be denormalized (see the sketch below)

Advantage
Improve query performance for often-used data
Easy to define
Reduces number of physical joins

Disadvantage
High overhead of redundant data
Can be inappropriate when the additional, redundant data is not accessed often.
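A minimal star schema sketch with hypothetical names: one central fact table whose foreign keys point at denormalized dimension tables:

CREATE TABLE product_dim
(
  product_key  INTEGER NOT NULL,
  product_name VARCHAR(50),
  category     VARCHAR(30) -- denormalized into the dimension, not a separate table
) UNIQUE PRIMARY INDEX (product_key);

CREATE TABLE sales_fact
(
  date_key    INTEGER NOT NULL, -- FK to a date dimension
  product_key INTEGER NOT NULL, -- FK to product_dim
  sales_amt   DECIMAL(12,2),
  units_sold  INTEGER
) PRIMARY INDEX (product_key);

In a snowflake schema, category would instead live in its own normalized table referenced by product_dim.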

Snowflake schema
A variant of the star schema where dimension tables do not contain denormalized data
It allows dimensions to have dimensions

Advantage
Provides best performance when queries involve aggregation

Disadvantage
Maintenance is complicated
More tables are involved
More joins are needed

Starflake schema
A hybrid structure that contains a mixture of star and snowflake schemas

What is a Datamart? Why do we need it? Types of Datamarts


Definition of datamart :-

A datamart is a subset of a Data Warehouse that supports the requirements of a particular department or business function.
A datamart is a specialized, single line-of-business warehouse that collects data for a single department or a specific group of users.


Below are the reasons to create Data Marts :-

1. Provide users access to the data they need to analyze most often
2. Provide data in a common format for a group of users
3. Improve end-user response time
4. Provide appropriately structured data based on the requirements of end-user access tools
5. Less complex implementation and setup
6. Reduced cost
7. Easier to support users


Types of Datamarts


There are three types of Datamarts mentioned below :-

1. Multidimensional Datamarts
2. Relational Datamarts
3. Networked Datamarts


Multidimensional Datamarts:-

Contain only numeric data
Maintain structure as data enters the framework
Only changed cells are updated
Contain sparsely populated matrices

Relational Data Marts :-

Contain both numeric and textual data
Support star/snowflake schemas
Maintain structured data
Employ indexes when creating datamarts

Networked Data Marts:-

Data needs to be cleansed and standardized
Distributed data marts can be accessed from a single workstation
A standard metadata directory must exist
 

Data Mining


Data mining is the process of extracting hidden trends from a data warehouse.
Example - An insurance data warehouse can be used to mine data to identify the highest-risk buildings to insure in a certain geographical area.
Data mining also involves analyzing moderate to large amounts of detailed historical data to detect behavioral patterns that are then used to predict future behavior.
Using data mining, an organization can find answers to questions like: Which customers are most likely to leave?

Advantages of Data Ware Housing


Below are the advantages of data warehousing:-
Increase customer profitability
Cost effective decision making
Manage customer and business partner relationships
Manage risk, assets and liabilities
Integrate inventory, operations and manufacturing
Improved turnaround time for data access and reporting: locate, access, and analyze information, for example to link multiple locations and geographies
Identify developing trends and reduce time to market
Strategic advantage over competitors

Data Warehouse lifecycle


Below is the Data Warehouse project lifecycle:-
• Project Planning
• Business Requirement Definition (SRS)
• Technical Architecture Design
• Dimensional Modeling
• Physical Design (Data Base designing)
• Data Staging Design and Development (ETL)
• Analytic Application Specification Design and Development (OLAP)
• Testing and Production Deployment
• Maintenance and Support

Components of a data warehouse

Below are the components of a data warehouse:-

Source databases/systems (Example - legacy, relational, text, or external sources)
Data extraction/transformation/load (ETL) tools (Example - Informatica, DataStage)
Data warehouse maintenance and administration tools
Data modeling tool or interface to external data models (Example - Erwin)
Warehouse databases (Example - Teradata, Oracle, and MySQL)
End-user data access and analysis/reporting tools (Example - OBIEE, Business Objects)

What is data warehousing?

Data warehousing is a process to properly assemble and manage data from various sources to answer business questions not previously possible or known. A data warehouse has a centrally located logical architecture that minimizes data synchronization and hence provides a single view of the business. A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.

Global Temporary tables in Teradata

About GLOBAL TEMPORARY: 

You are working in a Teradata database application.
You have to CREATE a table and insert some records for performance analysis.
The table has several columns, and you have to DELETE the contents once you are done with your job.
But you don't want to drop the table, since it is used daily for the analysis.

What can be done for this? 

Teradata provides the GLOBAL TEMPORARY option with the CREATE TABLE syntax.

What's special?

If you create a table with the GLOBAL TEMPORARY option, the life of the CONTENTS of the table will be only for the current session.
They will be automatically deleted by Teradata once the session expires (only the contents, not the definition).

Syntax: 
CREATE GLOBAL TEMPORARY TABLE table_1
(
  column1 datatype,
  column2 datatype,
  .
  .
  columnN datatype
);



CREATE GLOBAL TEMPORARY TABLE employee
(
  ID INT,
  Name VARCHAR(20)
);


Interesting about GLOBAL TEMPORARY: 

Now, suppose you create a GLOBAL TEMPORARY table and insert some records successfully into it.
SELECT * from the GLOBAL TEMPORARY table. Shocked? ...
Yes. You cannot see the records, because Teradata is strict with GLOBAL TEMPORARY tables: it applies ON COMMIT DELETE ROWS by default, to add more meaning to TEMPORARY.

So, if you want to see the records, you have to explicitly write ON COMMIT PRESERVE ROWS at the end of the CREATE TABLE statement.

CREATE GLOBAL TEMPORARY TABLE table_1
(
  column1 datatype,
  column2 datatype,
  .
  .
  columnN datatype
) ON COMMIT PRESERVE ROWS;



More about GLOBAL TEMPORARY: 

1. You can have up to 2000 materialized global temporary tables in an active session.
2. REFERENTIAL constraints are not allowed. 



Here GLOBAL TEMPORARY says: DO and GO.. I will PRESERVE and take care...


Volatile table in Teradata

You are working in a Teradata database application. You have to CREATE a table and insert some records for performance analysis. You have to DROP the table once you are done with your job. If this is the case, can you think of a smart way to do this?

Yeah, Teradata provides the VOLATILE option with the CREATE TABLE syntax.

What's special?

If you create a table with the VOLATILE option, the life of the table will be only for the current session.
It will be automatically dropped by Teradata once the session expires (both definition and contents).

Syntax:
/* Volatile Table Syntax - Teradata */
CREATE VOLATILE TABLE table_1
(
  column1 datatype,
  column2 datatype,
  .
  .
  columnN datatype
);



CREATE VOLATILE TABLE employee
(
  ID INT,
  Name VARCHAR(20)
);


Interesting about VOLATILE:

Now, suppose you create a VOLATILE table and insert some records successfully into it.
SELECT * from the volatile table. Shocked? ...
Yes. You cannot see the records, because Teradata is strict with VOLATILE tables: it applies ON COMMIT DELETE ROWS by default, to add more meaning to VOLATILE.

So, if you want to see the records, you have to explicitly write ON COMMIT PRESERVE ROWS at the end of the CREATE TABLE statement.

CREATE VOLATILE TABLE table_1
(
  column1 datatype,
  column2 datatype,
  .
  .
  columnN datatype
) ON COMMIT PRESERVE ROWS;



More about VOLATILE:

1. You can have 1000 volatile tables in an active session.
2. CHECK and REFERENTIAL constraints are not allowed.
3. DEFAULT clause not allowed.


Here VOLATILE says: DO and GO.. I will take care...