Teradata Material

Thursday, 26 June 2014

Identity Column in teradata

IDENTITY column can be set on Teradata table. It can be done while creating a table and you can not alter once you created.
You can not say that the numbers generated by IDENTITY will always be in sequence. This is because the identity column values are generated on an amp-local basis
Syntax:


Column_name INTEGER GENERATED ALWAYS AS IDENTITY
(START WITH value1
INCREMENT BY value2
MINVALUE value3
MAXVALUE value4
NO CYCLE)

Example:


CREATE TABLE tbl_emp
(id INTEGER GENERATED ALWAYS AS IDENTITY
(START WITH 1
INCREMENT BY 1
MINVALUE 0
MAXVALUE 1000000
NO CYCLE),
Name VARCHAR(20),
Phone VARCHAR(10));

Sparse join index

SPARSE join index is being created in Teradata as a result of filtering the rows while joining with the multiple tables.
They are a type of Join Index which contains a WHERE clause that reduces the number of rows which would otherwise be included in the index.
All types of join indexes, including single table, multitable, simple or aggregate can be sparse.

Example code:


CREATE JOIN INDEX ind_employee AS
SELECT  A.DEPTno  ,  b.sal
FROM 
dept  A
INNER JOIN
 employee B
ON  A.deptno=B.dept
WHERE A.deptno=20   --- sparse index
PRIMARY INDEX (sal);

Note:
Join Index never allows a Unique Index to be created.

Explanation:
--> Sal is explicitly mentioned as NUPI
--> A filter clause is applied on Dept.DeptNO = 20, which reduces the number of rows ( an Ideal case of Sparse Index )

how To find join indexes that are defined on your table

SELECT * FROM dbc.indices
WHERE indextype='J'
AND databasename='your_database_name'
AND tablename='your_table_name'
ORDER BY indexname,columnposition;

Query to find Skew factor of particular table

SELECT
TABLENAME,
SUM(CURRENTPERM) /(1024*1024) AS CURRENTPERM,
(100 - (AVG(CURRENTPERM)/MAX(CURRENTPERM)*100)) AS SKEWFACTOR
FROM
DBC.TABLESIZE
WHERE DATABASENAME= <DATABASENAME>
AND
TABLENAME =<TABLENAME>
GROUP BY 1;

Monday, 10 March 2014

What are the different return codes(severity errors) in Teradata utilities?

Ans:

There are 3 basic return codes (severity errors) in teradata utilities.

0 - success

4 - Warning

8 - User error

12 - System error
16 - system error

Please note that apart from this there are separate error codes for each of the error returned from sql queries.

What is explain in teradata?

The EXPLAIN facility is a teradata extension that provides you with an "ENGLISH" translation of the

steps chosen by the optimizer to execute an SQL statement. It may be used on any valid teradata database with a preface called "EXPLAIN".

The following is an example:-

EXPLAIN select last_name first_name FROM employees;

The EXPLAIN parses the SQL statement but does not execute it.
This provides the designer with an "execution strategy".
The execution strategy provides what an optimizer does but not why it chooses them.
The EXPLAIN facility is used to analyze all joins and complex queries.

How to make sure Optimizer chooses NUSI over Full Table Scan (FTS)?

Answer:

A optimizer would prefer FTS over NUSI, when there are no Statistics defined on NUSI columns.
It would prefer scanning for full table instead of going for Subtables in NUSI since optimized does not have information about subtables of NUSI

It is always suggested to collect statistics whenever NUSI columns are defined on the table.

Verify whether index is being used by checking in Explain plan.

When should the statistics be collected?

Ans:

Here are some excellent guidelines on when to collect statistics:

· All Non-Unique indices

· Non-index join columns

· The Primary Index of small tables

· Primary Index of a Join Index

· Secondary Indices defined on any join index

· Join index columns that frequently appear on any additional join index columns that frequently appear in WHERE search conditions

· Columns that frequently appear in WHERE search conditions or in the WHERE clause of joins.

How to find out list of indexes in Teradata?

IndexType	Description
P	Nonpartitioned Primary
Q	Partitioned Primary
S	Secondary
J	join index
N	hash index
K	primary key
U	unique constraint
V	value ordered secondary
H	hash ordered ALL covering secondary
O	valued ordered ALL covering secondary
I	ordering column of a composite secondary index
M	Multi column statistics
D	Derived column partition statistics
1	field1 column of a join or hash index
2	field2 column of a join or hash index

SELECT

databasename, tablename, columnname, indextype, indexnumber, indexname

FROM

dbc.indices

ORDER BY

databasename, tablename, indexnumber;

How does indexing improve query performance?

Answers:

Indexing is a way to physically reorganize the records to enable some frequently used queries to run faster.

The index can be used as a pointer to the large table. It helps to locate the required row quickly and then return it back to the user.

The frequently used queries need not hit a large table for data. They can get what they want from the index itself. - cover queries.

Index comes with the overhead of maintenance. Teradata maintains its index by itself. Each time an insert/update/delete is done on the table the indexes will also need to be updated and maintained.

Indexes cannot be accessed directly by users. Only the optimizer has access to the index.

What is the difference between MultiLoad & Fastload interns of Performance?

Answers:

If you want to load, empty table then you use the fastload, so it will very useful than the MultiLoad ,because fastload performs the loading of the data in 2phase and its no need a work table for loading the data .

So it is faster as well as it follows the below steps to load the data in the table

Phase1 - It moves all the records to the entire AMP first without any hashing

Phase2 - After giving end loading command, Amp will hashes the record and send it to the appropriate AMPS.

MultiLoad:

It does the loading in the 5 phases

Phase1 - It will get the import file and checks the script

Phase2 - It reads the record from the base table and store in the work table

Phase3 - In this acquisition phase it locks the table header

Phase4 - In the DML operation will done in the tables

Phase5 - In this table locks will be released and work tables will be dropped.

What is a clique?

A clique is a set of Teradata nodes that share a common set of disk arrays which are connected in daisy chain network to each disk array controller.

Cliques provide data accessibility if a node fails for any reason, Proms are distributed across all nodes in the system. Large multiple node systems will have clique mechanisms associated with them

How Teradata makes sure that there are no duplicate rows being inserted when it’s a SET table?

Answers:

Teradata will redirect the new inserted row as per its PI to the target AMP (on the basis of its row hash value), and if it find same row hash value in that AMP (hash synonyms) then it start comparing the whole row, and find out if duplicate.

If it’s a duplicate it silently skips it without throwing any error.

How to view every column and the columns contained in indexes in Teradata?

Answers:

Following query describes each column in the Teradata RDBMS

SELECT * FROM DBC.TVFields;

Following query describes columns contained in indexes in the Teradata RDBMS

SELECT * FROM DBC.Indexes;

How to select first N Records in Teradata?

Answers:

To select N records in Teradata you can use RANK function. Query syntax would be as follows

SELECT BOOK_NAME, BOOK_COUNT, RANK(BOOK_COUNT) A FROM LIBRARY QUALIFY A <= 10;

Where does TD store transient journal?

Ans:

In perm space -> dbc.transientjournal

But that special table can grow over dbc's perm limit until the whole system runs out of perm space.

What are the advantages and dis-advantages of secondary Indexes?

Answer:

Advantages:

1. A secondary index might be created and dropped dynamically

2.A table may have up to 32 secondary indexes.

3. Secondary index can be created on any column. .Either Unique or Non-Unique

4. It is used as alternate path or Least frequently used cases. ex. defining SI on non indexed column can improve the performance, if it is used in join or filter condition of a given query.

5. Collecting Statistics on SI columns make sure Optimizer choses SI if it is better than doing Full Table Scans

Disadvantages

1. Since Sub tables are to be created, there is always an overhead for additional spaces.

2. They require additional I/Os to maintain their sub tables.

3. The Optimizer may, or may not, use a NUSI, depending on its selectivity.

4. If the base table is Fallback, the secondary index sub table is Fallback as well.

5. If statistics are not collected accordingly, then the optimizer would go for Full Table Scan.

What are the difference types of temporary tables in Teradata?

Answers:

a. Global temporary tables

b. Volatile temporary tables

c. Derived tables

Global Temporary tables (GTT) –

1. When they are created, its definition goes into Data Dictionary.

2. When materialized data goes in temp space.

3. That's why, data is active up to the session ends, and definition will remain there up-to its not dropped using Drop table statement. If dropped from some other session then its should be Drop table all;

4. You can collect stats on GTT.

5. Defined with the CREATE GLOBAL TEMPORARY TABLE sql

Volatile Temporary tables (VTT) -

1. Local to a session (deleted automatically when the session terminates)

2. Table Definition is stored in System cache .A permanent table definition is stored in the DBC data dictionary database (DBC.Temptables) .

3. Data is stored in spool space.

4. That’s why; data and table definition both are active only up to session ends.

5. No collect stats for VTT.If you are using volatile table, you can not put the default values on column level (while creating table)

6. Created by the CREATE VOLATILE TABLE sql statement

Derived tables

1 Derived tables are local to an SQL query.

2 Not included in the DBC data dictionary database, the definition is kept in cache.

3 They are specified on a query level with an AS keyword in an sql statement

How do you find out number of AMP's in the Given system

Answer

1.running following query in queryman

Select HASHAMP () +1;

2. We can find out complete configuration details of nodes and amps in configuration screen of Performance monitor

How many error tables are there in fload and Mload and what is their significance/use?

Answers:

Fload uses 2 error tables

ET TABLE 1: where format of data is not correct.

ET TABLE 2: violations of UPI

It maintains only error field name, error code and data parcel only.

Mload also uses 2 error tables (ET and UV), 1 work table and 1 log table

1. ET TABLE - Data error

MultiLoad uses the ET table, also called the Acquisition Phase error table, to store data errors found during the acquisition phase of a MultiLoad import task.

2. UV TABLE - UPI violations

MultiLoad uses the UV table, also called the Application Phase error table, to store data errors found during the application phase of a MultiLoad import or delete task

Apart from error tables, it also has work and log tables

3. WORK TABLE - WT

Mload loads the selected records in the work table

4. LOG TABLE

A log table maintains record of all checkpoints related to the load job, it is essential/mandatory to specify a log table in mload job. This table will be useful in case you have a job abort or restart due to any reason.

Pages

Thursday, 26 June 2014

Monday, 10 March 2014