Teradata Material

Monday, 10 March 2014

How many error tables are there in FastLoad and MultiLoad, and what is their significance/use?

Answers:

Fload uses 2 error tables.

ET TABLE 1: rows where the format of the data is not correct.

ET TABLE 2: rows that violate the UPI.

Error table 1 holds only the error field name, the error code and the data parcel.

Mload also uses 2 error tables (ET and UV), 1 work table and 1 log table.

1. ET TABLE - Data error

MultiLoad uses the ET table, also called the Acquisition Phase error table, to store data errors found during the acquisition phase of a MultiLoad import task.

2. UV TABLE - UPI violations

MultiLoad uses the UV table, also called the Application Phase error table, to store data errors found during the application phase of a MultiLoad import or delete task.

Apart from the error tables, it also has work and log tables.

3. WORK TABLE - WT

Mload loads the selected records into the work table.

4. LOG TABLE

A log table maintains a record of all checkpoints related to the load job. It is mandatory to specify a log table in an mload job. This table is useful if the job aborts or has to be restarted for any reason.

What are SET tables and MULTISET tables in Teradata? Explain with an appropriate example.

Answers:
1) SET tables do not accept duplicate rows. The check is made on the whole row, not at the index or key level.
Example of rows for a SET table:
c1 c2 c3 c4 .. cn
1 2 3 4 ... 9   (accepted)
1 2 3 4 ... 9   (duplicate is rejected)
2 1 2 4 ... 9
3 2 4 4 ... 9
4 3 4 4 ... 9

2) MULTISET tables accept duplicate rows. Uniqueness at the index or key level, if defined, is still enforced.
Example of rows for a MULTISET table:
c1 c2 c3 c4 .. cn
1 2 3 4 ... 9
1 2 3 4 ... 9   (duplicate is accepted)
2 1 2 4 ... 9
3 2 4 4 ... 9
3 2 4 4 ... 9   (duplicate is accepted)
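
The difference can be sketched in a few lines of Python. This is only a simulation of the duplicate-row rule described above, not Teradata code; the function name insert_row and the list-of-tuples "tables" are made up for illustration.

```python
def insert_row(table, row, kind):
    """Insert a row (tuple) into a table (list of tuples).

    kind is "SET" or "MULTISET". A SET table rejects a row that is
    identical to an existing row in every column; a MULTISET table
    accepts it.
    """
    if kind == "SET" and row in table:
        return False  # full-row duplicate is rejected
    table.append(row)
    return True


rows = [(1, 2, 3, 4), (1, 2, 3, 4), (2, 1, 2, 4)]

set_table = []
multiset_table = []
set_results = [insert_row(set_table, r, "SET") for r in rows]
multiset_results = [insert_row(multiset_table, r, "MULTISET") for r in rows]

print(set_results)       # [True, False, True]
print(multiset_results)  # [True, True, True]
```

The SET table ends up with 2 rows and the MULTISET table with 3, because only the SET table pays the cost of comparing each new row with the existing ones.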

What is residual condition in explain plan ?

Answer

A residual condition is a filter that is applied to rows after they have been located. It helps to reduce the number of rows that qualify, for example the rows taking part in a join, but it does not help in locating a row.

Why are AMPs and PEs called vprocs?

Answer

AMPs and PEs are implemented as “virtual processors - vprocs”.
They run under the control of PDE and their number is software configurable.
AMPs are associated with “virtual disks – vdisks”, which are associated with logical units (LUNs) within a disk array.

Answer 2:
Vprocs are virtual processors. A request travels from the PE to the AMPs over the BYNET (the message passing layer). The data being processed is stored on physical disks, and each AMP works with a virtual disk (vdisk) that is mapped onto those physical disks. Because PEs and AMPs are software processes rather than physical hardware, they are called virtual processors (vprocs).

What is Teradata Virtual storage?

Answer

This concept was introduced in TD12. It does the following tasks:
- maintains information on the frequency of data access
- tracks data storage on the physical media
- migrates frequently used data to fast disks and less frequently used data to slower disks
- allocates cylinders from storage to individual AMPs

Which is faster – MultiLoad delete or Delete command?

Answer

MultiLoad delete is faster than the normal Delete command, since the deletion happens in data blocks of 64 Kbytes, whereas the Delete command deletes data row by row. The transient journal maintains entries only for the Delete command, since the Teradata utilities don't support transient journaling.

What do High confidence, Low confidence and No confidence mean in EXPLAIN plan?

Answer

EXPLAIN gives the execution strategy, that is, the different steps the query will go through.
HIGH CONFIDENCE: Statistics are collected.
LOW CONFIDENCE: Statistics are not collected, but the WHERE condition is on an indexed column, so estimates can be based on sampling.
NO CONFIDENCE: Statistics are not collected and the condition is on a non-indexed column.
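
The three rules above can be written as a small decision function. This is a simplified sketch of the rules exactly as stated in this answer; the real optimizer considers more cases, and the function and parameter names are made up for illustration.

```python
def confidence(stats_collected: bool, condition_on_indexed_column: bool) -> str:
    """Map the two inputs named in the answer to a confidence phrase."""
    if stats_collected:
        return "HIGH CONFIDENCE"
    if condition_on_indexed_column:
        # No statistics, but the index allows an estimate from sampling.
        return "LOW CONFIDENCE"
    return "NO CONFIDENCE"


print(confidence(True, False))   # HIGH CONFIDENCE
print(confidence(False, True))   # LOW CONFIDENCE
print(confidence(False, False))  # NO CONFIDENCE
```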

What is the default join strategy in Teradata?

Answer 1
The Teradata cost-based optimizer decides the join strategy based on the optimum path. The common strategies it chooses from are merge, hash and nested joins.

Answer 2
The optimizer follows three strategies:
1. Duplicate the rows of one table on every AMP.
--> The optimizer opts for this when the table joined on a non-PI column is small.
2. Redistribute the rows of the table joined on a non-PI column to the AMPs containing the matching PI rows.
--> This is opted for when the tables are large. The table whose join column is not its PI is redistributed to the AMPs containing the matching PI rows.
3. Redistribute the rows of both tables by the hash value of the join column.
--> This is chosen when the join is on a column that is not the PI in either table. This requires the most spool space.
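
Strategy 3 can be illustrated with a small Python simulation: both tables are redistributed by a hash of the join column, after which every AMP can join its own rows without talking to the others. This is only a sketch. The modulo "hash", the number of AMPs and the sample tables are assumptions made for the example, not Teradata's actual hashing.

```python
from collections import defaultdict

N_AMPS = 4  # assumed number of AMPs for the sketch


def amp_for(value):
    # Stand-in for the row hash; Teradata uses its own hashing algorithm.
    return value % N_AMPS


def redistribute(rows, key_index):
    """Send each row to the AMP chosen by the hash of its join column."""
    amps = defaultdict(list)
    for row in rows:
        amps[amp_for(row[key_index])].append(row)
    return amps


def amp_local_join(left, left_key, right, right_key):
    """Redistribute both tables, then join AMP by AMP."""
    left_amps = redistribute(left, left_key)
    right_amps = redistribute(right, right_key)
    result = []
    for amp in range(N_AMPS):
        for l_row in left_amps.get(amp, []):
            for r_row in right_amps.get(amp, []):
                if l_row[left_key] == r_row[right_key]:
                    result.append(l_row + r_row)
    return result


orders = [(101, 1), (102, 2), (103, 5), (104, 1)]  # (order_id, cust_id)
customers = [(1, "Ann"), (2, "Bob"), (5, "Eve")]   # (cust_id, name)

joined = amp_local_join(orders, 1, customers, 0)
print(sorted(joined))
```

Matching rows always hash to the same AMP, so no pair is missed even though each AMP only looks at its own slice. The cost is that both tables are copied into spool, which is why this strategy needs the most spool space.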

How many types of Skew exist?

Answer

If the TD resources (CPU, AMP, IO, disk, etc.) are utilized unequally, skew exists. There are 3 major types of skew: CPU skew, AMP/data skew and IO skew.

- Data skew?
The data is not distributed equally across all the AMPs.
- CPU skew?
One query or AMP consumes far more CPU than the others.
- IO skew?
One query or AMP performs far more IO operations than the others.

Can you load multiple data files for same target table using Fastload?

Answer:

Yes, we can load a table from multiple data files in FastLoad.

Before giving the "end loading" statement, the user can define the file path and use the insert SQL for each of the source files, and give the "end loading" statement only at the end.

How many of Codd's rules are satisfied by the Teradata database?

Answers:
12 codd’s rules

Explain about Skew Factor?

Answers:

Skew factor describes how unevenly the data of a table is distributed among the AMPs. For a non-unique PI we generally get duplicate values; the more duplicate values we get, the more rows have the same row hash, and all those rows go to the same AMP. This makes the data distribution unequal: one AMP stores more data and another AMP stores less.
When we access the full table, the AMP holding more data takes longer and keeps the other AMPs waiting, which wastes processing capacity. In this situation (unequal distribution of data) the skew factor is high. For this type of table we should avoid full table scans.
ex:
AMP0: 1000000 rows (10%)
AMP1: 9000000 rows (90%)
In this situation the skew factor is very high: 90%.
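
A minimal Python sketch of the arithmetic behind this example follows. The row counts are chosen to give the 10% / 90% split, and the function name amp_shares is made up for illustration. It measures skew simply as the share of rows on the busiest AMP, as the example does; other definitions of skew factor are also in use.

```python
def amp_shares(rows_per_amp):
    """Return each AMP's share of the table's rows, in percent."""
    total = sum(rows_per_amp.values())
    return {amp: 100.0 * count / total for amp, count in rows_per_amp.items()}


rows_per_amp = {"AMP0": 1_000_000, "AMP1": 9_000_000}

shares = amp_shares(rows_per_amp)
busiest = max(shares, key=shares.get)

print(shares)
print(busiest, shares[busiest])
```

With perfectly even distribution on 2 AMPs each share would be 50%, so the further the busiest AMP is above that, the longer the other AMP sits idle during a full table scan.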

What is the use of compress in Teradata? Explain.

Answers:

Compression is used to minimize the table size. For example, when the table size is increasing enormously, we can use compression to reduce the size of the table.
Conditions:
1. Compression can be declared at the time of table creation.
2. We can compress up to 255 distinct column values (not columns), plus NULL.
3. We can't compress variable-length fields (vartext, varchar, ..).

Explanation for condition 2:
create table tab1 (
:
:
Order_type char(25) compress ('air','sea','road')
:
)
In the above example the order type has 3 possible values, and one of them is selected by the user, so one of these values repeats for every order. For column values like these we can use the compress statement, because they repeat across the entire table. Teradata supports up to 255 such values per column. Generally the NCR people provide ready-made scripts for this type of compression. Each row holds only one of these values for the column, not all 3 (air, sea, road). The compressed value is stored in the table header and is used as the default for that column unless another value is present.
e.g.
The Dept of Taxes in Washington has a database that tracks all people working in Washington. Around 99.9% of the taxpayers would have Washington as the state in their address. Instead of storing “Washington” in millions of records, compress stores the value “Washington” once in the table header. The value is then assumed for a row as a default unless another value exists in the column.
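
The idea of keeping the value once in the header and assuming it as a default can be sketched in Python. This is a toy model of the behaviour described above, not how Teradata lays out rows on disk; the class name CompressedColumn and its methods are made up for illustration.

```python
class CompressedColumn:
    """Toy model of a column with one compressed (default) value."""

    def __init__(self, compressed_value):
        # Kept once, like the value stored in the table header.
        self.compressed_value = compressed_value
        # row_id -> value, only for rows whose value differs from the default.
        self.rows = {}

    def put(self, row_id, value):
        if value != self.compressed_value:
            self.rows[row_id] = value
        else:
            # Nothing is stored in the row for the compressed value.
            self.rows.pop(row_id, None)

    def get(self, row_id):
        # Fall back to the header value when the row stores nothing.
        return self.rows.get(row_id, self.compressed_value)


state = CompressedColumn("Washington")
for row_id in range(999):
    state.put(row_id, "Washington")
state.put(999, "Oregon")

print(len(state.rows))  # 1
print(state.get(5))     # Washington
print(state.get(999))   # Oregon
```

Out of 1000 rows only the one that differs from the compressed value takes up space in the column, which is where the saving comes from.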

What is the process to restart the multiload if it fails?
Answers:

MULTILOAD creates 2 error tables and 1 work table. When MULTILOAD fails, we have to unlock the main table. Here the error tables and work tables are not locked as they are in FLOAD.

To unlock the main table after a failure in the acquisition phase:
RELEASE MLOAD <tablename>;
To release the lock after a failure in the application phase:
RELEASE MLOAD <tablename> IN APPLY;

Can we load a Multi set table using MLOAD?

Answers:

We can load SET and MULTISET tables using Mload. But when loading into a MULTISET table using MLOAD, duplicate rows will not be rejected, so we have to take care of them before loading. In the case of Fload, duplicate rows are automatically rejected even when we are loading into a MULTISET table. FLOAD will not load duplicate rows whether the table is SET or MULTISET.

Can I use “drop” statement in the utility “fload”?

Ans:

Yes, but you have to declare it outside the FLOAD block, which means it should not come between .begin loading and .end loading. FLOAD also supports DELETE, CREATE and DROP statements, which we have to declare outside the FLOAD block. Inside the FLOAD block we can give only INSERT.

Is it possible to have two primary keys in the same table?

Answers:

Primary key
==========
1. A table can have only one primary key
2. A primary key can consist of more than one column (up to 64 columns)
3. Cannot have NULL values (missing values)
4. Must have unique values (no duplicate values)

Foreign key
==========
1. A table may have zero or more foreign keys, up to 32
2. A foreign key can consist of more than one column (up to 64 columns)
3. Can have NULL values
4. Can have duplicate values
5. There should be a corresponding primary key in the parent table to enforce referential integrity for a foreign key

What are the types of tables in Teradata ?

Answers:

1. set table
2. multiset table
3. derived table
4. global temporary table(temporary table)
5. volatile table

Join strategies?

There are 2 tables: table A has 10 million records and table B has 100 million records. We are joining both tables, and the Explain plan shows that TD will take table A and redistribute it.
1. With that plan, is the optimizer doing the correct job or not? Justify.
2. In the same example, suppose the optimizer takes table B (100 million records) and redistributes it instead. Is the optimizer doing its best now? And how do you avoid this situation?

Answers:

Teradata is smart enough to decide when to redistribute and when to copy.
It compares the tables: are they comparable in size, or is one big compared to the other? Based on this simple logic it decides whether to redistribute the smaller table across the AMPs or to copy it.
What I mean is that the small table is copied to all the AMPs in spool space. Remember that joins always take place in the AMPs' spool space. By moving the smaller table, Teradata makes sure that the 100 million row table gets an AMP-local join.
Remember the basic thing: whatever Teradata does, it does with space, performance and efficiency in mind.

My simple formula:
If the table is small, copy it to all the AMPs to have an AMP-local join.
Joins should always be AMP-local. If they cannot be, you have a high chance of running out of spool space.

What is the difference between star schema and FSLDM?

Answers:
FSLDM --> Financial Services Logical Data Model (Teradata FS-LDM 7.0). It was developed by Teradata for the financial sector (specifically for banks). It can be customised based on the user requirement.

Star schema --> It is a relational database schema for representing multidimensional data. The data is stored in a central fact table, with one or more tables holding information on each dimension.
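
The shape of a star schema can be shown with a tiny in-memory example in Python: one central fact table whose rows point at dimension tables by key. All table names, column names and figures here are invented for illustration.

```python
from collections import defaultdict

# Dimension tables: one entry per dimension member, keyed by id.
dim_branch = {10: {"city": "Pune"}, 20: {"city": "Delhi"}}
dim_product = {1: {"name": "Savings"}, 2: {"name": "Loan"}}

# Central fact table: a foreign key to each dimension plus a measure.
fact_txn = [
    {"branch_id": 10, "product_id": 1, "amount": 100},
    {"branch_id": 10, "product_id": 2, "amount": 250},
    {"branch_id": 20, "product_id": 1, "amount": 75},
]


def total_by(fact_rows, dimension, fk, attribute):
    """Sum the measure, grouped by an attribute looked up in a dimension."""
    totals = defaultdict(int)
    for row in fact_rows:
        label = dimension[row[fk]][attribute]
        totals[label] += row["amount"]
    return dict(totals)


by_city = total_by(fact_txn, dim_branch, "branch_id", "city")
by_product = total_by(fact_txn, dim_product, "product_id", "name")

print(by_city)     # {'Pune': 350, 'Delhi': 75}
print(by_product)  # {'Savings': 175, 'Loan': 250}
```

The same fact rows can be summarised along any dimension just by following a different key, which is the point of keeping the measures in the centre and the descriptions in the surrounding tables.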
