Monday 10 March 2014

What is the difference between Global temporary tables and Volatile temporary tables?


Answers:

Global Temporary tables (GTT) –
1. When they are created, its definition goes into Data Dictionary.
2. When materialized data goes in temp space.
3. That's why, data is active upto the session ends, and definition will remain there up-to its not dropped using Drop table statement.If dropped from some other session then its should be Drop table all;
4. you can collect stats on GTT.

Volatile Temporary tables (VTT) -
1. Table Definition is stored in System cache
2. Data is stored in spool space.
3. thats why, data and table definition both are active only upto session ends.
4. No collect stats for VTT.If you are using volatile table, you can not put the default values on column level ( while creating table )

Teradata performance tuning and optimization


Answers:

1. collecting statistics
2. Explain Statements
3. Avoid Product Joins when possible
4. select appropriate primary index to avoid skewness in storage
5. Avoid Redistribution when possible
6. Use sub-selects instead of big "IN" lists
7. Use derived tables
8. Use GROUP BY instead of DISTINCT ( GROUP BY sorts the data locally on the VPROC. DISTINCT sorts the data after it is redistributed)
9. Use Compression on large tables

How do you transfer large amount of data in Teradata?

Answers: 

Transferring of large amount of data can be done using various Application Teradata Utilities which resides on the host computer ( Mainframe or Workstation) i.e. BTEQ, FastLaod, MultiLoad, Tpump and FastExport.

* BTEQ (Basic Teradata Query) supports all 4 DMLs: SELECT, INSERT, UPDATE and DELETE. BTEQ also support IMPORT/EXPORT protocols.
* Fastload, MultiLoad and Tpump transfer the data from Host to Teradata.
* FastExport is used to export data from Teradata to the Host.

Explain TPUMP (Teradata Parallel Data Pump) Utility in Teradata?

Ans: 

* TPUMP allows near real time updates from Transactional Systems into the Data Warehouse.
* It can perform Insert, Update and Delete operations or a combination from the same source.
* It can be used as an alternative to MLOAD for low volume batch maintenance of large databases.
* TPUMP allows target tables to have Secondary Indexes, Join Indexes, Hash Indexes, Referential Integrity, Populated or Empty Table, Multiset or Set Table or Triggers defined on the Tables.
* TPUMP can have many sessions as it doesn’t have session limit.
* TPUMP uses row hash locks thus allowing concurrent updates on the same table.

Can you recover the password of a user in Teradata?

Answers:

No, you can’t recover the password of a user in Teradata. Passwords are stored in this data dictionary table (DBC.DBASE) using a one-way encryption method. You can view the encrypted passwords using the following query

SELECT * FROM DBC.DBASE;

What is Reconfiguration Utility in Teradata and What it is used for?

Answers:

* When we feed Primary Index value to Hashing Algorithm then it gives us Row Hash(32 bit number) value which is used to make entries into Hash Maps.
* Hash Maps are the mechansim for determining which AMP will be getting that row.
* Each Hash Map is an array of 65,536 entries and its size is close to 128KB.

When Teradata is installed on a system then there are some scrpits which we need to execute i.e. DIP Scripts. So it creates a Hash Maps of 65,536 entries for the current configuration. But what if you want to add some more AMPs into your system?

Reconfiguration (Reconfig) is a technique for changing the configuration (i.e. changing the number of AMPs in a system) and is controlled by the Reconfiguration Hash Maps. System builds Reconfiguration Hash Maps by reassigning hash map entries to reflect new configuration of system.

Lets understand this concept with the help of an example; suppose you have a 4 AMPs system which holds 65,536 entries. Each AMP is responsible for holding (65,536/4=16,384) 16,384 entries.

Now you have added 2 more AMPs in your current configuration so you need to reconfigure your system. Now each AMP would be responsible for holding (65,536/6=10922) 10,922 entries.

What is the difference between Access Logging and Query Logging in Teradata?

Ans:

1. Access Logging is concerned with security (i.e. who’s is doing what). In access logging you ask the database to log who’s doing what on a given object. The information stored is based on the object not the SQL fired or the user who fired it.
2. Query Logging (DBQL) is used for debugging (i.e. what’s happening around ?). Incase of DBQL database keep on tracking various parameters i.e. the SQLs, Resource, Spool Usage, Steps and other things which help you understand what’s going on, the information is fruitful to debug a problem. Further DBQL is enabled on a User id rather than an object like say Table or so.

What is FILLER command in Teradata?

Ans:

While running Fastload or Multiload if you don’t want to load a particular field from the datafile to the target table then use the FILLER command to achieve this. Syntax for FILLER command would be as following:

.LAYOUT FILE_PRODUCT; /* It is input file layout name */
.FIELD Prod_No * char(11); /* To load data into Prod_No */
.FIELD Prod_Name * char(11); /* To load data into Prod_Name */
.FIELD Location * char(11); /* To load data into Location */
.FILLER Prod_Chars * char(20); /* To skip the value for the next 5 locations */

What are TPUMP Utility Limitations?

Answers:

Following are the limitations of Teradata TPUMP Utility:
* Use of SELECT statement is not allowed.
* Concatenation of Data Files is not supported.
* Exponential & Aggregate Operators are not allowed.
* Arithmatic functions are not supported.

What are the MultiLoad Utility limitations?

Ans:

MultiLoad is a very powerful utility; it has following limitations:

* MultiLoad Utility doesn’t support SELECT statement.
* Concatenation of multiple input data files is not allowed.
* MultiLoad doesn’t support Arithmatic Functions i.e. ABS, LOG etc. in Mload Script.
* MultiLoad doesn’t support Exponentiation and Aggregator Operators i.e. AVG, SUM etc. in Mload Script.
* MultiLoad doesn’t support USIs (Unique Secondary Indexes), Refrential Integrity, Join Indexes, Hash Indexes and Triggers.
* Import task require use of PI (Primary Index).

What are the functions of a Teradata DBA?

Ans:

Following are the different functions which a DBA can perform:
1. User Management – Creation and managing Users, Databases, Roles, Profiles and Accounts.
2. Space Allocation – Assigning Permanent Space, Spool Space and Temporary Space.
3. Access of Database Objects – Granting and Revoking Access Rights on different database objects.
4. Security Control – Handling logon and logoff rules for Users.
5. System Maintenance – Specification of system defaults, restart etc.
6. System Performance – Use of Performance Monitor(PMON), Priority Scheduler and Job Scheduling.
7. Resource Monitoring – Database Query Log(DBQL) and Access Logging.
8. Data Archives, Restores and Recovery – ARC Utility and Permanent Journals.

What are the 5 phases in a MultiLoad Utility?

Answers:

* Preliminary Phase – Basic Setup
* DML Phase – Get DML steps down on AMPs
* Acquisition Phase – Send the input data to the AMPs and sort it
* Application Phase – Apply the input data to the appropriate Target Tables
* End Phase – Basic Cleanup

How to eliminate Product Joins in a Teradata SQL query?

Ans: 

1. Ensure statistics are collected on join columns and this is especially important if the columns you are joining on are not unique.
2. Make sure you are referencing the correct alias.
3. Also, if you have an alias, you must always reference it instead of a fully qualified tablename.
4. Sometimes product joins happen for a good reason. Joining a small table (100 rows) to a large table (1 million rows) a product join does make sense.

How does Hashing happens in Teradata?

Ans: 

* Hashing is the mechanism through which data is distributed and retrieved to/from AMPs.
* Primary Index (PI) value of a row is the input to the Hashing Algorithm.
* Row Hash (32-bit number) value is the output from this Algorithm.
* Table Id + Row Hash is used to locate Cylinder and Data block.
* Same Primary Index value and data type will always produce same hash value.
* Rows with the same hash value will go to the same AMP.

So data distribution depends directly on the Row Hash uniqueness; be careful while Choosing Indexes in Teradata.

Friday 7 March 2014

How to skip the header row in the fastload script?

RECORD 2;   /* this skips first record in the source file */
DEFINE ...

How to identify PPI columns?


Answer:

Select databasename , tablename , columnposition ,columnname from dbc.indices
where indextype ='Q'
order by 1 ,2,3 ;

What are the scenarios in which Full Table Scans occurs?



1. The where clause in SELECT statement does not use either primary index or secondary index
2. SQL Statement which uses a partial value (like or not like), in the WHERE statement.
3. SQL Statement which does not contain where clause.
4. SQL statement using range in where clause. Ex. (col1 > 40 or col1 < =10000)

How do you create materialized view in Teradata?


There is no such thing as a "materialized view" in Teradata. The equivalent in Teradata would be a join index (or aggregate index) with a corresponding view put on top of it. The command to create one is "CREATE JOIN INDEX...(lots of options)".

Join indices are maintained when the data is maintained. They cannot be accessed directly, but are invoked when the optimizer determines it is beneficial. Even though you cannot invoke a join index directly, you create a view that looks like the join index and in that way, it would be similar to having direct access. However, this approach does not guarantee that the join index will be used.

How to find average sal with out using avg function?


How to find average sal with out using avg function?

Answer

Without using "avg" we can find the avg salary by using sum (sal)/count (sal);
sel sum(sal)/count(sal) as avgsal from tablename

How to handle nulls in Teradata & How many columns can be there in a table???


How to handle nulls in Teradata  & How many columns can be there in a table???

Answer

1. Use zeroifnull, nullifzero in select and NULL in insert directly.
2. 256 columns max per table.