Teradata Material: February 2014

Thursday, 27 February 2014

FALLBACK IN TERADATA

The ability of the Teradata RDBMS to maintain an extra copy of every row of a table in different AMPs. During normal processing, reads are performed using only the primary copy, while updates are performed on both copies. If a failure renders the primary copy inaccessible, the fallback copy is used for reads and updates. The database maintains a log of changes missed by inaccessible copies and automatically applies the missed changes to synchronize the table copy when it becomes accessible.

FALLBACK tables use twice as much disk space as NON-FALLBACK rows.
Generally, fallback is usually only used on critical tables since it will double your disk usage.

During table creation or after a table is created, we may specify whether or not the system should keep a Fallback copy of the table.

Examples:


CREATE MULTISET TABLE employee, fallback,
datablocksize=32 kbytes,freespace= 10 percent,checksum=none
(empno integer,
salary integer);


CREATE SET TABLE deptartment,No fallback,
No Before Journal, No After  Journal,
Checksum =default
(
employee_number integer,
deptno integer
);

Removing Duplicates in TERADATA

Removing Duplicates

In case of Complete Row Duplicate

Take a new table with SET option
Load the duplicate data, so unique rows loaded
Drop old table and Rename new table name to old table name

In case of same column are duplicates

Take a new Table
Load the Distinct rows or Group by Queried rows into it, so unique rows loaded.
Drop old table and Rename new table name to old table name

Note: If u want to eliminate complete row duplicates then Take SET Table

Differences b/w Primary Key and Primary Index

One must not get confused between Primary Key and Primary Index in Teradata. Primary KEY is more of a logical thing however Primary INDEX is more of physical thing. In Teradata, Primary INDEX is used for finding best access path for data retrieval and data insertion and Primary KEY is used for finding each rows uniquely just like in other RDBMS. So below are few differences between PRIMARY KEY and PRIMARY INDEX:

	PRIMARY KEY	PRIMARY INDEX
1	PRIMARY KEY cannot be NULL	PRIMARY INDEX can be NULL
2	PRIMARY KEY is not mandatory in Teradata	PRIMARY INDEX is mandatory In Teradata
3	PRIMARY KEY does not help in data distribution.	PRIMARY INDEX helps in data distribution.
4	PRIMARY KEY should be unique.	PRIMARY INDEX can be UNIQUE (Unique Primary Index) or NON UNIQUE (Primary Index).
5	PRIMARY KEY is logical implementation.	PRIMARY INDEX is physical implementation.

- See more at: http://usefulfreetips.com/Teradata-SQL-Tutorial/difference-between-teradata-primary-index-and-primary-key/#sthash.52AFHXNf.dpuf

Teradata Interview Questions2

Q1. Explain Teradata Architecture ?
Ans1. Teradata Architecture majorly consists of Parsing Engine (PE) , BYNET, Access Module Processors (AMP) and VDISK. MORE.

Q2. What is Primary Index in Teradata and how it is different from Primary Key ?
Ans2. Primary Index is used for determining the access path to retrieve or store data into any table in Teradata. MORE

Q3. How Teradata distributes rows in AMP’s ?
Ans3. Teradata uses ROWHASH on PRIMARY INDEX column to determine which AMP should store the data. MORE

Q4. How to determine the size of table in Teradata?
Ans4. In Teradata, we can use DBC.TABLESIZE view to check the storage space used by any Table. MORE

Q5. How to check SQL history in Teradata?
Ans5. In Teradata we can use DBC.QRYLOG to check SQL history. MORE

Q6. How to find the number of AMPs in any system?
Ans6. SELECT HASHAMP()+1 ; [ We have to do +1 because the HASHAMP number starts with 0]

Q7. Which one is better IN or BETWEEN?
Ans7. If we have to search for range of values, it is always advisable to use BETWEEN rather than list of values in IN clause. BETWEEN tells the optimizer that it is range of values we are going to handle rather than individual values. Hence, BETWEEN is much faster than IN.

Q8. What are disadvantages of Secondary Index?
Ans8. Secondary Indexes need more I/O. Secondary Index slows down the process because of Sub-table maintainence. Secondary Index requires more Physical Storage.

Q9. What is default session transaction mode in Teradata?
Ans9. Teradata has two session transaction mode : BTET(Default) and ANSI. To change session mode we use following command before establishing session: .SET SESSION TRANSACTION (ANSI/BTET)

Q10. What is MSR (Multi Statement Request) in Teradata?
Ans10. Sometimes we may have to insert data into table by treating several INSERT blocks as one transaction. In that case, we use MSR. It can be implemented by putting a semi colon “;” in front of next line rather than putting it on end of existing line. eg:
INSERT INTO EMPLOYEES(NAME,AGE) VALUES(‘WILL’,26)
;INSERT INTO EMPLOYEES(NAME,AGE) VALUES(‘SMITH’,28);

Q11. Which is better GROUP BY or DISTINCT to remove duplicates present in a table?
Ans11. It depends on the tablesize. If the records are more than half a million then GROUP BY is much better than DISTINCT. However if the records are very less than DISTINCT performance is better than GROUP BY.

Q12.How can you apply access lock on table rather than read lock?
Ans12. We can override default read lock on any table by explicitly mentioning the ACCESS LOCK before using the table. Syntax is:
LOCK TABLE EMPLOYEES FOR ACCESS
SELECT * FROM EMPLOYEES;

Q13. Different method of releasing the FASTLOAD lock on the table ?
Ans13. Execute the FASTLOAD script by removing the DML block , i.e only BEGIN LOADING and END LOADING in the FASTLOAD script should be kept. Another method can be to drop the table and create it again.

Q14. What does SLEEP command do ?
Ans14. SLEEP command is used to specify the waiting time Teradata will take before re-trying for establishing the connection or session.

Q15. What does Tenacity command do ?
Ans15. Tencacity command is used to specify the total time Teradata will continue trying to establish the connection. After which Teradata won’t make any more attempts for establishing the connection.

Q16. How to skip 5 rows while reading data from file using Teradata Utilities?
Ans16. In BTEQ, we can use SKIP command and other utilities we can use RECORD command to specify from which record, Teradata should start reading data from file.

Q17. How many Join Strategies are available in Teradata?
Ans17. There are five Join Strategies available in Teradata : Merge, Nested, Hash, Product, Exclusion.

Q18. Different levels of COnfidence Optimizer may have while creating EXPLAIN Plan?
Ans18. HIGH CONFIDENCE, LOW CONFIDENCE, NO CONFIDENCE.

Q19. Can we load duplicate data using FASTLOAD?
Ans19. No, we cannot load duplicate data via FASTLOAD. FASTLOAD property is to discard the duplicate records. When we restart the FASTLOAD, it send the data again from last checkpoint. In that case, it may send some data again. So FASTLOAD identify such records as duplicate and hence discard it. Q20. Can we have several Primary Index on a table?
Ans20. No we cannot have multiple Primary Index on a table. However we can create 32 Secondary Indexes on a table.

21. How to save a string with single quote (‘) as part of the value?
Ans21. INSERT INTO CUSTOMER(COMMENTS) VALUES(‘I”m Good’);

Q22. How to select 5th row from a table?
Ans22. SELECT * FROM TABLENAME QUALIFY ROW_NUMBER() OVER (ORDER BY COLUMN1) = 5;

Q23. Where is password stored in Teradata?
Ans23. User can view the password in DBC.DBASE view however it will be in encrypted format.

Q24. How to find Teradata Release and Version details?
Ans24. SELECT * FROM DBC.DBCINFO;

Q25. How many phases are in MultiLoad?
Ans25. MultiLoad has 5 phases : Preliminary Phase, DML Transaction Phase, Acquisition Phase, Application Phase and Clean-Up phase.

Teradata SQL Short Cuts

Short Cuts

1. F2 : It will open QUERY BUILDER, with SYNTAX for all SQL Queries.

2. F5 : Execute SQL Query

3. F6 : Explain Plan for SQL Query

4. F9 : Execute SQL Queries in Parallel F10 : Abort SQL Query

5. F11 : Display Last Error encountered.

6. Ctrl + N : New SQL Query Window

7. Ctrl + Q : FORMAT SQL Query

8. Ctrl + U : Convert to UPPERCASE

9. Ctrl + H : Find & Replace - See more at: http://usefulfreetips.com/Teradata-SQL-Tutorial/teradata-sql-assistant-tips-settings-for-better-work/#sthash.hY8KdUQ2.dpuf

Views in TERADATA

Create Views in Teradata

Views in Teradata can be easily created or modified. Views are very useful and are used for several purposes. A VIEW is a virtual table and don’t need any physical space to save data. Only VIEW definition is stored in DATA DICTIONARY and every time VIEW is queried ; the SQL statement is executed and result is returned back to the user. VIEWS in Teradata can be created by using following syntax:

CREATE VIEWS IN TERADATA

CREATE VIEW VIEW_NAME (COL1,COL2,COL3) AS SELECT COLUMN1, COLUMN2, COLUMN3 FROM BASE_TABLE;

REPLACE/UPDATE/MODIFY VIEWS IN TERADATA

REPLACE VIEW VIEW_NAME (COL1,COL2,COL3,COL4) AS SELECT COLUMN1, COLUMN2, COLUMN3,COLUMN4 FROM BASE_TABLE;

**Notice there is no column selected as PRIMARY INDEX in VIEW Definition.

ADVANTAGES OF VIEWS IN TERADATA

Adds extra level of security by avoiding access to base tables.
Helps in giving more proper column names than the original names present in base tables.
Allows to limit access to the records present in the base table by the use of WHERE clause.
Allows to limit access to the columns present in the base table by SELECTing only required columns.
Combine the result from multiple tables and provide single place to access the records.
You can use the VIEW to insert or update the data present in the base table.

VIEWS WITH CHECK OPTION

If you are creating a VIEW with CHECK option, then it will not only user to insert or update data in the base table if it violates the data constraints applied in the VIEW definition. Let see the example below:

CREATE VIEW EMP AS SELECT EMP_ID, EMP_NAME, EMP_SAL FROM EMPLOYEE WHERE EMP_SAL>50000 WITH CHECK OPTION;
Now if you try to insert into the base table via this view any data which has EMP_SAL<50000, it will throw the following error:

****Failure 3564 Range Constraint: Check error in field …

- See more at: http://usefulfreetips.com/Teradata-SQL-Tutorial/create-views-in-teradata/#sthash.4RzLJUzk.dpuf

Collect Stats

Collect Statistics

COLLECT STATISTICS is a Teradata sql command that scans columns and indexes of a table and records demographics of the data.

COLLECT STATISITICS is used to provide the Teradata Optimizer with as much information on data as possible. The Optimizer uses this information to determine how many rows exist and which rows qualify for given values.

Collecting statistics can improve the execution of a sql. The optimizer can have more details about each column or index, and therefore determine a better join plan for resolving the query.

COLLECT STATISTICS COMMANDS SHOULD ONLY BE USED WHEN A SPECIFIC SQL (OR GROUP OF SQLS) IS BENEFITTED.

Because, Collecting statistics may not improve a sql’s efficiency. Therefore, the processing time used to collect the statistics was wasted, and this could be a large amount of processing time.

If a sql’s performance is shown to improve from a specific collected statistic, then the statistic should be initially collected and then refreshed on a regular basis.

To possibly avoid spoolspace errors in the application design phase, some common problem columns are listed below in "Collect Statistics - Problem Columns".

With COLLECT STATISTICS:

- COLLECT STATISTICS can make the usage of these more efficient:

Primary indexes

Secondary indexes

Join Columns or base tables

Join Indexes

Qualifying columns

-Statistics are especially informative if index values are distributed unevenly.

- When a query uses conditionals based on non-unique index values, then Teradata uses statistics to determine whether indexing or a full search of all table rows is more efficient.

- If Teradata determines that indexing is the best method, then it uses the statistics to determine whether spooling or building a bitmap would be the most efficient method of qualifying the data rows.

Without COLLECT STATISTICS:

- The Optimizer assumes:

Non-unique indexes are highly non-unique. (Lots of rows per value).

Non-Index columns are even more non-unique than non-unique indexes. (Lots of rows per value)

- Teradata derives row counts from a random AMP sample for:

Small tables (less than 1000 rows per amp),Unevenly distributed tables (skewed row distribution due to PI).

Random amp sample:

Look at data from 1 amp of table, and from this, estimate the total rows in the table.

Random amp samples may not represent the true total number of rows in the table because the rows in the table may not be distributed evenly. This occurs often with small tables. Asof 9/2000, per table, the random amp sample uses the same amp for each sql or query.

Performance Tuning Tips

SQL and Indexes :

1) Primary indexes: Use primary indexes for joins whenever possible, and specify in the where clause all the columns for the primary indexes.

2) Secondary indexes (10% rule rumor): The optimizer does not actually use a 10% rule to determine if a secondary index will be used. But, this is a good estimation: If less than 10% of a table will be accessed if the secondary index is used, then assume the sql will use the secondary index. Otherwise, the sql executionwill do a full table scan.

The optimizer actually uses a “least cost” method: The optimizer determines if the cost of using a secondary index is cheaper than the cost of doing a full table scan. The cost involves the cpu usage, and diskio counts.

3) Constants: Use constants to specify index column contents whenever possible, instead of specifying the constant once, and joining the tables. This may provide a small savings on performance.

4) Mathematical operations: Mathematical operations are faster than string operations (i.e. concatenation), if both can achieve the same result.

5) Variable length columns: The use of variable length columns should be minimized, and should be by exception. Fixed length columns should always be used to define tables.

6) Union: The “union” command can be used to break up a large sql process or statement into several smaller sql processes or statements, which would run in parallel. But these could then cause spoolspace limit problems. “Union all” executes the sql’s single threaded.

7) Where in/where not in (subquery): The sql “where in” is more efficient than the sql “where not in”. It is more efficient to specify constants in these, but if a subquery is specified, then the subquery has a direct impact on the sql time.

If there is a sql time problem with the subquery, then the sql subquery could be separated from the originalquery. This would require 2 sql statements, and an intermediate table. The 2 sql statements would be: 1) New sql statement, which does the previous subquery function, and inserts into the temporary table, and 2) Modified original sql statement, which doesn’t have the subquery, and reads the temporary table.

8) Strategic Semicolon: At the end of every sql statement, there is a semicolon. In some cases, the strategic placement of this semicolon can improve the sql time of a group of sql statements. But this will not improve an individual sql statement’s time. These are a couple cases: 1) The group’s sql time could be improved if a group of sql statements share the same tables (or spool files), 2) The group’s sql time could be improved if several sql statements use the same unix input file.

Reducing Large SQL’s :

The following methods can be used to scope down the size of sql’s.

1) Table denormalization: Duplicating data in another table. This provides faster access to the duplicated data, but requires more update time.

2) Table summarization: The data from one/many table(s) is summarized into commonly used summarytables. This provides faster access to the summarized data, but requires more update time.

3) SQL union: The DBC/SQL Union can be used to break up a large SQL process or statement into several smaller SQL processes or statements, which would run in parallel.

4) Unix split: A large input unix files could be split into several smaller unix files, which could then be input in series, or in parallel, to create smaller SQL processing steps.

5) Unix concatenation: A large query could be broken up into smaller independent queries, whose output is written to several smaller unix files. Then these smaller files are unix concatenated together to provide a single unix file.

6) Trigger tables: A group of tables, each contains a subset of the keys of the index of an original table. thetables could be created based on some value in the index of the original table. This provides an ability to break up a large SQL statement into multiple smaller SQL statements, but creating the trigger tables requires more update time.

7) Sorts (order by): Although sorts take time, these are always done at the end of the query, and the sort time is directly dependent on the size of the solution. Unnecessary sorts could be eliminated.

8) Export/Load: Table data could be exported (Bteq, Fastexport) to a unix file, and updated, and then reloaded into the table (Bteq, fastload, Multiload).

9) C PROGRAM/UNIX SCRIPTS: Some data manipulation is very difficult and time consuming in sql. These could be replaced with c programs/unix scripts. See the “C/Embedded sql” tip.

Reducing Table Update Time :

1) Table update time can be improved by dropping the table’s indexes first, and then doing the updates. After the completion of the updates, then rebuild the indexes, and recollect the table’s statistics on the indexes. The best improvement is obtained when the volume of table updates is large in relation to the size of the table. If more then 5% of a large table is changed.

2) Try to avoid dropping a table, instead, delete the table. Table related statements (i.e. create table, drop table) are single threaded thru a system permissions table and become a bottleneck. They can also cause deadlocks on the dictionary tables. Also, any user permissions specific to the table are dropped when the table is dropped, and these permissions must be recreated.

Collect Statistics Syntax in Teradata

The following are the Collect Statistics Syntaxes in Teradata.

“COLLECT STATISTICS ON tablename COLUMN columnname;” will collect statistics on a column.

“COLLECT STATISTICS ON tablename INDEX (columnname)” will collect statistics on an index.

“COLLECT STATISTICS ON tablename INDEX (col1, col2, ...)” will collect statistics on multiple columns of an index.

“HELP STATISTICS tablename;” will display the number of distinct values of the columns.

“COLLECT STATISTICS tablename;” refreshes (recollects) the table statistics.

“DROP STATISTICS ON tablename ... ;" will drop the statistics.

Alter Table command

Teradata ALTER Table function

We can alter the column without dropping the table by using the following syntax:

Syntax for adding new column name to table

ALTER TABLE <tablename> ADD <new_col> <data_type>;

Syntax to modify a column datatype length

ALTER TABLE <Table_Name>

ADD <Existing_Column_Name> <New_Data_Type(New_Length)>

Here are the results while applying above syntax for changing column's data type:

3558: Cannot alter the specified attribute(s) for f_name.

VARCHAR ( from VARCHAR(20) TO VARCHAR(22)) - successfully executed

INTEGER ( FROM INTEGER TO DECIMAL(15,0)) - gives syntax error - 3558

CHAR ( FROM CHAR (5) TO CHAR (7)) - syntax error - 3558

TIMESTAMP ( FROM TIMESTAMP(0) TO TIMESTAMP(1) )- syntax error - 3558

DECIMAL (FROM DECIMAL(15,0) TO DECIMAL(15,1) )- syntax error - 3558

It proves that these procedure is limited to VARCHAR.