Thursday 6 March 2014

Can we define PARTITION PRIMARY INDEX (PPI) on a Primary Index column in Table? Explain Why?


Ans:

A PPI cannot be defined on the Primary Index column alone. Since the PI is used to locate the AMP and store the data based on the hash value (ROW ID), the same column cannot by itself serve as the PPI column.
With a PPI, the rows stored on the AMPs are partitioned by the PPI column after they are row-hashed (ROW KEY = ROW ID + PPI value).

P.S.: If you want to create a UPI on the table, then the PPI column can be added as part of the PI.
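
A minimal sketch of a table with a PPI on a non-PI column (table and column names are hypothetical). The PI here is non-unique; to make it a UPI, Sale_Date would have to be included in the PI, as noted above:

CREATE TABLE Sales_Tbl
( Sale_Id    INTEGER,
  Sale_Date  DATE,
  Amount     DECIMAL(10,2)
)
PRIMARY INDEX (Sale_Id)
PARTITION BY RANGE_N (Sale_Date BETWEEN DATE '2014-01-01'
                      AND DATE '2014-12-31' EACH INTERVAL '1' MONTH);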

How do you execute the given SQL statement repeatedly in BTEQ?



Answer:

SELECT TOP 1 * FROM database.table1;
=n
Here "=n" reruns the previous SQL request "n" times.

What are the benefits of Permanent Journal?



The benefits of the Permanent Journal are:
  • Permits capture of before images for database rollback.
  • Permits capture of after images for database roll-forward.
  • Permits archiving change images during table maintenance.
  • Reduces the need for full-table backups.
  • Provides a means of recovering NO FALLBACK tables.
Note that a Permanent Journal also requires additional disk space for the change images, and user intervention for archive and recovery activity.
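
A hedged sketch of enabling a permanent journal when creating a database (database name, space size, and journal options are hypothetical); individual tables can then override these defaults in their CREATE TABLE options:

CREATE DATABASE Sales_DB AS
  PERM = 10e9,
  BEFORE JOURNAL,
  DUAL AFTER JOURNAL,
  DEFAULT JOURNAL TABLE = Sales_DB.Sales_Jrnl;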

What are the benefits of fallback?




The benefits of fallback are:
  • Protects your data from hardware (disk) failure.
  • Protects your data from software (node) failure.
  • Automatically recovers with minimum recovery time after repairs or fixes are complete.
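
A minimal sketch of a fallback-protected table (database, table, and column names are hypothetical); fallback can also be added to an existing table with ALTER TABLE:

CREATE TABLE Sales_DB.Order_Tbl, FALLBACK
( Order_Id   INTEGER,
  Order_Date DATE,
  Amount     DECIMAL(10,2)
)
PRIMARY INDEX (Order_Id);

ALTER TABLE Sales_DB.Other_Tbl, FALLBACK;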

How do you set default date setting in BTEQ?



Ans:

There are two default date (DATEFORM) settings in BTEQ. They have to be set after logging on to the session:

SET SESSION DATEFORM = ANSIDATE;     /* format is yyyy-mm-dd */
SET SESSION DATEFORM = INTEGERDATE;  /* format is yy/mm/dd - the Teradata date format */

Fast load

FASTLOAD
  • FastLoad is a batch-mode utility for loading huge volumes of data as fast as possible.
  • Compared with the other load utilities, FastLoad is the fastest.
  • FastLoad loads error rows into error tables.
  • It is fully restartable and checkpoint-configurable.
  • FastLoad does not load duplicate rows (even into a MULTISET table), and the target table must be empty.
  • It runs in two phases and processes the data block by block.

FastLoad requires two error tables to catch any errors that occur during execution.
Error Table 1:
Error Table 1 contains
1.   Constraint violation errors
2.   Data conversion errors
3.   Unavailable AMP errors
Error Table 2:
Error Table 2 contains unique primary index violations: if the table has a UPI and we try to load rows with duplicate index values, those errors are loaded into Error Table 2.
Log Table:
  • FastLoad internally creates a log table to implement its restart capability.
  • If the FastLoad script fails, the failure point is stored in the log table.
  • When the script restarts, it resumes from the last checkpoint recorded in the log table.


Fast Load runs in Two Phases

Phase 1:
• The main objective of this phase is to send the rows of the data file from the host computer to the Teradata AMPs as fast as possible.
• Rows of the data file are packed into 64 KB blocks and sent to the PE.
• The PE parses the FastLoad SQL and sends the plan to each AMP. By default one session is created per AMP, so if your system has 100 AMPs, the FastLoad job opens 100 sessions.
• After the sessions are created, the 64 KB blocks of data are passed to the AMPs via the PE and the BYNET, where the rows are quickly hashed on their PI values.
• Based on the row hash value, each row is then redistributed to its proper AMP.
• At this point every row is on its proper AMP, but the rows are not yet sorted.
• Errors in this phase are loaded into Error Table 1.

Phase 2:
• The main objective of this phase is to store each row in the actual target table.
• Each AMP sorts the rows that were temporarily stored on its disk during Phase 1.
• The sorted rows are then written to the actual target table, where they reside permanently.
• Any errors in this phase are stored in Error Table 2.
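
A hedged sketch of a complete FastLoad script (TDP id, credentials, database, table, file, and column names are all hypothetical):

LOGON tdpid/username,password;
DROP TABLE Sales_DB.Sales_Err1;
DROP TABLE Sales_DB.Sales_Err2;
SET RECORD VARTEXT "|";
DEFINE Sale_Id  (VARCHAR(10)),
       Amount   (VARCHAR(12))
FILE = sales_data.txt;
BEGIN LOADING Sales_DB.Sales_Tbl
      ERRORFILES Sales_DB.Sales_Err1, Sales_DB.Sales_Err2
      CHECKPOINT 100000;
INSERT INTO Sales_DB.Sales_Tbl (Sale_Id, Amount)
VALUES (:Sale_Id, :Amount);
END LOADING;
LOGOFF;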



          

Surrogate key generation in Teradata

Putting together a few steps to generate a surrogate key:
Most of you might have dealt with this already, but this can serve as a quick reference for future use.
Generating a surrogate key as such isn't a big deal; it gets a little tricky when you are trying to insert new values in continuation of an already existing surrogate key.
In the BTEQ script, you may use logic that exports the current maximum value of the surrogate key column into a data file:
.EXPORT DATA FILE <$DIR/filename>
SELECT MAX(SURR_KEY_COLUMN) (TITLE '')
FROM TGTDB.TGTTBL;
.EXPORT RESET
/****Read in MAXID variable*****/
.IMPORT DATA FILE <$DIR/filename>
USING (MAXID INT)
The above steps populate MAXID with the maximum value of the surrogate key (wrap the MAX in COALESCE(MAX(SURR_KEY_COLUMN), 0) if the target table can be empty).
Now, this variable can be used along with the CSUM function to continue from where we stopped :)
INSERT INTO TGTDB.TGTTBL
(SURR_KEY_COLUMN
,COLUMN2
)
SELECT
:MAXID + CSUM(1,1)
,T1.COLUMN2
FROM
STGDB.STGTABLE T1
LEFT OUTER JOIN TGTDB.TGTTBL T2
ON T1.COLUMN2 = T2.COLUMN2
WHERE
T2.COLUMN2 IS NULL
;

The above is a typical example of loading a target table (such as a lookup table) from a staging table, with surrogate keys generated for the new values of COLUMN2.

Macros in Teradata


Macros

Introduction

A frequently used SQL statement or series of statements can be incorporated into a macro and defined using the SQL CREATE MACRO statement. See “CREATE MACRO” in SQL Data Definition Language.
The statements in the macro are performed using the EXECUTE statement. See “EXECUTE (Macro Form)” in SQL Data Manipulation Language.
A macro can include an EXECUTE statement that executes another macro.
For related information about triggers:
• For guidelines for creating triggers, the conditions that cause triggers to fire, the trigger action that occurs when a trigger fires, the trigger action time, and when to use row triggers versus statement triggers, see "CREATE TRIGGER" in SQL Data Definition Language.
• For temporarily disabling triggers, enabling triggers, and changing the creation timestamp of a trigger, see "ALTER TRIGGER" in SQL Data Definition Language.
• For permanently removing triggers from the system, see "DROP TRIGGER" in SQL Data Definition Language.

Definition
A macro consists of one or more statements that can be executed by performing a single statement. Each time the macro is performed, one or more rows of data can be returned.
Performing a macro is similar to performing a multistatement request (see "Multistatement Requests").

Single-User and Multiuser Macros

You can create a macro for your own use, or grant execution authorization to others.
For example, your macro might enable a user in another department to perform operations on the data in Teradata Database. When executing the macro, a user need not be aware of the database being accessed, the tables affected, or even the results.
With the exception of CREATE AUTHORIZATION and REPLACE AUTHORIZATION, a data definition statement is allowed in a macro if it is the only SQL statement in that macro.
A data definition statement is not resolved until the macro is executed, at which time unqualified database object references are fully resolved using the default database of the user submitting the EXECUTE statement. If this is not the desired result, you must fully qualify all object references in a data definition statement in the macro body.
A macro can contain parameters that are substituted with data values each time the macro is executed. It also can include a USING modifier, which allows the parameters to be filled with data from an external source such as a disk file. A COLON character prefixes references to a parameter name in the macro. Parameters cannot be used for data object names.
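
A minimal sketch of a parameterized macro and its execution (macro, table, and column names are hypothetical; the Employee table mirrors the one used in the stored procedure examples below):

CREATE MACRO Get_Dept_Emps (dept INTEGER) AS
( SELECT EmpNo, EmpName
  FROM Employee
  WHERE DeptNo = :dept; );

EXEC Get_Dept_Emps (34);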

Executing a Macro

Regardless of the number of statements in a macro, Teradata Database treats it as a single request.
When you execute a macro, either all its statements are processed successfully or none are processed. If a macro fails, it is aborted, any updates are backed out, and the database is returned to its original state.
Ways to Perform SQL Macros in Embedded SQL
Macros in an embedded SQL program are performed in one of the following ways.
• If the macro is a single statement, and that statement returns no data, use the EXEC statement to specify static execution of the macro, or the PREPARE and EXECUTE statements to specify dynamic execution. Use DESCRIBE to verify that the single statement of the macro is not a data returning statement.
• If the macro consists of multiple statements, or returns data, use a cursor, either static or dynamic. The type of cursor used depends on the specific macro and on the needs of the application.
Static SQL Macro Execution in Embedded SQL
Static SQL macro execution is associated with a macro cursor using the macro form of the DECLARE CURSOR statement.
When you perform a static macro, you must use the EXEC form to distinguish it from the dynamic SQL statement EXECUTE.
Dynamic SQL Macro Execution in Embedded SQL
Define dynamic macro execution using the PREPARE statement with the statement string containing an EXEC macro_name statement rather than a single-statement request.
The dynamic request is then associated with a dynamic cursor. See “DECLARE CURSOR (Macro Form)” in SQL Stored Procedures and Embedded SQL for further information on the use of macros.
Dropping, Replacing, Renaming, and Retrieving Information About a Macro
• To drop a macro, use DROP MACRO.
• To redefine an existing macro, use REPLACE MACRO.
• To rename a macro, use RENAME MACRO.
• To get the attributes for a macro, use HELP MACRO.
• To get the data definition statement most recently used to create, replace, or modify a macro, use SHOW MACRO.
For more information, see SQL Data Definition Language.
Archiving Macros
Macros are archived and restored as part of a database archive and restoration. Individual macros can be archived or restored using the ARCHIVE or RESTORE statements of the ARC utility.
For details, see Teradata Archive/Recovery Utility Reference.

Stored Procedures in Teradata

Stored Procedures

Introduction
Stored procedures are called Persistent Stored Modules in the ANSI SQL:2008 standard. They are written in SQL and consist of a set of control and condition handling statements that make SQL a computationally complete programming language.
These features provide a server-based procedural interface to Teradata Database for application programmers.
Teradata stored procedure facilities are a subset of and conform to the ANSI SQL:2008 standards for semantics.
Elements of Stored Procedures
The set of statements constituting the main tasks of the stored procedure is called the stored procedure body, which can consist of a single statement or a compound statement, or block.
A single statement stored procedure body can contain one control statement, such as LOOP or WHILE, or one SQL DDL, DML, or DCL statement, including dynamic SQL. Some statements are not allowed, including:
• Any declaration (local variable, cursor, condition, or condition handler) statement
• A cursor statement (OPEN, FETCH, or CLOSE)
A compound statement stored procedure body consists of a BEGIN-END statement enclosing a set of declarations and statements, including:
• Local variable declarations
• Cursor declarations
• Condition declarations 
• Condition handler declaration statements 
• Control statements 
• SQL DML, DDL, and DCL statements supported by stored procedures, including dynamic SQL 
• Multistatement requests (including dynamic multistatement requests) delimited by the keywords BEGIN REQUEST and END REQUEST
Compound statements can also be nested.
For information about control statements, parameters, local variables, and labels, see SQL Stored Procedures and Embedded SQL.

Creating Stored Procedures

A stored procedure can be created from:

• the BTEQ utility, using the COMPILE command
• CLIv2 applications, ODBC, JDBC, and Teradata SQL Assistant (formerly called Queryman), using the SQL CREATE PROCEDURE or REPLACE PROCEDURE statement.
The procedures are stored in the user database space as objects and are executed on the server.
For the syntax of data definition statements related to stored procedures, including CREATE PROCEDURE and REPLACE PROCEDURE, see SQL Data Definition Language.
Note: The stored procedure definitions in the following examples are designed only to demonstrate the usage of the feature. They are not recommended for use.
Example: CREATE PROCEDURE
Assume you want to define a stored procedure NewProc to add new employees to the Employee table and retrieve the name of the department to which the employee belongs.
You can also report an error, in case the row that you are trying to insert already exists, and handle that error condition.
The CREATE PROCEDURE statement looks like this:

CREATE PROCEDURE NewProc (IN name CHAR(12), IN number INTEGER, IN dept INTEGER, OUT dname CHAR(10))
BEGIN
  INSERT INTO Employee (EmpName, EmpNo, DeptNo)
  VALUES (name, number, dept);
  SELECT DeptName INTO dname
  FROM Department
  WHERE DeptNo = dept;
END;
This stored procedure defines parameters that must be filled in each time it is called.

Modifying Stored Procedures

To modify a stored procedure definition, use the REPLACE PROCEDURE statement.
Example: REPLACE PROCEDURE
Assume you want to change the previous example to insert salary information to the Employee table for new employees.
The REPLACE PROCEDURE statement looks like this:
REPLACE PROCEDURE NewProc (IN name CHAR(12), IN number INTEGER, IN dept INTEGER, IN salary DECIMAL(10,2), OUT dname CHAR(10))
BEGIN
  INSERT INTO Employee (EmpName, EmpNo, DeptNo, Salary_Amount)
  VALUES (name, number, dept, salary);
  SELECT DeptName INTO dname
  FROM Department
  WHERE DeptNo = dept;
END;

Executing Stored Procedures

If you have sufficient privileges, you can execute a stored procedure from any supporting client utility or interface using the SQL CALL statement. You can also execute a stored procedure from an external stored procedure written in C, C++, or Java.
You have to specify arguments for all the parameters contained in the stored procedure.
Here is an example of a CALL statement to execute the procedure created in “Example: CREATE PROCEDURE”:
CALL NewProc ('Jonathan', 1066, 34, dname);
For details on using CALL to execute stored procedures, see “CALL” in SQL Data Manipulation Language.
For details on executing stored procedures from an external stored procedure, see SQL External Routine Programming.
Output From Stored Procedures
Stored procedures can return output values in the INOUT or OUT arguments of the CALL statement.
Stored procedures can also return result sets, the results of SELECT statements that are executed when the stored procedure opens result set cursors. To return result sets, the CREATE PROCEDURE or REPLACE PROCEDURE statement for the stored procedure must specify the DYNAMIC RESULT SET clause.
For details on how to write a stored procedure that returns result sets, see SQL Stored Procedures and Embedded SQL.
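
A hedged sketch of a procedure that returns a result set through the DYNAMIC RESULT SETS clause (procedure, cursor, table, and column names are hypothetical):

CREATE PROCEDURE GetDeptEmps (IN dept INTEGER)
DYNAMIC RESULT SETS 1
BEGIN
  DECLARE emp_cur CURSOR WITH RETURN ONLY FOR
    SELECT EmpNo, EmpName
    FROM Employee
    WHERE DeptNo = dept;
  OPEN emp_cur;
END;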

Recompiling Stored Procedures

The ALTER PROCEDURE statement enables recompilation of stored procedures without having to execute SHOW PROCEDURE and REPLACE PROCEDURE statements.
This statement provides the following benefits:
• Stored procedures created in earlier releases of Teradata Database can be recompiled to derive the benefits of new features and performance improvements.
• Recompilation is also useful for cross-platform archive and restoration of stored procedures
• ALTER PROCEDURE allows changes in the following compile-time attributes of a stored procedure:
  • SPL option
  • Warnings option
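
For example, a hedged sketch of recompiling the procedure created earlier (the WITH options are optional):

ALTER PROCEDURE NewProc COMPILE;
ALTER PROCEDURE NewProc COMPILE WITH SPL, WARNING;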
Deleting, Renaming, and Retrieving Information About a Stored Procedure
For more information, see SQL Data Definition Language.
Archiving Procedures
Stored procedures are archived and restored as part of a database archive and restoration. Individual stored procedures can be archived or restored using the ARCHIVE or RESTORE statements of the ARC utility.
Related Topics
• To delete a stored procedure from a database, use DROP PROCEDURE.
• To rename a stored procedure, use RENAME PROCEDURE.
• To get information about the parameters specified in a stored procedure and their attributes, use HELP PROCEDURE.
• To get the data definition statement most recently used to create, replace, or modify a stored procedure, use SHOW PROCEDURE.
• For stored procedure control and condition handling statements, see SQL Stored Procedures and Embedded SQL.
• For invoking stored procedures, see the CALL statement in SQL Data Manipulation Language.
• For creating, replacing, dropping, or renaming stored procedures, see SQL Data Definition Language.
• For controlling and tracking privileges for stored procedures, see SQL Data Control Language and Database Administration.

External Stored Procedures

Introduction
External stored procedures are written in the C, C++, or Java programming language, installed on the database, and then executed like stored procedures.
Usage
Here is a synopsis of the steps you take to develop, compile, install, and use external stored procedures:
1. Write, test, and debug the C, C++, or Java code for the procedure.
2. If you are using Java, place the class or classes for the external stored procedure in an archive file (JAR or ZIP) and call the SQLJ.INSTALL_JAR external stored procedure to register the archive file with the database.
3. Use CREATE PROCEDURE or REPLACE PROCEDURE for external stored procedures to create a database object for the external stored procedure.
4. Use GRANT to grant privileges to users who are authorized to use the external stored procedure.
5. Invoke the procedure using the CALL statement.
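
A hedged sketch of step 3 for a C external stored procedure (procedure name, parameters, and source path are hypothetical):

CREATE PROCEDURE Get_Region_Code
  (IN  region_name VARCHAR(30),
   OUT region_code INTEGER)
LANGUAGE C
NO SQL
EXTERNAL NAME 'CS!get_region_code!/home/dev/xsp/get_region_code.c'
PARAMETER STYLE TD_GENERAL;

Once created, it is invoked like any other procedure, for example: CALL Get_Region_Code ('West', region_code);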
User-Defined Functions
Introduction
SQL provides a set of useful functions, but they might not satisfy all of the particular requirements you have to process your data.
User-defined functions (UDFs) allow you to extend SQL by writing your own functions in the C, C++, or Java programming language, installing them on the database, and then using them like standard SQL functions.
You can also install UDF objects or packages from third-party vendors.
UDF Types
Teradata Database supports three types of UDFs: scalar, aggregate, and table functions.
• Scalar functions take input parameters and return a single value result. Examples of standard SQL scalar functions are CHARACTER_LENGTH, POSITION, and TRIM.
• Aggregate functions operate on groups of rows and return a single summary result, like the standard SQL functions SUM, AVG, MIN, and MAX.
• Table functions are invoked in the FROM clause of a SELECT statement and return a table to the statement, one row at a time.

Usage

Here is a synopsis of the steps you take to develop, compile, install, and use a UDF:
1. Write, test, and debug the C, C++, or Java code for the UDF.
2. If you are using Java, place the class or classes for the UDF in an archive file (JAR or ZIP) and call the SQLJ.INSTALL_JAR external stored procedure to register the archive file with the database.
3. Use CREATE FUNCTION or REPLACE FUNCTION to create a database object for the UDF.
4. Use GRANT to grant privileges to users who are authorized to use the UDF.
5. Call the function.
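
A hedged sketch of steps 3 and 5 for a scalar C UDF (function name, signature, source path, and the table it is applied to are hypothetical):

CREATE FUNCTION Add_Tax (amount DECIMAL(10,2))
RETURNS DECIMAL(10,2)
LANGUAGE C
NO SQL
DETERMINISTIC
EXTERNAL NAME 'CS!add_tax!/home/dev/udf/add_tax.c'
PARAMETER STYLE TD_GENERAL;

SELECT Sale_Id, Add_Tax(Amount)
FROM Sales_DB.Sales_Tbl;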

Fact Table


The fact table contains the numerical values of what we measure; it is a measurement of a business process.
Characteristics of a fact table:
1. The fact table contains the keys to the associated dimension tables, which are foreign keys in the fact table.
2. A fact table has fewer columns than a dimension table but a far larger number of rows.
3. A fact table consists of numerical attributes (measures) used for calculations such as sums and other aggregations.
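
A minimal star-schema sketch of a fact table referencing two dimension tables (all names are hypothetical); the *_Key columns join to the dimension tables and the numeric columns are the measures:

CREATE TABLE Sales_Fact
( Date_Key      INTEGER,
  Product_Key   INTEGER,
  Sales_Amount  DECIMAL(12,2),
  Quantity      INTEGER
)
PRIMARY INDEX (Date_Key, Product_Key);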

Operational DataStore(ODS)



ODS stands for Operational Data Store.
An ODS is a database designed to integrate data from multiple sources so that additional operations can be performed on the data, such as cleaning, resolving redundancy, and checking against business rules for integrity.
An operational data store is a real-time, integrated data store used in the organization to manage business processes and their results.
A data warehouse is used for strategic decision making, whereas an ODS is used for day-to-day decision making.


Staging Area in Data warehouse

What is a staging area? Why we need it? What is the purpose of a staging area? 

A staging area is a place where we hold temporary tables on the data warehouse server.
Staging tables are connected to the work area or fact tables.
We basically need a staging area to hold the data for cleansing (removing spaces/nulls, converting to the expected date format, etc.) and merging purposes, before loading the data into the data warehouse/data mart.

What is a Datamart? Why do we need it? Types of Datamarts?

What is a Datamart? Why do we need it?
Definition of Datamart:

A Datamart is a subset of a data warehouse that supports the requirements of a particular department or business function.
A data mart is a specialized, single line-of-business warehouse; it collects data for a single department or a specific group of people.


Below are the reasons to create Data Marts:

1. Provide users access to the data they need to analyze most often
2. Provide data in a common format for a group of users
3. Improve end-user response time
4. Provide appropriately structured data based on the requirements of the end-user access tools
5. Less complex implementation and setup
6. Reduced cost
7. Easier to support users

Types of Datamarts


There are three types of Datamarts mentioned below :-

1. Multidimensional Datamarts
2. Relational Datamarts
3. Networked Datamarts


Multidimensional Datamarts:-

Contain only numeric data
Maintain structure as data enters the framework
Only changed cells are updated, not the others
Contain sparsely populated matrices

Relational Datamarts:-

Contain both numeric and textual data
Support star schemas / snowflake schemas
Maintain structured data
Employ indexes while creating the datamart

Networked Datamarts:-

Data needs to be cleansed and standardized
Distributed data marts can be accessed from a single workstation
A standard metadata directory must exist
  

Data Mining


Data mining is the process of extracting hidden trends from a data warehouse.
Example: an insurance data warehouse can be used to mine data to identify the highest-risk buildings to insure in a certain geographical area.
Data mining also involves analyzing moderate to large amounts of detailed historical data to detect behavioral patterns that are then used to predict future behavior.

Using data mining, an organization can find answers to questions such as: which customers are most likely to leave?

Data warehouse project lifecycle

Data Warehouse lifecycle:

Below is the Data Warehouse project lifecycle:-
• Project Planning
• Business Requirement Definition(SRS)
• Technical Architecture Design
• Dimensional Modeling
• Physical Design (Data Base designing)
• Data Staging Design and Development (ETL)
• Analytic Application Specification Design and Development (OLAP)
• Testing and Production Deployment

• Maintenance and Support

Components of Data Warehouse

What are the components of a Data Warehouse?

Below are the components of a data warehouse:-

Source databases/systems (Example - Legacy, relational, text or external sources)

Data extraction/transformation/load (ETL) tool (Example –Informatica, Datastage)
Data warehouse maintenance and administration tools
Data modeling tool or interface to external data models (Example –Erwin)
Warehouse databases (Example - Teradata, Oracle, and MySQL)

End-user data access and analysis/reporting tools (Example –OBIEE, Business Objects)

What is Data Warehouse

What is Data Warehousing?


Data Warehousing is a process to properly assemble and manage data from various sources in order to answer business questions not previously possible or known. A Data Warehouse has a centrally located logical architecture that minimizes data synchronization and hence provides a single view of the business. A Data Warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.

Wednesday 5 March 2014

BTEQ vs. FastExport and MultiLoad



BTEQ is similar to Teradata FastExport and Teradata MultiLoad in exporting and importing. Yet, BTEQ is not a utility designed for bulk data movement.
Firstly, the Teradata export and load utilities are fully parallel. Therefore, in bulk exporting and loading scenarios, Teradata FastExport and MultiLoad are better performers than BTEQ because they are optimized to apply multiple rows in block-level operations. BTEQ can import and export small volumes of data to and from the Teradata Database, and in this respect it is indeed more efficient than Teradata SQL Assistant. However, for tables that have more than a few thousand rows (it depends), FastExport and MultiLoad are recommended for better efficiency. No wonder, "BTEQ on Unix/MVS (dunno about Windoze) is typically faster than FastExport until volumes get well into the millions, if not tens of millions, of rows."
Secondly, FastExport and MultiLoad have full restart capability. This feature means that if a FastExport or MultiLoad job should be interrupted for some reason, it can be restarted again from the last checkpoint, without having to start the job from the beginning. BTEQ does not support such features.
Thirdly, MultiLoad can load multiple input files concurrently and work on up to five tables at a time, using multiple sessions. MultiLoad places a lock on the destination table(s) to prevent user queries from getting inconsistent results before the data load or update is complete.

In conclusion, for complex bulk-mode routines, FastExport and MultiLoad are preferred utilities; otherwise, BTEQ is a suitable choice.

BTEQ vs. SQL Assistant




BTEQ is similar to Teradata SQL Assistant in submitting queries to the Teradata Database and generating reports. Yet, they are different in the following aspects:
  1. BTEQ connects to the database by means of CLIv2 (Call-Level Interface Version 2), whereas SQL Assistant does so by means of ODBC;
  2. BTEQ works on Unix and Windows as well, whereas SQL Assistant is designed for Windows only;
  3. BTEQ can run in batch mode for data-manipulating and reporting routines, whereas SQL Assistant cannot;
  4. BTEQ is a Teradata-specific application and cannot communicate with any other type of database, whereas SQL Assistant can communicate with any ODBC-compliant database;
  5. BTEQ supports conditional logic (i.e., "IF...THEN..."), whereas SQL Assistant does not (see the sketch after this list);
  6. BTEQ's result sets can be more easily formatted to refined effects than SQL Assistant's, because the latter applies its own formatting to the result sets;
  7. BTEQ's import / export functions are much more versatile, flexible and programmable than those of SQL Assistant;
  8. BTEQ is a command driven utility, whereas SQL Assistant is a GUI;
  9. BTEQ is more oriented to IT professionals like DBAs and programmers, whereas SQL Assistant is more oriented to business users and casual data consumers.
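
To illustrate points 3 and 5, a hedged sketch of a batch-mode BTEQ script with conditional logic (logon string, file, database, and table names are hypothetical):

.LOGON tdpid/username,password
.EXPORT REPORT FILE = daily_report.txt
SELECT Sale_Id, Amount
FROM Sales_DB.Sales_Tbl
WHERE Sale_Date = CURRENT_DATE;
.IF ERRORCODE <> 0 THEN .GOTO FAILED
.EXPORT RESET
.LOGOFF
.QUIT 0
.LABEL FAILED
.EXPORT RESET
.LOGOFF
.QUIT 12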