Thursday, 27 February 2014

Joins In TERADATA


What is Join? Explian about Inner and Outer Joins?

We can use Join condition whenever we need to retrieve data from two (or) more tables.

Join can be used in either Where or From clause. But Join specification in From clause is recommended. We can also use Having and Where clause for filtering the data.



Joins can be classified into following types :


1.Inner Join (Equi Join & Natural Join)
2.Outer Join (Left Outer Join, Right Outer Join & Full Join)
3.Self Join
4.Cross Join

Inner JoinTo retrieve matched records we can use comparison operators (i.e. =, < >).Whenever we need to get the matched records from two tables,this type of Join is used. 


Inner Join Also call as Natural Join or Equi Join.

Syntax:
Select emp.emp_id, emp. sal, ph.phone_no 
from employee emp Inner Join phone ph
on emp.emp_id = ph. emp_id
where emp.sal > 2000


Outer Join : To retrieve matched and unmatched records from two tables we can use Outer Join. Unmatched rows will be displayed as NULL from both tables.

Outer Join can be classified as: Left Outer Join , Right Outer Join and Full Outer Join.

Left Outer Join: All matched and unmatched records from left table will be displayed and unmatched records will be displayed as NULL from left table.But from right table only matched records will be displayed as it isand unmatched records will not be displayed.

Syntax:

Select emp.emp_id, emp. sal, ph.phone_no 
from employee emp Left Join phone ph
on emp.emp_id = ph.emp_id
where emp.sal > 2000 ;

Right Outer Join: All Records from right table will be displayed either matched or unmatched and unmatched records will be displayed as NULL from right table.But from left table only matched records will be displayedas it is and unmatched records will not be displayed.

Syntax:


Select emp.emp_id, emp. sal, ph.phone_no 

from employee emp Right Join phone ph
on emp.emp_id = ph.emp_id
where emp.sal > 2000 ;

Full Outer Join:  All matched and unmatched records from both left table and right table will be displayed and unmatched records will be displayed as NULL from both the tables.

Cross Join: This Join returns all rows from the left table, each and every row from the left table is in combination with all rows from the right table. Cross join also called  as Cartesian Product Join.

Syntax:
SELECT au.au_fname, au.au_lname, p.pub_name 
FROM authors AS au INNER JOIN publishers AS p 
ON au.city = p.city AND au.state = p.state 
ORDER BY au.au_lname ASC, au.au_fname ASC ;

Self Join: A table can be joined to itself is a self-join. 

For example, we can use a self-join to find out the authors in New Zealand who live in the same ZIP Codearea.


SELECT au.au_fname, au.au_lname, a.au_fname, au2.au_lname FROM authors au INNER Jauthors a 
ON au.zip = a.zip 
WHERE au.city = ' New Zealand' 
ORDER BY au.au_fname ASC, au.au_lname ASC ;

What is Join? Explian about Inner and Outer Joins?

We can use Join condition whenever we need to retrieve data from two (or) more tables.

Join can be used in either Where or From clause. But Join specification in From clause is recommended. We can also use Having and Where clause for filtering the data.



Joins can be classified into following types :


1.Inner Join (Equi Join & Natural Join)
2.Outer Join (Left Outer Join, Right Outer Join & Full Join)
3.Self Join
4.Cross Join

Inner JoinTo retrieve matched records we can use comparison operators (i.e. =, < >).Whenever we need to get the matched records from two tables,this type of Join is used. 


Inner Join Also call as Natural Join or Equi Join.

Syntax:
Select emp.emp_id, emp. sal, ph.phone_no 
from employee emp Inner Join phone ph
on emp.emp_id = ph. emp_id
where emp.sal > 2000


Outer Join : To retrieve matched and unmatched records from two tables we can use Outer Join. Unmatched rows will be displayed as NULL from both tables.

Outer Join can be classified as: Left Outer Join , Right Outer Join and Full Outer Join.

Left Outer Join: All matched and unmatched records from left table will be displayed and unmatched records will be displayed as NULL from left table.But from right table only matched records will be displayed as it isand unmatched records will not be displayed.

Syntax:

Select emp.emp_id, emp. sal, ph.phone_no 
from employee emp Left Join phone ph
on emp.emp_id = ph.emp_id
where emp.sal > 2000 ;

Right Outer Join: All Records from right table will be displayed either matched or unmatched and unmatched records will be displayed as NULL from right table.But from left table only matched records will be displayedas it is and unmatched records will not be displayed.

Syntax:


Select emp.emp_id, emp. sal, ph.phone_no 

from employee emp Right Join phone ph
on emp.emp_id = ph.emp_id
where emp.sal > 2000 ;

Full Outer Join:  All matched and unmatched records from both left table and right table will be displayed and unmatched records will be displayed as NULL from both the tables.

Cross Join: This Join returns all rows from the left table, each and every row from the left table is in combination with all rows from the right table. Cross join also called  as Cartesian Product Join.

Syntax:
SELECT au.au_fname, au.au_lname, p.pub_name 
FROM authors AS au INNER JOIN publishers AS p 
ON au.city = p.city AND au.state = p.state 
ORDER BY au.au_lname ASC, au.au_fname ASC ;

Self Join: A table can be joined to itself is a self-join. 

For example, we can use a self-join to find out the authors in New Zealand who live in the same ZIP Codearea.


SELECT au.au_fname, au.au_lname, a.au_fname, au2.au_lname FROM authors au INNER Jauthors a 
ON au.zip = a.zip 
WHERE au.city = ' New Zealand' 
ORDER BY au.au_fname ASC, au.au_lname ASC ;

Teradata Sql

Introduction to Teradata SQL



SQL – It stands for Structured Query Language. This is the language for accessing and manipulating any Relational Database Management Systems. SQL is a non-procedural and set oriented language, meaning it contains no procedural-type statements. 

Teradata SQL is based on ANSI standards. ANSI stands for American National Standards Institute. Teradata SQL is fully certified at the SQL92 Entry level, with some intermediate, some full and some SQL-99 Core features also implemented. To enhance the capabilities of SQL, Teradata has added extensions that are unique to Teradata. 

SQL Commands- There are three categories of SQL Commands: 

Data Definition Language (DDL) 
Data Manipulation Language (DML) 
Data Control Language (DCL)

Categories of SQL Commands



Data Definition Language (DDL) - this is used for creating various database objects like tables, views, macros, users etc.
The following table summarizes the basic DDL Statements : 

TYPE
PURPOSE
CREATETo define a database object like new table, new macro, trigger, index, stored procedure, view.
DROPTo remove database objects like table, index, macro, stored procedure, view.
ALTERModify table, column, trigger or referential constraints or recompile stored procedure
RENAMERename a table, view, trigger, stored procedure
SETSpecifies time zones and the collation or character set for a session
COLLECT STATISTICSCollect Stats on column/columns or index

Data Manipulation Language (DML) - This is used for the manipulation of data like inserting the records, deleting the records, updating the records etc.

TYPE
PURPOSE
INSERTInserting data into a table
UPDATEModifies data in one or more rows in a table
DeleteRemoving data from table
CommentInsert a text comment for a table
CheckpointCHECKPOINT is a function that writes records to a restart log table that you can use to restart in case of a hardware or software system failure.
DatabaseIndicates a default database
MergeCombines both UPDATE and INSERT in a single SQL statement.

Data Control Language(DCL):
database objects or controlling ownership of those objects by using DCL  commands.

SQL statement
Function
GRANT
To give user privileges.
REVOKE
To remove user privileges.
GIVE
To transfer database ownership.

Teradata Architecture

Teradata Architecture

Teradata Database System are 2 Types       A) SMP(Symmetric Multi-Processing)
                                                                       B)MPP(Massively Parallel Processing)
A) SMP(Symmetric Multi-Processing): Teradata Database System is called SMP System, if it have single NODE that contains multiple CPU Sharing Memory Pool.
Node Architecture:

Node:   
1)      Node is important building block of the Teradata Database System and it is collection of Hardware and Software Components.
2)      A server can also be called as a Node.
3)      Channel Attached System and Network Attached Systems can be connect to a Node.
4)      Channel Driver and Teradata Gateway are the applications which runs under the Operating System as Processors.
5)      Remaining Components runs under PDE.
6)      PE and AMP are virtual Processors, where BYNET is the Communication Layer between PE and AMP.
PDE(Parallel Database Extention):
It Runs Teradata Components Parallel(PE,AMP and ByNet)
TPA(Trusted Parallel Application):
1)      A Database is  Pure Parallel or Trusted Parallel Application, if it runs Under PDE.
2)      Teradata is a Database which running under PDE, so we call Teradata is Trusted Parallel Database.
Note: For Giga Bytes of Data Processing with minimum operations, This ARCHITECTURE is Recommended.
PE(Parsing Engine):
1)      PE Takes SQL Request and Delivers Sql Response.
2)      It manages individual Sessions (up to 120).
3)      Parsing and Optimizing your SQL Requests.
4)      Dispatching the optimized plan to AMPs.
5)      Sending the answer set response back to  requesting Client.
AMP(Access Module Processor):
1)      AMP is the heart of Teradata which does most of the operations for data storage and retrieval.
2)      Each AMP is responsible for managing a portion of MAIN DISK SPACE(Virtual Disks).
3)      This space is not sharable by any other AMPs so we call this Architecture is Shared Nothing Architecture.
4)      Each AMP contain Database Management Sub System, it perform below operations.
i)        Performing DDL
ii)      Performing DML
iii)    Performing Joins
iv)    Performs Applying and Releasing locks
v)      Performing Aggregate Operations
vi)    Performing OLAP Operations
vii)  Performing SORTING Operations
BYNET (Banyan Net Work):
1)      Carries messages between the AMPs and PEs
2)      Provides Point-to-Point and Broadcast communications
3)      Merging answer sets and sends back to the PE and Making Teradata parallelism possible





Tuesday, 25 February 2014

Data Loading from Flat file to Teradata

Direct Data Loading (Flat file to Teradata)
Direct loading of tables from flat file in SQL Assistant(queryman):
This is quite simple process. But I'm writing it down for the newbees :)
Step1: Create the table in database (it can also be an existing table, with data already in it)
Step2: The sequence of columns in table and flat file should match
Step3: Delimiter used in flat file is usually pipe "|"
Step4: Set queryman to run in Import mode (File -> Import Data)
Step5: Execute sql:
       
INSERT INTO DBNAME.TBLNAME
       
VALUES(?,?,?,?,?);
According to the number of columns in the table/file, the number of parameters (?) change.
Step6: The "Import file" dialog box will open. Choose the flat file that contains data to be loaded.
Step7: You are done! :-p

Note: Try not to use this method for large number of records. It will consume a lot of time and resources.
I suggest fastload/Multiload for such situations

How does indexing improve query performance?

How does indexing improve query performance?

Indexing is a way to physically reorganise the records to enable some frequently used queries to run faster.


The index can be used as a pointer to the large table. It helps to locate the required row quickly and then return ot back to the user.
or
The frequesntly used queries need not hit a large table for data. they can get what they want from the index itself. - cover queries.

Index comes with the overhead of maintanance. Teradata maintains its index by itself. Each time an insert/update/delete is done on the table the indexes will also need to be updated and maintained.

Indexes cannot be accessed directly by users. Only the optimizer has access to the index
.

Top vs Sample

TOP VS SAMPLE

There was a question raised in my team's discussion forum as to which one of these is better to use to get a sample of the data? There were different kinds of suggestions from various online forums, but none had the reason behind why one is better than the other.
I read through TD documentation and this is what I put forth to my group:

TOP:
1. The system evaluates the top n operator after all other clauses in the SELECT request have been evaluated. (So, I think it doesn’t help in data processing time. Must only affect the I/O time)
2. It gives the ranked result only when you specify order by clause. So, If you don’t specify order by, results can be any n rows.

SAMPLE:
1. SAMPLE clause comes with a keyword ‘RANDOMIZED ALLOCATION’ that specifies whether retrieved rows are sampled randomly across the amps or proportionate to the number of qualified rows per AMP. Proportional allocation is default.
2. SAMPLE also operates on the evaluated output. Hence, it does not improve query processing time.

Teradata Interview Questions1


Interview Questions



Question:  1 For which two reasons would Partitioned Primary Index (PPI) be considered on a table? (Choose two.) 
A. To eliminate ranges of data values from a query and thereby reduce scan time
B. To reduce response time by placing all data in the same partition on the same AMP
C. To reduce the amount of I/O needed to process queries
D. To reduce the number of columns in a table to those needed by most queries 
Answer: A, C 

Question:  2 How is system availability guaranteed, if a node fails in a MPP environment with multi-node cliques? 
A. The AMPs on the failed node migrate to the other nodes in the clique.
B. The remaining nodes migrate to an alternate clique.
C. The RAID groups owned by the AMPs on the failing nodes are shut down.
 D. The RAID groups owned by the AMPs on the failing nodes migrate to AMPs already running on the remaining nodes in the clique. 
Answer: A 
Question:  3 On the subject of FastLoad, which two statements are correct? (Choose two.) 
A. No journaling
 B. Can maintain NUSIs but not USIs
C. No secondary indexes on the target table
 D. Loads duplicate rows into a multiset table 
Answer: A, C 
Question:  4 Which of the following applies to Non-Partitioned Primary Index (NPPI) tables when determining  the size of a row in Teradata? (Choose two.) 
A. Two bytes for variable length accom modation                                  B. Eight bytes per row for the rowid C. Two bytes per row for row length                                                D. One byte per column for com pression E. Four bytes for the primary index 
Answer: B, C 

Question:  5 Which two statements referring to a dimensional data model are correct? (Choose two.) 
A. It often implements fewer tables.                                          B. It is extremely flexible.                                                     C. It requires significantly less ETL activity.                             D. It reduces the need for summary tables. E. It can be adapted to optimize performance. 
Answer: A, E  
Question:  6 What type of query population is helped by a sparse index? 
A. Queries that often redistribute a large table
B. Queries that repeatedly use the same subset of the table
C. Queries that often return a large percent of the rows of a large table based on the primary key
 D. Queries that often eliminate a large percent of the rows of a large table using a superset of the table
E. Queries that rarely redistribute a large table in com plex join processing 
Answer: B 
Question:  7 When is a fully covering join index advantageous to a query? 
A. When the base table is small enough to fit in cache
B. When there is an unpredictable selection of columns from the base table
C. When relatively few columns are in the result and join or selection criteria includes values other than the base tables' primary index(es)
D. When the result set is produced through either join or selection criteria on the base tables' primary index(es) 
Answer: C 
Question:  8 The allocation of system CPU resources in a mixed workload environment can be improved by  executing which three of the following tasks? (Choose three.) 
A. Use the MODIFY USER statement to change the ACCOUNT for a user
B. Modify the Default Resource Partition to a lower weight than other Resource Partitions
C. Use the MODIFY USER statement to give a user more spool space
 D. Maintain development and test users in separate Allocation Groups
E. Define Allocation Groups for different times of the day 
Answer: A, D, E 
Question:  9 Which of the following are two true statements regarding "soft" Referential Integrity (RI)? (Choose  two.) 
A. The RI is presumed to be enforced outside of the database.        
B. The RI is enforced by the database.
 C. The PK columns can appear anywhere in the WHERE clause.
 D. Redundant joins between soft RI defined tables are eliminated. 
Answer: A, D 
Question:  10 A join index would be added to a set of tables for what reason? 
A. To enforce Referential Integrity across tables
B. To ensure same distribution of both tables
 C. To enforce data uniqueness
 D. To decrease response time on well known queries
Answer: D 
Question:  11 Which are two features of the Multi Load utility? (Choose two.) 
A. Up to ten tables can be specified in a single execution.
 B. The upsert function is supported.
 C. Insert, update, and delete are all supported.
D. It uses macros to improve performance.
 E. The processing rate can be controlled. 
Answer: B, C 
Question:  12 Which two of the following statements are correct in relation to data marts? (Choose two.) 
A. Dependent data marts are isolated entities entirely separate from the enterprise data warehouse.
 B. A data mart is generally a relatively small application-specific subset of the data warehouse.
C. A logical data mart is a form of dependent data mart that is constructed virtually from the physical data warehouse.
D. Independent data marts are derived from the enterprise data warehouse. 
Answer: B, C 
Question:  13 A Partitioned Primary Index (PPI), NUSIs, and a USI is defined on Table X. which two of the  following can you employ for daily maintenance? (Choose two.) 
A. Multi Load
B. TPump
C. FastLoad into an empty Table Y, then INSERT/SELECT into Table X
D. Teradata Warehouse Builder Load Operator 
Answer: B, C 
Question:  14 Which three of the following should be included in an analysis that defines TDQM restrictions?  (Choose three.) 
A. A list of users whose access must be above restriction
 B. A listing of the requirements for critical applications and the times of day that are most critical
C. The client resources available during the peak periods
 D. An understanding of the timing requirements for maintenance of specific tables or databases
 E. A projection of the growth pattern for major tables in the database 
Answer: A, B, D 
Question:  15 What are you calculating when you analyze volatility as a factor for selecting indexes? 
A. How unique the column values are
 B. How frequently the column value changes
C. The difference between the maximum and minimum values
D. How frequently the column is used for access 
Answer: B 
Question:  16 An empty target table is required by which two utilities when loading data? (Choose two.) 
A. FastExport                                                                                              B. FastLoad                                 C. MultiLoad D. Teradata Warehouse Builder Load operator 
Answer: B, D 
Question:  17 A RAID configuration for system availability has what advantage? 
A. If a node fails, the AMPs running on that node migrate to the other nodes in a different clique.
B. If a single drive fails, work can proceed because the disk controller can reconstruct the data.
 C. If multiple drives fail in the same drive group, work can proceed using the redundant copies.
D. A disk array cabinet failure does not interrupt operation since RAID forces redundancy across RAID cabinets. 
Answer: B 
Question:  18 If you are defining Roles in Teradata, what are two questions that you should ask yourself?  (Choose two.) 
A. Will this user need more than just SELECT access to the data?
B. From which network path will this user access Teradata?
C. What characteristics are important for this user's password controls?
 D. Will this user be assigned to perform more than one job function? 
Answer: A, D 
Question:  19 For which two reasons would you choose a NUPI? (Choose two.) 
A. It allows clustering of data rows                                      B. It facilitates AMP-local joins of matching sets. C. It is a closer match for the primary key.                  D. It allows the user to generate a much larger table. E. It allows duplicate rows in a set table. 
Answer: A, B 
Question:  20 Fast Export is best described by which three of the following? (Choose three.) 
A. Exports from multiple tables                                                                     B. Uses Support Environment                                 C. Uses multiple sessions                             D. Exports data to multiple files without use of OUTMOD 
Answer: A, B, C 
Question:  21 The difference between the COPY and RESTORE operations of the ARC utility is best described  by which two of the following statements? (Choose two.) 
A. A RESTORE operation moves the data from an archived file back to any Teradata system and creates a new table if one does not already exist on the target database.
 B. A COPY operation moves the data from an archived file back to the same Teradata system from which it was archived or moves data to a different Teradata system as long as the database DBC is already restored.
C. A COPY operation moves the data from an archived file back to any Teradata system and   creates a new table if one does not already exist on the target database.
D. A RESTORE operation moves the data from an archived file back to the same Teradata system from which it was archived or moves data to a different Teradata system as long as the database DBC is already restored. 
Answer: C, D 
Question:  22 Which are three factual statements concerning the creation and use of indexes? (Choose three.) 
A. All secondary, hash, and join indexes are stored in sub tables, so they require additional storage space.
 B. Index sub tables are updated each time an indexed column value in the base table is updated or deleted.
C. Hash indexes are created in order to allow the Optimizer to choose hash joins.
 D. A Unique Secondary Index entry might not reside on the same AMP as the base table row. 
Answer: A, B, D 
Question:  23 For what two reasons are allowing columns to follow consistent data types administered at the  domain level important? (Choose two.) 
A. Reduces data redistribution                                                             B. Reduces data storage requirements C. Avoids data conversions                                                                    D. Eliminates data com pression             E. Eliminates need for constraints 
Answer: A, C 
Question:  24 Com pany.com 's data warehouse environment has critical data that needs to be available 24  hours a day, 7 days a week. How can continual availability be ensured? 
A. Archive data and keep a copy of the archived data offsite
B. Develop and implement a strategy for simultaneously or near simultaneous loading of critical data to an offsite dual system
 C. Send archived data to the offsite system to be loaded on a regular basis
 D. Keep a dual system offsite and restore the data as needed 
Answer: B 
Question:  25 Sufficient data protection can be provided by a backup of your entire database using the ARC  utility, in which three of the following circumstances? (Choose three.) 
A. When there are accidentally dropped tables, views, or macros
 B. When there is a loss of the data on multiple AMPs within the same cluster
C. When there is a loss of data on an AMP for non-Fallback tables
 D. When global temporary tables are in use at the time of the archive operation 
Answer: A, B, C 
Question:  26 A table that has a join index can be updated using which two methods? (Choose two.) 
A. MultiLoad                                                                                                        B. TPump                                             C. FastLoad into an empty table and INSERT/SELECT into the target table                                                D. Teradata Warehouse Builder Apply Operator 
Answer: B, C 
Question:  27 When you run a query against a table without Fallback if an AMP is down, how will it be affected? 
A. The query will execute only if it does not require data from the down AMP.                                              B. The query will always execute and return data from only the AMPs that are available.                                 C. The query will not execute under these conditions.                                                                                                        D. The query will execute and return data from all AMPs. 
Answer: A 
Question:  28 Which of the following is an accurate statement when creating an identity column as  GENERATED ALWAYS? 
A. The column may be updated by an SQL statement.
B. The column may be used to define the primary index.
C. The column is used to facilitate copying data from one table into another.
 D. The column may be used in a join index. 
Answer: B 
Question:  29 For columns with which two characteristics should you consider the com pression allowed in  Teradata, when designing tables? (Choose two.) 
A. Numeric columns with a large number of leading zeros
B. Columns with a large number of null occurrences
C. Variable length character columns with nulls
D. Columns with a large number of default occurrences 

Answer: B, D