Teradata Basics
Teradata is a popular Relational Database Management System (RDBMS). It is mainly suited to
building large-scale data warehousing applications, which it achieves through parallelism. It is
developed by the company of the same name, Teradata.
History of Teradata
The following is a quick summary of the history of Teradata, listing major milestones.
1999 Largest database in the world, at 130 terabytes, runs on Teradata.
2002 Teradata V2R5 is released with Partitioned Primary Index and compression.
2011 Teradata acquires Aster Data Systems and enters the advanced analytics space.
Features of Teradata
The following are some of the features of Teradata:
Linear Scalability: Teradata systems are highly scalable. They can scale up to 2048
nodes. For example, you can double the capacity of the system by doubling the number
of AMPs.
Mature Optimizer: The Teradata optimizer is one of the most mature optimizers on the market.
It has been designed to be parallel from the beginning and has been refined with each
release.
SQL: Teradata supports industry-standard SQL to interact with the data stored in tables.
In addition, it provides its own extensions.
Automatic Distribution: Teradata automatically distributes the data evenly across the disks
without any manual intervention.
Teradata - Architecture
Teradata architecture is based on Massively Parallel Processing (MPP) architecture. The major
components of Teradata are Parsing Engine, BYNET and Access Module Processors (AMPs).
The following diagram shows the high level architecture of a Teradata Node.
Components of Teradata
The key components of Teradata are as follows:
Node: The basic unit of a Teradata system. Each individual server in a Teradata
system is referred to as a node. A node consists of its own operating system, CPU,
memory, its own copy of the Teradata RDBMS software and disk space. A cabinet consists of
one or more nodes.
Parsing Engine: The Parsing Engine (PE) is responsible for receiving queries from the client and
preparing an efficient execution plan. The responsibilities of the Parsing Engine are to:
o Check whether the user has the required privileges on the objects used in the SQL query
o Prepare the execution plan for the SQL query and pass it to BYNET
o Receive the results from the AMPs and send them to the client
Message Passing Layer: The Message Passing Layer, called BYNET, is the networking
layer of a Teradata system. It allows communication between the PE and the AMPs and also
between the nodes. It receives the execution plan from the Parsing Engine and sends it to
the AMPs. Similarly, it receives the results from the AMPs and sends them to the Parsing Engine.
Access Module Processor (AMP): AMPs are virtual processors (vprocs) that actually store
and retrieve the data. AMPs receive the data and the execution plan from the Parsing Engine,
perform any data type conversion, aggregation, filtering and sorting, and store the data on
the disks associated with them. Records from the tables are evenly distributed among the
AMPs in the system. Each AMP is associated with a set of disks on which its data is stored,
and only that AMP can read or write data on those disks.
Storage Architecture
When the client runs queries to insert records, the Parsing Engine sends the records to BYNET.
BYNET receives the records and sends each row to its target AMP. The AMP stores these records
on its disks. The following diagram shows the storage architecture of Teradata.
Retrieval Architecture
When the client runs queries to retrieve records, the Parsing Engine sends a request to BYNET.
BYNET sends the retrieval request to the appropriate AMPs. The AMPs then search their disks in
parallel, identify the required records and send them to BYNET. BYNET then sends the records
to the Parsing Engine, which in turn sends them to the client. The following is the retrieval
architecture of Teradata.
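To look at the plan that the Parsing Engine prepares before any AMP does work, Teradata's EXPLAIN modifier can be placed in front of a query. The sketch below assumes an Employee table with the columns used in the later examples; the plan text that comes back depends on the system and is not reproduced here.

```sql
-- Ask the Parsing Engine to describe its execution plan instead of running the query.
-- Assumes an Employee table with EmployeeNo, FirstName and LastName columns.
EXPLAIN
SELECT EmployeeNo, FirstName, LastName
FROM Employee
WHERE EmployeeNo = 101;
```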
Teradata - Relational Concepts
A Relational Database Management System (RDBMS) is DBMS software that helps users interact
with databases. It uses Structured Query Language (SQL) to interact with the data stored in
tables.
Database
A database is a collection of logically related data. It is accessed by many users for different
purposes. For example, a sales database contains all the information about sales, stored
in many tables.
Tables
A table is the basic unit in an RDBMS where data is stored. A table is a collection of rows and
columns. The following is an example row of the employee table.
EmployeeNo FirstName LastName BirthDate
101 Mike James 1/5/1980
Columns
A column contains similar data. For example, the BirthDate column in the Employee table contains
birth date information for all employees.
BirthDate
1/5/1980
11/6/1984
3/5/1983
12/1/1984
4/1/1983
Row
A row is one instance of all the columns. For example, in the employee table one row contains
information about a single employee.
Primary Key
A primary key is used to uniquely identify a row in a table. No duplicate values are allowed in a
primary key column, and it cannot accept NULL values. It is a mandatory field in a table.
Foreign Key
Foreign keys are used to build relationships between tables. A foreign key in a child table
refers to the primary key in the parent table. A table can have more than one foreign key. A
foreign key can accept duplicate values and also NULL values. Foreign keys are optional in a table.
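As a minimal sketch of how primary and foreign keys might be declared, the two tables below are hypothetical (Department_Demo and Employee_Demo do not appear elsewhere in this document), and the column types are assumptions.

```sql
-- Hypothetical parent table: DepartmentNo is the primary key (unique, NOT NULL).
CREATE TABLE Department_Demo (
   DepartmentNo   INTEGER NOT NULL PRIMARY KEY,
   DepartmentName VARCHAR(30)
);

-- Hypothetical child table: DepartmentNo is a foreign key that may repeat or be NULL.
CREATE TABLE Employee_Demo (
   EmployeeNo   INTEGER NOT NULL PRIMARY KEY,
   FirstName    VARCHAR(30),
   DepartmentNo INTEGER,
   FOREIGN KEY (DepartmentNo) REFERENCES Department_Demo (DepartmentNo)
);
```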
Teradata - Data Types
Data Type   Length (Bytes)   Range of Values
BIGINT      8                -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807
DECIMAL     1-16
NUMERIC     1-16
DATE        4                YYYYMMDD
Teradata - Tables
In the relational model, tables are defined as collections of data. They are represented as rows
and columns.
Table Types
Permanent Table: This is the default table type. It contains data inserted by the user and
stores the data permanently.
Volatile Table: The data inserted into a volatile table is retained only during the user
session. The table and its data are dropped at the end of the session. These tables are mainly
used to hold intermediate data during data transformation.
Global Temporary Table: The definition of a global temporary table is persistent, but
the data in the table is deleted at the end of the user session.
Derived Table: A derived table holds the intermediate results of a query. Its lifetime is
limited to the query in which it is created, used and dropped.
Teradata classifies tables as SET or MULTISET tables based on how duplicate records
are handled. A table defined as a SET table doesn't store duplicate records, whereas a
MULTISET table can store duplicate records.
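The table kinds described above could be declared roughly as follows. The table names (Dept_Set, Dept_Multiset, Dept_Stage) and column types are hypothetical and chosen only for illustration.

```sql
-- SET table: an exact duplicate of an existing row is not stored.
CREATE SET TABLE Dept_Set (
   DepartmentNo   INTEGER,
   DepartmentName VARCHAR(30)
)
PRIMARY INDEX (DepartmentNo);

-- MULTISET table: duplicate rows are allowed.
CREATE MULTISET TABLE Dept_Multiset (
   DepartmentNo   INTEGER,
   DepartmentName VARCHAR(30)
)
PRIMARY INDEX (DepartmentNo);

-- Volatile table: exists only for the current session.
-- ON COMMIT PRESERVE ROWS keeps the rows after each transaction ends.
CREATE VOLATILE TABLE Dept_Stage (
   DepartmentNo   INTEGER,
   DepartmentName VARCHAR(30)
)
ON COMMIT PRESERVE ROWS;
```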
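The following commands are used to define and maintain tables.
1 Create Table
CREATE TABLE command is used to create tables in Teradata.
2 Alter Table
ALTER TABLE command is used to add or drop columns from an existing table.
3 Drop Table
DROP TABLE command is used to drop a table.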
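A short sketch of the three commands in sequence follows. The table name Employee_Scratch is hypothetical, and the column types are assumptions modelled on the Employee columns used in this document.

```sql
-- Create a permanent table with EmployeeNo as a unique primary index.
CREATE SET TABLE Employee_Scratch (
   EmployeeNo   INTEGER,
   FirstName    VARCHAR(30),
   LastName     VARCHAR(30),
   DOB          DATE FORMAT 'YYYY-MM-DD',
   JoinedDate   DATE FORMAT 'YYYY-MM-DD',
   DepartmentNo BYTEINT
)
UNIQUE PRIMARY INDEX (EmployeeNo);

-- Drop the DOB column and add a BirthDate column in its place.
ALTER TABLE Employee_Scratch
DROP DOB,
ADD BirthDate DATE FORMAT 'YYYY-MM-DD';

-- Remove the table definition together with its data.
DROP TABLE Employee_Scratch;
```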
Teradata - Data Manipulation
This chapter introduces the SQL commands used to manipulate the data stored in Teradata
tables.
Insert Records
INSERT INTO statement is used to insert records into the table.
Syntax
Example
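A minimal sketch of an INSERT INTO statement is shown here. It assumes the Employee table has the six columns named in this document, and the values are those of the employee 103 row that appears later.

```sql
-- General form:
--   INSERT INTO <tablename> (column1, column2, ...) VALUES (value1, value2, ...);

-- Insert one employee row, naming every target column explicitly.
INSERT INTO Employee (
   EmployeeNo, FirstName, LastName, JoinedDate, DepartmentNo, BirthDate
)
VALUES (
   103, 'Peter', 'Paul', DATE '2007-03-21', 2, DATE '1983-04-01'
);
```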
Once the above query is executed, you can use the SELECT statement to view the records from
the table.
EmployeeNo FirstName LastName JoinedDate DepartmentNo BirthDate
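Insert from Another Table
INSERT SELECT statement is used to insert records from another table.
Syntax
INSERT INTO <tablename> (column list) SELECT column list FROM <source table>;
Example
The following example inserts records into the Employee_Bkup table. Create a table called
Employee_Bkup with the same column definition as the employee table before running the
following insert query.
INSERT INTO Employee_Bkup
(EmployeeNo, FirstName, LastName, BirthDate, JoinedDate, DepartmentNo)
SELECT
EmployeeNo,
FirstName,
LastName,
BirthDate,
JoinedDate,
DepartmentNo
FROM
Employee;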
When the above query is executed, it will insert all records from the employee table into
employee_bkup table.
Rules
The number of columns specified in the VALUES list should match with the columns
specified in the INSERT INTO clause.
The data types of columns specified in the VALUES clause should be compatible with
the data types of columns in the INSERT clause.
Update Records
Syntax
UPDATE <tablename>
SET <columnname> = <new value>
[WHERE condition];
Example
The following example updates the department of employee 101 to 03.
UPDATE Employee
SET DepartmentNo = 03
WHERE EmployeeNo = 101;
After this update, the DepartmentNo for EmployeeNo 101 changes from 1 to 3.
Rules
If WHERE condition is not specified then all rows of the table are impacted.
You can update a table with the values from another table.
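Updating from another table can be sketched with Teradata's UPDATE ... FROM form. The example assumes the Employee_Bkup table from the insert example, with the same columns as Employee.

```sql
-- Copy DepartmentNo from Employee_Bkup into Employee for matching employees.
-- Rows in Employee with no match in Employee_Bkup are left unchanged.
UPDATE Employee
FROM Employee_Bkup AS B
SET DepartmentNo = B.DepartmentNo
WHERE Employee.EmployeeNo = B.EmployeeNo;
```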
Delete Records
Syntax
Example
The following example deletes the employee 101 from the table employee.
DELETE FROM Employee
WHERE EmployeeNo = 101;
After this statement is executed, employee 101 no longer appears in the table.
Rules
If WHERE condition is not specified then all rows of the table are deleted.
You can delete rows from a table based on the values from another table.
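As an illustration of deleting rows based on another table, the sketch below uses a subquery against the Salary table that appears in later examples. Which rows it removes depends on the data in both tables.

```sql
-- General form:
--   DELETE FROM <tablename> [WHERE condition];

-- Delete only the employees that have no row in the Salary table.
DELETE FROM Employee
WHERE EmployeeNo NOT IN (SELECT EmployeeNo FROM Salary);
```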
Teradata - SELECT Statement
SELECT statement is used to retrieve records from a table.
Syntax
SELECT
column 1, column 2, .....
FROM
tablename;
Example
EmployeeNo FirstName LastName JoinedDate DepartmentNo BirthDate
103 Peter Paul 3/21/2007 2 4/1/1983
SELECT EmployeeNo,FirstName,LastName
FROM Employee;
When this query is executed, it fetches EmployeeNo, FirstName and LastName columns from
the employee table.
If you want to fetch all the columns from a table, you can use the following command instead of
listing all the columns.
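A minimal form of that command, assuming the Employee table from the earlier examples:

```sql
-- The asterisk stands for every column of the table.
SELECT * FROM Employee;
```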
The above query will fetch all records from the employee table.
WHERE Clause
WHERE clause is used to filter the records returned by the SELECT statement. A condition is
associated with the WHERE clause. Only the records that satisfy the condition in the WHERE
clause are returned.
Syntax
Example
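A sketch of the WHERE clause in use is given below. It assumes the Employee table from the earlier examples and filters on EmployeeNo.

```sql
-- General form:
--   SELECT * FROM <tablename> WHERE <condition>;

-- Return only the row (or rows) whose EmployeeNo is 101.
SELECT *
FROM Employee
WHERE EmployeeNo = 101;
```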
ORDER BY
When the SELECT statement is executed, the returned rows are not in any specific order.
ORDER BY clause is used to arrange the records in ascending/descending order on any
columns.
Syntax
Example
The following query fetches records from the employee table and orders the results by
FirstName.
SELECT * FROM Employee
ORDER BY FirstName;
GROUP BY
GROUP BY clause is used with SELECT statement and arranges similar records into groups.
Syntax
Example
The following example groups the records by DepartmentNo column and identifies the total
count from each department.
DepartmentNo Count(*)
------------ -----------
3 1
1 1
2 3
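A query consistent with the description and output above, assuming the Employee table used throughout, might look like this:

```sql
-- One output row per department, with the number of employees in it.
SELECT DepartmentNo, COUNT(*)
FROM Employee
GROUP BY DepartmentNo;
```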
Teradata - Logical and Conditional Operators
Teradata supports the following logical and conditional operators. These operators are used to
perform comparison and combine multiple conditions.
Syntax   Meaning
=        Equal to
IN       If values in <expression>
AND      Combine multiple conditions. Evaluates to true only if all conditions are met.
OR       Combine multiple conditions. Evaluates to true if at least one of the conditions is met.
BETWEEN
Example
The following example fetches records with employee numbers in the range 101 to 103,
inclusive.
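A query matching this description, assuming the Employee table used in the other examples, could be written as:

```sql
-- BETWEEN is inclusive at both ends, so 101 and 103 are part of the range.
SELECT *
FROM Employee
WHERE EmployeeNo BETWEEN 101 AND 103;
```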
When the query is executed, it returns the employee records with employee numbers between
101 and 103.
IN
Example
The following example fetches records with employee numbers in 101, 102 and 103.
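One way to write such a query, again assuming the Employee table, is sketched here:

```sql
-- Return the rows whose EmployeeNo matches any value in the list.
SELECT *
FROM Employee
WHERE EmployeeNo IN (101, 102, 103);
```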
NOT IN
NOT IN reverses the result of IN. It fetches records with values that don't
match the given list.
Example
The following example fetches records with employee numbers not in 101, 102 and 103.
SELECT * FROM
Employee
WHERE EmployeeNo NOT IN (101,102,103);
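Teradata - SET Operators
SET operators combine results from multiple SELECT statements. This may look similar to
joins, but joins combine columns from multiple tables whereas SET operators combine rows
from multiple SELECT statements.
Rules
The number of columns in each SELECT statement should be the same.
The data types of the corresponding columns in each SELECT statement must be compatible.
ORDER BY should be included only in the final SELECT statement.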
UNION
UNION statement is used to combine results from multiple SELECT statements. It removes
duplicate rows from the result.
Syntax
SELECT col1, col2, col3
FROM
<table 1>
[WHERE condition]
UNION
SELECT col1, col2, col3
FROM
<table 2>
[WHERE condition];
Example
EmployeeNo Gross Deduction NetPay
104 75,000 5,000 70,000
The following UNION query combines the EmployeeNo values from both the Employee and Salary
tables.
SELECT EmployeeNo
FROM
Employee
UNION
SELECT EmployeeNo
FROM
Salary;
EmployeeNo
-----------
101
102
103
104
105
UNION ALL
UNION ALL statement is similar to UNION: it combines results from multiple SELECT statements,
but it keeps duplicate rows.
Syntax
SELECT col1, col2, col3 FROM <table 1> [WHERE condition]
UNION ALL
SELECT col1, col2, col3 FROM <table 2> [WHERE condition];
Example
SELECT EmployeeNo
FROM
Employee
UNION ALL
SELECT EmployeeNo
FROM
Salary;
When the above query is executed, it produces the following output. You can see that it returns
the duplicates also.
EmployeeNo
-----------
101
104
102
105
103
101
104
102
103
INTERSECT
INTERSECT command is also used to combine results from multiple SELECT statements. It
returns the rows from the first SELECT statement that have a corresponding match in the second
SELECT statement. In other words, it returns the rows that exist in both SELECT statements.
Syntax
Example
Following is an example of INTERSECT statement. It returns the EmployeeNo values that exist
in both tables.
SELECT EmployeeNo
FROM
Employee
INTERSECT
SELECT EmployeeNo
FROM
Salary;
When the above query is executed, it returns the following records. EmployeeNo 105 is
excluded since it doesn't exist in the Salary table.
EmployeeNo
-----------
101
104
102
103
MINUS/EXCEPT
MINUS/EXCEPT commands combine rows from multiple SELECT statements and return the rows that
are in the first SELECT but not in the second SELECT. Both commands return the same results.
Syntax
Example
SELECT EmployeeNo
FROM
Employee
MINUS
SELECT EmployeeNo
FROM
Salary;
EmployeeNo
-----------
105
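Teradata - String Functions
Teradata provides several functions to manipulate strings. Most of these functions are
compatible with the ANSI standard; SUBSTR and INDEX are Teradata extensions.
1 || Concatenates strings together
2 SUBSTR Extracts a portion of a string (Teradata extension)
3 SUBSTRING Extracts a portion of a string (ANSI standard)
4 INDEX Locates the position of a character in a string (Teradata extension)
5 POSITION Locates the position of a character in a string (ANSI standard)
6 TRIM Trims blanks from a string
7 UPPER Converts a string to uppercase
8 LOWER Converts a string to lowercase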
Example
Following table lists some of the string functions with the results.
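The string functions can be tried directly with literal values. The sketch below uses made-up sample strings, and the comments note the value each expression evaluates to.

```sql
SELECT 'Tera' || 'data';                   -- 'Teradata'
SELECT SUBSTR('Warehouse', 1, 4);          -- 'Ware'
SELECT SUBSTRING('Warehouse' FROM 1 FOR 4); -- 'Ware'
SELECT INDEX('Teradata', 'data');          -- 5
SELECT POSITION('data' IN 'Teradata');     -- 5
SELECT TRIM('  Teradata  ');               -- 'Teradata'
SELECT UPPER('teradata');                  -- 'TERADATA'
SELECT LOWER('TERADATA');                  -- 'teradata'
```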
Teradata - Date/Time Functions
Date Storage
Dates are stored internally as integers using the following formula.
((YEAR - 1900) * 10000) + (MONTH * 100) + DAY
You can use the following query to check how the dates are stored.
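A sketch of such a check is shown here. Casting a DATE to INTEGER exposes the stored value, and the second statement uses a fixed date so the arithmetic can be followed by hand.

```sql
-- Show the integer that Teradata stores for today's date.
SELECT CAST(CURRENT_DATE AS INTEGER);

-- Worked example for 2016-01-01:
--   (2016 - 1900) * 10000 + (1 * 100) + 1 = 1160101
SELECT CAST(DATE '2016-01-01' AS INTEGER);
```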
Since the dates are stored as integer, you can perform some arithmetic operations on them.
Teradata provides functions to perform these operations.
EXTRACT
EXTRACT function extracts portions of day, month and year from a DATE value. This function
is also used to extract hour, minute and second from TIME/TIMESTAMP value.
Example
The following examples show how to extract year, month, day, hour, minute and second values
from date and timestamp values.
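The date parts can be extracted in the same way as the time parts. A minimal sketch using the current date follows; the results depend on the day the queries are run.

```sql
-- Year, month and day of the current date, each returned as an integer.
SELECT EXTRACT(YEAR FROM CURRENT_DATE);
SELECT EXTRACT(MONTH FROM CURRENT_DATE);
SELECT EXTRACT(DAY FROM CURRENT_DATE);
```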
SELECT EXTRACT(HOUR FROM CURRENT_TIMESTAMP);
EXTRACT(HOUR FROM Current TimeStamp(6))
---------------------------------------
4
SELECT EXTRACT(MINUTE FROM CURRENT_TIMESTAMP);
EXTRACT(MINUTE FROM Current TimeStamp(6))
-----------------------------------------
54
SELECT EXTRACT(SECOND FROM CURRENT_TIMESTAMP);
EXTRACT(SECOND FROM Current TimeStamp(6))
-----------------------------------------
27.140000
INTERVAL
Teradata provides INTERVAL data types to perform arithmetic operations on DATE and TIME
values. There are two categories of intervals.
Year-Month Interval
YEAR
YEAR TO MONTH
MONTH
Day-Time Interval
DAY
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR
HOUR TO MINUTE
HOUR TO SECOND
MINUTE
MINUTE TO SECOND
SECOND
Example
The following example adds 01 day, 05 hours and 10 minutes to current timestamp.
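An interval literal is written as a quoted value followed by its qualifier. The sketch below shows the day-time case described above, plus one year-month case added for comparison.

```sql
-- Add 1 day, 5 hours and 10 minutes to the current timestamp.
SELECT CURRENT_TIMESTAMP + INTERVAL '01 05:10' DAY TO MINUTE;

-- A year-month interval works the same way: add 3 years to today's date.
SELECT CURRENT_DATE + INTERVAL '03' YEAR;
```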
Teradata - Built-in Functions
Teradata provides built-in functions which are extensions to SQL. Following are the common
built-in functions.
SELECT DATE;
Date
--------
16/01/01
SELECT CURRENT_DATE;
Date
--------
16/01/01
SELECT TIME;
Time
--------
04:50:29
SELECT CURRENT_TIME;
Time
--------
04:50:29
SELECT CURRENT_TIMESTAMP;
Current TimeStamp(6)
--------------------------------
2016-01-01 04:51:06.990000+00:00
SELECT DATABASE;
Database
------------------------------
TDUSER
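Teradata - Aggregate Functions
Teradata supports common aggregate functions, which can be used with the SELECT statement.
The functions covered here are COUNT, MAX, MIN, AVG and SUM.
MAX Returns the largest value of the specified column
Example
Consider the following Salary table.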
COUNT
The following example counts the number of records in the Salary table.
Count(*)
-----------
5
MAX
The following example returns maximum employee net salary value.
Maximum(NetPay)
---------------------
83000
MIN
The following example returns minimum employee net salary value from the Salary table.
Minimum(NetPay)
---------------------
36000
AVG
The following example returns the average of the employees' net salary values from the table.
Average(NetPay)
---------------------
65800
SUM
The following example calculates the sum of the employees' net salary from all records of the
Salary table.
Sum(NetPay)
-----------------
329000
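Queries consistent with the aggregate results above can be sketched as follows. They assume a Salary table with a NetPay column, as named in the output headings.

```sql
SELECT COUNT(*) FROM Salary;     -- number of records
SELECT MAX(NetPay) FROM Salary;  -- highest net pay
SELECT MIN(NetPay) FROM Salary;  -- lowest net pay
SELECT AVG(NetPay) FROM Salary;  -- average net pay
SELECT SUM(NetPay) FROM Salary;  -- total net pay

-- The same aggregates can also be computed in a single pass over the table.
SELECT COUNT(*)    AS RecordCnt,
       MAX(NetPay) AS MaxNetPay,
       MIN(NetPay) AS MinNetPay,
       AVG(NetPay) AS AvgNetPay,
       SUM(NetPay) AS TotalNetPay
FROM Salary;
```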
CASE Expression
CASE expression evaluates each row against a condition or WHEN clause and returns the result
of the first match. If there are no matches, the result from the ELSE part is returned.
Syntax
CASE <expression>
WHEN <expression> THEN result-1
WHEN <expression> THEN result-2
ELSE
Result-n
END
Example
The following example evaluates the DepartmentNo column and returns 'Admin' if the
department number is 1, 'IT' if the department number is 2, and 'Invalid Dept' otherwise.
SELECT
EmployeeNo,
CASE DepartmentNo
WHEN 1 THEN 'Admin'
WHEN 2 THEN 'IT'
ELSE 'Invalid Dept'
END AS Department
FROM Employee;
The above CASE expression can also be written in the following form which will produce the
same result as above.
SELECT
EmployeeNo,
CASE
WHEN DepartmentNo = 1 THEN 'Admin'
WHEN DepartmentNo = 2 THEN 'IT'
ELSE 'Invalid Dept'
END AS Department
FROM Employee;
COALESCE
COALESCE is a function that returns the first non-null value among its arguments. It returns
NULL if all of its arguments evaluate to NULL. Following is the syntax.
Syntax
COALESCE(expression 1, expression 2, ....)
Example
SELECT
EmployeeNo,
COALESCE(CAST(DepartmentNo AS VARCHAR(20)), 'Department not found')
FROM
employee;
NULLIF
NULLIF returns NULL if the two arguments are equal. Otherwise, it returns the first argument.
Syntax
NULLIF(expression 1, expression 2)
Example
The following example returns NULL if the DepartmentNo is equal to 3. Otherwise, it returns
the DepartmentNo value.
SELECT
EmployeeNo,
NULLIF(DepartmentNo,3) AS department
FROM Employee;
The above query returns the following records. You can see that employee 105 has a NULL
department number, which is displayed as ?.
EmployeeNo department
----------- ----------
101 1
104 2
102 2
105 ?
103 2
Teradata - Primary Index
The primary index specifies where the data resides in Teradata. It determines which
AMP gets each data row. Each table in Teradata is required to have a primary index defined. If
the primary index is not defined, Teradata automatically assigns one. The primary
index provides the fastest way to access the data. A primary index may have a maximum of 64
columns.
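To see how a primary index spreads rows over the AMPs, Teradata's hash functions can be chained as in the sketch below. It assumes an Employee table whose primary index is EmployeeNo; the aliases AmpNo and RowCnt are illustrative.

```sql
-- Count how many Employee rows land on each AMP.
-- HASHROW hashes the primary index value, HASHBUCKET maps the hash to a bucket,
-- and HASHAMP maps the bucket to an AMP number.
SELECT HASHAMP(HASHBUCKET(HASHROW(EmployeeNo))) AS AmpNo,
       COUNT(*) AS RowCnt
FROM Employee
GROUP BY 1
ORDER BY 1;
```

Roughly equal counts per AMP suggest the chosen column distributes the data evenly.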
The primary index is defined while creating a table. There are 2 types of primary index.
Unique Primary Index (UPI)
Non Unique Primary Index (NUPI)
Unique Primary Index (UPI)
If the table is defined with a UPI, then the column chosen as the UPI should not have any
duplicate values. If any duplicate values are inserted, they will be rejected.
The following example creates the Salary table with column EmployeeNo as Unique Primary
Index.
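A sketch of such a definition is given here. The column names come from the Salary table headings used in this document, while the INTEGER types are an assumption.

```sql
-- EmployeeNo is the unique primary index: each value may appear only once.
CREATE SET TABLE Salary (
   EmployeeNo INTEGER,
   Gross      INTEGER,
   Deduction  INTEGER,
   NetPay     INTEGER
)
UNIQUE PRIMARY INDEX (EmployeeNo);
```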
Non Unique Primary Index (NUPI)
If the table is defined with a NUPI, then the column chosen as the NUPI can accept duplicate
values.
The following example creates the employee accounts table with column EmployeeNo as Non
Unique Primary Index. EmployeeNo is defined as Non Unique Primary Index since an
employee can have multiple accounts in the table; one for salary account and another one for
reimbursement account.
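The employee accounts table could be declared along these lines. Apart from EmployeeNo, the table name, column names and types are hypothetical.

```sql
-- PRIMARY INDEX without UNIQUE creates a non-unique primary index,
-- so the same EmployeeNo can appear in several rows.
CREATE SET TABLE Employee_Accounts (
   EmployeeNo     INTEGER,
   AccountType    BYTEINT,      -- hypothetical: e.g. salary or reimbursement
   AccountNumber  INTEGER,      -- hypothetical
   BankName       VARCHAR(30),  -- hypothetical
   BankCity       VARCHAR(30)   -- hypothetical
)
PRIMARY INDEX (EmployeeNo);
```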
Teradata - Joins
A join is used to combine records from more than one table. Tables are joined based on the
common columns/values from these tables.
There are different types of joins available:
Inner Join
Outer Join (left, right and full)
Self Join
Cross Join
INNER JOIN
Inner Join combines records from multiple tables and returns only the rows whose join values
exist in both tables.
Syntax
Example
EmployeeNo Gross Deduction NetPay
The following query joins the Employee table and Salary table on the common column
EmployeeNo. Each table is assigned an alias A & B and the columns are referenced with the
correct alias.
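A query matching this description might look like the sketch below. The selected columns are an assumption; the join condition and the aliases A and B follow the text.

```sql
-- Only employees that also have a row in Salary are returned.
SELECT A.EmployeeNo, A.DepartmentNo, B.NetPay
FROM Employee A
INNER JOIN Salary B
ON (A.EmployeeNo = B.EmployeeNo);
```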
When the above query is executed, it returns the following records. Employee 105 is not
included in the result since it doesn't have matching records in the Salary table.
OUTER JOIN
LEFT OUTER JOIN, RIGHT OUTER JOIN and FULL OUTER JOIN also combine the results from multiple
tables.
LEFT OUTER JOIN: returns all the records from the left table and only the
matching records from the right table.
RIGHT OUTER JOIN: returns all the records from the right table and only the
matching rows from the left table.
FULL OUTER JOIN: combines the results of both LEFT OUTER and RIGHT
OUTER JOINS. It returns both matching and non-matching rows from the joined tables.
Syntax
Following is the syntax of the OUTER JOIN statement. You need to use one of the options from
LEFT OUTER JOIN, RIGHT OUTER JOIN or FULL OUTER JOIN.
Example
Consider the following example of the LEFT OUTER JOIN query. It returns all the records
from Employee table and matching records from Salary table.
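A LEFT OUTER JOIN along these lines could be written as follows. The selected columns and the ORDER BY are assumptions added for readability.

```sql
-- Every Employee row is returned; NetPay is NULL where Salary has no match.
SELECT A.EmployeeNo, A.DepartmentNo, B.NetPay
FROM Employee A
LEFT OUTER JOIN Salary B
ON (A.EmployeeNo = B.EmployeeNo)
ORDER BY A.EmployeeNo;
```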
When the above query is executed, it produces the following output. For employee 105, the NetPay
value is NULL, since it doesn't have matching records in the Salary table.
CROSS JOIN
Cross Join joins every row from the left table to every row from the right table.
Syntax
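A cross join restricted to one employee can be sketched as below. The selected columns and the filter on employee 101 are assumptions chosen to match the description that follows.

```sql
-- Pair employee 101 with every row of the Salary table.
SELECT A.EmployeeNo, A.DepartmentNo, B.EmployeeNo, B.NetPay
FROM Employee A
CROSS JOIN Salary B
WHERE A.EmployeeNo = 101
ORDER BY B.EmployeeNo;
```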
When the above query is executed, it produces the following output. Employee No 101 from
Employee table is joined with each and every record from Salary Table.
Teradata - SubQueries
A subquery returns records from one table based on the values from another table. It is a
SELECT query within another query. The SELECT query called the inner query is executed first
and its result is used by the outer query. Some of its salient features are:
A query can have multiple subqueries, and subqueries may contain other subqueries.
If the subquery returns only one value, you can use the = operator to compare it in the outer query.
If it returns multiple values, you can use IN or NOT IN.
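For the multiple-value case, a sketch using NOT IN is shown here. It assumes the Employee and Salary tables from the earlier examples.

```sql
-- The inner query returns many EmployeeNo values, so NOT IN is used instead of =.
-- The outer query keeps the employees that have no row in the Salary table.
SELECT EmployeeNo, FirstName, LastName
FROM Employee
WHERE EmployeeNo NOT IN
   (SELECT EmployeeNo FROM Salary);
```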
Syntax
Following is the generic syntax of subqueries.
Example
Consider the following Salary table.
The following query identifies the employee number with highest salary. The inner SELECT
performs the aggregation function to return the maximum NetPay value and the outer SELECT
query uses this value to return the employee record with this value.
SELECT EmployeeNo, NetPay
FROM Salary
WHERE NetPay =
(SELECT MAX(NetPay)
FROM Salary);