Structured Query Language (SQL)

Creating and Manipulating Relational Databases

Table Definition

Ø CREATE TABLE <table name> (

<column name> <data type> <constraint>,

<column name> <data type> <constraint>);

Ø CREATE TABLE Item (

Item_Identifier char(5) NOT NULL,

Item_Description char(15) NOT NULL,

CONSTRAINT pk_item PRIMARY KEY (Item_Identifier) );

Data Types (check your implementation for definition and availability)

Ø CHAR(n) Fixed length string of exactly n characters.

Ø NUMERIC(p,q) Decimal number, p digits with decimal point q digits from the right.

Ø INTEGER Signed integer.

Ø DECIMAL (p,q), FLOAT, DOUBLE PRECISION, REAL, SMALLINT, ...

Ø There are many synonyms, always check your implementation.

Table Alteration

Ø You can eliminate the need to control the order in which tables and constraints are created by executing all the create table statements and then adding FK constraints using alter table statements. PK constraints can be added at either stage.

Ø ALTER TABLE <tablename> ADD

Ø columns, constraints

Ø ALTER TABLE Employee ADD birthdate DATE;

Ø NOT NULL is not allowed with ALTER TABLE ADD column as all the rows will have a null value immediately after the column is added

Ø ALTER TABLE Employee ADD PRIMARY KEY(EmployeeID);

Ø ALTER TABLE Product ADD FOREIGN KEY(prod_group_code)

Ø REFERENCES Product_Group (prod_group_code);

Inserting values into a database

Ø INSERT INTO employees VALUES

(98765, 00011, Jones,Cecile, 822-8947, 3343, null);

Ø INSERT INTO employees VALUES

(98775, 00011, Cray,Jean, 822-8947, 3344, 834-3382);

QUERIES

Ø Queries in SQL are performed using the SELECT statement.

Ø The SELECT statement has 6 clauses:

Ø SELECT <attribute list>

Ø FROM <table list>

Ø [WHERE <selection criteria and join condition>]

Ø [GROUP BY <grouping attribute(s)>]

Ø [HAVING <group selection condition>]

Ø [ORDER BY <attribute list>]

Ø Clauses in brackets [ ... ] are optional

SELECT

Ø SELECT *

Ø * refers to all attributes in the order they were created in CREATE TABLE

Ø SELECT * FROM item;

Ø This statement projects all columns from all tuples from the specified table.

Ø SELECT supplier_identifier, supplier_name

Ø if a list of attributes is provided then only these attributes are projected and in the order specified.

Ø SELECT ALL / DISTINCT

Ø ALL means that all tuples are selected. DISTINCT means that only one tuple for each distinct value in the specified attributes will be selected.

Ø If not specified, the default is ALL

Ø Null values are considered equal and will be treated as duplicate values.

Ø As noted above, the FROM clause is required in all queries.

SELECT DISTINCT item_description

FROM supplier_item;

Ø The simplest query is of the form: SELECT * FROM tablename;

FROM

Ø the FROM clause specifies the tables in which the requested data is found.

Ø You can create a table alias in the from clause in order to make the select statement more readable.

SELECT i.item_description, s.supplier_name, si.item_code

FROM supplier_item si, supplier s, item i

WHERE si.item_identifier=i.item_identifier

AND si.supplier_identifier=s.supplier_identifier

WHERE

Ø The WHERE clause specifies the criteria for selecting individual tuples

(the relational SELECT operation)

Ø The WHERE clause specifies the join condition for joining two relations

(the relational JOIN operation)

Ø Single table queries have no join conditions, hence the where clause specifies only the criteria for selecting tuples.

Ø The standard comparison operators are used in the selection criteria

Ø = Equal

Ø <> Not equal

Ø > Greater than

Ø < Less than

Ø >= Greater than or Equal

Ø <= Less than or Equal

SELECT item_description

FROM item

WHERE item_identifier = 00722;

AND

Ø AND is used to specify multiple selection criteria, all of which must be met for a row to be selected.

Ø AND is useful when formulating search criteria involving multi-attribute keys:

SELECT unit_price

FROM supplier_item

WHERE item_identifier = '00722'

AND supplier_identifier = '92327';

Ø OR is used to specify multiple selection criteria, any of which must be met for a row to be selected.

Ø OR is useful to select rows outside a range:

SELECT unit_price

FROM supplier_item

WHERE item_identifier = '00722'

OR supplier_identifier = '92327';

NOT

Ø NOT is placed before an entire expression to negate the truth value

SELECT item_code

FROM supplier_item

WHERE NOT supplier_identifier = '92327';

WHERE supplier_identifier NOT = 123456 is incorrect

No abbreviations of logical conditions are allowed.

Cannot be written as:

SELECT *

FROM item

WHERE item_identifier = 12345

OR item_ identifier = 19961
OR item_ identifier = 19963

SELECT *

FROM item

WHERE item_ identifier = 12345
OR 19961

OR 19963

BETWEEN

SELECT *

FROM item

WHERE item_identifier BETWEEN 12000 AND 21000

BETWEEN is inclusive.

Ø IN selects those rows for which a specified value appears in a list of constant values enclosed in parentheses

SELECT *

FROM item

WHERE item_identifier IN (67211, 27412)

NULL

Ø NULL selects those rows for which a specified value is the special SQL null value.

SELECT *

FROM employees

WHERE pager IS NULL

LIKE

Ø LIKE selects those rows that match a given search pattern.

Ø LIKE can only be used with character data.

SELECT *

FROM supplier_item

WHERE item_code LIKE %345%

Ø The following wildcards are supported:

%Avenue%	any character string containing the word Avenue.
%Avenue	any character string consisting of any number of characters ending in the word Avenue.
Avenue%	Any character string beginning with the word Avenue.
b_g	Any three letter string beginning with b and ending in g. Only a three character string can match this pattern.
b_g%	Any character string beginning with b, followed by any single character, followed by g, followed by any string of zero or more characters.
%b_g%	Any character string that has a three character string beginning with b and ending with g anywhere in it.

WHERE specifies join conditions

Ø Join conditions are specified by a comparison between the values of two columns in different tables. Consider the following tables:

create table SUPPLIER

(

SUPPLIER_IDENTIFIER char(5) not null,

SUPPLIER_NAME char(30) not null,

SUPPLIER_ADDRESS char(45) not null,

constraint pk_supplier primary key (SUPPLIER_IDENTIFIER)

)

create table SUPPLIER_ITEM

(

ITEM_CODE char(5) not null,

ITEM_IDENTIFIER char(5) not null,

SUPPLIER_IDENTIFIER char(5) not null,

UNIT_PRICE money not null,

constraint pk_supplier_item primary key (ITEM_CODE)

)

Ø The join condition is based on the equivalence of the value of the foreign key supplier_identifier in the table supplier_item with the value of the primary key supplier_item in the table supplier.

Ø For every item, list the items identifier, code, unit price, and the name of the supplier that supplies that item at that price.

SELECT item_identifier, item_code, unit_price, supplier_name

FROM supplier_item si, supplier s

WHERE s.supplier_identifier = si.supplier_identifier

Aggregate Functions

Ø COUNT, SUM, MAX, MIN, AVG (your DBMS may have additional aggregate functions)

SELECT MAX(unit_price), MIN(unit_price), AVG(unit_price)

FROM supplier_item;

SELECT COUNT(item_identifier)

FROM supplier_item

Is equivalent to COUNT(*) which is a count of how many tuples in the relation.

SELECT COUNT (DISTINCT item_identifier)

FROM supplier_item

Is the number of distinct item identifiers in the table.

Ø Aggregate functions require the GROUP BY clause

Ø An aggregate function gives a result based on a group of rows in a table. Count, Sum, Max, Min, Average all are operations on groups of values.

Ø Remember that the result of an operation on a relation is another relation. Relations are rectangular in shape, thus any operation must produce a rectangular result.

Ø It follows that you cant mix aggregate functions, which return one value, with standard SQL statements that return a set of individual values. For example, the query

SELECT item_identifier, AVG(unit_price)

FROM supplier_item;

would result in multiple rows for the product_code part and a single row for the AVG(unit_price) part.

GROUP BY

Ø The purpose of the GROUP BY clause is to group those columns in the SELECT clause that are not the arguments of an aggregate function so that the resulting table will be rectangular.

Ø All attributes appearing in SELECT attribute list must appear in the GROUP BY list unless they are the argument of an aggregate function. This guarantees that each attribute in the SELECT clause will be single valued per group.

SELECT item_identifier, AVG(unit_price)

FROM supplier_item

GROUP BY item_identifier;

This query will give the expected results. The aggregate function is calculated on each group (product_code).

Another way to think about GROUP BY is that it specifies the control break for calculating the values of the aggregate functions.

HAVING

Ø HAVING selects groups of tuples just like the selection criteria in the WHERE clause selects individual tuples.

Ø Think of the HAVING clause as specifying the selection criteria for group values generated by aggregate functions.

v Which items have an average unit_price greater than 100?

SELECT item_identifier, AVG(unit_price)

FROM supplier_item

GROUP BY item_identifier;

HAVING AVG(unit_price) > 100;

ORDER BY

Ø ORDER BY permits the ordering of the tuples in the result of a query by one or more attributes. The default order is ascending. The order used for each attribute in the ORDER BY clause can be specified by following the attribute name with DESC for descending or ASC for ascending.

v Generate a list of item identifiers having an average unit price of more than 100. List the output in order of the item identifiers.

SELECT item_identifier, AVG(unit_price)

FROM supplier_item

GROUP BY item_identifier;

HAVING AVG(unit_price) > 100

ORDER BY item_identifier;

v Generate a list of item identifiers having an average unit price of more than 100. List the item identifiers in order of largest to smallest average unit price.

SELECT item_identifier, AVG(unit_price)

FROM supplier_item

GROUP BY item_identifier;

HAVING AVG(unit_price) > 100

ORDER BY AVG(unit_price) DESC;

Nested Select

Ø SELECT statements can be nested to an arbitrary number of levels. The innermost statement is evaluated first.

Ø Remember that the result of any query on a relation is another relation.

v Create a list that shows the items whose average unit price is greater than 100$. Your list should show the item id, description, and average price.

SELECT i.item_identifier, i.item_description, AVG(si.unit_price)

FROM item i, supplier_item si

WHERE i.item_identifier = si.item_identifier AND

i.item_identifier IN (SELECT item_identifier

FROM supplier_item

GROUP BY item_identifier

HAVING AVG(unit_price) > 100)

GROUP BY i.item_identifier, i.item_description

ORDER BY i.item_description;

CORRELATED QUERY

Ø The inner querys result is dependent on the row being processed by the outer query

Ø The subquery result varies with the row being processed by the outer query. The defining characteristic is a reference to one or more outer query tables in the WHERE or HAVING clause of the subquery.

Ø the subquery is re-evaluated for each row in the outer query (consumes resources).

v List items whose average unit price > 150

SELECT i.item_description

FROM item i

WHERE 150 <

(SELECT avg(unit_price)

FROM supplier_item si

WHERE i.item_identifier = si.item_identifier)

SQL VIEWS

Ø An SQL view is a single, virtual table derived from other tables.

Ø It exists not as the intension itself, but only as the instructions to create that intension.

Ø The tables from which the view is derived can be base tables (physically exist) or other views.

Ø The advantage of a view is that it is always up to date.

CREATE VIEW WhoSuppliesWhat AS

SELECT s.supplier_name, si.item_identifier, i.item_description,

si.item_code, si.unit_price

FROM supplier s, item i, supplier_item si

WHERE si.item_identifier=i.item_identifier

AND si.supplier_identifier=s.supplier_identifier;

CREATE VIEW Catalogue (Who, What, Describe, CatNumber, Price) AS

SELECT s.supplier_name, si.item_identifier, i.item_description,

si.item_code, si.unit_price

FROM supplier s, item i, supplier_item si

WHERE si.item_identifier=i.item_identifier

AND si.supplier_identifier=s.supplier_identifier;

Ø Not specifying the column names in the view means the view (virtual table) takes on the corresponding column names from the column list of the select statement from which it is created.

Ø Once created, the view acts as a regular table EXCEPT that its values cannot be updated. Updates are performed on the base tables on which the view is defined.

SELECT Who, Describe, Price

FROM catalogue

WHERE Describe = Sweater_1

ORDER BY Who;

DROP

Ø DROP is used to delete objects from the database

Ø DROP TABLE employees;

Ø DROP VIEW catalogue;

Ø ALTER TABLE item DROP COLUMN item_description CASCADE;

Ø CASCADE will drop not only the specified column, but also all view definitions and integrity constraints using this column. Note that this is not a cascading delete of values of attributes in other tables but rather of objects (views and constraints) defined using the dropped column.

SECURITY

Ø A second advantage of a view is that it provides some level of security.

Ø It limits the columns and rows that a user can see, thus limiting access to sensitive data

Ø It is read only, thus preventing the alteration of any data values.

Ø A second level of security is provided through passwords that permit connection to the database

Ø A third level of security is provided by according certain privileges to certain passwords. This is done through the GRANT statement which takes the following form:

GRANT privileges

ON object

TO users [WITH GRANT OPTION);

OBJECTS

Ø Objects are things on which privileges are granted and include the following:

Ø DOMAIN domain_name

Ø [TABLE] table_name (table includes views in this case)

Ø CHARACTER SET character-set_name

Ø COLLATION collation_name

Ø TRANSLATION translation_name

Ø The key word TABLE is not required for the table object.

Ø The privileges granted must make sense for the object they are granted on.

PRIVILEGES

Ø Privileges is a comma separated list of privileges, or the phrase ALL PRIVILEGES, on the specified object for which the current userID (the one issuing the GRANT) has grant authority.

Ø USAGE

Ø (to use a specific domain, character set, collation, or translation).

Ø SELECT

Ø (access all columns of a named table, including columns added later).

Ø INSERT(x) or UPDATE(x)

Ø (insert into or update a specific column x of a named table).

Ø INSERT or UPDATE

Ø (update all columns of a named table, including columns added later).

Ø DELETE

Ø (delete rows from a specified table).

Ø REFERENCES(x) or REFERENCES

Ø (to impose any integrity constraint on column x or all columns of a named table. In the case of REFERENCES this includes all columns added later).

GRANT

Ø An owner of an object is automatically granted all privileges that make sense for that object.

Ø If a privilege is granted with grant option then the recipient has grant authority for that privilege, which means that the recipient can in turn grant that same privilege, with or without grant option, to some further user.

GRANT delete

ON employees

TO personnelassistant

GRANT select

ON employees

TO PUBLIC

Ø PUBLIC is a special key word meaning all userids known to the system at any given time.

REVOKE

Ø REVOKE is used to take privileges on objects away from users. The user issuing the revoke must be the user who granted the privileges in the first place. The statement has the general form:

Ø REVOKE privileges ON object FROM users

Ø Dropping a base table, column, or view automatically revokes all privileges on the dropped object from all users.

Ø If A grants a privilege to B and C on Table T1 with grant option, and both B and C grant the privilege to D on T1, and then later B revokes this same privilege from D, D still retains the privilege on T1 based on Cs granting of the privilege.

NULL VALUES

Ø Ambiguous: can mean dont know or doesnt exist (no/0) or not applicable

Ø Nulls in the argument of aggregate functions are eliminated before the aggregate function is applied, except in the case of COUNT(*), where nulls are handled just like non-null values.

Ø Effect on conditional expressions:

Ø If A is null, or B is null, or both are null, then the following all evaluate to unknown:

Ø A=B, A<B, A>B, A<>B, A<=B, A>=B

Ø Two nulls are not considered equal: A=A evaluates to unknown.

Ø This means that we need to adopt a three-valued truth condition for any expression involving nulls

Ø true, false, unknown (null) = AMBIGUITY

AND	t	U	f	OR	T	u	f	NOT
T	t	U	f	t	T	t	t	t	f
U	u	U	f	u	T	u	u	u	u
F	f	F	f	f	T	u	f	f	t

NULLS in other conditions

Ø y BETWEEN x AND z is equivalent to x <= y AND y <= x

Ø either component may evaluate to null with the result of either unknown (one or both null) or false (one false and the other either false or null) depending on the value of the other component expression.

Ø string LIKE pattern

Ø is null if either string or pattern is null

Ø x IN (a, b)

Ø is equivalent to x = a OR x = b. The result will be unknown unless one of the conditions is true. If all are false and at least one unknown then the overall result is unknown.

Ø All rows that evaluate to false or unknown are eliminated from the WHERE and HAVING clauses.

Ø All nulls are considered equal for purposes of ORDER BY, however whether they are greater or less than all non-null values is implementation defined.

Ø An integrity constraint is an expression that must not evaluate to false. Hence it is not considered to be violated when it evaluates to unknown (formally it is not known if the constraint is violated).

Ø For the WHERE clause, unknown = false. For domain and check constraints, unknown = true.

Ø For FK constraints unknown is generally interpreted as false.

Ø PK constraint is defined as including NOT NULL.

DESIGN TO AVOID NULLS