Database Memo

Database level SQL commands

Creating and deleting database

CREATE DATABASE testDB;
DROP DATABASE testDB;

Creating, altering and dropping table

CREATE TABLE Persons (
    PersonID int,
    LastName varchar(255),
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255) 
);
DROP TABLE Shippers;
TRUNCATE TABLE Shippers;
ALTER TABLE table_name ADD COLUMN column_name datatype;
ALTER TABLE table_name MODIFY COLUMN column_name datatype;
ALTER TABLE table_name DROP COLUMN column_name;

Data handling SQL commands

SELECT

SELECT * FROM Customers;
SELECT CustomerName, City FROM Customers;
SELECT DISTINCT Country FROM Customers;
SELECT CustomerID as ID, CustomerName AS Customer FROM Customers;
SELECT CustomerName, Address + ', ' + PostalCode + ' ' + City + ', ' + Country AS Address FROM Customers;

WHERE clause

SELECT * FROM Customers WHERE Country='Mexico';

WHERE conditions:

- columnName = value
- columnName LIKE 'pattern%'
- columnName IS NULL
- columnName IN ('listElement1', 'listElement2')
- columnName1 IN (SELECT columnName2 FROM table2)
- columnName BETWEEN x AND y
- EXISTS (SELECT columnName FROM table WHERE condition)
- columnName1 = ANY (SELECT columnName2 FROM table2 WHERE condition2)
- columnName1 = ALL (SELECT columnName2 FROM table2 WHERE condition2)
- condition1 AND condition2
- condition1 OR condition2
- NOT condition

TOP

SELECT TOP 3 * FROM Customers;
SELECT * FROM Customers LIMIT 3;
SELECT TOP 50 PERCENT * FROM Customers;

UNION

SELECT City FROM Customers UNION SELECT City FROM Suppliers;
SELECT City FROM Customers UNION ALL SELECT City FROM Suppliers;

Basic math

SELECT MIN(Price) AS SmallestPrice FROM Products;
SELECT MAX(Price) AS LargestPrice FROM Products;
SELECT COUNT(ProductID) FROM Products;
SELECT AVG(Price) FROM Products;
SELECT SUM(Quantity) FROM OrderDetails;

ORDER BY

SELECT * FROM Customers ORDER BY Country;
SELECT * FROM Customers ORDER BY Country ASC, CustomerName DESC;

GROUP BY, HAVING

SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country;
SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country HAVING COUNT(CustomerID) > 5;

Joins

Here are the different types of the JOINs in SQL:

  • JOIN, INNER JOIN: returns records that have matching values in both tables
innerjoin.gif
  • LEFT JOIN, LEFT OUTER JOIN: returns all records from the left table, and the matched records from the right table
leftjoin.gif
  • RIGHT JOIN, RIGHT OUTER JOIN: returns all records from the right table, and the matched records from the left table
rightjoin.gif
  • FULL JOIN, FULL OUTER JOIN: returns all records when there is a match in either left or right table
fullouterjoin.gif
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate FROM Orders JOIN Customers ON Orders.CustomerID=Customers.CustomerID;

UPDATE

UPDATE Customers SET ContactName = 'Alfred Schmidt', City= 'Frankfurt' WHERE CustomerID = 1;

INSERT/SELECT INTO

INSERT INTO Customers (CustomerName, City, Country) VALUES ('Bob', 'New York', 'USA');
INSERT INTO Customers (CustomerName, City, Country) SELECT SupplierName, City, Country FROM Suppliers;
SELECT * INTO CustomersGermany FROM Customers WHERE Country = 'Germany';

DELETE

DELETE FROM Customers WHERE CustomerName='Bob';

Comments

-- This is an SQL comment.

Advanced standard SQL concepts

Views

In SQL, a view is a virtual table based on the result-set of an SQL statement.

CREATE VIEW ViewName AS SELECT Column1, Column2 FROM TableName WHERE SomeColumn = SomeValue;
SELECT * FROM ViewName;
DROP VIEW ViewName;

Constraints

CREATE TABLE table_name (
    column1 datatype constraint,
    column2 datatype constraint,
    column3 datatype constraint,
    ....
);

The following constraints are commonly used in SQL:

  • NOT NULL - ensures that a column cannot have a NULL value
  • UNIQUE - ensures that all values in a column are different
  • PRIMARY KEY - a combination of a NOT NULL and UNIQUE. Uniquely identifies each row in a table
  • FOREIGN KEY - uniquely identifies a row/record in another table
  • CHECK - ensures that all values in a column satisfies a specific condition
  • DEFAULT - sets a default value for a column when no value is specified
CREATE TABLE Persons (
    ID int NOT NULL PRIMARY KEY,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int CHECK (Age>=18)
    City varchar(255) DEFAULT 'Sandnes'
    CONSTRAINT UC_Person UNIQUE (ID,LastName)
);

CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    OrderDate date DEFAULT GETDATE()
    PersonID int,
    PRIMARY KEY (OrderID),
    CONSTRAINT FK_PersonOrder FOREIGN KEY (PersonID)
    REFERENCES Persons(PersonID)
);

ALTER TABLE Orders ADD CONSTRAINT FK_PersonOrder FOREIGN KEY (PersonID) REFERENCES Persons(PersonID);

Indexes

Indexes are special lookup tables that the database search engine can use to speed up data retrieval.

CREATE [UNIQUE] INDEX index_name ON table_name (column1, column2, ...);
DROP INDEX index_name;

Automatic indexes for primary keys and unique constraints. Tables where indexes are stored is platform dependent, e.g. in Oracle: ALL_INDEXES, USER_IND_COLUMNS.

PL/SQL

Oracle's procedural extension of SQL.

Basic syntax

-- comment
DECLARE 
   declarations;
BEGIN 
   statements;
END;
/

Output

dbms_output.enable
dbms_output.put_line('String: ' || value);

Declarations

variable_name [CONSTANT] datatype [NOT NULL] [:= | DEFAULT initial_value]

Most common types:

INTEGER, REAL, VARCHAR2(length), BOOLEAN, DATE, TIMESTAMP

User defined type:

SUBTYPE newType IS type;

Unknown data:

NULL

Scope

DECLARE 
    -- Global variables  
    declarations;
BEGIN
    statements;
    DECLARE
        -- Local variables 
        declarations;
    BEGIN 
        statements;
    END;  
END;

Column type

DECLARE 
    variable table.column%type;
BEGIN 
    SELECT column INTO variable FROM table where id=15;
    ...
END;

Common string functions:

CONCAT(x, y), LENGTH(x), LOWER(x), TRIM(x), UPPER(x)

Collections

Associative array: key-value pairs.

TYPE typeName IS TABLE OF elementType INDEX BY subscriptType; 
myAssicateiveArray(mySubscript):= myElement;

Nested table: dynamic increase, can be sparse.

TYPE typeName IS TABLE OF elementType;

Variable size array:

TYPE typeName IS VARRAY(n) of elementType;

Methods:

EXISTS(n), COUNT, LIMIT, FIRST, LAST, NEXT(n), DELETE, ...

Conditions

IF condition1 THEN  
    statements; 
ELSIF condition2 THEN 
    statements; 
ELSE  
    statements; 
END IF; 

CASE selector 
    WHEN 'value1' THEN statements1; 
    WHEN 'value2' THEN statements2; 
    ... 
    ELSE statementsN;  -- default case 
END CASE;

CASE 
    WHEN selector = 'value1' THEN statements1; 
    WHEN selector = 'value2' THEN statements2; 
    ...
    ELSE statementsN;  -- default case 
END CASE;

Loop statements

LOOP
    ...
        EXIT;
    ...
        CONTINUE;
    ...
END LOOP; 

WHILE condition LOOP 
    statements; 
END LOOP; 

FOR counter IN initial_value .. final_value LOOP 
    statements; 
END LOOP;

<< label >>
statements;
...
GOTO label;

Procedures, functions

CREATE [OR REPLACE] PROCEDURE procedureName(parameters) {IS | AS} 
BEGIN
    statements;
END procedureName;

DROP PROCEDURE procedureName; 

CREATE [OR REPLACE] FUNCTION functionName(parameters) RETURN return_datatype {IS | AS} 
BEGIN 
    statements;
END functionName;

Parameters:

parameterName [IN | OUT | IN OUT] type [, ...]

Execution:

EXECUTE procedure_name;
EXECUTE function_name(parameters);
BEGIN
     ...
     procedure_name;
     result := function_name(parameters);
     ...
END;

Cursors

Implicit cursors:

SQL%FOUND, SQL%NOTFOUND, SQL%ISOPEN, SQL%ROWCOUNT

Explicit cursors:

DECLARE 
    var1 table.column1%type; 
    var2 table.column2%type; 
    CURSOR cursor_name IS SELECT column1, column2 FROM table; 
BEGIN 
    OPEN cursor_name;
    LOOP
        FETCH cursor_name INTO var1, var2; 
        EXIT WHEN cursor_name%notfound; 
        dbms_output.put_line(var1 || ' ' || var2); 
    END LOOP; 
    CLOSE cursor_name;
END;

Records

DECLARE 
    myRecord table%rowtype; 
BEGIN 
    SELECT * into myRecord FROM table WHERE ...;  
    dbms_output.put_line(myRecord.column1 || ' ' || myRecord.column1); 
END;

Or using cursor. User defined record:

TYPE typeName IS RECORD (
    fieldName1 datatype1,
    fieldName2 datatype2,
    ...
    fieldNameN datatypeN
);
recordName typeName;

Exceptions

DECLARE
    declarations;
    myException EXCEPTION; 
BEGIN 
    statements;
    IF condition THEN 
        RAISE myException; 
    END IF; 
EXCEPTION 
    WHEN exception1 THEN  
        statements1;
    ...
    WHEN exceptionN THEN 
        statementsN;
    WHEN myException THEN 
        statementsMy;
END;

Predefined exceptions:

CASE_NOT_FOUND, COLLECTION_IS_NULL, INVALID_CURSOR, INVALID_NUMBER, LOGIN_DENIED, NO_DATA_FOUND, PROGRAM_ERROR, ZERO_DIVIDE, ...

Triggers

CREATE [OR REPLACE ] TRIGGER trigger_name {BEFORE | AFTER | INSTEAD OF} 
{INSERT [OR] | UPDATE [OR] | DELETE} [OF col_name] ON table_name  
[REFERENCING OLD AS o NEW AS n] [FOR EACH ROW]  
WHEN (condition)   
DECLARE 
    declarations; 
BEGIN  
    statements 
EXCEPTION
    exceptionHandlingStatements 
END;

Transactions

COMMIT;
ROLLBACK;
SAVEPOINT < savepoint_name >;
...
ROLLBACK [TO SAVEPOINT < savepoint_name>];

Packages

CREATE PACKAGE myPackage AS
    PROCEDURE myProcedure(parameters);
    PROCEDURE ...;
END myPackage;

CREATE OR REPLACE PACKAGE BODY myPackage AS  
    PROCEDURE myProcedure(parameters) IS 
        declarations;
    BEGIN
        ...
    END myProcedure;
    ...
END myPackage;

Object-orientation

CREATE OR REPLACE TYPE MyClass AS OBJECT (
    myAttribute1 type1;
    myAttribute2 type2;    
    MEMBER FUNCTION myFunction(parameters) RETURN type;
    MEMBER PROCEDURE myProcedure;
); 

CREATE OR REPLACE TYPE BODY MyClass AS 
    MEMBER FUNCTION myFunction(parameters) RETURN type IS 
    BEGIN 
        statements;
        RETURN result;
    END myFunction;

    MEMBER PROCEDURE myProcedure IS
    BEGIN 
        statements;
    END myProcedure;
END;

DECLARE
    myInstance MyClass;
BEGIN 
    myInstance1 := myClass(param1, param2); 
    ...
END;

Comparison

MAP MEMBER FUNCTION measure RETURN NUMBER; -- absolute value
ORDER MEMBER FUNCTION measure(other MyClass) RETURN NUMBER; -- -1, 0 or +1

myInstance1 MyClass;
myInstance2 MyClass;
...
IF (myInstance1 > myInstance2) ...

Inheritance

CREATE OR REPLACE TYPE MyChildClass UNDER MyClass(   
    myAttribute3 type3, 
    OVERRIDING MEMBER PROCEDURE myProcedure
) 

CREATE OR REPLACE TYPE BODY MyChildClass AS 
    OVERRIDING MEMBER PROCEDURE myProcedure IS 
    BEGIN
        statements;
    END myProcedure;

Abstract class

CREATE OR REPLACE TYPE MyAbstractClass AS OBJECT (
    myAttribute type,
    NOT INSTANTIABLE NOT FINAL MEMBER PROCEDURE myAbstractprocedure
) NOT INSTANTIABLE NOT FINAL

T-SQL

TODO: elaborate

NoSQL

TODO: elaborate

Theory

Normal forms:

  • Unnormalized:
    • Definition: multiple values in a column.
    • Example: comma separated phone numbers.
  • First normal form (1NF):
    • Definition: atomic columns.
    • Example that violates 1NF: comma separated phone numbers.
    • Possible solutions:
      • More columns: Phone number 1, Phone number 2, … (formally 1NF but contains repeated group)
      • Repeated entries
      • Two tables, first one containing name and other one phone numbers, with the foreign key ID.
  • Second normal form (2NF):
    • Definition: 1NF + every non-prime attribute of the relation is dependent on the whole of every candidate key.
    • Example that violates 2NF:
      • Columns: Manufacturer, Model, Country
      • Primary key: Manufacturer + Model
      • Dependency: Country depends on Manufacturer
    • Solution: two tables:
      • Manufacturer, Country
      • Manufacturer, Model
  • Third normal form (3NF):
    • Definition: 2NF + every non-prime attribute of R is non-transitively dependent on every key of R.
    • Example:
      • Columns: Tournament, Year, Winner, Winner birth date.
      • Primary key: Tournament + Year.
      • Dependencies:
        • Winner depends on Tournament + Year
        • Winner birth date depends on Winner.
    • Solution: two tables:
      • Tournament, Year, Winner
      • Winner, Winner birth date

Database engines

  • Oracle with SQL developer
  • MySQL with MySQL Workbench
  • PostgreSQL with pgAdmin
  • MS SQL with Microsoft SQL Server Management Studio
  • MongoDB

The most common tasks of the database management tools are the following:

  • view and modify data,
  • view constraints, indexes, triggers, views etc.,
  • export-import,
  • generate SQL code,
  • draw data model,
  • manage permissions,
  • and many more.

Big data

Apache Hadoop

It provides a software framework for distributed storage (HDFS) and processing of big data (MapReduce).

MapReduce algorithm. The input is divided into several parts, they are processed parallel, and the result is combined.

  • map: <key1, value1> → list (<key2, value2>)
  • combine: list of values of the same key (automatically done by the framework)
  • reduce: <key2, list (<value2>)> → list (<key3, value3>)

The map functions can run parallel, and so the reduce functions as well.

Apache Spark

  • Spark Core: cluster management (including Hadoop), distributed task dispatch, RDD (Resilient Distributed Dataset), …
  • Spark SQL: Data Frames
  • Spark Streaming: e.g. connecting to Kafka
  • MLib: machine learning
  • GraphX: graph processing framework
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License