SQL is a programming language designed to manipulate and manage data stored in relational databases.
- A relational database is a database that organizes information into one or more tables.
- A table is a collection of data organized into rows and columns. A statement is a string of characters that the database recognizes as a valid command.
CREATE: Creates new tableINSERT INTO: adds a new row to a tableSELECT: queries data from a table.ALTER TABLE: changes an existing table.UPDATE: edits a row in a table.DELETE FROM: deletes rows from a table.
- Integer
- Text
- Data
- Real
CREATE TABLE table_name( column_name_1 data_type, column_name_2 data_type, column_name_3 data_type);
INSERT INTO table_name( column_name_1,column_name_2,column_3) VALUES (a1,b1,c1),(a2,b2,c2),(a3,b3,c3);
SELECT * FROM table_name;,SELECT column_name FROM table_name;
ALTER TABLE table_name ADD COLUMN new_column_name TEXT;
UPDATE table_name SET (to_be_modified) WHERE (loc_conditon);
DELETE FROM table_name WHERE (condition);
Constraints that add information about how a column can be used are invoked after specifying the data type for a column. They can be used to tell the database to reject inserted data that does not adhere to a certain restriction. For example:
CREATE TABLE table_name( column_name_1 INTEGER PRIMARY KEY, column_name_2 TEXT UNIQUE, column_name_3 TEXT NOT NULL, column_name_$ TEXT DEFAULT 'Not Applicable' );We found that the constrains words likePRIMARY KEY,UNIQUE,NOT NULL,DEFAULT (VALUE).
One of the core purposes of the SQL language is to retrieve information stored in a database. This is commonly referred to as querying. Queries allow us to communicate with the database by asking questions and having the result set return data relevant to the question.
Reviews:
SELECTis the clause we use every time we want to query information from a database.ASrenames a column or table.DISTINCTreturn unique values.WHEREis a popular command that lets you filter the results of the query based on conditions that you specify.LIKEandBETWEENare special operators.ANDandORcombines multiple conditions.ORDER BYsorts the result.LIMITspecifies the maximum number of rows that the query will return.CASEcreates different outputs.
AS is a keyword in SQL that allows you to rename a column or table using an alias. Some important things to note:
- Although it’s not always necessary, it’s best practice to surround your aliases with single quotes.
- When using
AS, the columns are not been renamed in the table. The aliases only appear in the result.
DISTINCT is used to return unique values in the output. It filters out all duplicate values in the specified column(s).
DISTINCT is used to return unique values in the output. It filters out all duplicate values in the specified column(s). Comparison operators used with the WHERE clause are:
=equal to!=not equal to>greater than<less than>=greater than or equal to<=less than or equal to
LIKE can be a useful operator when you want to compare similar values.
- The
_means you can substitute any individual character here without breaking the pattern.For example:- The names Seven and Se7en both match the pattern:
LIKE 'Se_ev'.
- The names Seven and Se7en both match the pattern:
- The
%is a wildcard character that matches zero or more missing letters in the pattern.For example:LIKE 'A%'matches all words with names that begin with letter ‘A’LIKE '%a'matches all words that end with ‘a’.- We can also use % both before and after a pattern:
LIKE '%xxx%'
More often than not, the data you encounter will have missing values.we will have to use these operators:
IS NULLIS NOT NULL
The BETWEEN operator can be used in a WHERE clause to filter the result set within a certain range.
The values can be numbers, text or dates.interesting point to emphasize:
BETWEENtwo letters is not inclusive of the 2nd letter.For example:SELECT * FROM movies WHERE name BETWEEN 'A' AND 'J';The results merely return a series of movie name with a letter of 'A'-'I'.
BETWEENtwo numbers is inclusive of the 2nd number.For example:SELECT * FROM movies WHERE year BETWEEN 1990 AND 1999;The results return with the year between 1990 and 1999.
Sometimes we want to combine multiple conditions in a WHERE clause to make the result set more specific and useful.
With AND, both conditions must be true for the row to be included in the result.
Similar to AND, the OR operator can also be used to combine multiple conditions in WHERE
With OR, if any of the conditions are true, then the row is added to the result.
ORDER BY is a clause that indicates you want to sort the result set by a particular column.(can have more than one column)
The column that we ORDER BY doesn’t even have to be one of the columns that we’re displaying.
LIMIT is a clause that lets US specify the maximum number of rows the result set will have.
This saves space on our screen and makes our queries run faster.
Here, we specify that the result set can’t have more than 10 rows.
LIMIT always goes at the very end of the query. Also, it is not supported in all SQL databases.
A CASE statement allows us to create different outputs (usually in the SELECT statement). It is SQL’s way of handling if-then logic.For example:
SELECT name,CASE WHEN genre='romance' THEN 'Chill' WHEN genre='comedy' THEN 'Chill' ELSE 'Intense' END AS 'Mood' From movies;
Calculations performed on multiple rows of a table are called aggregates.
Reviews:
COUNT(): count the number of rowsSUM(): the sum of the values in a columnMAX()/MIN(): the largest/smallest valueAVG(): the average of the values in a columnROUND(): round the values in the columnGROUP BYis a clause used with aggregate functions to combine data from one or more columns.HAVINGlimit the results of a query based on an aggregate property.
COUNT() is a function that takes the name of a column as an argument and counts the number of non-empty values in that column.
SUM() is a function that takes the name of a column as an argument and returns the sum of all the values in that column.
The MAX() and MIN() functions return the highest and lowest values in a column, respectively.
The AVG() function works by taking a column name as an argument and returns the average value for that column.
ROUND() function takes two arguments inside the parenthesis:
- A column name
- A column name
GROUP BY is a clause in SQL that is used with aggregate functions. It is used in collaboration with the SELECT statement to arrange identical data into groups.The GROUP BY statement comes after any WHERE statements, but before ORDER BY or LIMIT.
SQL lets us use column reference(s) in our GROUP BY that will make our lives easier:
1is the first column selected2is the second column selected3is the third column selected
HAVING allows you to filter which groups to include and which to exclude.
HAVING statement always comes after GROUP BY, but before ORDER BY and LIMIT.
In order to efficiently store data, we often spread related information across multiple tables.Such as:
- the
orderswould contain just the information necessary to describe what was ordered:- order_id, customer_id, subscription_id, purchase_date
- subscriptions would contain the information to describe each type of subscription:
- subscription_id, description, price_per_month, subscription_length
- customers would contain the information for each customer:
- customer_id, customer_name, address
Reviews:
JOINwill combine rows from different tables if the join condition is true.LEFT JOINwill return every row in the left table, and if the join condition is not met, NULL values are used to fill in the columns from the right table.- Primary key is a column that serves a unique identifier for the rows in the table.
- Foreign key is a column that contains the primary key to another table.
CROSS JOINlets us combine all rows of one table with all rows of another table.UNIONstacks one dataset on top of another.WITHallows us to define one or more temporary tables that can be used in the final query.
Because column names are often repeated across multiple tables, we use the syntax table_name.column_name to be sure that our requests for columns are unambiguous. In our example, we use this syntax in the ON statement, but we will also use it in the SELECT or any other statement where we refer to column names.
When we perform a simple JOIN (often called an inner join) our result only includes rows that match our ON condition.
A left join will keep all rows from the first table, regardless of whether there is a matching row in the second table.
The final result will keep all rows of the left table but will omit the un-matched row.
Each of tables has a column that uniquely identifies each row of that table, for example:
order_idfororderssubscription_idforsubscriptionscustomer_idforcustomersThese special columns are called primary keys.Primary keys have a few requirements:- None of the values can be NULL.
- Each value must be unique (i.e., you can’t have two customers with the same customer_id in the customers table).
- A table can not have more than one primary key column. When the primary key for one table appears in a different table, it is called a foreign key.
- So
customer_idis a primary key when it appears incustomers, but a foreign key when it appears inorders. The most common types of joins will be joining a foreign key from one table with the primary key from another table. For instance: - When we join
ordersandcustomers, we join oncustomer_id, which is a foreign key inordersand the primary key incustomers.
Combining all rows of one table with all rows of another table.
Notice that cross joins don’t require an ON statement. You’re not really joining on any columns!
Suppose we have 3 different shirts (white, grey, and olive) and 2 different pants (light denim and black), the results might look like this: 3 shirts × 2 pants = 6 combinations
- A more common usage of CROSS JOIN is when we need to compare each row of a table to a list of values.
Sometimes we just want to stack one dataset on top of the other. Well, the UNION operator allows us to do that.
SQL has strict rules for appending data:
- Tables must have the same number of columns.
- The columns must have the same data types in the same order as the first table.
Often times, we want to combine two tables, but one of the tables is the result of another calculation.
- The
WITHstatement allows us to perform a separate query (such as aggregating customer’s subscriptions) previous_resultsis the alias that we will use to reference any columns from the query inside of the WITH clause- We can then go on to do whatever we want with this temporary table (such as join the temporary table with another table) Essentially, we are putting a whole first query inside the parentheses () and giving it a name. After that, we can use this name as if it’s a table and write a new query using the first query.
With previous_results_name[a temporary table using aggregation] AS(SELECT.....) SELECT ....FROM...JOIN...ON ...=...;