Home » Top SQL Queries for Data Analysis

Top SQL Queries for Data Analysis

Last updated 2 years ago

Improving your data analysis skills is possible by mastering advanced SQL techniques, which can assist in deeper exploration of datasets leading to valuable insights. This article highlights seven advanced SQL queries that can be helpful for data analysis. Each query will be presented along with its results, syntax details and an explanation of how the database computes it.

Top SQL Queries for Data Analysis

1. Grouping Data by Time Period

For the 1st Example, let us scrutinize information found in the sales table. The said data set contains a unique identifier for every product, the date when the deal has transpired, and the overall value of the transaction.

id	product_id	amount	date
1	1	120	2023-04-01
2	112	310	2023-03-11
3	810	520	2023-04-08
4	11	410	2023-02-26
5	810	520	2023-04-08
6	11	410	2023-04-13

Avery common in analysis scenario required data like specific date periods, such as month or year.

SELECT EXTRACT(YEAR FROM date) AS year, EXTRACT(MONTH FROM date) AS month, SUM(amount) AS total_sales_amount FROM sales GROUP BY year, month ORDER BY year, month;

Output:

year	month	total_sales_amount
2023	2	410
2023	3	310
2023	4	1570

Using the EXTRACT() function, we retrieve the year and month from the date field in this query. The data is then grouped based on the new year and month fields, and the total sales for each group are calculated by employing the SUM() function.

2. Ranking Data

Ranking information is frequently needed for complex data analysis purposes. For instance, you may want to rank your merchandise based on their sales figures to identify which items generate the majority of the earnings, or you may need to rank your outlets based on their subpar sales figures to pinpoint which locations are performing poorly.

Utilizing functions such as RANK() or DENSE_RANK() as window functions can allocate rankings to rows based on the value of a particular column. To begin, we shall examine the RANK() function:

SELECT product_id, SUM(amount) AS total_sales_amount, RANK() OVER(ORDER BY SUM(amount) DESC) AS rank FROM sales GROUP BY id;

Output:

product_id	total_sales_amount	rank
810	520	1
810	520	1
11	410	3
11	410	3
112	310	5
1	120	6

3. Difference Between two Rows

At times, it becomes necessary to juxtapose the values of successive rows, such as when retrieving sales figures from the preceding month to measure against those of the current month. By having data on sales from both the current and previous months at hand, it becomes possible to calculate the variance between the two values.

To accomplish this, you may use window functions such as LAG() and LEAD(). For this i will use a new table named sales_stats.

id	total	month	year
3	10	1	2022
4	2	2	2022
1	10	1	2023
2	20	2	2023

Using the LAG() function, this inquiry computes the variance between the sales of the present month and the sales of the preceding month.

SELECT year, month, total, LAG(total) OVER (ORDER BY year, month) AS previous_month, total-LAG(total) OVER (ORDER BY year, month) AS sales_difference FROM sales_stats ORDER BY year, month;

Output:

year	month	total	previous_month	sales_difference
2022	1	10	NULL	NULL
2022	2	2	10	-8
2023	1	10	2	8
2023	2	20	10	10

LAG() is a window function which retrieves the total amount from the preceding row, sorted by year and month, and assigns an alias of previous_month to that column.

The sales_difference column is obtained by deducting the previous_month value from the total in each row. In the initial row, NULL values are present because there is no previous row.

4. Calculate the Moving Average

Using a moving average, which calculates the mean of a given value and a set number of its preceding values, can be more beneficial in studying sales trends through data analysis rather than relying solely on individual data points. This method is also referred to as a rolling average.

For this example, i will use total_sales table.

total_sales

id	total_sales	month	year
1	320	1	2022
2	400	2	2022
3	200	3	2022
4	600	4	2022
5	650	5	2022
6	50	6	2022
7	623	7	2022
8	700	8	2022

let's calculate the average

SELECT year, month, total_sales, AVG(total_sales) OVER(ORDER BY year, month ROWS BETWEEN 2 PRECEDING and CURRENT ROW) AS moving_average FROM total_sales ORDER BY year, month;

Output:

year	month	total_sales	moving_average
2022	1	320	320.0000
2022	2	400	360.0000
2022	3	200	306.6667
2022	4	600	400.0000
2022	5	650	483.3333
2022	6	50	433.3333
2022	7	623	441.0000
2022	8	700	457.6667

5. Calculate the Running Total

A cumulative sum, also known as a running total or cumulative total, sums up a series of numbers by adding each new value to the previous running total.

Accumulating amounts are valuable in computing the combined influence of data elements as time progresses. One instance where it can be utilized is in computing the combined count of web visitors up to a specific moment to comprehend your website's user expansion.

For this example i will use the same table of previous point. Table name total_sales

SELECT year, month, total_sales, SUM(total_sales) OVER(ORDER BY year, month) AS cumulative_total FROM total_sales
ORDER BY year, month;

Output:

year	month	total_sales	cumulative_total
2022	1	320	320
2022	2	400	720
2022	3	200	920
2022	4	600	1520
2022	5	650	2170
2022	6	50	2220
2022	7	623	2843
2022	8	700	3543

Changing Seconds to a Tim...

You’d like to display a number of seconds as a time value in hours, minutes, and seconds in MySQL.

Read It

An Introduction To CSS Pr...

CSS or Cascading Style Sheets have become a really popular programming language in recent times and...

Read It

----Advertisement----