Top SQL Queries for Data Analysis

Last updated a year ago

...

Improving your data analysis skills is possible by mastering advanced SQL techniques, which can assist in deeper exploration of datasets leading to valuable insights. This article highlights seven advanced SQL queries that can be helpful for data analysis. Each query will be presented along with its results, syntax details and an explanation of how the database computes it.

 

Top SQL Queries for Data Analysis

 

1. Grouping Data by Time Period

For the 1st Example, let us scrutinize information found in the sales table. The said data set contains a unique identifier for every product, the date when the deal has transpired, and the overall value of the transaction.

idproduct_idamountdate
111202023-04-01
21123102023-03-11
38105202023-04-08
4114102023-02-26
58105202023-04-08
6114102023-04-13

 

Avery common in analysis scenario required data like specific date periods, such as month or year.

SELECT EXTRACT(YEAR FROM date) AS year, EXTRACT(MONTH FROM date) AS month, SUM(amount) AS total_sales_amount FROM sales GROUP BY year, month ORDER BY year, month;

Output:

yearmonthtotal_sales_amount
20232410
20233310
202341570

 

Using the EXTRACT() function, we retrieve the year and month from the date field in this query. The data is then grouped based on the new year and month fields, and the total sales for each group are calculated by employing the SUM() function.

 

2. Ranking Data

Ranking information is frequently needed for complex data analysis purposes. For instance, you may want to rank your merchandise based on their sales figures to identify which items generate the majority of the earnings, or you may need to rank your outlets based on their subpar sales figures to pinpoint which locations are performing poorly.

Utilizing functions such as RANK() or DENSE_RANK() as window functions can allocate rankings to rows based on the value of a particular column. To begin, we shall examine the RANK() function:

SELECT product_id, SUM(amount) AS total_sales_amount, RANK() OVER(ORDER BY SUM(amount) DESC) AS rank FROM sales GROUP BY id;

Output:

product_idtotal_sales_amountrank
8105201
8105201
114103
114103
1123105
11206

 

3. Difference Between two Rows

At times, it becomes necessary to juxtapose the values of successive rows, such as when retrieving sales figures from the preceding month to measure against those of the current month. By having data on sales from both the current and previous months at hand, it becomes possible to calculate the variance between the two values.

To accomplish this, you may use window functions such as LAG() and LEAD(). For this i will use a new table named sales_stats.

idtotalmonthyear
31012022
4222022
11012023
22022023

 

Using the LAG() function, this inquiry computes the variance between the sales of the present month and the sales of the preceding month.

SELECT year, month, total, LAG(total) OVER (ORDER BY year, month) AS previous_month, total-LAG(total) OVER (ORDER BY year, month) AS sales_difference FROM sales_stats ORDER BY year, month;

Output:

yearmonthtotalprevious_monthsales_difference
2022110NULLNULL
20222210-8
202311028
20232201010

 

LAG() is a window function which retrieves the total amount from the preceding row, sorted by year and month, and assigns an alias of previous_month to that column.

The sales_difference column is obtained by deducting the previous_month value from the total in each row. In the initial row, NULL values are present because there is no previous row.

 

4. Calculate the Moving Average

Using a moving average, which calculates the mean of a given value and a set number of its preceding values, can be more beneficial in studying sales trends through data analysis rather than relying solely on individual data points. This method is also referred to as a rolling average.

For this example, i will use total_sales table.

total_sales

idtotal_salesmonthyear
132012022
240022022
320032022
460042022
565052022
65062022
762372022
870082022

let's calculate the average

SELECT year, month, total_sales, AVG(total_sales) OVER(ORDER BY year, month ROWS BETWEEN 2 PRECEDING and CURRENT ROW) AS moving_average FROM total_sales ORDER BY year, month;

Output:

yearmonthtotal_salesmoving_average
20221320320.0000
20222400360.0000
20223200306.6667
20224600400.0000
20225650483.3333
2022650433.3333
20227623441.0000
20228700457.6667

 

5. Calculate the Running Total

A cumulative sum, also known as a running total or cumulative total, sums up a series of numbers by adding each new value to the previous running total.

Accumulating amounts are valuable in computing the combined influence of data elements as time progresses. One instance where it can be utilized is in computing the combined count of web visitors up to a specific moment to comprehend your website's user expansion.

For this example i will use the same table of previous point. Table name total_sales

SELECT year, month, total_sales, SUM(total_sales) OVER(ORDER BY year, month) AS cumulative_total FROM total_sales
ORDER BY year, month;

Output:

yearmonthtotal_salescumulative_total
20221320320
20222400720
20223200920
202246001520
202256502170
20226502220
202276232843
202287003543

 

 


Changing Seconds to a Tim...

You’d like to display a number of seconds as a time value in hours, minutes, and seconds in MySQL.

Read It

An Introduction To CSS Pr...

CSS or Cascading Style Sheets have become a really popular programming language in recent times and...

Read It



----Advertisement----