SQL | Top-N Queries

Top-N analysis in SQL is about limiting the number of rows returned from an ordered result set. A Top-N query returns the N smallest or N largest values of a column. Returning only these rows saves time and reduces complexity. Top-N analysis is useful when you want to display only the n lowest or highest records in a table according to some condition, and the result set is then available for further analysis.

A query is a request for data from a particular table or group of tables. Query languages let you write both simple and complex queries. A query can filter rows by specific criteria, and you can also use queries to automate data administration tasks or perform calculations.

For example, using Top-N analysis we can perform the following types of queries: 

The top five products with the highest sales over the previous two months.
The two lowest-scoring students among those who scored less than 50% in their final semester exams.
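As a minimal sketch of the idea, the snippet below sorts a table and keeps only the first few rows. It assumes SQLite (through Python's sqlite3 module) as a stand-in database, and the sales table, its columns, and its figures are invented for illustration. LIMIT is SQLite's row-limiting syntax; Oracle Database 12c and later use fetch first n rows only for the same purpose.

```python
import sqlite3

# Hypothetical sales table; names and amounts are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("pen", 120), ("ink", 75), ("pad", 300), ("clip", 40), ("tape", 210)],
)

# Top 3: sort descending, then keep the first three rows.
top3 = conn.execute(
    "SELECT product, amount FROM sales ORDER BY amount DESC LIMIT 3"
).fetchall()

# Bottom 2: the same idea with the sort reversed.
bottom2 = conn.execute(
    "SELECT product, amount FROM sales ORDER BY amount ASC LIMIT 2"
).fetchall()

print(top3)
print(bottom2)
```

Without the ORDER BY, the rows that survive the limit are arbitrary, so the sort is what makes this a Top-N query.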

Get Set, Get Ready

Before we start building queries, make sure you are familiar with the basic features of SQL. Vertabelo Academy is an excellent resource for SQL knowledge, especially the Standard SQL Functions Course. Now let's get to work by taking a look at the data in a basic sales table.

Top-N Distinct

A distinct Top-N query returns every row whose value is among the N highest distinct values. The DENSE_RANK analytic function can be used to build it. DENSE_RANK works in the same way as RANK but assigns ranks without gaps: tied rows share a rank, and the next distinct value gets the next consecutive rank. Use this query type when you want all of the rows that hold one of the top N distinct values.
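The sketch below shows one way this could look, assuming SQLite 3.25 or later (which supports window functions) through Python's sqlite3 module. The sales table and its values are hypothetical.

```python
import sqlite3

# Hypothetical data: two sales tie at 100 and two tie at 80.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 90), (4, 80), (5, 80), (6, 70)],
)

# DENSE_RANK numbers the distinct amounts 1, 2, 3, ... with no gaps,
# so "rnk <= 2" keeps every row holding one of the top 2 distinct amounts.
top2_distinct = conn.execute("""
    SELECT id, amount, rnk
    FROM (
        SELECT id, amount,
               DENSE_RANK() OVER (ORDER BY amount DESC) AS rnk
        FROM sales
    )
    WHERE rnk <= 2
    ORDER BY rnk, id
""").fetchall()

print(top2_distinct)
```

Three rows come back for N = 2 because both 100-dollar sales share rank 1 and the 90-dollar sale takes rank 2.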

Top-N Non-Distinct with Ranked Values

A non-distinct Top-N query returns the rows ranked in the top N positions, including all rows tied on a value. We use the RANK function to implement this. It assigns a rank to each row within the specified window, based on the sort order.

The RANK function gives tied values the same rank, then skips the ranks those ties would have occupied, so each rank still reflects how many rows come before it. If there were three 100-dollar sales at the top, RANK would put all three rows in first place, and the next lower sale would get rank 4.
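The three-way tie described above can be reproduced with a small sketch. It assumes SQLite 3.25 or later through Python's sqlite3 module, and the sales rows are invented for illustration.

```python
import sqlite3

# Hypothetical data: three 100-dollar sales, then two lower ones.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 100), (4, 90), (5, 80)],
)

# RANK gives tied rows the same rank and leaves a gap after them.
ranked = conn.execute("""
    SELECT id, amount,
           RANK() OVER (ORDER BY amount DESC) AS rnk
    FROM sales
    ORDER BY rnk, id
""").fetchall()

# Keeping ranks 1 to 3 returns only the three tied rows,
# because the 90-dollar sale is ranked 4, not 2.
top3 = [row for row in ranked if row[2] <= 3]

print(ranked)
print(top3)
```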

How to select the top-N rows per group with SQL in Oracle Database

Limiting a whole result set to its first N rows is straightforward; if you are using an older version of Oracle Database, you can use the rownum trick. But what if you want the top-N rows in each group, such as the three most recent orders for each customer? SQL has no native syntax for this. This post will show you how to:

Find the top-N rows in each group
Get the first row in each group
Deal with ties
Write a reusable top-N per group function using SQL macros

You can download the Customer Orders schema from emuktech.com. This Live SQL script contains the scripts needed to complete this post.

How to find the top-N rows in each group

Two steps are required to determine the top N rows of a group.

Assign row numbers starting at one for each group

Use the handy row_number() function to assign row numbers. It gives each row a sequential integer, following the sort you specify in the over clause. To restart the numbering at 1 for each group, add a partition by clause inside the over clause.
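Here is a minimal sketch of this first step, assuming SQLite 3.25 or later through Python's sqlite3 module. The orders table is a hypothetical, cut-down stand-in for co.orders with invented rows.

```python
import sqlite3

# Hypothetical orders: customer 1 has four orders, customer 2 has two.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 1, "2024-01-05"), (2, 1, "2024-02-10"),
        (3, 1, "2024-03-15"), (4, 1, "2024-04-20"),
        (5, 2, "2024-01-07"), (6, 2, "2024-03-01"),
    ],
)

# PARTITION BY restarts the numbering at 1 for each customer;
# ORDER BY ... DESC gives the most recent order number 1.
numbered = conn.execute("""
    SELECT customer_id, order_id,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id
               ORDER BY order_datetime DESC
           ) AS rn
    FROM orders
    ORDER BY customer_id, rn
""").fetchall()

print(numbered)
```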

Restrict the results to the first N in each group

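After generating the row numbers, you need to filter on them. You cannot reference a window function in the where clause of the same query, so the numbering logic must be placed in a subquery. You can use an inline view or a with clause (a common table expression, or CTE). I prefer CTEs because they are easier to understand.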
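Putting both steps together might look like the sketch below, which numbers the rows in a CTE and filters in the outer query. It assumes SQLite 3.25 or later through Python's sqlite3 module; the orders table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical orders: customer 1 has four orders, customer 2 has two.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 1, "2024-01-05"), (2, 1, "2024-02-10"),
        (3, 1, "2024-03-15"), (4, 1, "2024-04-20"),
        (5, 2, "2024-01-07"), (6, 2, "2024-03-01"),
    ],
)

# Step 1 (the CTE) numbers each customer's orders, newest first.
# Step 2 (the outer query) keeps the three most recent per customer.
top3_per_customer = conn.execute("""
    WITH numbered AS (
        SELECT customer_id, order_id,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY order_datetime DESC
               ) AS rn
        FROM orders
    )
    SELECT customer_id, order_id
    FROM numbered
    WHERE rn <= 3
    ORDER BY customer_id, rn
""").fetchall()

print(top3_per_customer)
```

Customer 2 has only two orders, so only two rows come back for that group; the filter sets an upper bound, not an exact count.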

How to find the first row in each group

If you only need the most recent order date for each customer, a plain group by does the job:

select customer_id, max ( order_datetime ) from co.orders group by customer_id

The challenge is adding other columns to the result. If you add them to the group by, you get a row for each combination of values, which means multiple rows per customer instead of just one. If you wrap them in max or min instead, the values returned may not come from the row with the most recent date. To get the values from that row only, use the row numbering method above and keep the rows numbered 1.
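The difference can be seen in a small sketch comparing the two approaches. It assumes SQLite 3.25 or later through Python's sqlite3 module, with a hypothetical orders table whose rows are chosen so that the largest store_id does not belong to the latest order.

```python
import sqlite3

# Hypothetical orders: customer 1's latest order is at store 10,
# but an older order was placed at store 20.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "store_id INTEGER, order_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, 20, "2024-01-05"), (2, 1, 10, "2024-03-15"),
        (3, 2, 30, "2024-02-01"), (4, 2, 10, "2024-01-07"),
    ],
)

# Aggregating each column separately mixes values from different rows.
naive = conn.execute("""
    SELECT customer_id, MAX(order_datetime), MAX(store_id)
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

# Numbering the rows and keeping row 1 returns one whole row per customer.
first_rows = conn.execute("""
    WITH numbered AS (
        SELECT customer_id, order_datetime, store_id,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY order_datetime DESC
               ) AS rn
        FROM orders
    )
    SELECT customer_id, order_datetime, store_id
    FROM numbered
    WHERE rn = 1
    ORDER BY customer_id
""").fetchall()

print(naive)
print(first_rows)
```

For customer 1 the aggregate version reports store 20, which belongs to the January order, while the row-numbered version reports store 10 from the actual latest order.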

How to get column values from the first or last row

Oracle Database's keep clause (with dense_rank first or dense_rank last) returns an aggregate over only the rows that sort first or last. The database uses the sort you give to find the top or bottom rows; if more than one row is tied there, it takes the max or min across them. To find the store where each customer placed their most recent order, you must: take the min or max store_id, keep the last row, and sort by order_datetime.
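SQLite has no keep clause, so the sketch below emulates the same logic with DENSE_RANK so that it can run through Python's sqlite3 module (SQLite 3.25 or later assumed). This is a swapped-in technique, not Oracle's syntax; the Oracle form is quoted in a comment for comparison. The orders table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical orders: customer 1 has two orders tied on the latest
# datetime (stores 10 and 20) plus an older order at store 40.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "store_id INTEGER, order_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, 10, "2024-03-15"), (2, 1, 20, "2024-03-15"),
        (3, 1, 40, "2024-01-05"),
        (4, 2, 30, "2024-02-01"), (5, 2, 50, "2024-01-07"),
    ],
)

# Oracle equivalent of the store_id column below:
#   max ( store_id ) keep ( dense_rank last order by order_datetime )
# Emulation: DENSE_RANK marks every row tied on the latest datetime
# with 1, then MAX(store_id) is taken over those rows only.
last_store = conn.execute("""
    WITH ranked AS (
        SELECT customer_id, store_id, order_datetime,
               DENSE_RANK() OVER (
                   PARTITION BY customer_id
                   ORDER BY order_datetime DESC
               ) AS dr
        FROM orders
    )
    SELECT customer_id, MAX(order_datetime), MAX(store_id)
    FROM ranked
    WHERE dr = 1
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(last_store)
```

The older orders at stores 40 and 50 are ignored even though their store_id values are larger, because only rows tied on the latest datetime take part in the max.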

How to deal with ties

So far we have assumed that a customer places only one order at a time. But there is no constraint ensuring this, so two orders for the same customer could share an order_datetime. And there are situations in which ties are almost guaranteed. For instance, showing the three customers with the most orders at each store.
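The sketch below illustrates how the choice of function changes what happens to ties, using the top two customers by order count at one store. It assumes SQLite 3.25 or later through Python's sqlite3 module; the orders table, customer names, and counts are invented.

```python
import sqlite3

# Hypothetical orders at store 1: a has 3 orders, b and c have 2 each, d has 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (store_id INTEGER, customer_id TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, c) for c in ["a", "a", "a", "b", "b", "c", "c", "d"]],
)

rows = conn.execute("""
    WITH counts AS (
        SELECT store_id, customer_id, COUNT(*) AS n_orders
        FROM orders
        GROUP BY store_id, customer_id
    )
    SELECT customer_id, n_orders,
           -- customer_id is added as a tiebreaker so the result is repeatable
           ROW_NUMBER() OVER (
               PARTITION BY store_id
               ORDER BY n_orders DESC, customer_id
           ) AS rn,
           RANK() OVER (
               PARTITION BY store_id
               ORDER BY n_orders DESC
           ) AS rnk
    FROM counts
    ORDER BY rn
""").fetchall()

# ROW_NUMBER cuts the tie between b and c; RANK keeps both.
top2_by_row_number = [r[0] for r in rows if r[2] <= 2]
top2_by_rank = [r[0] for r in rows if r[3] <= 2]

print(rows)
print(top2_by_row_number, top2_by_rank)
```

Without a tiebreaker column in the sort, which of the tied customers row_number keeps is up to the database, so either add one or switch to a ranking function that keeps all ties.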

How to return all rows with the top-N values

Sometimes it is necessary to see all the rows that hold the first N distinct values of the sort. To see these rows for store 1, for example, tied rows must be assigned the same number, with no gaps in the numbering. This is what the dense_rank function does.
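As a rough sketch of this per-group case, the query below keeps every customer whose order count is among the top two distinct counts at their store. It assumes SQLite 3.25 or later through Python's sqlite3 module, and customer_order_counts is a hypothetical pre-aggregated table with invented figures.

```python
import sqlite3

# Hypothetical per-store order counts; store 1 has a tie for first place.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_order_counts "
    "(store_id INTEGER, customer_id TEXT, n_orders INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_order_counts VALUES (?, ?, ?)",
    [
        (1, "a", 3), (1, "b", 3), (1, "c", 2), (1, "d", 1),
        (2, "x", 5), (2, "y", 4), (2, "z", 4), (2, "w", 1),
    ],
)

# DENSE_RANK gives ties the same number and leaves no gaps, so
# "drnk <= 2" means "one of the two highest distinct counts".
top2_values = conn.execute("""
    WITH ranked AS (
        SELECT store_id, customer_id, n_orders,
               DENSE_RANK() OVER (
                   PARTITION BY store_id
                   ORDER BY n_orders DESC
               ) AS drnk
        FROM customer_order_counts
    )
    SELECT store_id, customer_id, n_orders, drnk
    FROM ranked
    WHERE drnk <= 2
    ORDER BY store_id, drnk, customer_id
""").fetchall()

print(top2_values)
```

With rank() in place of dense_rank(), customer c at store 1 would be numbered 3 and drop out of the result, because the tie between a and b uses up ranks 1 and 2.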

Conclusion

If you want exactly one row per group and only need the highest or lowest value of the ungrouped columns, a plain group by works well. But SQL has no dedicated syntax for getting the top N rows from each group. You can get them by numbering the rows in each group and filtering on the result. You can also turn this into a reusable component with SQL macros, which makes it much easier for developers to solve this problem in the future, and you can adapt the code to make it faster. Do you like the idea of using SQL macros to create reusable templates like these? We'd love to hear your thoughts in the comments. Get the code using this Live SQL script. Want more information about analytic functions? Check out Connor's Analytic SQL course at Emuktech.com