top of page
Antares Studio Games Icon_Small.png

The Antares Alien

Power BI   |  Game Development

  • Writer's pictureBrent Jones

Power BI: How to Remove Duplicates and Keep Most Recent?

Updated: Nov 17, 2021



In this post, I'll show you how you can use Power Query within Power BI to remove duplicates while keeping the most recent item (or the max, min, etc.).


Why is this helpful? Let's say you have a situation where new data is appended to a spreadsheet instead of overwriting it. This introduces duplicate rows in your dataset. Simply removing duplicates may inadvertently remove the new data, causing inaccurate results. This is particularly troublesome if the data source has the potential to be sorted in unexpected ways.



The solution is to use a combination of grouping rows, extracting a value, then performing a number of checks. But don't worry! It's nothing too complicated.


Steps

Here's our sample dataset. You can see we need the products with the most recent date, highlighted below: Fake hotdogs with qty of 6 and old hamburgers with qty of 9, each on 1/5/2021.


1. Group by the product, and a new column with All Rows.

Here is what that looks like:


2. Copy this "Data" column.


3. Extract the latest date with List.Max.

To do this, click on the arrows and use the "Aggregate" function. There is no Max function, so just select Count of Date.



4. After that, change the Count function in the formula bar to List.Max instead. Feel free to change the name of the column as well.


5. Now expand the Copied data column.


6. Add a "Custom Column" to check if the Date equals the Latest Date.


7. Now, just filter to True on this column:


Now you have a dataset with removed duplicates while keeping the latest record.


 

Here is the full M code if you get stuck anywhere (you'll need to create your own dataset).

let
    // Group
    #"Grouped Rows" = Table.Group(
            Source, 
            {"Product"}, 
            {
                {
                    "Data", 
                    each _, 
                    type table [Date=nullable date, 
                    Product=nullable text, Qty=nullable number]
                }
            }),
    
    // Duplicate the data
    #"Duplicated Column" = Table.DuplicateColumn(
    #"Grouped Rows", "Data", "Data - Copy"),
    
    // Extract the max date
    #"Extract Max Date" = Table.AggregateTableColumn(
    #"Duplicated Column", "Data", 
        {
            {"Date", List.Max, "Latest Date"}
        }),
    
    // Expand the data
    #"Expanded Data - Copy" = Table.ExpandTableColumn(
        #"Extract Max Date", 
        "Data - Copy", 
        {"Date", "Qty"}, 
        {"Date", "Qty"}
        ),
    
    // Add the check to see if the latest date equals the record date
    #"Added Custom" = Table.AddColumn(
        #"Expanded Data - Copy", 
        "Latest Date Check", 
        each if [Latest Date] = [Date] then true else false, Logical.Type
        ),
    
    // Filter on True records
    #"Filtered Rows" = Table.SelectRows(
        #"Added Custom", 
        each ([Latest Date Check] = true)
        )
in
    #"Filtered Rows"

  • Facebook
  • Twitter
  • Instagram
bottom of page