How to reconcile two data sets (One is subset or copy of another)
- Gaurav Dhama
- Nov 15, 2021
- 2 min read
Are you still spending lot of time comparing the data in multiple columns in two data sets?
Do you need a quicker way to validate the subset of a source data?
Do you want to validate every required filed quickly with another data?
Well the best solution that I have found is by creating a unique key. Wondering what unique key is? It is a manually created field against each line item that in corporates the fields that needs to be validated.
First Name | Second Name | Age | Store code | Total Amount paid | Unique Key |
Johnny | Bravo | 25 | 101 | 10000 | JohnnyBravo2510110000 |
Dexter | Samuel | 27 | 101 | 23456 | DexterSamuel2710123456 |
Bella | Smith | 22 | 101 | 98764 | BellaSmith2210198764 |
The above table is a subset of a master data for store 101. Now if we need to validate if everything in the subset is exactly same as in master data, if we don't have the right method it is highly likely that we either make a mistake or spend a lot of time validating it.
Lets create a Unique Key.
=CONCAT(A2,B2,C2,D2,E2)
OR
=CONCAT(A2:E2)
Once we have this filed, lets create the same in master data.
Now we can use the VLOOKUP to search for this unique key in the master table unique key column. This will quickly tell you if all the data has been considered correctly or not.
Doing vice versa i.e. checking in the master data and filtering with the errors (#NA which is values not found) will again validate if all the required fields are considered.
Lets take an example of above table, the table is a subset of master sales table and only contains data relating to store 101.
Step1: Filter with only errors in the master data (i.e. the values not found in subset)
Step 2: Check in store ID column for store ID.
Ideally the Store ID must not contain the store ID 101, as that should have been filtered out in step 1.
So with this method, we can quickly validate the data in two ways.
Thanks for reading. If you have any doubt or query, do write back and we will try to SOLVE THAT QUERY.
Comments