If you’ve started using Google Spreadsheets as an alternative to the Non-Free Microsoft Excel, congratulations. The real benefit of cloud-based computing is that you never again have to ask where’s the latest version of that spreadsheet! The bummer can be those nice little enterprise level tools like “conditional formatting”, “eliminate duplicates”, etc. For today, I thought I’d help you learn how to remove duplicates from a Google spreadsheet.
What you must do first is determine the best column or columns for determining a duplicate record (this assumes you have your records in rows, with column headers like “First Name”, “Last Name”, “Email Address”, “Phone”, etc.). Using the example in parens, the best column is probably “Email Address”. So, here’s what you do:
- sort by the column containing email addresses
- find the first empty column field to the right (see Note 1 below first)
- enter the following formula: =IF(D2=D3,1,0)
- Now copy the formula down the column
- Those rows with a “1″ are duplicates and can be deleted (see Note 2 below)
If you need to combine fields to determine a duplicate (i.e. if you only had First Name, Last Name, and Company Name – no email), simply follow steps 2 and 3 so you have a column for each required field. Now you can create another column to total the result of the other three. So, if First Name = 1, Last Name = 1, and Company Name = 1, the total would be 3. A total of 3 would be indicative of a duplicate.
NOTE 1: if your columns have dirty data (All Caps, lower case, and mixed), you’ll want to perform the following operation before determining duplicates (this is especially true if you plan to use the list for email marketing. The reason is when someone receives an email addressed to them, but the case is off – it ruins the illusion that we know who they are, that we have a real relationship. For example: Dear JOHN or Dear john – you would never send an email to a friend like that. It’s even worse if you use their name in the body copy as well).
- copy the column with dirty data (just right click and hold until the entire column is highlighted)
- Ctrl-C or Apple-C to copy
- Ctrl-V or Apple-V to paste into the dialog box on the Case Converter website
- Select the appropriate text treatment (Upper Case, Lower Case, Proper Case, or Sentence Case)
- Highlight all items in dialog box on Case Converter
- Ctrl-C or Apple-C to copy
- Ctrl-V or Apple-V to paste into the appropriate column in your Google Spreadsheet (make sure you select the right field to begin the paste operation – you want everything to match, we are just trying to make the text treatment consistent)
NOTE 2: Advanced deletion requires you to convert the formulaic column containing information about duplicates to a non-formulaic column. So, copy the column using Ctrl-C or Apple-C to copy. Then, paste (using Ctrl-V or Apple-V) into a text editor. Once pasted, change the mode to plain text. Now cut and paste back into your spreadsheet. This is important because you will want to sort the data by the duplicates column (if formulaic, the values will change when you sort). Now that you have non-formulaic information about duplicates, sort and delete the rows with a “1″.
That’s how you do it!