Work with duplicate data in excel - Special Movies and Comedy


Saturday, March 5, 2011

Work with duplicate data in Excel

In the duplicate world, definition means everything. That’s because what counts as a duplicate depends on the context of the related data. Duplicates can occur within a single column, across multiple columns, or across complete records. There’s no one feature or technique that will find duplicates in every case.
To deal with duplicate records (entire rows that repeat), use Excel’s Advanced Filter feature to extract the unique records as follows:
  1. Select any cell inside the recordset.
  2. From the Data menu, choose Filter and then select Advanced Filter to open the Advanced Filter dialog box.
  3. Select Copy To Another Location in the Action section.
  4. Enter a copy range in the Copy To control.
  5. Check Unique Records Only and click OK.
Excel will copy a filtered list of unique records to the range you specified in Copy To. At this point, you can replace the original recordset with the filtered list (the copied list) if you want to delete the duplicates.
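Outside Excel, the same idea (keep the first copy of each complete record and drop the rest) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name unique_records and the sample names are hypothetical, and the comparison here is exact, including letter case, which may not match how Excel compares text.

```python
def unique_records(rows):
    """Return rows with exact duplicates removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # a whole record is the comparison key
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample recordset (not the article's worksheet).
records = [
    ("Susan", "Lee"),
    ("Bill", "Ortiz"),
    ("Susan", "Lee"),   # exact duplicate of the first record
    ("Alexis", "Kim"),
]

filtered = unique_records(records)
print(filtered)
```

As with Copy To Another Location, the original list is left untouched and the result is a separate, filtered copy.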
Finding duplicates in a single column or across multiple columns is a bit more difficult. Use conditional formatting to highlight duplicates in a single column as follows:
  1. Using the example worksheet, select cell A2. When applying this to your own worksheet, select the first data cell in the list (column).
  2. Choose Conditional Formatting from the Format menu.
  3. Choose Formula Is from the first control’s drop-down list.
  4. In the formula control, enter =COUNTIF(A:A,A2)>1.
  5. Click the Format button and specify the appropriate format. For instance, click the Font tab, choose Red from the Color control, and click OK. At this point, the Conditional Formatting dialog box should show the Formula Is condition with the formula =COUNTIF(A:A,A2)>1 and the red font format.
  6. Click OK to return to the worksheet.
  7. With cell A2 still selected, click Format Painter.
  8. Select the remaining cells in the list (cells A3:A5 in the example worksheet).
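The conditional format will highlight any value in column A that’s repeated. If you want Excel to highlight only the copies, leaving the first occurrence of the value unaltered, enter the formula =COUNTIF($A$2:$A2, A2)>1 in step 4. Because the range $A$2:$A2 grows by one row each time the format is copied down, each cell counts only the values from A2 down to its own row, so the first occurrence always has a count of 1.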
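To see the difference between the two COUNTIF conditions, here is a minimal Python sketch of each. The function names and the sample list are hypothetical and stand in for the values in column A; note also that COUNTIF ignores letter case, while this sketch compares text exactly.

```python
from collections import Counter


def flag_all_repeats(values):
    """Like =COUNTIF(A:A,A2)>1: True for every value that occurs more than once."""
    counts = Counter(values)
    return [counts[v] > 1 for v in values]


def flag_later_copies(values):
    """Like =COUNTIF($A$2:$A2,A2)>1: True only from the second occurrence on."""
    seen = set()
    flags = []
    for v in values:
        flags.append(v in seen)  # already seen above this row?
        seen.add(v)
    return flags


# Hypothetical first names standing in for column A.
first_names = ["Susan", "Bill", "Susan", "Alexis"]

all_repeats = flag_all_repeats(first_names)
later_copies = flag_later_copies(first_names)
print(all_repeats)
print(later_copies)
```

The first function flags both Susan entries; the second flags only the later one, which is the behavior you want when the first occurrence should stay unformatted.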
The conditional format works great for a single column. To find duplicates across multiple columns, use two expressions: one to concatenate the columns you’re comparing and a second to count the duplicates. For example, if you wanted to find duplicates of both first and last names (columns A and B) in the example worksheet, you’d enter the following formula in cell D2 to concatenate the first and last name values:
=A2&B2
You could insert a space character between the two names if you liked, but it isn’t necessary. Copy the formula to accommodate the remaining list items.
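Next, in cell E2 enter a formula that counts how often the combined value in D2 appears in column D, for example =COUNTIF(D:D,D2)>1, and copy it to accommodate the remaining list. A result of TRUE means the combined first and last name occurs more than once.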
Notice that the worksheet has a new record (row 6). This record duplicates the first name, Susan, but not the last name. The conditional format highlights the first name because it’s a duplicate in column A. However, the formula in column E doesn’t identify the combined values across columns A and B as a duplicate because the first and last names together aren’t duplicated.
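The two-step approach (combine the columns, then count the combined values) can also be sketched in Python. The function name and the sample names below are hypothetical, not the article’s worksheet. The sketch uses a (first, last) tuple as the combined key instead of a concatenated string; that is a substitution on my part, chosen because two different name pairs can concatenate to the same text when no separator is used.

```python
from collections import Counter


def flag_combined_duplicates(rows):
    """True for each row whose full (first, last) pair occurs more than once."""
    counts = Counter(rows)  # each row is a hashable tuple
    return [counts[row] > 1 for row in rows]


# Hypothetical data: the last row repeats a first name but not the full name.
people = [
    ("Susan", "Lee"),
    ("Bill", "Ortiz"),
    ("Susan", "Lee"),
    ("Alexis", "Kim"),
    ("Susan", "Park"),
]

flags = flag_combined_duplicates(people)
for person, is_dup in zip(people, flags):
    print(person, is_dup)
```

The last row shares a first name with other rows but is not flagged, which matches the behavior described above: only the combination of both columns counts as a duplicate.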


1 comment:

  1. I think the illustrations (screenshots) are very helpful in your above explanations. This is one more post that I will find helpful. I hope I can learn Excel to eventually be able to create my own site databases (or use it along with other programs to do so).