So you’re happily working away in Excel, performing some sorting wizardry on your piles of data. But then disaster strikes. You notice that some fields of data are not being sorted as the data type you have specified. Dates are not sorting from newest to oldest. ISBNs are not being recognized as numbers. The horror!
We’ve all been there. It’s frustrating and can ruin your progress. Unfortunately, you’re most likely a victim of invisible apostrophes.
Invisible apostrophes force Excel and other spreadsheet programs to format the affected data as text as opposed to what you want to format it as. This causes the sorting problems described above. Fortunately, there is a quick and easy solution to eradicate invisible apostrophes once and for all. This tutorial will show you how to do it.
Step 1 - Get the right tools
First, you’ll need to make sure you are using the right tool, and that tool is regular expressions, or RegEx for short. If you use Excel, you’ll need to download and install a regular expressions add-on, as Excel does not support regular expressions natively. You can get that add-on here.
Once installed, you should see the new "RegEx Find and Replace" tool in your home ribbon if you are using Excel 2007 or later:
Or, if you are using Excel 2003 or earlier, it will appear as a menu item under the "Edit" drop-down menu.
Alternatively, you can use an open-source solution. LibreOffice Calc is free and supports regular expressions functionality natively. You can download the LibreOffice Suite (including Calc) here.
For the remainder of this tutorial I will be using Excel with the RegEx add-on to remove the apostrophes.
Step 2 - Use RegEx to remove invisible apostrophes
Once you have RegEx functionality in your spreadsheet program, you’re all set to remove those apostrophes. Here’s what to do:
Highlight all of your data by clicking on the square in the top left corner of your spreadsheet:
Click on the Sort button from the “Data” ribbon. Sort your data by the values in the affected column. In this example the data is date/time data so I will sort by “Oldest to Newest”:
You’ll notice that once you have sorted the data based on the data from the affected the column, the bad records with the invisible apostrophes should either sort to the bottom or the top of your spreadsheet.
In this example, the bad dates sorted to the bottom of the spreadsheet:
With the bad data now isolated, highlight the affected data only:
Click on the RegEx Find & Replace button on the Home ribbon. Or, if you are using Excel 2003 or earlier it appears as a menu item under the Edit menu.
Once the window launches, click the replace tab:
In the find what field enter “^.” without the quotation marks:
These symbols tell Excel to find the beginning of a line (^) and to find any character at the beginning of that line (.), which in our case are those dastardly invisible apostrophes.
In the replace with field do not enter any values. This way you’ll be replacing those invisible apostrophes with nothing.
Click Replace All and voila! Those apostrophes should disappear.
You’re almost done! Now try re-formatting the cells to your desired data type and sorting again. You'll notice that the data is sorting correctly. Dates are all being sorted oldest to newest, ISBNs are in ascending order, and most importantly crisis has been averted.
But wait, there's more!
With your newfound RegEx superpowers you can now perform all sorts of regular expression find and replace searches to improve the quality of your data and your life. A good starting place to learn about the different symbols and their functions is this cheat sheat. Feel free to experiment with different formulas, but always remember to back up your data before performing a RegEx search.