Data cleaning is an essential step when working with datasets in Excel.
Here are some techniques to help.
Removing Duplicate Rows
Select the desired range or table.
Navigate to: Data > Remove Duplicates.
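If you ever need to reproduce this step outside Excel, the idea can be sketched in a few lines of Python. This is only an illustration of "keep the first occurrence, drop later repeats"; the function name and the sample rows are made up for the example, and the sketch compares values exactly, which may differ from Excel's own comparison rules (for example around letter case).

```python
def remove_duplicate_rows(rows):
    """Return rows with later duplicates dropped, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, so track each row as a tuple
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    ["apple", 3],
    ["banana", 5],
    ["apple", 3],  # exact repeat of the first row, so it is dropped
    ["apple", 4],  # differs in the second column, so it is kept
]
print(remove_duplicate_rows(rows))
```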
Finding and Replacing Text
Use shortcut: Ctrl + H.
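For instance, with Find: apple and Replace with: orange, clicking Replace All replaces every occurrence of "apple" with "orange". By default this also matches text inside longer words, so tick "Match entire cell contents" if you only want whole-cell matches.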
Changing the Case of Text
To convert to uppercase: =UPPER("apple") → Result: APPLE.
To convert to lowercase: =LOWER("APPLE") → Result: apple.
For proper case: =PROPER("APPLE tree") → Result: Apple Tree.
Removing Spaces and Non-Printing Characters
To remove unnecessary spaces: =TRIM(" apple ") → Result: apple.
To remove non-printing characters, if A1 contains such: =CLEAN(A1).
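To make the behaviour of these two functions concrete, here is a rough Python sketch of what they do. It is an approximation, not Excel's implementation: the function names are invented for the example, TRIM is modelled as "strip ordinary spaces and collapse runs of spaces to one", and CLEAN is modelled as "drop ASCII control characters 0 to 31".

```python
import re


def excel_like_trim(text):
    # Drop leading/trailing ordinary spaces and collapse inner runs to one space.
    # Non-breaking spaces (character 160) are deliberately left alone here.
    return re.sub(r" +", " ", text.strip(" "))


def excel_like_clean(text):
    # Remove ASCII control characters with codes 0 through 31.
    return "".join(ch for ch in text if ord(ch) >= 32)


raw = "  apple \x07  pie\n "
print(repr(excel_like_trim(excel_like_clean(raw))))
```

In a worksheet the same combination is commonly written as =TRIM(CLEAN(A1)).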
Fixing Numbers and Number Signs
Convert text to number: If A1 has "123", then =VALUE(A1) or =A1*1 will yield 123.
To get the absolute value: If A1 has -5, =ABS(A1) will yield 5.
Fixing Dates and Times
Convert text to date: If A1 contains the text "January 1, 2023", =DATEVALUE(A1) will transform it into a date serial.
Convert text to time: If A1 contains the text "12:30", =TIMEVALUE(A1) turns it into a time serial.
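A "serial" is simply a number: whole days for dates and a fraction of a day for times. The Python sketch below shows the arithmetic, assuming Excel's default 1900 date system and dates from 1 March 1900 onward; the function names and the epoch constant are illustrative, not part of Excel.

```python
from datetime import date, time

# Assumption: the default "1900 date system", valid for dates from
# 1 March 1900 onward, where serial 0 effectively falls on 30 December 1899.
EXCEL_EPOCH = date(1899, 12, 30)


def date_serial(d):
    # Whole days elapsed since the assumed epoch.
    return (d - EXCEL_EPOCH).days


def time_serial(t):
    # Fraction of a 24-hour day that has elapsed.
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds / 86400


print(date_serial(date(2023, 1, 1)))        # 44927
print(round(time_serial(time(12, 30)), 6))  # 0.520833
```

Formatting the resulting cell as a date or time in Excel displays the serial in a readable form.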
Spell Checking
Navigate to: Review > Spelling.
Example: If a cell has "excell", it might suggest "excel".
Merging Text in Columns
Merging: If A1 contains "John" and B1 contains "Doe", then =A1 & " " & B1 or =CONCATENATE(A1, " ", B1) will yield "John Doe".
Splitting
If A1 has "John Doe", using the menu option: Data > Text to Columns with space as a delimiter will yield two separate columns: "John" and "Doe".
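For comparison, the same merge and split can be sketched in Python. The variable names are just for illustration, and the split assumes exactly one space between the two names.

```python
first, last = "John", "Doe"

# Merge: join the two values with a single space, like =A1 & " " & B1.
full_name = first + " " + last

# Split: break the text at each space, like Text to Columns with a space delimiter.
parts = full_name.split(" ")

print(full_name)  # John Doe
print(parts)      # ['John', 'Doe']
```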
Transforming and Rearranging Columns and Rows
Moving
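Select the column or row, then drag its border to the new position. Hold Shift while dragging to insert it between existing columns or rows instead of overwriting them.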
Transpose
If A1:A3 holds values "apple", "banana", and "cherry", copying them and using 'Paste Special' > 'Transpose' will display them horizontally.
Flash Fill
Flash Fill automatically fills in values based on a pattern it detects from your input.
For example, if you have a column of names in the format "First Last" and you start typing in the adjacent column just the first names, Excel might detect the pattern and auto-suggest to fill the rest of the column with first names.
To trigger Flash Fill, start typing the pattern and press Enter when Excel suggests the fill. You can also run it manually via Data > Flash Fill or with the shortcut Ctrl + E.
Blog article at www.comdex.com.au/post/excels-flash-fill
Reconciling Table Data by Joining or Matching
VLOOKUP
Given Sheet1!A1 contains "apple" and Sheet2 has a table with "apple" in A1 and "fruit" in B1, the formula =VLOOKUP(Sheet1!A1, Sheet2!A1:B10, 2, FALSE) will return "fruit".
XLOOKUP (in newer Excel versions)
Using the same example, =XLOOKUP(Sheet1!A1, Sheet2!A1:A10, Sheet2!B1:B10) will return "fruit".
Blog article at www.comdex.com.au/post/excels-xlookup
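Both lookups above do the same job: find the row whose first cell equals the search value and return a value from another column. A minimal Python sketch of that exact-match logic follows; the function name and the sample table are hypothetical, and the sketch compares text exactly, whereas Excel's lookups ignore letter case.

```python
def exact_lookup(key, table, column_index):
    """Return the value in column_index (1-based) of the first row whose first cell equals key."""
    for row in table:
        if row[0] == key:
            return row[column_index - 1]
    return None  # Excel would show #N/A when nothing matches


table = [
    ["apple", "fruit"],
    ["carrot", "vegetable"],
]
print(exact_lookup("apple", table, 2))  # fruit
print(exact_lookup("pear", table, 2))   # None
```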
How converting a range to a table helps clean data
Converting a range of cells to a table in Excel offers various advantages that can assist in the data cleaning process.
First, let's look at how to convert a range to a table:
1. Select the Range
Click on a cell within the range of data you want to convert to a table. If your data is contiguous (i.e., no blank rows or columns within the data set), Excel will automatically detect the entire range when you start the process to create a table.
2. Convert to Table
With the range selected or with an active cell inside the desired range, go to the Insert tab on the Ribbon.
Click on the Table button. Alternatively, you can use the shortcut Ctrl + T.
3. Specify Table Settings
A "Create Table" dialog box will appear. Check that it lists the correct range.
If your data has headers (like column titles), make sure the checkbox "My table has headers" is ticked. If your data has no headers, leave it unticked; Excel will add default headers (Column1, Column2, and so on), which you can rename later. Click OK.
4. Adjust Table Design (Optional)
Once your data is converted into a table, you'll notice a new Table Design tab (called Table Tools Design in older Excel versions) appears on the Ribbon whenever a cell in the table is selected.
From here, you can rename your table, choose from different table styles for formatting, and adjust other table-specific settings.
5. Naming the Table (Optional, but Recommended)
It's a good practice to give your table a meaningful name, especially if you'll be creating multiple tables or using formulas that reference the table.
On the Table Design tab (Table Tools Design in older Excel versions), you'll see a "Table Name" field on the left side. Simply type a new name there. Remember, table names cannot contain spaces and must be unique within the workbook.
Your range is now converted to a table, and you can start leveraging the advantages that tables offer in Excel!
How tables can help clean data
Structured References
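Tables use structured references, which means formulas refer to table columns by name rather than by cell address. For example, with a hypothetical table named Sales that has an Amount column, you would write =SUM(Sales[Amount]) instead of =SUM(C2:C100). This makes your formulas more readable and reduces the chance of errors when rows are added or deleted.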
Auto Expansion
When you type or paste new data in the row directly below a table, the table automatically expands to include it. This ensures that any calculations or charts based on the table will automatically include the new data.
Automatic Filtering
Excel automatically adds drop-down arrows to the column headers of a table. This feature allows you to sort and filter data quickly, helping you to spot anomalies or inconsistencies.
Consistent Formatting
Tables apply consistent formatting. When you add or remove rows or columns, the table's formatting adjusts automatically, making it easier to visually scan for errors.
Also, every other row gets a shaded colour by default (zebra striping), which can make large sets of data easier to read.
Calculated Columns
If you create a formula in one cell of a table column, Excel will automatically fill down that formula to the remaining cells in that column. This ensures that the entire column uses the same formula and prevents inconsistencies. This feature also automatically applies when new rows are added.
Total Row
Excel offers an option to add a total row at the bottom of the table. This row can contain sum, average, count, etc., for each column, allowing for quick assessments of data integrity.
Data Validation Consistency
If you apply data validation to a table column (like a drop-down list), the validation rule is consistently applied to all cells in that column. This helps prevent invalid entries.
Better Integration with Power Query
If you're using Power Query for data transformation and cleaning, tables work better than ranges. Any changes made in Power Query will be applied directly to the table upon refreshing, ensuring data consistency.
Blog article at www.comdex.com.au/post/what-is-power-query
Easier to Define Named Ranges
Tables and their columns can easily be named, making them more identifiable in formulas, data validation rules, VBA code, and more.
Reduced Errors in Dynamic Data
If you use external data sources whose number of rows may grow or shrink, keeping the data in a table ensures that the data range updates dynamically, reducing the possibility of missing or extra data.
Improved Data Connectivity
If you're connecting Excel to external databases or importing data, tables are often better recognized and work more seamlessly with connections than standard ranges.
By converting ranges to tables, you not only facilitate the process of cleaning data but also improve the overall management, analysis, and presentation of your data in Excel.
Remember, while the examples given here are basic, real-world applications might require combining multiple functions and features to efficiently clean data in Excel.