Cleaning Data for Effective Data Science
- Paperback: 498 pages
- Publisher: WOW! eBook (April 9, 2021)
- Language: English
- ISBN-10: 1801071292
- ISBN-13: 978-1801071291
It is something of a truism in data science, data analysis, and machine learning that most of the effort needed to achieve your actual purpose lies in cleaning your data. Written in author David Mertz’s signature friendly and humorous style, this book discusses in detail the essential steps performed in every production data science or data analysis pipeline and prepares you for data visualization and modeling.
The book dives into the practical application of the tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter for practicing the skills acquired.
You will begin by looking at how to ingest data from formats and sources such as JSON, CSV, SQL RDBMSes, HDF5, NoSQL databases, image files, and binary serialized data structures. The book also provides numerous example data sets and data files, which are available for download and independent exploration.
Moving on from formats, you will impute missing values, detect unreliable data and statistical anomalies, and generate the synthetic features needed for successful data analysis and visualization.
What you will learn:
- Think carefully about your data and ask the right questions
- Identify problem data pertaining to individual data points
- Detect problem data in the systematic “shape” of the data
- Remediate data integrity and hygiene problems
- Prepare data for analytic and machine learning tasks
- Impute values into missing or unreliable data
- Generate synthetic features that are more amenable to data science, data analysis, or visualization goals
By the end of this book, you will have acquired a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks.