
How to Clean Dirty Web-Scraped Data

3 min read · Jun 21, 2021

In this article we will clean the data that we extracted in the previous tutorial. If you haven’t read the previous one, it is recommended to read it first.

To give an overview: in this tutorial we will clean the product review data extracted from an e-commerce platform. The cleaned data will later be used to understand the sentiment of the reviewers and to answer some important questions from the seller's point of view.

Let’s read the crocs_reviews.csv file (created in the last tutorial), which contains the scraped data, into a data frame.

Pandas makes it very easy to read data into a data frame. The data frame contains 7 columns and 8696 rows. Each column needs to go through several cleaning steps to bring it into a better format for the analysis.
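As a rough illustration of the loading step, here is a minimal sketch. The real crocs_reviews.csv is not reproduced here, so a tiny in-memory stand-in is used instead; its rows and the Rating column are made up for the example, and only the Title column name comes from the article.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for crocs_reviews.csv (made-up values).
sample_csv = io.StringIO(
    "Title,Rating\n"
    "Great fit,5\n"
    ",3\n"
)

# With the real file this would be: df = pd.read_csv("crocs_reviews.csv")
df = pd.read_csv(sample_csv)

print(df.shape)  # (number of rows, number of columns)
df.info()        # column dtypes and non-null counts
```

On the real file, df.shape is where the 7 columns and 8696 rows mentioned above would show up, and df.info() reveals which columns contain null values.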

We start by cleaning the first column, Title. Looking at the dataset, we can see that there are multiple ‘\n’ characters at the beginning and end of every value, so we strip them to get clean string values.
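One possible way to strip those newlines is the pandas string accessor. This is a sketch on made-up titles, not necessarily the exact call used in the original tutorial.

```python
import pandas as pd

# Made-up titles padded with newlines, plus one missing value.
df = pd.DataFrame({"Title": ["\n\nGreat fit\n\n", "\nToo small\n", None]})

# Remove leading and trailing newline characters; missing values stay missing.
df["Title"] = df["Title"].str.strip("\n")

print(df["Title"].tolist())
```

Calling .str.strip() with no argument would remove all surrounding whitespace (spaces and tabs as well), which may be preferable if the scraped values are padded with more than newlines.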

Second, the .info method shows that there are null values in this column, which we will replace with 'No title'…
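A minimal sketch of replacing the null titles, assuming fillna is an acceptable way to do it (the values below are made up for the example):

```python
import pandas as pd

# Made-up titles with one missing value.
df = pd.DataFrame({"Title": ["Great fit", None, "Too small"]})

# Replace missing titles with the placeholder string.
df["Title"] = df["Title"].fillna("No title")

print(df["Title"].isna().sum())  # number of nulls left in the column
```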

Written by Harshit Maheshwari
