Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Suresh-28/-Layoffs-Data-Cleaning-Project-SQL-

Open more actions menu

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Layoffs Data Cleaning Project

Cleaned and standardized a raw layoffs dataset using SQL for analysis and reporting purposes

Tools & Technologies Used:

  • MySQL 8.0+
  • SQL Queries (CTEs, Window Functions, JOINs)
  • STR_TO_DATE(), TRIM(), LIKE

Dataset Source:

Step-by-Step Cleaning Process:

Step 1: Remove Duplicates

  • Created layoffs_staging as a copy of original table.
  • Added row_num using ROW_NUMBER() PARTITION BY company, industry, total_laid_off, percentage_laid_off, date.
  • Created layoffs_staging2 with row_num column.
  • Deleted all rows where row_num > 1.

Step 2: Standardize Data

  • Trimmed whitespace from company names: UPDATE layoffs_staging2 SET company = TRIM(company);
  • Standardized crypto-related industries to 'Crypto': UPDATE layoffs_staging2 SET industry = 'Crypto' WHERE industry LIKE 'cry%';
  • Fixed country names: Updated "United States." → "United States" using TRIM(TRAILING '.' FROM country).
  • Converted date from TEXT ('MM/DD/YYYY') to DATE type: UPDATE layoffs_staging2 SET date = STR_TO_DATE(date, '%m/%d/%Y'); ALTER TABLE layoffs_staging2 MODIFY COLUMN date DATE;

Step 3: Handle Null / Blank Values

  • Changed empty strings in industry to NULL: UPDATE layoffs_staging2 SET industry = NULL WHERE industry = '';
  • Filled missing industry values using self-JOIN on company: UPDATE layoffs_staging2 lay2 JOIN layoffs_staging2 lay1 ON lay1.company = lay2.company SET lay1.industry = lay2.industry WHERE (lay1.industry IS NULL OR lay1.industry = '') AND lay2.industry IS NOT NULL;

Step 4: Remove Unnecessary Columns

  • Kept all columns (none removed); row_num was dropped after deduplication.

Final Table Structure (layoffs_staging2):

  • company (TEXT)
  • location (TEXT)
  • industry (TEXT) — standardized
  • total_laid_off (INT)
  • percentage_laid_off (TEXT)
  • date (DATE)
  • stage (TEXT)
  • country (TEXT) — cleaned
  • funds_raised_millions (INT)

Sample Queries:

Top 5 Industries by Total Laid Off: SELECT industry, SUM(total_laid_off) AS total_laid_off FROM layoffs_staging2 WHERE total_laid_off IS NOT NULL GROUP BY industry ORDER BY total_laid_off DESC LIMIT 5;

Monthly Layoff Trends: SELECT YEAR(date) AS year, MONTH(date) AS month, SUM(total_laid_off) AS monthly_layoffs FROM layoffs_staging2 WHERE date IS NOT NULL GROUP BY year, month ORDER BY year, month;

Highest % Laid Off by Company: SELECT company, percentage_laid_off, total_laid_off, country FROM layoffs_staging2 WHERE percentage_laid_off IS NOT NULL ORDER BY CAST(percentage_laid_off AS DECIMAL(5,2)) DESC LIMIT 10;

Key Insights:

  • Crypto had the highest total layoffs (>100k).
  • US, India, Germany were top countries.
  • Peak layoffs occurred Q1–Q2 2022.

Next Steps:

  • Build dashboard in Tableau/Power BI.
  • Join with economic data (interest rates, GDP).
  • Add Python EDA notebook.

Acknowledgments:

  • Kaggle for dataset
  • Public reports from TechCrunch, Bloomberg, LinkedIn

License: Educational use only.

Contact: Email: your.email@example.com LinkedIn: linkedin.com/in/yourprofile GitHub: github.com/yourgithubusername

Star ⭐ this repo if you found it helpful!

About

Cleaned and standardized a raw layoffs dataset using SQL for analysis and reporting purposes.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published
Morty Proxy This is a proxified and sanitized view of the page, visit original site.