
Problem 3 -> Difference between total number of cities and distinct cities

Find the difference between the total number of CITY entries in the table and the number of distinct CITY entries in the table. The STATION table is described as follows:

Problem Difficulty Level: Easy

Data Structure

  • ID
  • City
  • State
  • Latitude
  • Longitude
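
To make the problem statement concrete, here is a minimal plain-Python sketch of the calculation. The `cities` list is a hypothetical sample, not the actual STATION data; it only illustrates that the answer is the total number of CITY entries minus the number of distinct ones.

```python
# Hypothetical sample of CITY values (not taken from the real STATION table).
cities = ["Austin", "Boston", "Austin", "Denver", "Boston"]

total_count = len(cities)          # every CITY entry, duplicates included
distinct_count = len(set(cities))  # each city name counted only once

# The quantity the problem asks for.
difference = total_count - distinct_count
print(difference)
```

With this sample there are 5 entries and 3 distinct names, so the difference is 2.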


Data for station table

In CSV Format

Solving using PySpark

In Spark, we will solve this problem in two ways:

  1. Using PySpark Functions
  2. Using Spark SQL

Use the notebook below for the solution:

Problem Solution First Part

Solving using PostgreSQL

In PostgreSQL, we will load the data from the CSV file using PostgreSQL's import functionality and then solve the problem.
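
As a rough illustration of the kind of query involved, the sketch below runs a standard SQL aggregation from Python. It makes several assumptions: it uses the built-in `sqlite3` module as a stand-in for a PostgreSQL connection so that it runs without a database server, and the table layout and rows are hypothetical. The `COUNT(city) - COUNT(DISTINCT city)` expression is standard SQL and is also valid in PostgreSQL.

```python
import sqlite3

# In-memory SQLite database used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station (id INTEGER, city TEXT, state TEXT)")

# Hypothetical rows; the real table is loaded from the CSV file.
conn.executemany(
    "INSERT INTO station VALUES (?, ?, ?)",
    [
        (1, "Austin", "TX"),
        (2, "Boston", "MA"),
        (3, "Austin", "TX"),
        (4, "Denver", "CO"),
        (5, "Boston", "MA"),
    ],
)

# Total CITY entries minus distinct CITY entries.
query = "SELECT COUNT(city) - COUNT(DISTINCT city) AS difference FROM station"
difference = conn.execute(query).fetchone()[0]
print(difference)

conn.close()
```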

Output Query

Problem Solution

Please also see the blog post below for a walkthrough of this problem.
