Find the difference between the total number of CITY entries in the table and the number of distinct CITY entries in the table. The STATION table is described as follows:
Problem Difficulty Level: Easy
Data Structure
- ID
- City
- State
- Latitude
- Longitude
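Before turning to Spark or PostgreSQL, the calculation itself can be sketched in plain Python. This is only an illustration: the sample rows and the function name `city_count_difference` are hypothetical and are not the blog's actual station data.

```python
# Hypothetical sample rows in the column order (ID, City, State, Latitude, Longitude).
# These values are made up for illustration; they are not the real STATION data.
stations = [
    (1, "Boston", "MA", 42.36, 71.06),
    (2, "Austin", "TX", 30.27, 97.74),
    (3, "Boston", "MA", 42.36, 71.06),
    (4, "Salem", "OR", 44.94, 123.04),
    (5, "Salem", "MA", 42.52, 70.90),
]


def city_count_difference(rows):
    """Return (number of City entries) - (number of distinct City entries)."""
    # Skip missing cities, mirroring how SQL COUNT(column) ignores NULLs.
    cities = [row[1] for row in rows if row[1] is not None]
    return len(cities) - len(set(cities))


difference = city_count_difference(stations)
print(difference)  # 5 entries, 3 distinct cities -> 2
```

Note that two rows count as the same city whenever the City value matches, even if the State differs, because the problem asks only about the CITY column.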
Data for station table
In Spark, we will solve this problem in two ways:
- Using PySpark Functions
- Using Spark SQL
Use the notebook below for the solution.
In PostgreSQL, we will load the data from a CSV file using PostgreSQL's import functionality, and then solve the problem.
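As a rough sketch of the SQL side, the query can be tried from Python with the standard-library `sqlite3` module. SQLite is used here only as a stand-in so the example runs without a PostgreSQL server; the lowercase table and column names, the sample rows, and the function name `city_count_difference_sql` are assumptions. The `SELECT` statement itself is standard SQL and should also work in PostgreSQL once the CSV has been imported.

```python
import sqlite3

# Standard SQL: COUNT(city) counts non-NULL entries,
# COUNT(DISTINCT city) counts unique non-NULL entries.
DIFFERENCE_QUERY = "SELECT COUNT(city) - COUNT(DISTINCT city) FROM station"


def city_count_difference_sql(rows):
    """Load rows into an in-memory SQLite table and run the difference query."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema based on the columns listed in the problem description.
        conn.execute(
            "CREATE TABLE station ("
            "id INTEGER, city TEXT, state TEXT, latitude REAL, longitude REAL)"
        )
        conn.executemany("INSERT INTO station VALUES (?, ?, ?, ?, ?)", rows)
        (diff,) = conn.execute(DIFFERENCE_QUERY).fetchone()
        return diff
    finally:
        conn.close()


# Hypothetical sample rows; not the real STATION data.
sample_rows = [
    (1, "Boston", "MA", 42.36, 71.06),
    (2, "Austin", "TX", 30.27, 97.74),
    (3, "Boston", "MA", 42.36, 71.06),
    (4, "Salem", "OR", 44.94, 123.04),
]

sql_difference = city_count_difference_sql(sample_rows)
print(sql_difference)  # 4 entries, 3 distinct cities -> 1
```

Doing the subtraction inside a single `SELECT` keeps the result to one row and one column, which avoids running two separate queries and subtracting on the client.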
Output Query
Please also refer to the blog below to understand this problem.
