Problem 8 -> Top distance travelled

Find the top 10 users that have traveled the least distance. Output their id, name, and total distance traveled.

Problem Difficulty Level: Medium

Data Structure

ride_log

  • id
  • user_id
  • travel

user

  • id
  • name
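To make the expected result concrete, the computation over these two tables can be sketched in plain Python. This is only an illustration: the sample rows, the function name `least_travelled`, and the tie-break on user id are assumptions, not part of the original problem, and users with no rides are left out (the equivalent of an inner join).

```python
from collections import defaultdict

# Hypothetical sample rows; the real data lives in the CSV files.
users = [(1, "Ann"), (2, "Bob"), (3, "Cy")]  # (id, name)
ride_log = [  # (id, user_id, travel)
    (101, 1, 30),
    (102, 2, 5),
    (103, 1, 12),
    (104, 3, 20),
    (105, 2, 4),
]


def least_travelled(users, ride_log, n=10):
    """Return up to n (id, name, total_distance) rows, smallest total first."""
    # Sum the travel column per user_id.
    totals = defaultdict(int)
    for _ride_id, user_id, travel in ride_log:
        totals[user_id] += travel

    # Attach names; rides whose user_id has no matching user are skipped.
    names = dict(users)
    rows = [(uid, names[uid], total) for uid, total in totals.items() if uid in names]

    # Ascending by total distance, with id as an assumed tie-break.
    rows.sort(key=lambda row: (row[2], row[0]))
    return rows[:n]


result = least_travelled(users, ride_log)
for row in result:
    print(row)
```

The same three steps (aggregate per user, join to the user names, sort ascending and keep ten rows) are what any PySpark or SQL solution has to express.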


Data for the ride_log and user tables

In CSV Format

Solving using PySpark

In Spark, we will solve this problem in two ways:

  1. Using PySpark Functions
  2. Using Spark SQL

Use the notebook below for the solution.

Problem Solution First Part

Solving using MySQL

In MySQL, we will load the data from CSV using MySQL's import functionality and then solve the problem.
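Once the data is loaded, a query along the following lines could produce the result. This is a sketch and not necessarily the query used in the linked solution. To keep it runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the sample rows, the `total_distance` alias, and the tie-break on `u.id` are assumptions. The query itself sticks to constructs that MySQL also accepts (`JOIN`, `GROUP BY`, `ORDER BY`, `LIMIT`).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ride_log (id INTEGER PRIMARY KEY, user_id INTEGER, travel INTEGER);

    -- Hypothetical sample rows; the real data comes from the CSV files.
    INSERT INTO user VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO ride_log VALUES
        (101, 1, 30), (102, 2, 5), (103, 1, 12), (104, 3, 20), (105, 2, 4);
    """
)

# Total distance per user, smallest first, limited to ten rows.
QUERY = """
SELECT u.id, u.name, SUM(r.travel) AS total_distance
FROM user AS u
JOIN ride_log AS r ON r.user_id = u.id
GROUP BY u.id, u.name
ORDER BY total_distance ASC, u.id ASC
LIMIT 10
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because this is an inner join, users with no rows in ride_log do not appear. If they should count as having traveled zero distance, a `LEFT JOIN` with `COALESCE(SUM(r.travel), 0)` would be one way to include them.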

Output Query

Problem Solution

Please also see the blog below for a fuller explanation of this problem.
