spark-app-example

Main Goal

Create a local environment that mirrors production (iso-production), so you can be as autonomous as possible while working on Spark projects.

How?

This project contains all the configuration files needed to create:

  • A Dockerized environment
  • A local but genuinely distributed environment:
    • 1 NameNode
    • 1 DataNode (scale up as you wish)
    • 1 YARN resource manager
    • 3 YARN node managers
    • 1 YARN history server
    • 1 Spark history server
    • 1 Spark shell
  • Alignment with the exact Hadoop component versions used in production
  • Deployment to the Dockerized cluster via the sbt command line
  • Data mounted into HDFS via Docker volumes from within the project folder
  • Access to the Spark history web UI for inspection :)
  • Access to YARN logs for debugging :)
  • Access to a Spark shell for fiddling :)

Prerequisites

Add these localhost aliases to /etc/hosts:

echo "127.0.0.1       namenode datanode resourcemanager nodemanager nodemanager-1 nodemanager-2 nodemanager-3 historyserver spark-master spark-worker spark-history" >> /etc/hosts

How to run

# Start the cluster (assuming the images have already been built)
docker-compose up -d

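To check that all cluster containers came up (service names here assume the docker-compose.yml shipped with this repo):

# List the cluster's containers and their state
docker-compose ps

# Tail a single service's logs, e.g. the NameNode
docker-compose logs -f namenode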

Load data into HDFS

# Load the dev data placed in the data directory into HDFS
docker exec -it namenode bash /scripts/hdfs-loader.sh

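To verify the load, list what landed in HDFS (the target path depends on what hdfs-loader.sh puts there, so a recursive listing from the root is a safe default):

docker exec -it namenode hdfs dfs -ls -R /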

Run a Spark job in the cluster via sbt

# Launch sbt, then run the chained tasks: clean, reload, compile,
# build the Docker image, and bring the cluster up
sbt
;clean;reload;compile;docker;dockerComposeUp

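Once the job is submitted, you can watch it from the ResourceManager (assuming the container name resourcemanager matches the host alias above):

# List submitted/running YARN applications
docker exec -it resourcemanager yarn application -list

# Fetch logs for a finished application (replace the id with one from the list)
docker exec -it resourcemanager yarn logs -applicationId application_XXXXXXXXXXXXX_0001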

Run a Spark shell connected to the YARN cluster

docker exec -it spark-shell /spark/bin/spark-shell

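For a quick non-interactive sanity check, you can also pipe a snippet into the shell (a minimal sketch; the job just sums a generated range on the cluster):

docker exec -i spark-shell /spark/bin/spark-shell <<'EOF'
spark.range(1000).selectExpr("sum(id)").show()
EOF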

Check YARN history

chrome|firefox http://localhost:8188

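You can also query the history from the command line (assuming the standard YARN application history REST API is exposed on this port):

curl -s http://localhost:8188/ws/v1/applicationhistory/apps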

Check Spark history

chrome|firefox http://localhost:18080

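The same information is available through the Spark history server's REST API:

curl -s http://localhost:18080/api/v1/applications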

Check the Hadoop HDFS NameNode

chrome|firefox http://localhost:9870

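Or list HDFS over WebHDFS from the host (assuming WebHDFS is enabled, which is the Hadoop default):

curl -s "http://localhost:9870/webhdfs/v1/?op=LISTSTATUS"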

Stop and remove all containers, then clean up volumes and networks

# WARNING: this stops and removes ALL Docker containers on the host, not just this cluster's
docker stop $(docker ps -a -q) && docker rm $(docker ps -a -q) && docker volume prune -f && docker network prune -f
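If you only want to tear down this cluster, a scoped alternative is docker-compose's own teardown, run from the project folder:

# Remove only this project's containers, networks, and named volumes
docker-compose down --volumes --remove-orphans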
