spark-app-example

Main Goal

Create a local environment that mirrors production (iso-production), so you can be as autonomous as possible while working on Spark projects.

How?

This project contains all the configuration files needed to create:

  • A Dockerized environment
  • A local but genuinely distributed environment:
    • 1 NameNode
    • 1 DataNode (scale up as you wish)
    • 1 YARN resource manager
    • 3 YARN node managers
    • 1 YARN history server
    • 1 Spark history server
    • 1 Spark shell
  • Alignment with the exact Hadoop component versions used in production
  • Deployment to the Dockerized cluster via the sbt command line
  • Data mounted into HDFS via Docker volumes from within the project folder
  • Access to the Spark history web UI for inspection :)
  • Access to YARN logs for debugging :)
  • Access to a Spark shell for fiddling :)

Prerequisites

Add these localhost aliases to /etc/hosts:

echo "127.0.0.1       namenode datanode resourcemanager nodemanager nodemanager-1 nodemanager-2 nodemanager-3 historyserver spark-master spark-worker spark-history" >> /etc/hosts

How to run

# Start the cluster (assuming the images have already been built)
docker-compose up -d

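To check that all cluster containers came up (service names here assume the docker-compose.yml shipped with this repo):

# List the cluster's containers and their state
docker-compose ps

# Tail a single service's logs, e.g. the NameNode
docker-compose logs -f namenode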

Load data into HDFS

# Load the dev data placed in the data directory into HDFS
docker exec -it namenode bash /scripts/hdfs-loader.sh

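To verify the load, list what landed in HDFS (the target path depends on what hdfs-loader.sh puts there, so a recursive listing from the root is a safe default):

docker exec -it namenode hdfs dfs -ls -R /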

Run a Spark job in the cluster via sbt

# Launch sbt, then run the chained tasks: clean, reload, compile,
# build the Docker image, and bring the cluster up
sbt
;clean;reload;compile;docker;dockerComposeUp

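Once the job is submitted, you can watch it from the ResourceManager (assuming the container name resourcemanager matches the host alias above):

# List submitted/running YARN applications
docker exec -it resourcemanager yarn application -list

# Fetch logs for a finished application (replace the id with one from the list)
docker exec -it resourcemanager yarn logs -applicationId application_XXXXXXXXXXXXX_0001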

Run a Spark shell connected to the YARN cluster

docker exec -it spark-shell /spark/bin/spark-shell

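For a quick non-interactive sanity check, you can also pipe a snippet into the shell (a minimal sketch; the job just sums a generated range on the cluster):

docker exec -i spark-shell /spark/bin/spark-shell <<'EOF'
spark.range(1000).selectExpr("sum(id)").show()
EOF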

Check YARN history

chrome|firefox http://localhost:8188

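You can also query the history from the command line (assuming the standard YARN application history REST API is exposed on this port):

curl -s http://localhost:8188/ws/v1/applicationhistory/apps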

Check Spark history

chrome|firefox http://localhost:18080

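The same information is available through the Spark history server's REST API:

curl -s http://localhost:18080/api/v1/applications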

Check the Hadoop HDFS NameNode

chrome|firefox http://localhost:9870

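Or list HDFS over WebHDFS from the host (assuming WebHDFS is enabled, which is the Hadoop default):

curl -s "http://localhost:9870/webhdfs/v1/?op=LISTSTATUS"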

Stop and remove all containers, then clean up volumes and networks

# WARNING: this stops and removes ALL Docker containers on the host, not just this cluster's
docker stop $(docker ps -a -q) && docker rm $(docker ps -a -q) && docker volume prune -f && docker network prune -f
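If you only want to tear down this cluster, a scoped alternative is docker-compose's own teardown, run from the project folder:

# Remove only this project's containers, networks, and named volumes
docker-compose down --volumes --remove-orphans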
