Open source platform for the machine learning lifecycle
#
apache-spark
Repositories 639
Interactive and Reactive Data Science using Scala and Spark.
Cool Play Spark (酷玩 Spark): Spark source code analysis, Spark libraries, and more
Scala
Updated Feb 6, 2019
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large-scale machine learning
Java
Updated Feb 27, 2019
PySpark + Scikit-learn = Sparkit-learn
Scikit-learn integration package for Apache Spark
Python
Updated Feb 6, 2019
C# and F# language binding and extensions to Apache Spark
spark
apache-spark
rdd
dataframe
dstream
dataset
streaming
csharp
mobius
kafka-streaming
spark-streaming
fsharp
bigdata
mapreduce
eventhubs
near-real-time
C#
Updated Apr 25, 2019
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
The Internals of Apache Spark
Updated Apr 6, 2019
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while …
Scala
Updated Jan 24, 2017
A curated list of awesome Apache Spark packages and resources.
Updated Apr 1, 2019
R interface for Apache Spark
R
Updated May 1, 2019
cerndb / dist-keras Archived
571
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
machine-learning
deep-learning
apache-spark
data-parallelism
distributed-optimizers
keras
optimization-algorithms
tensorflow
data-science
hadoop
Python
Updated Jul 25, 2018
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Go
Updated Apr 29, 2019
Apache Spark enhanced with native Kubernetes scheduler back-end
Distributed TensorFlow, Keras and BigDL on Apache Spark
apache-spark
deep-neural-network
distributed-deep-learning
keras-tensorflow
bigdl
analytics-zoo
python
scala
Jupyter Notebook
Updated Apr 29, 2019
A Cluster Computing System for Processing Large-Scale Spatial Data
REST web service for true real-time scoring (<1 ms) of R, Scikit-Learn and Apache Spark models
Java
Updated Apr 7, 2019
A command-line tool for launching Apache Spark clusters.
Haskell on Apache Spark.
Haskell
Updated Sep 4, 2018
Papers and reading material on streaming systems
stream-processing
streaming
flink
spark-streaming
storm
heron
dataflow
drizzle
millwheel
s4
apache-spark
streaming-engine
spe
stream-processing-engine
Updated Mar 31, 2018
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big d…
Shell
Updated Sep 14, 2015
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
data-syndrome
data
data-science
analytics
apache-spark
apache-kafka
kafka
spark
predictive-analytics
machine-learning
machine-learning-algorithms
airflow
python
python-3
python3
amazon-ec2
agile-data
agile-data-science
vagrant
amazon-web-services
Jupyter Notebook
Updated Apr 28, 2019
Serverless proxy for Spark cluster
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
spark
jupyter-notebook
aws
aws-ec2
aws-s3
ec2-instance
spark-clusters
jupyter
ebs-volumes
ec2
apache-spark
apache-spark-cluster
Jupyter Notebook
Updated Nov 3, 2017
The Internals of Spark Structured Streaming
Scala
Updated Apr 19, 2019
Spark Gotchas. A subjective compilation of Apache Spark tips and tricks
Updated Jun 6, 2017
Easy-to-use library for bringing TensorFlow to Apache Spark
Python
Updated Jan 9, 2019
A boilerplate for writing PySpark Jobs
Python
Updated Jan 24, 2017

