spark-tests


Utilities for writing tests that use Apache Spark.

SparkSuite: a SparkContext for each test suite

Add configuration options in subclasses using sparkConf(…), cf. KryoSparkSuite:

sparkConf(
  // Register this class as its own KryoRegistrator
  "spark.kryo.registrator" → getClass.getCanonicalName,
  "spark.serializer" → "org.apache.spark.serializer.KryoSerializer",
  "spark.kryo.referenceTracking" → referenceTracking.toString,
  "spark.kryo.registrationRequired" → registrationRequired.toString
)
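
A minimal usage sketch (assuming SparkSuite mixes in ScalaTest's FunSuite and exposes the shared context as sc; the suite name is hypothetical):

import org.hammerlab.spark.test.suite.SparkSuite

// One SparkContext is created for the whole suite and torn down after it,
// so individual test cases stay cheap.
class SumSuite extends SparkSuite {
  test("sum") {
    assert(sc.parallelize(1 to 10).sum() == 55)
  }
}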

PerCaseSuite: SparkContext for each test case
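
A sketch of the difference (test bodies are hypothetical; assumes PerCaseSuite exposes sc the same way as SparkSuite):

import org.hammerlab.spark.test.suite.PerCaseSuite

// Each test case gets a fresh SparkContext, so cached RDDs, broadcasts,
// and listener state cannot leak between cases.
class FreshContextSuite extends PerCaseSuite {
  test("first case")  { assert(sc.parallelize(1 to 3).count() == 3) }
  test("second case") { assert(sc.parallelize(4 to 6).count() == 3) }  // runs on a new context
}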

KryoSparkSuite: SparkSuite implementation that provides hooks for Kryo registration:

register(
  classOf[Foo],                      // register a class
  "org.foo.Bar",                     // register by fully-qualified class name
  classOf[Bar] → new BarSerializer   // register a class with a custom serializer
)

Also useful for subclassing once per project, filling in that project's default Kryo registrar, and then having concrete test suites subclass that; cf. hammerlab/guacamole and hammerlab/pageant for examples.
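
A sketch of that pattern (ProjectSuite, Foo, Bar, and BarSerializer are hypothetical stand-ins for a project's own types):

// Project-level base suite: common registrations live here once…
abstract class ProjectSuite extends KryoSparkSuite {
  register(
    classOf[Foo],
    classOf[Bar] → new BarSerializer
  )
}

// …and concrete test suites just extend it.
class MyAlgorithmTest extends ProjectSuite {
  test("counts Foos") {
    assert(sc.parallelize(Seq(Foo(1), Foo(2))).count() == 2)
  }
}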

Miscellaneous RDD / Job / Stage utilities

  • rdd.Util: make an RDD with specific elements in specific partitions (see the sketch after this list).
  • NumJobsUtil: verify the number of Spark jobs that have been run.
  • RDDSerialization: interface for verifying that an RDD comes back unchanged from a serialization/deserialization round trip.
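
The idea behind rdd.Util can be sketched as a stand-alone helper (illustrative only; the library's actual signature may differ):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

import scala.reflect.ClassTag

// Build an RDD whose i-th partition holds exactly the i-th Seq of elements:
// parallelizing n Seqs into n slices puts one Seq per partition, and the
// flatMap unpacks each Seq in place without moving data across partitions.
def makeRDD[T: ClassTag](sc: SparkContext, partitions: Seq[Seq[T]]): RDD[T] =
  sc
    .parallelize(partitions, numSlices = partitions.size)
    .flatMap(identity)

// e.g. a 3-partition RDD containing [1, 2] / [3] / [4, 5, 6]:
// val rdd = makeRDD(sc, Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6)))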
