How Spark does Class Loading

Using the spark shell, one can define classes on-the-fly and then use these classes in your distributed computation.

Contrived Example
scala> class Vector2D(val x: Double, val y: Double) extends Serializable {
| def length = Math.sqrt(x*x + y*y)
| }
defined class Vector2D
scala> val sourceRDD = sc.parallelize(Seq((3,4), (5,12), (8,15), (7,24)))
sourceRDD: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[5] at parallelize at :13
scala> => new Vector2D(x._1, x._2)).map(_.length).collect()
14/03/30 09:21:59 INFO SparkContext: Starting job: collect at :17
res1: Array[Double] = Array(5.0, 13.0, 17.0, 25.0)

In order for the remote executors here to actually run your code, they must have knowledge of the Vector2D class, yet they’re running on a different JVM (and probably different physical machine). How do they get it?

  • we choose a directory on disk to store the class files
  • a virtual directory is created at SparkIMain:101
  • a scala compiler is instantiated with this directory as the output directory at SparkIMain:299
  • this means that whenever a class is defined in the REPL, the class file is written to disk
  • a http server is created to serve the contents of this directory at SparkIMain:102
  • we can see info about the Http server in the logs:
    14/03/23 23:39:21 INFO HttpFileServer: HTTP File server directory is /var/folders/8t/bc2vylk13j14j13cccpv9r6r0000gn/T/spark-1c7fbed7-5c87-4c2c-89e8-be95c2c7ac54
    14/03/23 23:39:21 INFO Executor: Using REPL class URI:
  • the http server url is stored in the Spark Config, which is shipped out to the executors
  • the executors install a URL Classloader, pointing at the Http Class Server at Executor:74

For the curious, we can figure out what the url of a particular class is and then go check it out in a browser/with curl.

def urlOf[T:ClassTag] = {
   val clazz = implicitly[ClassTag[T]].erasure

Do it yourself

It’s pretty trivial to replicate this ourselves – in Spark’s case we have a scala compiler which writes the files to disk, but assuming we want to serve classes from a fairly normal JVM with a fairly standard classloader, we don’t even need to bother with the to disk. We can grab the class file using getResourceAsStream. It also doesn’t require any magic of scala – an example class server in java using Jetty:

class ClasspathClassServer {
	private Server server = null;
	private int port = -1;

	void start() throws Exception {
		System.out.println("Starting server...");
		if(server != null) {

		server = new Server();
		NetworkTrafficSelectChannelConnector connector = new NetworkTrafficSelectChannelConnector(server);

		ClasspathResourceHandler classpath = new ClasspathResourceHandler();


		port = connector.getLocalPort();
		System.out.println("Running on port " + port);

	class ClasspathResourceHandler extends AbstractHandler {
		public void handle(String target, Request baseRequest, HttpServletRequest request, HttpServletResponse response)
					throws IOException, ServletException {
			System.out.println("Serving target: " + target);

			try {
				Class<?> clazz = Class.forName(target.substring(1));
				InputStream classStream = clazz.getResourceAsStream('/' + clazz.getName().replace('.', '/') + ".class");


				OutputStream os = response.getOutputStream();

				IOUtils.copy(classStream, os);
			} catch(Exception e) {
				System.out.println("Exception: " + e.getMessage());

It’s then just a matter of setting up a URL Classloader on the other side!

Further Examples An example of using a similar technique to write a ‘compute server’ in scala – somewhat akin to a very stripped down version of Spark.

How Spark does Class Loading

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s