Apache Spark™ plays a critical role in the adoption and evolution of Big Data technologies. Despite occasional problems with Spark deployment and use, it still gives enterprises more sophisticated ways to leverage big data than Hadoop. The amount of data analyzed and processed through the framework continues to grow, pushing the boundaries of the engine.
There are a few challenges and problems with Spark, one of which is library version conflicts between dependencies. How do you overcome this and maximize the value you get from Spark? Pepperdata Field Engineer Alexander Pierce explains this in detail in our Apache Spark tutorial webinar. You can also get an overview of the answer below:
Pierce’s advice is to make sure that any external dependencies and classes you bring in do not conflict with the internal libraries used by your version of Spark, or with those already available in your environment. For example, many developers use Google’s Protocol Buffers (Protobuf for short), a language-neutral, platform-neutral, extensible mechanism for serializing structured data. It is a popular binary format for storing and transporting data, much like XML but smaller, faster, and simpler. In this example, let’s say you want to use the getUnmodifiableView() function. That function is only available in Protobuf 2.6.0, while most Hadoop implementations ship with Protobuf 2.5.0. Ultimately, you would need to shade the jar while building your project to avoid conflicts over which Protobuf version your application actually uses.
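As a rough illustration of what shading can look like, here is a minimal sketch of an sbt build using the sbt-assembly plugin’s relocation rules (Maven users would reach for the Maven Shade Plugin instead). The Spark, Protobuf, and plugin versions, as well as the "shaded." relocation prefix, are illustrative assumptions, not taken from the webinar:

```scala
// project/plugins.sbt — assumes the sbt-assembly plugin (version is illustrative):
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

// build.sbt
libraryDependencies ++= Seq(
  // Spark is marked "provided": the cluster supplies it at runtime,
  // so its transitive Protobuf 2.5.0 stays out of the application jar.
  "org.apache.spark" %% "spark-core" % "3.5.0" % "provided",
  // The newer Protobuf release the application code needs.
  "com.google.protobuf" % "protobuf-java" % "2.6.0"
)

// Relocate the bundled Protobuf classes to a private package so they
// cannot collide with the 2.5.0 classes already on the Hadoop/Spark classpath.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.protobuf.**" -> "shaded.com.google.protobuf.@1").inAll
)
```

Running `sbt assembly` then produces a fat jar whose bytecode references the relocated Protobuf classes, so the copy shipped with Hadoop is simply ignored by your application code.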
Watch our Apache Spark tutorial webinar with Pepperdata Field Engineer Alex Pierce to learn how to overcome other problems with Spark. This rich learning experience will also help you improve the usability and supportability of your Spark systems.