SparkContext vs SparkSession

I am a tech enthusiast with 13+ years of experience in IT as a consultant, corporate trainer, and mentor, including 12+ years of training and mentoring in Software Engineering, Data Engineering, Test Automation, and Data Science. I have trained more than 10,000 IT professionals and conducted more than 500 training sessions in the areas of Software Development, Data Engineering, Cloud, Data Analysis, Data Visualization, Artificial Intelligence, and Machine Learning. I am interested in writing blogs, sharing technical knowledge, solving technical issues, and reading and learning new subjects.

SparkContext and SparkSession are two important components in Apache Spark, but they serve different purposes.

SparkContext

  • SparkContext (sc) is the entry point for interacting with Spark and represents the connection to a Spark cluster.

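  • It was the main entry point in Spark 1.x. Since Spark 2.0 introduced SparkSession, it is no longer the recommended starting point, but it remains available in Spark 2.x and 3.x, both directly and through a session's sparkContext attribute.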

  • SparkContext provides access to the underlying Spark functionality and allows you to create RDDs (Resilient Distributed Datasets), which are the fundamental data structure in Spark.

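  • However, with the introduction of the DataFrame and Dataset APIs, SparkContext is considered a lower-level API. Creating one directly is generally not recommended in new applications, although it is still needed for RDD-level work such as broadcast variables and accumulators.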

SparkSession

  • SparkSession is the entry point for working with structured data in Spark and has been the recommended entry point for applications since Spark 2.0.

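  • It encapsulates SparkContext and replaces the separate SQLContext and HiveContext entry points of Spark 1.x. Structured data is handled through the DataFrame and Dataset APIs, while unstructured data can still be processed as RDDs through the wrapped SparkContext.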

  • SparkSession provides a unified interface for working with different data sources, such as CSV, Parquet, JSON, databases, etc.

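  • It enables the use of DataFrames and Datasets. A DataFrame is a distributed collection of data organized into named columns, and a Dataset adds compile-time typing in Scala and Java (PySpark exposes only DataFrames). Both go through Spark's query optimizer, which generally makes them a more efficient and expressive way to work with structured data than raw RDDs.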

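  • SparkSession also exposes utilities for working with data, such as spark.read for loading data, spark.sql for running SQL queries, and spark.catalog for inspecting tables. Writing and manipulating data is then done through the DataFrames it returns.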

In summary, while SparkContext is the entry point for interacting with Spark and creating RDDs, SparkSession is the higher-level entry point for structured data processing, providing a unified API and supporting DataFrames and Datasets. SparkSession is generally the preferred choice for new Spark applications, as it provides more powerful abstractions and simplifies the overall development experience.


If you like my work and want to support me:

  1. I share tips, tricks, and insights on #softwareengineering, #dataengineering, #cloud, and #ml on LinkedIn.

  2. If you want to connect with me, I have started mentoring others on careers and interviews at topmate.io/naveenpn

Naveen P.N's Tech Blog

94 posts