String Manipulation in PySpark!


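In the world of data processing and analysis, data cleanliness is paramount. That's where PySpark's trim, ltrim, and rtrim functions come into play: trim removes spaces from both ends of a string column, ltrim removes only the leading spaces, and rtrim removes only the trailing ones. They're your trusty allies for tidying up strings in DataFrames.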
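As a rough mental model, the three functions behave like Python's own strip, lstrip, and rstrip when those are told to remove only the space character. The sketch below is a plain-Python analogy, not Spark code: the helper names trim_like, ltrim_like, and rtrim_like are made up for illustration, and it assumes the one-argument Spark functions strip only spaces (not tabs or newlines).

```python
# Plain-Python analogy for what trim, ltrim, and rtrim do to one value.
# The *_like helpers are hypothetical names used only for this illustration.

def trim_like(value: str) -> str:
    """Remove space characters from both ends."""
    return value.strip(" ")


def ltrim_like(value: str) -> str:
    """Remove space characters from the left end only."""
    return value.lstrip(" ")


def rtrim_like(value: str) -> str:
    """Remove space characters from the right end only."""
    return value.rstrip(" ")


samples = [" Java ", " Scala ", " Python "]

for raw in samples:
    # repr() keeps the quotes, so any remaining spaces stay visible.
    print(repr(raw), repr(trim_like(raw)), repr(ltrim_like(raw)), repr(rtrim_like(raw)))
```

Printing with repr() is a small design choice here: it wraps each value in quotes, so you can see exactly which spaces survived.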

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SparkDemoApp").getOrCreate()
data = [(" Java ",), (" Scala ",), (" Python ",)]
df = spark.createDataFrame(data, ["languages"])

Using trim()

Trim leading and trailing spaces

from pyspark.sql.functions import trim, col
df = df.withColumn("cleaned_data", trim(col("languages")))
df.show()

Using ltrim()

Trim leading spaces

from pyspark.sql.functions import ltrim, col
df = df.withColumn("cleaned_data", ltrim(col("languages")))
df.show()

Using rtrim()

Trim trailing spaces

from pyspark.sql.functions import rtrim, col
df = df.withColumn("cleaned_data", rtrim(col("languages")))
df.show()

Want to connect with me? I have started mentoring for careers and interviews at 𝐭𝐨𝐩𝐦𝐚𝐭𝐞.𝐒𝐨/𝐧𝐚𝐯𝐞𝐞𝐧𝐩𝐧
