String Manipulation in PySpark!

In the world of data processing and analysis, data cleanliness is paramount. That's where PySpark's trim, ltrim, and rtrim functions come into play! They're your trusty allies for tidying up strings in DataFrames.
Setting up a sample DataFrame
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("SparkDemoApp").getOrCreate()
data = [(" Java ",), (" Scala ",), (" Python ",)]
df = spark.createDataFrame(data, ["languages"])

Using trim()
Trim leading and trailing spaces
from pyspark.sql.functions import trim, col
df = df.withColumn("cleaned_data", trim(col("languages")))
df.show()
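If you want to sanity-check what trim() should produce without spinning up a Spark session, Python's built-in str.strip() applies the same rule to a plain string: it removes whitespace from both ends. A minimal sketch using the same sample values:

```python
# Plain-Python analogue of Spark's trim():
# str.strip() removes whitespace from both ends of a string.
languages = [" Java ", " Scala ", " Python "]
cleaned = [s.strip() for s in languages]
print(cleaned)  # ['Java', 'Scala', 'Python']
```
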

Using ltrim()
Trim leading spaces only
from pyspark.sql.functions import ltrim, col
df = df.withColumn("cleaned_data", ltrim(col("languages")))
df.show()

Using rtrim()
Trim trailing spaces only
from pyspark.sql.functions import rtrim, col
df = df.withColumn("cleaned_data", rtrim(col("languages")))
df.show()
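By default, trim(), ltrim(), and rtrim() target space characters. When you need more control, pyspark.sql.functions also offers regexp_replace(column, pattern, replacement), to which you could pass a pattern like the one below. The sketch shows the pattern in plain Python so it is easy to verify; note that \s is slightly broader than Spark's trim, since it matches any whitespace (tabs, newlines), not just spaces:

```python
import re

# ^\s+ matches leading whitespace, \s+$ matches trailing whitespace.
# The same pattern could be passed to regexp_replace(col("languages"), TRIM_PATTERN, "")
# to emulate trim() for all whitespace characters, not just spaces.
TRIM_PATTERN = r"^\s+|\s+$"

print(re.sub(TRIM_PATTERN, "", " Python "))  # 'Python'
```
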




