# Replace a String using regexp_replace in PySpark

In Apache Spark, `regexp_replace` is a built-in string function that replaces every substring of a DataFrame column value that matches a regular expression (regex) with another string. In Scala and Java it lives in the `org.apache.spark.sql.functions` package; in PySpark it is exposed as `pyspark.sql.functions.regexp_replace`. It returns a new Column (`pyspark.sql.Column` in PySpark) holding the replaced values and leaves the original column unchanged.

Suppose you have the following dataset:

```python
records = [("KAR","$1"),("TN","$2"),("KAR","$3")]
```

Now let's create a DataFrame from it:

```python
df = spark.createDataFrame(records).toDF("state","amount_in_$")
```

Next, we want to transform the data, for example to find the total amount for each state. However, every value in the amount column carries a '$' prefix, so the column holds strings rather than numbers.

```python
df.show()
```

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1679931021958/c1ac07c1-dcfd-49f8-b9e5-f55327fa56d6.png align="center")

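Now let's use regexp\_replace to replace '$' with an empty string and then apply the aggregation. Because '$' is a special character in regular expressions (it matches the end of the string), it has to be escaped with a backslash to match a literal dollar sign.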

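```python
from pyspark.sql.functions import col, regexp_replace, sum as sum_

# Escape '$' to match the literal dollar sign, then cast the cleaned string to a number
resultDF = df.withColumn("amount", regexp_replace(col("amount_in_$"), "\\$", "").cast("double"))

resultDF.groupBy("state").agg(sum_("amount")).show()
```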

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1679931101889/f6cae79b-7b2e-4152-987e-90d0a9eff9b8.png align="center")
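To see what the escape in the pattern `"\\$"` does, compare an escaped and an unescaped `$`. Spark evaluates patterns with Java regex semantics; the sketch below uses Python's `re` module as a stand-in, which treats `$` the same way for these simple patterns:

```python
import re

value = "$1"

# Unescaped '$' matches the empty position at the end of the string,
# so nothing visible is replaced and the literal '$' survives
unescaped = re.sub("$", "", value)

# Escaped '\$' matches the literal dollar sign, which is then removed
escaped = re.sub(r"\$", "", value)

print(unescaped, escaped)  # $1 1
```

In Python source, `"\\$"` and `r"\$"` produce the same two-character pattern, so either form can be passed to `regexp_replace`.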
