
Posted by User Bot on 26 Apr, 2025 (updated 20 May, 2025)

Transform a single PySpark row of condition values into multiple rows of individual conditions

I have a PySpark DataFrame that contains a single row but multiple columns. Each column holds a condition value in the context of a SQL WHERE clause: for example, a column start_date with the value >date("2025-01-01") should produce the condition string start_date > date("2025-01-01").

Here’s an example of the DataFrame:
foo    bar
baz    bim
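
For reference, a minimal way to construct this one-row sample (values assumed to be plain strings here):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # One row, one column per condition value
    df = spark.createDataFrame([("baz", "bim")], ["foo", "bar"])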

What I want to achieve:
I want to transform this into a new DataFrame where each column name and its corresponding value are combined into a single string, and each such combination becomes a new row.

The desired output should look like:
new_column1    new_column2
foo = baz      foo is baz
bar = bim      bar is bim

Additional Requirements:
The number of columns is dynamic and can vary.
Note that the example has only one row, but in real use cases there may be more.
I prefer a pure PySpark solution, without using pandas.

What I have tried:
I explored applying selectExpr manually over the column names.
I tried to use explode, but I don't know how to dynamically build an array that combines each column name with its value first; a sketch of what I was attempting is below.
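
Here is roughly the direction I was attempting, as a sketch: build one struct per column (the column name as a literal paired with the value cast to string, so every struct shares one schema), collect them into an array, and explode it. The " = " and " is " separators just mirror the sample output above, and the foo/bar columns are the example data, not my real schema.

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()

    # Single-row sample from above
    df = spark.createDataFrame([("baz", "bim")], ["foo", "bar"])

    # One struct per column: the column name as a literal paired with the
    # column's value, cast to string so all structs share the same schema.
    pairs = F.array(*[
        F.struct(F.lit(c).alias("name"), F.col(c).cast("string").alias("value"))
        for c in df.columns
    ])

    # Explode the array so each (name, value) pair becomes its own row,
    # then format each pair into the two output string columns.
    result = (
        df.select(F.explode(pairs).alias("pair"))
          .select(
              F.concat_ws(" = ", "pair.name", "pair.value").alias("new_column1"),
              F.concat_ws(" is ", "pair.name", "pair.value").alias("new_column2"),
          )
    )
    result.show(truncate=False)

Since this iterates over df.columns, it should adapt to any number of columns, which is why I was hoping to make this direction work.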