I have a PySpark DataFrame that contains a single row but multiple columns. The values are fragments of a SQL WHERE clause: for example, a column `start_date` with value `> date("2025-01-01")` should produce the string `start_date > date("2025-01-01")`.
Here’s an example of the DataFrame:
foo bar
baz bim
What I want to achieve:
I want to transform this into a new DataFrame where each column name and its corresponding value are combined into a single string, and each such combination becomes a new row.
The desired output should look like this:

new_column1    new_column2
foo = baz      foo is baz
bar = bim      bar is bim

(when a value is null, the second form should read e.g. `bar is null`)
Additional Requirements:
The number of columns is dynamic and can vary.
Note that the example has only one row, but in real use cases there can be more.
I'd prefer a pure PySpark solution, without using pandas.
What I have tried:
I explored calling selectExpr manually with the column names.
I also tried explode, but I don't know how to dynamically build an array that combines each column name with its value.