36 carefully curated interview scenarios with complete solutions in PySpark and Scala. Each scenario includes the problem, sample data, and full solution code.
Get workers who earn the same salary as another worker
Get dates when order status changes
Calculate difference between consecutive sensor values
List unique customers with their distinct addresses
Read, merge, filter out invalid emails, and write partitioned output
Assign Manager/Employee designation based on salary
Get product with highest quantity sold per year
Generate all possible match combinations between teams
Find participant who ranked 1 the most times
Get month with minimum commission per employee
Grade employees A/B/C based on salary ranges
Mask email and mobile number for data privacy
Count number of employees per department
Calculate total marks across all subjects
Difference between extend and append in Python/Scala
Remove duplicate records based on name and salary
Merge two employee dataframes with different columns
Reverse each word in a string individually
Flatten nested struct and array columns
Generate complex nested dataframe from flat structure
Calculate total round trip distance between cities
Running total of price partitioned by product
Find customers who purchased every product
Collect ordered list of pages visited per user
Drop corrupt/malformed records while reading a CSV
Compare two tables: find mismatches and records new in source or target
Calculate year-on-year salary increment per employee
Find grandparent of each child via self join
Find values in either table but not both (XOR)
Get second-highest salary per department with department name
Convert multiple columns into rows via explode
Join food and ratings, display stars as repeated *
Find the family that can access the most discount tour countries
Count customers grouped into age buckets
IBM question: count nulls, fill nulls with mean, filter by age
Collect distinct products and count per sell date
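As an illustration of the logic behind "Running total of price partitioned by product", here is a minimal framework-free Python sketch. It does not use Spark and is not the scenario's actual solution; the sample rows and the field names `product`, `order_date`, and `price` are hypothetical, chosen only to show what a partitioned running total computes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample rows: (product, order_date, price).
rows: List[Tuple[str, str, float]] = [
    ("apple", "2024-01-01", 10.0),
    ("banana", "2024-01-01", 4.0),
    ("apple", "2024-01-02", 5.0),
    ("banana", "2024-01-03", 6.0),
    ("apple", "2024-01-03", 2.5),
]


def running_total_by_product(
    rows: List[Tuple[str, str, float]],
) -> List[Tuple[str, str, float, float]]:
    """Cumulative sum of price within each product, ordered by date.

    Mirrors SUM(price) OVER (PARTITION BY product ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
    """
    totals: Dict[str, float] = defaultdict(float)
    result = []
    # Sort by the partition key first, then by the ordering column.
    for product, order_date, price in sorted(rows, key=lambda r: (r[0], r[1])):
        totals[product] += price  # running sum restarts per product
        result.append((product, order_date, price, totals[product]))
    return result


for row in running_total_by_product(rows):
    print(row)
```

In PySpark the same idea is usually expressed with a window such as `Window.partitionBy("product").orderBy("order_date")` combined with `F.sum("price").over(...)`. Note that when a window has an ordering but no explicit frame, Spark defaults to a range frame, so rows that tie on `order_date` receive the same running total; the sketch above behaves like a row frame instead.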