Spark SQL Interview Scenarios

36 real-world PySpark & Scala problems


PySpark · Spark SQL · Scala

Master Spark SQL Interviews

36 carefully curated interview scenarios with complete solutions in PySpark and Scala. Click any scenario to explore the problem, sample data, and full solution code.

36 Questions

4 Languages

100% Free & Open Source

All Scenarios


Equal Salary

Find workers who earn the same salary

Status Change Dates

Get dates when order status changes

Sensor Value Diff

Calculate difference between consecutive sensor values

Unique Customer Addresses

List unique customers with their distinct addresses

Filter & Partition

Read and merge files, filter out invalid emails, and write partitioned output

Employee Designation

Assign Manager/Employee designation based on salary

Top Quantity Per Year

Get product with highest quantity sold per year

Cricket Match Pairs

Generate all possible match combinations between teams

Most Rank-1 Participant

Find the participant who ranked first the most times

Min Commission Month

Get month with minimum commission per employee

Salary Grade

Grade employees A/B/C based on salary ranges

Mask PII Data

Mask email and mobile number for data privacy

Department Count

Count number of employees per department

Total Marks

Calculate total marks across all subjects

Extend vs Append List

Difference between extend and append in Python/Scala

Remove Duplicates

Remove duplicate records based on name and salary

Merge DataFrames

Merge two employee dataframes with different columns

Reverse Each Word

Reverse each word in a string individually

Flatten Complex DF

Flatten struct and array nested columns

Generate Complex DF

Generate complex nested dataframe from flat structure

Round Trip Distance

Calculate total round trip distance between cities

Cumulative Sum

Running total of price partitioned by product

Customers Bought All

Find customers who purchased every product

User Page Journey

Collect ordered list of pages visited per user

Handle Bad Records

Read CSV and drop corrupt/malformed records on read

Source vs Target Diff

Compare two tables: mismatched rows, rows only in source, rows only in target

Salary Increment

Calculate year-on-year salary increment per employee

Grandparent Mapping

Find grandparent of each child via self join

Symmetric Difference

Find values in either table but not both (XOR)

2nd Highest Salary/Dept

Second highest salary per department with dept name

Unpivot Columns to Rows

Convert multiple columns into rows via explode

Star Rating Display

Join food and ratings, display stars as repeated *

Max Discount Tours

Find the family that can access discount tours in the most countries

Age Group Count

Count customers grouped into age buckets

IBM: Null Handling

IBM interview question: count nulls, fill with the mean, filter by age

Products Per Sell Date

Collect distinct products and count per sell date
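
As a small taste of the format, the "Extend vs Append List" scenario listed above can be sketched in plain Python. This is only an illustrative sketch, not the solution from the repository: the variable names and sample values are made up here.

```python
# Illustrative sketch; the lists and names below are assumptions, not repo code.
base = [1, 2, 3]

appended = list(base)      # copy so the original list is untouched
appended.append([4, 5])    # append adds its argument as ONE element

extended = list(base)
extended.extend([4, 5])    # extend adds each item of the iterable

print(appended)  # [1, 2, 3, [4, 5]]
print(extended)  # [1, 2, 3, 4, 5]
```

The length of the result is the quickest way to tell the two apart: `append` always grows the list by one, while `extend` grows it by the number of items in the argument.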