Spark SQL Interview Scenarios

PySpark · Spark SQL · Scala

Master Spark SQL Interviews

36 carefully curated interview scenarios with complete solutions in PySpark and Scala. Click any scenario to explore the problem, sample data, and full solution code.

36 Questions
4 Languages
100% Free & Open Source

All Scenarios

36 scenarios
#01
py · scala

Equal Salary

Find workers who earn the same salary

#02
py · scala

Status Change Dates

Get dates when order status changes

#03
py · scala

Sensor Value Diff

Calculate difference between consecutive sensor values

#04
py · scala

Unique Customer Addresses

List unique customers with their distinct addresses

#05
py · scala

Filter & Partition

Read and merge data, filter out invalid emails, and write partitioned output

#06
py · scala

Employee Designation

Assign Manager/Employee designation based on salary

#07
py · scala

Top Quantity Per Year

Get product with highest quantity sold per year

#08
py · scala

Cricket Match Pairs

Generate all possible match combinations between teams

#09
py · scala

Most Rank-1 Participant

Find the participant who ranked first the most times

#10
py · scala

Min Commission Month

Get month with minimum commission per employee

#11
py · scala

Salary Grade

Grade employees A/B/C based on salary ranges

#12
py · scala

Mask PII Data

Mask email and mobile number for data privacy

#13
py · scala

Department Count

Count number of employees per department

#14
py · scala

Total Marks

Calculate total marks across all subjects

#15
py · scala

Extend vs Append List

Difference between extend and append in Python/Scala

#16
py · scala

Remove Duplicates

Remove duplicate records based on name and salary

#17
py · scala

Merge DataFrames

Merge two employee dataframes with different columns

#18
py · scala

Reverse Each Word

Reverse each word in a string individually

#19
py · scala

Flatten Complex DF

Flatten struct and array nested columns

#20
py · scala

Generate Complex DF

Generate complex nested dataframe from flat structure

#21
py · scala

Round Trip Distance

Calculate total round trip distance between cities

#22
py · scala

Cumulative Sum

Running total of price partitioned by product

#23
py · scala

Customers Bought All

Find customers who purchased every product

#24
py · scala

User Page Journey

Collect ordered list of pages visited per user

#25
py · scala

Handle Bad Records

Read CSV and drop corrupt/malformed records on read

#26
py · scala

Source vs Target Diff

Compare two tables: mismatched rows, rows new in source, rows new in target

#27
py · scala

Salary Increment

Calculate year-on-year salary increment per employee

#28
py · scala

Grandparent Mapping

Find grandparent of each child via self join

#29
py · scala

Symmetric Difference

Find values in either table but not both (XOR)

#30
ipynb · scala

2nd Highest Salary/Dept

Second highest salary per department with dept name

#31
ipynb · scala

Unpivot Columns to Rows

Convert multiple columns into rows via explode

#32
ipynb · scala

Star Rating Display

Join food and ratings, display stars as repeated *

#33
ipynb · scala

Max Discount Tours

Find the family that can access the most discount tour countries

#34
ipynb · scala

Age Group Count

Count customers grouped into age buckets

#35
ipynb · scala

IBM: Null Handling

IBM question: count nulls, fill with mean, filter by age

#36
ipynb · scala

Products Per Sell Date

Collect distinct products and count per sell date

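
Example sketch: Sensor Value Diff (#03)

To give a feel for the kind of query these scenarios call for, here is a minimal sketch of scenario #03 using the LAG window function. It is not the repository's solution. The table name sensor_readings, its columns, and the sample rows are all hypothetical, and the sketch assumes the difference is taken per sensor in time order. It runs the SQL through Python's built-in sqlite3 so it works without a Spark installation; the same SELECT should carry over to spark.sql against a temp view with these columns, since Spark SQL also supports LAG with an OVER clause.

```python
import sqlite3

# Hypothetical sample data: (sensor_id, reading_time, value)
rows = [
    (1, "2024-01-01 10:00", 10),
    (1, "2024-01-01 10:05", 15),
    (1, "2024-01-01 10:10", 12),
    (2, "2024-01-01 10:00", 7),
    (2, "2024-01-01 10:05", 9),
]

# In-memory SQLite database standing in for a Spark temp view (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_readings (sensor_id INTEGER, reading_time TEXT, value INTEGER)"
)
conn.executemany("INSERT INTO sensor_readings VALUES (?, ?, ?)", rows)

# LAG fetches the previous value within each sensor's time-ordered partition.
# The first reading of each sensor has no predecessor, so its diff is NULL.
query = """
SELECT sensor_id,
       reading_time,
       value,
       value - LAG(value) OVER (
           PARTITION BY sensor_id ORDER BY reading_time
       ) AS diff
FROM sensor_readings
ORDER BY sensor_id, reading_time
"""
diffs = conn.execute(query).fetchall()

for row in diffs:
    print(row)
```

The PARTITION BY clause keeps readings from different sensors from being subtracted from one another; drop it if the scenario treats all readings as a single sequence.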