PySpark Coding Interview Questions - Practice Online

Solve the most common PySpark coding interview questions asked for Data Engineering, Data Analyst, and Data Science roles. Each question can be solved with the PySpark DataFrame APIs or with Spark SQL over a temporary view.

1. Load and Transform Data

Practice loading a CSV file and applying basic transformations to its columns.

Popular | Easy | Select, Drop, Filtering

2. Handling Null Values

Clean the dataset by filtering out or replacing null values in various columns.

Basic | Easy | Select, Cleaning, Filtering

3. Total Purchases by Customer

Group data by customer and compute the total purchase amount per user.

Basic | Easy | Select, Grouping, Filtering, Casting

4. Discounts on Products

Add a new column calculating discounted prices for products using arithmetic operations.

Basic | Easy | Select, Arithmetic, Casting

5. Load and Transform JSON File

Read a nested JSON file and flatten it using explode and array-handling techniques.

Popular | Medium | Select, JSON, Explode, Arrays

6. Employee Earnings

Use window functions to find employees whose salary is higher than the department average.

Accenture | Hard | Select, Windows, Grouping, Joining, Functions

7. Remove Duplicates From Dataset

Identify and remove duplicate records based on custom logic using window functions.

Popular | Medium | Filtering, Window Functions, Dates, Grouping

8. Word Count Program in PySpark

Implement a word count logic using PySpark RDD transformations on a text file.

Popular | Medium | RDD, Text File, Grouping

9. Group By and Aggregate List

Group records and aggregate values into lists using advanced group and array functions.

Tiger Analytics | Hard | Grouping, Aggregation, Functions, Arrays

10. Monthly Transaction Summary

Summarize transactions month-wise by grouping and using date functions to extract months.

Basic | Medium | Grouping, Aggregation, Date Functions, Transactions

11. Top Players Summary

Generate a summary of top players using joins, aggregations, and string operations.

ITC Infotech | Hard | Grouping, Aggregation, String Functions, Joins

12. Daily Total Sales

Calculate total sales for each store on a daily basis using grouping and aggregation.

Walmart | Easy | Aggregation, Grouping, Date

13. Top 5 Products by Sales

Find the top 5 products with the highest total sales across all stores for a given day.

Walmart | Medium | Aggregation, Grouping, Sorting, Limit

14. Products with Increasing Sales

Given two years of product sales data, identify products whose total sales revenue has increased every year.

Deloitte | Hard | Window, Join, Filtering, Pivot

15. Remove Outliers from Trip Data

Given a dataset of trip costs and customer ratings, remove rows that contain outliers.

VISA | Medium | Quantiles, Filtering, IQR, Data Cleaning

16. Driver Details for Rides

Given datasets of rides and drivers, join them to produce ride-level data.

VISA | Easy | Joins, Select, Alias

17. Customer Loyalty Score

Calculate customer loyalty scores based on the number of trips and ratings.

VISA | Hard | Joins, Aggregation, Filtering, Grouping, Arithmetic

18. Track Employee History

Implement SCD Type 2 logic to track historical changes in employee records.

Deloitte | Hard | SCD2, Joins, Filtering, Union

19. Employee Attendance

Transform employee attendance records to show the count of each attendance status per employee.

TCS, Infosys | Medium | Pivot, Grouping, Aggregation

20. Daily Stock Price Change

Calculate the day-over-day change in closing stock prices.

Nielsen | Medium | Window Functions, Lag, Time Series

Stay tuned - we're adding more interview questions!