PySpark Coding Interview Questions - Practice Online

Solve the most common PySpark coding interview questions asked in Data Engineering, Data Analyst and Data Science roles.
Use PySpark APIs or Spark SQL with a temporary view to solve these coding questions.
You can also check the most popular conceptual questions here.

1. Load and Transform Data

Practice loading a CSV file and applying basic transformations to its columns.

Popular | Easy | Select, Drop, Filtering

2. Handling Null Values

Clean the dataset by filtering out or replacing null values in various columns.

Basic | Easy | Select, Cleaning, Filtering

3. Total Purchases by Customer

Group data by customer and compute the total purchase amount per user.

Basic | Easy | Select, Grouping, Filtering, Casting

4. Running Payroll

Help Human Resources calculate pay for each employee.

Basic | Medium | Joins, Arithmetic, Conditional

5. Food and Beverage Sales

Calculate total sales quantity, revenue, and stock for each product.

Basic | Medium | Aggregate Functions, Joins, Data Cleaning

6. Discounts on Products

Add a new column calculating discounted prices for products using arithmetic operations.

Basic | Easy | Select, Arithmetic, Casting

7. Load & Transform JSON file

Read a nested JSON file and flatten it using explode and array-handling techniques.

Popular | Medium | Select, Json, Explode, Arrays

8. Employee Earnings

Use window functions to find employees whose salary is higher than the department average.

Accenture | Hard | Select, Windows, Grouping, Joining, Functions

9. Remove Duplicates From Dataset

Identify and remove duplicate records based on custom logic using window functions.

Popular | Medium | Filtering, Window Functions, Dates, Grouping

10. Word Count Program in PySpark

Implement a word count logic using PySpark RDD transformations on a text file.

Popular | Medium | RDD, Textfile, Grouping

11. Group By and Aggregate List

Group records and aggregate values into lists using advanced group and array functions.

Tiger Analytics | Hard | Grouping, Aggregation, Functions, Arrays

12. Monthly Transaction Summary

Summarize transactions month-wise by grouping and using date functions to extract months.

Basic | Medium | Grouping, Aggregation, Date Functions, Transactions

13. Top Players Summary

Generate a summary of top players using joins, aggregations, and string operations.

ITC Infotech | Hard | Grouping, Aggregation, String Functions, Joins

14. Daily Total Sales

Calculate total sales for each store on a daily basis using grouping and aggregation.

Walmart | Easy | Aggregation, Grouping, Date

15. Top 5 Products by Sales

Find the top 5 products with the highest total sales across all stores for a given day.

Walmart | Medium | Aggregation, Grouping, Sorting, Limit

16. Products with Increasing Sales

Identify products whose total sales revenue has increased every year.

Deloitte | Hard | Window, Join, Filtering, Pivot

17. Remove Outliers from Trip Data

Given a dataset of trip costs and customer ratings, remove rows that contain outliers.

VISA | Medium | Quantiles, Filtering, IQR, Data Cleaning

18. Driver Details for Rides

Given datasets of rides and drivers, join them to produce ride-level data.

VISA | Easy | Joins, Select, Alias

19. Customer Loyalty Score

Calculate customer loyalty scores based on number of trips and ratings.

VISA | Hard | Joins, Aggregation, Filtering, Grouping, Arithmetic

20. Track Employee History

Implement SCD Type 2 logic to track historical changes in employee records.

Deloitte | Hard | SCD2, Joins, Filtering, Union

21. Employee Attendance

Transform employee attendance records to show count of each attendance status.

TCS, Infosys | Medium | Pivot, Grouping, Aggregation

22. Daily Stock Price Change

Calculate day-over-day change in closing stock prices.

Nielsen | Medium | Window Functions, Lag, Time Series

23. Analyze Review Sentiment

Use a PySpark UDF to classify reviews based on keywords.

Target | Hard | UDF, Text Processing, String Manipulation

24. Sessionize Clickstream Data

Assign session IDs and visit numbers to user clickstream data.

Nielsen | Hard | Window Functions, Time Difference, Sessionization

25. Combine Transactions Data

Unify records from multiple sources with different columns.

Basic | Easy | Union, Schema Evolution, Data Integration

26. Categorize Products by Price

Classify products into categories based on price.

Basic | Easy | Conditional Column, Data Transformation, Categorization

27. Aggregate Item Weight per Person

Compute the total weight of each item per person.

Amazon | Easy | Aggregation, Groupby, Data Summarization

28. Maximize Items Within Budget

Determine the maximum number of items that can be purchased within a fixed budget using PySpark.

EPAM | Hard | Knapsack, Window Functions, Aggregation, Budget Optimization

29. Running Balance per User

Compute the running balance for each user over time.

Basic | Medium | Window Functions, Cumulative Sum, Event Stream, Time Series

30. Top N Sales per Store

Find the Top N highest sales transactions for each store.

Basic | Medium | Window Functions, Row Number, Ranking, Top N

31. User Sessionization with Time

Define user sessions and calculate each session's event count, start time, and end time.

Nielsen | Hard | Window Functions, Lag, Cumulative Sum, Sessionization, Time Series

32. Employee Attendance Streaks

Calculate each employee's longest consecutive attendance streak.

PayPal | Hard | Window Functions, Lag, Grouping, Date Functions

33. Items Purchased by User

Perform various operations on the array column.

Basic | Medium | Array Functions

34. Sales Segmentation

Categorize sales records into buckets, and compute average sales per category.

Uber | Medium | Filtering, Conditional Logic, Aggregations

35. Add Metadata to Customer Data

Load multiple CSV files and include metadata columns.

Basic | Medium | Metadata, Dataframe, File Handling

36. Tallying Election Results

Determine the seats won by each party in the election.

Docusign | Hard | Windows, Joins, Aggregation, Ranking

37. Call Center

Count unique callers per date and sum total call durations.

Basic | Easy | Aggregate Functions, Distinct Function

38. Social Media PII

Anonymize social media data and extract email domains.

Basic | Medium | Regular Expressions

39. Total Rental Income

Calculate the total rental income for each landlord.

Basic | Hard | String Functions, Complex Joins, Aggregate Functions

40. E-Commerce Stats

Given product and order data from an e-commerce platform, calculate the required KPIs.

Basic | Medium | Aggregate Functions, Complex Joins

Stay tuned - we're adding more interview questions!
