PySpark Coding Interview Questions - Practice Online
Solve the most common PySpark coding interview questions asked for Data Engineer, Data Analyst, and Data Scientist roles. Use the PySpark APIs or Spark SQL with a temporary view to solve these questions.
| # | Title | Company | Description | Difficulty | Tags |
|---|---|---|---|---|---|
| 1 | Load and Transform Data | | Practice loading a CSV file and applying basic transformations to columns. | Easy | Select, Drop, Filtering |
| 2 | Handling Null Values | | Clean the dataset by filtering out or replacing null values in various columns. | Easy | Select, Cleaning, Filtering |
| 3 | Total Purchases by Customer | | Group data by customer and compute the total purchase amount per user. | Easy | Select, Grouping, Filtering, Casting |
| 4 | Discounts on Products | | Add a new column calculating discounted prices for products using arithmetic operations. | Easy | Select, Arithmetic, Casting |
| 5 | Load & Transform JSON file | | Read a nested JSON file and flatten it using explode and array-handling techniques. | Medium | Select, JSON, Explode, Arrays |
| 6 | Employee Earnings | Accenture | Use window functions to find employees whose salary is higher than the department average. | Hard | Select, Windows, Grouping, Joining, Functions |
| 7 | Remove Duplicates From Dataset | | Identify and remove duplicate records based on custom logic using window functions. | Medium | Filtering, Window Functions, Dates, Grouping |
| 8 | Word Count Program in PySpark | | Implement word count logic using PySpark RDD transformations on a text file. | Medium | RDD, Text File, Grouping |
| 9 | Group By and Aggregate List | Tiger Analytics | Group records and aggregate values into lists using advanced grouping and array functions. | Hard | Grouping, Aggregation, Functions, Arrays |
| 10 | Monthly Transaction Summary | | Summarize transactions by month, using date functions to extract the month and grouping on it. | Medium | Grouping, Aggregation, Date Functions, Transactions |
| 11 | Top Players Summary | ITC Infotech | Generate a summary of top players using joins, aggregations, and string operations. | Hard | Grouping, Aggregation, String Functions, Joins |
| 12 | Daily Total Sales | Walmart | Calculate total sales for each store on a daily basis using grouping and aggregation. | Easy | Aggregation, Grouping, Date |
| 13 | Top 5 Products by Sales | Walmart | Find the top 5 products with the highest total sales across all stores for a given day. | Medium | Aggregation, Grouping, Sorting, Limit |
| 14 | Products with Increasing Sales | Deloitte | Given two years of product sales data, identify products whose total sales revenue has increased every year. | Hard | Window, Join, Filtering, Pivot |
| 15 | Remove Outliers from Trip Data | VISA | Given a dataset of trip costs and customer ratings, remove rows that contain outliers. | Medium | Quantiles, Filtering, IQR, Data Cleaning |
| 16 | Driver Details for Rides | VISA | Given datasets of rides and drivers, join them to produce ride-level data. | Easy | Joins, Select, Alias |
| 17 | Customer Loyalty Score | VISA | Calculate customer loyalty scores based on the number of trips and ratings. | Hard | Joins, Aggregation, Filtering, Grouping, Arithmetic |
| 18 | Track Employee History | Deloitte | Implement SCD Type 2 logic to track historical changes in employee records. | Hard | SCD2, Joins, Filtering, Union |
| 19 | Employee Attendance | TCS, Infosys | Transform employee attendance records to show the count of each attendance status. | Medium | Pivot, Grouping, Aggregation |
| 20 | Daily Stock Price Change | Nielsen | Calculate the day-over-day change in closing stock prices. | Medium | Window Functions, Lag, Time Series |
1. Load and Transform Data
Practice loading a CSV file and applying basic transformations to columns.
2. Handling Null Values
Clean the dataset by filtering out or replacing null values in various columns.
3. Total Purchases by Customer
Group data by customer and compute the total purchase amount per user.
4. Discounts on Products
Add a new column calculating discounted prices for products using arithmetic operations.
5. Load & Transform JSON file
Read a nested JSON file and flatten it using explode and array-handling techniques.
6. Employee Earnings
Use window functions to find employees whose salary is higher than the department average.
7. Remove Duplicates From Dataset
Identify and remove duplicate records based on custom logic using window functions.
8. Word Count Program in PySpark
Implement word count logic using PySpark RDD transformations on a text file.
9. Group By and Aggregate List
Group records and aggregate values into lists using advanced grouping and array functions.
10. Monthly Transaction Summary
Summarize transactions by month, using date functions to extract the month and grouping on it.
11. Top Players Summary
Generate a summary of top players using joins, aggregations, and string operations.
12. Daily Total Sales
Calculate total sales for each store on a daily basis using grouping and aggregation.
13. Top 5 Products by Sales
Find the top 5 products with the highest total sales across all stores for a given day.
14. Products with Increasing Sales
Given two years of product sales data, identify products whose total sales revenue has increased every year.
15. Remove Outliers from Trip Data
Given a dataset of trip costs and customer ratings, remove rows that contain outliers.
16. Driver Details for Rides
Given datasets of rides and drivers, join them to produce ride-level data.
17. Customer Loyalty Score
Calculate customer loyalty scores based on the number of trips and ratings.
18. Track Employee History
Implement SCD Type 2 logic to track historical changes in employee records.
19. Employee Attendance
Transform employee attendance records to show the count of each attendance status.
20. Daily Stock Price Change
Calculate the day-over-day change in closing stock prices.