Company-Wise PySpark Coding Interview Questions
Solve the PySpark coding interview questions most commonly asked for Data Engineer, Data Analyst, and Data Scientist roles.
| # | Title & Description | Companies | Difficulty | Tags |
|---|---|---|---|---|
| 1 | Load and Transform Data: Practice loading a CSV file and applying basic transformations to its columns. | Popular | Easy | Select, Drop, Filtering |
| 2 | Handling Null Values: Clean the dataset by filtering out or replacing null values in various columns. | Basic | Easy | Select, Cleaning, Filtering |
| 3 | Total Purchases by Customer: Group the data by customer and compute each customer's total purchase amount. | Basic | Easy | Select, Grouping, Filtering, Casting |
| 4 | Discounts on Products: Add a new column with discounted product prices calculated using arithmetic operations. | Basic | Easy | Select, Arithmetic, Casting |
| 5 | Load & Transform JSON File: Read a nested JSON file and flatten it using explode and array-handling techniques. | Popular | Medium | Select, JSON, Explode, Arrays |
| 6 | Employee Earnings: Use window functions to find employees whose salary is higher than their department's average. | Accenture | Hard | Select, Windows, Grouping, Joining, Functions |
| 7 | Remove Duplicates from Dataset: Identify and remove duplicate records based on custom logic using window functions. | Popular | Medium | Filtering, Window Functions, Dates, Grouping |
| 8 | Word Count Program in PySpark: Implement word-count logic using PySpark RDD transformations on a text file. | Popular | Medium | RDD, Text File, Grouping |
| 9 | Group By and Aggregate List: Group records and aggregate their values into lists using advanced grouping and array functions. | Tiger Analytics | Hard | Grouping, Aggregation, Functions, Arrays |
| 10 | Monthly Transaction Summary: Summarize transactions by month, using date functions to extract the month and then grouping on it. | Basic | Medium | Grouping, Aggregation, Date Functions, Transactions |
| 11 | Top Players Summary: Generate a summary of top players using joins, aggregations, and string operations. | ITC Infotech | Hard | Grouping, Aggregation, String Functions, Joins |
| 12 | Daily Total Sales: Calculate total sales for each store per day using grouping and aggregation. | Walmart | Easy | Aggregation, Grouping, Date |
| 13 | Top 5 Products by Sales: Find the 5 products with the highest total sales across all stores on a given day. | Walmart | Medium | Aggregation, Grouping, Sorting, Limit |
| 14 | Products with Increasing Sales: Given two years of product sales data, identify products whose total sales revenue increased year over year. | Deloitte | Hard | Window, Join, Filtering, Pivot |
| 15 | Remove Outliers from Trip Data: Given a dataset of trip costs and customer ratings, remove rows that contain outliers, for example using the interquartile range (IQR). | VISA | Medium | Quantiles, Filtering, IQR, Data Cleaning |
| 16 | Driver Details for Rides: Given datasets of rides and drivers, join them to produce ride-level data with driver details. | VISA | Easy | Joins, Select, Alias |
| 17 | Customer Loyalty Score: Calculate customer loyalty scores based on the number of trips and their ratings. | VISA | Hard | Joins, Aggregation, Filtering, Grouping, Arithmetic |
| 18 | Track Employee History: Implement SCD Type 2 logic to track historical changes in employee records. | Deloitte | Hard | SCD2, Joins, Filtering, Union |