Skip to Content
InterviewPandas Interview

Pandas Interview Questions

3 Sample Questions25 Total Questions

Pandas problems frequently asked in data analyst coding tests. Each problem is structured as: Problem โ†’ Answer โ†’ Explanation โ†’ Interview Points.


๐ŸŸข Sample Questions (3/25)

Question 1. Basic Filtering and Sorting

Beginner
Time Limit: 5 minutes

[Problem] Extract data that satisfies the following conditions:

  • Category is โ€˜Jeansโ€™ or โ€˜Dressesโ€™
  • Sale price is $50 or more
  • Sort by sale price in descending order
import pandas as pd DATA_PATH = '/data/' order_items = pd.read_csv(DATA_PATH + 'src_order_items.csv') products = pd.read_csv(DATA_PATH + 'src_products.csv') df = order_items.merge(products, on='product_id') # Write your code here

โœ… Answer Code

# Method 1: Boolean indexing result = df[ (df['category'].isin(['Jeans', 'Dresses'])) & (df['sale_price'] >= 50) ].sort_values('sale_price', ascending=False) # Method 2: Using query() result = df.query( "category in ['Jeans', 'Dresses'] and sale_price >= 50" ).sort_values('sale_price', ascending=False) print(f"Result: {len(result):,} rows") print(result[['category', 'sale_price', 'name']].head())

๐Ÿ“– Detailed Explanation

query() vs Boolean Indexing:

MethodProsCons
Boolean indexingMost basic, always worksReadability decreases with many conditions
query()SQL-style, better readabilityString parsing overhead
loc[]Label-based, supports slicingLimited condition expression

isin() Tips:

# โŒ Repeating OR operators df[(df['cat'] == 'A') | (df['cat'] == 'B') | (df['cat'] == 'C')] # โœ… Using isin() df[df['cat'].isin(['A', 'B', 'C'])]

Interviewer Point:

โ€œWhatโ€™s the performance difference between query() and boolean indexing?โ€ โ†’ Mostly similar, choose based on readability


Question 2. GroupBy Basics

Beginner
Time Limit: 5 minutes

[Problem] Calculate total revenue, average price, and product count by brand, then output the top 10 by total revenue.

โœ… Answer Code

result = df.groupby('brand').agg( total_revenue=('sale_price', 'sum'), avg_price=('sale_price', 'mean'), product_count=('product_id', 'nunique') ).round(2).sort_values('total_revenue', ascending=False).head(10) print(result)

๐Ÿ“– Detailed Explanation

3 agg() Syntax Options:

# 1. Dictionary method (column: function) df.groupby('brand')['sale_price'].agg({'total': 'sum', 'average': 'mean'}) # 2. Named Aggregation (recommended, pandas 0.25+) df.groupby('brand').agg( total=('sale_price', 'sum'), average=('sale_price', 'mean') ) # 3. List method (multiple functions) df.groupby('brand')['sale_price'].agg(['sum', 'mean', 'count'])

nunique vs count:

  • nunique(): Count of unique values (excluding duplicates)
  • count(): Row count excluding NULL

Question 3. Missing Value Handling

Beginner
Time Limit: 5 minutes

[Problem] Check missing value status, fill numeric columns with median and categorical columns with mode.

โœ… Answer Code

# Missing value status print("=== Missing Value Status ===") print(df.isnull().sum()[df.isnull().sum() > 0]) # Numeric columns: median numeric_cols = df.select_dtypes(include=['number']).columns for col in numeric_cols: if df[col].isnull().sum() > 0: median_val = df[col].median() df[col] = df[col].fillna(median_val) print(f"{col}: filled with median({median_val:.2f})") # Categorical columns: mode object_cols = df.select_dtypes(include=['object']).columns for col in object_cols: if df[col].isnull().sum() > 0: mode_val = df[col].mode()[0] df[col] = df[col].fillna(mode_val) print(f"{col}: filled with mode({mode_val})")

๐Ÿ“– Detailed Explanation

Missing Value Handling Strategies:

SituationMethodCode
Normal distributionMeanfillna(df['col'].mean())
Skewed distributionMedianfillna(df['col'].median())
CategoricalModefillna(df['col'].mode()[0])
Time seriesForward/backward fillfillna(method='ffill')
Deletedropnadropna(subset=['col'])

Note: inplace vs assignment

# โŒ Modifying original (not recommended) df.fillna(0, inplace=True) # โœ… Assigning to new object (recommended) df = df.fillna(0)

Interviewer Point:

โ€œWhy did you choose median over mean?โ€ โ†’ To minimize the impact of outliers


๐Ÿ”’ Premium Questions (22 Questions)

All 25 Questions Breakdown

DifficultyQuestionsMain Topics
๐ŸŸข Beginner8 questionsFiltering, GroupBy, Missing Values, Merge, String, DateTime
๐ŸŸก Intermediate9 questionsPivot, Apply/Transform, Rolling, Conditional Columns, Melt
๐Ÿ”ด Advanced8 questionsRFM Analysis, Cohort Retention, Memory Optimization

What Youโ€™ll Learn in Premium

  • โœ… apply vs transform vs agg: When to use which
  • โœ… Rolling Window: 7-day moving average, cumulative calculations
  • โœ… RFM Analysis: Real-world customer segmentation code
  • โœ… Cohort Retention: Analysis used in actual companies
  • โœ… Memory Optimization: Large-scale data processing techniques
  • โœ… Answer Points Interviewers Expect

๐ŸŽฏ Purchase All 25 Questions + Explanations

SQL + Pandas + Statistics + Case Study bundle discount


๐Ÿ“ Practice More for Free

If you need more interview preparation, review the concept sections in the Cookbook:

Last updated on

๐Ÿค–AI ๋ชจ์˜๋ฉด์ ‘์‹ค์ „์ฒ˜๋Ÿผ ์—ฐ์Šตํ•˜๊ธฐ