🐍 Python Interview Questions

40 questions with theory, real code, real-world scenarios, common mistakes and follow-up questions — from basic to performance optimization.

40Questions
5Levels
6Answer Sections
240Total Answers
Showing 40 of 40 questions
0 of 40 viewed
01 What is Python and why is it so popular? basic

Python is a high-level, interpreted programming language created by Guido van Rossum in 1991. It emphasises code readability with significant whitespace and a clean, English-like syntax.

Python is popular because it has a gentle learning curve, a massive standard library ("batteries included"), and thriving ecosystems for web development (Django, Flask), data science (pandas, NumPy), machine learning (TensorFlow, PyTorch), and automation. Its community is one of the largest in the world, which means you can find a library for almost anything.

# Python reads almost like English
students = ["Alice", "Bob", "Charlie"]

for student in students:
    if student.startswith("A"):
        print(f"{student} gets a welcome bonus!")

# Output: Alice gets a welcome bonus!

Instagram's entire backend started as a Django (Python) monolith and scaled to 1 billion+ users before any major language migration. Dropbox ran on Python for over a decade. Netflix uses Python for its recommendation engine data pipeline that processes 200+ billion events per day.

Python trades raw execution speed for developer speed — you ship features faster, which matters more in most businesses than microseconds of runtime.
⚠️ Common Mistake

Many candidates say "Python is slow, so it's only for scripting." This is wrong — Python is the backbone of Instagram, Netflix, and most AI/ML production systems. The correct framing is: Python is slower at CPU-bound loops but excels at I/O-bound work and integrates with C/C++ for hot paths.

🔁 Follow-Up Question

What are the differences between Python 2 and Python 3? Why did the migration take so long?

02 What are Python's core data types? basic

Python has several built-in data types grouped into categories:

Numeric: int (unlimited precision integers), float (64-bit doubles), complex.
Sequence: list (mutable, ordered), tuple (immutable, ordered), range.
Text: str (immutable Unicode).
Set: set (mutable, unordered, unique), frozenset (immutable).
Mapping: dict (mutable key-value pairs).
Boolean: bool (True/False, subclass of int).
None: NoneType — Python's null.

# Numeric
price = 49.99                    # float
quantity = 3                     # int

# Sequence
cart_items = ["Laptop", "Mouse"] # list  (mutable)
dimensions = (1920, 1080)        # tuple (immutable)

# Mapping
user = {
    "name": "Priya",
    "age": 28,
    "is_premium": True            # bool
}

# Set
unique_tags = {"python", "coding", "python"}  # duplicates removed
print(unique_tags)  # {'python', 'coding'}

# None
result = None  # placeholder before computation

At a fintech startup processing 500K transactions/day, switching order IDs from a list lookup (O(n)) to a set lookup (O(1)) reduced duplicate-detection time from 14 seconds to 0.003 seconds per batch.

Choose the data type by intent — list for ordered collections, dict for key-value lookups, set for uniqueness, tuple for immutable records.
⚠️ Common Mistake

Candidates often confuse list and tuple. The key difference is mutability, not syntax. Tuples are hashable and can be dict keys; lists cannot.

🔁 Follow-Up Question

What is the difference between a list and a tuple? When would you use one over the other?

03 How do functions work in Python? Explain def, return, and default arguments. basic

A function is defined with the def keyword, takes parameters, and optionally returns a value with return. If no return statement is used, the function returns None.

Python supports default arguments (parameters with preset values), keyword arguments (passing by name), and positional arguments. Default arguments are evaluated once at function definition time — not each call — which is a critical gotcha with mutable defaults.

def calculate_discount(price, discount_pct=10, currency="₹"):
    """Calculate discounted price with optional percentage and currency."""
    savings = price * (discount_pct / 100)
    final_price = price - savings
    return final_price, savings  # returns a tuple

# Usage
final, saved = calculate_discount(1000)
print(f"Pay ₹{final}, you saved ₹{saved}")
# Output: Pay ₹900.0, you saved ₹100.0

# With keyword argument
final, saved = calculate_discount(1000, discount_pct=20, currency="$")
print(f"Pay ${final}, you saved ${saved}")
# Output: Pay $800.0, you saved $200.0

In an e-commerce pricing engine processing 50K products, functions with default arguments let the team reuse one calculate_discount() function across 12 different sale campaigns instead of writing 12 separate functions — reducing code from 600 lines to 45.

Functions return None by default. Never use a mutable object (list, dict) as a default argument — use None and create inside the function.
⚠️ Common Mistake

The #1 Python function bug — mutable default arguments:

❌ Wrong — list is shared across calls
def add_item(item, cart=[]):
    cart.append(item)
    return cart

print(add_item("A"))  # ['A']
print(add_item("B"))  # ['A', 'B'] — BUG!
✅ Correct — use None sentinel
def add_item(item, cart=None):
    if cart is None:
        cart = []
    cart.append(item)
    return cart

print(add_item("A"))  # ['A']
print(add_item("B"))  # ['B'] — correct!
🔁 Follow-Up Question

What is the difference between *args and **kwargs?

04 Explain Python loops — for, while, break, continue, and else clause. basic

for loops iterate over any iterable (list, string, range, dict, file). while loops run as long as a condition is True.

break exits the loop entirely. continue skips the current iteration. Python has a unique else clause on loops — the else block runs only if the loop completes without hitting a break. This is useful for search patterns.

# for with else — search pattern
users_db = ["alice", "bob", "charlie", "diana"]
search = "charlie"

for user in users_db:
    if user == search:
        print(f"Found {user}! Granting access...")
        break
else:
    # Only runs if break was NOT hit
    print(f"{search} not found. Access denied.")

# Output: Found charlie! Granting access...

# while with continue — skip invalid data
raw_scores = [85, -1, 92, 0, 78, -5, 95]
valid_scores = []
i = 0

while i < len(raw_scores):
    score = raw_scores[i]
    i += 1
    if score <= 0:
        continue  # skip invalid
    valid_scores.append(score)

print(f"Valid scores: {valid_scores}")
# Output: Valid scores: [85, 92, 78, 95]

In a data cleaning pipeline for a hospital system, using for-else to detect missing patient IDs eliminated a separate boolean flag variable across 200+ validation functions, making the code cleaner and reducing bugs from forgotten flag resets by 100%.

Use for-else when searching — else runs only if break was never triggered. It replaces the "found = False" flag pattern.
⚠️ Common Mistake

Candidates forget the else clause on loops or confuse it with if-else. The loop's else means "no break happened" — it does not mean "loop didn't execute." An empty loop still triggers else.

🔁 Follow-Up Question

How do you iterate over a dictionary's keys and values simultaneously?

05 What is the difference between a list, tuple, and set? basic

List — mutable, ordered, allows duplicates. Use for collections that change. Syntax: [1, 2, 3].
Tuple — immutable, ordered, allows duplicates. Use for fixed records (coordinates, DB rows). Syntax: (1, 2, 3). Tuples are hashable, so they can be dict keys.
Set — mutable, unordered, no duplicates. Use for membership testing and deduplication. Syntax: {1, 2, 3}.

Performance: set lookups are O(1) average (hash table), list lookups are O(n), tuple lookups are O(n) but tuples use less memory than lists.

# List — shopping cart (changes often)
cart = ["Laptop", "Mouse", "Keyboard"]
cart.append("Monitor")
cart.remove("Mouse")
print(cart)  # ['Laptop', 'Keyboard', 'Monitor']

# Tuple — database record (never changes)
employee = ("E1042", "Priya Sharma", "Engineering", 95000)
emp_id, name, dept, salary = employee  # unpacking
print(f"{name} in {dept}")  # Priya Sharma in Engineering

# Set — unique users who viewed a page
viewers = {"user_101", "user_202", "user_101", "user_303"}
print(len(viewers))  # 3 (duplicate removed)

# Set operations
premium_users = {"user_101", "user_404"}
premium_viewers = viewers & premium_users  # intersection
print(premium_viewers)  # {'user_101'}

At a social media analytics platform, switching the "unique daily active users" counter from a list (checking `if user not in list`) to a set reduced the daily aggregation job from 23 minutes to 47 seconds on 5M user events.

List = mutable + ordered. Tuple = immutable + ordered + hashable. Set = mutable + unordered + unique + O(1) lookup.
⚠️ Common Mistake

Candidates say "tuples are just immutable lists." This misses the key implication — because tuples are hashable, they can be used as dictionary keys and set elements, which lists cannot.

🔁 Follow-Up Question

What is a frozenset and when would you use it?

06 How do string methods work in Python? Explain the most important ones. basic

Strings in Python are immutable — every string method returns a new string. Key methods: strip() removes whitespace, split() breaks into a list, join() combines a list into a string, replace() substitutes substrings, find()/index() search for substrings, startswith()/endswith() check prefixes/suffixes, upper()/lower()/title() change case.

f-strings (Python 3.6+) are the modern way to format strings — faster and more readable than % or .format().

# Cleaning user input
raw_email = "  Priya.Sharma@Gmail.COM  "
clean_email = raw_email.strip().lower()
print(clean_email)  # "priya.sharma@gmail.com"

# Parsing CSV-like data
log_line = "2025-01-15|ERROR|Database connection timeout"
date, level, message = log_line.split("|")
print(f"[{level}] {message}")  # [ERROR] Database connection timeout

# Building output from list
tags = ["python", "interview", "coding"]
hashtags = " ".join(f"#{tag}" for tag in tags)
print(hashtags)  # #python #interview #coding

# f-string with expression
price = 1299.5
print(f"Total: ₹{price:,.2f}")  # Total: ₹1,299.50

In an email validation microservice handling 2M signups/month, chaining strip().lower() on every email input before database storage prevented 12,000+ duplicate accounts per month caused by leading spaces and mixed-case entries.

Strings are immutable — methods return new strings. Use f-strings for formatting, split/join for parsing, strip for cleaning user input.
⚠️ Common Mistake

Candidates forget strings are immutable and write name.upper() expecting name to change. You must reassign: name = name.upper().

🔁 Follow-Up Question

What is the difference between str.find() and str.index()? What happens when the substring is not found?

07 How does file I/O work in Python? Explain open(), read, write, and with statement. basic

Python opens files with open(filename, mode). Modes: "r" read (default), "w" write (overwrites), "a" append, "x" create (fails if exists), "b" binary. You should always use the with statement (context manager) which automatically closes the file even if an exception occurs.

Reading methods: read() loads entire file, readline() reads one line, readlines() returns list of lines. For large files, iterate line by line with for line in file: to avoid loading everything into memory.

# Writing a report file
sales_data = [
    {"product": "Laptop", "revenue": 125000},
    {"product": "Mouse",  "revenue": 8500},
    {"product": "Monitor","revenue": 45000},
]

with open("sales_report.txt", "w") as f:
    f.write("=== Monthly Sales Report ===\n\n")
    for item in sales_data:
        f.write(f"{item['product']:.<20} ₹{item['revenue']:>10,}\n")
    f.write(f"\n{'Total':.<20} ₹{sum(i['revenue'] for i in sales_data):>10,}\n")

# Reading it back — line by line (memory-efficient)
with open("sales_report.txt", "r") as f:
    for line in f:
        print(line.rstrip())

# Output:
# === Monthly Sales Report ===
#
# Laptop.............. ₹   125,000
# Mouse...............  ₹     8,500
# Monitor.............  ₹    45,000
#
# Total............... ₹   178,500

A log analysis script at an e-commerce company needed to scan 15 GB access logs daily. Switching from file.read() (loaded entire file into RAM, crashing on 8 GB servers) to line-by-line iteration reduced memory usage from 15 GB to 12 MB while processing 80M lines in 4 minutes.

Always use "with open(...) as f:" — it guarantees file closure. For large files, iterate line by line instead of read() to avoid memory issues.
⚠️ Common Mistake
❌ Wrong — file may stay open on error
f = open("data.txt", "r")
data = f.read()
f.close()  # never reached if read() throws
✅ Correct — with guarantees close
with open("data.txt", "r") as f:
    data = f.read()
# file is auto-closed here, even on exception
🔁 Follow-Up Question

How would you read a very large CSV file (10 GB) without running out of memory?

08 How does error handling work in Python? Explain try, except, finally, and raise. basic

Python uses try/except/else/finally for error handling. Code that might fail goes in try. The except block catches specific exceptions. else runs only if no exception occurred. finally always runs — for cleanup.

You can raise exceptions manually and create custom exception classes by inheriting from Exception. Always catch specific exceptions, not bare except: which catches everything including KeyboardInterrupt and SystemExit.

def withdraw(balance, amount):
    """Bank withdrawal with proper error handling."""
    if not isinstance(amount, (int, float)):
        raise TypeError(f"Amount must be a number, got {type(amount).__name__}")
    if amount <= 0:
        raise ValueError("Withdrawal amount must be positive")
    if amount > balance:
        raise ValueError(f"Insufficient funds: balance ₹{balance}, requested ₹{amount}")
    return balance - amount

# Usage with try/except/else/finally
try:
    new_balance = withdraw(10000, 3000)
except TypeError as e:
    print(f"Input error: {e}")
except ValueError as e:
    print(f"Business rule error: {e}")
else:
    # Only runs if NO exception
    print(f"Withdrawal successful! New balance: ₹{new_balance}")
finally:
    # Always runs — good for logging, cleanup
    print("Transaction logged.")

# Output:
# Withdrawal successful! New balance: ₹7000
# Transaction logged.

In a payment gateway processing 10K transactions/hour, adding specific except clauses for ConnectionTimeout, InvalidCard, and InsufficientFunds (instead of a bare except) reduced silent failures from 200/day to zero — every error was now categorized and routed to the correct retry or alert system.

Always catch specific exceptions. Use else for success-only code. Use finally for cleanup. Never use bare except: — it hides bugs.
⚠️ Common Mistake
❌ Wrong — catches everything, hides bugs
try:
    result = process_payment(order)
except:
    print("Something went wrong")  # hides the real error
✅ Correct — specific exceptions with context
try:
    result = process_payment(order)
except ConnectionError:
    retry_payment(order)  # network issue, retry
except ValueError as e:
    log.error(f"Invalid order data: {e}")  # data issue, log
    raise  # re-raise after logging
🔁 Follow-Up Question

How do you create a custom exception class in Python? When is it appropriate?

09 What are Python modules and packages? How does import work? basic

A module is any .py file. A package is a directory with an __init__.py file (can be empty) containing modules. The import statement loads code from modules.

Import styles: import math (full module), from math import sqrt (specific function), from math import * (everything — avoid in production). Python searches for modules in: current directory → standard library → installed packages (sys.path).

__name__ == "__main__" is True only when a file is run directly, not when imported. This is the standard entry-point guard.

# utils/validators.py — a custom module
import re

def validate_email(email):
    """Validate email format and return cleaned version."""
    pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    clean = email.strip().lower()
    if not re.match(pattern, clean):
        raise ValueError(f"Invalid email: {email}")
    return clean

def validate_phone(phone, country_code="+91"):
    """Validate Indian phone number."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) != 10:
        raise ValueError(f"Phone must be 10 digits, got {len(digits)}")
    return f"{country_code}{digits}"

# main.py — importing the module
from utils.validators import validate_email, validate_phone

if __name__ == "__main__":
    email = validate_email("  Priya@Gmail.COM  ")
    phone = validate_phone("98765-43210")
    print(f"Email: {email}, Phone: {phone}")
    # Email: priya@gmail.com, Phone: +919876543210

A 50-developer team at a SaaS company reduced import-related bugs from 15/sprint to zero by enforcing explicit imports (from module import X) instead of wildcard imports (from module import *), which had been causing name collisions between 300+ modules.

Modules are .py files, packages are directories with __init__.py. Always use explicit imports. Use if __name__ == "__main__" as entry-point guard.
⚠️ Common Mistake

Candidates use from module import * in production. This pollutes the namespace and causes subtle bugs when two modules export the same name. Always use explicit imports: from module import specific_function.

🔁 Follow-Up Question

What is the difference between absolute and relative imports? When would you use each?

10 What are list comprehensions and how do they differ from regular loops? basic

A list comprehension is a concise way to create lists: [expression for item in iterable if condition]. It combines a loop, an optional filter, and a transformation into a single readable line.

Comprehensions exist for lists [], sets {}, dicts {k:v}, and generators (). They are generally faster than equivalent for loops because the iteration happens in C internally. However, for complex logic (multiple side effects, nested conditions), a regular loop is more readable.

# Regular loop vs comprehension
prices = [1200, 450, 3200, 89, 5600, 230, 780]

# Loop approach — 4 lines
expensive = []
for p in prices:
    if p > 500:
        expensive.append(p * 0.9)  # 10% discount

# Comprehension — 1 line, same result
expensive = [p * 0.9 for p in prices if p > 500]
print(expensive)  # [1080.0, 2880.0, 5040.0, 702.0]

# Dict comprehension — word frequency
sentence = "python is great and python is fun"
word_freq = {word: sentence.split().count(word)
             for word in set(sentence.split())}
print(word_freq)
# {'python': 2, 'is': 2, 'great': 1, 'and': 1, 'fun': 1}

# Nested comprehension — flatten matrix
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [num for row in matrix for num in row]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

In a data pipeline processing 2M product listings, replacing 15 for-loop-with-append patterns with list comprehensions reduced the transformation step from 8.2 seconds to 3.1 seconds — a 62% speedup with zero logic changes, purely from Python's internal C-level optimization.

List comprehensions are faster and more Pythonic than loop-append patterns. Use them for simple transforms + filters. Use regular loops for complex multi-step logic.
⚠️ Common Mistake

Candidates write deeply nested comprehensions that are unreadable:

❌ Wrong — unreadable nested comprehension
result = [x*y for x in range(10) for y in range(10) if x != y if x+y > 5 if x*y < 30]
✅ Correct — use a loop when logic is complex
result = []
for x in range(10):
    for y in range(10):
        if x != y and x + y > 5 and x * y < 30:
            result.append(x * y)
🔁 Follow-Up Question

What is a generator expression and how does it differ from a list comprehension?

11 Explain Object-Oriented Programming in Python — classes, inheritance, and encapsulation. intermediate

Python supports full OOP with classes, inheritance, polymorphism, and encapsulation. A class is a blueprint; an object is an instance. __init__ is the constructor. self refers to the current instance.

Inheritance: a child class inherits methods/attributes from a parent, and can override them. Python supports multiple inheritance (MRO — Method Resolution Order uses C3 linearization). Encapsulation: use a single underscore _var for "protected" (convention) and double underscore __var for name-mangling (not true private, but harder to access accidentally).

class BankAccount:
    """Base bank account with deposit/withdraw."""
    
    def __init__(self, owner, balance=0):
        self.owner = owner
        self._balance = balance  # protected by convention
        self.__transactions = []  # name-mangled
    
    @property
    def balance(self):
        return self._balance
    
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("Deposit must be positive")
        self._balance += amount
        self.__transactions.append(f"+₹{amount}")
        return self._balance
    
    def withdraw(self, amount):
        if amount > self._balance:
            raise ValueError("Insufficient funds")
        self._balance -= amount
        self.__transactions.append(f"-₹{amount}")
        return self._balance

class SavingsAccount(BankAccount):
    """Savings account with interest — inherits from BankAccount."""
    
    def __init__(self, owner, balance=0, interest_rate=0.04):
        super().__init__(owner, balance)
        self.interest_rate = interest_rate
    
    def apply_interest(self):
        interest = self._balance * self.interest_rate
        self.deposit(interest)
        return interest

# Usage
acc = SavingsAccount("Priya", 50000, 0.06)
acc.deposit(10000)
interest = acc.apply_interest()
print(f"Balance: ₹{acc.balance:,}, Interest earned: ₹{interest:,.2f}")
# Balance: ₹63,600, Interest earned: ₹3,600.00

A fintech team modelled 8 account types (Savings, Current, FD, Recurring, NRI, Joint, Minor, Salary) using a base BankAccount class. Adding a new account type went from 2 weeks (copy-paste 800 lines) to 2 hours (inherit and override 3 methods).

Use @property for controlled access, super().__init__() for parent construction, and single underscore for protected members. Python's OOP is flexible — not enforced like Java.
⚠️ Common Mistake

Candidates forget to call super().__init__() in child classes, causing missing attributes. Also, __var is name-mangled (accessible as _ClassName__var), not truly private — don't rely on it for security.

🔁 Follow-Up Question

What is the MRO (Method Resolution Order) in Python? How does it handle the diamond problem?

12 What are decorators in Python and how do you write one? intermediate

A decorator is a function that takes another function as input and returns an enhanced version of it — without modifying the original function's code. Decorators use the @decorator_name syntax above a function definition.

Under the hood, @my_decorator above def func() is just syntactic sugar for func = my_decorator(func). Decorators are powerful for cross-cutting concerns: logging, timing, authentication, caching, rate-limiting, and input validation.

Use @functools.wraps(func) inside your decorator to preserve the original function's name and docstring.

import functools
import time

def timer(func):
    """Decorator that logs how long a function takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"⏱ {func.__name__}() took {elapsed:.4f}s")
        return result
    return wrapper

def retry(max_attempts=3):
    """Decorator with arguments — retries on failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    print(f"Attempt {attempt}/{max_attempts} failed: {e}")
                    if attempt == max_attempts:
                        raise
        return wrapper
    return decorator

@timer
@retry(max_attempts=3)
def fetch_user_data(user_id):
    """Simulate API call."""
    import random
    if random.random() < 0.5:
        raise ConnectionError("API timeout")
    return {"id": user_id, "name": "Priya"}

data = fetch_user_data(42)

At a microservices company, a single @retry(max_attempts=3, backoff=2) decorator applied to 45 API calls reduced cascading failures by 73% during peak traffic — without changing any business logic in those 45 functions.

Decorators wrap functions to add behavior without changing the original code. Always use @functools.wraps to preserve metadata. Decorators with arguments need three nested functions.
⚠️ Common Mistake

Forgetting @functools.wraps(func) causes the decorated function to lose its name and docstring:

❌ Without wraps — name is "wrapper"
def my_decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@my_decorator
def greet(): pass
print(greet.__name__)  # "wrapper" — wrong!
✅ With wraps — name preserved
import functools
def my_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@my_decorator
def greet(): pass
print(greet.__name__)  # "greet" — correct!
🔁 Follow-Up Question

Can you stack multiple decorators on one function? In what order do they execute?

13 What are generators in Python and how do they differ from regular functions? intermediate

A generator is a function that uses yield instead of return. It produces values one at a time, pausing execution between yields and resuming when the next value is requested. This is called lazy evaluation.

Generators are memory-efficient because they don't store the entire sequence in memory — they compute each value on-the-fly. A generator function returns a generator object that implements the iterator protocol (__iter__ and __next__).

Generator expressions use parentheses: (x for x in range(1000000)) — same syntax as list comprehension but uses almost zero memory.

def read_large_csv(filepath, chunk_size=1000):
    """Generator that reads a large CSV in chunks."""
    import csv
    with open(filepath, "r") as f:
        reader = csv.DictReader(f)
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:  # remaining rows
            yield chunk

# Usage — processes 10M rows without loading all into memory
total_revenue = 0
for batch in read_large_csv("sales_10M.csv"):
    for row in batch:
        total_revenue += float(row["revenue"])
print(f"Total revenue: ₹{total_revenue:,.2f}")

# Generator expression vs list comprehension
# List — stores 10M numbers in memory (~80 MB)
squares_list = [x**2 for x in range(10_000_000)]

# Generator — stores only 1 number at a time (~120 bytes)
squares_gen = (x**2 for x in range(10_000_000))
print(sum(squares_gen))  # processes 10M numbers, ~0 MB memory

At a data analytics company, replacing a list-based CSV reader with a generator-based chunked reader let them process a 45 GB transaction log on a server with only 4 GB RAM — previously the job crashed with MemoryError after 3 minutes.

Generators yield one value at a time using lazy evaluation. Use them for large datasets, infinite sequences, or pipeline processing where you don't need all values at once.
⚠️ Common Mistake

Candidates forget that generators are single-use — once exhausted, they produce nothing:

❌ Wrong — generator exhausted on second use
gen = (x**2 for x in range(5))
print(list(gen))  # [0, 1, 4, 9, 16]
print(list(gen))  # [] — empty! Generator is exhausted
✅ Correct — recreate or use a list if you need multiple passes
def squares(n):
    return (x**2 for x in range(n))

print(list(squares(5)))  # [0, 1, 4, 9, 16]
print(list(squares(5)))  # [0, 1, 4, 9, 16] — fresh generator
🔁 Follow-Up Question

What is the difference between yield and yield from? When would you use yield from?

14 Explain lambda functions and when to use map(), filter(), and reduce(). intermediate

A lambda is an anonymous, single-expression function: lambda args: expression. It's syntactic sugar for small, throwaway functions.

map(func, iterable) applies a function to every item. filter(func, iterable) keeps items where the function returns True. reduce(func, iterable) (from functools) cumulatively combines items left to right.

In modern Python, list comprehensions are usually preferred over map/filter for readability. But lambda is still useful for: sort keys, callback functions, and functional programming patterns.

# Lambda for sorting — sort employees by salary descending
employees = [
    {"name": "Priya",  "salary": 85000},
    {"name": "Rahul",  "salary": 72000},
    {"name": "Sneha",  "salary": 95000},
    {"name": "Arjun",  "salary": 68000},
]

top_earners = sorted(employees, key=lambda e: e["salary"], reverse=True)
for e in top_earners:
    print(f"{e['name']:.<15} ₹{e['salary']:>8,}")
# Sneha........... ₹  95,000
# Priya........... ₹  85,000
# Rahul........... ₹  72,000
# Arjun........... ₹  68,000

# map + filter — process transaction amounts
transactions = [1200, -500, 3400, -150, 8900, 200]

credits = list(filter(lambda t: t > 0, transactions))
with_tax = list(map(lambda t: round(t * 1.18, 2), credits))
print(f"Credits with 18% GST: {with_tax}")
# Credits with 18% GST: [1416.0, 4012.0, 10502.0, 236.0]

# Equivalent list comprehension (preferred)
with_tax = [round(t * 1.18, 2) for t in transactions if t > 0]

A data team used sorted() with a lambda key to rank 50K customer records by a composite score (recency × frequency × monetary). This one-liner replaced a 30-line custom comparator class, and the sort ran in 0.08 seconds.

Use lambda for sort keys and simple callbacks. Prefer list comprehensions over map/filter for readability. Use reduce only when you truly need cumulative aggregation.
⚠️ Common Mistake

Candidates try to cram complex logic into lambda. Lambda is for one expression only — no statements, no assignments, no if-else chains:

❌ Wrong — too complex for lambda
process = lambda x: x*2 if x > 0 else (x*-1 if x < -100 else 0)
✅ Correct — use a named function
def process(x):
    if x > 0:
        return x * 2
    elif x < -100:
        return x * -1
    return 0
🔁 Follow-Up Question

What is functools.reduce() and can you give a practical example where it's better than a loop?

15 What are *args and **kwargs? How do you use them? intermediate

*args collects extra positional arguments into a tuple. **kwargs collects extra keyword arguments into a dict. The names "args" and "kwargs" are conventions — it's the * and ** that matter.

Parameter order matters: def func(regular, *args, keyword_only, **kwargs). Anything after *args must be passed as a keyword argument. This is how Python enforces keyword-only parameters.

The * and ** operators also work for unpacking — *list unpacks a list into positional args, **dict unpacks a dict into keyword args.

def create_html_tag(tag, *children, class_name=None, **attrs):
    """Build an HTML tag with flexible content and attributes."""
    attr_str = ""
    if class_name:
        attr_str += f' class="{class_name}"'
    for key, val in attrs.items():
        attr_str += f' {key.rstrip("_")}="{val}"'
    
    content = "".join(str(c) for c in children)
    return f"<{tag}{attr_str}>{content}</{tag}>"

# Usage — flexible API
print(create_html_tag("h1", "Welcome"))
# <h1>Welcome</h1>

print(create_html_tag("div", "Hello ", "World",
                       class_name="greeting", id_="main"))
# <div class="greeting" id="main">Hello World</div>

# Unpacking with * and **
config = {"class_name": "btn", "data_action": "submit"}
print(create_html_tag("button", "Click Me", **config))
# <button class="btn" data_action="submit">Click Me</button>

# Forwarding args — common in wrappers
def log_and_call(func, *args, **kwargs):
    print(f"Calling {func.__name__} with args={args}, kwargs={kwargs}")
    return func(*args, **kwargs)

The entire Django framework uses **kwargs extensively in its ORM — Model.objects.filter(**conditions) accepts any combination of field lookups. This pattern lets 500K+ Django projects query databases with flexible filters without Django knowing every possible field name in advance.

*args = tuple of extra positional args. **kwargs = dict of extra keyword args. Use them for flexible APIs, wrappers, and function forwarding.
⚠️ Common Mistake

Candidates confuse the order. Python requires: def f(positional, *args, keyword_only, **kwargs). Putting **kwargs before *args is a syntax error. Also, *args and **kwargs capture extra arguments — named parameters still take priority.

🔁 Follow-Up Question

How do keyword-only arguments work in Python 3? How do you enforce them?

16 What are context managers and how does the with statement work? intermediate

A context manager is an object that defines __enter__ (setup) and __exit__ (cleanup) methods. The with statement guarantees cleanup even if an exception occurs — like try/finally but cleaner.

Common built-in context managers: file objects, threading locks, database connections, decimal.localcontext(). You can create custom ones with a class (define __enter__/__exit__) or with @contextlib.contextmanager decorator (yield-based — simpler).

The __exit__ method receives exception info (type, value, traceback). Returning True suppresses the exception.

import contextlib
import time
import sqlite3

# Custom context manager using decorator
@contextlib.contextmanager
def timer(label):
    """Time a block of code."""
    start = time.perf_counter()
    try:
        yield  # code inside 'with' runs here
    finally:
        elapsed = time.perf_counter() - start
        print(f"⏱ {label}: {elapsed:.4f}s")

# Usage
with timer("Data processing"):
    total = sum(x**2 for x in range(1_000_000))
# Output: ⏱ Data processing: 0.0823s

# Database transaction context manager
@contextlib.contextmanager
def db_transaction(db_path):
    """Auto-commit on success, rollback on failure."""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    try:
        yield cursor
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()

# Usage — auto-handles commit/rollback/close
with db_transaction("app.db") as cursor:
    cursor.execute("INSERT INTO users (name) VALUES (?)", ("Priya",))

At a trading platform, wrapping database operations in a custom transaction context manager eliminated 15 "connection leak" incidents per month. Previously, developers forgot conn.close() in 3 out of 40 database functions, causing connection pool exhaustion under load.

Context managers guarantee cleanup (file close, lock release, DB commit/rollback). Use @contextlib.contextmanager for simple cases, class-based for complex ones.
⚠️ Common Mistake

Candidates write context managers that swallow exceptions silently by returning True from __exit__. Only suppress exceptions if you truly handle them — otherwise bugs become invisible.

🔁 Follow-Up Question

Can you nest multiple context managers? What does contextlib.ExitStack do?

17 How do you use regular expressions (regex) in Python? intermediate

Python's re module provides regex support. Key functions: re.match() checks the start of a string, re.search() finds the first match anywhere, re.findall() returns all matches, re.sub() replaces matches, re.compile() pre-compiles a pattern for reuse.

Use raw strings r"pattern" to avoid escaping backslashes. Named groups (?P<name>...) make matches self-documenting. For performance, compile patterns used in loops with re.compile().

import re

# Extract structured data from log lines
log_pattern = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<level>INFO|WARN|ERROR) '
    r'(?P<message>.+)'
)

log_lines = [
    "2025-01-15 14:23:01 ERROR Database connection timeout after 30s",
    "2025-01-15 14:23:05 INFO Retry successful, connected to replica",
    "2025-01-15 14:24:00 WARN Memory usage at 85% threshold",
]

errors = []
for line in log_lines:
    match = log_pattern.match(line)
    if match and match.group("level") == "ERROR":
        errors.append({
            "date": match.group("date"),
            "message": match.group("message"),
        })

print(f"Found {len(errors)} errors")
# Found 1 errors

# Validate and extract email parts
email_pattern = r"^(?P<user>[a-zA-Z0-9._%+-]+)@(?P<domain>[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$"
email = "priya.sharma@company.co.in"
m = re.match(email_pattern, email)
if m:
    print(f"User: {m.group('user')}, Domain: {m.group('domain')}")
# User: priya.sharma, Domain: company.co.in

A security team used compiled regex to scan 2M email templates for potential XSS patterns. The scan ran in 8 seconds with re.compile() vs 95 seconds without — compilation overhead is amortized when the pattern is reused in loops.

Use re.compile() for patterns in loops, raw strings r"" to avoid escaping, and named groups (?P...) for readable extraction.
⚠️ Common Mistake

Candidates use re.match() when they mean re.search(). match() only checks the start of the string, while search() finds a match anywhere in the string. This trips up candidates consistently.

🔁 Follow-Up Question

What is the difference between greedy and non-greedy matching? Give an example where it matters.

18 How do you handle JSON data in Python? intermediate

Python's built-in json module handles JSON serialization (Python → JSON string) and deserialization (JSON string → Python). Key functions: json.dumps() converts dict/list to JSON string, json.loads() parses JSON string to dict/list, json.dump()/json.load() work with files.

JSON maps to Python types: object → dict, array → list, string → str, number → int/float, true/false → True/False, null → None. For custom objects, you need a custom encoder/decoder.

import json
from datetime import datetime

# API response handling
api_response = '''
{
    "user_id": 1042,
    "name": "Priya Sharma",
    "orders": [
        {"id": "ORD-5001", "amount": 2499.00, "status": "delivered"},
        {"id": "ORD-5023", "amount": 899.50, "status": "shipped"}
    ],
    "is_premium": true,
    "last_login": null
}
'''

# Parse JSON
user = json.loads(api_response)
total_spent = sum(order["amount"] for order in user["orders"])
print(f"{user['name']} spent ₹{total_spent:,.2f}")
# Priya Sharma spent ₹3,398.50

# Custom encoder for non-serializable types
class AppEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)

# Serialize with custom encoder
event = {
    "event": "purchase",
    "timestamp": datetime.now(),
    "amount": 1299.00,
}
json_str = json.dumps(event, cls=AppEncoder, indent=2)
print(json_str)

# Save to file
with open("user_data.json", "w") as f:
    json.dump(user, f, indent=2, ensure_ascii=False)

A REST API backend processes 50K JSON payloads per minute. Using json.loads() with strict validation (checking required keys, types, and ranges) before database insertion prevented 3,000+ malformed records per day from entering the system.

json.loads()/dumps() for strings, json.load()/dump() for files. Use a custom JSONEncoder for datetime and other non-serializable types. Always validate parsed JSON before using it.
⚠️ Common Mistake

Candidates confuse json.dumps() (to string) with json.dump() (to file). Also, single quotes are invalid JSON — Python dicts use single quotes but JSON requires double quotes. json.dumps() handles this correctly.

🔁 Follow-Up Question

How would you handle very large JSON files (5+ GB) that don't fit in memory?

19 What are virtual environments and why are they important? intermediate

A virtual environment is an isolated Python installation with its own packages, separate from the system Python and other projects. This prevents "dependency hell" — where Project A needs requests==2.28 and Project B needs requests==2.31.

Create with python -m venv myenv, activate with source myenv/bin/activate (Linux/Mac) or myenv\Scripts\activate (Windows). pip freeze > requirements.txt captures exact versions. pip install -r requirements.txt recreates the environment.

Modern alternatives: pipenv (Pipfile.lock), poetry (pyproject.toml), conda (data science), uv (Rust-based, fastest).

# Create and activate virtual environment
# Terminal commands:
# python -m venv project_env
# source project_env/bin/activate  (Linux/Mac)
# project_env\Scripts\activate    (Windows)

# Install project dependencies
# pip install flask==3.0.0 sqlalchemy==2.0.23 redis==5.0.1

# Freeze exact versions for reproducibility
# pip freeze > requirements.txt

# requirements.txt looks like:
# flask==3.0.0
# sqlalchemy==2.0.23
# redis==5.0.1
# jinja2==3.1.2      (auto-installed dependency)
# werkzeug==3.0.1    (auto-installed dependency)

# Teammate recreates identical environment:
# python -m venv their_env
# source their_env/bin/activate
# pip install -r requirements.txt

# Verify isolation
import sys
print(sys.prefix)      # /path/to/project_env  (not system Python)
print(sys.executable)  # /path/to/project_env/bin/python

# Deactivate when done
# deactivate

A team of 12 developers had "works on my machine" bugs every sprint — different Flask versions, different OS-level packages. After enforcing virtual environments with pinned requirements.txt in CI/CD, deployment failures dropped from 8/month to zero.

Always use virtual environments. Pin exact versions with pip freeze. Every project gets its own isolated environment — never install packages globally for project work.
⚠️ Common Mistake

Candidates install packages globally with pip install without activating a virtual environment. This causes version conflicts between projects. Another mistake: using pip freeze in the global environment captures every package ever installed.

🔁 Follow-Up Question

What is the difference between venv, virtualenv, pipenv, and poetry? When would you choose each?

20 Explain map(), filter(), and how they compare to list comprehensions. intermediate

map(func, iterable) applies a function to every element and returns a map object (lazy iterator). filter(func, iterable) returns elements where func returns True. Both are lazy — they don't compute until iterated.

List comprehensions [expr for x in iterable if cond] can replace most map/filter uses and are generally more Pythonic and readable. However, map() with a named function (not lambda) can be faster because it avoids creating a new function object per iteration.

The choice is readability: if you already have a named function, use map(). If you'd need a lambda, prefer a comprehension.

# map() with named function — cleaner than lambda
prices_usd = [29.99, 49.50, 124.00, 9.99, 299.00]

def usd_to_inr(usd, rate=83.5):
    return round(usd * rate, 2)

prices_inr = list(map(usd_to_inr, prices_usd))
print(prices_inr)
# [2504.17, 4133.25, 10354.0, 834.17, 24966.5]

# filter() with named function
def is_high_value(amount):
    return amount > 5000

high_value = list(filter(is_high_value, prices_inr))
print(f"High-value items: {high_value}")
# High-value items: [10354.0, 24966.5]

# Equivalent list comprehension — often preferred
prices_inr = [round(p * 83.5, 2) for p in prices_usd]
high_value = [p for p in prices_inr if p > 5000]

# Performance comparison — map with named func is fastest
import timeit
data = list(range(100_000))

t1 = timeit.timeit(lambda: list(map(str, data)), number=50)
t2 = timeit.timeit(lambda: [str(x) for x in data], number=50)
print(f"map: {t1:.3f}s, comprehension: {t2:.3f}s")
# map: 0.682s, comprehension: 0.751s (map is ~10% faster)

In a batch ETL pipeline transforming 8M records, using map() with a pre-defined transform function was 12% faster than the equivalent list comprehension — the difference between a 4-minute and 4.5-minute nightly job.

Use map() when you already have a named function. Use list comprehensions when you'd need a lambda. Both are correct — readability wins.
⚠️ Common Mistake

Candidates forget that map() and filter() return lazy iterators, not lists. Wrapping in list() is needed to see the results or get the length. Also, chaining multiple map/filter calls is less readable than a single comprehension.

🔁 Follow-Up Question

What is itertools and how is it different from map/filter? Name three useful itertools functions.

21 What is the GIL (Global Interpreter Lock) and how does it affect multithreading? advanced

The GIL is a mutex in CPython that allows only one thread to execute Python bytecode at a time, even on multi-core machines. It exists because CPython's memory management (reference counting) is not thread-safe.

Impact: CPU-bound tasks (math, image processing) get no speedup from threading — threads take turns on one core. I/O-bound tasks (network requests, file reads, database queries) do benefit because the GIL is released during I/O waits.

Workarounds: multiprocessing (separate processes, each with its own GIL), concurrent.futures, C extensions (NumPy releases the GIL), or alternative interpreters (PyPy, the upcoming free-threaded CPython 3.13+).

import time
import threading
import multiprocessing

def cpu_bound_task(n):
    """Simulate heavy CPU work — count to n."""
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 10_000_000

# Single-threaded
start = time.perf_counter()
cpu_bound_task(N)
cpu_bound_task(N)
print(f"Sequential:      {time.perf_counter() - start:.2f}s")

# Multi-threaded — NO speedup due to GIL
start = time.perf_counter()
t1 = threading.Thread(target=cpu_bound_task, args=(N,))
t2 = threading.Thread(target=cpu_bound_task, args=(N,))
t1.start(); t2.start()
t1.join();  t2.join()
print(f"Threaded (GIL):  {time.perf_counter() - start:.2f}s")

# Multi-process — real parallelism, bypasses GIL
start = time.perf_counter()
p1 = multiprocessing.Process(target=cpu_bound_task, args=(N,))
p2 = multiprocessing.Process(target=cpu_bound_task, args=(N,))
p1.start(); p2.start()
p1.join();  p2.join()
print(f"Multiprocess:    {time.perf_counter() - start:.2f}s")

# Typical output:
# Sequential:      3.42s
# Threaded (GIL):  3.51s  ← no speedup!
# Multiprocess:    1.78s  ← real 2x speedup

At a computer vision startup, an image processing pipeline using threading to process 10K images took 45 minutes (GIL bottleneck). Switching to multiprocessing.Pool(workers=8) on an 8-core server reduced it to 6 minutes — a 7.5x speedup.

GIL blocks CPU-bound parallelism in threads. Use multiprocessing for CPU-bound work, threading for I/O-bound work. The GIL is a CPython-specific limitation.
⚠️ Common Mistake

Candidates say "Python can't do parallel processing." This is wrong — threading is limited by the GIL for CPU work, but multiprocessing gives full parallelism. Also, NumPy, pandas, and most C extensions release the GIL during computation.

🔁 Follow-Up Question

What is the difference between multiprocessing and concurrent.futures? Which should you choose?

22 What are metaclasses in Python and when would you use them? advanced

A metaclass is the "class of a class." Just as an object is an instance of a class, a class is an instance of a metaclass. The default metaclass is type. When you write class Foo:, Python calls type('Foo', (object,), {...}) to create the class.

By defining a custom metaclass (inheriting from type and overriding __new__ or __init__), you can control class creation — validate attributes, auto-register classes, enforce coding standards, or add methods dynamically.

Use metaclasses sparingly — 99% of the time, decorators or __init_subclass__ (Python 3.6+) are simpler alternatives.

# Metaclass that auto-registers all subclasses
class PluginMeta(type):
    """Metaclass that maintains a registry of all plugin classes."""
    registry = {}
    
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Don't register the base class itself
        if bases:
            PluginMeta.registry[name.lower()] = cls
        return cls

class Plugin(metaclass=PluginMeta):
    """Base class — all subclasses auto-register."""
    def execute(self):
        raise NotImplementedError

class CSVExporter(Plugin):
    def execute(self):
        return "Exporting to CSV..."

class PDFExporter(Plugin):
    def execute(self):
        return "Generating PDF..."

class ExcelExporter(Plugin):
    def execute(self):
        return "Writing Excel file..."

# All plugins auto-discovered — no manual registration needed
print(PluginMeta.registry)
# {'csvexporter': <class CSVExporter>, 'pdfexporter': <class PDFExporter>, ...}

# Dynamic dispatch
exporter = PluginMeta.registry["pdfexporter"]()
print(exporter.execute())  # Generating PDF...

Django's ORM uses a metaclass (ModelBase) to convert class attributes into database column definitions. When you write class User(models.Model): name = CharField(), the metaclass intercepts this at class creation time and builds the SQL schema — this powers 500K+ Django apps worldwide.

Metaclasses control class creation — they're "classes of classes." Use them for plugin registries, ORM field mapping, or API framework magic. Prefer __init_subclass__ or decorators for simpler cases.
⚠️ Common Mistake

Candidates overuse metaclasses for problems that decorators or __init_subclass__ can solve. Tim Peters (author of Zen of Python) said: "Metaclasses are deeper magic than 99% of users should ever worry about." Use them only when you need to control class creation itself.

🔁 Follow-Up Question

What is __init_subclass__ and how does it provide a simpler alternative to metaclasses?

23 Explain Python descriptors and how they power @property. advanced

A descriptor is any object that defines __get__, __set__, or __delete__. When a descriptor is a class attribute, Python intercepts attribute access and calls the descriptor's methods instead.

Data descriptors define __set__ or __delete__ — they take priority over instance __dict__. Non-data descriptors only define __get__ — instance __dict__ takes priority. @property is a data descriptor under the hood. Functions are non-data descriptors (that's how methods bind self).

Descriptors are the mechanism behind @property, @staticmethod, @classmethod, __slots__, and super().

# Custom descriptor for validated attributes
class Percentage:
    """Descriptor that ensures a value is between 0 and 100."""
    
    def __set_name__(self, owner, name):
        self.name = name
        self.storage_name = f"__{name}"
    
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.storage_name, 0)
    
    def __set__(self, obj, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f"{self.name} must be a number")
        if not 0 <= value <= 100:
            raise ValueError(f"{self.name} must be 0-100, got {value}")
        setattr(obj, self.storage_name, value)

class StudentReport:
    math_score    = Percentage()
    science_score = Percentage()
    english_score = Percentage()
    
    def __init__(self, name, math, science, english):
        self.name = name
        self.math_score = math
        self.science_score = science
        self.english_score = english
    
    @property
    def average(self):
        return (self.math_score + self.science_score + self.english_score) / 3

# Usage
report = StudentReport("Priya", 92, 88, 95)
print(f"{report.name}: Average = {report.average:.1f}%")
# Priya: Average = 91.7%

# Validation works automatically
# report.math_score = 150  # ValueError: math_score must be 0-100

SQLAlchemy uses descriptors for every Column() definition — Column(Integer) creates a descriptor that validates types, handles lazy loading, and tracks changes for the unit-of-work pattern, processing millions of attribute accesses efficiently.

Descriptors control attribute access at the class level. @property is a built-in descriptor. Write custom descriptors for reusable validation logic across multiple attributes and classes.
⚠️ Common Mistake

Candidates confuse descriptors with @property. @property handles one attribute per class. Descriptors are reusable — define once, use on many attributes across many classes. If you're copy-pasting @property + validation 10 times, use a descriptor instead.

🔁 Follow-Up Question

What is the descriptor lookup chain? What happens when you access obj.attr — what does Python check and in what order?

24 How does memory management work in Python? Explain reference counting and garbage collection. advanced

CPython uses two mechanisms for memory management:

1. Reference counting: Every object has a count of references pointing to it. When the count drops to zero, the object is immediately freed. This handles most memory cleanup.

2. Cycle collector (gc module): Reference counting can't handle circular references (A → B → A). Python's garbage collector runs periodically to detect and collect these cycles using a generational approach (3 generations: gen0 for new objects, gen1, gen2 for long-lived).

You can inspect and control GC with the gc module: gc.collect(), gc.get_referrers(), gc.disable().

import sys
import gc

# Reference counting in action
a = [1, 2, 3]
print(sys.getrefcount(a))  # 2 (a + function argument)

b = a           # another reference
print(sys.getrefcount(a))  # 3

del b           # remove one reference
print(sys.getrefcount(a))  # 2

# Circular reference — ref counting can't free this
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None
    def __del__(self):
        print(f"Node {self.name} freed")

# Create cycle: A → B → A
node_a = Node("A")
node_b = Node("B")
node_a.next = node_b
node_b.next = node_a  # circular!

# Delete references — ref count never hits 0 due to cycle
del node_a
del node_b
# Nothing printed yet — cycle prevents cleanup

# Force garbage collection to break the cycle
collected = gc.collect()
print(f"GC collected {collected} objects")
# Node A freed
# Node B freed
# GC collected 2 objects

# Check GC stats
print(gc.get_stats())
# [{'collections': 95, 'collected': 312, 'uncollectable': 0}, ...]

A long-running data pipeline had a memory leak — RSS grew by 200 MB/hour. Using gc.get_referrers() and objgraph, the team found a circular reference in a caching layer (Cache → Entry → Cache). Adding weakref.WeakValueDictionary eliminated the leak entirely.

Python frees objects immediately when ref count hits 0. For circular references, the GC cycle collector runs periodically. Use weakref for caches to avoid memory leaks.
⚠️ Common Mistake

Candidates say "Python has garbage collection so I don't need to think about memory." In reality, circular references can leak, __del__ finalizers can prevent collection, and long-running processes need monitoring. Also, del x doesn't free memory — it decrements the reference count.

🔁 Follow-Up Question

What are weak references and when should you use weakref.WeakValueDictionary?

25 How does async/await work in Python? Explain asyncio basics. advanced

async/await enables cooperative multitasking for I/O-bound operations. An async def function is a coroutine. await suspends the coroutine and lets the event loop run other tasks while waiting for I/O.

The event loop (asyncio.run()) manages coroutines, scheduling them when their I/O completes. asyncio.gather() runs multiple coroutines concurrently. This is not parallelism — it's concurrency through cooperative yielding on a single thread.

asyncio works best for network I/O: HTTP requests, database queries, websockets, file I/O. It does not help CPU-bound tasks (use multiprocessing for those).

import asyncio
import time

async def fetch_user(user_id):
    """Simulate API call — each takes 1 second."""
    print(f"  Fetching user {user_id}...")
    await asyncio.sleep(1)  # simulates network I/O
    return {"id": user_id, "name": f"User_{user_id}"}

async def fetch_orders(user_id):
    """Simulate database query — each takes 0.5 seconds."""
    await asyncio.sleep(0.5)
    return [{"order_id": f"ORD-{user_id}-1", "amount": 1299}]

# Sequential — slow (3 users × 1s = 3s)
async def sequential():
    start = time.perf_counter()
    for uid in [1, 2, 3]:
        user = await fetch_user(uid)
    print(f"Sequential: {time.perf_counter() - start:.2f}s")

# Concurrent — fast (3 users in parallel = ~1s)
async def concurrent():
    start = time.perf_counter()
    users = await asyncio.gather(
        fetch_user(1),
        fetch_user(2),
        fetch_user(3),
    )
    # Fetch orders for all users concurrently too
    all_orders = await asyncio.gather(
        *[fetch_orders(u["id"]) for u in users]
    )
    print(f"Concurrent: {time.perf_counter() - start:.2f}s")
    print(f"Fetched {len(users)} users with orders")

asyncio.run(sequential())   # ~3.00s
asyncio.run(concurrent())   # ~1.50s — 2x faster!

A price comparison API that fetched prices from 12 vendor APIs switched from sequential requests (12 × 0.8s = 9.6s) to asyncio.gather (all 12 concurrent = 1.1s). Response time dropped from 10s to 1.2s, and the server handled 5x more concurrent users.

async/await = concurrency for I/O-bound work on a single thread. Use asyncio.gather() to run multiple I/O operations concurrently. Not for CPU-bound work.
⚠️ Common Mistake

Candidates confuse concurrency with parallelism. asyncio runs on one thread — it doesn't bypass the GIL. It speeds up I/O waits by doing other work while waiting. For CPU-bound parallelism, use multiprocessing.

🔁 Follow-Up Question

What is the difference between asyncio.gather() and asyncio.TaskGroup? How do you handle errors in concurrent tasks?

26 What are Abstract Base Classes (ABCs) and how do you use them? advanced

Abstract Base Classes (from the abc module) let you define interfaces — classes that cannot be instantiated and that force subclasses to implement specific methods. Use ABC as a base class and @abstractmethod to mark methods that must be overridden.

ABCs are Python's way of establishing contracts: "if you inherit from this base class, you must implement these methods." This catches errors at instantiation time rather than at runtime when a method is called.

Python also provides built-in ABCs in collections.abc (Iterable, Mapping, Sequence) for duck-typing validation.

from abc import ABC, abstractmethod

class PaymentGateway(ABC):
    """Abstract interface — all payment providers must implement these."""
    
    @abstractmethod
    def charge(self, amount, currency="INR"):
        """Process a payment. Must return transaction ID."""
        pass
    
    @abstractmethod
    def refund(self, transaction_id, amount=None):
        """Refund a payment. amount=None means full refund."""
        pass
    
    def validate_amount(self, amount):
        """Concrete method — shared by all subclasses."""
        if amount <= 0:
            raise ValueError(f"Amount must be positive, got {amount}")

class RazorpayGateway(PaymentGateway):
    def charge(self, amount, currency="INR"):
        self.validate_amount(amount)
        # Razorpay-specific API call
        return f"rzp_txn_{amount}"
    
    def refund(self, transaction_id, amount=None):
        return f"rzp_refund_{transaction_id}"

class StripeGateway(PaymentGateway):
    def charge(self, amount, currency="USD"):
        self.validate_amount(amount)
        return f"stripe_pi_{amount}"
    
    def refund(self, transaction_id, amount=None):
        return f"stripe_refund_{transaction_id}"

# Cannot instantiate abstract class
# gateway = PaymentGateway()  # TypeError!

# Polymorphism — same interface, different implementations
def process_order(gateway: PaymentGateway, amount: float):
    txn_id = gateway.charge(amount)
    print(f"Charged ₹{amount} — Transaction: {txn_id}")

process_order(RazorpayGateway(), 2499.00)
process_order(StripeGateway(), 49.99)

A payment platform integrated 6 gateways (Razorpay, Stripe, PayU, Paytm, PhonePe, CCAvenue). The PaymentGateway ABC ensured every new integration implemented charge(), refund(), and verify() — catching 3 missing-method bugs during development instead of in production.

ABCs define contracts — force subclasses to implement required methods. Errors caught at class creation, not at runtime. Use for plugin systems, payment gateways, and data source interfaces.
⚠️ Common Mistake

Candidates forget that forgetting a single @abstractmethod implementation raises TypeError at instantiation, not at class definition. If you define the class but don't instantiate it, you won't see the error — tests must create instances.

🔁 Follow-Up Question

What is duck typing and how does it relate to ABCs? Can you use ABCs with duck typing?

27 What are __slots__ and when should you use them? advanced

By default, Python stores instance attributes in a __dict__ (a dictionary per instance). __slots__ replaces this dict with a fixed-size struct, which uses significantly less memory and is slightly faster for attribute access.

Define __slots__ = ('name', 'age') to restrict instances to only those attributes. Benefits: 30-50% less memory per instance, ~10% faster attribute access. Trade-offs: no dynamic attribute addition, complications with multiple inheritance, no __dict__ by default.

Use __slots__ when you have millions of instances of the same class (data processing, game entities, ORM rows).

import sys

# Without __slots__ — each instance has a dict
class PointDict:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

# With __slots__ — fixed struct, no dict
class PointSlots:
    __slots__ = ('x', 'y', 'z')
    
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

# Memory comparison
p_dict  = PointDict(1.0, 2.0, 3.0)
p_slots = PointSlots(1.0, 2.0, 3.0)

size_dict  = sys.getsizeof(p_dict) + sys.getsizeof(p_dict.__dict__)
size_slots = sys.getsizeof(p_slots)

print(f"With __dict__:  {size_dict} bytes")   # ~152 bytes
print(f"With __slots__: {size_slots} bytes")   # ~56 bytes
print(f"Savings: {((size_dict - size_slots) / size_dict * 100):.0f}%")
# Savings: 63%

# Scale: 10 million points
# __dict__: ~1.5 GB
# __slots__: ~560 MB  — saves ~1 GB of RAM!

# Trade-off: can't add dynamic attributes
# p_slots.color = "red"  # AttributeError!

A real-time IoT platform tracking 5 million sensor readings per minute used __slots__ on SensorReading objects, reducing RAM usage from 4.2 GB to 1.4 GB — the difference between needing a 8 GB server and a 2 GB server ($200/month savings).

Use __slots__ when creating millions of instances to save 40-60% memory. Trade-off: no dynamic attributes. Perfect for data classes, points, records, and ORM models.
⚠️ Common Mistake

Candidates add __slots__ to every class "for performance." This is premature optimization — __slots__ only matters when you have thousands+ instances. For regular classes, the flexibility of __dict__ is worth the small overhead. Also, __slots__ doesn't work well with multiple inheritance unless all parents use __slots__.

🔁 Follow-Up Question

Can you combine __slots__ with inheritance? What happens if the parent class doesn't declare __slots__?

28 Explain closures in Python and how they capture variables. advanced

A closure is a nested function that remembers the variables from its enclosing scope, even after the outer function has returned. The inner function "closes over" the free variables.

Closures work because Python stores these captured variables in the function's __closure__ attribute as cell objects. The nonlocal keyword (Python 3) lets the inner function modify the captured variable, not just read it.

Closures are the mechanism behind decorators, callback patterns, and factory functions. They provide encapsulation without needing a full class.

# Closure as a factory — configurable validators
def make_range_validator(min_val, max_val, field_name="value"):
    """Factory that returns a validator function."""
    def validate(value):
        if not min_val <= value <= max_val:
            raise ValueError(
                f"{field_name} must be {min_val}-{max_val}, got {value}"
            )
        return value
    return validate  # returns inner function with captured min/max

# Create specific validators — each remembers its own range
validate_age     = make_range_validator(0, 150, "Age")
validate_score   = make_range_validator(0, 100, "Score")
validate_salary  = make_range_validator(10000, 10000000, "Salary")

print(validate_age(25))      # 25
print(validate_score(92))    # 92
# validate_age(200)  # ValueError: Age must be 0-150, got 200

# Counter closure with nonlocal
def make_counter(initial=0):
    count = initial
    def increment(step=1):
        nonlocal count  # modify enclosing variable
        count += step
        return count
    def get():
        return count
    return increment, get

inc, get = make_counter(10)
print(inc())    # 11
print(inc(5))   # 16
print(get())    # 16

# Inspect closure cells
print(validate_age.__closure__[0].cell_contents)  # 0 (min_val)
print(validate_age.__closure__[1].cell_contents)  # 150 (max_val)

A notification system used closures to create per-channel senders: make_sender("slack", webhook_url), make_sender("email", smtp_config). Each returned function captured its own config, eliminating the need for 5 separate sender classes — reduced 300 lines to 50.

Closures capture enclosing variables and remember them after the outer function returns. Use nonlocal to modify captured variables. Closures replace simple classes when you just need state + one function.
⚠️ Common Mistake

The classic closure trap — capturing a loop variable by reference:

❌ Wrong — all functions see the final loop value
funcs = []
for i in range(3):
    funcs.append(lambda: i)
print([f() for f in funcs])  # [2, 2, 2] — all see i=2!
✅ Correct — capture by default argument
funcs = []
for i in range(3):
    funcs.append(lambda i=i: i)  # default arg captures current i
print([f() for f in funcs])  # [0, 1, 2] — correct!
🔁 Follow-Up Question

How do closures compare to classes for maintaining state? When would you choose one over the other?

29 What design patterns do you commonly use in Python? Explain with examples. experienced

Design patterns in Python are simpler than in Java/C++ because of first-class functions, duck typing, and dynamic features. The most commonly used patterns:

Factory Pattern: A function/method that creates and returns objects based on input, hiding instantiation logic.
Strategy Pattern: Pass different algorithms (functions) as arguments — trivial in Python since functions are first-class.
Observer Pattern: Objects subscribe to events and get notified when state changes.
Singleton: Ensure only one instance exists — use a module-level variable (simplest) or metaclass.
Decorator Pattern: Already built into Python's @decorator syntax.

# Strategy Pattern — payment processing with different strategies
from typing import Callable

def process_payment(amount: float, strategy: Callable[[float], str]) -> str:
    """Process payment using the given strategy function."""
    if amount <= 0:
        raise ValueError("Amount must be positive")
    return strategy(amount)

# Strategies are just functions
def upi_payment(amount):
    return f"UPI: ₹{amount:,.2f} debited via UPI ID"

def card_payment(amount):
    fee = amount * 0.02  # 2% processing fee
    return f"Card: ₹{amount + fee:,.2f} charged (includes ₹{fee:,.2f} fee)"

def wallet_payment(amount):
    cashback = min(amount * 0.05, 100)  # 5% cashback, max ₹100
    return f"Wallet: ₹{amount:,.2f} paid, ₹{cashback:,.2f} cashback earned"

# Usage — strategy selected at runtime
print(process_payment(2500, upi_payment))
print(process_payment(2500, card_payment))
print(process_payment(2500, wallet_payment))

# Factory Pattern — create exporters based on format
class CSVExporter:
    def export(self, data): return "CSV output..."

class JSONExporter:
    def export(self, data): return "JSON output..."

class ExcelExporter:
    def export(self, data): return "Excel output..."

def create_exporter(format_type):
    """Factory function — hides instantiation logic."""
    exporters = {
        "csv": CSVExporter,
        "json": JSONExporter,
        "excel": ExcelExporter,
    }
    cls = exporters.get(format_type)
    if not cls:
        raise ValueError(f"Unknown format: {format_type}")
    return cls()

exporter = create_exporter("json")
print(exporter.export(data=[1, 2, 3]))

A SaaS billing system used the Strategy pattern for 8 payment methods (UPI, card, wallet, net banking, EMI, BNPL, crypto, bank transfer). Adding a new payment method required writing one function and adding it to a dict — zero changes to the core billing logic. New integrations dropped from 2 days to 2 hours.

In Python, many GoF patterns simplify to "pass a function." Strategy = function argument. Factory = dict of classes. Observer = list of callbacks. Don't over-engineer with Java-style pattern implementations.
⚠️ Common Mistake

Candidates implement Singleton with complex metaclasses when a simple module-level variable does the same thing. In Python, modules are singletons by default — import config always returns the same object. Also, candidates over-pattern: not everything needs a Factory or Abstract Factory.

🔁 Follow-Up Question

When would you use a class-based approach vs a functional approach for the Strategy pattern in Python?

30 Explain CPython internals — how does Python execute code? experienced

CPython (the standard Python implementation) executes code in stages:

1. Parsing: Source code → Abstract Syntax Tree (AST). ast module lets you inspect/modify the AST.
2. Compilation: AST → bytecode (.pyc files). Bytecode is a set of instructions for CPython's virtual machine. View with dis module.
3. Execution: The CPython VM (ceval.c) executes bytecode instructions one at a time in a giant switch-case loop. Each stack frame has its own evaluation stack.

Important internals: small integer caching (-5 to 256 are pre-allocated), string interning (common strings are reused), __pycache__ (compiled bytecode storage), and peephole optimizer (constant folding, dead code elimination at bytecode level).

import dis
import ast
import sys

# 1. View bytecode with dis
def add_numbers(a, b):
    result = a + b
    return result

print("=== Bytecode ===")
dis.dis(add_numbers)
# LOAD_FAST    0 (a)
# LOAD_FAST    1 (b)
# BINARY_ADD
# STORE_FAST   2 (result)
# LOAD_FAST    2 (result)
# RETURN_VALUE

# 2. Small integer caching
a = 256
b = 256
print(a is b)  # True — same object (cached)

c = 257
d = 257
print(c is d)  # False — different objects (not cached)

# 3. Inspect the AST
source = "prices = [p * 1.18 for p in products if p > 100]"
tree = ast.parse(source)
print(ast.dump(tree, indent=2))

# 4. Check bytecode file location
import json
print(json.__cached__)
# /usr/lib/python3.11/json/__pycache__/__init__.cpython-311.pyc

# 5. Peephole optimization — constant folding
def constants():
    x = 24 * 60 * 60  # compiler pre-calculates to 86400
    return x

dis.dis(constants)
# LOAD_CONST  1 (86400)  ← pre-computed at compile time!

A performance engineer used the dis module to discover that a hot loop was creating unnecessary temporary objects (LOAD_CONST + BUILD_LIST on every iteration). Rewriting to use a pre-allocated list and direct STORE_FAST reduced the loop time from 4.2s to 1.8s on 50M iterations.

Python compiles source → AST → bytecode → VM execution. Use dis module to understand performance. Small ints (-5 to 256) are cached. Bytecode is stored in __pycache__.
⚠️ Common Mistake

Candidates use is to compare values instead of ==. is checks identity (same object in memory), == checks equality (same value). Due to integer caching, 256 is 256 is True but 257 is 257 may be False. Always use == for value comparison.

🔁 Follow-Up Question

What is the difference between CPython and PyPy? When would you choose PyPy?

31 How do you write and integrate C extensions for Python? experienced

When Python is too slow for a critical section, you can write it in C and call it from Python. Three main approaches:

1. ctypes: Call existing C shared libraries (.so/.dll) from Python — no compilation needed. Good for wrapping existing C code.
2. Cython: Write Python-like code with type annotations, compiled to C. Easiest way to get C performance. .pyx files compile to .so.
3. C API (Python.h): Write raw C extensions using CPython's C API. Most control, most complex. Used by NumPy, pandas internally.

cffi is a modern alternative to ctypes with better ergonomics. pybind11 wraps C++ code for Python.

# Approach 1: ctypes — call existing C library
import ctypes

# Load the system C math library
libm = ctypes.CDLL("libm.so.6")  # Linux
# libm = ctypes.CDLL("libm.dylib")  # macOS

libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]
print(f"sqrt(144) = {libm.sqrt(144)}")  # 12.0

# Approach 2: Cython (save as fast_math.pyx)
# -------------------------------------------
# cdef double dot_product(double[:] a, double[:] b):
#     cdef int i, n = a.shape[0]
#     cdef double total = 0.0
#     for i in range(n):
#         total += a[i] * b[i]
#     return total
# -------------------------------------------
# Compile: cythonize -i fast_math.pyx
# Use:     from fast_math import dot_product

# Approach 3: Pure Python with NumPy (releases GIL internally)
import numpy as np
a = np.random.rand(10_000_000)
b = np.random.rand(10_000_000)

# NumPy's dot is written in C/Fortran — blazing fast
result = np.dot(a, b)  # 10M element dot product in ~5ms

# Performance comparison
import time
# Pure Python: 10M element dot product = ~4.2 seconds
# NumPy (C):   10M element dot product = ~0.005 seconds
# Speedup: ~840x

A quantitative finance team had a risk calculation in pure Python taking 45 minutes. Rewriting the inner loop (matrix multiplication of 50K × 50K) in Cython with typed memoryviews reduced it to 28 seconds — a 96x speedup. The rest of the codebase stayed in Python.

Use NumPy/pandas first (already C/Fortran under the hood). If that's not enough, Cython is the easiest path to C speed. Use ctypes/cffi to wrap existing C libraries. Raw C API is rarely needed.
⚠️ Common Mistake

Candidates jump straight to "rewrite in C" when a simple NumPy vectorization would give the same 100x speedup. Always profile first, optimize the hot path, and try NumPy before writing C code. Also, C extensions can have memory leaks and segfaults — much harder to debug than Python.

🔁 Follow-Up Question

What is pybind11 and how does it compare to ctypes/Cython for wrapping C++ code?

32 How do you design a testing strategy for a large Python codebase? experienced

A production testing strategy has multiple layers:

Unit tests: Test individual functions/classes in isolation. Use pytest with fixtures, parametrize, and mocking. Target: 80%+ coverage on business logic.
Integration tests: Test multiple components together (API + DB, service + message queue). Use test databases and fixtures.
Mocking: unittest.mock.patch replaces external dependencies (APIs, databases, time) so tests are fast and deterministic.
Property-based testing: hypothesis generates random inputs to find edge cases you didn't think of.
CI/CD: Run tests automatically on every commit with pytest + coverage reporting.

import pytest
from unittest.mock import patch, MagicMock
from datetime import datetime

# The code being tested
class PricingEngine:
    def __init__(self, tax_rate=0.18):
        self.tax_rate = tax_rate
    
    def calculate_total(self, items):
        subtotal = sum(item["price"] * item["qty"] for item in items)
        tax = subtotal * self.tax_rate
        return round(subtotal + tax, 2)
    
    def apply_coupon(self, total, coupon_code, api_client):
        """Calls external API to validate coupon."""
        discount = api_client.validate_coupon(coupon_code)
        return round(total * (1 - discount / 100), 2)

# --- Tests ---

class TestPricingEngine:
    @pytest.fixture
    def engine(self):
        return PricingEngine(tax_rate=0.18)
    
    @pytest.fixture
    def sample_items(self):
        return [
            {"name": "Laptop", "price": 50000, "qty": 1},
            {"name": "Mouse",  "price": 500,   "qty": 2},
        ]
    
    def test_calculate_total_with_tax(self, engine, sample_items):
        total = engine.calculate_total(sample_items)
        assert total == 60180.0  # (50000 + 1000) * 1.18
    
    def test_empty_cart(self, engine):
        assert engine.calculate_total([]) == 0.0
    
    @pytest.mark.parametrize("tax_rate,expected", [
        (0.0,  51000.0),
        (0.05, 53550.0),
        (0.18, 60180.0),
        (0.28, 65280.0),
    ])
    def test_different_tax_rates(self, sample_items, tax_rate, expected):
        engine = PricingEngine(tax_rate=tax_rate)
        assert engine.calculate_total(sample_items) == expected
    
    def test_apply_coupon_mocked_api(self, engine):
        """Mock external API — don't call real service in tests."""
        mock_api = MagicMock()
        mock_api.validate_coupon.return_value = 20  # 20% discount
        
        result = engine.apply_coupon(1000.0, "SAVE20", mock_api)
        
        assert result == 800.0
        mock_api.validate_coupon.assert_called_once_with("SAVE20")

A fintech team with 200K lines of Python adopted pytest + mocking + CI. Before: 3-4 production bugs per sprint, 2-hour manual testing cycles. After: 0-1 bugs per sprint, 8-minute automated test suite with 87% coverage. The test suite caught a critical rounding bug in tax calculation that would have affected 50K invoices.

Use pytest with fixtures for setup, parametrize for data-driven tests, and mock for external dependencies. Test behavior, not implementation. Aim for 80%+ coverage on business logic.
⚠️ Common Mistake

Candidates mock too much or too little. Mock external dependencies (APIs, databases, time), but don't mock the code under test. If you mock everything, you're testing your mocks, not your code. Another mistake: testing private methods instead of public behavior.

🔁 Follow-Up Question

What is the difference between mocking and patching? When should you use each?

33 How do you package and distribute a Python project? experienced

Modern Python packaging uses pyproject.toml (PEP 517/518) as the single configuration file, replacing the older setup.py + setup.cfg approach.

Key files: pyproject.toml (metadata, dependencies, build config), src/ layout (prevents import confusion), README.md, LICENSE, tests/.

Build tools: setuptools (traditional), hatchling (modern, fast), poetry (dependency management + build), flit (simplest).
Publish: Build with python -m build, upload with twine upload dist/* to PyPI.

For internal packages, use a private PyPI server (devpi, Artifactory) or direct Git dependencies.

# pyproject.toml — modern Python packaging
# [build-system]
# requires = ["hatchling"]
# build-backend = "hatchling.build"
#
# [project]
# name = "invoice-generator"
# version = "2.1.0"
# description = "Generate GST-compliant invoices"
# readme = "README.md"
# license = {text = "MIT"}
# requires-python = ">=3.9"
# authors = [{name = "Priya", email = "priya@company.com"}]
#
# dependencies = [
#     "jinja2>=3.1",
#     "weasyprint>=60.0",
#     "pydantic>=2.0",
# ]
#
# [project.optional-dependencies]
# dev = ["pytest>=7.0", "ruff>=0.1", "mypy>=1.0"]
#
# [project.scripts]
# invoice = "invoice_generator.cli:main"

# Project structure (src layout):
# invoice-generator/
# ├── pyproject.toml
# ├── README.md
# ├── LICENSE
# ├── src/
# │   └── invoice_generator/
# │       ├── __init__.py
# │       ├── cli.py
# │       ├── generator.py
# │       └── templates/
# └── tests/
#     ├── test_generator.py
#     └── conftest.py

# Build and publish commands:
# pip install build twine
# python -m build                    # creates dist/*.whl and dist/*.tar.gz
# twine check dist/*                 # validate package
# twine upload dist/*                # upload to PyPI
# pip install invoice-generator      # anyone can install it!

A data science team shared 12 internal Python packages across 8 projects using a private DevPI server. Before packaging: copy-paste code, version drift, 30 bugs/quarter from stale copies. After: pip install from internal index, automatic versioning, zero copy-paste bugs.

Use pyproject.toml + src layout for all new projects. Build with python -m build, publish with twine. For internal packages, use a private PyPI server.
⚠️ Common Mistake

Candidates still use setup.py for new projects. pyproject.toml is the standard since PEP 517/518 (Python 3.7+). Also, candidates forget to pin dependency ranges — requests>=2.28,<3 is safer than requests>=2.28 which could break with a major version bump.

🔁 Follow-Up Question

What is the difference between a wheel (.whl) and a source distribution (.tar.gz)? When does it matter?

34 How do you set up CI/CD for a Python project? experienced

CI/CD automates testing, building, and deploying Python projects on every commit.

CI (Continuous Integration): Run tests, linting, type checking, and security scans on every PR. Tools: GitHub Actions, GitLab CI, Jenkins.
CD (Continuous Deployment): Auto-deploy to staging on merge, production on release tag.

Typical Python CI pipeline: ruff check (linting) → mypy (type checking) → pytest --cov (tests + coverage) → bandit (security scan) → python -m build (package) → deploy.

Best practices: test against multiple Python versions (3.9, 3.10, 3.11, 3.12), use dependency caching, fail fast, and keep the pipeline under 5 minutes.

# .github/workflows/ci.yml — GitHub Actions
# name: Python CI
#
# on:
#   push:
#     branches: [main]
#   pull_request:
#     branches: [main]
#
# jobs:
#   test:
#     runs-on: ubuntu-latest
#     strategy:
#       matrix:
#         python-version: ["3.10", "3.11", "3.12"]
#
#     steps:
#       - uses: actions/checkout@v4
#
#       - name: Set up Python ${{ matrix.python-version }}
#         uses: actions/setup-python@v5
#         with:
#           python-version: ${{ matrix.python-version }}
#
#       - name: Cache pip packages
#         uses: actions/cache@v4
#         with:
#           path: ~/.cache/pip
#           key: ${{ runner.os }}-pip-${{ hashFiles("requirements*.txt") }}
#
#       - name: Install dependencies
#         run: |
#           pip install -e ".[dev]"
#
#       - name: Lint with ruff
#         run: ruff check src/ tests/
#
#       - name: Type check with mypy
#         run: mypy src/
#
#       - name: Test with pytest
#         run: pytest --cov=src --cov-report=xml -v
#
#       - name: Security scan
#         run: bandit -r src/ -ll

A 15-person team deployed to production manually — every release took 4 hours and broke 30% of the time. After GitHub Actions CI/CD: deploy time dropped to 12 minutes (automated), failure rate dropped to 3%, and the team shipped 3x more features per quarter.

CI pipeline: lint → type-check → test → security scan → build. Test against multiple Python versions. Cache dependencies. Keep pipeline under 5 minutes. Auto-deploy on merge to main.
⚠️ Common Mistake

Candidates skip security scanning (bandit) and dependency auditing (pip-audit) in CI. Also, testing only on one Python version is risky — a feature that works on 3.11 may fail on 3.9. Matrix testing catches version-specific bugs before production.

🔁 Follow-Up Question

How do you handle database migrations in CI/CD? How do you do zero-downtime deployments?

35 How do you architect a large Python codebase for maintainability? experienced

Large Python codebases need clear structure to stay maintainable as the team grows. Key principles:

Layered architecture: Separate presentation, business logic, and data access into distinct modules. Never let database queries leak into API handlers.
Dependency injection: Pass dependencies as constructor params, not as global imports. Makes testing trivial.
Domain-driven design: Organize code by business domain (users/, orders/, payments/), not by technical role (models/, views/, controllers/).
Type hints: Use type annotations + mypy/pyright for static analysis — catches bugs before runtime.
Configuration: 12-factor app — config from environment variables, not hardcoded.

# Domain-driven project structure
# myapp/
# ├── users/
# │   ├── __init__.py
# │   ├── models.py        # User, UserProfile
# │   ├── services.py      # Business logic
# │   ├── repository.py    # Database access
# │   ├── api.py            # HTTP handlers
# │   └── tests/
# ├── orders/
# │   ├── __init__.py
# │   ├── models.py
# │   ├── services.py
# │   ├── repository.py
# │   ├── api.py
# │   └── tests/
# ├── shared/
# │   ├── database.py
# │   ├── config.py
# │   └── exceptions.py
# └── main.py

# Dependency injection — testable service layer
from dataclasses import dataclass
from typing import Protocol

class UserRepository(Protocol):
    """Interface — any class with these methods works."""
    def get_by_id(self, user_id: int) -> dict: ...
    def save(self, user: dict) -> None: ...

@dataclass
class UserService:
    repo: UserRepository  # injected, not imported
    
    def upgrade_to_premium(self, user_id: int) -> dict:
        user = self.repo.get_by_id(user_id)
        if user["total_spent"] < 10000:
            raise ValueError("Minimum ₹10,000 spent required for premium")
        user["is_premium"] = True
        user["discount_rate"] = 0.15
        self.repo.save(user)
        return user

# In production: UserService(repo=PostgresUserRepo(db))
# In tests:      UserService(repo=FakeUserRepo(test_data))

A startup grew from 3 to 25 engineers in 18 months. Their single 15K-line app.py became unmaintainable. Refactoring to domain-driven modules (users/, orders/, payments/, notifications/) with Protocol-based interfaces reduced onboarding time from 3 weeks to 3 days and made it possible for teams to work on different domains without merge conflicts.

Organize by business domain, not technical layer. Use dependency injection (pass, don't import). Type hints + Protocol for interfaces. Config from environment variables.
⚠️ Common Mistake

Candidates create "god modules" — services.py with 5000 lines, models.py with 50 classes. Each module should have a single responsibility. Also, circular imports are a sign of poor architecture — if A imports B and B imports A, they need restructuring.

🔁 Follow-Up Question

How do you handle circular imports in a large Python project? What are the strategies to prevent them?

36 How do you profile Python code to find performance bottlenecks? performance

Profiling measures where your code spends time and memory. Python offers several profiling tools:

cProfile: Built-in CPU profiler. Shows function call counts and time per function. Run with python -m cProfile script.py or use programmatically.
line_profiler: Shows time per line within a function — much more granular than cProfile.
memory_profiler: Shows memory usage per line.
py-spy: Sampling profiler that attaches to running processes without modifying code — great for production profiling.
timeit: Micro-benchmark for small code snippets.

Rule: Profile before optimizing. 90% of runtime is usually in 10% of code. Find the hot path first.

import cProfile
import pstats
import io
from functools import lru_cache

# The code to profile
def process_transactions(transactions):
    results = []
    for txn in transactions:
        # Simulate expensive operations
        category = categorize(txn["amount"])
        risk = calculate_risk(txn["amount"], txn["merchant"])
        results.append({**txn, "category": category, "risk": risk})
    return results

def categorize(amount):
    """Intentionally slow — O(n) lookup each time."""
    categories = [(0, 500, "micro"), (500, 5000, "small"),
                  (5000, 50000, "medium"), (50000, float("inf"), "large")]
    for low, high, cat in categories:
        if low <= amount < high:
            return cat

def calculate_risk(amount, merchant):
    """Simulate computation."""
    return round(amount * 0.001 * len(merchant), 4)

# Profile it
transactions = [{"amount": i * 100, "merchant": f"shop_{i}"}
                for i in range(10000)]

profiler = cProfile.Profile()
profiler.enable()
results = process_transactions(transactions)
profiler.disable()

# Print sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)  # top 10 functions
print(stream.getvalue())

# Quick timing with timeit
import timeit
time_taken = timeit.timeit(
    lambda: process_transactions(transactions[:100]),
    number=100
)
print(f"100 txns × 100 runs: {time_taken:.3f}s")
print(f"Per transaction: {time_taken / 10000 * 1000:.4f}ms")

An API endpoint took 4.5 seconds. cProfile revealed that 89% of time was in a single function — a JSON schema validation running on every nested object. Caching the compiled schema reduced the endpoint from 4.5s to 0.3s, a 15x improvement found in 20 minutes of profiling.

Always profile before optimizing. cProfile for function-level hotspots, line_profiler for line-level analysis, py-spy for production. 90% of time is usually in 10% of code.
⚠️ Common Mistake

Candidates optimize without profiling — they guess at bottlenecks and "optimize" code that runs once while the real bottleneck (called 100K times) is untouched. Another mistake: using time.time() instead of time.perf_counter() for benchmarks — time.time() has lower resolution and can jump due to system clock adjustments.

🔁 Follow-Up Question

How do you profile memory usage in Python? What tools detect memory leaks?

37 When should you use multiprocessing vs threading vs asyncio? performance

The choice depends on the type of work:

Threading (threading, concurrent.futures.ThreadPoolExecutor): Best for I/O-bound tasks — waiting on network, files, databases. Threads share memory so communication is easy, but the GIL prevents CPU parallelism.

Multiprocessing (multiprocessing, ProcessPoolExecutor): Best for CPU-bound tasks — number crunching, image processing, ML training. Each process has its own Python interpreter and GIL, giving true parallelism. Trade-off: higher memory usage and IPC overhead.

Asyncio: Best for high-concurrency I/O — handling 10K+ simultaneous connections (web servers, chat, websockets). Single-threaded, event-loop-based. Requires async libraries (aiohttp, asyncpg).

import time
import asyncio
import aiohttp
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# I/O-bound: Fetching 100 URLs

# 1. Threading — good for I/O  
def fetch_url_sync(url):
    import urllib.request
    return urllib.request.urlopen(url).read()[:100]

def threaded_fetch(urls):
    with ThreadPoolExecutor(max_workers=20) as pool:
        results = list(pool.map(fetch_url_sync, urls))
    return results

# 2. Asyncio — best for high-concurrency I/O
async def async_fetch(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [session.get(url) for url in urls]
        responses = await asyncio.gather(*tasks)
        return [await r.read() for r in responses]

# CPU-bound: Image processing

def process_image(image_path):
    """CPU-heavy: resize, filter, compress."""
    # Simulate CPU work
    total = sum(i * i for i in range(500_000))
    return total

# 3. Multiprocessing — true parallelism for CPU work
def parallel_process(image_paths):
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(process_image, image_paths))
    return results

# Decision matrix:
# ┌──────────────────┬─────────────┬────────────────┬──────────┐
# │ Task Type        │ Threading   │ Multiprocessing│ Asyncio  │
# ├──────────────────┼─────────────┼────────────────┼──────────┤
# │ API calls (100)  │ ✅ Good     │ ❌ Overkill    │ ✅ Best  │
# │ File processing  │ ✅ OK       │ ✅ Best        │ ❌ No    │
# │ CPU computation  │ ❌ GIL      │ ✅ Best        │ ❌ No    │
# │ 10K connections  │ ❌ Too many │ ❌ Too many    │ ✅ Best  │
# │ Mixed I/O + CPU  │ ✅ OK       │ ✅ Best        │ ⚠️ Tricky│
# └──────────────────┴─────────────┴────────────────┴──────────┘

A document processing pipeline had 3 stages: download PDFs (I/O), extract text (CPU), upload results (I/O). Using asyncio for downloads (100 concurrent), multiprocessing for extraction (8 cores), and threading for uploads gave a 12x speedup — from 45 minutes to 3.5 minutes for 5K documents.

Threading for I/O-bound (network, files). Multiprocessing for CPU-bound (computation). Asyncio for high-concurrency I/O (10K+ connections). Profile to confirm which type your workload is.
⚠️ Common Mistake

Candidates use threading for CPU-bound work and wonder why it's not faster (GIL!). They also use multiprocessing for simple I/O tasks, wasting memory on separate processes when threads would suffice. The most common mistake: creating a new thread/process per task instead of using a pool.

🔁 Follow-Up Question

How does concurrent.futures simplify threading and multiprocessing? What is the Executor pattern?

38 How does NumPy vectorization speed up numerical computation? performance

Vectorization means performing operations on entire arrays at once using optimized C/Fortran code, instead of looping element-by-element in Python. NumPy arrays store data contiguously in memory (unlike Python lists), enabling CPU cache efficiency and SIMD instructions.

A Python for-loop over 10M elements calls the Python interpreter 10M times. A NumPy operation calls C code once for all 10M elements — 100-1000x faster for numerical work.

Key idea: replace Python loops with NumPy operations — np.sum(), np.where(), broadcasting, fancy indexing, and ufuncs (universal functions). If you find yourself writing for i in range(len(array)):, there's likely a NumPy way.

import numpy as np
import time

# Task: Calculate portfolio returns for 1M stocks over 252 trading days

# ❌ Pure Python — slow loop
def python_returns(prices):
    returns = []
    for i in range(len(prices)):
        stock_returns = []
        for j in range(1, len(prices[i])):
            r = (prices[i][j] - prices[i][j-1]) / prices[i][j-1]
            stock_returns.append(r)
        returns.append(stock_returns)
    return returns

# ✅ NumPy vectorized — no loops
def numpy_returns(prices):
    return (prices[:, 1:] - prices[:, :-1]) / prices[:, :-1]

# Generate test data: 10,000 stocks × 252 days
np.random.seed(42)
stock_prices = np.random.uniform(100, 5000, size=(10_000, 252))

# Benchmark
start = time.perf_counter()
result_np = numpy_returns(stock_prices)
t_numpy = time.perf_counter() - start

stock_list = stock_prices.tolist()
start = time.perf_counter()
result_py = python_returns(stock_list[:100])  # only 100 stocks!
t_python = time.perf_counter() - start

print(f"NumPy (10K stocks):  {t_numpy:.4f}s")
print(f"Python (100 stocks): {t_python:.4f}s")
print(f"Estimated Python (10K stocks): {t_python * 100:.1f}s")
print(f"Speedup: ~{(t_python * 100) / t_numpy:.0f}x")

# Typical output:
# NumPy (10K stocks):  0.0089s
# Python (100 stocks): 0.1823s
# Estimated Python (10K stocks): 18.2s
# Speedup: ~2045x

# More vectorization examples
data = np.random.randn(1_000_000)

# Conditional: values > 0 get doubled, others set to 0
result = np.where(data > 0, data * 2, 0)

# Aggregation across axis
matrix = np.random.rand(1000, 500)
col_means = matrix.mean(axis=0)     # mean of each column
row_maxes = matrix.max(axis=1)      # max of each row

A quant trading firm's daily risk calculation on 50K instruments × 10 years of data took 6 hours in pure Python. Vectorizing with NumPy reduced it to 8 seconds — a 2,700x speedup. The entire overnight batch job moved to a real-time dashboard.

Replace Python loops with NumPy array operations for 100-2000x speedup on numerical work. Key: operations on entire arrays, not element-by-element loops.
⚠️ Common Mistake

Candidates mix NumPy and Python loops — iterating over a NumPy array with a for loop defeats the purpose:

❌ Wrong — Python loop over NumPy array
result = np.zeros(len(data))
for i in range(len(data)):
    result[i] = data[i] * 2 + 1  # Python interpreter called 1M times
✅ Correct — vectorized operation
result = data * 2 + 1  # single C call for all 1M elements
🔁 Follow-Up Question

What is NumPy broadcasting and how does it work? Give an example with differently-shaped arrays.

39 How does functools.lru_cache work and when should you use it? performance

functools.lru_cache is a decorator that caches function return values based on arguments. When the same arguments are used again, the cached result is returned instantly instead of recomputing. LRU = Least Recently Used — when the cache is full, the oldest unused entry is evicted.

@lru_cache(maxsize=128) caches up to 128 unique argument combinations. maxsize=None means unlimited cache (use carefully). Python 3.9+ also has @cache (shortcut for @lru_cache(maxsize=None)).

Requirements: arguments must be hashable (no lists or dicts). For unhashable args, convert them to tuples or use a custom cache.

from functools import lru_cache
import time

# Classic example: recursive Fibonacci
# Without cache: O(2^n) — exponentially slow
# With cache:    O(n) — each value computed once

@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

start = time.perf_counter()
result = fibonacci(500)  # instant with cache, impossible without
elapsed = time.perf_counter() - start
print(f"fib(500) = {result} ({elapsed:.6f}s)")
print(f"Cache stats: {fibonacci.cache_info()}")
# CacheInfo(hits=498, misses=501, maxsize=None, currsize=501)

# Practical: caching expensive database/API lookups
@lru_cache(maxsize=1000)
def get_exchange_rate(from_currency, to_currency, date):
    """Simulate expensive API call — cached for same inputs."""
    print(f"  API call: {from_currency}→{to_currency} on {date}")
    # In real code: requests.get(f"https://api.exchangerate.com/...")
    rates = {"USD_INR": 83.5, "EUR_INR": 91.2, "GBP_INR": 106.3}
    return rates.get(f"{from_currency}_{to_currency}", 1.0)

# First calls hit the "API"
print(get_exchange_rate("USD", "INR", "2025-01-15"))  # API call
print(get_exchange_rate("EUR", "INR", "2025-01-15"))  # API call

# Repeated calls served from cache — instant
print(get_exchange_rate("USD", "INR", "2025-01-15"))  # cached!
print(get_exchange_rate("USD", "INR", "2025-01-15"))  # cached!

# Clear cache when needed
get_exchange_rate.cache_clear()

A pricing engine called an exchange rate API 50K times per batch. Adding @lru_cache reduced API calls from 50K to 180 (unique currency pairs × dates), cutting batch time from 25 minutes to 40 seconds and saving $500/month in API costs.

Use @lru_cache for pure functions with hashable args that are called repeatedly with the same inputs. Check cache_info() to verify hit rate. Clear with cache_clear() when data changes.
⚠️ Common Mistake

Candidates use lru_cache on functions with side effects (database writes, API POST requests) or on methods without accounting for self:

❌ Wrong — caching a method caches per-instance (self is a key)
class UserService:
    @lru_cache(maxsize=100)
    def get_user(self, user_id):
        return db.query(user_id)
# Every UserService instance has its own 'self', defeating the cache
✅ Correct — cache at module level or use __hash__
@lru_cache(maxsize=100)
def get_user(user_id):
    return db.query(user_id)
# Standalone function — cache works correctly
🔁 Follow-Up Question

What is the difference between lru_cache and a Redis/Memcached cache? When would you use each?

40 What is Cython and when should you use it for performance optimization? performance

Cython is a superset of Python that compiles to C. You write Python-like code with optional C type declarations, and Cython generates C code that's compiled into a shared library (.so/.pyd) importable from Python.

Adding type annotations (cdef int, cdef double) removes Python object overhead for numeric operations, giving C-like speed. Cython can also release the GIL with nogil, enabling true multi-threaded parallelism for numerical code.

Use Cython when: NumPy can't vectorize your logic (complex conditionals, graph algorithms, custom loops), you need 10-100x speedup over pure Python, or you want to wrap an existing C library.

# Pure Python version — slow
def python_primes(limit):
    """Find all primes up to limit using Sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            for j in range(i*i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

# Cython version (save as fast_primes.pyx):
# ─────────────────────────────────────────
# def cython_primes(int limit):
#     cdef int i, j
#     cdef list sieve = [True] * (limit + 1)
#     sieve[0] = sieve[1] = False
#     
#     for i in range(2, <int>(limit**0.5) + 1):
#         if sieve[i]:
#             for j in range(i*i, limit + 1, i):
#                 sieve[j] = False
#     
#     return [i for i in range(limit + 1) if sieve[i]]
# ─────────────────────────────────────────
# Compile: cythonize -i fast_primes.pyx
# Import: from fast_primes import cython_primes

# Benchmark comparison
import time

limit = 10_000_000

start = time.perf_counter()
primes_py = python_primes(limit)
t_python = time.perf_counter() - start
print(f"Python: {t_python:.3f}s — found {len(primes_py)} primes")

# start = time.perf_counter()
# primes_cy = cython_primes(limit)
# t_cython = time.perf_counter() - start
# print(f"Cython: {t_cython:.3f}s — found {len(primes_cy)} primes")
# print(f"Speedup: {t_python / t_cython:.1f}x")

# Typical results:
# Python: 4.200s — found 664,579 primes
# Cython: 0.180s — found 664,579 primes
# Speedup: 23.3x

A bioinformatics lab had a DNA sequence alignment algorithm in pure Python taking 12 hours per genome. Cython with typed memoryviews reduced it to 18 minutes (40x speedup). The team kept 95% of the code in Python and only Cython-ized the inner loop of the Smith-Waterman algorithm.

Cython gives C-speed to Python code by adding type annotations. Use it for hot loops that NumPy can't vectorize. Profile first — only optimize the 10% of code that takes 90% of time.
⚠️ Common Mistake

Candidates Cython-ize everything instead of just the hot loop. Cython code is harder to debug and maintain — only use it where profiling shows a clear bottleneck. Also, forgetting to add type declarations (cdef int) gives zero speedup — untyped Cython is essentially the same speed as Python.

🔁 Follow-Up Question

How does Cython compare to PyPy for performance? When would you choose one over the other?

Frequently Asked Questions

Written and reviewed by the FreeBytes Editorial Team · Last updated: June 2026