22 Data Analyst Interview Questions in Bangalore — 2026 Real-World Edition

By Vinay, Founder of Vtricks Technologies

Interview Prep • May 2026

Bangalore is the analytics capital of India. Whether you are interviewing at a fast-growing startup like Swiggy, a Global Capability Centre (GCC) like Target, or an IT giant like TCS, the interview standards have drastically shifted in 2026. Rote memorization of definitions no longer cuts it. Interviewers want to see how you handle messy data, business logic, and modern AI tools.

We have compiled these 22 high-probability interview questions by analyzing actual interview experiences of students from the data analytics course in Bangalore at Vtricks Technologies. These questions bypass the outdated 2020 syllabus and focus entirely on what is being asked in Bangalore boardrooms right now.

Part 1: Advanced SQL Questions (The Make-or-Break Round)

If you fail the SQL round, you do not proceed to the HR round. Product companies in Bangalore heavily test your ability to use Window Functions, CTEs (Common Table Expressions), and complex aggregations.

1. Explain the exact difference between RANK(), DENSE_RANK(), and ROW_NUMBER() with a scenario.

Answer: This is one of the most frequently asked SQL questions. Imagine three employees earning ₹50k, ₹50k, and ₹40k, ranked by salary from highest to lowest.
ROW_NUMBER() assigns a unique sequential integer (1, 2, 3), regardless of ties.
RANK() gives ties the same rank but skips the next number (1, 1, 3).
DENSE_RANK() gives ties the same rank and does not skip numbers (1, 1, 2).
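The scenario above can be reproduced with a minimal sketch using Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer). The `employees` table and the employee names are hypothetical, chosen only to mirror the ₹50k / ₹50k / ₹40k example.

```python
import sqlite3

# Hypothetical table and names that mirror the three-salary scenario.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 50000), ("Bala", 50000), ("Chitra", 40000)],
)

rows = conn.execute(
    """
    SELECT name,
           salary,
           -- name is added as a tie-breaker so ROW_NUMBER is deterministic
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk
    FROM employees
    ORDER BY salary DESC, name
    """
).fetchall()

for row in rows:
    print(row)
```

Without a tie-breaker in its `ORDER BY`, `ROW_NUMBER()` may number tied rows in any order, which is worth mentioning if the interviewer asks a follow-up.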

2. Write a query to find the second-highest salary in a department without using LIMIT or TOP.

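Answer: Use a subquery or a CTE. The query below returns the second-highest distinct salary across the whole table; to scope it to one department, apply the same department filter in both the outer query and the subquery:
SELECT MAX(salary) FROM Employees WHERE salary < (SELECT MAX(salary) FROM Employees);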
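Because the question says "in a department", interviewers often expect the answer for every department at once. One possible sketch uses a correlated subquery, run here through Python's `sqlite3`. The `department` column, the names, and the salaries are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the department column is an assumption for this sketch.
conn.execute("CREATE TABLE Employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        ("Asha", "Sales", 50000),
        ("Bala", "Sales", 50000),
        ("Chitra", "Sales", 40000),
        ("Dev", "Tech", 90000),
        ("Esha", "Tech", 70000),
    ],
)

# For each department, keep salaries below that department's maximum,
# then take the largest of what is left. No LIMIT or TOP is needed.
second_highest = conn.execute(
    """
    SELECT e.department, MAX(e.salary) AS second_highest
    FROM Employees AS e
    WHERE e.salary < (
        SELECT MAX(salary) FROM Employees WHERE department = e.department
    )
    GROUP BY e.department
    ORDER BY e.department
    """
).fetchall()

print(second_highest)
```

A department with only one distinct salary does not appear in the result, which is an edge case worth raising with the interviewer.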

3. What is a CTE, and why would you use it over a standard Subquery?

Answer: A Common Table Expression (CTE) creates a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. Unlike subqueries, CTEs are far more readable, allow for recursion, and can be referenced multiple times within the same query. Bangalore startups prefer CTEs for clean, maintainable code.
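As a small illustration of the "referenced multiple times" point, the sketch below defines one CTE and uses it twice in the same statement. The `orders` table and its values are hypothetical, and the query runs on Python's built-in `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders table for illustration.
conn.execute("CREATE TABLE orders (city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Bangalore", 300), ("Bangalore", 500), ("Mysore", 200), ("Hubli", 100)],
)

# city_totals is defined once and referenced twice:
# in the outer FROM clause and again inside the WHERE subquery.
above_average = conn.execute(
    """
    WITH city_totals AS (
        SELECT city, SUM(amount) AS total
        FROM orders
        GROUP BY city
    )
    SELECT city, total
    FROM city_totals
    WHERE total > (SELECT AVG(total) FROM city_totals)
    ORDER BY city
    """
).fetchall()

print(above_average)
```

Written with plain subqueries, the `GROUP BY` logic would have to be repeated in both places.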

4. How do you find duplicate records in a table, and how do you delete them keeping only one?

Answer: To find duplicates, group by the columns that should be unique and use `HAVING COUNT(*) > 1`. To delete them while keeping the first instance, use a CTE with `ROW_NUMBER() OVER(PARTITION BY [duplicate_columns] ORDER BY [id])` and delete every row whose row number is greater than 1. SQL Server lets you run the DELETE directly against the CTE; in most other engines you delete from the base table where the primary key matches those rows.
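Both steps can be sketched end to end with Python's `sqlite3`. The `customers` table, its `id` primary key, and the `email` column are hypothetical. SQLite does not allow deleting from a CTE directly, so this version deletes from the base table by `id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: email should be unique but contains repeats.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers (id, email) VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, "b@example.com"),
        (3, "a@example.com"),
        (4, "a@example.com"),
    ],
)

# Step 1: find the duplicated values.
duplicates = conn.execute(
    "SELECT email, COUNT(*) FROM customers GROUP BY email HAVING COUNT(*) > 1"
).fetchall()

# Step 2: number the rows within each email and delete all but the first.
conn.execute(
    """
    WITH ranked AS (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
        FROM customers
    )
    DELETE FROM customers
    WHERE id IN (SELECT id FROM ranked WHERE rn > 1)
    """
)

remaining = conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()
print(duplicates, remaining)
```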

5. Differentiate between UNION and UNION ALL. Which one is faster?

Answer: Both combine result sets of two or more SELECT statements. UNION removes duplicate rows, whereas UNION ALL includes all duplicates. UNION ALL is usually faster because the database engine does not have to spend computing power sorting or hashing the data to remove the duplicate rows. If duplicates are impossible or acceptable, prefer UNION ALL.
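The difference in row counts is easy to demonstrate. This is a minimal sketch with two hypothetical tables, `app_users` and `web_users`, that share one email address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; b@example.com appears in both.
conn.execute("CREATE TABLE app_users (email TEXT)")
conn.execute("CREATE TABLE web_users (email TEXT)")
conn.executemany(
    "INSERT INTO app_users VALUES (?)", [("a@example.com",), ("b@example.com",)]
)
conn.executemany(
    "INSERT INTO web_users VALUES (?)", [("b@example.com",), ("c@example.com",)]
)

union_rows = conn.execute(
    "SELECT email FROM app_users UNION SELECT email FROM web_users ORDER BY email"
).fetchall()
union_all_rows = conn.execute(
    "SELECT email FROM app_users UNION ALL SELECT email FROM web_users ORDER BY email"
).fetchall()

print(len(union_rows), len(union_all_rows))
```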

Part 2: Power BI & Data Visualization

Companies like Flipkart, Myntra, and various GCCs in Bangalore rely heavily on Power BI for leadership dashboards. If you claim Power BI on your resume, expect deep questions on DAX and Data Modeling.

6. What is the difference between a Calculated Column and a Measure in Power BI?

Answer: A Calculated Column evaluates row-by-row, is stored in the model (so it consumes RAM), and recalculates only during data refresh. A Measure evaluates dynamically based on the current filter context, consumes CPU, and recalculates whenever you interact with the report. Prefer Measures for aggregations (like total sales) to keep the model small and the report responsive.

7. Explain the Star Schema and why it is crucial for Power BI performance.

Answer: A Star Schema separates business data into Fact tables (containing measurable, quantitative data like sales amounts) and Dimension tables (containing descriptive attributes like date, product, customer). Power BI's VertiPaq engine is heavily optimized for Star Schemas. It ensures faster rendering times and simpler DAX calculations compared to flat, wide tables.

8. How do you resolve a Many-to-Many relationship in a data model?

Answer: You introduce a "Bridge Table" (or junction table) that contains the unique identifiers from both conflicting tables. This breaks the many-to-many relationship into two distinct one-to-many relationships, preventing inaccurate aggregations and double-counting of values.
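The same idea can be shown outside Power BI with a relational sketch in Python's `sqlite3`. All table and column names here are hypothetical: a product can belong to several categories, and `product_categories` is the bridge that turns the many-to-many link into two one-to-many links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products   (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY, name TEXT);
    -- Bridge table: one row per (product, category) pair.
    CREATE TABLE product_categories (
        product_id  INTEGER REFERENCES products(product_id),
        category_id INTEGER REFERENCES categories(category_id),
        PRIMARY KEY (product_id, category_id)
    );
    CREATE TABLE sales (product_id INTEGER, amount INTEGER);

    INSERT INTO products   VALUES (1, 'Phone'), (2, 'Kettle');
    INSERT INTO categories VALUES (10, 'Electronics'), (20, 'Gifts');
    INSERT INTO product_categories VALUES (1, 10), (1, 20), (2, 20);
    INSERT INTO sales VALUES (1, 100), (2, 40);
    """
)

# Sales per category, reached through the bridge table.
by_category = conn.execute(
    """
    SELECT c.name, SUM(s.amount)
    FROM sales AS s
    JOIN product_categories AS pc ON pc.product_id = s.product_id
    JOIN categories AS c ON c.category_id = pc.category_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

# The grand total comes from the fact table alone, so the Phone is counted once.
grand_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

print(by_category, grand_total)
```

The Phone's sales legitimately appear under both categories, so adding up the category rows overstates the total. Taking the grand total from the fact table avoids that double-counting.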

9. What is the CALCULATE function in DAX, and why is it so powerful?

Answer: CALCULATE is the most important function in DAX. It evaluates an expression in a modified filter context. It allows you to override, add, or ignore filters coming from the report page (slicers/visuals) to perform complex comparative analysis, such as Year-over-Year growth.

10. What is the difference between SUM and SUMX?

Answer: SUM is an aggregation function that simply adds up all the numbers in a single column. SUMX is an iterator function; it goes row by row through a specified table, evaluates an expression for each row (like multiplying Quantity * Price), and then sums up the results.
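The distinction can be illustrated with a plain-Python analogy. This is not DAX, only a sketch of the same logic, and the rows and values are hypothetical.

```python
# Hypothetical Sales rows; the column names echo the Quantity * Price example.
sales = [
    {"Quantity": 2, "Price": 50},
    {"Quantity": 1, "Price": 200},
    {"Quantity": 3, "Price": 10},
]

# SUM-style: add up one existing column.
total_quantity = sum(row["Quantity"] for row in sales)

# SUMX-style: evaluate an expression for each row, then add up the results.
total_revenue = sum(row["Quantity"] * row["Price"] for row in sales)

# Multiplying two column totals is NOT the same as the row-by-row result.
naive_product = total_quantity * sum(row["Price"] for row in sales)

print(total_quantity, total_revenue, naive_product)
```

The gap between `total_revenue` and `naive_product` is why revenue needs an iterator (or a pre-computed column) rather than a product of two SUMs.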

Part 3: Python & Pandas for Data Analysis

Not all roles require Python, but if you are applying for a premium role (₹8 LPA+), Python is mandatory for handling datasets that are too large for Excel to process.

11. How do you handle missing values (NaN) in a Pandas DataFrame?

Answer: Depending on the business context, you can: drop the rows using `.dropna()`, fill them with a specific value or the mean/median using `.fillna()`, or use forward/backward filling methods (`ffill`, `bfill`) for time-series data.
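The three options can be compared side by side in a minimal sketch. It assumes pandas is installed, and the `day` and `sales` columns are hypothetical.

```python
import pandas as pd

# Hypothetical daily sales with two missing readings.
df = pd.DataFrame(
    {"day": [1, 2, 3, 4], "sales": [100.0, float("nan"), 120.0, float("nan")]}
)

# Option 1: drop rows where sales is missing.
dropped = df.dropna(subset=["sales"])

# Option 2: fill with the column mean (mean() skips NaN by default).
filled_mean = df["sales"].fillna(df["sales"].mean())

# Option 3: forward-fill, carrying the last known value forward (time series).
forward_filled = df["sales"].ffill()

print(len(dropped), filled_mean.tolist(), forward_filled.tolist())
```

Which option is right depends on why the values are missing, so be ready to justify the choice rather than naming one method by default.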

12. Explain the difference between `.loc` and `.iloc` in Pandas.

Answer: `.loc` is label-based indexing, meaning you select data based on the names of the rows and columns. `.iloc` is integer-based indexing, meaning you select data based on their numerical index position (e.g., row 0, column 1) regardless of the label names.
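A short sketch makes the difference concrete. It assumes pandas is installed; the DataFrame and its non-sequential index labels (10, 20, 30) are hypothetical and chosen so that labels and positions differ.

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["Bangalore", "Mysore", "Hubli"], "sales": [300, 200, 100]},
    index=[10, 20, 30],  # labels deliberately differ from positions 0, 1, 2
)

by_label = df.loc[20, "sales"]   # row labelled 20, column named "sales"
by_position = df.iloc[1, 1]      # second row, second column

label_slice = df.loc[10:20]      # label slices include the end label
position_slice = df.iloc[0:1]    # position slices exclude the end position

print(by_label, by_position, len(label_slice), len(position_slice))
```

The slicing behaviour is a common follow-up: `.loc` includes the end label, while `.iloc` follows Python's usual end-exclusive convention.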

13. What is the difference between merging, joining, and concatenating DataFrames?

Answer: `concat()` stacks DataFrames vertically or horizontally. `merge()` combines DataFrames based on a common key/column (similar to SQL JOINs). `join()` is a convenient method for combining DataFrames specifically on their indexes rather than columns.
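The three operations look like this in a minimal sketch. It assumes pandas is installed, and the `orders` and `customers` frames are hypothetical.

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [250, 100, 75]})
customers = pd.DataFrame({"customer_id": [1, 2], "city": ["Bangalore", "Mysore"]})
more_orders = pd.DataFrame({"customer_id": [1], "amount": [60]})

# concat: stack rows from two frames that share the same columns.
stacked = pd.concat([orders, more_orders], ignore_index=True)

# merge: SQL-style LEFT JOIN on a shared key column.
merged = orders.merge(customers, on="customer_id", how="left")

# join: combine on the index (left join by default).
joined = orders.set_index("customer_id").join(customers.set_index("customer_id"))

print(len(stacked), merged["city"].isna().sum(), list(joined.index))
```

Customer 3 has no match in `customers`, so the left merge keeps the order and leaves `city` as NaN, just as a SQL LEFT JOIN would return NULL.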

14. What are Lambda functions in Python and when would an analyst use them?

Answer: Lambda functions are small, anonymous functions limited to a single expression. Analysts frequently use them inside the Pandas `.apply()` method to perform quick transformations on a column without formally defining a full `def` function.
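A typical use is a one-off label derived from a numeric column. The sketch below assumes pandas is installed. The `salary_lpa` column and the band names are hypothetical, with the cut-off borrowed from the ₹8 LPA figure mentioned earlier.

```python
import pandas as pd

# Hypothetical salaries in lakhs per annum.
df = pd.DataFrame({"salary_lpa": [4.5, 8.0, 12.0]})

# The lambda is applied to each value in the column.
df["band"] = df["salary_lpa"].apply(
    lambda lpa: "premium" if lpa >= 8 else "entry"
)

print(df["band"].tolist())
```

For simple arithmetic, vectorised column operations are usually faster than `.apply()`, so reserve lambdas for logic that is awkward to vectorise.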

Part 4: The 2026 GenAI & Scenario Round

This is the section that filters out candidates stuck in 2023. Bangalore hiring managers are looking for "AI-augmented" analysts who deliver results 5x faster.

15. How do you leverage LLMs (like ChatGPT or Claude) in your day-to-day analytics workflow?

Answer Strategy: Do not say you use it to cheat on writing SQL. Say you use it for rapid prototyping (NL-to-SQL), interpreting complex error codes in Python, generating dummy data for testing dashboards, and optimizing query execution plans.

16. Have you used Copilot in Power BI? How does it change dashboard creation?

Answer Strategy: Explain that Copilot accelerates the initial layout phase and is excellent for automatically generating DAX measures based on natural language prompts. However, emphasize that human validation is still critical for verifying the business logic of the generated DAX.

17. Scenario: Sales dropped by 15% last month. The CEO wants to know why. Walk me through your analytical process.

Answer Strategy: Demonstrate a structured approach. First, verify the data quality (are there missing days?). Second, slice the data by dimensions: geography, product line, sales channel, and time (was the drop sudden or gradual?). Third, look at external factors (holidays, competitor promotions). Conclude by visualizing the primary drivers of the drop.
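The "slice by dimensions" step can be sketched as follows. It assumes pandas is installed. The data is invented purely to reproduce a 15% drop, and the `month`, `region`, and `revenue` column names are hypothetical.

```python
import pandas as pd

# Invented figures: total revenue falls from 1000 to 850 (a 15% drop).
sales = pd.DataFrame(
    {
        "month": ["prev", "prev", "prev", "last", "last", "last"],
        "region": ["North", "South", "West", "North", "South", "West"],
        "revenue": [400, 400, 200, 400, 250, 200],
    }
)

# One row per region, one column per month.
by_region = sales.pivot_table(
    index="region", columns="month", values="revenue", aggfunc="sum"
)
by_region["change"] = by_region["last"] - by_region["prev"]

prev_total = by_region["prev"].sum()
last_total = by_region["last"].sum()
total_change_pct = (last_total - prev_total) * 100 / prev_total

# The region with the most negative change is the primary driver.
biggest_drop = by_region["change"].idxmin()

print(total_change_pct, biggest_drop)
```

In a real analysis the same pivot would be repeated for product line, sales channel, and week to see whether one slice explains most of the decline.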

18. You notice a massive outlier in your dataset right before a major presentation. What do you do?

Answer Strategy: Explain that you do not immediately delete it. You investigate its origin. Is it a data entry error (e.g., fat-finger mistake), a system glitch, or a genuine anomalous event (e.g., a bulk corporate order)? If it's an error, correct or exclude it and document the change. If it's genuine, keep it but highlight it to stakeholders so it doesn't skew the general trend.

Part 5: Statistics & Excel Fundamentals

Even with AI and Python, core statistical knowledge and Excel proficiency remain mandatory for foundational analytics roles.

19. What is the difference between VLOOKUP and XLOOKUP?

Answer: XLOOKUP is the modern successor. It defaults to an exact match (VLOOKUP defaults to an approximate match), can return values from a column to the left of the lookup column (which VLOOKUP cannot do), and allows you to specify a "not found" value directly in the formula without needing an IFERROR wrapper.

20. Explain standard deviation and why it matters.

Answer: Standard deviation measures how spread out the numbers are from the mean. A low standard deviation means data is clustered around the average (predictable), while a high standard deviation means data is spread across a wider range (volatile).
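Two datasets with the same mean can have very different spreads. This minimal sketch uses Python's standard `statistics` module, and both sales series are invented for illustration.

```python
import statistics

# Two invented weekly sales series with the same average.
steady_sales = [98, 100, 102, 100, 100]
volatile_sales = [40, 160, 100, 20, 180]

steady_mean = statistics.mean(steady_sales)
volatile_mean = statistics.mean(volatile_sales)

# pstdev treats the list as the full population;
# use statistics.stdev when the data is a sample.
steady_sd = statistics.pstdev(steady_sales)
volatile_sd = statistics.pstdev(volatile_sales)

print(steady_mean, volatile_mean, round(steady_sd, 2), round(volatile_sd, 2))
```

Both series average 100, yet the second is far less predictable, which is the kind of business interpretation interviewers look for.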

21. What is a P-value?

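Answer: In hypothesis testing, the p-value is the probability of observing a result at least as extreme as yours, assuming the null hypothesis is true. A p-value below a pre-chosen threshold (commonly 0.05) is treated as statistically significant: the data would be unlikely if the null hypothesis held, so you reject it. It is not the probability that the null hypothesis is true, and it says nothing about how large or important the effect is.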
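A worked example helps when the interviewer asks what the number means. The sketch below computes an exact one-sided p-value for a hypothetical experiment: 16 heads in 20 flips, with "the coin is fair" as the null hypothesis. The function name is illustrative and only the standard library is used.

```python
from math import comb


def binomial_p_value_at_least(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p): a one-sided p-value."""
    return sum(
        comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


# Hypothetical data: 16 heads in 20 flips; H0 says the coin is fair (p = 0.5).
p_value = binomial_p_value_at_least(16, 20)

print(round(p_value, 4))
```

The result is well below 0.05, so a fair coin would rarely produce 16 or more heads in 20 flips, and the null hypothesis would be rejected at that threshold.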

22. Describe the difference between correlation and causation.

Answer: Correlation means two variables move together. Causation means one event directly causes the other. For example, ice cream sales and sunburns are highly correlated in summer, but eating ice cream does not cause sunburns; hot, sunny weather drives both.

Stop memorizing. Start building.

Reading questions isn't enough. You need to confidently explain these concepts in real-time. Our data analytics course in Bangalore includes **5+ rigorous mock interviews** conducted by industry experts.

Located in Vijayanagar • Focus on 2026 GenAI Tools • High Placement Rate