
1. Why do you want to be a Data Analyst?
Answer:
You should list why you love data, research what the company do with data, explain how you can create insights from data.
2. Where do you see yourself in 5years?
Answer:
Your answer should be how you can use your acquired experience to influence businesses or solve complex issues by creating positive impact for the company. Do not list job titles or dream companies you want to work for.
3. What are Clustered and Non- clustered Indexes in SQL Server? Explain the difference between the two.
Answer:
A Clustered Index deals with data stored in tables with key values, while Non – Clustered Index have data different from tables, usually unrelated data.
Differences:
Clustered
- Always has an index ID of 0
- Data is stored in index order
- Data is faster to read
- Only 1 per table
Non-Clustered
- Index ID > 0
- Does not order data, only columns based on index key specified when the index is created.
- Faster for inserting and updating data
- Contains between 249 and 999+ per table.
4. How do you Validate Data?
Data Validation is the process of ensuring data quality and accuracy.
Data Screening: for screening inaccuracy of data
Data Verification: used to verify data sources and use-case scenarios.
5. List the various steps in an Analytics project.
- Problem definition
- Data exploration
- Data preparation
- Modelling
- Validation of data
- Implementation and tracking
6. What is the difference between Data mining and Data profiling?
Data mining is the act of collecting data from different sources and checking patterns/relationship within the data. These could be unusual records, dependencies, sequence discovery etc.
Data Profiling simply means categorizing data either by data types, values, frequency and so on.
7. What is Logistic Regression?
Logistic Regression is used for classification problems by estimating probabilities using logistic function.
8. What is Linear Regression?
Linear regression is used for finding out the relationship between variables and forecasting.
9. What is Decision tree?
A Decision tree is a schematic description of different decisions followed by chances of occurrence. In statistics it predicts category and continuous response variable.
List some best tools that can be used for data analysis?
Tableau
KNIME
Google Search
Solver
OpenRefine
RapidMiner
NodeXL
Io
Google Fusion tables
Wolfram Alpha’s
11. List out some common problems faced by Data Analyst?
Some of the problems data analysts face with data are:
Missing values
Duplicate entries
Varying value representation
Identifying overlapping data Common misspelling
12. What is an outlier?
An outlier is a term used by analysts to refer values between results with far difference in values. For example, 2450 is far from 9800 for result needed from same data.
There are two types of outliers:
Univariate
Multivariate
13. What is Hierarchical Clustering?
Hierarchical clustering algorithm combines and divides existing groups, creating a hierarchical structure that shows the order in which groups are divided or merged.
14. What is K-mean Algorithm?
K mean is a famous partitioning method. Objects can be classified as belonging to K groups,
In K-mean,
The clusters are spherical: the data points in a cluster are centered around that cluster.
The variance/spread of the cluster is similar: Each data point belongs to the closest cluster.
15. Mention the key skills required for Data Analysts?
A Business Data Analyst must have the following skills:
- Database knowledge
- Business domain knowledge
- Communication skill
- Analytical skill
- Querying skill
- Basic Descriptive Analysis
- Report design
- Data Visualization skill
- Insight presentation
- Data Manipulation
- Predictive Analysis
16. What are the tools used in Big-data?
Tools used in Big-data include:
- Hadoop
- Hive
- Pig
- Flume
- Mahout
- Sqoop
17. What is KPI, design of experiments and 80/20 rule?
KPI: stands for key performance indicator, it consists of any
combination of spreadsheets, reports or chats about business process.
Design of experiments: The initial process used to split data, sample and
Setup up data for statistical analysis.
80/20 rule: Means 80 percent of your income comes from 20 percent of
your clients.
18. What is Map Reduce?
Map Reduce is a framework to process large data sets, splitting them
into subsets, processing each subset on a different server and then
blending results obtained on each.
19. What is Clustering? What are the properties for clustering algorithms?
Clustering is a data classification method. Clustering algorithm divides a
data set into natural groups or clusters.
Properties for clustering algorithm are:
- Hierarchical or flat
- Iterative
- Hard and soft
- Disjunctive
20. List Statistical methods used that are useful to a Data Analyst?
Statistical methods useful to Data Analysts, Data Scientists and so on:
- Bayesian method
- Markov process
- Spatial and Cluster processes
- Imputation techniques
- Rank Statistics, percentile, outlier’s detection
- Simplex algorithm
- Mathematical optimization
21. What is Time Series Analysis?
Time series is usually found in two domains, frequency domain and time domain. Time series usual output process is based on forecast by analyzing the previous data by the help of various methods like exponential smoothing, log-linear regression method, etc.
22. What is a Hash table?
In computing, a Hash table is a map of keys to values. Also, a data
structure used to implement an associate array. Hash table is used
to compute an index into an array of slot, from which desired value can
be fetched.
23. Mention the steps of a data analysis project?
- Understanding business requirements
- Identifying reliable and verified relevant data sources for business requirement.
- Exploring data sets, manipulating, cleaning the data
- Validating Data
- Visualizing data to help make business decisions and solve problems
24. What problems do Data Analysts encounter during data analysis?
Data Analysts face a lot of challenges when analyzing data sets:
- Duplicate entries of data
- Poor quality data from unreliable sources
- Incomplete data
25. What are processes of Data Analysis?
Data Analysis is the process of collecting, cleansing, interpreting,
transforming and modelling data to get insights and generate reports for business decisions.
- Collect data
- Analyze data
- Create reports
26. When should a model be retrained?
A data model should be trained when business data keeps changing on a regular basis. Also be informed that data format does not change.
27. What problems do Data Analysts encounter while performing data
Analysis?
- Duplicate data entries.
- Poor quality data acquired from unreliable sources.
- Data extracted from multiple sources may vary in representation. Once the collected data is combined after being cleansed and organized, the variations in data representation may cause a delay in the analysis process.
- Incomplete data is another major issue in data analysis process.
28. What are the characteristics of a good data model?
For a data model to be considered good and developed, it must have
these characteristics:
- Must have predictable performance that estimates accurately.
- Data should be adaptive and responsive to changes so that it can accommodate the growing business needs from time to time.
- Data should be capable of scaling in proportion to the changes in data
- It should be consumable to allow clients reap profitable results.
29. What is the difference between Data Mining and Data analysis?
DATA MINING
- Used to recognize patterns in stored data stored.
- Results extracted from data mining are not easy to interpret.
- Mining is performed on clean and well-documented data.
DATA ANALYSIS
- Used to order and organize a meaningful manner.
- Results extracted from data analysis are easy to interpret.
- The analysis of data involves data cleaning. So, data is present in a well-documented format.
30. What is the process of Data Analysis?
Data Analysis process involves the collecting, cleansing, Interpreting,
transforming and modeling data to gather insights and generate
reports to gain business profits.
31. How can you highlight cells with negative values in Excel?
You can highlight the negative values in Excel by using the
conditional formatting.
Below are steps that you can follow:
- Select the cells which you want to highlight with the negative values.
- Go to the Home tab and click on the conditional formatting option
- Go to the Highlight cell rules and click on the Less than option.
- In the dialogue box of Less than, specify the values as 0.

32. What is a Pivot table and list the different functions of a Pivot Table?
A Pivot table is a simple feature in Microsoft Excel which allows you to
Quickly summarize huge data sets. As an easy to use tool, it requires
dragging and dropping rows/column headers to create reports.
A Pivot table is made up of four different sections:
- Value areas: Values are reported in this area
- Rows area: The headings which are present on the left of the values.
- Column area: The headings at the top of the value area makes the columns area.
- Filter area: This is an optional filter used to drill down in the data set.
33. Can you make a Pivot table from multiple tables?
Yes, we can create one pivot from multiple tables when there is a
Connection between these tables.
34. What are the most common questions you should ask a client before
creating a dashboard?
Answer depends on case-to-case basis. Here are some common
questions to ask when creating a dashboard in excel
- Purpose for building the dashboard
- Different data sources
- Usage of the excel dashboard
- The frequency at which the dashboard needs to be updated.
- The version of Microsoft office the client uses.
35. What steps can you take to handle slow Excel workbook?
There are many ways to do this. Some steps have been listed below:
- Maintain all referenced data in a single sheet.
- Try using manual calculation mode.
- Try to avoid using entire rows or columns in references.
- Often use excel tables and named ranges.
- Convert all the used formulas to values.
36. What is Interleaving in SAS?
Interleaving in SAS means combining individual sorted SAS data sets
into one sorted data set. You can interleave data sets using a SET
statement along with a BY statement.
37. What is the basic syntax for writing code in SAS?
The basic style of writing code in SAS is as follows:
- Write the DATA statement which will basically name the dataset.
- Write the INPUT statement to name the variables in the data set.
- All the statements should end with a semi-colon.
- There should be a proper space between word and a statement.
38. What are the different types of joins?

Inner Join: used to return all rows from multiple tables with matching
values.
Left Join: used to return all records from the left table with matching
records from the right table.
Right Join: used to return all records from the right table with matching
records from the left table.
Full Join: Returns all records with a match in all tables
39. List the core competencies of a Business Data Analyst.
- Analytical thinking and decision making
- Business and industry knowledge
- Business process management
- Technical and soft skills
- Problem solving and negotiating skills
40. What is a Feasibility study?
The requirements and problems of a business/project should be
studied and understood by a Business Data Analyst, and he or she
should set a scope for the business problem. Feasibility study is
identifying the possibility (success rate) of the proposed idea for a
business problem. It helps identify new opportunities and focus on the
project.
A career path in Data Analytics is very rewarding, if you need training
in your data science technical track career, check our contact us page to learn more about DataCatchup career track now.