Miscellaneous Coding Interview Questions
Apart from data structure-based questions, most programming job interviews also include algorithm, design, bit manipulation, and general logic-based questions, which I'll describe in this section.
It's important that you practice these concepts because they can become tricky to solve in the actual interview. Having practiced them before not only makes you familiar with them but also gives you more confidence in explaining the solution to the interviewer.
If you need more such coding questions, you can turn to books like Cracking the Coding Interview by Gayle Laakmann McDowell, which presents 189 programming questions and solutions. It is a good book for preparing for programming job interviews in a short time.
By the way, the more questions you solve in practice, the better your preparation will be. So, if you think 50 is not enough and you need more, then check out these additional 50 programming questions for telephone interviews and these books and courses for a more thorough preparation.
What Is The Difference Between A Box Plot And A Histogram
Both box plots and histograms visually show the frequency of a feature's values.
Box plots are more often used to compare several datasets; compared with histograms, they take less space and show less detail. Histograms are used to understand the probability distribution underlying a dataset.
The diagram above denotes a boxplot of a dataset.
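To make the contrast concrete, here is a minimal Python sketch of the numbers each chart is built from. The function names, the sample data, and the choice of the "inclusive" quartile method are assumptions made for illustration; plotting libraries may compute quartiles slightly differently.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max): the values a box plot draws."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def histogram_counts(values, bin_width):
    """Count how many values fall into each fixed-width bin (keyed by bin start)."""
    counts = {}
    for value in values:
        bin_start = (value // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(data)      # compact: five numbers per dataset
bins = histogram_counts(data, bin_width=5)  # more detail: one count per bin
```

The box plot compresses the dataset into five numbers, which is why several of them fit side by side, while the histogram keeps one count per bin and so shows the shape of the distribution.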
What Can I Do To Find Out Whether This Data Science Program Is Good For Me
It's always beneficial to learn new skills and broaden your knowledge. This Data Science program was created in collaboration with Purdue University and combines a world-renowned curriculum with industry-aligned training, making this postgraduate program in data science a superb choice.
Using The Sample Superstore Dataset Display The Top 5 And Bottom 5 Customers Based On Their Profit
 Drag the Customer Name field onto Rows, and Profit onto Columns.
 Right-click on the Customer Name column to create a set.
 Give a name to the set and select the Top tab to choose the top 5 customers by sum of profit.
 Similarly, create a set for the bottom five customers by sum of profit.
 Select both sets and right-click to create a combined set. Give a name to the set and choose All members in both sets.
 Drag the top and bottom customers set onto Filters, and the Profit field onto Colour to get the desired result.
How Would You Implement The Insertion Sort Algorithm
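 We treat the first element of the array as already sorted. The second element is stored separately as the key and compared with the first, which sorts the first two elements. We then take the third element and compare it with the elements to its left, moving it into place. This process continues until the whole array is sorted.
    // Sample input; the original values were lost, so these are placeholders.
    int[] a = {5, 1, 6, 2, 4, 3};
    for (int m = 1; m < a.length; m++) {
        int n = m;
        // Swap a[n] leftward until the prefix a[0..m] is sorted.
        while (n > 0 && a[n - 1] > a[n]) {
            int k = a[n];
            a[n] = a[n - 1];
            a[n - 1] = k;
            n--;
        }
    }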
Have Your Questions Ready
While it's important to think about the questions you'll have to answer, it's also essential to have some questions ready to ask at the end of the interview.
Many candidates overlook this, but it is an excellent way to find out more about the role, decide whether it is right for you, and show your interest in the position and company. Some examples of questions include:
• What is the metric on which my performance will be evaluated?
• How will the projects I work on align with key business goals?
• What are the top three reasons you like working here?
• What are the most immediate projects that need to be addressed?
Given A List Of Timestamps In Sequential Order Return A List Of Lists Grouped By Week Using The First Timestamp As The Starting Point
This question sounds like it should be a SQL question, doesn't it? Weekly aggregation implies a form of GROUP BY in a regular SQL or Pandas question. In either case, aggregating a dataset of this form by week would be fairly trivial.
But as a scripting question, this task probes whether the candidate is comfortable dealing with unstructured data, as data scientists may have to handle a lot of unstructured data depending on their specific role or company.
In this function, we have to do a few things:
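One possible shape for such a function is sketched below. The source does not give the timestamp format or the exact bucketing rule, so this sketch assumes ISO date strings (`YYYY-MM-DD`), fixed 7-day windows counted from the first timestamp, and that empty weeks are simply skipped. The name `group_by_week` is hypothetical.

```python
from datetime import datetime


def group_by_week(timestamps, fmt="%Y-%m-%d"):
    """Group sorted date strings into 7-day windows counted from the first one."""
    if not timestamps:
        return []
    start = datetime.strptime(timestamps[0], fmt)
    groups = []
    current_week = None
    for ts in timestamps:
        # Whole weeks elapsed since the first timestamp (assumed bucketing rule).
        week = (datetime.strptime(ts, fmt) - start).days // 7
        if week != current_week:
            groups.append([])
            current_week = week
        groups[-1].append(ts)
    return groups


sample = ["2019-01-01", "2019-01-02", "2019-01-08",
          "2019-02-01", "2019-02-02", "2019-02-05"]
weekly = group_by_week(sample)
```

Because the input is already in sequential order, a single pass is enough: a new sub-list is opened whenever the week index changes.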
This Python question explores the concept of stemming, which is the heuristic of chopping off the end of a word to clean and bucket it into an easier feature set.
Input:
roots =
sentence = "the cattle was rattled by the battery"
Output:
"the cat was rat by the bat"
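A minimal sketch of one way to solve this is shown below. The list of roots is missing from the example above, so the roots used here (`cat`, `bat`, `rat`) are an assumption inferred from the expected output, and the function name is hypothetical. The sketch also assumes that when several roots match, the shortest one wins.

```python
def replace_with_roots(roots, sentence):
    """Replace each word with the shortest root that it starts with, if any."""
    ordered_roots = sorted(roots, key=len)  # try shorter roots first
    result = []
    for word in sentence.split():
        for root in ordered_roots:
            if word.startswith(root):
                word = root
                break
        result.append(word)
    return " ".join(result)


# Assumed roots; the original input list was not preserved.
stemmed = replace_with_roots(["cat", "bat", "rat"],
                             "the cattle was rattled by the battery")
```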
Python Coding Interview Question #1: Business Name Lengths
The next question is by the City of San Francisco:
Find the number of words in each business name. Avoid counting special symbols as words. Output the business name and its count of words.
When answering the question, you should first find only distinct businesses using the drop_duplicates function. Then use the replace function to replace all the special symbols with a blank, so you don't count them later. Use the split function to split the text into a list, and then use the len function to count the number of words.
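The same steps can be sketched in plain Python, without pandas, to show the logic. This is an analogue of the approach described above, not the platform's reference solution: the function name and sample names are hypothetical, and treating every character other than letters, digits, and spaces as a "special symbol" is an assumption.

```python
import re


def business_name_word_counts(names):
    """Map each distinct business name to its number of words."""
    counts = {}
    for name in names:
        if name in counts:  # skip duplicates, like drop_duplicates
            continue
        # Remove special symbols so they are not counted as words.
        cleaned = re.sub(r"[^A-Za-z0-9 ]", "", name)
        counts[name] = len(cleaned.split())
    return counts


word_counts = business_name_word_counts(["Joe's Cafe", "A & B Bakery", "Joe's Cafe"])
```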
Questions On Product Sense And Business Applications
These questions are specific to the business and how you would use data science. Answering these questions well can demonstrate your ability to apply your data science knowledge in a business context, rather than just understanding theory. Questions will likely be particular to the role, but use the following as a guide:

“We are looking to improve a new feature for our product. What metrics would you track to make sure it's a good idea?”

“If we were looking to grow X metric on X feature, how might we achieve that?”

“Tell me about a time you set about aligning data projects with company goals.”

“When measuring the impact of a search toolbar change, which metric would you use?”
Difference Between An Error And A Residual Error
The differences between an error and a residual error are described below.

Error
The difference between the actual value and the predicted value is called an error. Popular measures of error in data science include root mean squared error (RMSE), mean absolute error (MAE), and mean squared error (MSE).
An error is generally unobservable.
An error is how actual population data and observed data differ from each other.

Residual error
The difference between the arithmetic mean of a group of values and the observed group of values is called a residual error.
A residual error can be represented using a graph.
A residual error is used to show how the sample population data and the observed data differ from each other.
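The distinction can be illustrated with a short Python sketch. It assumes the sign convention "observed minus mean" and, for the error, a population mean that is known only because the example supplies it; in practice that value is unobservable. The function names are hypothetical.

```python
from statistics import mean


def residuals(observations):
    """Deviation of each observation from the sample mean (observable)."""
    sample_mean = mean(observations)
    return [x - sample_mean for x in observations]


def errors(observations, population_mean):
    """Deviation of each observation from the true population mean.

    The population mean is normally unknown; it is passed in here
    only to illustrate the definition.
    """
    return [x - population_mean for x in observations]


sample = [2, 4, 6]
sample_residuals = residuals(sample)                 # relative to the sample mean, 4
sample_errors = errors(sample, population_mean=5)    # relative to an assumed true mean
```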
Write A Function To Return A 5
More context. Let's say we have a five-by-five matrix num_employees where each row is a company and each column represents a department. Each cell of the matrix displays the number of employees working in that particular department at each company.
To reconstruct the new array, loop through every cell in a department and divide by the total number of employees of the whole company, which is the sum of the whole row.
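The loop described above can be sketched as follows. The function name is hypothetical, and returning 0.0 for a company with no employees is an assumption made to avoid dividing by zero.

```python
def department_shares(num_employees):
    """Convert head counts into each department's share of its company's total."""
    shares = []
    for row in num_employees:
        total = sum(row)  # total employees in this company (the whole row)
        shares.append([cell / total if total else 0.0 for cell in row])
    return shares


# Small 2x2 illustration; the same code works for a five-by-five matrix.
example = department_shares([[1, 1], [2, 6]])
```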
Input:
dictionary =
input = 'c'
Output:
closest_key -> 'm'
With this question, ask: is your computed distance always positive? Negative values for distance will interfere with getting an accurate result.
Input:
string1 = 'mississippi'
string2 = 'mossyistheapple'
The idea is that we need to try every matching substring of string1 and string2. So, for example, if we have string1 = abbc and string2 = acc, we can take the first letter of string1, a, and look for a match in string2. Once we find one, we are left with the same problem on a smaller portion of the two strings. The remaining part of string1 will be bbc and the remaining part of string2 will be cc, and we repeat the process.
 In the second iteration, we don't find a match for _b_bc in cc.
 In the third iteration, we don't find a match for b_b_c in cc.
 Finally, we have a match between bb_c_ and _c_c.
 We have finished string1, and the result is ac.
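The walk-through above corresponds to a single greedy pass, sketched below in Python. This is only that one pass, under the assumption that matching starts at the first letter of string1; a full solution would, as the text says, need to try every possible starting match and keep the best result. The function name is hypothetical.

```python
def greedy_common_subsequence(string1, string2):
    """Match letters of string1, in order, against the unused part of string2."""
    matched = []
    position = 0  # start of the part of string2 that is still available
    for char in string1:
        index = string2.find(char, position)
        if index != -1:
            matched.append(char)
            position = index + 1  # continue with the remainder of string2
    return "".join(matched)


walkthrough_result = greedy_common_subsequence("abbc", "acc")
```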
To write code that returns the first non-repeated character, we can use a LinkedHashMap to store the character counts. A LinkedHashMap preserves insertion order, so characters are stored in the same order in which they appear in the string. After scanning the string, we iterate over the LinkedHashMap and pick the first entry with a value of 1.
Another way to approach this problem, often written as a firstNonRepeatingChar method, identifies the first non-repeated character in a single pass. This approach uses two collections to save an extra iteration: it stores non-repeated and repeated characters separately, and when the iteration ends, the required character is the first element in the list of non-repeated characters.
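The first approach can be sketched in Python, where a plain `dict` preserves insertion order (Python 3.7+) and so plays the role that LinkedHashMap plays in Java. The function name is hypothetical, and returning `None` when every character repeats is an assumption.

```python
def first_non_repeated_char(text):
    """Return the first character that occurs exactly once, or None."""
    counts = {}
    for char in text:
        counts[char] = counts.get(char, 0) + 1
    # dict iteration follows insertion order, i.e. first appearance in text.
    for char, count in counts.items():
        if count == 1:
            return char
    return None


result = first_non_repeated_char("swiss")
```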
2. How can you remove duplicates from arrays?
One option is to use a LinkedHashSet, which discards duplicates while retaining the original insertion order of the elements. Without collections, you must use loops or recursive functions to solve these kinds of coding interview questions.
The main difficulty with arrays is not finding the elements that have duplicates but removing them. Arrays are static data structures of fixed length, so their size cannot be altered. To delete elements from an array, you therefore need to create a new array and copy the content you want to keep into it.
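As a hedged illustration of the same idea in Python, the sketch below builds a new list rather than modifying the input, mirroring the "create a new array" point. It assumes the elements are hashable; `dict.fromkeys` keeps the first occurrence of each element in insertion order, much as a LinkedHashSet would. The function name is hypothetical.

```python
def remove_duplicates(items):
    """Return a new list with duplicates removed, keeping first occurrences in order."""
    return list(dict.fromkeys(items))


original = [3, 1, 3, 2, 1]
deduplicated = remove_duplicates(original)  # original is left unchanged
```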
Top MCQs On Big Data And Answers
Multiple Choice Questions on Big Data.
1. As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________
a) Improved data storage and information retrieval
b) Improved extract, transform and load features for data integration
c) Improved data warehousing functionality
d) Improved security, workload management, and SQL support
Answer: d
Clarification: Adding security to Hadoop is challenging because not all the interactions follow the classic client-server pattern.
2. Point out the correct statement.
a) Hadoop does need specialized hardware to process the data
b) Hadoop 2.0 allows live stream processing of real-time data
c) In the Hadoop programming framework, output files are divided into lines or records
d) None of the mentioned
Answer: b
Clarification: Hadoop batch processes data distributed over a number of computers ranging in the 100s and 1000s.
3. According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
Answer: a
Clarification: Data warehousing integrated with Hadoop would give a better understanding of data.
Answer: a
Clarification: To use Hive with HBase you'll typically want to launch two clusters, one to run HBase and the other to run Hive.
Q110 What Are The Variants Of Back Propagation

Stochastic Gradient Descent: We use only a single training example for calculation of gradient and update parameters.

Batch Gradient Descent: We calculate the gradient for the whole dataset and perform the update at each iteration.

Mini-batch Gradient Descent: It's one of the most popular optimization algorithms. It's a variant of Stochastic Gradient Descent in which, instead of a single training example, a mini-batch of samples is used.
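The three variants differ only in how many examples feed each parameter update, which the following Python sketch makes explicit through a single `batch_size` parameter. The model (fitting y = w * x by mean squared error), the learning rate, the number of epochs, and the toy data are all assumptions chosen for illustration.

```python
import random


def gradient_descent(xs, ys, batch_size, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x by minimizing mean squared error.

    batch_size == 1        -> stochastic gradient descent
    batch_size == len(xs)  -> batch gradient descent
    anything in between    -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over the current batch.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w


xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]   # toy data with true slope 2
w_sgd = gradient_descent(xs, ys, batch_size=1)
w_mini = gradient_descent(xs, ys, batch_size=2)
w_batch = gradient_descent(xs, ys, batch_size=len(xs))
```

On this noiseless toy data all three settings recover a slope close to 2; on real data they trade update noise against the cost of each step.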
Binary Tree Coding Interview Questions
So far, we have looked at only linear data structures, but not all information in the real world can be represented in a linear fashion, and that's where the tree data structure helps.
A tree is a data structure that allows you to store your data in a hierarchical fashion. Depending on how you store data, there are different types of trees, such as a binary tree, where each node has, at most, two child nodes.
Along with its close cousin, the binary search tree, it's also one of the most popular tree data structures. Therefore, you will find a lot of questions based on them, such as how to traverse them, count nodes, find depth, and check if they are balanced or not.
A key point in solving binary tree questions is a strong knowledge of theory, e.g. what the size or depth of a binary tree is, what a leaf is, and what a node is, as well as an understanding of the popular traversal algorithms, e.g. pre-order, post-order, and in-order traversal.
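The terms above can be sketched in a few lines of Python. The `Node` class and the function names are hypothetical, and depth is counted here as the number of nodes on the longest root-to-leaf path, which is one of several conventions in use.

```python
class Node:
    """A binary tree node with at most two children."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def pre_order(node):
    """Visit root, then left subtree, then right subtree."""
    if node is None:
        return []
    return [node.value] + pre_order(node.left) + pre_order(node.right)


def in_order(node):
    """Visit left subtree, then root, then right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


def post_order(node):
    """Visit left subtree, then right subtree, then root."""
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


def depth(node):
    """Number of nodes on the longest path from this node down to a leaf."""
    if node is None:
        return 0
    return 1 + max(depth(node.left), depth(node.right))


def count_nodes(node):
    """Size of the tree: total number of nodes."""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


root = Node(2, Node(1), Node(3))  # nodes 1 and 3 are leaves
```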
Here is a list of popular binary tree-based coding questions from software engineer or developer job interviews:
Q26 What Is The Difference Between Point Estimates And Confidence Interval
Point Estimation gives us a particular value as an estimate of a population parameter. Method of Moments and Maximum Likelihood estimator methods are used to derive Point Estimators for population parameters.
A confidence interval gives us a range of values which is likely to contain the population parameter. The confidence interval is generally preferred, as it tells us how likely this interval is to contain the population parameter. This likelihood or probability is called the confidence level or confidence coefficient and is represented by 1 - alpha, where alpha is the level of significance.
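As a rough illustration, the sketch below computes a point estimate (the sample mean) together with a confidence interval around it. It assumes a normal approximation with the sample standard deviation, which is only reasonable for larger samples; for small samples a t-distribution would usually be used instead. The function name and sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (point_estimate, (lower, upper)) for the population mean."""
    alpha = 1 - confidence                       # level of significance
    z = NormalDist().inv_cdf(1 - alpha / 2)      # two-sided critical value
    centre = mean(sample)                        # point estimate
    margin = z * stdev(sample) / sqrt(len(sample))
    return centre, (centre - margin, centre + margin)


estimate, (lower, upper) = mean_confidence_interval([10, 12, 11, 13, 9])
```

Raising the confidence level (lowering alpha) widens the interval, while the point estimate stays the same.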
Q27. What is the goal of A/B Testing?
It is a hypothesis test for a randomized experiment with two variants, A and B.
The goal of A/B testing is to identify changes to a web page that maximize or increase the outcome of interest. A/B testing is a fantastic method for figuring out the best online promotional and marketing strategies for your business. It can be used to test everything from website copy to sales emails to search ads.
An example of this could be identifying the click-through rate for a banner ad.
Q28. What is p-value?
When you perform a hypothesis test in statistics, a p-value can help you determine the strength of your results. A p-value is a number between 0 and 1, and its value denotes the strength of the results. The claim which is on trial is called the Null Hypothesis.
Probability of not seeing any shooting star in 15 minutes is
= 1 - P(seeing a shooting star in 15 minutes) = 1 - 0.2 = 0.8
Explain The Difference Between Structured Data And Unstructured Data
Data engineers must turn unstructured data into structured data for data analysis using different methods for transformation. First, you can explain the difference between the two.
Structured data is made up of well-defined data types with patterns that make them easily searchable, whereas unstructured data is a bundle of files in various formats, such as videos, photos, texts, audio, and more.
Unstructured data exists in unmanaged file structures, so engineers collect, manage, and store it in database management systems, turning it into structured data that is searchable. Unstructured data might be inputted through manual entry or batch processing with coding, so ELT is the tool used to transform and integrate data into a cloud-based data warehouse.
Second, you can share a situation in which you transformed data into a structured format, drawing from learning projects if you're lacking professional experience.
Difference Between Normalisation And Standardization
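Normalization: X' = (X - Xmin) / (Xmax - Xmin). Here, Xmin is the feature's minimum value and Xmax is the feature's maximum value.
Standardization: X' = (X - mean) / standard deviation, where the mean and standard deviation are those of the feature's values.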
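Both rescaling methods can be sketched in Python as follows: min-max normalization maps a feature onto the range 0 to 1, while standardization gives it zero mean and unit standard deviation. The function names are hypothetical, the population standard deviation (`pstdev`) is an assumption (some libraries use the sample version), and the sketch assumes the feature is not constant, since that would divide by zero.

```python
from statistics import mean, pstdev


def normalize(values):
    """Min-max normalization: rescale values to the range [0, 1]."""
    x_min, x_max = min(values), max(values)
    return [(x - x_min) / (x_max - x_min) for x in values]


def standardize(values):
    """Standardization: rescale values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed convention)
    return [(x - mu) / sigma for x in values]


feature = [1, 2, 3]
normalized = normalize(feature)
standardized = standardize(feature)
```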
What Tools Did You Use On The Project
What they're really asking: How did you arrive at your decision to use certain tools?
Data engineers must manage huge swaths of data, so they need to use the right tools and technologies to gather and prepare it all. If you have experience using different tools such as Hadoop, MongoDB, and Kafka, you'll want to explain which one you used for that particular project.
You can go into detail about the ETL systems you used to move data from databases into a data warehouse, such as Stitch, Alooma, Xplenty, and Talend. Some tools work better for the backend, so if you can communicate strong decision-making abilities, then you'll shine as a candidate who's confident in their skills.
The interviewer might also ask:

What are your favorite tools to use, and why?

Compare and contrast two or three tools that you used on a recent project.