What Is Principal Component Analysis
Principal Component Analysis or PCA is a multivariate statistical technique that is used for analyzing quantitative data. The objective of PCA is to reduce higher dimensional data to lower dimensions, remove noise, and extract crucial information such as features and attributes from large amounts of data.
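To make the dimensionality-reduction idea concrete, here is a minimal sketch that finds only the first principal component of a tiny 2-D dataset and projects the points onto it. It uses power iteration on the covariance matrix, which is one simple way to get the leading direction and not a full PCA; the function names and the sample points are made up for illustration. In practice you would more likely reach for a library implementation.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (means, direction) for the first principal component of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize; the vector converges to the direction of largest variance.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, direction):
    """Reduce each row to a single coordinate along the given direction."""
    return [sum((r[j] - means[j]) * direction[j] for j in range(len(direction)))
            for r in rows]

# Hypothetical points lying roughly along the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
means, direction = first_principal_component(points)
scores = project(points, means, direction)  # 2-D data reduced to 1-D
print(direction, scores)
```

The two original columns are replaced by a single score per point, and the direction vector shows how strongly each original attribute contributes to that score.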
Q2 Which Library Would You Prefer For Plotting In Python Language: Seaborn Or Matplotlib Or Bokeh
It depends on the visualization you're trying to achieve. Each of these libraries is used for a specific purpose:
- Matplotlib: Used for basic plotting like bars, pies, lines, scatter plots, etc.
- Seaborn: Built on top of Matplotlib and Pandas to ease data plotting. It is used for statistical visualizations like creating heatmaps or showing the distribution of your data.
- Bokeh: Used for interactive visualization. If your data is too complex and you haven't found any message in it, use Bokeh to create interactive visualizations that allow your viewers to explore the data themselves.
What Is The Difference Between Stochastic Gradient Descent And Gradient Descent
Gradient Descent and Stochastic Gradient Descent are both algorithms that find a set of parameters that minimizes a loss function. The difference is that in Gradient Descent, all training samples are evaluated for each parameter update, whereas in Stochastic Gradient Descent only one training sample is evaluated per update.
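The difference is easiest to see side by side. The sketch below fits a one-parameter model y = w * x with squared-error loss; the function names, learning rate, and toy data are assumptions chosen for illustration, not part of the original answer.

```python
import random

def batch_gradient_descent(xs, ys, lr=0.01, epochs=200):
    """One update per epoch, using the gradient averaged over ALL samples."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

def stochastic_gradient_descent(xs, ys, lr=0.01, epochs=200, seed=0):
    """One update per SAMPLE, visiting the samples in a shuffled order."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            grad = 2 * (w * x - y) * x  # gradient from a single sample
            w -= lr * grad
    return w

# Toy data generated from y = 3x, so both methods should approach w = 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w_batch = batch_gradient_descent(xs, ys)
w_sgd = stochastic_gradient_descent(xs, ys)
print(w_batch, w_sgd)
```

On noisy data the stochastic version takes a less smooth path toward the minimum, but each update is much cheaper because it touches a single sample.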
Don't Mention Methods You're Not Able To Explain
Example: You're explaining logistic regression and state that "we're using logistic regression for binary classification problems. For a multi-class problem we would use a softmax regression." In this scenario, you can expect the interviewer to ask: "Could you explain softmax regression?"
What Is Naive In A Naive Bayes
Naive Bayes is a supervised learning algorithm. It is called naive because, in applying Bayes' theorem, it assumes that all attributes are independent of each other given the class.
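Bayes' theorem states the following relationship, given class variable $y$ and dependent feature vector $x_1$ through $x_n$:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$

Using the naive conditional independence assumption that

$$P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)$$

for all $i$, this relationship is simplified to:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$$

Since $P(x_1, \dots, x_n)$ is constant given the input, we can use the following classification rule:

$$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

$$\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

We can also use Maximum A Posteriori (MAP) estimation to estimate $P(y)$ and $P(x_i \mid y)$; the former is then the relative frequency of class $y$ in the training set.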
The different naive Bayes classifiers differ mainly in the assumptions they make about the distribution of $P(x_i \mid y)$, which can be Bernoulli, binomial, Gaussian, and so on.
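As a rough illustration of the classification rule, the sketch below implements a tiny Naive Bayes classifier for categorical features. The prior is the relative frequency of each class and the per-feature likelihoods are estimated from counts. Two things here are assumptions that the text above does not specify: the add-one (Laplace) smoothing controlled by `alpha`, and the toy weather data. Log-probabilities are summed instead of multiplying probabilities to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Count classes and per-class feature values from the training data."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    feature_values = defaultdict(set)      # feature index -> distinct values seen
    for x, y in zip(samples, labels):
        for i, value in enumerate(x):
            feature_counts[(y, i)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values

def predict(model, x, alpha=1.0):
    """Return the class maximizing log P(y) + sum_i log P(x_i | y)."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # prior: relative class frequency
        for i, value in enumerate(x):
            # Smoothed likelihood estimate (alpha is an illustrative assumption).
            numerator = feature_counts[(label, i)][value] + alpha
            denominator = count + alpha * len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: (outlook, temperature) -> decision.
samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
           ("rainy", "cool"), ("sunny", "hot"), ("rainy", "cool")]
labels = ["play", "play", "stay", "stay", "play", "stay"]
model = train_naive_bayes(samples, labels)
print(predict(model, ("sunny", "hot")), predict(model, ("rainy", "cool")))
```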
What Exactly Is Google Looking For
At the end of each interview, your interviewer will grade your performance using a standardized feedback form that summarizes the attributes Google looks for in a candidate. That form is constantly evolving, but we have listed the main components we know of at the time of writing this article below.
A) Questions asked
In the first section of the form, the interviewer fills in the questions they asked you. These questions are then shared with your future interviewers so you don’t get asked the same questions twice.
B) Attribute scoring
Each interviewer will assess you on the four main attributes Google looks for when hiring:
In this middle section, Google’s interviewers typically repeat the questions they asked you, document your answers in detail, and give you a score for each attribute.
C) Final recommendation
Q1 Describe How You Would Build A Model To Predict Uber Etas After A Rider Requests A Ride
Many times, this can be scoped down into a specific portion of the model building process. For instance, taking the example above, we could instead reword the problem to:
- How would you evaluate the predictions of an Uber ETA model?
- What features would you use to predict the Uber ETA for ride requests?
The main point of these case questions is to determine your knowledge of the full modeling lifecycle and how you would apply it to a business scenario.
We want to approach the case study with an understanding of what the machine learning and modeling lifecycle should look like from beginning to end, and with a structured format that ensures we're delivering a solution that explains our thought process thoroughly.
Why Is The Machine Learning Trend Emerging So Fast
Machine learning solves real-world problems. Rather than relying on hard-coded rules, machine learning algorithms learn from data.
What they learn can later be used to make predictions on new data. It is paying off for early adopters.
A full 82% of enterprises adopting machine learning and Artificial Intelligence have gained a significant financial advantage from their investments.
According to Deloitte, companies have an impressive median ROI of 17%.
Tips And Tricks For Answering Machine Learning Interview Questions
The best way to prepare for the questions you’ll face in your machine learning interview is to rehearse them beforehand. You could enlist the help of a friend for this step, and they don’t even need to be as well-versed in machine learning as you are. For instance, you could have the questions with answers printed out for them on scrap sheets of paper, or they could quickly Google the answer to verify it’s correct.
If you’ve ever practiced for a speech or presentation, then you know the best way to prepare is to talk it out just like you would during the real deal. Doing so helps you collect your thoughts and ensures that you’re able to speak clearly and confidently during your interview. Remember, getting the answer right is only one piece of the pie; you also need to communicate effectively.
After you’ve practiced answering the machine learning questions above, there are two other tips you can employ to get ready for your interview. The first is to keep coding. Practice your machine learning skills by continuing to work on projects or by taking a machine learning course. There’s no better way to cement concepts into your mind than through application.
The second tip is to remember that not knowing the answer isn’t the end of the interview. You can be honest. Say, “I’m not sure of the exact answer off the top of my head, but here’s how I’d find out...”
What Is The Current State Of The Ai And Machine Learning Job Market
The AI job market is thriving and is estimated to hit $191 billion by 2024 globally. The growth of AI could create 58 million jobs across multiple industry sectors in the next few years, according to a report by the World Economic Forum. AI is already being used in several markets and it makes sense to upskill and embark on a career in AI.
Q7 Explain False Negative False Positive True Negative And True Positive With A Simple Example
Let's consider the scenario of a fire emergency:
- True Positive: The alarm goes off and there is a fire. The system predicted fire (positive) and the prediction is true.
- False Positive: The alarm goes off but there is no fire. The system predicted fire (positive), which is wrong, so the prediction is false.
- False Negative: The alarm does not ring but there is a fire. The system predicted no fire (negative), which is wrong, so the prediction is false.
- True Negative: The alarm does not ring and there is no fire. The system predicted no fire (negative) and the prediction is true.
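The four cases can be counted mechanically. This is a small sketch using made-up observations, where `fire` holds what actually happened and `alarm` holds what the system predicted; both lists and the function name are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN and TN for boolean ground truth and predictions."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for is_fire, alarm_rang in zip(actual, predicted):
        if alarm_rang and is_fire:
            counts["TP"] += 1   # alarm rang and there was a fire
        elif alarm_rang and not is_fire:
            counts["FP"] += 1   # alarm rang but there was no fire
        elif not alarm_rang and is_fire:
            counts["FN"] += 1   # alarm stayed silent during a fire
        else:
            counts["TN"] += 1   # alarm stayed silent and there was no fire
    return counts

# Hypothetical observations for six separate occasions.
fire = [True, True, False, False, True, False]
alarm = [True, False, True, False, True, False]
counts = confusion_counts(fire, alarm)
print(counts)
```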
Q15 What Is The Difference Between Gini Impurity And Entropy In A Decision Tree
- Gini Impurity and Entropy are the metrics used for deciding how to split a Decision Tree.
- Gini impurity is the probability of a random sample being classified incorrectly if you randomly pick a label according to the distribution in the branch.
- Entropy measures the lack of information (the uncertainty) in a branch. The Information Gain of a split is the reduction in entropy it produces, so choosing splits with high Information Gain reduces the uncertainty about the output label.
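Both metrics are short formulas over the class proportions in a branch: Gini impurity is one minus the sum of squared proportions, and entropy is the negative sum of each proportion times its base-2 logarithm. A minimal sketch follows; the label lists and the candidate split are invented for illustration.

```python
import math
from collections import Counter

def class_probabilities(labels):
    """Proportion of each class present in the branch."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]

def gini_impurity(labels):
    """1 - sum(p^2): chance of mislabeling a random sample from the branch."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))

def entropy(labels):
    """-sum(p * log2(p)): uncertainty of the branch, measured in bits."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# A perfectly mixed parent node and one hypothetical split of it.
parent = ["fraud"] * 4 + ["ok"] * 4
left = ["fraud"] * 3 + ["ok"]
right = ["fraud"] + ["ok"] * 3
print(gini_impurity(parent), entropy(parent), information_gain(parent, left, right))
```

A 50/50 node has the maximum Gini impurity for two classes (0.5) and an entropy of 1 bit, while a pure node scores 0 on both.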
Q3 Let's Say That You Work At A Bank That Wants To Build A Model To Detect Fraud On The Platform
In addition, the bank wants to implement a text messaging service that will text customers when the model detects a fraudulent transaction, so that the customer can approve or deny the transaction with a text response.
How would we build this model?
Need a hint? Since we're working with fraud, we know that each transaction either is fraudulent or isn't.
We should summarize our findings by building out a binary classifier on an imbalanced dataset.
A few considerations we have to make are:
- How accurate is our data? Is all of the data labeled carefully? How much fraud are we not detecting if customers don't even know they're being defrauded?
- What model works well on an imbalanced dataset? Generally tree models come to mind.
- How much do we care about interpretability? Building a highly accurate model for our dataset may not be the best method if we don't learn anything from it. If our customers are being compromised without us even knowing, then we run into the issue of building a model that we can't learn from and can't engineer features for in the future.
- What are the costs of misclassification? If we look at precision versus recall, we can understand which metrics we care about given the business problem at hand.
Q6 What Do You Understand By Precision And Recall
Let me explain this with an analogy:
- Imagine that your girlfriend has given you a birthday surprise every year for the last 10 years. One day, your girlfriend asks you: "Sweetie, do you remember all the birthday surprises from me?"
- To stay on good terms with your girlfriend, you need to recall all the 10 events from your memory. Therefore, recall is the ratio of the number of events you can correctly recall, to the total number of events.
- If you can recall all 10 events correctly, then, your recall ratio is 1.0 and if you can recall 7 events correctly, your recall ratio is 0.7
However, you might be wrong in some answers.
- For example, let's assume that you took 15 guesses, out of which 10 were correct and 5 were wrong. This means that you can recall all events, but not so precisely.
- Therefore, precision is the ratio of the number of events you can correctly recall to the total number of guesses you make (correct and wrong recalls combined).
- From the above example (10 real events, 15 guesses, 10 of them correct), you get 100% recall but your precision is only 66.67%.
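The arithmetic in the analogy can be checked in a few lines. In confusion-matrix terms, the 10 correct guesses are true positives, the 5 wrong guesses are false positives, and no real event was missed, so there are no false negatives. The function name is only illustrative.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# 15 guesses: 10 correct, 5 wrong, and none of the 10 real events missed.
precision, recall = precision_recall(
    true_positives=10, false_positives=5, false_negatives=0
)
print(f"precision={precision:.4f}, recall={recall:.1f}")
# prints: precision=0.6667, recall=1.0
```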
Visualize Interview Tasks And Practice How You Might Approach Them
It is common for interviewers to describe a sample task they have worked on and ask how you would approach it. System design interviews are almost always set up this way. ASOS, for example, might ask the candidate to design a system for predicting a consumer's likelihood of not returning to their website. Another common question is to design a web crawler that gathers training samples for an NLP model.
I What To Expect In The Machine Learning Algorithms Interview
The interviewer will try to uncover how deeply you understand machine learning algorithms. Here's a list of interview questions Workera candidates have been asked onsite:
- Derive the binary cross-entropy loss function.
- How does Logistic Regression differ from Linear Regression?
- What is the difference between Batch Gradient Descent and Stochastic Gradient Descent?
- Explain a classic machine learning algorithm, among the following list: Linear Regression, Logistic Regression, Decision Trees, Random Forest, XGBoost, Support Vector Machines, K-means, K-Nearest Neighbors, Neural Networks, Principal Component Analysis, Naive Bayes Classifier, L1/L2 regularization, etc.
- Why is the EM algorithm useful?
- Why is the Naive Bayes classifier called Naive?
- How does a discriminative model differ from a generative model?
- In K-Nearest Neighbors, how does the value of K impact bias and variance?
- In Support Vector Machines, what is the kernel trick?
What Happens Behind The Scenes
If things go well at your onsite interviews, here is what the final steps of the process look like:
- Interviewers submit feedback
- Senior leader and compensation committee review
- Final executive review
- You get an offer
After your onsite, your interviewers will all submit their feedback usually within two to three days. This feedback will then be reviewed by a hiring committee, along with your resume, internal referrals, and any past work you have submitted. At this stage, the hiring committee will make a recommendation on whether Google should hire you or not.
If the committee recommends that you get hired, you’ll usually start your team matching process. In other words, you’ll talk to hiring managers and one or several of them will need to be willing to take you in their team in order for you to get an offer from the company.
In parallel, the hiring committee recommendation will be reviewed and validated by a senior manager and a compensation committee who will decide how much money you are offered. Finally, if you are interviewing for a senior role, a senior Google executive will review a summary of your candidacy and compensation before the offer is sent to you.
As you’ve probably gathered by now, Google goes to great lengths to avoid hiring the wrong candidates. This hiring process with multiple levels of validations helps them scale their teams while maintaining a high caliber of employee. But it also means that the typical process can spread over multiple months.
Differentiate Between Regression And Classification
Regression and classification both fall under the umbrella of supervised machine learning. The main difference between them is that the output variable in regression is numerical (continuous), while in classification it is categorical (discrete).
Example: Predicting the exact temperature of a place is a regression problem, whereas predicting whether the day will be sunny, cloudy, or rainy is a classification problem.
What Is A Generative Model
The advanced machine learning interview questions deal with more specific topics in ML. Your interviewer won't always be an expert in generative models or any other specialized topic. The most valuable ability is to explain complex things as you would to a layperson, even if your interviewer isn't one. So, describe the application, name a well-known machine learning model that is generative, and explain why.
Discriminative models and generative models differ fundamentally in that:
- Discriminative models learn the boundary between classes.
- Generative models model the distribution of individual classes.
Because they learn explicit boundaries between classes, SVMs and decision trees are discriminative. The SVM is a maximum-margin classifier, which means that, given a kernel, it learns a decision boundary that maximizes the distance between samples of the two classes. The distance between a sample and the learned decision boundary can be used to build a more flexible ("soft") classifier. Decision trees learn the decision boundary by recursively partitioning the space in a way that maximizes information gain.
You may conduct a generative kind of logistic regression in this manner. Note that you are not utilizing the whole generative model to make categorization judgments.
Discriminative models also often don't work well for outlier detection, whereas generative models frequently do. Of course, which approach is best depends on the particular application.
What Is Artificial Intelligence Or Ai
AI is often used to describe machines that mimic “cognitive” functions that humans associate with the human mind, such as learning and problem-solving. AI has spread its wings across sectors and industries, including healthcare, finance, education, retail, eCommerce, and more, making it a lucrative field. And to grab the best opportunities in the field, a world-renowned certification acts as a career catalyst.
How Do You Make Sure Which Machine Learning Algorithm To Use
It depends on the dataset we have. For example, if the target variable is discrete, a classifier such as an SVM is a candidate; if it is continuous, a regression model such as linear regression is a candidate.
So there is no fixed rule that tells us which ML algorithm to use; it depends on the exploratory data analysis (EDA).
EDA is like "interviewing" the dataset. As part of our interview we do the following:
- Classify our variables as continuous, categorical, and so forth.
- Summarize our variables using descriptive statistics.
- Visualize our variables using charts.
Based on the above observations, select the best-fit algorithm for the particular dataset.
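The first two EDA steps can be sketched with nothing more than the Python standard library. Everything below is a hypothetical example: the rows, the column names, and the simple rule that treats numeric columns as continuous and everything else as categorical. The visualization step is left out because it depends on the plotting library you choose.

```python
import statistics
from collections import Counter

# Hypothetical dataset: one dictionary per customer.
rows = [
    {"age": 34, "income": 52000.0, "segment": "retail"},
    {"age": 45, "income": 61000.0, "segment": "retail"},
    {"age": 29, "income": 48000.0, "segment": "business"},
    {"age": 52, "income": 75000.0, "segment": "retail"},
]

def classify_column(values):
    """Treat all-numeric columns as continuous, everything else as categorical."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "continuous" if numeric else "categorical"

def summarize(rows):
    """Classify each column and attach simple descriptive statistics."""
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        kind = classify_column(values)
        if kind == "continuous":
            summary[name] = {
                "kind": kind,
                "mean": statistics.mean(values),
                "median": statistics.median(values),
                "stdev": statistics.stdev(values),
            }
        else:
            summary[name] = {"kind": kind, "counts": dict(Counter(values))}
    return summary

summary = summarize(rows)
for column, stats in summary.items():
    print(column, stats)
```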
Can You Explain The Difference Between Bias And Variance
The interviewer is likely testing your background knowledge on processes with this question.
Example: "Both are errors in the algorithms used, but bias differs from variance because it's an error due to oversimplified assumptions in the algorithm. A variance error is due to too much complexity in the algorithm."
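If the interviewer asks for more depth, one standard way to make the distinction precise is the bias-variance decomposition of the expected squared error. The setup below is an assumption added for illustration: the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}$ is the model fitted on a random training set.

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible noise}}$$

Oversimplified models tend to have a large first term, overly complex models a large second term, and no model can remove the third.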
Machine Learning Interview Questions: Programming
These machine learning interview questions test your knowledge of programming principles you need to implement machine learning principles in practice. Machine learning interview questions tend to be technical questions that test your logic and programming skills: this section focuses more on the latter.
Q26: How do you handle missing or corrupted data in a dataset?
Answer: You could find missing/corrupted data in a dataset and either drop those rows or columns, or decide to replace them with another value.
In Pandas, there are two very useful methods, isnull() and dropna(), that will help you find columns with missing or corrupted data and drop those values. If you want to fill the invalid values with a placeholder value (for example, 0), you could use the fillna() method.
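A short sketch of the three methods on a small, made-up DataFrame (the column names and fill values are illustrative assumptions, and the pandas library must be installed):

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [25, float("nan"), 41, 33],
    "city": ["Paris", "Oslo", None, "Rome"],
})

# isnull() marks missing cells; summing gives a per-column count.
missing_per_column = df.isnull().sum()

# dropna() removes every row that contains at least one missing value.
dropped = df.dropna()

# fillna() replaces missing values, here with a per-column placeholder.
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})

print(missing_per_column)
print(dropped)
print(filled)
```

Whether to drop or fill depends on how much data you can afford to lose and on whether a placeholder would distort the column's distribution.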
More reading: Handling missing data
Q27: Do you have experience with Spark or big data tools for machine learning?
Answer: You'll want to get familiar with the meaning of big data for different companies and the different tools they'll want. Spark is the big data tool most in demand now, able to handle immense datasets with speed. Be honest if you don't have experience with the tools demanded, but also take a look at job descriptions and see what tools pop up: you'll want to invest in familiarizing yourself with them.
More reading: 50 Top Open Source Tools for Big Data
Q28: Pick an algorithm. Write the pseudo-code for a parallel implementation.
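One possible way to approach this question, offered as a sketch and not as the expected answer: pick batch gradient descent and parallelize it in a data-parallel, map-reduce style. The data is split into chunks, each worker computes a partial gradient on its chunk, and the partial results are summed before the parameter update. The model (y = w * x), the function names, the toy data, and the use of threads are all assumptions made for illustration; for CPU-bound work in Python a process pool or a distributed framework would normally be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_gradient(chunk, w):
    """Map step: un-normalized squared-error gradient for one chunk."""
    return sum(2 * (w * x - y) * x for x, y in chunk)

def parallel_gradient(data, w, workers=2):
    """Split the data, compute partial gradients concurrently, then reduce."""
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_gradient, chunks, [w] * len(chunks))
        return sum(partials) / len(data)  # reduce step: combine and average

def fit(data, lr=0.01, steps=200, workers=2):
    """Plain gradient descent whose gradient is computed in parallel."""
    w = 0.0
    for _ in range(steps):
        w -= lr * parallel_gradient(data, w, workers)
    return w

# Toy data generated from y = 3x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
w = fit(data)
print(w)
```

In an interview, the points worth stating out loud are which part of the algorithm is independent across data (the per-sample gradients) and where synchronization is required (the reduction before each update).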
More reading: Writing pseudocode for parallel programming