Machine Learning Engineer Interview Question #: Regression Definition
While the range of theoretical questions asked of machine learning engineers at technical interviews is very broad, a couple of question types come up more frequently. One of them is to explain a certain machine learning concept in very simple terms, as if we were explaining it to an executive with no technical background or to a child. Belvedere Trading is one of the companies that ask questions like this, challenging candidates to find the words to explain regression to an 8-year-old.
Even for beginner machine learning engineers, it should be clear that regression is an analysis that we use to predict an unknown continuous value based on evidence that we’ve gathered in the past. But how do we explain a concept like that to a child?
We can give them a simple example that is easy to follow, like this:
- Let’s say that you want to go to school. You notice that it takes you only 1 minute to reach the school because it’s just across the street.
- Next, you ask your friend John how long it takes him to reach the school, and he says 20 minutes. When you ask why, he answers that his house is 5 km away from the school.
- Next, you ask your other friend, Andy, the same question. He answers that it takes him 40 minutes to reach the school. Now, if you guess that Andy’s house is more than 5 km away from school, then you’re doing regression analysis.
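The child's guess can also be written down as a tiny least-squares fit. The sketch below is only an illustration: the 0.05 km used for "just across the street" is an assumed value, and fit_line is a hypothetical helper written for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Evidence gathered so far: travel time in minutes -> distance in km.
# 0.05 km for "across the street" is an assumption; 5 km in 20 minutes is John's answer.
minutes = [1, 20]
distance_km = [0.05, 5.0]

slope, intercept = fit_line(minutes, distance_km)

# Andy needs 40 minutes, so we predict his distance from the fitted line.
andy_distance = slope * 40 + intercept
print(f"Estimated distance for Andy: {andy_distance:.1f} km")
```

With only two observations the line passes exactly through both points, so this is closer to extrapolation than to a realistic model, but the idea is the same: past evidence is used to predict a continuous value.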
Machine Learning Project Checklist
Alexey: Speaking of mock interviews, a while ago I had a mock interview with Valerii, where Valerii interviewed me. The question was about designing a fraud detection system.
Valerii: Who could’ve imagined that.
Alexey: Yeah. At this interview, you showed a machine learning project checklist. Can you talk a bit about that document? What’s in there and why is it helpful for designing ML systems?
Valerii: Back in the days of Facebook, a number of practitioners decided that there were many, many machine learning services, and that we probably needed to write some comprehensive list of checks to pass each service through. It’s actually a very good preparation guide for system design, because it covers exactly these points. Well, it’s very comprehensive, like a 16-page document. However, you could also go and find the book from O’Reilly, written by people from Google, about ML design patterns, or something like that.
Alexey: Using machine learning to design patterns?
Valerii: Yeah, something like that. So you see, to some extent, you might have these checklists, or you might just extend them into a whole book, but it means the same. Again, model coupling/decoupling, A/B tests, features, losses, model times, online/offline, batch processing, whatever. If you know the basic points, then you go from A to B, from B to C, from C to D. It’s the same for system design.
Machine Learning Engineer Interview Question #: Recommendation System
This machine learning engineer interview question has recently been asked during technical interviews at Meta and is a perfect example of what machine learning engineers may expect to solve. The candidate is given a list of Facebook friends and the list of Facebook pages that users follow. The task is to create a new recommendation system for Facebook. For each Facebook user, we should find pages that this user doesn’t follow, but at least one of their friends does. The output should include the user ID and the page ID that should be recommended to this user.
As mentioned in the text of the question, to solve this problem, we should use two relatively simple datasets:
- users_friends has two columns and is a list of Facebook friends,
- users_pages also has two columns and is a list of Facebook pages that users follow
This is what the top rows of the users_friends dataset may look like. The first row, for instance, means that a user with ID 1 is a friend of a user with ID 2.
And this is what the top rows of the users_pages dataset may look like. We can find out, for example, that the user with ID 1 follows two pages with IDs 21 and 25.
friends_pages = users_friends.merge
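The merge on the line above is cut off, so the following is only a sketch of how the rest of a solution could look. The column names user_id, friend_id and page_id, as well as the sample rows, are assumptions made for illustration; the real datasets may use different names.

```python
import pandas as pd

# Hypothetical sample data; column names are assumed, not taken from the question.
users_friends = pd.DataFrame(
    {"user_id": [1, 2, 1, 3], "friend_id": [2, 1, 3, 1]}
)
users_pages = pd.DataFrame(
    {"user_id": [1, 1, 2, 2, 3], "page_id": [21, 25, 25, 23, 24]}
)

# Attach to every user the pages followed by each of their friends.
friends_pages = users_friends.merge(
    users_pages.rename(columns={"user_id": "friend_id"}), on="friend_id"
)
candidates = friends_pages[["user_id", "page_id"]].drop_duplicates()

# Anti-join: keep only (user, page) pairs that the user does not already follow.
merged = candidates.merge(
    users_pages, on=["user_id", "page_id"], how="left", indicator=True
)
recommendations = (
    merged.loc[merged["_merge"] == "left_only", ["user_id", "page_id"]]
    .sort_values(["user_id", "page_id"])
    .reset_index(drop=True)
)
print(recommendations)
```

The left merge with indicator=True is one common way to express "in the candidates but not in the followed pages"; filtering with a set of already-followed pairs would work as well.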
Ace Your Computer Vision Engineer Interview With A Job
Constant learning is the key to becoming a successful computer vision engineer as the field of artificial intelligence advances every day. Practicing practical machine learning and computer vision projects is the only way to ensure that you don’t fall behind the ML industry. ProjectPro helps you practice and grow your computer vision skills through solved end-to-end data science and machine learning projects that will give you an edge in your career as a computer vision engineer.
Machine Learning Engineer Vs Data Scientist
I mentioned that people use these terms interchangeably. It’s a mistake to do so because there is a difference between the two roles. The main work of data scientists is building a good model, whereas machine learning engineers tend to focus on deploying the model and shipping it to the production environment.
What’s The Difference Between A Type I And Type II Error
This is the type of basic question that could trip someone up in an interview, just because the wording of your answer could be a bit confusing. A Type I error is a false positive: you think something has happened when it really hasn’t. A Type II error is a false negative: something is happening and it’s missed.
Machine Learning Engineer Interview Question #: Words With Two Vowels
Aside from the questions concerning naive forecasting or building rule-based recommendation systems, the candidates for the machine learning engineering positions are sometimes asked to manipulate text. This type of question is especially common if the position deals with natural language processing. One example is this machine learning engineer interview question from Google, where we are being asked to find all words which contain exactly two vowels in any list in the table.
The dataset is also rather simple and contains two columns. Each cell in the dataset, no matter which column it is in, contains a list of several words separated by commas.
There can be any number of words in a list, and lists from two different cells may have different lengths. This is what this dataset may look like:
Even though both the task and the dataset appear simple at first glance, this question is actually quite difficult to solve in Python and requires a number of steps to obtain the required output:
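One possible sequence of steps is sketched below in plain Python: split every cell into words, count the vowels in each word, and keep the words with exactly two. The column names words1 and words2 and the sample rows are assumptions for illustration, and only a, e, i, o, u are treated as vowels here.

```python
VOWELS = set("aeiou")


def count_vowels(word):
    """Count the vowels in a word, ignoring case."""
    return sum(ch in VOWELS for ch in word.lower())


def words_with_two_vowels(rows):
    """Return the sorted unique words that contain exactly two vowels."""
    found = set()
    for row in rows:
        for cell in row.values():          # look at every column of the row
            for word in cell.split(","):   # each cell is a comma-separated list
                word = word.strip()
                if word and count_vowels(word) == 2:
                    found.add(word)
    return sorted(found)


# Hypothetical rows; lists in different cells may have different lengths.
rows = [
    {"words1": "google,facebook,microsoft", "words2": "flower,nature,sun"},
    {"words1": "sun,sand", "words2": "tree,garden,sea"},
]

result = words_with_two_vowels(rows)
print(result)
```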
General Interview Questions On Computer Vision
18) What purpose does grayscaling serve?
Grayscaling reduces the dimension of the image and thus reduces computation time and effort. Further, it reduces the complexity of the models and functions required for various operations. Some functions, such as edge and contour detection, and some machine learning problems, such as Optical Character Recognition, perform better on grayscale images or are implemented to work only with them.
19) What color to grayscale conversion algorithm does OpenCV employ? What is the logic behind this?
The color-to-grayscale conversion in OpenCV uses the formula Y=0.299*R+0.587*G+0.114*B. This makes it a form of the luminosity method, which takes a weighted average of the color intensity values, with weights chosen in accordance with human perception of different colors. In other words, it accounts for the fact that humans perceive green more strongly than red, and red more strongly than blue, which is apparent from the weight given to each color’s pixel intensity. Additionally, the OpenCV grayscaling algorithm takes into consideration the nonlinear operation used to encode images.
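To see the effect of the weights, the formula can be applied by hand to a few pixels. This is a plain-Python sketch of the formula quoted above, not a call into OpenCV; to_gray and the sample pixels are hypothetical names and values chosen for illustration.

```python
def to_gray(r, g, b):
    """Weighted grayscale value using the formula Y = 0.299*R + 0.587*G + 0.114*B."""
    return 0.299 * r + 0.587 * g + 0.114 * b


# Pure red, green and blue pixels with the same raw intensity.
red = to_gray(255, 0, 0)
green = to_gray(0, 255, 0)
blue = to_gray(0, 0, 255)

# A tiny 2x2 RGB "image" as nested lists of (R, G, B) tuples.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray_image = [[to_gray(*pixel) for pixel in row] for row in image]

print(red, green, blue)
print(gray_image)
```

The three channels collapse into one value per pixel, which is where the reduction in dimension comes from, and equally bright green, red and blue pixels end up with different gray levels.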
20) What is translational equivariance? What brings about this property in Convolutional Neural Networks?
21) What is the basis of the popular EAST text detector?
22) What is the basis of the state-of-the-art object detection algorithm YOLO?
Who Goes Through An ML System Design Interview
Alexey: Okay. Let’s talk about machine learning system design. This is a part of the interview process and you said you did a lot of interviews as the interviewer. I imagine also, when you were joining Facebook before that, you also had to take this interview. So can you tell us about that? What is machine learning system design, and why is it an important step in the interview process?
Valerii: Okay. Before doing that, let’s try to review who needs to go through a machine learning interview. First of all, if you’re applying to Facebook, Amazon, or Google, I think other big tech companies as well, because these three are the largest ones in terms of number of people working there and market cap. So if you’re applying for a data scientist position, what would you do? You’d write SQL code, work with metrics, and dashboards.
Valerii: If you expect that data scientists have some relations to machine learning in these companies, you are mistaken. People who do machine learning are called machine learning engineers. Right? And these people have to pass through the software engineer loop at Facebook, and some additional rounds of interviews. For machine learning, and again, for a software engineer, there are different stages, but there are, I would say, a couple of interviews that are very important in terms of assessing your level.
Alexey: Level five is like a Senior, right?
Computer Vision Engineer Interview Questions On Deep Learning: Convolutional Neural Network
1) Explain with an example why the inputs in computer vision problems can get huge. Provide a solution to overcome this challenge.
Consider a 500×500 pixel RGB image fed to a fully connected neural network for which the first hidden layer has just 1000 hidden units. For this image, the number of input features will be 500*500*3=750,000, i.e. the input vector will be 750,000 dimensional. The weight matrix at the first hidden layer will therefore be a 1000×750,000 dimensional matrix which is huge in size for both computations as well as storage. We can use convolution operation, which is the basis of convolutional neural networks, in order to address this challenge.
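The arithmetic can be checked in a few lines. The fully connected numbers come from the example above; the convolutional layer with 32 filters of size 3×3 is an assumed configuration, chosen only to show the difference in scale.

```python
# Fully connected first layer, as in the example above.
height, width, channels = 500, 500, 3
hidden_units = 1000

input_features = height * width * channels      # length of the flattened input vector
dense_weights = hidden_units * input_features   # entries in the weight matrix

# Convolutional first layer (assumed: 32 filters of size 3x3 over 3 channels).
num_filters, kernel_size = 32, 3
conv_weights = num_filters * kernel_size * kernel_size * channels

print(f"input features:        {input_features:,}")
print(f"dense layer weights:   {dense_weights:,}")
print(f"conv layer weights:    {conv_weights:,}")
```

Because the same small filters are reused at every position of the image, the number of weights in the convolutional layer does not grow with the image size.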
2) What are the features likely to be detected by the initial layers of a neural network used for Computer Vision? How is this different from what is detected by the later layers of the neural network?
The earlier layers of the neural network detect simple features of an image, such as edges or corners. As we go deeper into the neural network, the features become increasingly complex, detecting shapes and patterns. The later layers of the neural network are capable of detecting complex patterns such as complete objects.
3) Consider a filter used for convolution. What edges will this filter extract from the input image?
For the given problem, therefore, we will need to choose p=(f-1)/2=2, where f is the filter size.
Machine Learning Engineer Interview Question #: Naive Forecasting
This question was asked during technical machine learning interviews at Uber. This is a great example of how companies expect candidates to perform prediction without using a complicated machine learning model but rather by encoding a number of simple rules. In this way, the interviewer not only checks the candidate’s approach to prediction tasks but also verifies the ability to code, e.g., in Python.
The task is to develop a naïve forecast for a new metric called “distance per dollar”, defined as the ratio of “distance to travel” to “monetary cost” in our dataset, and measure its accuracy. To develop this forecast, we are asked to sum “distance to travel” and “monetary cost” values at a monthly level before calculating “distance per dollar”. This value becomes the actual value for the current month. The next step is to populate the forecasted value for each month. This can be achieved simply by getting the previous month’s value in a separate column. Once we have actual and forecasted values, we should evaluate our model by calculating an error metric called root mean squared error (RMSE). RMSE is defined as sqrt(mean((actual - forecast)^2)). We are asked to report the RMSE rounded to the 2nd decimal place.
To solve this task, we should use a table uber_request_logs that has 6 columns with various datatypes:
This is what the top rows of this dataset uber_request_logs may look like:
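The table itself is not reproduced here, so the sketch below works on a handful of made-up rows that carry only a request date, a distance and a cost. The values and field order are assumptions for illustration; the steps (monthly sums, ratio, previous-month forecast, RMSE) follow the task description.

```python
import math
from collections import defaultdict

# Hypothetical log rows: (request_date, distance_to_travel, monetary_cost).
logs = [
    ("2020-01-05", 10.0, 5.0),
    ("2020-01-20", 20.0, 10.0),
    ("2020-02-03", 30.0, 10.0),
    ("2020-03-11", 24.0, 6.0),
    ("2020-04-02", 10.0, 5.0),
]

# Step 1: sum distance and cost per month (YYYY-MM).
totals = defaultdict(lambda: [0.0, 0.0])
for date, distance, cost in logs:
    month = date[:7]
    totals[month][0] += distance
    totals[month][1] += cost

# Step 2: actual "distance per dollar" for each month, in chronological order.
months = sorted(totals)
actual = [totals[m][0] / totals[m][1] for m in months]

# Step 3: naive forecast = previous month's actual value.
# The first month has no previous value, so it is left out of the evaluation.
forecast = actual[:-1]
evaluated = actual[1:]

# Step 4: root mean squared error, rounded to two decimals.
squared_errors = [(a - f) ** 2 for a, f in zip(evaluated, forecast)]
rmse = round(math.sqrt(sum(squared_errors) / len(squared_errors)), 2)
print(rmse)
```

Dropping the first month is one reasonable way to handle the missing forecast; the original question may expect a different convention.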
Machine Learning Engineer Interview Question #: Confusion Matrix
But the interview questions asked for machine learning engineer positions are not always strictly about the models. Quite often, you may encounter theoretical questions concerning the evaluation of predictions or statistics, such as this one from General Assembly. We are being asked why a confusion matrix is useful for evaluating the performance of a classifier.
The reason why the confusion matrix is so frequently used when evaluating the performance of classifying models is that it gives us a summary of how good our classification model is in predicting the actual value of our target variable in the form of a table. It shows the true positive, false positive, true negative, and false negative values or ratios in a condensed and easy-to-read form.
But what you should remember about the machine learning engineer interview questions such as this one is that you can score extra points if, aside from just explaining the theory, you also include a drawing supporting what you are explaining. In the case of the confusion matrix, a drawing is very easy to make and quickly shows that you understand what you are explaining.
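If a whiteboard is not available, the same table can be produced with a few lines of code. This is a minimal sketch for a binary classifier; the label lists are made up, and confusion_counts is a hypothetical helper rather than a library function.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1   # true positive
        elif pred == 1 and truth == 0:
            fp += 1   # false positive (Type I error)
        elif pred == 0 and truth == 0:
            tn += 1   # true negative
        else:
            fn += 1   # false negative (Type II error)
    return tp, fp, tn, fn


# Made-up ground truth and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)

# Rows are actual classes, columns are predicted classes.
print("                 predicted 1   predicted 0")
print(f"actual 1         {tp:^11}   {fn:^11}")
print(f"actual 0         {fp:^11}   {tn:^11}")
```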
Machine Learning Engineer Interview Question #: Supervised And Unsupervised Machine Learning
While some types of questions appear at technical interviews significantly more often than others, there are also a number of specific individual questions that interviewers ask more frequently. One such question that a candidate may expect is to describe the difference between supervised and unsupervised machine learning, as in the question that was asked by Rosetta.
The answer is that supervised learning uses labeled input and output datasets, which give the model the instructions when training the algorithms. Meanwhile, unsupervised learning doesn’t use the labeled datasets, which means the models work independently to discover information hidden in the datasets.
However, in questions such as this one, even if not explicitly asked for, it’s good practice to list a few examples of algorithms from each set. For supervised learning, the examples can be Logistic Regression, Linear Regression or Support-Vector Machine. For unsupervised learning, the examples include K-Means Clustering, Hierarchical Clustering, Apriori Algorithm or Principal Component Analysis.
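The difference is easy to show on the same six numbers. In the sketch below, the supervised part is a simple nearest-centroid rule (swapped in here for brevity instead of the algorithms listed above) that uses the labels, while the unsupervised part is a one-dimensional K-Means with two clusters that never sees them. The data, labels and function names are all made up for illustration.

```python
def fit_centroids(xs, labels):
    """Supervised: the labels tell us which points belong to which class."""
    centroids = {}
    for label in set(labels):
        members = [x for x, l in zip(xs, labels) if l == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def two_means(xs, iterations=10):
    """Unsupervised: 1-D K-Means with k=2, using no labels at all."""
    c0, c1 = min(xs), max(xs)  # start from the two extreme points
    for _ in range(iterations):
        group0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0 = sum(group0) / len(group0)
        c1 = sum(group1) / len(group1)
    return c0, c1


xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["small", "small", "small", "large", "large", "large"]

centroids = fit_centroids(xs, labels)   # learns from inputs and labels
cluster_centers = two_means(xs)         # discovers structure from inputs only

print(predict(centroids, 4.5))
print(cluster_centers)
```

Both approaches end up with roughly the same two group centers here, but only the supervised model knows what the groups are called.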
Machine Learning Engineer Interview Questions
Are you a machine learning engineer looking for a new job? Find out what kind of questions major tech companies ask the candidates for this very position!
Machine learning engineers can be thought of as specialized data scientists whose main task is to prepare models for deployment. In many cases, their goal is to predict values or labels given a large dataset. This may have an application in finances, medicine, meteorology, and countless other areas. Other tasks that machine learning engineers may expect to work on include creating intelligent recommendation systems or performing clustering on sets of unlabeled data.
Because of this heavy focus on machine learning models in their everyday work, the questions that machine learning engineers are asked at technical interviews differ from those for other data-related positions and concern modeling a lot. Nevertheless, the candidates for this position should expect both simple coding tasks, usually to be solved using Python, R, or SQL, and purely theoretical questions. The latter typically concern the basics of the machine learning discipline and the definitions of well-known models or evaluation methods, but may also be about probability or statistics. Meanwhile, the coding challenges are not particularly complicated, as they are meant to be solved within a couple of minutes, and often concern performing an estimation based on a few simple rules.
What To Do After You Set A Goal
Alexey: Okay. So we do this, and then you also mentioned A/B tests. We define a metric, and then we say how exactly we are going to measure this metric. What do we do next?
Valerii: Let’s say we know what we would like to do. We know how we can try to optimize it in this way. What does that mean? That means that if my model improves, there is a high chance that my metric of interest will be better. Now, I need to think about the labels, but that’s obvious, right? There’s a proxy metric, you can say it’s a label. I will construct my labels. We know that you can say that labels are y’s, now we need to think about x’s.
Valerii: What are the features? Okay, what features do we have? We have this, this, and that feature? They might make sense, right? We have x and y, now we need a model. What kind of model? We have a target, we have labels. What about the loss function? Can we just put in the loss function directly or not? Now let’s come back to the features. We have basic features. Do we think they interact with each other? Do we need to do some pre-processing? Okay, think about that. Now let’s say we can put the model, we have x, we have y, we can train it, right? So what happens here? Let’s do that.
Alexey: Perhaps if you cover all these parts during your system design interview, you’re already in quite a good position. Right?
Alexey: By crazy do you mean it outputs random stuff?