Why Would You Prune Your Tree
In data science and machine learning, pruning is the process of removing redundant branches from a decision tree. Decision trees are prone to overfitting; pruning reduces the size of the tree and lowers the chance of overfitting. It works by collapsing a branch (a subtree) into a single leaf node and discarding the leaves that hung below it. In this way pruning serves as a tool for trading model complexity against fit to the training data.
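As a rough illustration of collapsing a branch into a leaf, here is a minimal sketch of reduced-error pruning, which is only one of several pruning strategies and is chosen here as an assumption. The nested-dict tree layout, the `majority` field, and the validation rows are all hypothetical.

```python
def predict(node, x):
    """Follow the splits until a leaf (a node with a 'label') is reached."""
    while "label" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]


def errors(node, rows):
    """Number of held-out rows that the (sub)tree misclassifies."""
    return sum(1 for x, y in rows if predict(node, x) != y)


def prune(node, rows):
    """Collapse a branch into a leaf when that does not increase the
    error on the held-out rows that reach the branch."""
    if "label" in node:
        return node
    left_rows = [(x, y) for x, y in rows if x[node["feature"]] <= node["threshold"]]
    right_rows = [(x, y) for x, y in rows if x[node["feature"]] > node["threshold"]]
    node = dict(node,
                left=prune(node["left"], left_rows),
                right=prune(node["right"], right_rows))
    leaf = {"label": node["majority"]}
    return leaf if errors(leaf, rows) <= errors(node, rows) else node


# Hypothetical tree: the right branch contains an overfitted split.
tree = {
    "feature": 0, "threshold": 5, "majority": 0,
    "left": {"label": 0},
    "right": {"feature": 1, "threshold": 2, "majority": 1,
              "left": {"label": 1}, "right": {"label": 0}},
}
# Made-up held-out rows: (features, label).
validation = [([1, 0], 0), ([2, 3], 0), ([7, 1], 1), ([8, 3], 1), ([9, 4], 1)]

pruned = prune(tree, validation)
print(pruned)
```

In this toy case the inner split on feature 1 is replaced by a single leaf, while the root split is kept because removing it would increase the validation error.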
Explain False Negative False Positive True Negative And True Positive With A Simple Example
True Positive : When the Machine Learning model predicts the positive class and the actual class is positive, it is said to have a True Positive value.
True Negative : When the Machine Learning model predicts the negative class and the actual class is negative, it is said to have a True Negative value.
False Positive : When the Machine Learning model predicts the positive class but the actual class is negative, it is said to have a False Positive value.
False Negative : When the Machine Learning model predicts the negative class but the actual class is positive, it is said to have a False Negative value.
What Is A Confusion Matrix
A confusion matrix is used to explain a model's performance and summarizes the predictions made on a classification problem. It helps identify where the model confuses one class with another.
A confusion matrix gives the counts of correct and incorrect predictions, broken down by error type. The accuracy of the model is:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
For example, consider a confusion matrix that holds the true positive, true negative, false positive, and false negative counts for a classification model. The accuracy is the number of correct predictions (TP + TN) divided by the total number of predictions.
So, in the example:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = 0.78
This means that the model's accuracy is 0.78: 78% of its predictions were correct.
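The calculation can be sketched in a few lines of Python. The labels below are made-up illustration data, not the values from the example above, and `confusion_counts` and `accuracy` are hypothetical helper names.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted != positive and actual != positive:
            tn += 1
        elif predicted == positive and actual != positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)            # 3 3 1 1
print(accuracy(tp, tn, fp, fn))  # 0.75
```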
Deep Learning With Pytorch Cheat Sheet
The official PyTorch cheat sheet lists the commands and APIs for handling data and building deep learning models. It is a quick reference for experienced PyTorch users.
The cheat sheet consists of:
What Is The Difference Between A Parametric And A Non-Parametric Model
A parametric model is a statistical model that makes assumptions about the data distribution. These assumptions allow the model to make predictions from a fixed number of parameters, which makes the model easier to interpret and faster to fit. Non-parametric models, on the other hand, make fewer assumptions about the data and do not require a fixed number of parameters. As a result, non-parametric models can be more flexible but may require more data to make accurate predictions.
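To make the contrast concrete, here is a small sketch under simple assumptions: a least-squares line stands in for a parametric model (it is fully described by two numbers), and a k-nearest-neighbours average stands in for a non-parametric one (it has to keep the training data around). The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Parametric: a least-squares line, summarised by slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return slope, intercept


def knn_predict(xs, ys, query, k=2):
    """Non-parametric: the 'model' is the training data itself."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / k


# Made-up training data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

slope, intercept = fit_line(xs, ys)      # two parameters, data can be discarded
line_prediction = slope * 2.4 + intercept
knn_prediction = knn_predict(xs, ys, 2.4)  # needs xs and ys at prediction time
print(slope, intercept, line_prediction, knn_prediction)
```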
What Is The Difference Between A Static And A Dynamic Programming Language
A static (statically typed) programming language is one in which the type of every variable is known at compile time, before the program is executed, usually because it is declared in the source code. A dynamic (dynamically typed) programming language, on the other hand, is one in which the type of a variable is determined at runtime, based on the value it is assigned. Python is an example of a dynamic programming language, while Java is an example of a static programming language.
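A short Python sketch shows the dynamic side of this: the same name can be bound to values of different types, and a type mismatch only surfaces when the offending line actually runs. The variable and function names are illustrative.

```python
value = 42                       # the name is bound to an int
kind_before = type(value).__name__

value = "forty-two"              # same name, now bound to a str; no declaration needed
kind_after = type(value).__name__


def add_one(x):
    # No declared parameter type: any object is accepted at call time.
    return x + 1


error_seen = False
try:
    add_one("a")                 # fails only when this line executes
except TypeError:
    error_seen = True

print(kind_before, kind_after, error_seen)
```

In a statically typed language such as Java, assigning a string to a variable declared as `int` is rejected by the compiler before the program runs.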
What Is Support Vector Machine In Machine Learning
SVM (Support Vector Machine) is a Machine Learning algorithm that is used mainly for classification. It works well when the feature vectors are high-dimensional.
The following is the code for SVM classifier:
# Import the required libraries
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the Iris dataset
iris = datasets.load_iris()

# A -> features and B -> labels
A = iris.data
B = iris.target

# Split A and B into train and test data
# (random_state is an arbitrary seed, used only for reproducibility)
A_train, A_test, B_train, B_test = train_test_split(A, B, random_state=0)

# Train a linear SVM classifier
svm_model_linear = SVC(kernel="linear").fit(A_train, B_train)
svm_predictions = svm_model_linear.predict(A_test)

# Model accuracy on A_test
accuracy = svm_model_linear.score(A_test, B_test)

# Create a confusion matrix
cm = confusion_matrix(B_test, svm_predictions)
Explain How The Adaboost Model Works
AdaBoost starts by building a forest of stumps, or weak learners. A tree with just one node (one predictor) and two leaves is called a stump. Each stump built by AdaBoost receives a different weight, and the weight is determined by the errors the stump makes: the fewer its errors, the larger its weight. Each stump in the forest gets a say in the final prediction, but the amount of say is determined by the weight the stump received.
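The weighting can be sketched as follows. This assumes the classic binary AdaBoost formulation with labels in {-1, +1}, where a stump's say is 0.5 * ln((1 - error) / error); the helper names and the four-sample data set are hypothetical.

```python
import math


def stump_say(weighted_error, eps=1e-10):
    """Amount of say of a stump, from its weighted error rate."""
    weighted_error = min(max(weighted_error, eps), 1 - eps)  # avoid log(0)
    return 0.5 * math.log((1 - weighted_error) / weighted_error)


def update_weights(weights, y_true, y_pred, say):
    """Raise the weight of misclassified samples, lower the rest, renormalise."""
    raw = [w * math.exp(-say * yt * yp)
           for w, yt, yp in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_vote(stump_predictions, says):
    """Final prediction: sign of the say-weighted sum of stump votes."""
    score = sum(s * p for s, p in zip(says, stump_predictions))
    return 1 if score >= 0 else -1


# Made-up example: one stump misclassifies the second of four samples.
weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, -1, -1, -1]

error = sum(w for w, yt, yp in zip(weights, y_true, y_pred) if yt != yp)
say = stump_say(error)
new_weights = update_weights(weights, y_true, y_pred, say)
print(error, say, new_weights)
```

After the update, the misclassified sample carries half of the total weight, so the next stump is pushed to get it right.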
Who Has Designed These Interview Preparation Tools And Content
Lavanya holds a PhD in Machine Learning and a master's in Computer Graphics. She has worked for over 10 years with companies like Amazon, InMobi and Myntra. In addition, she has also done collaborative projects with ML teams at various companies like Xerox Research, NetApp and IBM. During her career she has interviewed over 100 candidates. She has also done several mock interviews for job aspirants and is trying to build tools to address common challenges candidates face during job interviews.
When I was trying to get into an ML role, even getting interviews was challenging, since my background was in Industrial Engineering. One session with MachineLearningInterview and I realized what mistakes I was making in my resume that were leading to bad interviews. It took me just 3 more interviews to crack a new job. Now I've transformed into a Data Scientist. I'm definitely going to take their support when I change jobs next time.
In the meantime, here are some interview questions and answers to get started with your ML Interview Preparation! You have access to more free content by subscribing to our mailing list.
How do you design a system that reads a natural language question and retrieves the closest FAQ answer?
There are multiple approaches for FAQ based question answering
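One of the simplest approaches can be sketched as follows: represent the incoming question and every FAQ question as a bag of words and return the answer whose question has the highest cosine similarity. This is only a baseline chosen for illustration (it ignores synonyms and word order), and the FAQ entries and helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_faq(question, faq):
    """Return the answer of the FAQ question most similar to `question`."""
    query = Counter(tokenize(question))
    best = max(faq, key=lambda q: cosine_similarity(query, Counter(tokenize(q))))
    return faq[best]


# Made-up FAQ for illustration.
faq = {
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "How can I cancel my subscription?": "Go to Settings and choose 'Cancel plan'.",
}

answer = closest_faq("I forgot my password, how to reset it?", faq)
print(answer)
```

A production system would more likely use TF-IDF weights or sentence embeddings in place of raw counts, but the retrieve-by-similarity structure stays the same.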
Nailing Machine Learning Concepts
This work-in-progress repo is designed to help you get familiar with machine learning, both the concepts and its operationalization.
You can use this repo to prepare for machine learning and data science interviews. If you have interviews coming up then you may spend some time going through the topics and answers I provided. Make sure you understand all the topics and can explain them well.
Let’s get started!!!
- How Does a Model Learn?
- How to Reduce Loss?
- Gradient Descent vs. Stochastic Gradient Descent?
- Overfitting vs. Underfitting, Bias-Variance Tradeoff?
- Bayes Error Rate?
- How to Address Bias and Variance?
- Plotting Error/Learning Curves
- Data Splitting: Training, Validation, Test Sets?
- Feature Representation?
- Regularization, L1 & L2, Lasso & Ridge?
- Why Accuracy Can Be Misleading?
- True Positive, True Negative, False Positive, False Negative?
- Precision vs. Recall, and F1 Score?
- ROC Curve and AUC?
- How to Decide on Metrics
- Changing Validation Data and the Metric
- Mutual Information vs. Correlation Coefficient?
- CatBoost vs. XGBoost vs. LightGBM
- Setting up an End-To-End ML System in the Real World
- Ways to Deploy ML Models
- Ways to Speed Up Model Inference
- Data Collection
Explain The Difference Between Lasso And Ridge
Lasso and Ridge are regularization techniques that penalize the coefficients to find the optimum solution. In Ridge, the penalty is the sum of the squares of the coefficients; in Lasso, it is the sum of the absolute values of the coefficients. Another regularization method is ElasticNet, a hybrid penalty that combines both Lasso and Ridge.
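The three penalties can be written down directly. This is a minimal sketch; the `alpha` and `l1_ratio` names and the simple weighted mix used for the ElasticNet term are assumptions for illustration, and libraries may scale the terms differently.

```python
def lasso_penalty(coefs, alpha=1.0):
    """L1 penalty: alpha times the sum of absolute coefficient values."""
    return alpha * sum(abs(c) for c in coefs)


def ridge_penalty(coefs, alpha=1.0):
    """L2 penalty: alpha times the sum of squared coefficients."""
    return alpha * sum(c * c for c in coefs)


def elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.5):
    """Hybrid penalty: a weighted mix of the L1 and L2 terms."""
    return (l1_ratio * lasso_penalty(coefs, alpha)
            + (1 - l1_ratio) * ridge_penalty(coefs, alpha))


# Made-up coefficient vector.
coefs = [0.5, -2.0, 0.0]
print(lasso_penalty(coefs))        # 2.5
print(ridge_penalty(coefs))        # 4.25
print(elastic_net_penalty(coefs))  # 3.375
```

The penalty is added to the model's training loss, so larger coefficients make a candidate solution more expensive.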
Q5 How Would You Predict Who Will Renew Their Subscription Next Month What Data Would You Need To Solve This What Analysis Would You Do Would You Build Predictive Models If So Which Algorithms
- Let's assume that we're trying to predict the renewal rate for Netflix subscriptions. So our problem statement is to predict which users will renew their subscription plan for the next month.
- Next, we must understand the data that is needed to solve this problem. In this case, we need to check the number of hours the channel is active for each household, the number of adults in the household, number of kids, which channels are streamed the most, how much time is spent on each channel, how much has the watch rate varied from last month, etc. Such data is needed to predict whether or not a person will continue the subscription for the upcoming month.
- After collecting this data, it is important that you find patterns and correlations. For example, we know that if a household has kids, then they are more likely to subscribe. Similarly, by studying the watch rate of the previous month, you can predict whether a person is still interested in a subscription. Such trends must be studied.
- The next step is analysis. For this kind of problem statement, you must use a classification algorithm that classifies customers into 2 groups:
- Customers who are likely to subscribe next month
- Customers who are not likely to subscribe next month
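As one possible classifier for the two groups above, here is a minimal logistic regression trained with stochastic gradient descent. The choice of algorithm, the two features (hours watched last month divided by 10, and whether the household has kids), and every data point are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def renewal_probability(x, weights, bias):
    """Predicted probability that a customer renews."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = renewal_probability(x, weights, bias) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def will_renew(x, weights, bias, threshold=0.5):
    """Assign the customer to one of the two groups."""
    return renewal_probability(x, weights, bias) >= threshold


# Made-up training data: [hours watched last month / 10, has_kids]
rows = [[4.0, 1], [3.5, 0], [3.0, 1], [0.5, 0], [0.2, 0], [1.0, 0]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = renewed, 0 = did not renew

weights, bias = train_logistic(rows, labels)
print(will_renew([3.8, 1], weights, bias))
print(will_renew([0.3, 0], weights, bias))
```

In practice a tree-based model or gradient boosting is an equally reasonable choice; the point is only that the output is a class label per customer.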
What Is The Difference Between Lasso And Ridge Regression
Lasso and Ridge regression are two popular regularization techniques that are used to avoid overfitting of data. These methods are used to penalize the coefficients to find the optimum solution and reduce complexity. The Lasso regression works by penalizing the sum of the absolute values of the coefficients. In Ridge or L2 regression, the penalty function is determined by the sum of the squares of the coefficients.
How Long Does It Take To Become A Data Scientist
It does not take too long to become a Data Scientist. Once you complete the Data Science training with us, execute all the projects successfully, and meet all the requirements, you will receive an industry-recognized Data Science course completion certificate. Further, with the help of our placement team, who will prepare your resume and conduct mock interviews before your job interviews, you will be able to crack your interview and land a high-paying job as a Data Scientist.
Don't Mention Methods You're Not Able To Explain
Example: You're explaining logistic regression and state that "we're using logistic regression for binary classification problems. For a multi-class problem we would use a softmax regression." In this scenario, you can expect the interviewer to ask: "could you explain softmax regression?"
What's A Fourier Transform
The Fourier transform is a mathematical technique that converts a function of time into a function of frequency. It is closely related to the Fourier series. It takes a time-based pattern as input and calculates the strength (amplitude), offset (phase), and rotation speed (frequency) of every cycle that makes up the pattern. It is most often applied to waveforms, which are functions of time or space. Once a Fourier transform is applied to a waveform, the waveform is decomposed into a sum of sinusoids.
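For sampled signals the same idea is carried out by the discrete Fourier transform. The sketch below is a naive O(n^2) implementation, written for clarity rather than speed; the signal (a sine wave that completes 3 cycles over 32 samples) is made up.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform of a list of real samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Made-up signal: a sine wave with 3 full cycles in 32 samples.
n = 32
signal = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]

magnitudes = [abs(c) for c in dft(signal)]
# Only the first half of the bins is inspected; for real input the
# second half mirrors the first.
dominant = max(range(n // 2), key=lambda k: magnitudes[k])
print(dominant)  # 3
```

The strength of each frequency bin is the magnitude of the corresponding complex coefficient, and its offset is the coefficient's angle.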
What Is Meant By Correlation And Covariance
Correlation is a statistical measure of how strongly two variables are related, and in which direction. It is the covariance of the two variables divided by the product of their standard deviations, so it is unitless and always lies between -1 and 1.
Covariance is also a statistical measure; it indicates how two variables change together. A positive covariance means the variables tend to move in the same direction and a negative covariance means they tend to move in opposite directions, but its magnitude depends on the units of the variables.
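The relationship between the two can be sketched numerically. The helpers below use the sample (n - 1) versions of covariance and standard deviation, which is an assumption, and the data is made up.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)


# Made-up data: ys is exactly twice xs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
ys_rescaled = [y * 100 for y in ys]  # same relationship, different units

print(covariance(xs, ys), covariance(xs, ys_rescaled))
print(correlation(xs, ys), correlation(xs, ys_rescaled))
```

Changing the units multiplies the covariance by 100 but leaves the correlation unchanged, which is why correlation is the easier of the two to compare across variables.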
What Do You Mean By The ROC Curve
Receiver operating characteristic (ROC): the ROC curve illustrates the diagnostic ability of a binary classifier. It is created by plotting the true positive rate against the false positive rate at various threshold settings. The performance metric of the ROC curve is AUC (area under the curve). The higher the area under the curve, the better the predictive power of the model.
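The construction can be sketched by sweeping a threshold over the predicted scores and integrating with the trapezoidal rule. The helper names are hypothetical, the labels and scores are made up, and the code assumes 0/1 labels with at least one example of each class.

```python
def roc_points(y_true, scores):
    """(false positive rate, true positive rate) at each distinct threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
area = auc(points)
print(points)
print(area)
```

An area of 1.0 would mean every positive is scored above every negative; 0.5 corresponds to random ranking.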
Which Type Of Sampling Is Better For A Classification Model And Why
Stratified sampling is better for classification problems because it takes into account the balance of classes in the train and test sets. The proportion of each class is maintained in both sets, so the model is trained and evaluated on representative data. With simple random sampling, the data is divided into two parts without considering the class balance, so some classes might be present only in the train set or only in the validation set, and the resulting model performs poorly.
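A stratified split can be sketched by splitting each class separately and then merging the pieces. The function name, the 25% test fraction, and the labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one test sample per class, so no class is missing from the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Made-up imbalanced labels: 4 "spam" and 12 "ham".
labels = ["spam"] * 4 + ["ham"] * 12
train_idx, test_idx = stratified_split(labels)

test_labels = [labels[i] for i in test_idx]
print(test_labels.count("spam"), test_labels.count("ham"))  # 1 3
```

Both sets keep the original 1:3 ratio of spam to ham, which a purely random split of 16 samples does not guarantee.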
Why Naive Bayes Is Naive
Despite its practical applications, particularly in text mining, Naive Bayes is called naive because it makes an assumption that is rarely true in real-world data: that the features are independent of each other given the class. Under this assumption, the conditional probability is calculated as the product of the separate probabilities of the individual features.
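The product of separate probabilities can be sketched with a two-word spam example. All probabilities below are invented for illustration, and `naive_bayes_score` is a hypothetical helper.

```python
def naive_bayes_score(prior, likelihoods):
    """Unnormalised posterior: P(class) times the product of P(feature | class)."""
    score = prior
    for p in likelihoods:
        score *= p
    return score


# Invented probabilities for a message containing the words "free" and "winner".
spam_score = naive_bayes_score(prior=0.4, likelihoods=[0.8, 0.6])
ham_score = naive_bayes_score(prior=0.6, likelihoods=[0.1, 0.3])

# Normalise so the two posteriors sum to 1.
posterior_spam = spam_score / (spam_score + ham_score)
predicted = "spam" if spam_score > ham_score else "ham"
print(predicted, posterior_spam)
```

Multiplying the per-word probabilities is only valid if the words occur independently within a class, which is exactly the naive assumption.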
What Is Overfitting In Machine Learning And How Can It Be Avoided
Overfitting happens when a model learns the noise and quirks of its training data instead of the underlying pattern, so it performs well on the training set but poorly on new data. It is more likely when the dataset is inadequate, so the risk of overfitting generally falls as the amount of data grows.
For small datasets, overfitting can be detected and reduced with the cross-validation method. In its simplest form, the dataset is divided into two sections: a training dataset and a testing dataset. The training dataset is used to train the model, and the testing dataset is used to test the model on new inputs. In k-fold cross-validation, this split is repeated k times so that every sample is used for testing exactly once.
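The repeated train/test split used by k-fold cross-validation can be sketched with index lists alone. `k_fold_indices` is a hypothetical helper; it does not shuffle, which real data usually needs first.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(n_samples=10, k=3)
for train, test in folds:
    print(train, test)
```

A model is trained and scored once per fold, and the k scores are averaged; a large gap between training and validation scores is a sign of overfitting.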
Q7 Explain False Negative False Positive True Negative And True Positive With A Simple Example
Let's consider a scenario of a fire emergency:
- True Positive: The alarm goes off and there is a fire. Fire is the positive class, and the prediction made by the system is true.
- False Positive: The alarm goes off, but there is no fire. The system predicted fire (positive), which is wrong, hence the prediction is false.
- False Negative: The alarm does not ring, but there was a fire. The system predicted no fire (negative), which was false since there was a fire.
- True Negative: The alarm does not ring and there was no fire. The system predicted no fire (negative), and this prediction was true.
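The four cases of the fire alarm scenario map directly onto a small function; `alarm_outcome` is a hypothetical name.

```python
def alarm_outcome(alarm_rang, fire):
    """Name the outcome, treating 'fire' as the positive class."""
    if alarm_rang and fire:
        return "True Positive"
    if alarm_rang and not fire:
        return "False Positive"
    if not alarm_rang and fire:
        return "False Negative"
    return "True Negative"


for alarm_rang in (True, False):
    for fire in (True, False):
        print(alarm_rang, fire, alarm_outcome(alarm_rang, fire))
```

The first word of each outcome says whether the prediction was right, and the second word says what the system predicted.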
What Is The Difference Between Type1 And Type2 Errors
A Type 1 error is a false positive, i.e. the test claims that something has happened when in fact nothing has happened. It is like a false fire alarm: the alarm rings but there is no fire.
A Type 2 error is a false negative, i.e. the test claims that nothing has happened when in fact something did happen.
The best way to differentiate a type 1 vs type 2 error is:
Write Clearly Draw Charts And Introduce A Notation If Necessary
The interviewer will judge your scientific rigor.
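Example: You're asked to write the binary cross entropy cost function. Instead of writing $\mathcal{L} = -(y \log \hat{y} + (1-y) \log(1-\hat{y}))$, write $\mathcal{J}(\hat{y}, y) = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = - \frac{1}{m} \sum_{i=1}^m \left( y^{(i)} \log \hat{y}^{(i)} + (1-y^{(i)}) \log(1-\hat{y}^{(i)}) \right)$. In this fashion, you'll display your meticulous understanding of cost functions, their arguments, and how they differ from loss functions.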
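The distinction between the per-example loss and the averaged cost can also be shown in code. This is a minimal sketch of the standard binary cross entropy; the function names, the clipping constant `eps`, and the sample values are assumptions for illustration.

```python
import math


def bce_loss(y_hat, y, eps=1e-12):
    """Loss for ONE example: -(y log(y_hat) + (1 - y) log(1 - y_hat))."""
    y_hat = min(max(y_hat, eps), 1 - eps)  # keep log() away from 0
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))


def bce_cost(y_hats, ys):
    """Cost over m examples: the average of the per-example losses."""
    m = len(ys)
    return sum(bce_loss(y_hat, y) for y_hat, y in zip(y_hats, ys)) / m


# Made-up predictions and labels.
uncertain = bce_cost([0.5, 0.5], [1, 0])
confident = bce_cost([0.9, 0.1], [1, 0])
print(uncertain, confident)
```

The loss takes a single prediction and label, while the cost takes the whole set of m examples, which is the difference the notation is meant to make visible.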
Explain The Difference Between Deep Learning Artificial Intelligence And Machine Learning
This question tests your knowledge in the field. The interviewer may want to know that you can explain the subtle differences between each concept to ensure that you have a strong grasp of foundational machine learning knowledge.
How to answer: Make it clear that you understand that machine learning is a subset of AI and that deep learning is a subset of machine learning by describing each. Rather than just stating the obvious, use examples in your response to show that you have total mastery of these important concepts.
Machine Learning In Practice
Machine learning algorithms are only a very small part of using machine learning in practice as a data analyst or data scientist. In practice, the process often looks like:
It is not a one-shot process, it is a cycle. You need to run the loop until you get a result that you can use in practice. Also, the data can change, requiring a new loop.
Visualise Tasks You Might Be Expected To Carry Out And Practice How You Might Approach Them
It is common for interviewers to give a sample task they have done and ask how you would approach it. System design interviews are almost always set up this way. ASOS, for example, can ask the candidate to design a system for predicting a consumer's likelihood of not returning to their website. Another common question is to design a web crawler that gathers training samples for an NLP model.