Site Reliability Engineer Questions About Experience And Background
Site reliability engineers perform an important role in software deployment, helping to improve processes between development and information technology teams. Employers often want to hire people for site reliability engineer positions who demonstrate they have the experience to function effectively as this liaison. Questions about your professional background can help hiring managers assess the level of experience you would bring to the position. Here are some questions employers may ask about your experience and background:
What do you enjoy or find appealing about being a site reliability engineer?
How do you communicate with other teams?
How much experience do you have in software development?
How familiar are you with systems administration?
What programming languages do you know?
What are some projects you’ve worked on in your other positions?
How familiar are you with the principles of networking?
What kind of experience do you have writing automation code for deployment?
How do you manage stressful situations at work?
What do you think is the most important responsibility of a site reliability engineer?
How Do I Get Good
Do lots of problems.
There’s no other way around it. You sign up for LeetCode and start working on problems. I’d recommend paying for premium – but it’s not necessary.
Start solving some problems. Use Python if you’re not a software engineer already. The problems that you’ll encounter in a phone screen are generally going to be between an easy and medium difficulty. I find it helpful to start out going through easy problems and learn some tricks and patterns before progressing to more difficult problems.
As you’re progressing through problems, work on them for 30-45 minutes. If you haven’t solved the problem by then, look at the answers. You’ll find the solutions along with an explanation beside the problem if you paid for premium, but you can also go look in the attached problem discussion for a solution. Often the implementations and explanations provided in the discussion section are easier to understand.
Read the answers, then go back and try to work through the logic of the solution by manually typing the code in.
Another approach for doing LeetCode is to use some of their paid tracks to progressively be instructed on an algorithm or problem solving technique. For technical phone screens it’s less common to need to solve a particularly difficult problem.
Either way, the best way to get more proficient at the problems is to do lots of them.
Give A Definition Of Virtualization Containers And Kubernetes And Tell How The Three Relate To And Differ From Each Other
Bonus points if they start by talking about a bare metal server.
Virtualization installs a control layer on top of a set of bare metal servers to create a pool of resources from the combination of the physical resources of those servers. It then allows you to create “virtual machines” that have a varied combination of memory, storage, and processor resources according to need, each machine with its own operating system. Virtual machines can be created and destroyed quickly and easily.
Containers are similar, except they do not contain the base layer operating system. Instead the control layer provides the operating system access while also keeping the containers and their processes isolated from one another. Containers include software such as a microservice along with all of the software dependencies required to run that software. This provides isolation and flexibility.
Kubernetes adds an orchestration layer to containers, making the management of them, especially large systems, easier.
Consider Your Answers To Common Interview Questions
While you won't be able to predict every question you'll be asked in an interview, there are a few common questions you can plan answers for. You might also consider developing an elevator pitch that quickly describes who you are, what you do and what you want.
Some jobs may involve a test or evaluation during the interview process. For example, if you are interviewing for a computer programming, development or analytics role, you might be asked to write or evaluate lines of code. To prepare, it can help to ask colleagues in the industry for examples of tests they've been given.
You should also prepare to discuss your salary expectations, just in case. If you're unsure what salary is appropriate to ask for in the position you're applying to, look up a personalized pay range based on your location, industry and experience.
Here are a few examples of common interview questions:
Why do you want to work here?
The best way to prepare for this question is to learn about the products, services, mission, history and culture of the company. In your answer, mention the aspects of the company that appeal to you and align with your career goals.
Example: "I'd love the opportunity to work with a company that's making a difference. Finding a company with a positive work environment and values that align with my own has remained a priority throughout my job search, and this company ranks at the top of the list."
Question: How Do You Go About Setting SLOs And SLIs And How Do You Make Adjustments When Necessary
Service Level Objectives (SLOs) and Service Level Indicators (SLIs) are foundational metrics for SREs. SLOs are the goals for a particular application; SLIs are the actual measurements of performance against those goals.
Lachhman notes that the SRE function is often at the heart of defining and refining SLOs and SLIs. Oftentimes, developers don't necessarily know the norm or baseline for the applications they build and maintain, particularly if SRE is a relatively new dimension of the broader team.
Hiring managers should dig into how the candidate identifies and defines SLOs and SLIs. If you're the candidate, you should be prepared to speak about how you approach these metrics. Moreover, make sure you can discuss a thoughtful process for reevaluating and optimizing those measurements over time.
"Like any metric, they need to evolve," Lachhman says. Negotiating changes to SLO/SLI measurements is par for the course.
Playing The Fair Game
I struggled a bit with the Skills section. I knew that everything listed on a resume is considered fair game to ask in an interview. Interviewers like to go deep on the topics a candidate claims to know or have mastered. Given Google's hiring track record, chances are they have an expert interviewer for any possible topic. So it is in general safer to only claim skills that I was sufficiently proficient in. But I also wanted to show that I am constantly learning and exploring new topics. I decided to add a footnote to indicate skills that I did not master yet. I found this to be a good compromise.
Sample Site Reliability Engineer Job Description
Do you enjoy working with a highly motivated and talented team to deliver mission critical software? [Company] is growing our Site Reliability Engineering team to help deploy, manage, troubleshoot, and enhance our complex cloud-based services for a wide variety of customers.
As a Site Reliability Engineer you will design and implement web applications and REST API services using a microservice-based infrastructure to replace our current monolith implementation. The new technology stack includes [list of technologies]. Your focus will be on maximizing system uptime. Team members all participate in an on-call rotation.
You will build innovative automated solutions and tools to help debug and resolve problems in production and prevent them from recurring. Further, you will proactively seek out system weaknesses and find ways to fix them before they cause production issues using monitoring data, watching trends, and using Chaos Engineering.
Perform Research On The Company And Role
Researching the company you're applying to is an important part of preparing for an interview. Not only will it help provide context for your interview conversations, but it will also help you when preparing thoughtful questions for your interviewers.
Researching the company and role as much as possible will give you an edge over the competition. Not only that, but fully preparing for an interview will help you remain calm so that you can be at your best. Here are a few things you should know before you walk into your interview:
Research the product or service
Even if the role isn’t directly related to the company’s product or service, you’re still looking to be part of the team. It’s important to learn all you can about the product or service the company produces and promotes. You don’t necessarily need to understand each and every detail, especially if it’s a technical product, and you’re interviewing for a non-technical position, but you should have a basic understanding of the main products or services the company offers.
If possible, request a sample of the product to familiarize yourself with the customer's perspective. The more you can tell them about the product from both a company and customer standpoint, the better you’ll perform in your interview.
Research the role
Research the company culture
What Is A Linked List
It’s a data structure in which each data element is stored in a separate node. Nodes are connected using pointers. The list starts with a head, which is a reference to the first node in the list. The head is followed by nodes, each of which includes a data element and a reference to the next node. The final node, the tail, includes a data element and a reference to null, indicating the end of the list.
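The description above can be sketched in a few lines of Python. This is a minimal illustration and not a reference implementation; the class names `Node` and `LinkedList` and the sample values are assumptions made for the example.

```python
from typing import Any, Iterator, Optional


class Node:
    """One element of the list: a data value plus a reference to the next node."""

    def __init__(self, data: Any) -> None:
        self.data = data
        self.next: Optional["Node"] = None  # None plays the role of null


class LinkedList:
    """Singly linked list that keeps references to its head and tail."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None

    def append(self, data: Any) -> None:
        """Add a new node after the current tail."""
        node = Node(data)
        if self.tail is None:
            # Empty list: the new node is both head and tail.
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def __iter__(self) -> Iterator[Any]:
        """Walk from the head, following next references until None."""
        current = self.head
        while current is not None:
            yield current.data
            current = current.next


# Hypothetical sample data for illustration.
services = LinkedList()
for name in ("web", "api", "db"):
    services.append(name)

print(list(services))
```

Walking the list visits the nodes in insertion order, and the traversal stops when it reaches the tail, whose next reference is `None`.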
What's The Relationship Between Your ITOps And Engineering Teams? How Could That Relationship Improve?
Because of SREs' involvement in so many aspects of the engineering organization and business, it's important that you can identify human bottlenecks in productivity. With this question, the interviewer is trying to determine how you would go about solving issues between cross-functional teams. Most of the time, it's as simple as finding ways to improve communication and visibility across different departments, helping people find the information they need when they need it.
What Is Expected From A Site Reliability Engineer
A site reliability engineer excels at the production side of software. They are expected to ensure that software is delivered and deployed flawlessly. Additionally, SREs are responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
The SRE model hinges on effective standardization and automation. Engineers are tasked with ideating and implementing methods to enhance and automate operational tasks, thus streamlining development and deployment processes.
Like system administrators, SREs must have some software development experience, but their primary strengths are network engineering, troubleshooting, deployment, and configuration. They must also be effective multitaskers, as they must ensure multiple system components work together and deliver results consistently.
For greater clarity, let's look at the average day of a site reliability engineer:
Bear in mind that due to its relatively recent origin, the SRE role is highly subjective when it comes to specific responsibilities. At some companies, SREs play a key role in software development and programming, while at others they might be expected to focus specifically on the operations side.
How Can An Organization Improve Its Observability
Observability measures the output of a system to analyze the efficiency of its process, using tools like metrics, logs and tracing. In the software development life cycle, site reliability engineers usually take responsibility for observability and incident response. This question helps a hiring manager gauge your understanding of observability and how you could help their organization implement this approach. In your answer, try to provide details about the steps you would take to improve observability.
Example: "I believe there are three steps to improving observability within an organization. First, you should understand the data that applies to your observability goal. Next, use that data to determine a strategy for evaluating system performance. Finally, implement that observability strategy to gain insights into how it can improve the overall performance and processes of a development operations team."
Follow Up: What Is An SLA? An SLI?
A service-level agreement (SLA) is the uptime promise that we make to a customer. These are often legally defined, with penalties for missing the target availability. For this reason, SLAs are generally set using figures that are easier to meet than SLOs.
A service-level indicator (SLI) is something you can measure with precision to help you think about, define, and determine whether you are meeting SLOs and SLAs. SLIs are generally reported as the ratio of good events to total events. A simple example would be successful HTTP requests / total HTTP requests. SLIs are frequently reported as a percentage, with 0% meaning everything is broken and 100% meaning everything is working perfectly.
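As a rough sketch of the good-events-over-total-events ratio described above, the Python snippet below computes an availability SLI as a percentage and compares it with a target. The function name `availability_sli`, the 99.9% target, the request counts, and the choice to treat zero traffic as 100% are all assumptions made for illustration.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Return the SLI as a percentage: good events divided by total events."""
    if good_events < 0 or total_events < 0 or good_events > total_events:
        raise ValueError("counts must satisfy 0 <= good_events <= total_events")
    if total_events == 0:
        # Assumption: with no traffic there were no failures, so report 100%.
        return 100.0
    return 100.0 * good_events / total_events


# Hypothetical numbers: 99,950 successful HTTP requests out of 100,000.
SLO_TARGET_PERCENT = 99.9
sli = availability_sli(good_events=99_950, total_events=100_000)

print(f"SLI: {sli:.2f}%")
print("SLO met" if sli >= SLO_TARGET_PERCENT else "SLO missed")
```

In practice the counts would come from a monitoring system over a fixed window, and an SLA figure would typically be set lower than the SLO target used here.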
It's a data structure where each data element is a separate element in a list. Elements are connected using pointers. The list starts with a head, which is a reference to the first node in the list. The head is followed by nodes, which include a data element and a reference to the next data element. The final node, the tail, includes the data element and a reference to null, indicating the end of the list.
Other common data structures include the queue, stack, heap, hash table, and binary tree.
Depending on your needs, this could be followed up with a question about data algorithms.
What Is A Typical SRE Job Description
Before looking at SRE interview questions, take a step back to look at the job description itself to ensure you're capturing what the role entails. SREs are typically expected to communicate with different departments, including engineers and product owners, to create targets and measures based on business context. Service level objectives and service level agreements are part of this job function, as that's how measures are translated into customer happiness. In addition, SREs need to work with teams to establish target levels of reliability and realistic measures for availability.
SREs will also need to structure error budgets that take into account risk, availability, and feature development to create more structure for teams to develop. Another core part of the role is to eliminate repetitive manual tasks and automate and standardize processes where possible to reduce manual work as part of their bid for efficiency. And, of course, throughout all of this, SREs need to feel comfortable writing and deploying code to improve system resiliency and infrastructure.
Not all SREs may have all of these skills. Having an SRE team where each member specializes in different functions is also a valid way to achieve reliability excellence.
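To make the error budget idea mentioned above concrete, here is a small Python sketch of the underlying arithmetic: the budget is the share of a time window in which the service is allowed to be unavailable while still meeting its availability target. The function names, the 30-day window, the 99.9% target, and the 12 minutes of downtime are hypothetical values chosen for the example.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for a given SLO."""
    if not 0.0 <= slo_percent <= 100.0:
        raise ValueError("slo_percent must be between 0 and 100")
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining_minutes(
    slo_percent: float, downtime_minutes: float, window_days: int = 30
) -> float:
    """Budget left after subtracting observed downtime (negative if overspent)."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


# Hypothetical scenario: a 99.9% target over 30 days with 12 minutes of downtime.
budget = error_budget_minutes(99.9)
remaining = budget_remaining_minutes(99.9, downtime_minutes=12.0)

print(f"Budget: {budget:.1f} min, remaining: {remaining:.1f} min")
```

One common use of the remaining figure is to decide whether a team can keep shipping features or should pause to work on reliability, although the exact policy varies by organization.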
Unix And Linux Internals
On top of the technical skills, the Site Reliability Manager interview track has to test for management and leadership capabilities. Three out of my seven interviews were in the management realm, leaving only four interview slots to assess me technically. Maybe that is the reason why my troubleshooting interview also dug into UNIX and Linux internals. I heard that for some roles in SRE a dedicated UNIX and Linux system internals interview is scheduled.
Since it had been a while since I had read or kernel code I was in for a refresh. Here are my starting points as well as I can remember:
My Path To Site Reliability Management
On my way to space I am currently taking a little stop to help organize the world's information and do my part in making it universally accessible and useful. Since my rocket building talents are limited I devoted my energy to the wonderful challenge of Site Reliability Management. That is, empowering people on top of Site Reliability Engineering. Basically what I love combined with what I enjoy. Plus meetings.
Due to popular demand I decided to publish my lessons learned on the long journey from almost school drop-out to being a Site Reliability Manager at Google.
What Is Cloud Computing
Many companies store some of their services on cloud platforms, and a site reliability engineer may be part of the team that manages the cloud-based system. Hiring managers ask this question to evaluate your knowledge of cloud computing and experience with these platforms. In your answer, try to explain the definition of cloud computing and provide a few details on why companies may use a cloud platform. If you have experience with cloud computing, consider offering details on how you’ve used it.
Example: "Cloud computing provides remote access to services and resources, such as databases and networking, over the internet. It's useful because you can access those services anywhere you have an internet connection, rather than through physical servers. Cloud computing can help IT teams share resources more quickly and reduce the costs associated with traditional infrastructure. It also can improve productivity because teams can share information with one another instantaneously. In my previous position, I helped one of our engineers with the process of migrating some of our technical infrastructure to the cloud."
My Growing Island Of Knowledge
I figured that there were three components in my learning equation. The seemingly infinite ocean of available knowledge which is unknown to me. These were my unknown unknowns. Then there was my starting point of things I knew, my knowledge. Let's think of knowledge as an island somewhere in the ocean of unknowns. The coastline of this island is where I knew there was more stuff to learn. Knowing that there are so many things I did not know about was intimidating and demotivating.
Let me give you an example: I knew that multiple CPU architectures existed. I was quite familiar with an older Infineon chip and the x86 architecture. This was all comfortably sitting on my island of knowledge. Closer to the coastline were some facts I remembered about ARMv7. Something close to my island but I would need to extend the coastline a little bit to claim that piece of land from the ocean of unknown. I also had a bookmark about OpenRISC, so while this architecture was pretty unclear to me I knew where to start to make it part of my island as well.
The interesting part is what happened when I rapidly grew my island of knowledge. The more I learned the less educated I felt. I learned about a new technology, a different approach to a well-understood problem, a crazy programming language leveraging unconventional concepts.