Senior Site Reliability Engineer, Scalability: Practices

Location: Remote, North America

The GitLab DevSecOps platform empowers 100,000+ organizations to deliver software faster and more efficiently. We are one of the world’s largest all-remote companies with 2,000+ team members and values that foster a culture where people embrace the belief that everyone can contribute. Learn more about Life at GitLab.

As a Site Reliability Engineer at GitLab, you are responsible for keeping all user-facing services and other GitLab production systems running smoothly. Our SREs are a blend of pragmatic operators and software craftspeople that apply sound engineering principles, operational discipline, and mature automation to our operating environments and the GitLab codebase.

GitLab.com is a unique site and it brings unique challenges: it’s the biggest GitLab instance in existence. In fact, it’s one of the largest single-tenancy SaaS sites on the internet developed and run with transparency in mind. GitLab.com runs using the same tools we provide to GitLab customers running self-managed installations. The experience of our team feeds back into other engineering groups within the company, as well as to self-managed customers.

SREs with the Scalability specialization focus on how to scale both the application and the infrastructure supporting it. The main difference between a Scalability SRE and other SREs at GitLab is that your day-to-day will be spent looking at how to enable other engineers at GitLab to incorporate availability, reliability, and performance considerations into their daily work. We do this by providing access to information and systems in a safe and sustainable way. We do this for our application and aim to extend this offering to our infrastructure and services.

Some examples of projects you could work on:

  • Working from the Scalability team’s issue tracker, driving changes required to scale GitLab at GitLab.com size.
  • Architectural Blueprints used to define how the component fits into a bigger picture
  • Developer Documentation and Guidelines used to ensure consistency between multiple components
  • Self-service deployment and management tools, used to ensure simpler maintenance of the component
  • Service Maturity Model, ensuring that the component has consistent interfaces necessary to operate a robust and reliable component
    • Observability and Forecasting Tools
    • Logging and Alerting Guidelines
    • Operational Runbooks

What you’ll do

  • Be on a PagerDuty rotation to respond to GitLab.com availability incidents and provide support for service engineers with customer incidents.
  • Analyze existing GitLab.com Service Level Objectives, and create and maintain new ones.
  • Troubleshoot, evaluate and resolve operational challenges contributing to defined SLOs.
  • Identify, define, and help resolve architectural bottlenecks in the application as observed on GitLab.com.
  • Work with other engineering stakeholders on resolving larger architectural bottlenecks and participate by offering the GitLab.com point of view.
  • Work in close collaboration with software development teams to shape the future roadmap and establish strong operational readiness across teams.
  • Scale systems through automation, improving change velocity and reliability.
  • Leverage technical skills to partner with team members and be comfortable diving into a problem as needed.
  • Work with counterparts in other teams of the Infrastructure department to improve infrastructure running with Chef, Terraform and Kubernetes.
  • Make monitoring and alerting fire on symptoms rather than on outages.
  • Document every action so your findings turn into repeatable actions and then into automation.
  • Debug production issues across services and levels of the stack.

What you’ll bring

  • Strong programming skills, preferably with Ruby and/or Go.
  • Production experience with the Kubernetes ecosystem
  • SRE experience in running and operating distributed systems is a bonus
  • Are able to reason about large systems: how they work at large scale, edge cases, failure modes, behaviors.
  • Know your way around Linux and the Unix Shell.
  • Have experience in collaborating and communicating asynchronously.
  • Have an urge to document all the things so you don’t need to learn the same thing twice.
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can’t help but fix it.
  • Have a strong sense for action and know how to iterate through a problem quickly.
  • Share our values, and work in accordance with those values.
  • Have experience with Nginx, HAProxy, Docker, Terraform, or similar technologies.
  • A solid understanding of, and experience with, implementing and working with SLI/SLO
  • Are able to leverage GitLab as your day-to-day go-to tool.

About the team

The Practices team focuses on tools and frameworks that enable the stage groups to support their features on our production systems. These challenges exist in high-load, critical services without dedicated owners, shared architectures, and complex operational configurations. The expertise within the Practices team helps overcome these challenges by responding to technical needs, promoting uniform processes, and increasing engineering efficiency by eliminating toil.

How GitLab will support you

  • Benefits to support your health, finances, and well-being
  • All remote, asynchronous work environment
  • Flexible Paid Time Off
  • Team Member Resource Groups
  • Equity Compensation & Employee Stock Purchase Plan
  • Growth and development budget
  • Parental leave
  • Home office support

Please note that we welcome interest from candidates with varying levels of experience; many successful candidates do not meet every single requirement. Additionally, studies have shown that people from underrepresented groups are less likely to apply to a job unless they meet every single qualification. If you’re excited about this role, please apply and allow our recruiters to assess your application.

The base salary range for this role’s listed level is currently for residents of listed locations only. Grade level and salary ranges are determined through interviews and a review of education, experience, knowledge, skills, abilities of the applicant, equity with other team members, and alignment with market data. See more information on our benefits and equity. Sales roles are also eligible for incentive pay targeted at up to 100% of the offered base salary.

Colorado/Washington pay range

$124,300—$239,700 USD

California/New York/New Jersey pay range

$124,300—$266,400 USD

Country Hiring Guidelines: GitLab hires new team members in countries around the world. All of our roles are remote; however, some roles may carry specific location-based eligibility requirements. Our Talent Acquisition team can help answer any questions about location after starting the recruiting process.

Privacy Policy: Please review our Recruitment Privacy Policy. Your privacy is important to us.

GitLab is proud to be an equal opportunity workplace and is an affirmative action employer. GitLab’s policies and practices relating to recruitment, employment, career development and advancement, promotion, and retirement are based solely on merit, regardless of race, color, religion, ancestry, sex (including pregnancy, lactation, sexual orientation, gender identity, or gender expression), national origin, age, citizenship, marital status, mental or physical disability, genetic information (including family medical history), discharge status from the military, protected veteran status (which includes disabled veterans, recently separated veterans, active duty wartime or campaign badge veterans, and Armed Forces service medal veterans), or any other basis protected by law. GitLab will not tolerate discrimination or harassment based on any of these characteristics. See also GitLab’s EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know during the recruiting process.