Staff Site Reliability Engineer – Observability

Location: UT-Salt Lake City

Fastly helps people stay better connected with the things they love. Fastly’s edge cloud platform enables customers to create great digital experiences quickly, securely, and reliably by processing, serving, and securing their applications as close to their end users as possible, at the edge of the Internet. The platform is designed to take advantage of the modern Internet, to be programmable, and to support agile software development. Fastly’s customers include many of the world’s most prominent companies, such as Vimeo, Pinterest, The New York Times, and GitHub.

We’re building a more trustworthy Internet. Come join us.

Fastly’s Observability team is looking for a Staff Site Reliability Engineer who is passionate about building, scaling, and automating our internal platforms to provide global visibility into the health and performance of our networks. You will work alongside other engineering and support teams to provide insights and recommendations on how to make our services and software stacks more observable. Your focus on logging, metrics, distributed tracing, and monitoring will be vital in helping Fastly grow its observability platforms.

What You’ll Do:

  • Focus on improving and scaling our logging pipelines, telemetry collection, and monitoring systems
  • Improve the performance and reliability of the observability platform infrastructure
  • Create and instrument critical business metrics for insights and transparency
  • Collaborate with other Fastly engineers to implement solutions that deliver value for our internal customer teams
  • Participate in incident reviews to build improved alerts for detection and identify potential proactive mitigations

What We’re Looking For:

  • Extensive experience scaling out Prometheus architecture, i.e., you are not just a user of Prometheus but have built the underlying infrastructure yourself
  • Comfortable working with tools like OpenTelemetry, Grafana, Loki, Tempo, and Mimir
  • Extensive experience working with Linux operating systems focusing on metric collection and instrumentation
  • Experience implementing and scaling observability pipelines using self-managed, on-premises, and open source software
  • Experience developing automation, orchestrations, and writing infrastructure as code for platform management
  • Comfortable working with scripting and interpreted languages, and with test-driven development
  • Excellent communication and listening skills, as well as a high degree of emotional intelligence

We’ll be super impressed if you have experience in any of these:

  • Deep understanding of the challenges posed by high cardinality, churn, and large data volumes, and how to use them to anticipate capacity needs
  • A track record of working across multiple cloud platforms and physical environments to provide global visibility
  • Experience working with ClickHouse for time series data
  • Development of metrics exporters for the Prometheus ecosystem

Work Hours:

  • This position will require you to be available during core business hours
  • You’ll participate in an on-call rotation to support platform availability

Work Locations & Travel Requirements:

This position is open to both hybrid and remote locations.

The preferred locations for this position are:

  • San Francisco, CA
  • Los Angeles, CA
  • Denver, CO
  • New York City, NY

Fastly currently embraces a largely hybrid model for most roles, which gives employees the flexibility to split their time between the office and home.

We are also willing to consider remote candidates based in the US.

This position may require travel, depending on the needs of your role or at your manager’s request.


The estimated salary range for this position is $181,220 to $226,520.

Starting salary may vary based on permissible, non-discriminatory factors such as experience, skills, qualifications, and location.

This role may be eligible to participate in Fastly’s equity and discretionary bonus programs.


We care about you. Fastly works hard to create a positive environment for our employees, and we think your life outside of work is important too. We support our teams with great benefits that start on the first day of your employment with Fastly. Curious about our offerings?

We offer a comprehensive benefits package including medical, dental, and vision insurance. Family planning, mental health support along with an Employee Assistance Program, insurance (life, disability, and accident), a flexible vacation policy, and up to 18 days of accrued paid sick leave help support our employees. We also offer a 401(k) (including a company match) and an Employee Stock Purchase Program. For 2024, we offer 10 paid local holidays and 11 paid company wellness days.