Title: Senior Site Reliability Engineer
Location: Remote – USA
Clover is reinventing health insurance by working to keep people healthier.
We value diversity in backgrounds and in experiences. Healthcare is a universal concern, and we need people from all backgrounds and walks of life to help build the future of healthcare. Clover’s engineering team is empathetic, caring, and supportive. We are deliberate and self-reflective about the engineering team and culture that we are building, and we look for engineers who are not only strong in their own aptitudes but also care deeply about supporting each other’s growth.
We are looking for someone who is experienced in site reliability and infrastructure to join our engineering team. You will start at the ground floor, taking responsibility for shaping a team, processes, and norms to build out and manage Clover’s technology infrastructure. You will partner with technical leads in other engineering disciplines, as well as data scientists and other technology professionals, to develop and maintain a modern, scalable infrastructure platform that hosts domestic and international workloads with a variety of compute, storage, and networking needs. We’re looking for someone with prior experience deploying and maintaining containerized infrastructure and workloads. Kubernetes competency is highly valued.
As a Senior Site Reliability Engineer, you will:
- Build systems for declarative application and infrastructure lifecycle management: continuous deployment, continuous integration, Kubernetes cluster management, service and workload inventory.
- Prioritize and help troubleshoot problems, downtime, and alerts.
- Contribute to setting the direction for the Site Reliability Engineering team and establish clear goals that are aligned with Clover’s company-level goals.
- Foster a healthy, motivated, and interdisciplinary culture that is the bedrock of high-performing teams.
- Simplify the release process by automating the delivery pipeline and database changes.
You will love this job if:
- You enjoy working in a fluid, collaborative environment, defining and taking ownership of priorities that add to our larger goals. You can bring clarity to ambiguity while remaining open-minded to new information that might change your mind.
- You do not hesitate to jump in and help fix things that are broken, and you get a sense of accomplishment from making sustainable systems. You are happy to fill in the gaps to reach a goal where necessary, even if it does not always fit your job description.
- You want to be part of building a team that emphasizes delivery, reliability and security.
- You have a genuine interest in what good technology can do to help people and take pride in tackling hard problems in an important industry.
You should get in touch if:
- You have 5+ years of programming experience and are proficient in at least one of the following: Python, Go, or shell scripting.
- You have in-depth knowledge of containerization and orchestration technologies, such as Docker, containerd, and Kubernetes, as well as experience with CNCF projects like Helm, gRPC, and Prometheus.
- You have experience with public cloud platforms such as GCP and/or AWS.
- You are knowledgeable in networking fundamentals such as TCP/IP, UDP, firewalls, routing, DNS, and load balancing.
- You have experience with Linux system administration and basic knowledge of Linux’s design.
- You understand the key concepts in SRE such as monitoring, performance tuning, and automation.
- You are able to work autonomously with limited guidance.
- You have excellent communication and collaboration skills, work effectively with cross-functional teams, and adapt quickly to new challenges and technologies.
Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records. We are an E-Verify company.