Wasoko is transforming communities across Africa by revolutionizing access to essential goods and services. By connecting small merchants to the digital economy, we fix inefficient supply chains and provide services previously unavailable to informal businesses. Wasoko aims to provide everything a retailer needs, with no distributors or banks necessary.
Thousands of retailers across Kenya, Tanzania, Uganda, Côte d’Ivoire, Senegal, Zambia and Rwanda use Wasoko's mobile ordering and delivery platform to receive the goods they need as quickly and cheaply as possible, while also accessing growth financing for the first time. We’re looking to grow our team with highly talented and motivated employees who are excited to work in a fast-paced and dynamic startup environment.
Position: SRE Manager, reporting to Global Head of Engineering
The Site Reliability Engineering Manager at Wasoko fills the mission-critical role of ensuring that our complex, large-scale systems are healthy, monitored, automated, and designed to scale. As a manager on this team, you'll use your background as an operations technologist to work closely with operations, tech support, and our development teams, from the early stages of design all the way through identifying and resolving production issues. You'll also be responsible for managing a team of 2-4 SREs, making sure they are fulfilled, productive, and have the opportunity to grow.
Locations: Bangalore, India or Nairobi, Kenya
Duties & Responsibilities:
- Strong adaptive problem-solving and program management skills, including planning, execution, communication, risk management, and stakeholder management.
- Supervise a team of SREs, ensuring that production applications your team supports are stable, reliable, and well-documented.
- Build and maintain robust, sophisticated production monitoring that identifies issues preemptively, before they surface in operations or applications.
- Work closely with tech support, operations, product, engineering managers and development teams to ensure that platforms are designed with scale and operability in mind. Triage bugs, issues, and tasks related to production performance, and drive their resolution through the SRE team or the relevant engineering teams.
- Resolve all issues within the committed SLAs for each priority bucket (P0, P1, P2, P3, P4, etc.).
- Troubleshoot and debug complex issues in production applications
- Assist in the roll-out and deployment of new product features and installations to facilitate our rapid iteration and constant growth
- Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale environment
- Be available at any time for escalations affecting your products, and serve as the face of your team to other teams at Wasoko.
- Function well in a fast-paced, rapidly-changing environment
- Communicate effectively with people at all levels of the organization
- Mentor others on Agile/Scrum/Kanban, SRE, product, and development processes.
Requirements & Qualifications:
- Excellent written and oral communication skills
- 10+ years of experience in large-scale, company-wide site reliability engineering and management.
- Hands-on DevOps experience with Google Cloud Platform (GCP), Docker, and Kubernetes, including instrumenting software components with proper metrics to support real-time, remote troubleshooting and performance monitoring.
- Strong domain knowledge of B2B, B2C, and B2B2C e-commerce.
- Bachelor's or Master's degree in a quantitative field from a premier institution.
- Excellent problem-solving, prototyping, articulation, and communication skills