Senior Infrastructure Engineer
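About Toptal
Toptal is a global network of top talent in business, design, and technology that enables companies to scale their teams on demand. With $200+ million in annual revenue and team members based around the globe, Toptal is the world’s largest fully remote workforce.
We combine the best elements of virtual teams with a support structure that encourages innovation, social interaction, and fun. We see no borders, move at a fast pace, and are never afraid to break the mold.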
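Job Summary:
We are looking for an experienced engineer to build and scale services in a cloud environment as part of our Infrastructure team. Our Infrastructure Engineers work with a high-energy, fast-paced team that supports initiatives and operations across all of Toptal.
This is a remote position. We do not offer visa sponsorship or assistance. Resumes and communication must be submitted in English.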
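Responsibilities:
The following information is intended to describe the general nature and level of work being performed. It is not intended to be an exhaustive list of all duties, responsibilities, or required skills.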
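- Toptal services are deployed across hundreds of servers. You will be responsible for designing, building, deploying, and maintaining highly available production systems based on Kubernetes.
- Collaborate with development teams to help them streamline their deployment processes, observability, and self-service capabilities.
- We are embracing DevOps practices, where the Infrastructure team develops systems, automation, tools, and workflows and consults/coaches development teams so that they can own the entire lifecycle of the software they are producing.
- Implement monitoring for automated system health checks, develop procedures, and maintain documentation for system troubleshooting and maintenance.
- Work regularly with engineering teams to improve the company's engineering tools, systems, procedures, and data security, not just administer clusters and cloud services.
- Join daily scrum standups (GMT-3 to GMT+5). Expect pair programming, peer code reviews, and the use of collaboration tools like Slack and Zoom.
- Design, develop, document, analyze, create, test, or modify computer or cloud-based systems or programs.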
In the first week, expect to:
- Join our boot camp team and begin onboarding into Toptal.
- Learn our team's processes and become familiar with the code that maintains our infrastructure resources.
In the first month, expect to:
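- Gain an in-depth understanding of our system topology and how the overall system is structured.
- Understand our monitoring systems, alerting systems, and security.
- Attend team meetings and become familiar with ongoing projects and initiatives.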
- Talk and meet with people from the operations squad.
In the first three months, expect to:
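- Start working on support tasks to become familiar with the core tools, setup, and everyday challenges.
- Exercising discretion and independent judgment, understand and address the team's needs and expectations through effective communication and collaboration while learning our infrastructure, so that we provide quality service to our customers.
- Deliver internal infrastructure and services such as monitoring, logging, automation, and data services targeted at our internal users.
- Support the development of CD pipelines and our next-generation Kubernetes-based infrastructure platform.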
In the first six months, expect to:
- Support Infrastructure design, architecture, and implementation.
- Have opportunities to be involved in system design, identify new technologies to support the business, and resolve infrastructure compatibility and performance issues as they arise.
- Participate in the on-call rotation (business hours and after hours), providing support for all infrastructure-related systems.
- Report any downtime or performance issues the system faces, investigate to determine what caused them, and coordinate with other teams to resolve them.
- Handle incident resolution if a developer is not needed.
- Participate in our Disaster Recovery and incident analyses.
In the first year, expect to:
- Communicate with key partners on project engagements.
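- Work closely with our engineering teams to develop infrastructure automation and management solutions focused on scalability, observability, automation, reliability, security, and quality in Google Cloud Platform.
- Plan and coordinate the testing of changes, upgrades, patches, new releases, and new services.
- Participate in technical initiatives that enable developers to deliver their services to our customers with minimal friction and high quality.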
Qualifications and Job Requirements:
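- 5+ years of experience with Kubernetes environments, including production operations, troubleshooting, debugging, cluster provisioning, and administration.
- Experience managing infrastructure configuration and provisioning through code, and with distributed systems on public cloud platforms (AWS, GCP).
- Solid understanding of Linux debugging, LAN and WAN networking, IP addressing, load balancing, VPNs, and routing.
- Strong understanding of security practices relevant to modern systems and services.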
- Hands-on experience with system and application metrics collection and alerting services such as Graphite, Grafana, Prometheus, InfluxDB, 美国标准, etc. A keen focus on what makes a system observable.
- Proficient in scripting languages like Python, Bash, Ruby, etc.
- You have experience with continuous integration, deployment patterns, and tools like Jenkins or Argo CD.
- Proficient in deployment automation using tools like Ansible, Terraform, and version control.
- Experience with Docker, Docker Compose, and building optimized Dockerfiles.
- Experience running RDBMS. PostgreSQL experience is an added advantage.
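- Excellent troubleshooting skills. Experience resolving complex issues through a variety of troubleshooting protocols and processes.
- Eager to help your teammates, share your knowledge with them, and learn from them.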
- Outstanding written and verbal communication skills.
- Ability to work in a fast-paced, rapidly growing company and handle a wide variety of challenges, deadlines, and a diverse array of contacts.
- You must be a world-class individual contributor to thrive at Toptal. You will not be here just to tell other people what to do.