Office Location: Lower Manhattan, NY - Hybrid

Accrete AI is a dynamic and innovative company focused on transforming the future of artificial intelligence. Accrete specializes in the construction of intelligent knowledge engines, i.e., systems that semantically unify a variety of sources across multiple modalities, learn over time from end-users, and provide a flexible interface for AI agents to automate high-value operations.

Accrete’s knowledge engines are licensed across government and private sectors and span a range of applications, including social media intelligence, supply chain risk intelligence, and IT change risk management. Our knowledge engines run on a proprietary AI platform that enables the configuration of intelligent agents for knowledge management and decision automation.

Role Overview

We are seeking a Senior DevOps Engineer with 8+ years of experience to join our growing engineering team. In this role, you will be responsible for designing, implementing, and maintaining our cloud infrastructure, CI/CD pipelines, and monitoring systems. You will collaborate closely with Engineering, Security, and Operations teams to ensure high availability, scalability, and security of our AI-driven platforms.

Responsibilities

  • Design, deploy, and manage cloud infrastructure (AWS) to support AI/ML applications.
  • Develop and maintain CI/CD pipelines for automated software deployment and testing.
  • Implement monitoring, logging, and alerting solutions to ensure system reliability and performance.
  • Manage containerized workloads with Docker and orchestrate them with Kubernetes for efficient application deployment.
  • Automate infrastructure provisioning using tools such as Terraform, Ansible, or CloudFormation.
  • Ensure security best practices in cloud infrastructure, networking, and application deployment.
  • Optimize system performance, scalability, and cost-efficiency through proactive improvements.
  • Collaborate with software engineers and data scientists to enhance development workflows and infrastructure.
  • Troubleshoot production issues and implement robust incident response strategies.

Requirements

  • 8+ years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Engineering roles.
  • Expertise in AWS.
  • Strong experience with Docker and with container orchestration using Kubernetes.
  • Hands-on experience with Terraform or CloudFormation for infrastructure as code (IaC).
  • Proficiency in CI/CD tools like Jenkins, GitLab CI/CD, GitHub Actions, or Argo CD.
  • Deep understanding of networking, security best practices, and cloud cost optimization.
  • Experience with monitoring and logging tools such as Prometheus, Grafana, ELK, or Datadog.
  • Proficiency in scripting and automation using Python, Bash, or Go.
  • Strong problem-solving skills with a proactive approach to infrastructure improvements.
  • Excellent collaboration and communication skills.

Preferred Qualifications

  • Experience with AI/ML workloads and infrastructure optimization for data-heavy applications.
  • Familiarity with service mesh technologies such as Istio or Linkerd.
  • Exposure to security and compliance frameworks such as SOC 2, ISO 27001, or NIST.
  • Certifications in AWS, Kubernetes (CKA/CKS), or Terraform.

Salary Range: $180k-$200k

Benefits

  • Comprehensive, competitive benefits: health, dental, vision, prescription, long- and short-term disability and life insurance, and a 401(k) (Traditional & Roth)
  • Flexible PTO & all U.S. federal holidays off
  • Daily catered lunch and a kitchen stocked with snacks and beverages
  • Company events, including happy hours and team-bonding activities

Accrete is an equal opportunity employer. We evaluate all applications without regard to sex, sexual orientation, race, color, religion, national origin, disability, protected Veteran status, age, or any other characteristic protected by law.