Professional Summary
Senior Site Reliability Engineer with 15+ years at Fortune 500 companies including Apple, GoDaddy, and 20th Century Fox. Promoted to SRE Supervisor at GoDaddy in two years; led an enterprise monitoring migration from Icinga 2 to Prometheus, Thanos, and Grafana across Domains, Registrar, and Investors while supervising five engineers and remaining hands-on. Relocating to Bucharest in February/March 2027 and seeking hybrid roles there to integrate locally; open to fully remote roles for the right opportunity outside the city. Deep experience in observability, OpenStack, AWS, Ansible, Terraform, Kubernetes, and Python, with a strong track record in sprint leadership, mentoring, performance reviews, and bridging SRE and development teams. Available for immediate start.
Core Skills
- Observability: Prometheus, Thanos, Grafana, Datadog, Icinga
- Cloud Platforms: AWS, OpenStack, CloudStack
- Development: Python, Django, Golang, PHP, APIs
- Automation: Ansible, Chef, Terraform
- Containerization: Docker, Kubernetes
- Linux: RHEL, Ubuntu, CentOS
- CI/CD & Incident Response: Pipelines, On-call, SLO/SLI
- Team Leadership: Mentoring, Performance Reviews, Sprint Planning
Professional Experience
- Enhanced infrastructure monitoring for CloudStack by implementing Apple's internal monitoring tools, improving system visibility and incident response.
- Developed comprehensive documentation and automated onboarding processes for new engineers.
- Designed and implemented an end-to-end testing framework for cloud components.
Supervisor, Site Reliability Engineering (DRI)
July 2022 – June 2024
- Led the enterprise migration of Domains, Registrar, and Investors monitoring from Icinga 2 to Prometheus, Thanos, and Grafana: secured senior management buy-in, built the infrastructure on OpenStack with Ansible, and reduced alert fatigue while improving visibility.
- Supervised a team of 5 SREs supporting DRI infrastructure, conducting performance reviews, mentoring engineers, and driving promotions.
- Managed sprint planning, daily stand-ups, retros, and Jira workflow while bridging SRE and development teams.
Site Reliability Engineer II
July 2020 – July 2022
- Joined as a contractor and converted to full-time employee after 18 months.
- Improved infrastructure performance by analyzing system metrics and implementing optimization strategies, reducing response times by 35%.
- Automated repetitive tasks for the Production Engineering team using Python and Ansible, increasing team efficiency by 50%.
- Participated in a 24/7 on-call rotation, maintaining 99.95% uptime and responding to incidents within SLA targets.
- Developed automated testing environments for OpenStack infrastructure.
- Participated in Agile sprint planning and conducted code reviews to maintain software quality.
- Enhanced server reliability through infrastructure automation and monitoring solutions.
- Led migration of critical payment processing code from Python 2 to Python 3, ensuring compatibility before EOL while maintaining 99.99% service availability for fraud detection systems.
- Upgraded fleet of Ubuntu servers to newer LTS versions, implementing security patches and enhancements while ensuring strict PCI compliance for financial transaction processing.
- Established Python best practices through technical workshops and code reviews, mentoring junior engineers and improving team code quality by standardizing development patterns.
- Managed infrastructure hosted on AWS and in on-premises data centers using Chef and Ansible.
- Handled Linux system administration and engineering across hybrid environments.
- Collaborated with cross-functional teams and participated in on-call rotations to resolve incidents promptly.
- Managed vendor requests for Apache configurations, databases, S3 buckets, and Git repositories.
- Automated tasks using Python scripts for GitHub, AWS, and Splunk.
- Monitored infrastructure with Datadog, New Relic, Splunk, and AWS for optimal performance.
- Enforced quality standards for Mirantis OpenStack by reviewing product requirements and participating in design discussions.
- Designed and automated test cases, maintained CI infrastructure, and performed reliability and performance stress testing.
- Implemented benchmarking tools and supported the Fuel engineering team.
- Monitored thousands of servers in an OpenStack environment, rapidly responding to alerts and incidents.
- Coordinated with data centers on maintenance and repairs in a Linux environment.
- Collaborated with service teams to diagnose and resolve problems across the platform.
- Managed support queues, handling escalated tickets and server issues across thousands of CentOS servers with cPanel.
- Conducted software installations and content restoration for customers.
- Handled server reboots and maintenance through the data center management interface.
Military Experience
Deployed to Iraq (2010–2011) in support of Operation Iraqi Freedom and Operation New Dawn.
Decorations & Awards
- Combat Action Badge
- Army Commendation Medal
- Iraq Campaign Medal (Campaign Star)
- Global War on Terrorism Service Medal
- Armed Forces Reserve Medal (M Device)
- National Defense Service Medal
- Overseas Service Ribbon
Honorably discharged