Daniel Shafer — Senior Site Reliability Engineer

Professional Summary

Senior Site Reliability Engineer with 15+ years at Fortune 500 companies including Apple, GoDaddy, and Twentieth Century Fox. Promoted to SRE Supervisor at GoDaddy in two years; led an enterprise migration from Icinga 2 to Prometheus, Thanos, and Grafana across Domains, Registrar, and Investors while supervising five engineers and remaining hands-on. Deep experience in observability, OpenStack, AWS, Ansible, Terraform, Kubernetes, and Python, with a strong track record in sprint leadership, mentoring, performance reviews, and bridging SRE and development teams.

Primary goal is relocating to Bucharest, Romania: seeking hybrid roles there to integrate locally and build long-term roots in the city. Flexible on move date and able to relocate sooner for the right opportunity. Open to fully remote roles based outside Bucharest when the fit is strong. Available for immediate start.

Recognition

Featured in GoDaddy’s Pride and Dedication: Meet Daniel Shafer (GoDaddy Life, 2023)

Core Skills

Observability: Prometheus, Thanos, Grafana, Datadog, Icinga
Cloud Platforms: AWS, OpenStack, CloudStack
Development: Python, Django, Golang, PHP, APIs
Automation: Ansible, Chef, Terraform
Containerization: Docker, Kubernetes
Linux: RHEL, Ubuntu, CentOS
CI/CD & Incident Response: Pipelines, On-call, SLO/SLI
Team Leadership: Mentoring, Performance Reviews, Sprint Planning

Professional Experience

Apple (ASE Cloud Compute Team)

Site Reliability Engineer (Contract)

Oct 2024 – Mar 2025

Enhanced infrastructure monitoring for CloudStack by implementing Apple's internal monitoring tools, improving system visibility and incident response.
Developed comprehensive documentation and automated onboarding processes for new engineers.
Designed and implemented an end-to-end testing framework for cloud components.

GoDaddy

Supervisor, Site Reliability Engineering (DRI)

Jul 2022 – Jun 2024

Led the enterprise migration of Domains, Registrar, and Investors monitoring from Icinga 2 to Prometheus, Thanos, and Grafana: secured senior management buy-in, built the infrastructure on OpenStack with Ansible, and reduced alert fatigue while improving visibility.
Supervised a team of 5 SREs supporting DRI infrastructure, conducting performance reviews, mentoring engineers, and driving promotions.
Managed sprint planning, daily stand-ups, retros, and Jira workflow while bridging SRE and development teams.

Site Reliability Engineer II

Jul 2020 – Jul 2022

Joined as a contractor and converted to a full-time employee after 18 months.
Improved infrastructure performance by analyzing system metrics and implementing optimization strategies, reducing response times by 35%.
Automated repetitive tasks for the Production Engineering team using Python and Ansible, increasing team efficiency by 50%.
Participated in a 24/7 on-call rotation, helping maintain 99.95% uptime and rapid incident response within SLA targets.

A10 Networks Inc

Python Developer

Jan 2019 – Dec 2019

Developed automated testing environments for OpenStack infrastructure.
Participated in Agile sprint planning and conducted code reviews for high-quality software delivery.
Enhanced server reliability through infrastructure automation and monitoring solutions.

Kount

Site Reliability Engineer

Apr 2018 – Oct 2018

Led migration of critical payment processing code from Python 2 to Python 3, ensuring compatibility before EOL while maintaining 99.99% service availability for fraud detection systems.
Upgraded fleet of Ubuntu servers to newer LTS versions, implementing security patches and enhancements while ensuring strict PCI compliance for financial transaction processing.
Established Python best practices through technical workshops and code reviews, mentoring junior engineers and improving team code quality by standardizing development patterns.

MediaMath

Site Reliability Engineer

May 2017 – Apr 2018

Managed infrastructure hosted on AWS and in on-premises data centers using Chef and Ansible.
Handled Linux system administration and engineering across hybrid environments.
Collaborated with cross-functional teams and participated in on-call rotations to resolve incidents promptly.

Twentieth Century Fox

DevOps Engineer

Jul 2015 – Feb 2017

Managed vendor requests for Apache configurations, databases, S3 buckets, and Git repositories.
Automated tasks using Python scripts for GitHub, AWS, and Splunk.
Monitored infrastructure with Datadog, New Relic, Splunk, and AWS for optimal performance.

Mirantis

OpenStack Engineer

Feb 2015 – Jun 2015

Enforced quality standards for Mirantis OpenStack by reviewing product requirements and participating in design discussions.
Designed and automated test cases, maintained CI infrastructure, and performed reliability and performance stress testing.
Implemented benchmarking tools and supported the Fuel engineering team.

HP Helion

Cloud Data Center NOC Engineer

Feb 2014 – Jan 2015

Monitored thousands of servers in an OpenStack environment, rapidly responding to alerts and incidents.
Coordinated with data centers on maintenance and repairs in a Linux environment.
Collaborated with service teams to diagnose and resolve problems across the platform.

HostGator

Linux System Administrator

Apr 2013 – Jan 2014

Managed support queues, handling escalated tickets and server issues across thousands of CentOS servers with cPanel.
Conducted software installations and content restoration for customers.
Handled server reboots and maintenance through the data center management interface.

Military Experience

United States Army

Combat Engineer (12B), Wheeled Mechanic (91B)

Jan 2008 – Oct 2012

Deployed to Iraq (2010–2011) in support of Operation Iraqi Freedom and Operation New Dawn.

Decorations & Awards

Combat Action Badge
Army Commendation Medal
Iraq Campaign Medal (Campaign Star)
Global War on Terrorism Service Medal
Armed Forces Reserve Medal (M Device)
National Defense Service Medal
Overseas Service Ribbon

Honorably discharged