An e-commerce company with fragmented monitoring across 7 tools couldn't diagnose persistent performance issues. We implemented Datadog across infrastructure, APM, and network layers, uncovering a 1-second latency penalty on every request.
Industry
E-commerce
Location
Europe
Time
September 2025
A mid-size e-commerce company running on on-premise Kubernetes was experiencing persistent performance issues visible to end users. Response times for the same transactions varied wildly, from 200 ms to over 3,000 ms, but the team couldn’t pinpoint the root cause.
Their monitoring was fragmented across seven separate tools with no correlation between them.
This setup had four key problems: aggressive sampling on the APM side, which dropped critical performance data for cost reasons; no frontend instrumentation; no end-to-end transaction tracing from CloudFlare through to the backend; and partial ownership of the monitoring systems by the infrastructure vendor, meaning historical data would be lost if the relationship ended.
The infrastructure itself was a Kubernetes cluster running on Proxmox across 3 physical servers in a colocation facility, with CloudFlare as the CDN and TLS termination layer.
We conducted an infrastructure audit covering network analysis, Kubernetes configuration, backend performance, storage, and the existing monitoring architecture. Based on the findings, we implemented Datadog as a unified observability platform:
Infrastructure Monitoring: Kubernetes API metrics, database performance (SQL and Redis), host-level metrics from the Proxmox hypervisor, and storage performance monitoring for both block storage (ZFS over LVM) and object storage (MinIO).
APM (Application Performance Monitoring): Replaced Azure Application Insights with Datadog APM, providing full distributed tracing across the backend application without the aggressive sampling that was hiding performance issues.
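To illustrate why aggressive sampling can hide performance issues, here is a minimal sketch. It assumes uniform head-based sampling, where each trace is kept independently with a fixed probability; the function names and the example figures are hypothetical and are not taken from the client's configuration or from any Datadog API.

```python
def prob_all_dropped(slow_requests: int, sample_rate: float) -> float:
    """Probability that none of the slow requests is kept by the sampler.

    Assumes each trace is kept independently with probability sample_rate.
    """
    if slow_requests < 0:
        raise ValueError("slow_requests must be non-negative")
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return (1.0 - sample_rate) ** slow_requests


def expected_kept(slow_requests: int, sample_rate: float) -> float:
    """Expected number of slow requests that survive sampling."""
    return slow_requests * sample_rate


if __name__ == "__main__":
    # Hypothetical scenario: 50 slow requests in a monitoring window.
    for rate in (0.01, 0.10, 1.0):
        missed = prob_all_dropped(50, rate)
        kept = expected_kept(50, rate)
        print(f"sample rate {rate:.0%}: expect {kept:.1f} slow traces kept, "
              f"chance of keeping none = {missed:.1%}")
```

Under these assumptions, a low sample rate leaves a substantial chance that a whole window of slow requests produces no trace at all, while full tracing makes that chance zero.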
Network Observability: End-to-end traffic analysis from the on-premise Kubernetes cluster, through the hypervisor network layer, across the Nginx load balancer, and out to CloudFlare. This was the critical missing piece, as no previous tool was watching the full network path.
This project demonstrates our cloud observability services and Datadog consulting capabilities applied to a complex on-premise environment.
Within hours of the initial Datadog deployment, the team gained visibility they had never had before. The first and most significant finding was a latency penalty of roughly 1,000 ms on every single request between the colocation data centre and CloudFlare. That is a full second of network transit time, invisible to all previous monitoring tools because none of them was watching that specific path.
This discovery reframed the entire performance conversation. The issue wasn’t application code, Kubernetes configuration, or database queries. It was a network routing problem between the colocation provider and CloudFlare that could only be seen with correlated end-to-end observability.
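The reasoning behind this kind of finding can be sketched in a few lines: when each hop reports how long it spent on a request (including everything downstream of it), the difference between consecutive hops is time that neither side accounts for. The hop names and timings below are hypothetical illustrations, not measurements from the engagement, and the helper functions are our own rather than part of any monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopTiming:
    name: str
    duration_ms: float  # time seen at this hop, including all downstream hops


def unaccounted_time(hops: list[HopTiming]) -> dict[str, float]:
    """Return the time spent between each pair of consecutive hops.

    hops must be ordered from the outermost hop (closest to the user)
    to the innermost one.
    """
    gaps: dict[str, float] = {}
    for outer, inner in zip(hops, hops[1:]):
        gap = outer.duration_ms - inner.duration_ms
        if gap < 0:
            raise ValueError(f"{inner.name} reports more time than {outer.name}")
        gaps[f"{outer.name} -> {inner.name}"] = gap
    return gaps


def largest_gap(hops: list[HopTiming]) -> tuple[str, float]:
    """Return the segment with the most unaccounted time."""
    gaps = unaccounted_time(hops)
    if not gaps:
        raise ValueError("need at least two hops to compare")
    return max(gaps.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical timings for a single request, in milliseconds.
    request = [
        HopTiming("cdn_edge", 1250.0),
        HopTiming("load_balancer", 250.0),
        HopTiming("application", 200.0),
        HopTiming("database", 40.0),
    ]
    segment, gap_ms = largest_gap(request)
    print(f"largest unaccounted segment: {segment} ({gap_ms:.0f} ms)")
```

A tool that only sees the application or the database would report a healthy request in this example; the gap only becomes visible when timings from every hop are compared side by side.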
Additional findings from the audit included:
The engagement opened a broader conversation about the fundamental limitations of the on-premise setup for an e-commerce business that needs to scale rapidly for paid advertising campaigns. The client has since engaged with us on planning a migration to AWS using the AWS Migration Acceleration Program (MAP).
The Datadog implementation provided end-to-end visibility within hours, immediately surfacing a 1,000 ms network latency issue that was invisible to the previous fragmented monitoring setup. The client has since engaged in planning a full cloud migration to AWS.