Scaling Connected Device Management to 10 Million Concurrent Connections with EMQX

Executive Summary
A leading global technology manufacturer operating a fleet of 50 million devices needed a mission-critical MQTT infrastructure to support remote device management and technical support operations. Facing scalability limits with cloud-native IoT solutions and complexity challenges with alternative brokers, the organization deployed EMQX Enterprise on Kubernetes to handle 10 million concurrent connections with sub-second latency and QoS 2 guarantees. The result: unified device management across North America with plans to scale globally.
The Evolution of Connected Device Management
For technology manufacturers operating at a global scale, connected device fleets are now core infrastructure. Managing millions of endpoints in the field requires real-time bidirectional communication between devices and cloud backends to enable:
- Proactive device health monitoring through telemetry and diagnostic data
- Secure remote updates for firmware, system software, and security patches
- Technical support integration enabling rapid issue detection and resolution
- Device compliance verification across distributed user bases
The Challenge
This organization's device support strategy generated unprecedented scale requirements:
Scale and Complexity
- 10 million concurrent connections across a global device fleet
- Over 5 million messages daily with packet sizes up to 64KB
- Daily data throughput reaching 128GB across distributed endpoints
- Complex telemetry payloads (40KB+ per diagnostic agent scan) containing detailed hardware inventory and diagnostic information
Platform Limitations
The organization initially relied on 50+ public cloud IoT hub instances for cloud connectivity. While functional, this approach created operational complexity and limited flexibility. Previous evaluation of alternative brokers revealed scalability ceilings that couldn't support production requirements.
Protocol Requirements
Support operations demanded both QoS 0 and QoS 2 reliability guarantees, particularly critical for cloud-to-device commands requiring guaranteed delivery. The managed IoT hub's QoS 0-only limitation forced workarounds that added latency and complexity to update and remote-action workflows.
Infrastructure Constraints
The organization's Kubernetes-first strategy required:
- Isolated namespace deployments for production workloads
- Proven clustering for high availability and disaster recovery
- Seamless integration with existing Kafka data pipelines
- Multi-region expansion capability for APAC and EMEA operations
Why MQTT for Connected Device Management
MQTT emerged as the ideal foundation because it addresses all requirements that public cloud IoT hubs struggled to deliver at scale:
- Lightweight protocol minimizes bandwidth for bandwidth-constrained connections
- Bidirectional messaging enables both device-to-cloud telemetry and cloud-to-device commands in a single connection
- Flexible QoS levels (0, 1, 2) allow optimization for different message types without protocol switching
- Industry standardization ensures interoperability across hardware platforms and support systems
- Persistent sessions enable offline buffering, which is critical when devices experience temporary connectivity loss
The EMQX Solution: Scalable Architecture for Global Device Management
To address these challenges, the organization deployed EMQX Enterprise in a Kubernetes-native architecture designed for massive, global device fleets.
Cloud-Native Infrastructure for Global Scale
- Kubernetes-Native Management: Utilizing the EMQX Operator to automate cluster provisioning, scaling, and lifecycle management. This eliminates orchestration complexity while ensuring the system can handle 10 million concurrent connections with predictable performance.
- High Availability & Multi-Region Strategy: The architecture features active clusters with dedicated HA/DR failover. It is designed as Multi-Region Ready, leveraging Cluster Linking to support planned expansion into APAC and EMEA through cross-region message routing and regional failover.
Hardened Enterprise Security
- Hardware-Rooted Authentication: mTLS with hardware-backed certificates ensures every endpoint is authenticated at the silicon level, preventing device cloning and credential theft.
- Granular Access Control: Comprehensive RBAC and Topic-based ACLs ensure that devices can only interact with their specific management topics, preventing cross-tenant data leaks.
Real-Time Data Integration & Pipeline
- Real-Time Data Streaming: A pre-built Kafka Bridge streams device telemetry and support events directly into Kafka topics for analytics and customer portals.
- System Synchronization: Supports bi-directional syncing with on-premises databases for internal reporting and historical analysis, alongside a Rule Engine for dynamic message filtering and enrichment based on device profiles.
Reliability & Efficiency for Critical Operations
- Guaranteed Message Delivery: Full support for MQTT 5.0 and QoS 2 (Exactly Once) delivery, which is essential for the integrity of security patches and software updates.
- Durable Sessions: Automatic message buffering prevents data loss during network transitions, ensuring telemetry is captured even if a device temporarily loses its connection.
- Optimized Workload Distribution: Using Shared Subscriptions to load-balance command distribution across multiple backend support systems, preventing processing bottlenecks.
Results and Business Value
The migration to EMQX Enterprise transformed the organization’s device management from a fragmented infrastructure into a unified, high-performance backbone.
Operational Excellence
- Infrastructure Simplified: Consolidated 50+ disparate IoT Hub instances into a single, unified management plane.
- Global Readiness: Established a blueprint architecture for seamless expansion into APAC and EMEA markets without redesign.
- Carrier-Grade Reliability: Guaranteed 99.99% uptime with automated failover, meeting the highest service level agreements.
Technical Performance
- Zero-Loss Updates: Achieved exactly-once (QoS 2) delivery for critical patches, eliminating custom retry logic and edge-case failures.
- Sub-second latency: Direct MQTT connections outperform REST-based IoT hub alternatives
- Scalable data pipelines: Kafka integration enables real-time analytics on 128GB daily throughput without custom ETL
Cost Optimization
- TCO Reduction: Eliminated surprise billing from usage-based cloud models with predictable, transparent scaling costs.
- Lean Operations: Leveraged Kubernetes automation to slash manual monitoring and custom maintenance overhead.
Conclusion
This case demonstrates EMQX Enterprise's ability to serve as a global-scale mission-critical infrastructure for massive connected device fleets.
With QoS 2 reliability, horizontal scalability to 10 million concurrent connections, and seamless Kafka integration, EMQX helped this tech manufacturer consolidate infrastructure, boost reliability, and support worldwide growth while cutting operational complexity versus public cloud alternatives and proprietary brokers.