Why did EMQX Tables choose GreptimeDB over traditional time-series databases?

EMQX Tables selected GreptimeDB due to its design optimized for modern IoT challenges: cost-effective object storage as primary storage, unified data model supporting metrics, logs, and traces, open-source distributed architecture, and compatibility with popular protocols like InfluxDB Line Protocol and PromQL. These features address scalability, cost, and observability needs better than traditional databases such as InfluxDB and TimescaleDB.

What are the main challenges that IoT data infrastructure faces today?

IoT data infrastructure faces three key challenges: massive data explosion with flat or limited budgets requiring cost-efficient storage; the need to unify metrics, logs, and traces for complete observability instead of managing multiple siloed systems; and elastic scaling to handle fluctuating data volumes without expensive upfront capacity planning.

How does GreptimeDB leverage object storage to reduce costs?

GreptimeDB is designed from the ground up to use object storage like AWS S3 as its primary storage layer, allowing for significantly lower storage costs compared to traditional SSD or block storage. This design enables affordable long-term data retention, multi-tier caching to mitigate latency, and lowers total cost of ownership for massive IoT datasets.

What advantages does GreptimeDB offer regarding observability data?

Unlike many traditional time-series databases that focus solely on metrics, GreptimeDB supports a unified data model that includes metrics, logs, and traces within the same engine. This integration simplifies troubleshooting by enabling single SQL queries to correlate diverse observability data, eliminating the need to operate and synchronize multiple systems.

How does GreptimeDB support scalability and deployment flexibility for IoT platforms?

GreptimeDB features an open-source distributed cluster architecture from day one, unlike competitors requiring enterprise licenses for clustering. This ensures no vendor lock-in and consistent architecture across open-source and managed deployments. It supports elastic scaling to accommodate fluctuating IoT data volumes, allowing users to scale on demand and pay only for resources used.

Why EMQX Tables Chose GreptimeDB: Rethinking Time-Series Databases for the IoT

EMQX Cloud Team

Dec 23, 2025 | Product

EMQ recently launched EMQX Tables, a fully managed time-series database service built into EMQX Cloud. Among all the time-series databases available, EMQX Cloud chose GreptimeDB as the underlying engine.

Behind this decision lies a fundamental question: What kind of time-series database does IoT actually need?

The traditional answer is InfluxDB or TimescaleDB. Both are solid products that solved their era's problems. But IoT data infrastructure requirements are changing, and these changes are exactly what GreptimeDB was designed to address.

Three New Challenges in IoT Data

Challenge 1: Data Explosion, Flat Budgets

A decade ago, a mid-sized IoT project might have had a few thousand devices generating GB-scale data daily. Today, the same project might have hundreds of thousands of devices producing TB or even PB-scale data.

But infrastructure budgets haven't grown proportionally. Storage cost has become the primary factor in time-series database selection.

Most traditional time-series databases were designed in the early days of cloud computing and default to local SSDs or block storage (like AWS EBS). GreptimeDB, by contrast, was designed from the ground up with object storage (like AWS S3) as its primary storage layer. Using AWS as an example:

Storage Type	Price ($/GB/month)	Annual Cost for 10 TB
EBS gp3 (SSD)	$0.08	$9,600
S3 Standard	$0.023	$2,760
S3 Glacier	$0.004	$480

Compared with EBS, that's a difference of roughly 3.5x (S3 Standard) to 20x (S3 Glacier). At PB scale, this gap determines project viability.
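The arithmetic behind the table can be reproduced in a few lines. This is a minimal sketch that takes the per-GB prices listed above as given and assumes 10 TB is counted as 10,000 GB (decimal terabytes), which is what the table's totals imply. The function and variable names are illustrative only, and real bills also include request, retrieval, and transfer charges that are not modeled here.

```python
# Per-GB monthly prices copied from the table above (USD).
PRICES_PER_GB_MONTH = {
    "EBS gp3 (SSD)": 0.08,
    "S3 Standard": 0.023,
    "S3 Glacier": 0.004,
}

GB_PER_TB = 1000  # assumption: decimal terabytes, which matches the table's totals


def annual_cost(price_per_gb_month: float, terabytes: float) -> float:
    """Return the cost in USD of storing `terabytes` of data for 12 months."""
    return round(price_per_gb_month * terabytes * GB_PER_TB * 12, 2)


costs = {name: annual_cost(price, 10) for name, price in PRICES_PER_GB_MONTH.items()}
ebs = costs["EBS gp3 (SSD)"]
for name, cost in costs.items():
    # The ratio shows how many times more expensive EBS is than each tier.
    print(f"{name}: ${cost:,.0f}/year, EBS costs {ebs / cost:.1f}x as much")
```

Changing the `10` to `1000` gives the same comparison at PB scale, where the absolute gap rather than the ratio is what drives the budget decision.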

Challenge 2: Beyond Metrics: Logs and Traces Too

Early IoT projects focused on device metrics, like temperature, pressure, and status codes. But modern IoT observability needs go further:

Metrics: Device operational status
Logs: Device events, error messages
Traces: Request chains, call relationships

The traditional solution: deploy multiple systems. InfluxDB for metrics, Elasticsearch for logs, Jaeger for traces. Three systems mean triple the operational cost, and data silos make troubleshooting painful. When a device fails, you're jumping between three systems, manually correlating timestamps.

Challenge 3: Elastic Scaling, Not Capacity Planning

IoT data volumes fluctuate significantly. Product launches, promotions, and seasonal factors cause data spikes.

Traditional databases require upfront capacity planning with reserved resources. Over-provision and you waste money; under-provision and services degrade. Modern IoT infrastructure needs to use databases like utilities: scale on demand, pay for what you use.

Why Traditional Solutions Struggle

InfluxDB: Great Start, Uncertain Direction

InfluxDB pioneered the time-series database space and defined industry standards (like Line Protocol). But its evolution has confused users:

Version fragmentation: v1, v2, and v3 are essentially three different products. v1 uses InfluxQL, v2 pushed Flux, v3 returns to SQL. Each upgrade is a migration project.
Flux's sunset: The Flux language that InfluxDB v2 championed has been officially deprecated. Any investment in learning Flux, or in existing Flux code, now needs to be reevaluated.
Open-source ceiling: InfluxDB open-source only supports single-node deployment; clustering requires enterprise licensing. At large scale, this is a hard constraint.
Metrics only: InfluxDB focuses on time-series metrics and doesn't natively support logs or traces. For complete observability, you still need Elasticsearch and Jaeger.

InfluxDB still works for small-to-medium scale, metrics-only scenarios. But for IoT platforms needing long-term evolution, version uncertainty and feature limitations are risks.

TimescaleDB: PostgreSQL's Maturity and Limits

TimescaleDB builds on PostgreSQL, which is its biggest advantage: a mature ecosystem, familiar SQL, and rich tooling. If your team knows PostgreSQL well, TimescaleDB's learning curve is nearly zero.

But PostgreSQL's architecture brings constraints:

Scalability requires Enterprise: Open-source is mainly vertical scaling. Distributed Hypertables are enterprise-only.
Cloud-native isn't native: PostgreSQL was designed for on-premises deployment. It runs in the cloud, but object storage support is limited, so it can't fully leverage cloud storage cost advantages.
Single data model: Focused on time-series; logs and traces need other systems.

TimescaleDB works well for "PostgreSQL-first" teams, or where time-series is just a small application feature. But for IoT platforms where time-series is core, its architecture isn't optimal.

GreptimeDB's Design Choices

GreptimeDB is young. That's both a disadvantage (less mature ecosystem) and an advantage (designed from scratch for current needs).

Design Choice 1: Object Storage First

From day one, GreptimeDB uses object storage (S3/GCS/MinIO) as the primary storage layer, not a feature bolted on later. Data is stored in Apache Parquet columnar format, naturally suited for analytical queries.

Object storage latency is addressed through multi-tier caching: a write cache handles hot data, a read cache accelerates historical queries, and a metadata cache optimizes query planning.
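To make the caching idea concrete, here is a minimal sketch of a single read-through cache tier in front of a slow store. It is not GreptimeDB's implementation: the class name, the LRU eviction policy, the capacity, and the in-memory dictionary standing in for an object-store bucket are all assumptions chosen for illustration.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Small LRU cache placed in front of a slow fetch function."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int) -> None:
        self._fetch = fetch  # stands in for an object-store GET
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self._fetch(key)  # slow path: go to object storage
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


# Hypothetical in-memory "bucket" used in place of a real object store.
fake_bucket = {"region-1/file-a.parquet": b"aaa", "region-1/file-b.parquet": b"bbb"}
cache = ReadThroughCache(fetch=fake_bucket.__getitem__, capacity=1)

cache.get("region-1/file-a.parquet")  # miss: fetched from the bucket
cache.get("region-1/file-a.parquet")  # hit: served from the cache
cache.get("region-1/file-b.parquet")  # miss: evicts file-a because capacity is 1
print(cache.hits, cache.misses)
```

Only repeated reads of the same object avoid the slow path, which is why a cache like this helps most with hot or frequently revisited data.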

Beyond cost savings, object storage enables extended data retention. Traditional solutions with expensive storage often retain only months of data. With object storage, you can affordably keep years of historical data. This matters for:

Data analysis: Long-term trend analysis, seasonal pattern discovery
Compliance: Meeting regulatory data retention requirements
Incident investigation: Tracing historical failures, reproducing issue scenarios

If you can afford to store it, you can actually use it.

Storage architecture details coming in Part 2: "Cloud-Native Storage Engine."

Design Choice 2: Performance-to-Cost Ratio

Object storage is the foundation, but what users ultimately care about is the performance-to-cost ratio: how much performance can the same budget deliver?

In benchmark tests under identical hardware conditions, GreptimeDB shows significant advantages:

Write throughput: Multiple times higher than InfluxDB open-source
Query latency: Lower average response time, more stable P99 tail latency
Reliability: All queries complete successfully, no timeouts

Combined with lower storage costs (S3 Standard at $0.023/GB/month vs. EBS gp3 at $0.08/GB/month), GreptimeDB delivers more data, faster queries, and longer retention on the same budget.

We'll provide a complete performance benchmark report in Part 4 "The Secret to Faster IoT Queries," including detailed comparisons of write throughput, query latency, and resource consumption.

Design Choice 3: Unified Data Model

GreptimeDB isn't just a time-series database; it can also serve as an observability database. Metrics, Logs, and Traces are stored and queried in the same engine.

This isn't just about fewer systems to operate. It's about connecting the data. When a device fails, one SQL query retrieves metric spikes, related logs, and call traces, with no manual piecing together across three systems.
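The kind of single-query correlation described above can be sketched as follows. This example uses Python's built-in SQLite as a stand-in so that it runs anywhere; it is not GreptimeDB, and GreptimeDB's SQL dialect and table layout will differ. The table names, column names, the 90-degree threshold, and the 30-second lookback are all hypothetical, and traces are left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE device_metrics (ts INTEGER, device_id TEXT, temperature REAL);
    CREATE TABLE device_logs (ts INTEGER, device_id TEXT, level TEXT, message TEXT);
    """
)
conn.executemany(
    "INSERT INTO device_metrics VALUES (?, ?, ?)",
    [(1000, "dev-1", 41.0), (1060, "dev-1", 97.5), (1060, "dev-2", 40.2)],
)
conn.executemany(
    "INSERT INTO device_logs VALUES (?, ?, ?, ?)",
    [
        (1010, "dev-1", "INFO", "heartbeat ok"),
        (1055, "dev-1", "ERROR", "fan stalled"),
        (1058, "dev-2", "ERROR", "sensor timeout"),
    ],
)

# One query: find temperature spikes and the error logs that the same device
# emitted in the 30 seconds leading up to each spike.
rows = conn.execute(
    """
    SELECT m.device_id, m.ts, m.temperature, l.ts, l.message
    FROM device_metrics AS m
    JOIN device_logs AS l
      ON l.device_id = m.device_id
     AND l.ts BETWEEN m.ts - 30 AND m.ts
    WHERE m.temperature > 90 AND l.level = 'ERROR'
    ORDER BY m.ts, l.ts
    """
).fetchall()

for row in rows:
    print(row)
```

The join on device and time window does the work that would otherwise be done by hand across separate metrics and logging systems.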

By contrast, InfluxDB only handles Metrics. TimescaleDB can store arbitrary data but lacks native optimization for logs and traces. GreptimeDB is designed from storage format to query engine for observability use cases.

Design Choice 4: Open-Source Means Distributed

This is a key differentiator from InfluxDB and TimescaleDB: the open-source version includes full distributed cluster support.

InfluxDB clustering requires Enterprise licensing. TimescaleDB's Distributed Hypertables are also enterprise-only. GreptimeDB's distributed architecture has been open-source from day one—Frontend, Datanode, and Metasrv components are all available under Apache 2.0.

This means:

No vendor lock-in: You have full control over your data and infrastructure, with flexibility to choose your deployment model based on business needs
Architectural consistency: Whether you choose open-source deployment or managed services (like GreptimeCloud or EMQX Tables), the underlying architecture is identical—migration is frictionless

Distributed architecture details coming in Part 3: "Supporting Million-Device Data Ingestion."

Design Choice 5: Protocol Compatibility and PromQL Support

Migration cost is a hidden cost in database selection. GreptimeDB supports:

InfluxDB Line Protocol: Existing InfluxDB/Telegraf pipelines work unchanged
Prometheus Remote Write: Seamless Prometheus ecosystem integration
OpenTelemetry (OTLP): Cloud-native observability standard
SQL + PromQL: Query languages covering both major camps

PromQL support deserves special mention. If your monitoring stack is built on Prometheus, existing Grafana dashboards and alert rules migrate directly to GreptimeDB with no rewrites needed. InfluxDB doesn't support PromQL, so migrating from the Prometheus ecosystem to InfluxDB requires rewriting all queries.

Migrating from InfluxDB? Keep using Line Protocol for writes while gradually moving queries from InfluxQL to SQL. Migration is incremental, not big-bang.
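For readers who have not seen it, Line Protocol is a plain-text format: a measurement name, optional comma-separated tags, one or more fields, and a timestamp. The sketch below builds one such line by hand to show what an existing pipeline sends. It is an illustration, not a client library: the helper names and sample values are hypothetical, the timestamp is assumed to be in nanoseconds, and no network call is made because the write endpoint and authentication depend on your deployment.

```python
def _escape(value: str, chars: str) -> str:
    """Backslash-escape each character of `chars` that appears in `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def format_field(value) -> str:
    """Render a field value using Line Protocol's type conventions."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; escape backslashes first, then quotes.
    return '"' + _escape(str(value), '\\"') + '"'


def to_line(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Build one Line Protocol line: measurement,tags fields timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys give a stable, canonical order
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + format_field(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {ts_ns}"


line = to_line(
    "sensor",
    {"device": "dev-1", "site": "plant a"},
    {"temperature": 21.5, "ok": True, "count": 3},
    1700000000000000000,
)
print(line)
```

Because the wire format is this simple and widely implemented, a database that accepts it can sit behind existing writers while the query side is migrated separately.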

GreptimeDB also provides capabilities that make post-migration evolution easier:

Schemaless writes: No need to define table schemas upfront—tables and columns are created automatically on write, ideal for IoT scenarios with diverse device types
Range Query: Native SQL support for time-window aggregation, similar to PromQL's Range Selector, with clean unified syntax
Flow Engine: Built-in stream processing—define continuous aggregation with SQL, compute as data arrives
Trigger Alerting: Define alerting rules with SQL, send notifications via Webhook, compatible with Prometheus Alertmanager
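As a rough picture of what time-window aggregation computes, here is a pure-Python sketch of a fixed (tumbling) window average per device. It does not use GreptimeDB's Range Query syntax or Flow Engine, where the same idea is expressed in SQL; the function name, the 60-second window, and the sample points are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def window_avg(points, window_s):
    """Average values per (device, window start).

    `points` is an iterable of (timestamp_seconds, device_id, value) tuples.
    Windows are aligned to multiples of `window_s`.
    """
    buckets = defaultdict(list)
    for ts, device, value in points:
        window_start = ts - ts % window_s  # align the timestamp to its window
        buckets[(device, window_start)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


points = [
    (0, "dev-1", 20.0),
    (30, "dev-1", 22.0),
    (65, "dev-1", 30.0),
    (10, "dev-2", 18.0),
]
result = window_avg(points, 60)
for (device, start), avg in result.items():
    print(device, start, avg)
```

A continuous aggregation applies the same bucketing incrementally as data arrives instead of rescanning stored points at query time.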

Why EMQX + GreptimeDB

EMQX is a leading global MQTT broker; GreptimeDB is a next-generation observability database. The logic of combining them is clear:

EMQX solves connectivity: Million-device MQTT access, message routing, and protocol conversion.
GreptimeDB solves storage: Efficient storage of massive time-series data, real-time queries, long-term archival.
EMQX Tables solves integration: Native MQTT integration, automatic schema inference, unified management console.

For IoT developers, the complete pipeline from device connection to data analysis runs on a single platform.

Series Preview

This is Part 1 of the "How GreptimeDB Powers EMQX Tables" series. Upcoming articles dive into technical details:

Part 2: Cloud-Native Storage Engine — Parquet format, multi-tier caching, compression strategies
Part 3: Distributed Architecture — Supporting million-device concurrent writes
Part 4: Query Optimizer — Complete performance benchmarks + the technical principles behind faster IoT queries

Looking for an IoT time-series solution? Try EMQX Tables for a quick hands-on experience, or contact our experts for a customized solution.

Watch the video to learn more about EMQX Tables: https://www.emqx.com/en/events/introducing-emqx-tables
