The Apache Foundation promotes Apache IoTDB to Top-Level Project

Apache IoTDB entered incubation at the Apache Software Foundation in 2018 and grew out of a research project at Tsinghua University in Beijing, China. The open source foundation presents it as a solution for large-scale IoT and IIoT projects.


“The Internet of Things, especially the Industrial Internet of Things (IIoT), has turned the world upside down with unimaginable amounts of data,” said Xiangdong Huang, Vice President of Apache IoTDB, welcoming the project's Top-Level status in a press release. “Until now, relational and key-value databases have struggled to meet the needs of IoT data management. Apache IoTDB is the missing link between today's IoT data and IoT applications, and it redefines how IoT data is managed both in the cloud and at the edge,” he promises.

Apache IoTDB, a database for AIoT

To meet the demands of IoT data, Apache IoTDB is built as a time series database, written in Java (and built with Maven), that stores timestamped data by column. The project's managers promise low deployment costs and high compatibility with the Apache big data ecosystem, including Flink, Hadoop, Spark and Grafana, but also PLC4X, a universal protocol adapter for PLCs, and Apache StreamPipes, a self-service analytics toolbox dedicated to IIoT.
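To make “storing timestamped data by column” concrete, here is a minimal sketch: each series keeps its timestamps and its values in two parallel primitive arrays rather than a list of row objects, so a time-range aggregate only scans tightly packed numbers. The `ColumnarSeries` class is hypothetical and is not IoTDB's TSFile writer.

```java
import java.util.Arrays;

/**
 * Column-oriented storage for one time series: timestamps and values live in
 * two parallel primitive arrays instead of an array of (time, value) objects.
 * Hypothetical sketch, not IoTDB's actual TSFile writer.
 */
final class ColumnarSeries {
    private long[] timestamps = new long[8];
    private double[] values = new double[8];
    private int size = 0;

    /** Appends one point; assumes callers write in increasing time order. */
    void append(long timestamp, double value) {
        if (size == timestamps.length) {
            // Grow both columns together so they stay aligned index by index.
            timestamps = Arrays.copyOf(timestamps, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        timestamps[size] = timestamp;
        values[size] = value;
        size++;
    }

    int size() {
        return size;
    }

    /** Averages the values whose timestamp lies in [from, to); NaN if none. */
    double average(long from, long to) {
        double sum = 0;
        int count = 0;
        for (int i = 0; i < size; i++) {
            if (timestamps[i] >= from && timestamps[i] < to) {
                sum += values[i];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        ColumnarSeries temperature = new ColumnarSeries();
        for (int i = 0; i < 10; i++) {
            temperature.append(1_000L * i, 20.0 + i);  // one reading per second
        }
        System.out.println(temperature.average(0, 5_000));
    }
}
```

Keeping the timestamp column separate is also what lets a format apply a dedicated encoding to it, independently of the value column.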

“Apache IoTDB is an open source project and an innovation in software technology designed to meet the needs of AIoT/big data applications,” said Professor Jianmin Wang, dean of the School of Software at Tsinghua University, who made the original decision to donate the project to the ASF.

IoTDB can also be connected to analysis and machine learning tools such as MATLAB.

Like any time series database, Apache IoTDB offers “very fast” read and write access and “efficient” data compression, along with a tree-based storage scheme (a log-structured merge tree, or LSM) capable of handling complex metadata.
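For readers unfamiliar with LSM designs, the toy store below shows the core idea under simplifying assumptions: writes go to a sorted in-memory table, which is frozen into an immutable sorted run once it fills up (standing in for an on-disk file), and reads check the newest data first. The `ToyLsm` class and its flush threshold are hypothetical; IoTDB's real storage engine is far more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Toy log-structured merge (LSM) store for one series, keyed by timestamp.
 * Writes land in a sorted in-memory table (the memtable); once it holds
 * memtableLimit entries it is frozen into a sorted run, which stands in
 * for an immutable on-disk file. Reads consult the memtable first, then
 * the runs from newest to oldest. Hypothetical sketch, not IoTDB's engine.
 */
final class ToyLsm {
    private final int memtableLimit;
    private TreeMap<Long, Double> memtable = new TreeMap<>();
    // Frozen runs, oldest first; only compact() ever rewrites them.
    private final List<TreeMap<Long, Double>> runs = new ArrayList<>();

    ToyLsm(int memtableLimit) {
        this.memtableLimit = memtableLimit;
    }

    void put(long timestamp, double value) {
        memtable.put(timestamp, value);
        if (memtable.size() >= memtableLimit) {
            runs.add(memtable);          // "flush": freeze the memtable as a new run
            memtable = new TreeMap<>();
        }
    }

    /** Returns the newest value for a timestamp, or null if it was never written. */
    Double get(long timestamp) {
        Double v = memtable.get(timestamp);
        for (int i = runs.size() - 1; v == null && i >= 0; i--) {
            v = runs.get(i).get(timestamp);  // newer runs shadow older ones
        }
        return v;
    }

    /** Compaction: merge all runs into one, keeping the newest value per key. */
    void compact() {
        TreeMap<Long, Double> merged = new TreeMap<>();
        for (TreeMap<Long, Double> run : runs) {
            merged.putAll(run);          // later (newer) runs overwrite older entries
        }
        runs.clear();
        if (!merged.isEmpty()) {
            runs.add(merged);
        }
    }

    int runCount() {
        return runs.size();
    }

    public static void main(String[] args) {
        ToyLsm lsm = new ToyLsm(2);
        lsm.put(1_000L, 20.5);
        lsm.put(2_000L, 21.0);   // memtable full: first run
        lsm.put(1_000L, 20.7);   // rewrite of an existing timestamp
        lsm.put(3_000L, 21.4);   // memtable full: second run
        System.out.println(lsm.runCount() + " runs, t=1000 -> " + lsm.get(1_000L));
        lsm.compact();
        System.out.println(lsm.runCount() + " run after compaction, t=1000 -> " + lsm.get(1_000L));
    }
}
```

The design trades read work (several runs may need checking) for cheap sequential writes, which is why periodic compaction matters.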

The project's managers have added a query engine with fuzzy matching to make metadata searches easier, as well as a command-line interface (CLI).
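As an illustration of what fuzzy matching over metadata can look like, the sketch below filters dot-separated series paths against a pattern containing `*`. The path layout (`root.ln.wf01.temperature`) mirrors IoTDB's naming style, but the matching rule used here (one `*` matches exactly one level) and the `PathPattern` class are assumptions; IoTDB's own pattern semantics may differ.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Wildcard matching over dot-separated time series paths such as
 * "root.ln.*.temperature". Here "*" matches exactly one path level.
 * Hypothetical helper; not IoTDB's path-pattern implementation.
 */
final class PathPattern {
    static boolean matches(String pattern, String path) {
        String[] p = pattern.split("\\.");
        String[] s = path.split("\\.");
        if (p.length != s.length) {
            return false;  // one "*" never spans several levels in this sketch
        }
        for (int i = 0; i < p.length; i++) {
            if (!p[i].equals("*") && !p[i].equals(s[i])) {
                return false;
            }
        }
        return true;
    }

    static List<String> filter(String pattern, List<String> paths) {
        List<String> result = new ArrayList<>();
        for (String path : paths) {
            if (matches(pattern, path)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> series = List.of(
                "root.ln.wf01.temperature",
                "root.ln.wf02.temperature",
                "root.ln.wf01.status");
        System.out.println(filter("root.ln.*.temperature", series));
    }
}
```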

Data is collected from sensors via MQTT. In edge computing mode, files can be saved and synchronized in TSFile format, or merged into a Hadoop or Spark computing platform, whether hosted in the cloud or not, via native APIs or JDBC. Machine learning models, such as anomaly detection, can then be applied to the data.
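As a deliberately simple stand-in for the anomaly-detection models mentioned above, the sketch below flags readings whose z-score (distance from the mean, in standard deviations) exceeds a threshold. It assumes the series has already been read out of the database into an array; the `ZScoreDetector` class and the threshold are illustrative choices, not part of IoTDB.

```java
/**
 * Flags readings whose z-score exceeds a threshold, using the population
 * standard deviation of the whole batch. Illustrative only; not part of IoTDB.
 */
final class ZScoreDetector {
    static boolean[] flag(double[] values, double threshold) {
        int n = values.length;
        boolean[] anomalies = new boolean[n];
        if (n == 0) {
            return anomalies;
        }
        double mean = 0;
        for (double v : values) {
            mean += v;
        }
        mean /= n;
        double variance = 0;
        for (double v : values) {
            variance += (v - mean) * (v - mean);
        }
        double std = Math.sqrt(variance / n);
        if (std == 0) {
            return anomalies;  // constant series: nothing stands out
        }
        for (int i = 0; i < n; i++) {
            anomalies[i] = Math.abs(values[i] - mean) / std > threshold;
        }
        return anomalies;
    }

    public static void main(String[] args) {
        double[] readings = {20.1, 20.3, 19.9, 20.0, 35.0, 20.2};
        boolean[] flags = flag(readings, 2.0);
        for (int i = 0; i < readings.length; i++) {
            if (flags[i]) {
                System.out.println("anomaly at index " + i + ": " + readings[i]);
            }
        }
    }
}
```

A batch z-score is easy to reason about but sensitive to the outliers it is looking for; production systems typically use rolling windows or learned models instead.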

The database supports six data types (BOOLEAN, INT32, INT64, FLOAT, DOUBLE and TEXT), and data can be compressed with Snappy. The TSFile format itself is largely inspired by Parquet, a columnar format originally developed by Cloudera and Twitter for the Hadoop ecosystem.
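Part of why columnar time series formats compress well is that each column can be encoded according to its type before a general-purpose codec such as Snappy is applied. As an illustration only (plain delta encoding, an assumption rather than TSFile's actual encoder), an INT64 timestamp column sampled at a fixed rate becomes a run of small, repeated numbers:

```java
import java.util.Arrays;

/**
 * Delta encoding for a column of INT64 timestamps: keep the first value,
 * then store each timestamp as the difference from its predecessor.
 * Illustrative only; not TSFile's actual encoder.
 */
final class DeltaEncoding {
    static long[] encode(long[] timestamps) {
        long[] out = new long[timestamps.length];
        for (int i = 0; i < timestamps.length; i++) {
            out[i] = (i == 0) ? timestamps[0] : timestamps[i] - timestamps[i - 1];
        }
        return out;
    }

    static long[] decode(long[] deltas) {
        long[] out = new long[deltas.length];
        for (int i = 0; i < deltas.length; i++) {
            out[i] = (i == 0) ? deltas[0] : out[i - 1] + deltas[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // A sensor sampled every 500 ms, timestamps in epoch milliseconds.
        long[] ts = {1_600_000_000_000L, 1_600_000_000_500L,
                     1_600_000_001_000L, 1_600_000_001_500L};
        long[] deltas = encode(ts);
        System.out.println(Arrays.toString(deltas));  // [1600000000000, 500, 500, 500]
        System.out.println(Arrays.equals(ts, decode(deltas)));  // true
    }
}
```

The repeated deltas are exactly the kind of redundancy that a byte-level compressor like Snappy exploits far better than raw 13-digit timestamps.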

Storage costs are said to be significantly lower, and the project's managers claim that its compression is 15% more efficient than InfluxDB's.

Note that in July 2019, according to one benchmark, InfluxDB was twice as efficient at writing and reading, while IoTDB offered much better query performance.

Unlike KairosDB (built on Cassandra) or OpenTSDB (built on HBase), which rely on NoSQL databases, IoTDB uses an SQL query engine and its own storage engine. The engineers behind the project state that 32 MB and an ARM7 processor are enough to run an embedded version of the TSFile module.

Ready to support up to 100,000 connected objects

Apache IoTDB is primarily a university project, as the press release makes clear. But it has also been adopted by large organizations. In a presentation of the database last year, Xiangdong Huang described how Shanghai Metro operators use it to monitor trains.

The previous system, based on Cassandra and KairosDB, acquired 3,200 data points per train every 500 milliseconds for 144 trains. After the KairosDB (KDB) REST services were connected to an IoTDB instance, the same 3,200 data points per train were recorded every 200 milliseconds for 300 trains, or 414 billion data points per day. The deployment fills a 1 TB hard drive every month.
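The 414 billion figure is consistent with 3,200 points per train every 200 milliseconds across 300 trains, assuming sampling runs around the clock. A quick check, with the earlier setup for comparison (the `MetroThroughput` class is purely illustrative):

```java
/**
 * Back-of-the-envelope check of the Shanghai Metro figures:
 * points per day = points per sample x samples per day x trains.
 * Assumes continuous 24-hour sampling.
 */
final class MetroThroughput {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;  // 86,400,000

    static long pointsPerDay(long pointsPerSample, long intervalMillis, long trains) {
        long samplesPerDay = MILLIS_PER_DAY / intervalMillis;
        return pointsPerSample * samplesPerDay * trains;
    }

    public static void main(String[] args) {
        // Previous Cassandra/KairosDB setup: 3,200 points every 500 ms, 144 trains.
        System.out.println(pointsPerDay(3_200, 500, 144));  // 79626240000
        // IoTDB setup: 3,200 points every 200 ms, 300 trains.
        System.out.println(pointsPerDay(3_200, 200, 300));  // 414720000000
    }
}
```

In other words, the daily volume grew roughly fivefold, from about 80 billion to about 415 billion points.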

According to a presentation given at ApacheCon 2020 by Jialin Qiao, a graduate student in software engineering, the database is also reportedly used in a power plant and a tobacco factory in China.

“With the continuous growth of intelligent devices, the data generated by machines is increasing day by day, posing extraordinary challenges for data ingestion, query speed and storage space,” says Dawei Liu, an architect at AutoAI Inc (a subsidiary of NavInfo, a Chinese company specializing in smart cars) and a member of the Apache IoTDB Project Management Committee.

“We tried different solutions and finally selected IoTDB as our core database for its high performance, its business-friendly openness and its active community. We built our Wecloud platform on Apache IoTDB, and it has served BMW, Toyota and Great Wall Motors, among others,” he adds in the press release.

Although it has graduated to Top-Level status within the Apache Foundation and seems to have proven itself in production, IoTDB is still in its infancy. Now at version 0.10 (previously 0.9.3), it still needs improvements to its LSM memory structure, its RAM management (memory overhead) and its handling of “out-of-order data”, that is, timestamped data that does not arrive in chronological order. Manual optimizations are currently required for this. Finally, CPU resources must be configured correctly to ensure high parallelism. Jialin Qiao recommends connecting no more than 100,000 devices to the TSDB.
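To show what the “out-of-order data” problem involves, here is a sketch of one common way LSM-style stores cope with it: per series, remember the latest timestamp accepted in order, append newer points to a sequential buffer, and set late arrivals aside for a later merge. The `OutOfOrderRouter` class, its buffers and the series path are hypothetical; this is not IoTDB's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Routes incoming points by arrival order: for each series, points newer than
 * the latest in-order timestamp are appended to the sequential buffer; late
 * points (timestamp at or before that mark) go to a separate buffer to be
 * merged later. Hypothetical sketch, not IoTDB's implementation.
 */
final class OutOfOrderRouter {
    static final class Point {
        final String series;
        final long timestamp;
        final double value;

        Point(String series, long timestamp, double value) {
            this.series = series;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final Map<String, Long> lastInOrder = new HashMap<>();
    final List<Point> sequential = new ArrayList<>();
    final List<Point> outOfOrder = new ArrayList<>();

    void write(Point p) {
        Long last = lastInOrder.get(p.series);
        if (last == null || p.timestamp > last) {
            sequential.add(p);
            lastInOrder.put(p.series, p.timestamp);
        } else {
            outOfOrder.add(p);  // late or duplicate timestamp: defer to a merge step
        }
    }

    public static void main(String[] args) {
        OutOfOrderRouter router = new OutOfOrderRouter();
        String speed = "root.metro.train1.speed";  // hypothetical series path
        router.write(new Point(speed, 1_000L, 80.0));
        router.write(new Point(speed, 2_000L, 82.0));
        router.write(new Point(speed, 1_500L, 81.0));  // arrives after t=2000
        System.out.println(router.sequential.size() + " in order, "
                + router.outOfOrder.size() + " out of order");
    }
}
```

Keeping late points apart preserves fast appends for the common case, at the cost of a merge step later; tuning when and how that merge runs is the kind of manual optimization the project still requires.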