Data Archiving Reference
Data Archiving Logic
The storage directory for data from the real-time message channel can be generated based on either the data event time or the system time, whereas the storage directory for data from the offline message channel and for real-time alert records can be generated based on the system time only. The generated archive files are synchronized to the specified storage system according to the storage path information. When setting the storage path, you determine the file generation mode by choosing whether directories are generated by event time or by system time.
If you choose to generate the directory by the event time, the data content will be parsed and the data event time will be obtained. Next, the data in the same time partition will be written to the same file. Finally, the generated files will be synchronized to the corresponding directory.
If you choose to generate the directory by the system time, the data in the same time partition will be written to the same file as per the system timestamp and the generated files will be synchronized to the corresponding directory.
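The difference between the two modes can be illustrated with a minimal sketch. It assumes hourly time partitions and represents each record as a dictionary with a `timestamp` key (the event time) paired with the system time at which it was received. The function names and the record layout are hypothetical and are not part of the product.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(record, received_at, by_event_time):
    """Return the hourly partition a record belongs to (hourly is an assumption)."""
    # Event-time mode parses the time out of the data itself;
    # system-time mode uses the time at which the system saw the record.
    ts = record["timestamp"] if by_event_time else received_at
    return ts.replace(minute=0, second=0, microsecond=0)


def group_by_partition(records, by_event_time):
    """Group (record, received_at) pairs so that each partition maps to one file."""
    groups = defaultdict(list)
    for record, received_at in records:
        groups[partition_key(record, received_at, by_event_time)].append(record)
    return dict(groups)
```

With two records whose event times fall on either side of an hour boundary but which are received within the same hour, event-time mode yields two partitions while system-time mode yields one.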
Storage Path Partition Parameters
After specifying the root directory for storing the archive files and the path generation method, you can select different time partition parameter formats. Four time partition parameter formats are supported currently.
| Parameter Format | Description | Example |
|---|---|---|
| YYYYMMDD | Generate directories by day. | /bucketName/samplePath/20190101/ |
| YYYYMMDD/HH | Generate directories by day/hour. | /bucketName/samplePath/20190101/00/ |
| YYYY/MM/DD | Generate directories by year/month/day. | /bucketName/samplePath/2019/01/01/ |
| YYYY/MM/DD/HH | Generate directories by year/month/day/hour. | /bucketName/samplePath/2019/01/01/00/ |
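As a rough illustration of how the four formats map to directory names, the sketch below translates each parameter format into a Python `strftime` pattern and appends the result to a root directory. The `PARTITION_FORMATS` mapping and the `build_partition_path` helper are hypothetical names introduced for this example; the service's actual implementation is not documented here.

```python
from datetime import datetime

# Assumed mapping from the documented parameter formats to strftime patterns.
PARTITION_FORMATS = {
    "YYYYMMDD": "%Y%m%d",
    "YYYYMMDD/HH": "%Y%m%d/%H",
    "YYYY/MM/DD": "%Y/%m/%d",
    "YYYY/MM/DD/HH": "%Y/%m/%d/%H",
}


def build_partition_path(root, partition_format, ts):
    """Join the root directory with the time partition derived from ts."""
    pattern = PARTITION_FORMATS[partition_format]
    return f"{root.rstrip('/')}/{ts.strftime(pattern)}/"


ts = datetime(2019, 1, 1, 0, 0, 0)
for fmt in PARTITION_FORMATS:
    print(fmt, "->", build_partition_path("/bucketName/samplePath", fmt, ts))
```

For the timestamp 2019-01-01 00:00:00, the loop reproduces the four example paths in the table above.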
Archiving Cycle
Currently, the data archiving policy supports an archiving cycle of 1 hour, 12 hours, or 24 hours. When a data archiving cycle starts, the system starts reading data from the specified message channel. Data that falls in the same archiving cycle is saved in one file, which is sliced by the specified file size if it exceeds the limit.
However, if no data is cached in a data archiving cycle, no archive file will be generated.
Different archiving cycles have different scheduled start times for triggering the archiving jobs and cover different data ranges. For example:
| Archiving Cycle | Scheduled Job Starting Time | Archived Data |
|---|---|---|
| 1 hour | 00:00:00, 01:00:00, 02:00:00, …, 23:00:00 | Taking 01:00:00 for example, data to be archived falls in [00:00:00, 01:00:00). |
| 12 hours | 00:00:00, 12:00:00 | Taking 12:00:00 for example, data to be archived falls in [00:00:00, 12:00:00). |
| 24 hours | 00:00:00 | Taking 2019-01-02 00:00:00 for example, data to be archived falls in [2019-01-01 00:00:00, 2019-01-02 00:00:00). |
Note
The data archiving cycle is based on the system timestamp. If the system timestamp of a data record falls within the current archiving cycle, the record is archived according to the policy configuration and synchronized to the corresponding storage path.
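The relationship between a scheduled start time and the half-open data range it covers can be sketched as follows. This is a minimal illustration, assuming that each job archives the cycle that ends at its own start time, as the table examples suggest. The function names are hypothetical.

```python
from datetime import datetime, timedelta

SUPPORTED_CYCLE_HOURS = (1, 12, 24)


def trigger_times(day_start, cycle_hours):
    """List the scheduled job start times within one day for a given cycle."""
    return [day_start + timedelta(hours=h) for h in range(0, 24, cycle_hours)]


def archive_window(trigger_time, cycle_hours):
    """Return the [start, end) range archived by the job starting at trigger_time."""
    if cycle_hours not in SUPPORTED_CYCLE_HOURS:
        raise ValueError(f"unsupported archiving cycle: {cycle_hours} hours")
    return trigger_time - timedelta(hours=cycle_hours), trigger_time


def in_window(system_ts, window):
    """Check a record's system timestamp against a half-open window."""
    start, end = window
    return start <= system_ts < end
```

Because the range is half-open, a record with a system timestamp of exactly 01:00:00 belongs to the cycle archived at 02:00:00, not to the one archived at 01:00:00.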
Generation of Archived Files
In an archiving cycle, the archiving job will be triggered to generate the archive file only when the first data record arrives. If no data arrives in the archiving cycle, no file or path will be created.
When Real-time Message Channel and Generate path by event time are selected in the archiving policy, if the event time of the uploaded data is 1 hour later than the system time or 360 hours earlier than the system time, the archived files will be stored in a folder named archive_recycling_${filename} under the specified root directory (where filename is the name specified for the archived file in the policy).
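The out-of-range rule can be sketched as a simple check on the gap between event time and system time. The example below treats both thresholds as strict ("more than 1 hour later", "more than 360 hours earlier"), which is an assumption, since the text does not say how values exactly on the boundary are handled. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_AHEAD = timedelta(hours=1)      # event time later than system time
MAX_BEHIND = timedelta(hours=360)   # event time earlier than system time


def is_out_of_range(event_time, system_time):
    """True if the event time is too far from the system time (strict bounds assumed)."""
    return (event_time - system_time > MAX_AHEAD
            or system_time - event_time > MAX_BEHIND)


def recycling_folder(root, filename):
    """Build the archive_recycling_${filename} folder under the root directory."""
    return f"{root.rstrip('/')}/archive_recycling_{filename}/"


now = datetime(2019, 1, 16, 0, 0, 0)
late_event = datetime(2019, 1, 16, 2, 0, 0)  # 2 hours ahead of system time
if is_out_of_range(late_event, now):
    print(recycling_folder("/bucketName/samplePath", "myArchive"))
```

Here `myArchive` stands in for whatever file name is configured in the policy.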
The columns of the archived file are as follows:
| Field Name | Field Description |
|---|---|
| orgId | The organization ID. |
| modelId | The model ID. |
| assetId | The asset ID. |
| measurepoints | The measurement point name. |
| timestamp | The event time of the measurement point. |
| value | The value of the measurement point. |
| quality | The data quality. |