Category: 4 SPL Big Data Computing
SPL Practices: Parse and Filter Multilevel RESTful JSON
Background: Exchanging data through RESTful interfaces is convenient, but computing on the received data can be troublesome. SPL offers an HTTP interface that reads RESTful data directly and computes on it. In the following example from an e-commerce business, access to orders data is encapsulated as a REST interface to be called by other business systems. The example is available at: http://111.198.29.168:8503/getOrders and http://111.198.29.168:8503/getOrders/bDate/eDate, where bDate and eDate are path parameters that can be written directly into the URL; they specify the beginning date and the ending date of the query. Below is the JSON-format data returned
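The way the two path parameters are written into the URL can be sketched in Python. This is only an illustration under stated assumptions, not the SPL code from the article: the function names `orders_url` and `fetch_orders` are hypothetical, the `YYYY-MM-DD` date format in the usage note is assumed (the excerpt does not state the format), and the response is assumed to be UTF-8 encoded JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint quoted in the text above.
BASE_URL = "http://111.198.29.168:8503/getOrders"


def orders_url(b_date=None, e_date=None, base=BASE_URL):
    """Build the endpoint URL (hypothetical helper).

    With no dates, the plain endpoint is returned; otherwise bDate and
    eDate are appended as path segments, in that order.
    """
    if b_date is None and e_date is None:
        return base
    if b_date is None or e_date is None:
        raise ValueError("bDate and eDate must be supplied together")
    # Percent-encode each segment so that it stays a single path element.
    return f"{base}/{quote(str(b_date), safe='')}/{quote(str(e_date), safe='')}"


def fetch_orders(b_date=None, e_date=None, timeout=10):
    """Request the orders and parse the JSON body (assumes UTF-8 JSON)."""
    with urlopen(orders_url(b_date, e_date), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `orders_url("2010-01-01", "2010-01-31")` yields the second URL form with both dates filled in, and `fetch_orders()` without arguments requests the first form.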
Routine for real-time data appending – avoid small tables by means of memory
Background and method: Refer to the following article: Routine for real-time data appending. In that article we store real-time data in multiple layers of zone tables (hereinafter referred to as ‘table’ unless otherwise specified) and use small tables with shorter time intervals for scenarios where data must be appended frequently, so that new data can be written out quickly. However, this method produces more files, which makes them more difficult to manage. To solve this problem, we design an optimization method in this routine based on the
Routine for real-time data updating
Background and method: This routine applies to scenarios similar to those of “Routine for real-time data appending”, except that the data needs to be updated. It is applicable when: the real-time requirement for data maintenance is very high, the update cycle is short, and data may be updated at any time; the data needs to be stored in layers in multiple zone tables (hereinafter referred to as ‘table’ unless otherwise specified) of a multi-zone composite table; only the update mode is supported. Key differences from the append routine: Definitions and concepts
Routine for real-time data appending
Routine for real-time data appending (zone table). Background and method: This routine is applicable to the following scenarios: the real-time requirement for data maintenance is very high, the append cycle is short, and data may be appended at any time; the data needs to be stored in layers in multiple zone tables (hereinafter referred to as ‘table’ unless otherwise specified) of a multi-zone composite table; only the append mode is supported, and the data appended each time is relatively small and can be stored in a table sequence. Method: 1. In order to meet the requirement
Routine for regular maintenance of multi-zone composite table
Background and method: This routine is applicable to the following scenarios: data maintenance has no real-time requirement and can be performed regularly at a fixed interval (usually hours or days); the total data volume is very large and needs to be split and stored in multiple zone tables (hereinafter referred to as ‘table’ unless otherwise specified); both append and update modes are supported, and the amount of data in each maintenance run may be large and may be passed in as a cursor. Methods: Append mode: the incoming data is required to be ordered by the time field. Maintenance steps: i) split
Routine for regular maintenance of single composite table
Background and method: This routine is applicable to the following scenarios: data maintenance has no real-time requirement and can be performed regularly at a fixed interval (usually hours or days); the total data volume is small enough to be stored in a single composite table; both append and update modes are supported, and the amount of data in each maintenance run may be large and may be passed in as a cursor. Method: use a current composite table for queries, and merge the received new data with the current composite table to generate a backup composite table. After
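The idea of merging new data with the current table into a separate backup copy, while the current one keeps serving queries, can be sketched in Python. This is a minimal in-memory illustration, not the routine's SPL implementation: the function name `merge_for_backup`, the key field `id`, and the use of lists of dicts in place of composite table files are all assumptions, and both inputs are assumed to be already sorted by the key.

```python
from heapq import merge
from itertools import groupby
from operator import itemgetter


def merge_for_backup(current, incoming, key="id", update=False):
    """Merge two key-ordered record lists into a new list (hypothetical helper).

    `current` is left untouched, so it can still be queried while the
    backup copy is being built. In update mode, an incoming record
    replaces the current record that has the same key.
    """
    # Tag each record with its source (0 = current, 1 = incoming) so that
    # for equal keys the incoming record sorts after the current one.
    tagged = merge(
        ((r[key], 0, r) for r in current),
        ((r[key], 1, r) for r in incoming),
        key=lambda t: (t[0], t[1]),
    )
    if not update:
        # Append mode: keep every record, in key order.
        return [record for _, _, record in tagged]
    # Update mode: within each key group, keep only the last record,
    # which is the incoming one whenever both sources have that key.
    return [list(group)[-1][2] for _, group in groupby(tagged, key=itemgetter(0))]
```

Because the result is a new list, switching from the current copy to the backup copy can happen after the merge has finished, which is the point of keeping two copies.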
Routine for regular and active update of small amounts of data
The composite table is an important file storage format of SPL, yet a composite table file does not support simultaneous read and write operations, and it often requires the data to be stored in order to ensure high performance. In practice, however, data is not static and needs to be continuously appended or modified, and the order of newly generated data often differs from the order the composite table requires. How to maintain the data of a composite table without affecting ongoing queries, while keeping the data in order, therefore becomes a problem we have to face. This
SPL Practice: integerization during data dump
Using SPL for performance optimization, such as converting data types like string to integer during data dump, can reduce storage space and improve computing performance. This article will present how to implement integerization through a practical example. Problem description: The following table is the data structure of a certain space-time collision problem:
Field name | Field type | Field meaning | Remarks | Sample data
no | String | Object flag | Unique flag of object | 100000000009
ct | Int | Timestamp | Unix timestamp (seconds) | 1690819200
lac | String | Space flag 1 | | 40000
ci | String | Space flag 2 | | 66000000
After understanding the business, we know that all the values of
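The basic idea of integerization can be sketched in Python using the sample row from the table above. This is an illustration only, not the article's SPL code: the function name `integerize` is hypothetical, and the sketch assumes that the string fields no, lac and ci contain only ASCII digits, as the sample data suggests. Note that a plain conversion drops leading zeros, so it fits only when such zeros carry no meaning.

```python
def integerize(record, fields=("no", "lac", "ci")):
    """Return a copy of `record` with the given string fields converted to int.

    Hypothetical helper. Assumes the fields hold ASCII digit strings;
    any other value raises ValueError instead of being converted silently.
    """
    out = dict(record)
    for field in fields:
        value = record[field]
        if not (value.isascii() and value.isdigit()):
            raise ValueError(f"field {field!r} is not a digit string: {value!r}")
        out[field] = int(value)
    return out


# Sample row taken from the table above; ct is already an integer.
row = {"no": "100000000009", "ct": 1690819200, "lac": "40000", "ci": "66000000"}
converted = integerize(row)
```

After the conversion every field of the row is an integer, which is what allows more compact storage and faster comparison than strings.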







