4 SPL Big Data Computing - esProc SPL Official Blog

SPL Practices: Parse and Filter Multilevel RESTful JSON

Background: Exchanging data through RESTful interfaces is convenient, but computing on the received data can be troublesome. SPL offers an HTTP interface that reads RESTful data directly and computes on it. In the following e-commerce example, access to orders data is encapsulated as a REST interface that other business systems call. The example is available at http://111.198.29.168:8503/getOrders and http://111.198.29.168:8503/getOrders/bDate/eDate, where bDate and eDate are path parameters written directly into the URL; they specify the beginning date and the ending date of the query. Below is the JSON-format data returned

...
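As an illustration of how the two path parameters fit into the URL, here is a minimal Python sketch (not SPL). The helper name `orders_url` is hypothetical, and the date format in the usage line is an assumption, since the excerpt does not show which format the service expects.

```python
from urllib.parse import quote

# Endpoint taken from the excerpt above.
BASE_URL = "http://111.198.29.168:8503/getOrders"


def orders_url(b_date=None, e_date=None):
    """Build the orders URL, optionally with bDate/eDate path parameters.

    Hypothetical helper: the service's expected date format is not shown
    in the excerpt, so the values are passed through as opaque strings.
    """
    if b_date is None and e_date is None:
        return BASE_URL
    if b_date is None or e_date is None:
        raise ValueError("bDate and eDate must be given together")
    # Percent-encode each value so it stays a single path segment.
    return f"{BASE_URL}/{quote(b_date, safe='')}/{quote(e_date, safe='')}"


# The date format below is an assumption made for illustration only.
example = orders_url("2023-01-01", "2023-01-31")
```

The sketch only builds the URL; fetching and parsing the JSON response is left out because the response structure is not visible in the excerpt.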

4 SPL Big Data Computing 2023-12-19 Tag: SPL Practices

SPL practice: space-time collision problem that renders MPP powerless to solve

Problem description. Definition of space-time collision: a certain time interval (such as 7 days) is divided into multiple time slices of fixed length (such as 15 minutes). If object a and object b both appeared at the same location within the same time slice, this counts as one collision. Rule 1: Multiple collisions in the same time slice are counted as one collision. Rule 2: If the last locations at which the two objects appeared in the same time slice are different, the slice is called a mismatch. Only when the number of mismatched time slices does not exceed 20 would the collisions

...
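The definition and the two rules in the excerpt above can be sketched in a few lines of Python (not SPL, and not the article's solution). The record layout `(object_id, unix_seconds, location)` and the function name are assumptions, and the sketch assumes a mismatch is only evaluated for slices in which both objects appear.

```python
from collections import defaultdict

SLICE_SECONDS = 15 * 60  # 15-minute time slice, as in the example above


def collisions_and_mismatches(records, a, b):
    """Count collision slices and mismatch slices for objects a and b.

    records: iterable of (object_id, unix_seconds, location) tuples.
    This layout is an assumption made for the sketch.
    """
    seen = defaultdict(set)  # (slice, object) -> locations seen in that slice
    last = {}                # (slice, object) -> (timestamp, last location)
    for obj, ts, loc in records:
        if obj not in (a, b):
            continue
        key = (ts // SLICE_SECONDS, obj)
        seen[key].add(loc)
        if key not in last or ts >= last[key][0]:
            last[key] = (ts, loc)

    slices_a = {s for s, o in seen if o == a}
    slices_b = {s for s, o in seen if o == b}
    collisions = mismatches = 0
    for s in slices_a & slices_b:
        # Rule 1: several shared locations in one slice still count once.
        if seen[(s, a)] & seen[(s, b)]:
            collisions += 1
        # Rule 2: different last locations in the slice is a mismatch.
        if last[(s, a)][1] != last[(s, b)][1]:
            mismatches += 1
    return collisions, mismatches


sample = [
    ("a", 0, "X"), ("b", 10, "X"), ("a", 20, "X"), ("b", 30, "Y"),  # slice 0
    ("a", 900, "Z"), ("b", 950, "Z"),                               # slice 1
    ("a", 1800, "P"),                                               # slice 2, only a
]
result = collisions_and_mismatches(sample, "a", "b")
```

In the sample, slice 0 is both a collision (shared location X) and a mismatch (last locations X and Y), slice 1 is a collision only, and slice 2 is ignored because only one object appears.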

4 SPL Big Data Computing 2023-12-14 Tag: SPL practice

Routine for real-time data appending – avoid small tables by means of memory

Background and method: Refer to the article “Routine for real-time data appending”. That article stores real-time data in multiple layers of zone tables (hereinafter referred to as ‘table’ unless otherwise specified) and uses small tables with shorter time intervals for scenarios where data is appended frequently, so that new data can be written out quickly. However, this method produces more files, which makes them harder to manage. To solve this problem, we design an optimization method in this routine based on the

...

4 SPL Big Data Computing 2023-12-07

Routine for real-time data updating

Background and method: This routine applies to scenarios similar to those of “Routine for real-time data appending”, except that the data needs to be updated. It is applicable when: the real-time requirement for data maintenance is very high, the cycle for updating data is short, and data may be updated at any time; the data needs to be stored in layers across multiple zone tables (hereinafter referred to as ‘table’ unless otherwise specified) of a multi-zone composite table; only the update mode is supported. Key differences from append routine: Definitions and concepts

...

4 SPL Big Data Computing 2023-12-03

Routine for real-time data appending

Routine for real-time data appending (zone table). Background and method: This routine is applicable to the following scenarios: the real-time requirement for data maintenance is very high, the cycle for appending data is short, and data may be appended at any time; the data needs to be stored in layers across multiple zone tables (hereinafter referred to as ‘table’ unless otherwise specified) of a multi-zone composite table; only the append mode is supported, and the data appended each time is relatively small and can be stored in a table sequence. Method: 1. In order to meet the requirement

...

4 SPL Big Data Computing 2023-11-30

Routine for regular maintenance of multi-zone composite table

Background and method: This routine is applicable to the following scenarios: data maintenance has no real-time requirement and can be performed regularly at a fixed interval (usually hours or days); the total data volume is very large and needs to be split and stored in multiple zone tables (hereinafter referred to as ‘table’ unless otherwise specified); both append and update modes are supported, and the amount of data maintained each time may be large and may be passed in as a cursor. Methods: Append mode: the incoming data is required to be ordered by the time field. Maintenance steps: i) split

...

4 SPL Big Data Computing 2023-11-22

Routine for regular maintenance of single composite table

Background and method: This routine is applicable to the following scenarios: data maintenance has no real-time requirement and can be performed regularly at a fixed interval (usually hours or days); the total data volume is small enough to be stored in a single composite table; both append and update modes are supported, and the amount of data maintained each time may be large and may be passed in as a cursor. Method: use a current composite table for queries, and merge the received new data with the current composite table to generate a backup composite table. After

...
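The merge step mentioned in the excerpt above can be illustrated with a small Python sketch (not SPL, and not the routine's actual code). It assumes both inputs are already ordered by a key and that, in update mode, an incoming row replaces the current row with the same key; the excerpt states neither detail, and `merge_update` is a hypothetical name.

```python
def merge_update(current, incoming, key=lambda row: row[0]):
    """Merge two key-ordered lists of rows into a new key-ordered list.

    Assumption for this sketch: when both lists contain the same key,
    the incoming row wins. The inputs are left untouched, which mirrors
    writing to a separate backup while the current data stays queryable.
    """
    merged = []
    i = j = 0
    while i < len(current) and j < len(incoming):
        k_cur, k_new = key(current[i]), key(incoming[j])
        if k_cur < k_new:
            merged.append(current[i])
            i += 1
        elif k_cur > k_new:
            merged.append(incoming[j])
            j += 1
        else:
            merged.append(incoming[j])  # same key: take the new row
            i += 1
            j += 1
    merged.extend(current[i:])
    merged.extend(incoming[j:])
    return merged


current_rows = [(1, "a"), (3, "c"), (5, "e")]
new_rows = [(2, "b"), (3, "C"), (6, "f")]
backup_rows = merge_update(current_rows, new_rows)
```

Because the merge walks both ordered inputs once, the result stays ordered without a separate sort.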

4 SPL Big Data Computing 2023-11-17

Routine for regular and active update of small amounts of data

The composite table is an important file storage format of SPL, yet a composite table file does not support simultaneous read and write operations, and it often requires the data to be stored in order to ensure high performance. In practice, however, data is not static: it is continuously appended or modified, and the order of newly generated data often differs from the order the composite table requires. The problem we then face is how to maintain the composite table's data without affecting ongoing queries while keeping the data in order. This

...

4 SPL Big Data Computing 2023-11-11

SPL Practice: integerization during data dump

Performance optimization with SPL, such as converting string data to integers during data dump, can reduce storage space and improve computing performance. This article presents how to implement integerization through a practical example. Problem description: the following table is the data structure of a certain space-time collision problem:

Field name | Field type | Field meaning | Remarks | Sample data
no | String | Object flag | Unique flag of object | 100000000009
ct | Int | Timestamp | Unix timestamp (seconds) | 1690819200
lac | String | Space flag 1 | | 40000
ci | String | Space flag 2 | | 66000000

After understanding the business, we know that all the values of

...
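To make the idea of integerization concrete, here is a minimal Python sketch (not SPL) applied to the sample row from the table above. It assumes the string fields contain only decimal digits, as the sample data suggests; the excerpt is cut off before it states the actual value constraints, and `integerize` is a hypothetical helper.

```python
STRING_FIELDS = ("no", "lac", "ci")  # String-typed fields from the table above


def integerize(row):
    """Return a copy of row with digit-only string fields converted to int.

    Assumption: every listed field holds only ASCII decimal digits.
    Anything else raises ValueError instead of being silently altered.
    """
    out = dict(row)
    for field in STRING_FIELDS:
        value = out[field]
        if not (value.isascii() and value.isdigit()):
            raise ValueError(f"{field}={value!r} is not a digit-only string")
        out[field] = int(value)
    return out


sample = {"no": "100000000009", "ct": 1690819200, "lac": "40000", "ci": "66000000"}
converted = integerize(sample)
```

A plain conversion like this drops leading zeros, so it only suits fields where they carry no meaning.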

4 SPL Big Data Computing 2023-10-28

SPL practice: data flow during speeding up batch job

Speeding up batch jobs is one of the major optimization scenarios of SPL, and storing batch job data in SPL’s high-performance files is an important step in the optimization process. The data that needs to be dumped usually consists of two parts: historical cold data and periodic incremental data (added, deleted or modified data). This article presents how to dump and compute on these two parts, as well as how to perform periodic updates and regular reorganization. I. Dump the historical data. The composite table is a high-performance storage format provided by SPL; its principle is to sort

...

4 SPL Big Data Computing 2023-10-26 Tag: SPL practice
