Category: 5 SPL Learning Material
esProc Elastic Compute Service Work Procedure
esProc Elastic Compute Service (ECS) is general computing software that runs on enterprise-class LANs and proprietary clouds. It has three components. The service-side consists of QVA and QVM; the application-side is made up of the esProc ECS application (hereinafter called APP) and QVS; the storage-side is NFS, HDFS or an object storage system compatible with the S3 protocol. Both the service-side and the application-side involve SPL scripts. On the service-side, the SPL script is executed on QVM and is also called the QVM script. On the application-side, an SPL script is needed to call the QVM script and it is
SPL Multizone Composite Tables
The data targeted by OLAP is generally not updated heavily or frequently. Updates usually take the form of appending new data, or of inserting, modifying and deleting existing data. SPL offers the multizone composite table, which can effectively shorten the time spent handling data updates while preserving computing performance. A multizone composite table is made up of multiple composite table files. These files are called the zone tables of the multizone composite table, and each zone table has its own zone table number. 1. Append-type multizone composite tables In order to increase performance, SPL needs to store data
Column-wise computing of SPL
In-memory column-wise computing. What is columnar storage? An in-memory table sequence generally uses row-based storage. For example, the employee table contains three fields (id, name and birthday), which are stored in memory roughly as follows: Each row (i.e., each record) is stored as an Object array with three member objects: [Integer,String,Date]. In general, each column (field) contains data of the same type. Under this premise, SPL can store data by column. For example, if the data in the id column are all integers, they can be stored as an int array; if the data in the name column
SPL time key
What is a time key? Although relatively stable, the data in a dimension table may still change. For example, the city where a certain customer is located changed from New York to Chicago on May 15, 2020. When associating the order table with the customer table, orders before this date should be associated with the old customer record (that is, the city should still be New York), while orders on and after this date should be associated with the new customer record (that is, the city should be Chicago). In other words, we need to find the correct customer record
New association calculation methods of SPL
“Association calculation in SPL – In-memory join” presents the classification of association calculations in SPL and the programming methods for in-memory join. “Association calculation in SPL – external storage join” presents the programming methods for external storage join. This article will continue to present new association calculation methods of SPL, including the fjoin function and composite table cursor association & filtering mechanism for foreign key join, as well as the pjoin and new/news functions for primary key join. When used in appropriate scenarios, these new methods can achieve better performance than those introduced in the previous two articles. However, the
Association calculation in SPL – external storage join
The previous article “Association calculation in SPL – In-memory join” (In-memory join for short) presents the classification of association calculations in SPL and the programming methods for in-memory join. When one or more of the associated tables hold a large amount of data and must be kept in external storage, the in-memory join algorithms cannot be used. For this reason, SPL provides dedicated external storage join algorithms. Solving external storage join problems follows steps similar to those for in-memory join: 1. Clearly distinguish the type of join, and find the (logical) primary key participating in association; 2. Choose different SPL functions to
Association calculation in SPL – In-memory join
Association calculation in SPL differs significantly from that in SQL. SQL defines join as an operation that first calculates the Cartesian product and then filters it. SPL also provides this operation, but it has better alternatives in most scenarios, so the operation is not recommended. To implement an association calculation in SPL, first subdivide the join into different types, and then select the corresponding function to code it. Classification of association calculations. The equivalence JOIN in the figure refers to the join whose filter condition is that the field of one table is equal to the corresponding field of associated
