Category: 4 SPL Big Data Computing
Generate TPCH data using SPL
The official TPC website provides a data generation program written in C, which can be downloaded, compiled, and executed to generate TPCH data. However, many people are unfamiliar with C and its build environment and get stuck at this step. Here, an SPL script has been written according to the official rules to generate TPCH data, so the data can be generated easily as long as esProc is installed. TPCH has 8 tables: the region table has 5 fixed records and the nation table has 25 fixed records. The row counts of the remaining 6 tables depend on the Scale Factor (SF).
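To make the role of the Scale Factor concrete, here is a minimal Python sketch (not the SPL script from the article) that estimates the row count of each table for a given SF. The base counts at SF = 1 follow the TPC-H specification; the function name `tpch_row_counts` is hypothetical, and the lineitem figure is only approximate because the number of line items per order is randomized.

```python
# Tables whose size does not depend on the Scale Factor.
FIXED_TABLES = {"region": 5, "nation": 25}

# Base row counts at SF = 1, per the TPC-H specification.
# lineitem is approximate: each order gets a random number of line items.
SCALED_TABLES = {
    "supplier": 10_000,
    "part": 200_000,
    "partsupp": 800_000,
    "customer": 150_000,
    "orders": 1_500_000,
    "lineitem": 6_000_000,
}


def tpch_row_counts(sf):
    """Return the (approximate) number of rows per TPCH table for scale factor sf."""
    counts = dict(FIXED_TABLES)
    for table, base in SCALED_TABLES.items():
        counts[table] = int(base * sf)
    return counts


if __name__ == "__main__":
    for table, rows in tpch_row_counts(1).items():
        print(f"{table}: {rows}")
```

Calling `tpch_row_counts(10)` scales the six variable tables by ten while leaving region and nation unchanged.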
SPL Practices: Mixed-Source Computing
Background The sources, storage forms, and types of data in an application are diverse, including not only traditional relational databases but also NoSQL databases, cloud storage, APIs, file systems, and more. Combining and analyzing data originating from these different sources is the problem of mixed-source computing. Logical data warehouses can facilitate mixed-source computations to some extent, because most of them are SQL-based and can access RDB data sources through table mapping. But it is difficult for them to access other types of data sources. Even with the help of
SPL Practices: Cross-database SQL Migration
Background Applications may need to work based on different databases. Although the SQL syntax for various databases is generally consistent, differences still exist, which necessitates modifications to related SQL statements. Such modifications often require manual adjustments, which involve heavy workload and are error-prone. Fully automating the SQL transformation is nearly impossible due to the varying functionalities of different databases. However, upon closer examination, it becomes clear that most issues stem from differences in the syntax of SQL functions. Especially for functions related to dates and strings, there isn’t a standard in the field, and each database has its own approach.
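As a small illustration of why function syntax is the main obstacle, the following Python sketch rewrites one function call for a target database. It is not the approach taken in the article; the mapping table `CURRENT_TS` and the helper `translate_now` are hypothetical names, and only the "current date and time" function is covered (`NOW()` in MySQL, `SYSDATE` in Oracle, `GETDATE()` in SQL Server).

```python
import re

# Hypothetical mapping: the "current date and time" function in each dialect.
CURRENT_TS = {
    "mysql": "NOW()",
    "oracle": "SYSDATE",
    "sqlserver": "GETDATE()",
}


def translate_now(sql, target):
    """Replace every NOW() call in sql with the target dialect's equivalent."""
    return re.sub(r"\bNOW\(\)", CURRENT_TS[target], sql, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(translate_now("SELECT id, NOW() FROM orders", "sqlserver"))
```

A plain text substitution like this ignores string literals and functions whose arguments differ between databases, which is part of why fully automatic translation is hard.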
SPL Practice: Structured Text computing
Structured Text Computing Requirements Structured text files are a common way to store data. One example is the score.txt file, which records the scores of students in all classes. The first row holds the column names, and each subsequent row holds the data of one student, with the fields in each row separated by tabs. Structured text like this comes with various computing and processing requirements, for example: selecting the girls with an English score of 90 or above, calculating the total score of each student and sorting the results in descending order, or calculating the average Chinese language score for each class.
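The three requirements above can be sketched in plain Python as follows. This is not the SPL code from the article; the column names (class, name, gender, Chinese, English, Math) and the sample rows are assumptions made up for illustration, since the real layout of score.txt is not shown here.

```python
import csv
import io
from collections import defaultdict

# Assumed layout of score.txt: tab-separated, first row is the column names.
SAMPLE = (
    "class\tname\tgender\tChinese\tEnglish\tMath\n"
    "class1\tAnn\tF\t80\t95\t70\n"
    "class1\tBob\tM\t90\t92\t60\n"
    "class2\tCat\tF\t70\t85\t88\n"
    "class2\tDan\tM\t60\t75\t80\n"
)

SUBJECTS = ("Chinese", "English", "Math")


def load(text):
    """Parse tab-separated text into a list of dicts with integer scores."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    for row in rows:
        for subject in SUBJECTS:
            row[subject] = int(row[subject])
    return rows


rows = load(SAMPLE)

# 1. Girls with an English score of 90 or above.
girls = [r["name"] for r in rows if r["gender"] == "F" and r["English"] >= 90]

# 2. Total score of each student, sorted in descending order.
totals = sorted(
    ((r["name"], sum(r[s] for s in SUBJECTS)) for r in rows),
    key=lambda pair: pair[1],
    reverse=True,
)

# 3. Average Chinese score for each class.
chinese_by_class = defaultdict(list)
for r in rows:
    chinese_by_class[r["class"]].append(r["Chinese"])
avg_chinese = {c: sum(v) / len(v) for c, v in chinese_by_class.items()}
```

To run it against a real file, replace `load(SAMPLE)` with the contents read from the file, adjusting the column names to match.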
Hot data caching routine
Overview The data maintenance routine handles regular maintenance and updating of data. Real-time hot data, however, can only be read at query time and returned after being merged with historical data. This requires the ability to return results quickly when querying real-time hot data and to handle frequent concurrent access, which places a significant burden on the business system. If the real-time hot data can be stored separately in memory and read and returned directly from memory during a query, it can greatly speed up queries, and it can also







