Category: 2-SPL-Application-Development
SPL Reporting: Lightweight, Low-Cost, Realtime Hot Data Reporting Solution
Real-time hot data reports are reports that can query all data, both hot and cold, in real time. In earlier times, when a business was based on a single TP database, generating such reports was not very difficult. But as data accumulates and needs to be split off to dedicated AP databases, things become different. After hot data and cold data are separated, developing a real-time report over all the data involves mixed computations across multiple databases. Moreover, the AP database and the TP database are of different types, which further complicates the reporting. The HTAP database combines TP and
SPL Reporting: Make Lightweight, Mixed-Source Reports without Logical Data Warehouse
Background Often a report has diverse data sources, which can be RDB, NoSQL, text, Excel and MQ. A logical data warehouse can facilitate mixed-source computations to some extent. But logical data warehouse frameworks are heavy and complex, and they require cumbersome pre-processing for operations, maintenance and management, making them better suited to platform solutions for large-scale organizations. This is too heavy for an ordinary report; the cost of building a dedicated logical data warehouse is not worth the gain. SPL approach SPL provides real-time mixed-source computing capability, enabling mixed-source computations over any accessible data sources. This functionality of SPL
SPL Reporting: Cross-Database Migratable Reports
Background Compared with a general transaction processing (TP) system, SQL in reports uses more computing functions and involves more complicated logic, so the application relies more heavily on the database language. Report development may have to deal with switching from one database to another. Though different databases use similar SQL, there are differences in syntactic details, and the SQL of reports must be modified to adapt to each type of database. The work is heavy and error-prone. 100% automatic transformation of SQL statements is infeasible because functional differences between databases make it difficult to directly migrate certain complicated computations. However, some examinations show
Calculate Client Churn Rate
Problem All records of an enterprise's sales contracts are listed below. The lost clients in a year are the clients whose sales values in the Amount field for the previous year are not 0 but whose values for the current year are 0. Dividing the number of lost clients in a year by the total number of clients in the previous year gives the churn rate for that year. Please count the lost clients in 1998 and calculate the client churn rate. Tip General steps: Because one or more contracts may have been signed with
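The churn calculation described above can be sketched in Python. This is a minimal illustration, not the article's SPL solution; the contract records, field names (Client, Year, Amount) and sample values are all hypothetical. Because a client may sign several contracts in a year, amounts are first totaled per client per year.

```python
from collections import defaultdict

# Hypothetical contract records: one row per contract
contracts = [
    {"Client": "A", "Year": 1997, "Amount": 500},
    {"Client": "B", "Year": 1997, "Amount": 300},
    {"Client": "C", "Year": 1997, "Amount": 200},
    {"Client": "A", "Year": 1998, "Amount": 450},
    {"Client": "C", "Year": 1998, "Amount": 0},
]

def churn_rate(contracts, year):
    # Total Amount per (client, year), since a client may have many contracts
    totals = defaultdict(float)
    for c in contracts:
        totals[(c["Client"], c["Year"])] += c["Amount"]
    # Clients with non-zero sales in the previous year
    prev = {cl for (cl, y), amt in totals.items() if y == year - 1 and amt != 0}
    # Lost clients: non-zero last year, zero (or absent) this year
    lost = {cl for cl in prev if totals.get((cl, year), 0) == 0}
    return len(lost), len(lost) / len(prev)

lost, rate = churn_rate(contracts, 1998)
# B signed nothing in 1998 and C's 1998 total is 0, so 2 of 3 clients are lost
```

With the sample data, clients B and C are lost out of the three 1997 clients, giving a churn rate of 2/3.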
Big data technology that is orders of magnitude faster than SQL
SQL often runs very slowly SQL is still the most commonly used big data computing language, but the fact is that SQL often runs very slowly, seriously wasting hardware resources. The data preparation part of a bank's anti-money laundering computation: an 11-node Vertica cluster takes 1.5 hours to process the 3.6 billion rows of data. An e-commerce funnel analysis involving 300 million rows: Snowflake's Medium 4-node cluster fails to return a result within 3 minutes. A spatiotemporal collision task involving 25 billion rows: a 5-node ClickHouse cluster takes 1,800 seconds to complete. The
The “Female Manager’s Male Subordinates” Problem That Frustrates All BI Tools
Join queries are a big, long-standing problem in BI, and wide tables (CUBEs) are commonly used to deal with it: a wide table is constructed in advance to eliminate the multi-table association so that the issue can be bypassed. But this reduces the flexibility of BI tools. Here is the “female managers’ male subordinates” problem: based on the employee table and the department table, we want to find the male employees whose 1st-level managers are female. Below is the relationship between the two tables (there is an association between the employee table and the department table, and the department table contains
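To make the query concrete, here is a small Python sketch of the “female managers’ male subordinates” lookup. The table layouts and sample rows are assumptions for illustration only: each employee record carries a department number, and each department record names its manager by employee id, as the teaser's description of the two-table association suggests.

```python
# Hypothetical employee and department tables
employees = [
    {"id": 1, "name": "Alice", "gender": "F", "dept": 10},
    {"id": 2, "name": "Bob",   "gender": "M", "dept": 10},
    {"id": 3, "name": "Carol", "gender": "F", "dept": 20},
    {"id": 4, "name": "Dave",  "gender": "M", "dept": 20},
]
departments = [
    {"no": 10, "manager": 1},  # dept 10 is managed by Alice (female)
    {"no": 20, "manager": 4},  # dept 20 is managed by Dave (male)
]

# Index the tables so each lookup is a direct key access rather than a scan
emp_by_id = {e["id"]: e for e in employees}
mgr_of_dept = {d["no"]: d["manager"] for d in departments}

# Male employees whose department's manager is female
result = [
    e["name"]
    for e in employees
    if e["gender"] == "M"
    and emp_by_id[mgr_of_dept[e["dept"]]]["gender"] == "F"
]
```

With the sample rows, only Bob qualifies: his department's manager is Alice, while Dave manages his own department. Note the double hop (employee → department → manager, who is again an employee) that a pre-built wide table cannot easily flatten.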
Is It Necessary to Use a Specialized In-Memory Database?
It is easy to think of using an in-memory database to solve the performance problems of reporting, BI analysis, batch processing, and other data analysis tasks. An in-memory database keeps all data permanently in memory, so external memory accesses (disk reads) are not needed during computation; disk I/O is avoided and data processing performance can be effectively improved. In-memory databases are advertised as having high performance and being able to solve many performance problems in business data analysis. They are fast not only because of zero disk I/O costs but also because of the application of specialized
How far is SPL from vector database?
ChatGPT has made Large Language Models (LLMs) popular, and it has also made vector databases popular. The training cost of an LLM is very high, and its cycle for learning new knowledge is long; a vector database can act as the “memory” module of an LLM, finding old problems similar to new ones and handing them over to the LLM for processing, which significantly expands the LLM's application scope. In fact, vector databases have already been used in traditional AI and machine learning scenarios, such as face recognition, image search, and voice recognition. The main task of a vector database is to