Category: TalkingData
How Much Is One Terabyte of Data?
A mile does not seem like a long distance, and a cubic mile does not seem that big compared with the size of the Earth. You may be surprised to learn that the world’s entire population can fit into a cubic mile of space. The statement is not mine; Hendrik Willem van Loon, a Dutch-American writer, once wrote it in one of his books. Teradata is a famous data warehouse product. More than 30 years ago, the brand name was meant to impress people with the product’s ability to handle massive amounts of data. Today, TB is already the smallest unit many
What to do if the query calculation is moved out of the database but is too slow in Java
Many modern applications move data computation and processing tasks out of the database and implement them in Java, which brings the benefits of application frameworks. Moreover, Java has comprehensive procedural processing capabilities and is more adept than SQL at handling increasingly complex business logic (although the code is not short). However, we often find that the performance of such Java code in computing and processing data is unsatisfactory, and that it cannot even match the performance of SQL in the database. Normally, as a compiled language, Java may not be as good as C++ in terms of performance, but it should have an
When will the pre-calculation of customer profile analysis be over?
Customer profiling is very fashionable in current business analysis. Simply put, it means putting various tags on customers, using these tags to define different customer groups (so-called profiles), and then calculating the number of customers in each group (and how that number changes). Logically speaking, tags are dimensions or fields of a data table. Tags are fields with relatively simple values, and there are generally two types. Binary tags have only two values, usually represented by 0/1, such as marital status or gender. The other type is enumeration tags, whose values number from a few to a few hundred, which
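The tag-based counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the tag names (`married`, `gender`, `age_band`), and the helper `count_profile` are hypothetical and do not come from the original post.

```python
from collections import Counter

# Hypothetical customer records: every tag is a field with a simple value.
# "married" and "gender" are binary tags (0/1); "age_band" is an enumeration tag.
customers = [
    {"id": 1, "married": 1, "gender": 0, "age_band": "18-25"},
    {"id": 2, "married": 0, "gender": 1, "age_band": "26-35"},
    {"id": 3, "married": 1, "gender": 1, "age_band": "26-35"},
    {"id": 4, "married": 1, "gender": 0, "age_band": "26-35"},
]


def count_profile(rows, **conditions):
    """Count the customers whose tags match every tag=value condition."""
    return sum(
        all(row.get(tag) == value for tag, value in conditions.items())
        for row in rows
    )


# A profile is a combination of tag values; its size is a simple count.
married_26_35 = count_profile(customers, married=1, age_band="26-35")

# Counting every value of one enumeration tag at once.
by_age_band = Counter(row["age_band"] for row in customers)
```

Each new combination of tags is another scan of this kind, which gives a sense of why the number of possible profiles grows quickly as tags are added.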
After retrieving JSON from ES, Kafka, MongoDB, RESTful…
JSON is handy: it can carry rich structured data in a common text format. Many modern technologies, such as Elasticsearch, RESTful services, and Kafka, prefer JSON as their data transmission format. MongoDB, which is more concerned with performance, uses binary JSON (BSON). Structured data often comes in bulk and often needs further computation. However, JSON class libraries are not very convenient for calculations. JSONPath is fine for parsing JSON, but it doesn’t have much computing power. Simple filtering and aggregation are fine, but it cannot handle slightly more complex operations such as grouping and summarization.
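To make "grouping and summarization" concrete, here is a minimal sketch in plain Python of the kind of computation meant. The payload and its field names (`region`, `product`, `amount`) are hypothetical, and this is not the approach of any particular tool mentioned above.

```python
import json
from collections import defaultdict

# Hypothetical payload, as it might arrive from a REST response or a Kafka message.
payload = """
[
  {"region": "east", "product": "A", "amount": 120.0},
  {"region": "west", "product": "B", "amount": 80.0},
  {"region": "east", "product": "B", "amount": 30.0},
  {"region": "west", "product": "A", "amount": 50.0},
  {"region": "east", "product": "A", "amount": 10.0}
]
"""

orders = json.loads(payload)

# Group by region, then summarize each group with a row count and a total.
totals = defaultdict(lambda: {"count": 0, "amount": 0.0})
for order in orders:
    group = totals[order["region"]]
    group["count"] += 1
    group["amount"] += order["amount"]

# Convert to a plain dict with regions in sorted order.
summary = {region: dict(stats) for region, stats in sorted(totals.items())}
```

Even this small group-and-sum takes an explicit loop and accumulator, which is the inconvenience the paragraph above points at.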
A major culprit in the slow running and collapse of a database
It is the inconspicuous de-duplicated count of accounts, written in SQL as COUNT(DISTINCT …). The de-duplicated account count is common in business analysis and has important business significance. The account here may be a user ID, bank account, phone number, license plate number, and so on. The calculation logic is basically the same: count how many accounts meet a certain condition in the historical data of a certain period of time. For example, how many cars went to New York last month? How many phones had calls between 2:00 am and 4:00 am last week? How many bank accounts have received
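The logic of such a de-duplicated count can be sketched as follows. This is a naive in-memory illustration, not how any particular database implements it; the call records and the helper names `count_distinct_accounts` and `between_2_and_4_am` are hypothetical.

```python
from datetime import datetime

# Hypothetical call records: (phone number, call start time).
calls = [
    ("555-0101", datetime(2024, 3, 4, 2, 15)),
    ("555-0101", datetime(2024, 3, 5, 3, 40)),
    ("555-0102", datetime(2024, 3, 5, 14, 0)),
    ("555-0103", datetime(2024, 3, 6, 2, 0)),
    ("555-0103", datetime(2024, 3, 6, 4, 0)),
]


def count_distinct_accounts(records, predicate):
    """Count the accounts that have at least one record satisfying predicate."""
    # The set keeps one entry per qualifying account, so repeats are not double-counted.
    return len({account for account, ts in records if predicate(ts)})


def between_2_and_4_am(ts):
    """True for timestamps from 2:00 am up to, but not including, 4:00 am."""
    return 2 <= ts.hour < 4


night_callers = count_distinct_accounts(calls, between_2_and_4_am)
```

The set must hold every qualifying account seen so far, so with this naive approach memory grows with the number of distinct accounts rather than with the size of the final answer.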







