"We want to know where we stand with these official benchmarks because we know people run them out there," Gromoll says. But the real focus is on improving real-world data warehouse performance in the areas that matter most to customers. He and his team regularly examine performance telemetry from the Redshift fleet to find common optimization opportunities. Then they work on ways to squeeze more data processing performance out of Redshift for the same cost.

One recent example was string vectorization, which applies a general performance-enhancing technique to string processing. With smaller registers, a single CPU core could conduct only one mathematical operation per clock cycle. Companies handled parallel operations by running groups of calculations across multiple cores in concert, but this still left room for performance improvements at the single-core level. As register sizes increased, a single register could store multiple numbers. Vectorization uses that capability to run a single instruction on multiple numbers at once in a single clock cycle.

Redshift customers store much of their data as strings rather than integers or floating point numbers. "We realized that there was a lot of opportunity to provide benefits to our customers by optimizing the string performance," Gromoll recalls. Amazon engineers developed a new way to manage compressed string data on disk. Vectorizing the algorithms that read string compression encodings permitted CPU-efficient scans over compressed, dictionary-encoded string columns. This accelerated string processing by up to sixty times for queries that handle large amounts of string data, Gromoll says.

Mobile gaming company Playrix began using Amazon Redshift Serverless in 2022 in a bid to improve its use of marketing analytics to increase game sales. The company, which has 85 million daily active users, must analyze tens of petabytes of data to understand how players interact with its games. Its EC2-hosted PostgreSQL database had served it well but was struggling to keep up. After Playrix switched to Redshift alongside AWS Fargate, a serverless container service that pulls data from its partner systems, it saw improved response times for queries on massive amounts of historical data and reduced its monthly costs by 20 percent.

Parallelism plays an important part in Redshift at the cluster level too.
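To make the idea of scanning a dictionary-encoded string column more concrete, here is a minimal Python sketch. It is not Redshift's implementation: the function names and the sample column are hypothetical, and plain Python does not issue SIMD instructions. It only shows why the encoding helps: the string predicate is evaluated once per distinct value, and the scan itself touches nothing but small integer codes, which is the kind of loop a vectorized engine could run over several values per instruction.

```python
def dictionary_encode(values):
    """Replace each string with an integer code into a list of distinct values."""
    dictionary = []   # distinct strings, in order of first appearance
    index = {}        # string -> code lookup used while encoding
    codes = []        # one small integer per row
    for value in values:
        code = index.get(value)
        if code is None:
            code = len(dictionary)
            index[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def count_matches(dictionary, codes, predicate):
    """Count rows whose string satisfies predicate without touching row strings."""
    # Evaluate the (comparatively expensive) string predicate once per distinct value.
    matching = {code for code, text in enumerate(dictionary) if predicate(text)}
    # The scan over rows now compares integers only.
    return sum(1 for code in codes if code in matching)


# Hypothetical sample column, for illustration only.
column = ["ios", "android", "ios", "web", "android", "ios"]
dictionary, codes = dictionary_encode(column)
ios_rows = count_matches(dictionary, codes, lambda text: text == "ios")
```

With six rows but only three distinct strings, the string comparison runs three times instead of six. The gap widens on real columns, where a handful of distinct values may repeat across millions of rows.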