MySQL Analytics with HeatWave

Introduction to HeatWave

Organizations invest in data technology and tooling to leverage their data for a variety of reasons, such as:

· Gaining a clear picture of performance

· Creating a single source of truth for business metrics

· Maintaining a sustainable e-commerce business

· Supporting stock and inventory management

· Understanding customer behaviour and preferences

· And optimizing advertising and marketing budgets

However, many businesses struggle to make their models viable due to the high cost and complexity of modern data stacks. One of the main challenges in managing existing data and analytics solutions is identifying suitable databases for online transaction processing (OLTP), online analytical processing (OLAP), and machine learning (ML) volumes and transporting extracted data between them while also maintaining multiple databases.

MySQL HeatWave addresses these challenges by combining transactional, analytic, and ML services into a single, fully managed service with the added power of an in-memory query accelerator. This results in a highly performant solution for SQL analytics at a fraction of the cost compared to other industry solutions.

Massive Parallel Architecture

The architecture of HeatWave allows for each node in a cluster and each core within a node to process partitioned data simultaneously. This includes parallel scanning, joining, grouping, aggregating, and top-k processing, which results in high cache hits for analytic operations and excellent scalability between nodes. HeatWave uses a compressed and optimized in-memory representation for both numeric and string data, which leads to significant performance improvements and reduced memory usage for customers. The in-memory representation also uses a columnar format, which facilitates vectorized processing and further enhances query performance.

HeatWave is a storage engine for MySQL that can be easily integrated with existing MySQL tools. It offers fast data reload operations such as error recovery and maintenance through a data management layer that is scalable and can work with OCI object storage or Amazon S3. The engine has a query optimizer that automatically decides if it can handle the query and if the estimated processing time is faster than the existing HeatWave cluster, the cluster can be scaled up or down without interrupting client connections. During the scaling operation, the cluster remains functional, and there is little to no effect on query performance.

MySQL Autopilot

Autopilot is a feature of HeatWave that improves the query optimizer’s intelligence by utilizing advanced techniques. It collects statistics on data and queries, samples data, and builds machine-learning models that can model memory usage, network load, and execution time. Autopilot focuses on four primary areas, which are system setup, data load, query execution, and failure handling.

Within the system setup, Autopilot has two main features, auto-provisioning, and auto-shape predictions. Auto-provisioning uses data sampling from tables to predict the number of nodes required to run a specific workload. Auto-shape predictions use machine-learning models to predict the optimal cluster shape for the workload.

Auto shape prediction is a feature of HeatWave that enables customers to optimize their price and performance by analyzing OLTP workload, throughput, and buffer pool hit rate. It recommends the appropriate compute shape for the workload at any given time.

The Data Load feature is enriched with four auto-features. Auto parallel loading predicts the optimal degree of parallelism for each table to enable faster data loading. Auto encoding determines the optimal representation of columns being loaded to minimize the cost and maximize query performance. Auto data placement uses an in-memory partition to achieve the best query performance and provides expected gain percentage and new column recommendations. Auto unload unloads unused table volumes, freeing up resources for other tasks.

Query Execution includes several features such as Auto thread pooling, which enables the database service to process more transactions for a given hardware configuration. Auto scheduling determines which queries in the queue are short-running and prioritizes them over long-running queries intelligently. Auto change propagation determines the optimal time to propagate changes from the MySQL database to the HeatWave storage layer. Auto query time estimation estimates the execution time of a query before executing it, allowing quick testing of different queries. Auto query plan improvement learns from execution data and improves the execution plan of future queries.

Failure Handling is addressed by the Auto error recovery feature, which reloads necessary data if one or more HeatWave nodes become unresponsive. This improves the system’s performance as more queries are run. Auto scheduling, auto query time estimation, and auto thread pooling also contribute to failure handling by reducing overall wait time and preventing OLTP workload drops at high levels of transactions and concurrency.

HeatWave AutoML

HeatWave AutoML is a technology that automates the creation of machine learning models by leveraging MySQL interfaces and explanations. This eliminates the need for the user to have deep knowledge of machine learning. The models created using HeatWave Atoms can be explained, which is important for building trust, ensuring fairness, and meeting regulatory requirements.

HeatWave AutoML is designed with security in mind. There are no security loopholes that allow unauthorized access to data or models within the database by clients or other services. HeatWave AutoML scales with the size of the cluster and can be easily upgraded. Compared to other competing services, HeatWave AutoML provides high performance at a lower cost.

Technology Advantages

Data Scientists often face laborious and time-consuming tasks such as tuning hyperparameters, selecting a suitable sample of data, fine-tuning pipelines, processing data, and generalizing models. HeatWave AutoML automates these tasks, minimizing the number of trials by extensively using meta-learning and providing an optimal model within a given time budget. It leverages Oracle AutoML, which has a scalable design and has been proven effective in various Oracle products, including the OCI Data Science Service and the Oracle Database. With HeatWave AutoML, data scientists can use the familiar MySQL interface to automate ML-tuned model creation, including interfaces and explanations, without the need for ML expertise. All models created by HeatWave AutoML can be explained, which is essential for building trust, demonstrating fairness, and complying with regulatory requirements. HeatWave AutoML is highly secure and easy to upgrade and scales with the size of the cluster. Compared to other competing services, it is much less expensive while delivering high performance.

Comparison Matrix for HeatWave with the competing analytics services

Conclusion

MySQL HeatWave is a solution that allows organizations to manage OLTP, OLAP, and ML databases with a single service. This means that businesses can leverage HeatWave for better analytic query performance and lower costs without the need for ETL. HeatWave is designed with a massively parallel architecture that enables high cache hits and provides excellent inter-node scalability, resulting in a highly performant solution for SQL analytics at a fraction of the cost compared to other industry solutions.

The HeatWave platform includes native machine learning capabilities called Autopilot and AutoML, which automate many of the tasks previously performed by data scientists, such as tuning hyperparameters, selecting data samples, and pipeline fine-tuning. Autopilot builds machine learning models to model memory usage, network load, and execution time by using advanced techniques to sample data, collect statistics on data and queries, and build machine learning models. Autopilot leverages Oracle AutoML to automate the task of generating models and is scalable, minimizing the number of trials by using meta-learning, and providing an optimal model given a time budget.

The HeatWave platform is fully managed, making it easy to use and accessible to businesses of all sizes. HeatWave’s automation features improve performance, scalability, and ease of use, allowing businesses to leverage their data to optimize outcomes while reducing the complexity and cost of managing data and analytics solutions. Overall, MySQL HeatWave provides a cost-effective, high-performance solution for businesses looking to leverage their data to gain insights and drive better business outcomes.

Leave a Reply

Your email address will not be published. Required fields are marked *