New Amazon CloudWatch log class to cost-effectively scale your AWS Glue workloads


AWS Glue is a serverless data integration service that makes it simpler to discover, prepare, and combine data for analytics, machine learning (ML), and application development. You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores.

One of the most common questions we get from customers is how to effectively optimize costs on AWS Glue. Over the years, we have built multiple features and tools to help customers manage their AWS Glue costs. For example, AWS Glue Auto Scaling and AWS Glue Flex can help you reduce the compute cost associated with processing your data. AWS Glue interactive sessions and notebooks can help you reduce the cost of developing your ETL jobs. For more information about cost-saving best practices, refer to Monitor and optimize cost on AWS Glue for Apache Spark. Additionally, to understand data transfer costs, refer to the Cost Optimization Pillar defined in the AWS Well-Architected Framework. For data storage, you can apply general best practices defined for each data source. For a cost optimization strategy using Amazon Simple Storage Service (Amazon S3), refer to Optimizing storage costs using Amazon S3.

In this post, we focus on the remaining piece: the cost of logs written by AWS Glue.

Before we get into the cost analysis of logs, let’s understand the reasons to enable logging for your AWS Glue job and the current options available. When you start an AWS Glue job, it sends real-time logging information to Amazon CloudWatch (every 5 seconds and before each executor stops) after the Spark application starts running. You can view the logs on the AWS Glue console or the CloudWatch console dashboard. These logs provide you with insights into your job runs and help you optimize and troubleshoot your AWS Glue jobs. AWS Glue offers a variety of filters and settings to reduce the verbosity of your logs. As the number of job runs increases, so does the volume of logs generated.

To optimize CloudWatch Logs costs, AWS recently announced a new log class for infrequently accessed logs called Amazon CloudWatch Logs Infrequent Access (Logs IA). This new log class offers a tailored set of capabilities at a lower cost for infrequently accessed logs, enabling you to consolidate all your logs in one place in a cost-effective manner. It is a cheaper option for ingesting logs that only need to be accessed occasionally for auditing or debugging purposes.

In this post, we explain what the Logs IA class is, how it can help reduce costs compared to the standard log class, and how to configure your AWS Glue resources to use this new log class. By routing logs to Logs IA, you can achieve significant savings in your CloudWatch Logs spend without sacrificing access to important debugging information when you need it.

CloudWatch log groups used by AWS Glue job continuous logging

When continuous logging is enabled, AWS Glue for Apache Spark writes Spark driver/executor logs and progress bar information into the following log group:

/aws-glue/jobs/logs-v2

If a security configuration is enabled for CloudWatch logs, AWS Glue for Apache Spark will create a log group named as follows for continuous logs:

<Log-Group-Name>-<Security-Configuration-Name>

The default and custom log groups will be as follows:

  • The default continuous log group will be /aws-glue/jobs/logs-v2-<Security-Configuration-Name>
  • The custom continuous log group will be <custom-log-group-name>-<Security-Configuration-Name>

You can provide a custom log group name through the job parameter --continuous-log-logGroup.
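The naming rule above can be sketched as a small helper. This is only an illustration of the rule as described in this post; the function name and its arguments are hypothetical and are not part of any AWS SDK.

```python
# Default continuous log group, as named in this post.
DEFAULT_CONTINUOUS_LOG_GROUP = "/aws-glue/jobs/logs-v2"


def continuous_log_group_name(custom_log_group=None, security_configuration=None):
    """Return the continuous log group name described in this post.

    custom_log_group: value passed via --continuous-log-logGroup, if any.
    security_configuration: name of the security configuration, if one is
    enabled for CloudWatch logs.
    """
    base = custom_log_group or DEFAULT_CONTINUOUS_LOG_GROUP
    if security_configuration:
        # With a security configuration, the name gets a "-<name>" suffix.
        return f"{base}-{security_configuration}"
    return base
```

For example, `continuous_log_group_name("/my/custom-group", "my-sec-config")` evaluates to `/my/custom-group-my-sec-config`, where both names are placeholders.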

Getting started with the new Infrequent Access log class for AWS Glue workloads

To gain the benefits of Logs IA for your AWS Glue workloads, you need to complete the following two steps:

  1. Create a new log group using the new Logs IA class.
  2. Configure your AWS Glue job to point to the new log group.

Complete the following steps to create a new log group using the new Infrequent Access log class:

  1. On the CloudWatch console, choose Log groups under Logs in the navigation pane.
  2. Choose Create log group.
  3. For Log group name, enter /aws-glue/jobs/logs-v2-infrequent-access.
  4. For Log class, choose Infrequent Access.
  5. Choose Create.
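If you prefer to script the console steps above, the following is a minimal sketch using the AWS SDK for Python (Boto3). It assumes Boto3 is installed, credentials and a Region are configured, and the caller has permission to create log groups. The helper name is hypothetical; the `logGroupClass` parameter of `create_log_group` is what selects the Infrequent Access class.

```python
LOG_GROUP_NAME = "/aws-glue/jobs/logs-v2-infrequent-access"


def create_infrequent_access_log_group(logs_client, log_group_name=LOG_GROUP_NAME):
    """Create a CloudWatch log group in the Infrequent Access log class.

    logs_client is expected to be a CloudWatch Logs client, for example
    boto3.client("logs").
    """
    logs_client.create_log_group(
        logGroupName=log_group_name,
        # The log class is fixed at creation time and cannot be changed later.
        logGroupClass="INFREQUENT_ACCESS",
    )
    return log_group_name


# Example usage (not executed here):
#   import boto3
#   create_infrequent_access_log_group(boto3.client("logs"))
```

Passing the client in as an argument keeps the helper easy to test and lets you choose the Region or profile where the client is built.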

Complete the following steps to configure your AWS Glue job to point to the new log group:

  1. On the AWS Glue console, choose ETL jobs in the navigation pane.
  2. Choose your job.
  3. On the Job details tab, choose Add new parameter under Job parameters.
  4. For Key, enter --continuous-log-logGroup.
  5. For Value, enter /aws-glue/jobs/logs-v2-infrequent-access.
  6. Choose Save.
  7. Choose Run to trigger the job.

New log events are written into the new log group.
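As an alternative to editing the job on the console, the same parameter can be supplied when starting a run. The sketch below assumes Boto3 and an existing job; the helper name and the job name in the usage comment are hypothetical. Note that this overrides the argument for a single run only and does not save it in the job definition the way the console steps do.

```python
def run_job_with_log_group(glue_client, job_name, log_group_name):
    """Start an AWS Glue job run that writes continuous logs to log_group_name.

    glue_client is expected to be an AWS Glue client, for example
    boto3.client("glue"). Returns the ID of the started job run.
    """
    arguments = {
        # Continuous logging must be enabled for the log group setting to apply.
        "--enable-continuous-cloudwatch-log": "true",
        "--continuous-log-logGroup": log_group_name,
    }
    response = glue_client.start_job_run(JobName=job_name, Arguments=arguments)
    return response["JobRunId"]


# Example usage (not executed here; "my-etl-job" is a placeholder):
#   import boto3
#   run_job_with_log_group(
#       boto3.client("glue"),
#       "my-etl-job",
#       "/aws-glue/jobs/logs-v2-infrequent-access",
#   )
```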

View the logs with the Infrequent Access log class

Now you’re ready to view the logs with the Infrequent Access log class. Open the log group /aws-glue/jobs/logs-v2-infrequent-access on the CloudWatch console.

When you choose one of the log streams, you’ll notice that it redirects you to the CloudWatch console Logs Insights page with a pre-configured default query and your log stream selected by default. By choosing Run query, you can view the actual log events on the Logs Insights page.
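The same kind of lookup can be done programmatically through the CloudWatch Logs Insights API. The following is a rough sketch assuming Boto3; the helper name, the one-hour lookback window, and the query string are illustrative choices and are not necessarily the query that the console pre-configures.

```python
import time

# Illustrative query; the console's pre-configured query may differ.
DEFAULT_QUERY = "fields @timestamp, @message | sort @timestamp desc | limit 20"

# Statuses after which polling stops.
TERMINAL_STATUSES = ("Complete", "Failed", "Cancelled", "Timeout")


def query_recent_log_events(logs_client, log_group_name, lookback_seconds=3600,
                            poll_interval=1.0, max_polls=60):
    """Run a Logs Insights query and return (status, results).

    logs_client is expected to be a CloudWatch Logs client, for example
    boto3.client("logs").
    """
    end_time = int(time.time())
    started = logs_client.start_query(
        logGroupName=log_group_name,
        startTime=end_time - lookback_seconds,  # epoch seconds
        endTime=end_time,
        queryString=DEFAULT_QUERY,
    )
    query_id = started["queryId"]
    # Queries run asynchronously, so poll until a terminal status is reached.
    for _ in range(max_polls):
        result = logs_client.get_query_results(queryId=query_id)
        if result["status"] in TERMINAL_STATUSES:
            return result["status"], result.get("results", [])
        time.sleep(poll_interval)
    return "Running", []
```

Each entry in the returned results is a list of field/value pairs, so you would typically pick out the `@message` field when printing events.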

Considerations

Keep in mind the following considerations:

  • You cannot change the log class of a log group after it’s created. You need to create a new log group to configure the Infrequent Access class.
  • The Logs IA class offers a subset of CloudWatch Logs capabilities, including managed ingestion, storage, cross-account log analytics, and encryption with a lower ingestion price per GB. For example, you can’t view log events through the standard CloudWatch Logs console. To learn more about the features offered across both log classes, refer to Log Classes.
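Because the log class is fixed at creation time, it can be useful to check which class an existing log group uses before pointing a job at it. The following is a minimal sketch assuming Boto3; the helper name is hypothetical, and treating a missing `logGroupClass` field as `STANDARD` is an assumption made for this example.

```python
def get_log_group_class(logs_client, log_group_name):
    """Return the log class of a log group, or None if the group is not found.

    logs_client is expected to be a CloudWatch Logs client, for example
    boto3.client("logs"). Only the first page of results is inspected, which
    is a simplification for this sketch.
    """
    response = logs_client.describe_log_groups(logGroupNamePrefix=log_group_name)
    for group in response.get("logGroups", []):
        # The prefix search can return other groups, so match the exact name.
        if group["logGroupName"] == log_group_name:
            # Assumption: a missing field means the standard class.
            return group.get("logGroupClass", "STANDARD")
    return None
```

If this returns `STANDARD` for a group you intended to use with Logs IA, create a new log group in the Infrequent Access class instead of trying to modify the existing one.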

Conclusion

This post provided step-by-step instructions to guide you through enabling Logs IA for your AWS Glue job logs. If your AWS Glue ETL jobs generate large volumes of log data that become a challenge as you scale your applications, the best practices demonstrated in this post can help you cost-effectively scale while centralizing all your logs in CloudWatch Logs. Start using the Infrequent Access class with your AWS Glue workloads today and enjoy the cost benefits.


About the Authors

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is based in Tokyo, Japan. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling on his road bike.

Abeetha Bala is a Senior Product Manager for Amazon CloudWatch, primarily focused on logs. Being customer obsessed, she solves observability challenges through innovative and cost-effective ways.

Kinshuk Pahare is a leader in AWS Glue’s product management team. He drives efforts on the platform, developer experience, and big data processing frameworks like Apache Spark, Ray, and Python Shell.
