Batch processing is difficult. You run the risk of underprovisioning your resources and not having enough capacity to process all your jobs. Or you overprovision and spend significantly more on resources than you need to.
Whether you're processing thousands of transactions a day or running an ETL on new data once a month, AWS Batch can run batch jobs. These jobs can be at any scale, and you only pay for the resources you use while those jobs are running. Walk through this AWS Batch tutorial to learn how to set up and use this tool.
Setting up the batch resources
Before you run a batch job, set up the necessary resources. Here is a list of resources and what they accomplish:
- Compute environment. This type of compute resource runs each of your jobs. AWS manages how many of these resources must be created based on this definition and how many jobs are currently queued for processing. You can pick either a Fargate or EC2 configuration. Both offer Spot capacity (Fargate Spot and EC2 Spot Instances) for more cost optimization options for your workload.
- Job queue. When you submit a new batch job, it waits in a job queue until there is a compute environment ready to process it. You can set a priority level on each job queue, so higher and lower priority queues can share a compute environment, with jobs from the higher priority queue scheduled first.
- Job definition. Your job may require additional configuration to run, such as environment variables, IAM roles and attached persistent storage. You can set CPU and memory usage for each job. If your task is already packaged in a container image, you can define that here as well.
- Job. This is the actual unit of work, a single command-line command with any arguments or parameters. You can submit the job through the AWS console, the CLI or any AWS SDK.
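To make the relationships between these four resources concrete, here is a minimal Python sketch. It is a toy model, not the AWS Batch scheduler: the class names, fields and the `next_job` helper are all hypothetical, and it assumes only that a higher priority number is scheduled first and that jobs within one queue run in the order they were submitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class JobDefinition:
    # Settings every job using this definition inherits.
    name: str
    image: str
    vcpus: float
    memory_mib: int
    environment: dict = field(default_factory=dict)


@dataclass
class Job:
    # The unit of work: a command plus the definition it runs under.
    name: str
    definition: JobDefinition
    command: list


@dataclass
class JobQueue:
    name: str
    priority: int              # assumption: higher number is scheduled first
    compute_environment: str   # name of the environment that drains this queue
    jobs: deque = field(default_factory=deque)

    def submit(self, job):
        self.jobs.append(job)


def next_job(queues, compute_environment):
    """Pick the next job for an environment: highest-priority non-empty queue,
    first-in first-out within that queue. Returns None when nothing is waiting."""
    waiting = [q for q in queues
               if q.compute_environment == compute_environment and q.jobs]
    if not waiting:
        return None
    return max(waiting, key=lambda q: q.priority).jobs.popleft()


# Usage: two queues share one environment; the high-priority job goes first
# even though it was submitted second.
definition = JobDefinition("etl-def", "public.ecr.aws/amazonlinux/amazonlinux:latest", 0.25, 512)
low = JobQueue("monthly-etl", priority=1, compute_environment="shared-env")
high = JobQueue("transactions", priority=10, compute_environment="shared-env")
low.submit(Job("load-new-data", definition, ["echo", "etl"]))
high.submit(Job("settle-payments", definition, ["echo", "settle"]))
print(next_job([low, high], "shared-env").name)  # settle-payments
```

The real service also decides when to scale the compute environment up or down, which this sketch leaves out.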
Batch job states
Once you submit a job to AWS Batch, it moves through several states that describe what the batch service is doing. If jobs spend a long time in one state before they succeed or fail, this can indicate that you need to make changes to your AWS Batch components.
A job inherits the properties defined in the job definition you attach to it. It passes through four main states -- submitted, pending, starting and running -- before it ends as either succeeded or failed.
Submitted. In the submitted state, the batch service evaluates the job. A job with no outstanding dependencies becomes runnable, and the service determines if an instance from the compute environment assigned to that job queue is available to process it. If one isn't available, AWS Batch tries to create a new one based on how you configured your compute environment.
Pending. If a job in the queue cannot run because it has dependencies on another resource or job, it is in the pending state. It then moves to runnable once the dependencies are satisfied.
Starting. Once a compute resource is available, the job moves to the starting state, where it pulls any container images it needs to run the job.
Running. Finally, the job moves to the running state. Here, the command in the submitted job executes. If it returns exit code 0, the batch service moves it to the succeeded state. Otherwise, it moves to the failed state. In either case, if there are no other jobs in the job queue, the compute resource that ran the job is destroyed.
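One way to watch a job move through these states is to poll its status until it reaches succeeded or failed. The sketch below is a minimal illustration under stated assumptions: `wait_for_job` is a hypothetical helper, and `get_status` is any callable you supply that returns the current status string. The uppercase names follow the status values the AWS Batch API reports, which include RUNNABLE for jobs waiting on compute.

```python
import time

# Status values as reported by the AWS Batch API, in the usual order.
IN_PROGRESS_STATES = ["SUBMITTED", "PENDING", "RUNNABLE", "STARTING", "RUNNING"]
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_job(get_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll get_status() until the job is SUCCEEDED or FAILED.

    Returns (final_status, history), where history lists each distinct status
    in the order it was observed. Raises TimeoutError after max_polls polls.
    """
    history = []
    for _ in range(max_polls):
        status = get_status()
        if not history or history[-1] != status:
            history.append(status)  # record only changes of state
        if status in TERMINAL_STATES:
            return status, history
        sleep(poll_seconds)
    last = history[-1] if history else "unknown"
    raise TimeoutError(f"job still {last} after {max_polls} polls")
```

With boto3 installed and credentials configured, `get_status` could be something like `lambda: batch.describe_jobs(jobs=[job_id])["jobs"][0]["status"]`, where `batch` is `boto3.client("batch")` and `job_id` is the ID returned when you submitted the job. Polling can miss short-lived states, so the history may skip some of them. If it shows a job sitting in one state for many polls, that is the signal described above to revisit your AWS Batch components.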
Run a 'Hello World' batch job
Set up a simple AWS Batch pipeline. For this example, title the job 'Hello World.'
- Start in the AWS console and find the AWS Batch service.
- Navigate to Compute Environments in the taskbar on the left-hand side and click on Create. This pipeline uses AWS Fargate to run the batch job inside a container. Give the compute environment a name and leave the rest of the settings as default before clicking Create compute environment.
- Go to Job queues in the taskbar and select Create. Give this queue a name.
- Scroll down to Connected compute environments and select the compute environment you created in the last step before clicking Create.
- Go to Job definitions from the taskbar and click Create. You'll see an option here for single-node or multi-node parallel. Multi-node parallel spreads a single job across multiple instances, which makes it possible to use AWS Batch for more intensive jobs such as machine learning model training, image classification or intensive ETL jobs.
- For this example, select Single-node. The job definition is where you'll spend the most time configuring your AWS Batch resources. Go through the options and change the following:
  - Name: As with the other resources, the name can be whatever you want it to be.
  - Assign public IP: This must be enabled for the job to pull a container from a public registry or for network access outside of your VPC. Turn it on for this example.
  - Image: This is the container image in which the job runs. Because this example runs a simple shell command, you can use the default of 'public.ecr.aws/amazonlinux/amazonlinux:latest' or 'ubuntu:latest', which pulls the image from Docker Hub.
  - Command: The command that runs the job. For this example, use echo 'Hello World'
- For this example, leave the rest of the options at their default settings and save the job definition.
- Go to Jobs from the left-hand taskbar and click Submit new job.
- Give the job a name and select the job definition and job queue you created earlier.
- Scroll down to the bottom of the page and click Submit. After refreshing the page, you should now see the job appear. If you click on it, you can see what state the job is in. After a minute or so, it should show the job status as succeeded and you can scroll to the bottom to see the output from the job.
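The same pipeline can be scripted instead of clicked through. The sketch below mirrors the console steps using the method names of the boto3 Batch client. Treat it as an outline under assumptions rather than a finished script: the resource names, the vCPU and memory values and the `maxvCpus` limit are placeholders, and you must supply your own subnet IDs, security group IDs and an execution role ARN that can pull the image.

```python
import time


def wait_until_valid(describe, key, name, sleep=time.sleep, attempts=30):
    """Poll a describe call until the named resource reports status VALID."""
    for _ in range(attempts):
        items = describe(name)[key]
        status = items[0]["status"] if items else None
        if status == "VALID":
            return
        if status == "INVALID":
            raise RuntimeError(f"{name} is INVALID")
        sleep(2)
    raise TimeoutError(f"{name} did not become VALID")


def run_hello_world(batch, subnets, security_group_ids, execution_role_arn,
                    sleep=time.sleep):
    """Create the resources and submit one job. Returns the job ID.

    `batch` is expected to behave like boto3.client("batch").
    Every resource name below is a placeholder.
    """
    # 1. Compute environment backed by Fargate.
    batch.create_compute_environment(
        computeEnvironmentName="hello-world-env",
        type="MANAGED",
        computeResources={
            "type": "FARGATE",
            "maxvCpus": 4,
            "subnets": subnets,
            "securityGroupIds": security_group_ids,
        },
    )
    wait_until_valid(
        lambda n: batch.describe_compute_environments(computeEnvironments=[n]),
        "computeEnvironments", "hello-world-env", sleep)

    # 2. Job queue connected to that compute environment.
    batch.create_job_queue(
        jobQueueName="hello-world-queue",
        priority=1,
        computeEnvironmentOrder=[
            {"order": 1, "computeEnvironment": "hello-world-env"},
        ],
    )
    wait_until_valid(
        lambda n: batch.describe_job_queues(jobQueues=[n]),
        "jobQueues", "hello-world-queue", sleep)

    # 3. Single-node job definition: image, command and public IP.
    batch.register_job_definition(
        jobDefinitionName="hello-world-def",
        type="container",
        platformCapabilities=["FARGATE"],
        containerProperties={
            "image": "public.ecr.aws/amazonlinux/amazonlinux:latest",
            "command": ["echo", "Hello World"],
            "resourceRequirements": [
                {"type": "VCPU", "value": "0.25"},
                {"type": "MEMORY", "value": "512"},
            ],
            "executionRoleArn": execution_role_arn,
            "networkConfiguration": {"assignPublicIp": "ENABLED"},
        },
    )

    # 4. Submit the job to the queue using the definition.
    response = batch.submit_job(
        jobName="hello-world",
        jobQueue="hello-world-queue",
        jobDefinition="hello-world-def",
    )
    return response["jobId"]
```

With boto3 installed and AWS credentials configured, you would call `run_hello_world(boto3.client("batch"), subnets, security_group_ids, execution_role_arn)` with your own values. The function takes the client as a parameter so you can exercise the call sequence against a stub before touching a real account.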