AWS APIs and Python

Cloud providers are generally pretty good at exposing APIs to facilitate automation. That matters, because without automation, working in the cloud can be painful.

For the purpose of this article, we're going to focus on using Python to interact with AWS APIs.

Using boto3

The AWS SDK for Python, boto3, has excellent documentation and makes it super-easy to get started. Follow the first few steps in the boto3 quickstart guide to install the library and configure your credentials. Once that's done, you can start writing relatively straightforward Python to complete different AWS-related tasks.

For example, you can list S3 buckets in your account:

#!/usr/bin/env python
import boto3

s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)

The boto3 SDK lets you operate using a higher-level, object-oriented resource construct like the example above, or using a lower-level client construct, like this:

#!/usr/bin/env python
import boto3

s3 = boto3.client('s3')
for bucket in s3.list_buckets()['Buckets']:
    print(bucket['Name'])

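The client construct is typically closer to the underlying AWS API (per-service quirks and all): its methods map to individual API operations and return plain dictionaries. The resource construct is a higher-level abstraction that models AWS entities such as buckets and instances as Python objects with attributes and collections.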

Iteration!

One use case for Python/boto3 is automating AWS-related actions across regions, accounts, or resources.

For example, we can iterate across regions getting counts of EC2 instances:

#!/usr/bin/env python3

import boto3
from botocore.exceptions import ClientError

for region_name in boto3.session.Session().get_available_regions('ec2'):
    result_instances = 0
    result_client = 'succeeded'
    ec2 = boto3.resource('ec2', region_name=region_name)
    try:
        result_instances = sum(1 for _ in ec2.instances.all())
    except ClientError:
        result_client = 'failed'
    print(f'{region_name:20}\t{result_client:10}\t{result_instances}')

It takes about 20 seconds for this script to run.
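
For comparison, the same count could be taken with the lower-level client. The sketch below assumes the documented shape of the EC2 DescribeInstances response, where instances are nested inside reservations and results arrive in pages. The helper names count_instances and count_instances_in_region are illustrative, not part of boto3, and the sample pages are hand-written.

```python
def count_instances(pages):
    """Count instances across DescribeInstances response pages."""
    return sum(
        len(reservation.get('Instances', []))
        for page in pages
        for reservation in page.get('Reservations', [])
    )


def count_instances_in_region(region_name):
    """Count EC2 instances in one region using the client and its paginator."""
    # Imported here so count_instances() can be tried without boto3 installed.
    import boto3

    ec2 = boto3.client('ec2', region_name=region_name)
    paginator = ec2.get_paginator('describe_instances')
    return count_instances(paginator.paginate())


# Hand-written pages in the same shape, for illustration only.
sample_pages = [
    {'Reservations': [{'Instances': [{'InstanceId': 'i-0001'}, {'InstanceId': 'i-0002'}]}]},
    {'Reservations': [{'Instances': [{'InstanceId': 'i-0003'}]}, {'Instances': []}]},
]
print(count_instances(sample_pages))  # 3
```

The resource collection handles this paging and nesting for you, which is why the resource version above is shorter.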

Parallelization with gevent

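Waiting on each region in turn is less than ideal. The script spends almost all of its time waiting for network responses, so we can issue the requests concurrently instead. Python has a GIL, Python can't do .. oh, yes it can: the GIL stops threads from running Python bytecode at the same time, but it doesn't stop them from overlapping their I/O waits. Python has threading, multiprocessing, and, more recently, asyncio. There's a great introduction to these over at Real Python. Another option is gevent.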

Abridge uses gevent, so that's the example we'll use here. This iterates across regions getting counts of EC2 instances, same as the previous example, but it sends the requests concurrently, one greenlet per region.

#!/usr/bin/env python3

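from gevent import monkey, spawn, joinall
# patch the standard library before boto3 is imported, so its network calls cooperate with gevent
monkey.patch_all()

import boto3  # noqa: E402
from botocore.exceptions import ClientError  # noqa: E402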


def collect_for_region(region_name):
    result_instances = 0
    result_client = 'succeeded'
    # the key is to create a Session (connection) inside each greenlet, rather than outside
    # (if you did this outside collect_for_region(), you'd hit concurrency-related errors)
    ec2 = boto3.Session().resource('ec2', region_name=region_name)
    try:
        result_instances = sum(1 for _ in ec2.instances.all())
    except ClientError:
        result_client = 'failed'
    return f'{region_name:20}\t{result_client:10}\t{result_instances}'


regions = boto3.session.Session().get_available_regions('ec2')
jobs = [spawn(collect_for_region, region_name) for region_name in regions]
joinall(jobs)
print(*[job.value for job in jobs], sep='\n')

It takes about 4 seconds for this script to run, approximately 5 times faster than the synchronous example above.
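
gevent is not the only route. As a rough sketch of the threading option mentioned earlier, the same fan-out can be written with the standard library's concurrent.futures. To keep it runnable without AWS credentials, fake_count_instances is a made-up stand-in that sleeps instead of calling EC2, and the region list is hard-coded. In a real script the worker would create its own boto3 Session, as collect_for_region() does above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample; a real script would use get_available_regions('ec2').
REGIONS = ['us-east-1', 'us-west-2', 'eu-west-1', 'ap-southeast-2']


def fake_count_instances(region_name):
    """Stand-in for an EC2 call: wait briefly, then return a made-up count."""
    time.sleep(0.1)  # simulates network latency
    return region_name, len(region_name)


def run_sequential(regions):
    return [fake_count_instances(region) for region in regions]


def run_threaded(regions):
    # Threads overlap the waiting; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        return list(pool.map(fake_count_instances, regions))


start = time.perf_counter()
sequential = run_sequential(REGIONS)
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
threaded = run_threaded(REGIONS)
threaded_seconds = time.perf_counter() - start

for region_name, count in threaded:
    print(f'{region_name:20}\t{count}')
print(f'sequential: {sequential_seconds:.2f}s, threaded: {threaded_seconds:.2f}s')
```

Because the work is almost entirely waiting on the network, threads and greenlets give a similar kind of speedup here. The numbers printed by this sketch come from simulated latency, not from AWS.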

... and more

Here's one more example, which iterates over available AWS profiles and regions, searching for an EC2 instance with a specific ID:

#!/usr/bin/env python

from gevent import monkey, joinall, spawn
monkey.patch_all()

import boto3
import sys

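# exit with a usage message instead of an IndexError when no ID is given
if len(sys.argv) != 2:
    sys.exit('usage: {} INSTANCE_ID'.format(sys.argv[0]))
instance_id = sys.argv[1]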
results = []

print('Finding instance: {}...'.format(instance_id))


def get_instances(profile_name, account_id, region_name, instance_id):
    print('profile_name: {}\t\tregion: {}'.format(profile_name, region_name))
    # need to create the Session inside the greenlet's target function
    session = boto3.Session(profile_name=profile_name)
    ec2 = session.resource('ec2', region_name=region_name)
    instances = []
    for instance in ec2.instances.filter(Filters=[{'Name': 'instance-id', 'Values': [instance_id,]}]):
        instances.append((profile_name, account_id, region_name, instance.id, instance.instance_type))
    return instances


session = boto3.Session()
profiles = session.available_profiles

for profile_name in profiles:
    session = boto3.Session(profile_name=profile_name)
    ec2 = session.client('ec2')
    # https://boto3.amazonaws.com/v1/documentation/api/latest/guide/ec2-example-regions-avail-zones.html
    regions = [r['RegionName'] for r in ec2.describe_regions()['Regions']]
    account_id = session.client('sts').get_caller_identity().get('Account')

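    joined = joinall(
        [spawn(get_instances, profile_name, account_id, region_name, instance_id) for region_name in regions]
    )
    for job in joined:
        # job.value is None if the greenlet raised, and [] if nothing matched
        if job.value:
            # flatten, so results holds one tuple per matching instance
            results.extend(job.value)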

print('Results collected: {}'.format(len(results)))
print(results)
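
Printing the raw list is fine for a quick check. For something more readable, the matches could be grouped by account, as in the sketch below. It assumes each row is one tuple in the shape built by get_instances(), so a nested list would need flattening first. The helper name group_by_account and the sample rows are made up for illustration.

```python
from collections import defaultdict


def group_by_account(rows):
    """Group (profile, account_id, region, instance_id, instance_type) rows by account."""
    grouped = defaultdict(list)
    for profile_name, account_id, region_name, instance_id, instance_type in rows:
        grouped[account_id].append((region_name, instance_id, instance_type))
    return dict(grouped)


# Made-up rows for illustration; real rows come from get_instances().
sample_rows = [
    ('dev', '111111111111', 'us-east-1', 'i-0aaa', 't3.micro'),
    ('dev', '111111111111', 'us-west-2', 'i-0bbb', 't3.small'),
    ('prod', '222222222222', 'eu-west-1', 'i-0ccc', 'm5.large'),
]
for account_id, matches in group_by_account(sample_rows).items():
    for region_name, instance_id, instance_type in matches:
        print(f'{account_id}\t{region_name:15}\t{instance_id}\t{instance_type}')
```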

Additional resources

There are many ways to use Python and boto3 to automate cloud tasks. These are just a few useful patterns that might help you get started.

There are more examples for boto3 at https://boto3.amazonaws.com/v1/documentation/api/latest/guide/examples.html and gevent at http://www.gevent.org/examples/.

