One of the most-often cited reasons for designing applications using cloud patterns is the ability to scale out: that is, to add additional resources as required. This is in contrast to the older mentality of increasing capacity by scaling up the size of existing resources. For scale out to be feasible, you’ll need to do two things: architect your application so that it can make use of additional resources, and make it possible to add new resources to the application.
In section 2, we talked about various aspects of the application architecture, such as building in a modular fashion, creating an API, and so on. Now you’ll see why those are so important. By creating a modular application with decoupled services, it is possible to identify components that cause application performance bottlenecks and scale them out.
Just as importantly, you can also remove resources when they are no longer necessary. It is very difficult to overstate the cost savings that this feature can bring, as compared to traditional infrastructure.
Of course, just having access to additional resources is only part of the battle; while it’s certainly possible to manually add or destroy resources, you’ll get more value – and more responsiveness – if the application simply requests new resources automatically when it needs them.
This section continues to illustrate the separation of services onto multiple instances and highlights some of the choices we’ve made that facilitate scalability in the app’s architecture.
We’ll progressively ramp up to about 6 instances, so ensure that your cloud account has enough quota to handle that many.
In the previous section, we used two virtual machines - one ‘control’ service and one ‘worker’. In our application, the speed at which fractals can be generated depends on the number of workers. With just one worker, we can only produce one fractal at a time. Before long, it will be clear that we need more resources.
Note
If you don’t have a working application, follow the steps in Section Two: Introduction to the Fractals Application Architecture to create one.
You can test for yourself what happens when the Fractals app is under load by:

* maxing out the CPU of the existing worker instances (loading the worker)
* generating a lot of API requests (loading up the API)
Use SSH to log in to the controller instance, app-controller, using the previously added SSH keypair.
$ ssh -i ~/.ssh/id_rsa USERNAME@IP_CONTROLLER
Note
Replace IP_CONTROLLER with the IP address of the controller instance and USERNAME with the appropriate username.
Call the Fractal app’s command line interface (faafo) to request the generation of 5 large fractals.
$ faafo create --height 9999 --width 9999 --tasks 5
Now if you check the load on the worker, you can see that the instance is not doing well. On our single CPU flavor instance, a load average of more than 1 means we are at capacity.
$ ssh -i ~/.ssh/id_rsa USERNAME@IP_WORKER uptime
10:37:39 up 1:44, 2 users, load average: 1.24, 1.40, 1.36
Note
Replace IP_WORKER with the IP address of the worker instance and USERNAME with the appropriate username.
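The rule generalizes to bigger flavors: a host is saturated when its load average reaches its CPU count. Here is a quick sketch you could run on any Linux worker to check (os.cpu_count requires Python 3; this reads /proc/loadavg directly, so it is Linux-only):

import os

# Compare the 1-minute load average to the number of CPUs.
with open('/proc/loadavg') as loadavg:
    load1 = float(loadavg.read().split()[0])
cpus = os.cpu_count() or 1
print('load per CPU: %.2f (%s)' % (load1 / cpus, 'at capacity' if load1 >= cpus else 'headroom'))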
API load is a slightly different problem from the previous one of worker capacity. We can simulate many requests to the API as follows:
Use SSH to log in to the controller instance, app-controller, using the previously added SSH keypair.
$ ssh -i ~/.ssh/id_rsa USERNAME@IP_CONTROLLER
Note
Replace IP_CONTROLLER with the IP address of the controller instance and USERNAME with the appropriate username.
Call the Fractal app’s command line interface (faafo) in a for loop to send many requests to the API. The following command will request a random set of fractals, 500 times:
$ for i in $(seq 1 500); do faafo --endpoint-url http://IP_CONTROLLER create & done
Note
Replace IP_CONTROLLER with the IP address of the controller instance.
Now if you check the load on the API service instance, app-controller, you can see that the instance is not doing well. On our single CPU flavor instance, a load average of more than 1 means we are at capacity.
$ uptime
10:37:39 up 1:44, 2 users, load average: 1.24, 1.40, 1.36
The number of requests coming in means that some requests for fractals may not even make it onto the message queue to be processed. To ensure we can cope with this demand, we need to scale out the Fractals application’s API services as well.
Go ahead and delete the existing instances and security groups you created in previous sections. Remember: when components in the cloud aren’t doing what you want them to do, just remove them and re-create something new.
for instance in conn.list_nodes():
    if instance.name in ['all-in-one', 'app-worker-1', 'app-worker-2', 'app-controller']:
        print('Destroying Instance: %s' % instance.name)
        conn.destroy_node(instance)

for group in conn.ex_list_security_groups():
    if group.name in ['control', 'worker', 'api', 'services']:
        print('Deleting security group: %s' % group.name)
        conn.ex_delete_security_group(group)
As you change the topology of your applications, you will need to update or create new security groups. Here, we will re-create the required security groups.
api_group = conn.ex_create_security_group('api', 'for API services only')
conn.ex_create_security_group_rule(api_group, 'TCP', 80, 80)
conn.ex_create_security_group_rule(api_group, 'TCP', 22, 22)
worker_group = conn.ex_create_security_group('worker', 'for services that run on a worker node')
conn.ex_create_security_group_rule(worker_group, 'TCP', 22, 22)
controller_group = conn.ex_create_security_group('control', 'for services that run on a control node')
conn.ex_create_security_group_rule(controller_group, 'TCP', 22, 22)
conn.ex_create_security_group_rule(controller_group, 'TCP', 80, 80)
conn.ex_create_security_group_rule(controller_group, 'TCP', 5672, 5672, source_security_group=worker_group)
services_group = conn.ex_create_security_group('services', 'for DB and AMQP services only')
conn.ex_create_security_group_rule(services_group, 'TCP', 22, 22)
conn.ex_create_security_group_rule(services_group, 'TCP', 3306, 3306, source_security_group=api_group)
conn.ex_create_security_group_rule(services_group, 'TCP', 5672, 5672, source_security_group=worker_group)
conn.ex_create_security_group_rule(services_group, 'TCP', 5672, 5672, source_security_group=api_group)
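If you want to confirm the result, you can list the groups back with the same connection (a quick sanity check, not part of the app itself):

# Print the security groups we just created.
for group in conn.ex_list_security_groups():
    if group.name in ['api', 'worker', 'control', 'services']:
        print('%s: %s' % (group.name, group.description))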
Define a short function to locate an unused floating IP, or allocate a new one. This saves a few lines of boring code and prevents you from reaching your floating IP quota too quickly.
def get_floating_ip(conn):
    '''A helper function to re-use available Floating IPs'''
    unused_floating_ip = None
    for floating_ip in conn.ex_list_floating_ips():
        if not floating_ip.node_id:
            unused_floating_ip = floating_ip
            break
    if not unused_floating_ip:
        pool = conn.ex_list_floating_ip_pools()[0]
        unused_floating_ip = pool.create_floating_ip()
    return unused_floating_ip
Before we scale out application services such as the API service or the workers, we have to add a central database and messaging instance, called app-services, that will be used to track the state of the fractals and to coordinate communication between the services.
userdata = '''#!/usr/bin/env bash
curl -L -s http://git.openstack.org/cgit/stackforge/faafo/plain/contrib/install.sh | bash -s -- \
-i database -i messaging
'''
instance_services = conn.create_node(name='app-services',
image=image,
size=flavor,
ex_keyname='demokey',
ex_userdata=userdata,
ex_security_groups=[services_group])
instance_services = conn.wait_until_running([instance_services])[0][0]
services_ip = instance_services.private_ips[0]
With multiple workers producing fractals as fast as they can, we also need to make sure we can receive the requests for fractals as quickly as possible. If our application becomes popular, we may have many thousands of users trying to connect to our API to generate fractals.
Armed with our security group, image, and flavor size, we can now add multiple API services:
userdata = '''#!/usr/bin/env bash
curl -L -s http://git.openstack.org/cgit/stackforge/faafo/plain/contrib/install.sh | bash -s -- \
-i faafo -r api -m 'amqp://guest:guest@%(services_ip)s:5672/' \
-d 'mysql://faafo:password@%(services_ip)s:3306/faafo'
''' % { 'services_ip': services_ip }
instance_api_1 = conn.create_node(name='app-api-1',
image=image,
size=flavor,
ex_keyname='demokey',
ex_userdata=userdata,
ex_security_groups=[api_group])
instance_api_2 = conn.create_node(name='app-api-2',
image=image,
size=flavor,
ex_keyname='demokey',
ex_userdata=userdata,
ex_security_groups=[api_group])
instance_api_1 = conn.wait_until_running([instance_api_1])[0][0]
api_1_ip = instance_api_1.private_ips[0]
instance_api_2 = conn.wait_until_running([instance_api_2])[0][0]
api_2_ip = instance_api_2.private_ips[0]
for instance in [instance_api_1, instance_api_2]:
    floating_ip = get_floating_ip(conn)
    conn.ex_attach_floating_ip_to_node(instance, floating_ip)
    print('allocated %(ip)s to %(host)s' % {'ip': floating_ip.ip_address, 'host': instance.name})
These are client-facing services, so unlike the workers they do not use a message queue to distribute tasks. Instead, we’ll need to introduce some kind of load balancing mechanism to share incoming requests between the different API services.
One simple way might be to give half of our friends one address and half the other, but that’s certainly not a sustainable solution. We could automate this with DNS round robin, or we could use OpenStack Networking’s Load Balancing as a Service, which we’ll explain in Section Seven: Networking.
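In the meantime, a client can spread its own requests across the two endpoints. Since faafo already accepts an --endpoint-url option, client-side round robin is just a matter of cycling through the addresses. A minimal sketch, where IP_API_1 and IP_API_2 stand in for the floating IPs allocated above:

import itertools

# Alternate between the two API endpoints on successive requests.
endpoints = itertools.cycle(['http://IP_API_1', 'http://IP_API_2'])
for i in range(4):
    # Each faafo call (or HTTP request) would target the next endpoint in turn.
    print('request %d goes to %s' % (i, next(endpoints)))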
To increase the overall capacity, we will now add 3 workers:
userdata = '''#!/usr/bin/env bash
curl -L -s http://git.openstack.org/cgit/stackforge/faafo/plain/contrib/install.sh | bash -s -- \
-i faafo -r worker -e 'http://%(api_1_ip)s' -m 'amqp://guest:guest@%(services_ip)s:5672/'
''' % {'api_1_ip': api_1_ip, 'services_ip': services_ip}
instance_worker_1 = conn.create_node(name='app-worker-1',
image=image, size=flavor,
ex_keyname='demokey',
ex_userdata=userdata,
ex_security_groups=[worker_group])
instance_worker_2 = conn.create_node(name='app-worker-2',
image=image, size=flavor,
ex_keyname='demokey',
ex_userdata=userdata,
ex_security_groups=[worker_group])
instance_worker_3 = conn.create_node(name='app-worker-3',
image=image, size=flavor,
ex_keyname='demokey',
ex_userdata=userdata,
ex_security_groups=[worker_group])
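The create_node calls return immediately. If you want to block until all three workers are actually running, wait_until_running accepts the whole list, just as we used it for single instances above:

# Optionally wait for all three workers to reach a running state.
conn.wait_until_running([instance_worker_1, instance_worker_2, instance_worker_3])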
Adding this capacity enables you to deal with a higher number of requests for fractals. As soon as these worker instances come up, they’ll start checking the message queue looking for requests, reducing the overall backlog like a new register opening in the supermarket.
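The queue-driven behavior is what makes this work: each worker blocks on the AMQP queue and takes the next task as soon as it is free. The Fractals app implements this internally, but a minimal sketch of the pattern, using the pika library and a hypothetical queue name (not faafo’s actual implementation), looks like this:

import pika

# Connect to the AMQP service (guest/guest are pika's default credentials).
connection = pika.BlockingConnection(pika.ConnectionParameters(host='SERVICES_IP'))
channel = connection.channel()
channel.queue_declare(queue='tasks', durable=True)  # hypothetical queue name

def on_message(ch, method, properties, body):
    print('picked up task: %r' % body)  # a real worker would render a fractal here
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_qos(prefetch_count=1)  # take one task at a time
channel.basic_consume(queue='tasks', on_message_callback=on_message)
channel.start_consuming()  # block and wait for work

Every additional worker is just another consumer on the same queue, which is why scaling out is as simple as booting more instances.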
This was obviously a very manual process - figuring out we needed more workers and then starting new ones required some effort. Ideally the system would do this itself. If your application has been built to detect these situations, you can have it automatically request and remove resources, but you don’t actually need to do this work yourself. Instead, the OpenStack Orchestration service can monitor load and start instances as appropriate. See Section Six: Orchestration to find out how to set that up.
In the steps above, we’ve split out several services and expanded capacity. SSH to one of the API instances and create a few fractals. You will see that the Fractals app has a few new features.
$ ssh -i ~/.ssh/id_rsa USERNAME@IP_API_1
Note
Replace IP_API_1 with the IP address of the first API instance and USERNAME with the appropriate username.
Use the Fractal app’s command line interface to generate fractals with faafo create. Watch the progress of fractal generation with faafo list. Use faafo show UUID to examine some of the fractals. The generated_by field shows which worker created the fractal. Because multiple worker instances share the work, fractals are generated more quickly, and the death of a worker probably won’t even be noticed.
root@app-api-1:/var/log/supervisor# faafo list
+--------------------------------------+------------------+-------------+
| UUID | Dimensions | Filesize |
+--------------------------------------+------------------+-------------+
| 410bca6e-baa7-4d82-9ec0-78e409db7ade | 295 x 738 pixels | 26283 bytes |
| 66054419-f721-492f-8964-a5c9291d0524 | 904 x 860 pixels | 78666 bytes |
| d123e9c1-3934-4ffd-8b09-0032ca2b6564 | 952 x 382 pixels | 34239 bytes |
| f51af10a-084d-4314-876a-6d0b9ea9e735 | 877 x 708 pixels | 93679 bytes |
+--------------------------------------+------------------+-------------+
root@app-api-1:# faafo show d123e9c1-3934-4ffd-8b09-0032ca2b6564
+--------------+------------------------------------------------------------------+
| Parameter | Value |
+--------------+------------------------------------------------------------------+
| uuid | d123e9c1-3934-4ffd-8b09-0032ca2b6564 |
| duration | 1.671410 seconds |
| dimensions | 952 x 382 pixels |
| iterations | 168 |
| xa | -2.61217 |
| xb | 3.98459 |
| ya | -1.89725 |
| yb | 2.36849 |
| size | 34239 bytes |
| checksum | d2025a9cf60faca1aada854d4cac900041c6fa762460f86ab39f42ccfe305ffe |
| generated_by | app-worker-2 |
+--------------+------------------------------------------------------------------+
root@app-api-1:# faafo show 66054419-f721-492f-8964-a5c9291d0524
+--------------+------------------------------------------------------------------+
| Parameter | Value |
+--------------+------------------------------------------------------------------+
| uuid | 66054419-f721-492f-8964-a5c9291d0524 |
| duration | 5.293870 seconds |
| dimensions | 904 x 860 pixels |
| iterations | 348 |
| xa | -2.74108 |
| xb | 1.85912 |
| ya | -2.36827 |
| yb | 2.7832 |
| size | 78666 bytes |
| checksum | 1f313aaa36b0f616b5c91bdf5a9dc54f81ff32488ce3999f87a39a3b23cf1b14 |
| generated_by | app-worker-1 |
+--------------+------------------------------------------------------------------+
The fractals are now available from any of the app-api hosts. Visit http://IP_API_1/fractal/FRACTAL_UUID and http://IP_API_2/fractal/FRACTAL_UUID to verify. Now you have multiple redundant web services. If one dies, the others can be used.
Note
Replace IP_API_1 and IP_API_2 with the corresponding floating IPs. Replace FRACTAL_UUID with the UUID of an existing fractal.
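To check both endpoints programmatically rather than in a browser, here is a small sketch (assuming the requests library is installed; substitute real values for the placeholders):

import requests

fractal_uuid = 'FRACTAL_UUID'  # placeholder: use a UUID from `faafo list`
for ip in ['IP_API_1', 'IP_API_2']:
    response = requests.get('http://%s/fractal/%s' % (ip, fractal_uuid))
    print('%s -> HTTP %d, %d bytes' % (ip, response.status_code, len(response.content)))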
Go ahead and test the fault tolerance. Start killing workers and API instances. As long as you have one of each, your application should be fine. There is one weak point though. The database contains the fractals and fractal metadata. If you lose that instance, the application will stop. Future sections will work to address this weak point.
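For example, you can simulate a worker failure from the same libcloud session (the name is one of the workers created above):

# Destroy one worker and watch the remaining ones absorb the queue.
for instance in conn.list_nodes():
    if instance.name == 'app-worker-3':
        print('Destroying %s' % instance.name)
        conn.destroy_node(instance)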
If we had a load balancer, we could distribute this load between the two different API services. As mentioned previously, there are several options. We will show one in Section Seven: Networking.
You could in theory use a simple script to monitor the load on your workers and API services and trigger the creation of new instances, which you already know how to do. If you can see how to do that - congratulations, you’re ready to create scalable cloud applications.
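Here is a deliberately naive sketch of such a script, reusing the instance objects, userdata, flavor, and key name from the steps above, and assuming the workers have finished booting and accept SSH with your key:

import os
import subprocess

def load_average(ip):
    '''Return a host's 1-minute load average, fetched over SSH.'''
    out = subprocess.check_output(
        ['ssh', '-i', os.path.expanduser('~/.ssh/id_rsa'),
         'USERNAME@%s' % ip, 'cat /proc/loadavg'])
    return float(out.decode().split()[0])

workers = [instance_worker_1, instance_worker_2, instance_worker_3]
# On our single-CPU flavor, a load average above 1 means a worker is saturated.
if all(load_average(w.private_ips[0]) > 1.0 for w in workers):
    name = 'app-worker-%d' % (len(workers) + 1)
    print('All workers busy; starting %s' % name)
    conn.create_node(name=name, image=image, size=flavor,
                     ex_keyname='demokey', ex_userdata=userdata,
                     ex_security_groups=[worker_group])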
Of course, creating a monitoring system just for one application may not always be the best way. We recommend you look at Section Six: Orchestration to find out about how you can use OpenStack Orchestration’s monitoring and autoscaling capabilities to do steps like this automatically.
You should now be fairly confident about starting new instances and about segregating the services of an application between them.
As mentioned in Section Two: Introduction to the Fractals Application Architecture, the generated fractal images are saved on the local filesystem of the API service instances. Because we now have multiple API instances up and running, the generated fractal images are spread across multiple API services, each stored on a local instance filesystem. This results in a lot of IOError: [Errno 2] No such file or directory exceptions when trying to download a fractal image from an API service instance that does not hold the image on its local filesystem.
From here, you should go to Section Four: Making it Durable to learn how to use Object Storage to solve this problem in an elegant way. Alternatively, you may jump to any of these sections:
Here’s every code snippet combined into a single file, in case you want to run it all in one go, or you are so experienced you don’t need instruction ;) If you are going to use this, don’t forget to set your authentication information and the flavor and image ID.
# step-1
for instance in conn.list_nodes():
    if instance.name in ['all-in-one', 'app-worker-1', 'app-worker-2', 'app-controller']:
        print('Destroying Instance: %s' % instance.name)
        conn.destroy_node(instance)

for group in conn.ex_list_security_groups():
    if group.name in ['control', 'worker', 'api', 'services']:
        print('Deleting security group: %s' % group.name)
        conn.ex_delete_security_group(group)
# step-2
api_group = conn.ex_create_security_group('api', 'for API services only')
conn.ex_create_security_group_rule(api_group, 'TCP', 80, 80)
conn.ex_create_security_group_rule(api_group, 'TCP', 22, 22)
worker_group = conn.ex_create_security_group('worker', 'for services that run on a worker node')
conn.ex_create_security_group_rule(worker_group, 'TCP', 22, 22)
controller_group = conn.ex_create_security_group('control', 'for services that run on a control node')
conn.ex_create_security_group_rule(controller_group, 'TCP', 22, 22)
conn.ex_create_security_group_rule(controller_group, 'TCP', 80, 80)
conn.ex_create_security_group_rule(controller_group, 'TCP', 5672, 5672, source_security_group=worker_group)
services_group = conn.ex_create_security_group('services', 'for DB and AMQP services only')
conn.ex_create_security_group_rule(services_group, 'TCP', 22, 22)
conn.ex_create_security_group_rule(services_group, 'TCP', 3306, 3306, source_security_group=api_group)
conn.ex_create_security_group_rule(services_group, 'TCP', 5672, 5672, source_security_group=worker_group)
conn.ex_create_security_group_rule(services_group, 'TCP', 5672, 5672, source_security_group=api_group)
# step-3
def get_floating_ip(conn):
    '''A helper function to re-use available Floating IPs'''
    unused_floating_ip = None
    for floating_ip in conn.ex_list_floating_ips():
        if not floating_ip.node_id:
            unused_floating_ip = floating_ip
            break
    if not unused_floating_ip:
        pool = conn.ex_list_floating_ip_pools()[0]
        unused_floating_ip = pool.create_floating_ip()
    return unused_floating_ip
# step-4
userdata = '''#!/usr/bin/env bash
curl -L -s http://git.openstack.org/cgit/stackforge/faafo/plain/contrib/install.sh | bash -s -- \
-i database -i messaging
'''
instance_services = conn.create_node(name='app-services',
image=image,
size=flavor,
ex_keyname='demokey',
ex_userdata=userdata,
ex_security_groups=[services_group])
instance_services = conn.wait_until_running([instance_services])[0][0]
services_ip = instance_services.private_ips[0]
# step-5
userdata = '''#!/usr/bin/env bash
curl -L -s http://git.openstack.org/cgit/stackforge/faafo/plain/contrib/install.sh | bash -s -- \
-i faafo -r api -m 'amqp://guest:guest@%(services_ip)s:5672/' \
-d 'mysql://faafo:password@%(services_ip)s:3306/faafo'
''' % { 'services_ip': services_ip }
instance_api_1 = conn.create_node(name='app-api-1',
image=image,
size=flavor,
ex_keyname='demokey',
ex_userdata=userdata,
ex_security_groups=[api_group])
instance_api_2 = conn.create_node(name='app-api-2',
image=image,
size=flavor,
ex_keyname='demokey',
ex_userdata=userdata,
ex_security_groups=[api_group])
instance_api_1 = conn.wait_until_running([instance_api_1])[0][0]
api_1_ip = instance_api_1.private_ips[0]
instance_api_2 = conn.wait_until_running([instance_api_2])[0][0]
api_2_ip = instance_api_2.private_ips[0]
for instance in [instance_api_1, instance_api_2]:
    floating_ip = get_floating_ip(conn)
    conn.ex_attach_floating_ip_to_node(instance, floating_ip)
    print('allocated %(ip)s to %(host)s' % {'ip': floating_ip.ip_address, 'host': instance.name})
# step-6
userdata = '''#!/usr/bin/env bash
curl -L -s http://git.openstack.org/cgit/stackforge/faafo/plain/contrib/install.sh | bash -s -- \
-i faafo -r worker -e 'http://%(api_1_ip)s' -m 'amqp://guest:guest@%(services_ip)s:5672/'
''' % {'api_1_ip': api_1_ip, 'services_ip': services_ip}
instance_worker_1 = conn.create_node(name='app-worker-1',
image=image, size=flavor,
ex_keyname='demokey',
ex_userdata=userdata,
ex_security_groups=[worker_group])
instance_worker_2 = conn.create_node(name='app-worker-2',
image=image, size=flavor,
ex_keyname='demokey',
ex_userdata=userdata,
ex_security_groups=[worker_group])
instance_worker_3 = conn.create_node(name='app-worker-3',
image=image, size=flavor,
ex_keyname='demokey',
ex_userdata=userdata,
ex_security_groups=[worker_group])
# step-7
Except where otherwise noted, this document is licensed under Creative Commons Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/legalcode.