This section introduces object storage. OpenStack Object Storage (code-named Swift) is open source software for creating redundant, scalable data storage using clusters of standardized servers to store petabytes of accessible data. It is a long-term storage system for large amounts of static data that can be retrieved, leveraged, and updated. Access is via an API, not through a file-system like more traditional storage.
There are two key concepts to understand in the Object Storage API, which is organized around two types of entities: objects and containers.
Similar to the Unix programming model, an Object is a “bag of bytes” that contains data, such as documents and images. Containers are used to group objects. You can make many objects inside a container, and have many containers inside your account.
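To make the hierarchy concrete, here is a minimal sketch of how an account, a container, and an object name map onto a request path. The helper name `object_path` and the account value `AUTH_demo` are illustrative assumptions, not part of any SDK; the `/v1/{account}/{container}/{object}` layout follows the convention of the Swift API.

```python
from urllib.parse import quote


def object_path(account, container, object_name):
    """Build the request path for an object (illustrative helper only)."""
    # Percent-encode each component so names containing spaces stay valid.
    parts = [
        quote(account, safe=''),
        quote(container, safe=''),
        quote(object_name, safe='/'),  # object names may contain slashes
    ]
    return '/v1/' + '/'.join(parts)


# 'AUTH_demo' is a made-up account name used only for this example.
path = object_path('AUTH_demo', 'fractals', 'an amazing goat')
print(path)  # /v1/AUTH_demo/fractals/an%20amazing%20goat
```

In practice the SDK builds these paths for you; the sketch only shows why containers and objects are addressed by name rather than by a location on a disk.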
If you think about how you traditionally make stored data durable, you quickly conclude that keeping multiple copies of your objects on separate systems is a good way to do it. However, keeping track of multiple copies of objects is a pain, and building that into an app requires a lot of logic. OpenStack Object Storage does this for you automatically behind the scenes, replicating each object at least twice before returning ‘write success’ to your API call. It always works to ensure that there are three copies of your objects (by default) at all times, replicating them around the system in case of hardware failure, maintenance, network outage, or any other kind of breakage. This is very convenient for app creation: you can just dump objects into Object Storage and not have to do any of this additional work to keep them safe.
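The behaviour described above can be pictured with a toy model. This sketch is not how Swift is implemented; the class `ToyReplicatedStore` and its methods are hypothetical. It only illustrates the idea of three copies, with a write reported as successful once two of them exist and the third restored later.

```python
class ToyReplicatedStore:
    """Toy model of replicated writes; NOT Swift's actual implementation."""

    def __init__(self, replicas=3, write_quorum=2):
        self.nodes = [dict() for _ in range(replicas)]
        self.write_quorum = write_quorum

    def put(self, name, data, failed_nodes=()):
        """Write to every reachable node; succeed once a quorum holds a copy."""
        written = 0
        for index, node in enumerate(self.nodes):
            if index in failed_nodes:
                continue  # simulate a node that is currently down
            node[name] = data
            written += 1
        return written >= self.write_quorum

    def repair(self, name):
        """Copy the object to any node missing it (background replication)."""
        existing = [node[name] for node in self.nodes if name in node]
        if not existing:
            return
        for node in self.nodes:
            node.setdefault(name, existing[0])

    def copies(self, name):
        """Count how many nodes currently hold the object."""
        return sum(1 for node in self.nodes if name in node)


store = ToyReplicatedStore()
ok = store.put('goat.jpg', b'image bytes', failed_nodes={2})
print(ok, store.copies('goat.jpg'))  # True 2
store.repair('goat.jpg')
print(store.copies('goat.jpg'))      # 3
```

The point of the service is that your application never has to write this kind of bookkeeping itself.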
The Fractals app currently uses the local filesystem on the instance to store the images it generates. This is not scalable or durable, for a number of reasons.
Because the local filesystem is ephemeral storage, if the instance is terminated, the fractal images will be lost along with the instance. Block based storage, which we’ll discuss in Section Five: Block Storage, avoids that problem, but like local filesystems, it requires administration to ensure that it does not fill up, and immediate attention if disks fail.
The Object Storage service handles many of the tasks that the application owner would otherwise have to manage, and presents a scalable and durable API that you can use for the Fractals app, without having to be concerned with the low-level details of how the objects are stored and replicated or how the storage pool grows. In fact, Object Storage handles replication intrinsically, storing multiple copies of each object and returning one of them on demand through the API.
First, let’s learn how to connect to the Object Storage Endpoint:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver
auth_username = 'your_auth_username'
auth_password = 'your_auth_password'
auth_url = 'http://controller:5000'
project_name = 'your_project_name_or_id'
region_name = 'your_region_name'
provider = get_driver(Provider.OPENSTACK_SWIFT)
swift = provider(auth_username,
                 auth_password,
                 ex_force_auth_url=auth_url,
                 ex_force_auth_version='2.0_password',
                 ex_tenant_name=project_name,
                 ex_force_service_region=region_name)
Warning
Libcloud 0.16 and 0.17 are afflicted with a bug that means authentication to a swift endpoint can fail with a Python exception. If you encounter this, you can upgrade your libcloud version, or apply a simple 2-line patch.
Note
Libcloud uses a different connector for Object Storage than for all other OpenStack services, so a conn object from previous sections won’t work here and we have to create a new one named swift.
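Rather than hard-coding the connection parameters above, you may prefer to read them from the environment. The following is a minimal sketch, assuming your shell exports the `OS_*` variables used by the OpenStack command-line clients; the helper name `load_credentials` is hypothetical, and the fallbacks are simply the placeholder values from this section.

```python
import os


def load_credentials(env=os.environ):
    """Collect connection settings from OS_* variables (assumed naming)."""
    return {
        'auth_username': env.get('OS_USERNAME', 'your_auth_username'),
        'auth_password': env.get('OS_PASSWORD', 'your_auth_password'),
        'auth_url': env.get('OS_AUTH_URL', 'http://controller:5000'),
        'project_name': env.get('OS_PROJECT_NAME', 'your_project_name_or_id'),
        'region_name': env.get('OS_REGION_NAME', 'your_region_name'),
    }


settings = load_credentials()
# Print only the setting names so the password never reaches the terminal.
print(sorted(settings))
```

The resulting values can then be passed to the provider call in place of the literal strings.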
To begin to store objects, we must first make a container. Call yours fractals:
container_name = 'fractals'
container = swift.create_container(container_name=container_name)
print(container)
You should see output such as:
<Container: name=fractals, provider=OpenStack Swift>
You should now be able to see this container appear in a listing of all containers in your account:
print(swift.list_containers())
You should see output such as:
[<Container: name=fractals, provider=OpenStack Swift>]
The next logical step is to upload an object. Find a photo of a goat online, name it goat.jpg and upload it to your container fractals:
file_path = 'goat.jpg'
object_name = 'an amazing goat'
container = swift.get_container(container_name=container_name)
object = container.upload_object(file_path=file_path, object_name=object_name)
List objects in your container fractals to see if the upload was successful, then download the file to verify the md5sum is the same:
objects = container.list_objects()
print(objects)
[<Object: name=an amazing goat, size=191874, hash=439884df9c1c15c59d2cf43008180048, provider=OpenStack Swift ...>]
object = swift.get_object(container_name, object_name)
print(object)
<Object: name=an amazing goat, size=954465, hash=7513986d3aeb22659079d1bf3dc2468b, provider=OpenStack Swift ...>
import hashlib
print(hashlib.md5(open('goat.jpg', 'rb').read()).hexdigest())
7513986d3aeb22659079d1bf3dc2468b
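The one-line check above reads the whole file into memory, which is fine for a photo but wasteful for large files. Here is a minimal sketch of the same comparison that reads the file in chunks. The helper names `file_md5` and `matches_object_hash` are hypothetical; the comparison assumes the object's `hash` attribute holds the hex MD5 digest, as in the output shown above.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        # iter() with a sentinel keeps calling read() until it returns b''.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_object_hash(path, object_hash):
    """Compare a local file's MD5 with the hash reported for a stored object."""
    return file_md5(path) == object_hash.lower()
```

With the objects from this section, a call such as `matches_object_hash('goat.jpg', object.hash)` would return True when the downloaded copy is intact.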
Finally, let’s clean up by deleting our test object:
swift.delete_object(object)
Note
You need to pass in objects to the delete commands, not object names.
Now there should be no more objects available in the container fractals:
objects = container.list_objects()
print(objects)
[]
So let’s now use the knowledge from above to back up the images of the Fractals app, currently stored inside the database, to Object Storage.
Use the fractals container from above to store the images:
container_name = 'fractals'
container = swift.get_container(container_name)
Next, we back up all of our existing fractals from the database to our swift container. A simple for loop takes care of that:
import json
import requests
endpoint = 'http://IP_API_1'
params = {'results_per_page': '-1'}
# Fetch the list of all fractals known to the API.
response = requests.get('%s/v1/fractal' % endpoint, params=params)
data = json.loads(response.text)
# Stream each image from the API straight into the container.
for fractal in data['objects']:
    response = requests.get('%s/fractal/%s' % (endpoint, fractal['uuid']), stream=True)
    container.upload_object_via_stream(response.iter_content(), object_name=fractal['uuid'])
# List the uploaded objects to confirm the backup.
for object in container.list_objects():
    print(object)
<Object: name=025fd8a0-6abe-4ffa-9686-bcbf853b71dc, size=61597, hash=b7a8a26e3c0ce9f80a1bf4f64792cd0c, provider=OpenStack Swift ...>
<Object: name=26ca9b38-25c8-4f1e-9e6a-a0132a7a2643, size=136298, hash=9f9b4cac16893854dd9e79dc682da0ff, provider=OpenStack Swift ...>
<Object: name=3f68c538-783e-42bc-8384-8396c8b0545d, size=27202, hash=e6ee0cd541578981c294cebc56bc4c35, provider=OpenStack Swift ...>
Note
Replace IP_API_1 with the IP address of the API instance.
Note
The example code uses the awesome Requests library. Ensure that it is installed on your system before trying to run the script above.
Warning
Currently it is not possible to directly store generated images in OpenStack Object Storage. Please revisit this section again in the future.
One call we didn’t cover above that you probably need to know is how to delete a container. Ensure that you have removed all objects from the container before running this, otherwise it will fail:
for object in container.list_objects():
    container.delete_object(object)
swift.delete_container(container)
Warning
It is not possible to restore deleted objects. Be careful.
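Because deletion is permanent, it can help to preview what a cleanup would remove before running it. The following is a minimal sketch of such a guard. The helper `purge_container` and its `dry_run` flag are hypothetical; it assumes `driver` and `container` behave like the swift driver and container objects used in this section.

```python
def purge_container(driver, container, dry_run=True):
    """Delete all objects in a container, then the container itself.

    With dry_run=True (the default) nothing is deleted; the function only
    returns the names of the objects that would be removed.
    """
    names = []
    for obj in container.list_objects():
        names.append(obj.name)
        if not dry_run:
            container.delete_object(obj)
    if not dry_run:
        # The container must be empty before it can be deleted.
        driver.delete_container(container)
    return names
```

Calling `purge_container(swift, container)` first and inspecting the returned names, then repeating the call with `dry_run=False`, gives you one last chance to spot an object you meant to keep.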
You can also do more advanced things, such as uploading an object with metadata, as in the example below; for further information, refer to the documentation for your SDK. This option also uploads the file as a stream, reading it piece by piece and passing each piece to swift as it is read, rather than loading the entire file into memory before sending it. This is more efficient, especially for larger files.
file_path = 'goat.jpg'
object_name = 'backup_goat.jpg'
extra = {'meta_data': {'description': 'a funny goat', 'created': '2015-06-02'}}
with open(file_path, 'rb') as iterator:
    object = swift.upload_object_via_stream(iterator=iterator,
                                            container=container,
                                            object_name=object_name,
                                            extra=extra)
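To see what streaming means in practice, here is a minimal sketch of a generator that hands out a file in fixed-size pieces. The name `read_in_chunks` and the 8192-byte default are illustrative assumptions; the sketch also assumes the upload call accepts any iterator that yields bytes.

```python
def read_in_chunks(path, chunk_size=8192):
    """Yield a file's content as byte strings of at most chunk_size bytes."""
    with open(path, 'rb') as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break  # end of file reached
            yield chunk
```

At any moment only one chunk is held in memory, so the memory used stays roughly constant no matter how large the file is. A generator like this could stand in for the open file handle as the `iterator` argument.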
For efficiency, most Object Storage installations treat large objects (say, > 5GB) differently than smaller objects.
If you are working with large objects, use the ex_multipart_upload_object call instead of the simpler upload_object call. Behind the scenes, the upload splits the large object into chunks and creates a special manifest so that they can be recombined on download. Alter the chunk_size parameter (in bytes) according to what your cloud can accept.
swift.ex_multipart_upload_object(file_path, container, object_name,
                                 chunk_size=33554432)
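The chunk_size value above is 32 MiB expressed in bytes. To get a feel for how a file divides into chunks of that size, here is a minimal sketch of the arithmetic only. The helper `plan_segments` is hypothetical and does not reflect how libcloud or Swift name or store the chunks.

```python
def plan_segments(total_size, chunk_size=33554432):
    """Return (offset, length) pairs covering total_size bytes."""
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive')
    segments = []
    offset = 0
    while offset < total_size:
        # The final segment may be shorter than chunk_size.
        length = min(chunk_size, total_size - offset)
        segments.append((offset, length))
        offset += length
    return segments


segments = plan_segments(100 * 1024 * 1024)  # a hypothetical 100 MiB file
print(len(segments), segments[-1])  # 4 (100663296, 4194304)
```

A smaller chunk_size means more pieces per object, so pick a value that balances the number of pieces against the largest upload your cloud accepts.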
You should now be fairly confident working with Object Storage. You can find more about the Object Storage SDK calls at:
https://libcloud.readthedocs.org/en/latest/storage/api.html
Except where otherwise noted, this document is licensed under Creative Commons Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/legalcode.