Unlock the Power of DAOS in Python with PyDAOS

ID 751481
Updated 11/17/2021
Version Latest
Public

Introduction

Distributed Asynchronous Object Storage (DAOS) is high-performance storage that pushes the limits of Intel hardware. It's based on Intel Xeon, Persistent Memory and NVMe SSDs. It has been awarded top spots in the IO500 and IO500 10-node Challenge multiple times in the past several years. For more information, please refer to DAOS: Revolutionizing High-Performance Storage with Intel® Optane™ Technology.

This article will show how to easily interface with DAOS from Python through the PyDAOS package. There are two key advantages of using a dictionary stored in DAOS as opposed to a regular Python dictionary. The first is that you can manipulate a gigantic Key-Value (KV) store, given that nothing is stored in local memory. The second is that your dictionary is persistent (no need to sync your data to disk), which means that if you quit your program and reload it, your data will still be there.

It is essential to point out that PyDAOS is a work in progress. For example, only keys and values of type string are currently supported. In addition, the only supported data structure is a dictionary (although arrays will be included in the near future).

Installing DAOS also installs PyDAOS automatically. The location (in Linux) is:

<DAOS_INSTALLATION_DIR>/lib64/python3.6/site-packages/

If Python does not find this path automatically, you can add it manually using sys.path:

import sys
sys.path.append("<DAOS_INSTALLATION_DIR>/lib64/python3.6/site-packages/")

This is usually not required if you install DAOS from repository packages.
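The path check can also be done programmatically. The sketch below is a minimal helper and not part of PyDAOS: the name ensure_importable is hypothetical, and the directory you pass in is whatever your installation uses. It appends the directory to sys.path only when the module cannot already be found.

```python
import importlib
import importlib.util
import os
import sys


def ensure_importable(module_name, site_packages_dir):
    """Return True if module_name can be found, adding site_packages_dir if needed.

    Hypothetical helper for illustration; it is not part of PyDAOS.
    """
    # Already visible to Python: nothing to do.
    if importlib.util.find_spec(module_name) is not None:
        return True
    # Only extend sys.path with a directory that actually exists.
    if os.path.isdir(site_packages_dir) and site_packages_dir not in sys.path:
        sys.path.append(site_packages_dir)
        importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None
```

A call such as ensure_importable("pydaos", "<DAOS_INSTALLATION_DIR>/lib64/python3.6/site-packages/") would then tell you whether the import that follows is expected to succeed.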

Pools and Containers

The first thing you will need to use PyDAOS is an existing DAOS pool and container. At the moment, both have to exist beforehand; it is not possible to create pools or containers from the Python API. To create a pool, you run:

$ dmg pool create --label=pydaos --size=1G
Creating DAOS pool with automatic storage allocation: 1.0 GB total, 6,94 tier ratio
Pool created with 100.00%,0.00% storage tier ratio
--------------------------------------------------
  UUID                 : 1a0ce47b-fb70-46e9-9564-c7dfbf43fdd7
  Service Ranks        : 0
  Storage Ranks        : 0
  Total Size           : 1.0 GB
  Storage tier 0 (SCM) : 1.0 GB (1.0 GB / rank)
  Storage tier 1 (NVMe): 0 B (0 B / rank)

That command creates a 1 GB pool labeled pydaos (you can choose any other name). Next, we create our container inside this pool by running:

$ daos cont create --type=PYTHON --pool=pydaos --label=kvstore
  Container UUID : a7e1fa95-89f0-4685-8a66-61abf20e57db
  Container Label: kvstore
  Container Type : PYTHON

We pass the type PYTHON to indicate that the container will be used from a client written in Python. The type designates a pre-defined layout of the data in terms of the underlying DAOS object model. Other available types include HDF5 and POSIX. The label kvstore is arbitrary: you can choose any other name.
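Since pools and containers cannot be created from the Python API, a setup script may want to call the two commands above itself. The following is a minimal sketch under that assumption: the helper names are hypothetical, the flags are exactly the ones used in this article, and the runner returns None when the tool is not on PATH instead of failing.

```python
import shutil
import subprocess


def pool_create_cmd(label, size):
    """Build the `dmg pool create` command used above as an argument list."""
    return ["dmg", "pool", "create", f"--label={label}", f"--size={size}"]


def cont_create_cmd(pool, label, cont_type="PYTHON"):
    """Build the `daos cont create` command used above as an argument list."""
    return ["daos", "cont", "create", f"--type={cont_type}",
            f"--pool={pool}", f"--label={label}"]


def run_if_available(cmd):
    """Run cmd and return its stdout, or None when the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout
```

For example, run_if_available(pool_create_cmd("pydaos", "1G")) followed by run_if_available(cont_create_cmd("pydaos", "kvstore")) mirrors the manual steps. Passing an argument list rather than a shell string avoids quoting problems with labels.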

PyDAOS Step by Step

First, we have to make sure that we import all the necessary classes:

from pydaos import (DCont, DDict, DObjNotFound)

DCont represents a container, DDict a dictionary, and DObjNotFound is used to catch exceptions raised when objects are not found in a container.

But before we can get or create objects, we have to create a Python container object by passing the pool and container labels:

daos_cont = DCont("pydaos", "kvstore", None)

Alternatively, we can use the last parameter to create our object from the path to the container in the unified namespace:

daos_cont = DCont(None, None, "daos://pydaos/kvstore")
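The two calling forms can be wrapped so that a script accepts either labels or a path. This is only a sketch: dcont_args and label_path are hypothetical helpers, and rejecting a mix of labels and path is a convention of this example, not documented PyDAOS behavior.

```python
def label_path(pool_label, cont_label):
    """Build a path of the form used above, e.g. daos://pydaos/kvstore."""
    return "daos://{}/{}".format(pool_label, cont_label)


def dcont_args(pool=None, cont=None, path=None):
    """Return the (pool, container, path) tuple in the order DCont is called above."""
    if path is not None:
        # Convention of this sketch: a path replaces both labels.
        if pool is not None or cont is not None:
            raise ValueError("give either labels or a path, not both")
        return (None, None, path)
    if pool is None or cont is None:
        raise ValueError("both pool and container labels are required")
    return (pool, cont, None)


print(dcont_args(pool="pydaos", cont="kvstore"))
print(dcont_args(path=label_path("pydaos", "kvstore")))
```

With PyDAOS available, the result would be unpacked into the constructor, for instance DCont(*dcont_args(pool="pydaos", cont="kvstore")).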

Now we can get (or create) a dictionary object. We can use the DObjNotFound exception to create the dictionary if it doesn't exist:

daos_dict = None
try:
    daos_dict = daos_cont.get("dict-0")
except DObjNotFound:
    daos_dict = daos_cont.dict("dict-0")

Again, the name dict-0 is arbitrary. Now that we have our dictionary object, we can start inserting, reading, and deleting keys against our DAOS container.
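The get-or-create pattern is a natural candidate for a small function. The sketch below assumes only the get(), dict(), and DObjNotFound names used in this article. FakeCont is a hypothetical in-memory stand-in, and the fallback exception class exists only so the example runs on a machine without DAOS.

```python
try:
    from pydaos import DObjNotFound
except ImportError:
    # Stand-in so the sketch runs where PyDAOS is not installed.
    class DObjNotFound(Exception):
        pass


def get_or_create_dict(cont, name):
    """Return the dictionary called name, creating it if it does not exist yet."""
    try:
        return cont.get(name)
    except DObjNotFound:
        return cont.dict(name)


class FakeCont:
    """In-memory stand-in for a container, for demonstration only."""

    def __init__(self):
        self._objects = {}

    def get(self, name):
        if name not in self._objects:
            raise DObjNotFound(name)
        return self._objects[name]

    def dict(self, name):
        self._objects[name] = {}
        return self._objects[name]


cont = FakeCont()
first = get_or_create_dict(cont, "dict-0")    # not found, so it is created
second = get_or_create_dict(cont, "dict-0")   # found on the second call
print(first is second)                        # True
```

With a real container, the call would simply be get_or_create_dict(daos_cont, "dict-0").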

Insert a New Key

To insert a new key, use put():

key = "dog"
value = "perro"
daos_dict.put(key, value)

Get a Key

To get a key, use the [] interface (as in native Python dictionaries):

try:
    value = str(daos_dict[key])
except KeyError:
    print("key not found")
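Note that values read back are bytes: the sample session later in this article prints b'perro', and the JSON example calls .decode('utf-8') on what it reads. Calling str() on bytes yields the text "b'perro'" rather than "perro". The sketch below shows one way to get plain text back. The helper get_text is hypothetical, and a plain Python dict holding bytes stands in for the DAOS dictionary.

```python
def get_text(store, key, default=None, encoding="utf-8"):
    """Look up key and return its value as str, or default if the key is missing."""
    try:
        raw = store[key]
    except KeyError:
        return default
    # Decode bytes instead of wrapping them with str().
    return raw.decode(encoding) if isinstance(raw, bytes) else raw


# A plain dict holding bytes stands in for the DAOS dictionary here.
store = {"dog": b"perro"}
print(str(store["dog"]))                        # b'perro'
print(get_text(store, "dog"))                   # perro
print(get_text(store, "cat", "key not found"))  # key not found
```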

Delete a Key

To delete a key, use pop():

daos_dict.pop(key)

Iterate the Whole Dictionary

We can iterate the whole dictionary as we would do with a native Python dictionary:

for key in daos_dict:
    print("key=" + key + " value=" + str(daos_dict[key]))

Bulk Insertion

PyDAOS dictionaries allow us to also insert and read in bulk. We can do bulk insertion by passing a Python dictionary to bput():

python_dict = {}
python_dict[key0] = value0
python_dict[key1] = value1
python_dict[key2] = value2
...
daos_dict.bput(python_dict)

Read in Bulk

To read in bulk, we pass a Python dictionary with the keys that we want to read to bget():

python_dict = {}
python_dict[key0] = None
python_dict[key1] = None
python_dict[key2] = None
...
daos_dict.bget(python_dict)

It is also possible to read all keys in bulk with dump():

python_dict = daos_dict.dump()
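The bulk calls fit together as a round trip: bput() takes a filled dictionary, bget() fills in the values of the dictionary it receives, and dump() returns everything. The sketch below exercises that flow against FakeDDict, a hypothetical in-memory stand-in. Its behavior (values stored as bytes, None for missing keys) is an assumption made for this example, not a statement about PyDAOS internals.

```python
class FakeDDict:
    """In-memory stand-in mimicking the bput/bget/dump calls, for demonstration."""

    def __init__(self):
        self._data = {}

    def bput(self, items):
        for key, value in items.items():
            self._data[key] = value.encode("utf-8")

    def bget(self, wanted):
        # Fill the caller's dictionary in place, as bget() is used in this article.
        for key in wanted:
            wanted[key] = self._data.get(key)

    def dump(self):
        return dict(self._data)


def bulk_read(store, keys):
    """Read several keys in one call and return {key: value}."""
    wanted = dict.fromkeys(keys)   # every requested key mapped to None
    store.bget(wanted)
    return wanted


store = FakeDDict()
store.bput({"dog": "perro", "cat": "gato"})
print(bulk_read(store, ["dog", "cat"]))  # {'dog': b'perro', 'cat': b'gato'}
print(len(store.dump()))                 # 2
```

Using dict.fromkeys() avoids writing one assignment per key when the list of keys is already available.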

Total Number of Keys

Finally, we can get the total number of keys stored in our dictionary with len():

print("dictionary has " + str(len(daos_dict)) + " keys")

A Complete Example

Now that we have all the pieces, let’s put them together to create a complete example. The example is a simple program (kvmanage.py) to manage a DAOS KV store interactively through a command line interface (CLI):

from pydaos import (DCont, DDict, DObjNotFound)
print("==========================")
print("== KV STORE WITH PYDAOS ==")
print("==========================")
daos_cont = DCont("pydaos", "kvstore", None)
daos_dict = None
try:
    daos_dict = daos_cont.get("dict-0")
except DObjNotFound:
    daos_dict = daos_cont.dict("dict-0")
while True:
    cmd = input("\ncommand (? for help)> ")
    if cmd == "?":
        print("?\t- print this help")
        print("r\t- read a key")
        print("ra\t- read all keys")
        print("i\t- insert new key")
        print("d\t- delete key")
        print("ib\t- insert new keys in bulk")
        print("rb\t- read keys in bulk")
        print("q\t- quit")
    elif cmd == "r":
        key = input("key? ")
        try:
            print("\tkey: " + key + "\tvalue: " + str(daos_dict[key]))
        except KeyError:
            print("\tError! key not found")
    elif cmd == "ra":
        print("\ndict len = " + str(len(daos_dict)))
        for key in daos_dict:
            print("\tkey: " + key + "\tvalue: " + str(daos_dict[key]))
    elif cmd == "i":
        print("\ninserting new key")
        print("(enter nothing for key to skip)")
        value = ""
        key = input("key? ")
        while key != "" and value == "":
            value = input("value? ")
        if value != "":
            daos_dict.put(key, value)
    elif cmd == "d":
        print("\ndeleting key")
        print("(enter nothing for key to skip)")
        key = input("key? ")
        if key != "":
            daos_dict.pop(key)
    elif cmd == "ib":
        print("\ninserting new keys in bulk")
        print("(enter nothing for key to finish)")
        python_dict = {}
        value = ""
        key = input("key[0]? ")
        i = 0
        while key != "":
            value = input("value[" + str(i) + "]? ")
            if value == "":
                continue
            python_dict[key] = value
            i += 1
            key = input("key[" + str(i) + "]? ")
        print("inserting ", end=" ")
        print(python_dict)
        daos_dict.bput(python_dict)
        print("done")
    elif cmd == "rb":
        print("\nread keys in bulk")
        print("(enter nothing for key to finish)")
        python_dict = {}
        key = input("key[0]? ")
        i = 0
        while key != "":
            python_dict[key] = None
            i += 1
            key = input("key[" + str(i) + "]? ")
        print("reading = ", end=" ")
        print(python_dict)
        daos_dict.bget(python_dict)
        print("result = ", end=" ")
        print(python_dict)
    elif cmd == "q":
        break
    print("---")
print("\nend")

The program accepts multiple commands to manage a KV store: read a key, read all keys, insert a new key, delete a key, insert new keys in bulk, read keys in bulk, and quit. The program runs an infinite loop until the user selects the quit command.

For example, to insert a new key:

$ python3 kvmanage.py
==========================
== KV STORE WITH PYDAOS ==
==========================
command (? for help)> ?
?       - print this help
r       - read a key
ra      - read all keys
i       - insert new key
d       - delete key
ib      - insert new keys in bulk
rb      - read keys in bulk
q       - quit
---
command (? for help)> i
inserting new key
(enter nothing for key to skip)
key? dog
value? perro
---
command (? for help)>

Now we can read all keys and see our newly inserted key:

command (? for help)> ra
dict len = 1
        key: dog        value: b'perro'
---
command (? for help)>

An Additional Example Using JSON Files

Below is a simple example demonstrating the use of PyDAOS with JSON files. Traditionally, such data is read and written in local memory; with the PyDAOS API, we can instead take advantage of DAOS's performance through simple KV store operations.

First, create your pool and container (as shown above), and then verify their creation with some simple commands on the CLI.

$ daos pool query pydaos_json
Pool 31d7b053-b4c6-4c73-8d58-2c221d829815, ntarget=512, disabled=0, leader=1, version=1
Pool space info:
- Target(VOS) count:512
- Storage tier 0 (SCM):
  Total size: 2.0 TB
  Free: 2.0 TB, min:4.0 GB, max:4.0 GB, mean:4.0 GB
- Storage tier 1 (NVMe):
  Total size: 12 TB
  Free: 11 TB, min:22 GB, max:22 GB, mean:22 GB
Rebuild idle, 0 objs, 0 recs

$ daos cont query pydaos_json kvstore
  Container UUID             : 09b2bd13-fd38-4e80-be25-c9af6e4d7605
  Container Label            : kvstore
  Container Type             : PYTHON
  Pool UUID                  : 31d7b053-b4c6-4c73-8d58-2c221d829815
  Number of snapshots        : 0
  Latest Persistent Snapshot : 0
  Highest Aggregated Epoch   : 418681269376745486
  Container redundancy factor: 0
  Snapshot Epochs            :

In json_example.py, we take currency information from two JSON files, conversions.json and data.json, and use it to display simple exchange rates from the US Dollar to a requested currency. The file conversions.json contains the exchange rates from October 29, 2021. These rates are used together with data.json, which contains details about each currency, such as its symbol and name.

We begin by connecting to the pool and container created through the CLI, where the pool's label is "pydaos_json" and the container's label is "kvstore". Then we open data.json and store each currency record in our KV container "kvstore" through put() operations, using the currency code as the key.

data.json:

{
  "USD": {
    "symbol": "$",
    "name": "US Dollar",
    "symbol_native": "$",
    "decimal_digits": 2,
    "rounding": 0,
    "code": "USD",
    "name_plural": "US dollars"
  },
  "CAD": {
    "symbol": "CA$",
    "name": "Canadian Dollar",
    "symbol_native": "$",
    "decimal_digits": 2,
    "rounding": 0,
    "code": "CAD",
    "name_plural": "Canadian dollars"
  },
  "EUR": {
    "symbol": "€",
    "name": "Euro",
    "symbol_native": "€",
    "decimal_digits": 2,
    "rounding": 0,
    "code": "EUR",
    "name_plural": "euros"
  },
  "AUD": {
    "symbol": "AU$",
    "name": "Australian Dollar",
    "symbol_native": "$",
    "decimal_digits": 2,
    "rounding": 0,
    "code": "AUD",
    "name_plural": "Australian dollars"
  },
  "CNY": {
    "symbol": "CN¥",
    "name": "Chinese Yuan",
    "symbol_native": "CN¥",
    "decimal_digits": 2,
    "rounding": 0,
    "code": "CNY",
    "name_plural": "Chinese yuan"
  },
  "SGD": {
    "symbol": "S$",
    "name": "Singapore Dollar",
    "symbol_native": "$",
    "decimal_digits": 2,
    "rounding": 0,
    "code": "SGD",
    "name_plural": "Singapore dollars"
  }
}

conversions.json:

{
  "provider": "https://www.exchangerate-api.com",
  "WARNING_UPGRADE_TO_V6": "https://www.exchangerate-api.com/docs/free",
  "terms": "https://www.exchangerate-api.com/terms",
  "base": "USD",
  "date": "2021-10-29",
  "time_last_updated": 1635510901,
  "rates": {
    "USD": 1,
    "CAD": 1.23,
    "AUD": 1.33,
    "CNY": 6.39,
    "EUR": 0.857,
    "SGD": 1.34
  }
}

json_example.py:

from pydaos import (DCont, DDict, DObjNotFound)
from random import randrange
from timeit import default_timer as timer
import ast
import json
# Connect to the container and get (or create) the dictionary.
daos_cont = DCont("pydaos_json", "kvstore", None)
daos_dict = None
try:
    daos_dict = daos_cont.get("dict-0")
except DObjNotFound:
    daos_dict = daos_cont.dict("dict-0")
# Store each currency record as a string, keyed by currency code.
with open('data.json') as json_file:
    data = json.load(json_file)
    for i in data:
        daos_dict.put(i, str(data[i]))
while True:
    cmd = input("\ncommand (? for help)> ")
    if cmd == "?":
        print("\n? - print this help")
        print("\nlc - list currencies")
        print("\nlci - list a specific currency's details")
        print("\ngxr - get exchange rate for an inputted currency to the US Dollar ")
        print("\ncxr - convert exchange rate for an inputted currency to the US Dollar")
    elif cmd == "lc":
        for i in daos_dict:
            print(i)
    elif cmd == "lci":
        currency = input("Enter a currency: ")
        # Values come back as bytes; decode, then parse the stored dict literal.
        currency_dict = ast.literal_eval(daos_dict[currency].decode('utf-8'))
        print("The symbol for ", currency, " is ", currency_dict["symbol"])
        print("The name is ", currency_dict["name"], " or ", currency_dict["name_plural"], ".")
        print("The native symbol is ", currency_dict["symbol_native"])
    elif cmd == "gxr":
        currency = input("what currency? ")
        currency_dict = ast.literal_eval(daos_dict[currency].decode('utf-8'))
        with open('conversions.json') as usd_exchange_rates:
            usd_data = json.load(usd_exchange_rates)
        print("Requested", currency_dict["name_plural"], " whose current exchange rate is ", currency_dict["symbol"], usd_data["rates"][currency], "for $1 USD")
    elif cmd == "cxr":
        currency = input("What currency do you want to convert to? ")
        currency_amount = input("How much USD do you currently have? ")
        currency_dict = ast.literal_eval(daos_dict[currency].decode('utf-8'))
        with open('conversions.json') as usd_exchange_rates:
            usd_data = json.load(usd_exchange_rates)
        converted_currency = int(currency_amount) * usd_data["rates"][currency]
        print("Converted ", currency_amount, "US Dollars to", converted_currency, currency_dict["name_plural"])
    elif cmd == "q":
        break

Running it:

$ python3 json_example.py
command (? for help)> ?
? - print this help
lc - list currencies
lci - list a specific currency's details
gxr - get exchange rate for an inputted currency to the US Dollar
cxr - convert exchange rate for an inputted currency to the US Dollar
command (? for help)> lci
Enter a currency: EUR
The symbol for EUR is €
The name is Euro or euros .
The native symbol is €
command (? for help)> gxr
what currency? EUR
Requested euros whose current exchange rate is € 0.857 for $1 USD
command (? for help)> cxr
What currency do you want to convert to? EUR
How much USD do you currently have? 150
Converted 150 US Dollars to 128.55 euros
command (? for help)>
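json_example.py stores each record with str() and reads it back with ast.literal_eval(). Because only string values are supported, another option is to serialize each record as JSON text. The sketch below illustrates that alternative and is not the article's method: encode_record and decode_record are hypothetical helpers, and a plain dict holding UTF-8 bytes stands in for the DAOS dictionary.

```python
import json


def encode_record(record):
    """Serialize a dict to a JSON string suitable for a string-valued KV store."""
    return json.dumps(record, ensure_ascii=False)


def decode_record(raw):
    """Parse a value read back from the store (bytes or str) into a dict."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


# A plain dict storing UTF-8 bytes stands in for the DAOS dictionary.
store = {}
euro = {"symbol": "€", "name": "Euro", "decimal_digits": 2}
store["EUR"] = encode_record(euro).encode("utf-8")
print(decode_record(store["EUR"]) == euro)  # True
```

JSON text has the advantage that clients written in other languages can parse it, whereas the output of str() on a dict is a Python literal.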

Summary

In this introductory article, we showed how to easily interface with DAOS from Python through the PyDAOS package. The PyDAOS dictionary API was presented, describing each operation in detail with small code snippets to ease understanding. After that, two complete working examples were presented. As mentioned in the introduction, the PyDAOS package is still a work in progress and more features will be supported in the future. Stay tuned.