Persistent memory technologies, such as Intel® Optane™ DC persistent memory, come with several challenges. Remote access seems to be one of the most difficult aspects of persistent memory applications because there is no ready-to-use technology that supports remote persistent memory (RPMEM). Most commonly used remote direct memory access (RDMA) for remote memory access does not consider data durability aspects.
This paper proposes solutions for accessing RPMEM based on traditional RDMA. These solutions have been implemented in the Persistent Memory Development Kit (PMDK) librpmem library. Before you read this part, which describes how to set up a machine to work with rpmem and set up another machine as the replication target, we suggest reading Part 1, “Understanding Remote Persistent Memory." The other parts in this series include:
- Part 3, "RDMA Enabling for Data Replication," describes how to enable RDMA for data replication.
- Part 4, "Persistent Memory Development-Kit Based PMEM Replication," discusses PMDK-based persistent memory replication.
Single Machine Setup
To start working with rpmem, the only thing required is a machine with a configured single virtual loopback network interface. Persistent memory is not required at this stage. This machine acts as an initiator and a target at the same time. In this case, rpmem replicates via the virtual loopback network interface (127.0.0.1). This is a good starting point to get to know the rpmem setup process. It is also a good enough configuration to develop software using rpmem without access to an RDMA-capable NIC.
Figure 1. Rpmem software stack on a single machine
This section describes how to configure rpmem components to define the replication of a pmem pool. The simple open-persist-close application will be used as an example of an rpmem-enabled application.
In the described setup, the librpmem library sends commands via a virtual loopback interface to start rpmemd and replicate data to a specific pmem pool. Note that rpmemd runs on the same machine as the application, but it is started via the SSH connection. All replication commands are executed on the same machine, but they are all transmitted over a virtual loopback interface.
Configuring the Platform
The operating systems establish the limits to what users can do, how many processes they can run, and how much memory they can use. The default values differ according to the distribution and version, for example:
Rpmem replication sometimes imposes requirements greater than the default values for these limits. Rpmem users have to be aware that:
- A single replication process may use many logical connections between the initiator and the target. Each one of them is represented by an open file descriptor.
- Memory registered for replication (on the initiator side as well as on the target side) counts against the limit of overall memory locked by the user.
File descriptors and locked memory are two limits that have to be taken into consideration. Both of them are configured by limits.confg(5). These limits can be increased by adding new lines to /etc/security/limits.conf, for example:
$ cat /etc/security/limits.conf # (...) * soft memlock unlimited * hard memlock unlimited * soft nofile 1048000 * hard nofile 1048000
Changes take effect in a new login session. For details on how to increase these limits, see the manual page limits.confg(5).
Rpmem Installation
The setup described in this section requires the following software components:
- librpmem, librpmem-devel (initiator)
- rpmemd (target)
In this example our initiator and target are the same machine, so the installation of all required software components is done with a single command:
The librpmem-devel package is required to compile rpmem-enabled applications. It is not required at run time.
SSH Configuration
Librpmem and rpmemd pass control messages over SSH. SSH, in order to connect without assistance, requires public key authentication without a passphrase.
- Generating an SSH authentication key pair:
In this part, rpmem is set up on the single machine but, in the general case, an SSH key pair has to be generated on the Initiator node and a public key has to be copied to the Target node.
- Allowing to connect to localhost via SSH:
In this part, the user account is used on both the initiator and the target. In the general case, accounts on the Initiator node and on the Target node may differ.
- Checking SSH connection:
Poolset
Poolset(5) is a persistent memory pool configuration file format defined by PMDK. The poolset file describes which memory devices and files are merged together into a single memory pool.
The rpmemd application uses poolset files to describe memory pools used as replicas on the target. The rpmemd poolset files cannot define local or remote replicas, so it looks as follows:
In the example above, the poolset file consists of the following sections:
- Poolset header required for validation purpose
- A memory pool description where each line points to exactly one file or device, which will be used as memory pool parts. The order of lines matters because it is also the order in which pool parts will be mapped to create a single memory pool. A memory pool must consist of one or more pool parts. A line describing the pool part starts with a file size followed by an absolute path to the part file.
Detailed syntax of poolset files is outside the scope of this paper. For details, see the poolset(5) manual page.
Rpmemd Configuration
Rpmemd requires a poolset file describing a replica memory pool.
By default, rpmemd looks for poolset files in the user’s home directory. It is possible to change the default search directory by setting poolset-dir in an rpmemd configuration file. For details, see the rpmemd(1) manual page.
In this part, rpmemd will use a single file on tmpfs (/dev/shm).
Librpmem Configuration
In the configuration described above, the replication goes via the virtual loopback interface, which does not support the verbs fabric provider. By default, the librpmem library refuses to use an interface if it does not support the verbs provider. To use the sockets provider, it is required to explicitly enable it using an environment variable.
Note The RPMEM_ENABLE_SOCKETS variable has to be set in an application’s environment each time it starts.
Creating a Memory Pool with a Replica
The code sample below shows the minimal viable example of using librpmem API from the simple open-persist-close application (manpage.c).
Assuming the source code is stored in the file called rpmem101.c, compile and link it using the following command:
After that, the resulting rpmem101 executable can be run to open the remote replica target.poolset on the target and perform a single rpmem_persist operation:
This application will perform a few simple steps:
- Create a replica using target.poolset on localhost
- Fill the local memory pool with zeros
- Replicate the data to the replica on localhost
- Close the replica
Inspecting the Results
If the command succeeds, the pool replica in tmpfs is created:
This can also be verified using hexdump to verify that the replica has the expected contents:
Replicating to Another Machine
The next step in replicating data to persistent memory setup (persistent memory is not yet installed) is to use another machine as the replication target. It starts with a similar setup as the one described in the previous part, but this time the single virtual loopback interface is replaced with two regular network interfaces (NICs). This setup won’t give the best possible performance. It is a more advanced configuration that provides a good example of how the rpmem replication of memory pools between machines works.
Figure 2. Rpmem software stack with two machines
In the described setup, the librpmem library sends commands to start rpmemd and replicate data to a specified memory pool. Commands are sent from the initiator to a network interface on the target machine via the network. The librpmem library runs on the initiator. The rpmemd daemon runs on the target. The source memory pool is located on the initiator and it is replicated to the replica on the target.
In this configuration, it is assumed that the initiator and the target are two separate machines that will require different configuration steps. To emphasize which command is performed on which machine, the steps are split into two groups as follows:
This section describes how to configure rpmem components to define the replication of a memory pool from the initiator to the replica on the target. Again, using the simple open-persist-close application (manpage.c).
Network Interfaces Configuration
The first step for the scenario described by this section is to configure NICs on both machines to allow routing TCP packets between them. An example network configuration might look like this:
Figure 3. Network configuration of two machines
The ping command can be used to make sure the initiator and the target are connected:
Note A firewall configuration on the target node may prevent pinging from the initiator. Firewall configuration is outside the scope of this paper.
Configuring Platforms
The platform configuration steps are the same as the case of a single machine replicating data via a virtual loopback device. The rpmem replication between two machines may be subject to limits established via limits.conf(5). Limits are modified on the initiator and on the target separately. For details, see “Single Machine Setup."
Rpmem Installation
Rpmem requires the following software components:
initiator:
- librpmem
- librpmem-devel (for linking an application)
target:
- rpmemd
SSH Configuration
Librpmem and rpmemd pass control messages over SSH. In order to connect without the assistance, SSH requires public key authentication without a passphrase. Librpmem initializes the SSH connection from the initiator to the target so all SSH configuration is done on the initiator.
In this part, the user account is used on both the initiator and the target. In the general case, accounts on the Initiator node and on the Target node don’t have to be the same.
Rpmemd Configuration
The rpmemd application manages replicas on the target, so it is required to describe the replica on the Target node as an rpmemd poolset file:
Librpmem Configuration
In the described configuration, the replication goes via the network interface, which requires the sockets fabric provider. By default, the librpmem library refuses to use an interface if it does not support the verbs provider. To use the sockets provider, it is required to explicitly enable it using an environment variable.
Creating a Memory Pool with a Replica
The rpmem101 application, open-persist-close (manpage.c), from the previous section is also used here. Its compilation and linking should be performed on the initiator. To allocate memory on the initiator and replicate its content to the replica on the target via rpmem, execute the following commands:
Inspecting the Results
If the command succeeds, it creates a replica pool in tmpfs on the target. Since rpmem101 allocates a memory pool from DRAM, there are no visible results on the initiator.
This can also be verified using hexdump to verify that the replica has the expected contents:
Proceed to Part 3, "RDMA Enabling for Data Replication," which describes how to build an RDMA-capable network.
Other Articles in This Series
Part 1, “Understanding Remote Persistent Memory," describes the theoretical realm of remote persistent memory.
Part 4, "Persistent Memory Development Kit-Based PMEM Replication," describes how to create and configure an application that can replicate persistent memory over an RDMA-capable network.
"