Keeping connections alive with libcurl

libcurl is quite a comfortable option to transfer files across a variety of network protocols, e.g. HTTP, FTP and SFTP.

It’s really easy to get started: downloading a single file via http or ftp takes only a couple of lines.

Drip, drip..

But as with most powerful abstractions, it is a bit leaky. While it does an excellent job of hiding such steps as name resolution and authentication, these steps still “leak out” by increasing the overall run-time.

In our case, we had five dozen FTP servers and we needed to repeatedly download small files from all of them. To make matters worse, we only had a small time window of 200ms for each transfer.

Now FTP is not the most simple protocol. Essentially, it requires the client to establish a TCP control connection, that it uses negotiate a second data connection and initiate file transfers.

This initial setup phase needs a lot of back and forth between server and client. Naturally, this is quite slow. Ideally, you would want to do the connection setup once and keep both the control and the data connection open for subsequent transfers.

libcurl does not explicitly expose the concept of an active connection. Hence you cannot explicitly tell the library not to disconnect it. In a naive implementation, you would download multiple files by simply creating an easy session object for each file transfer:

for (auto file : FILE_LIST)
{
  std::vector<uint8_t> buffer;
  auto curl = curl_easy_init();
  if (!curl)
    return -1;
  auto url = (SERVER+file);
  curl_easy_setopt(curl, CURLOPT_URL,
    url.c_str());
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION,
    appendToVector);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA,
    &buffer);
  if (curl_easy_perform(curl) != CURLE_OK)
    return -1;

  process(buffer);
  curl_easy_cleanup(curl);
}

That does indeed reset the connection for every single file.

Re-use!

However, libcurl can actually keep the connection open as part of a connection re-use mechanism in the session object. This is documented with the function curl_easy_perform. If you simply hoist the easy session object out of the loop, it will no longer disconnect between file transfers:

auto curl = curl_easy_init();
if (!curl)
  return -1;

for (auto file : FILE_LIST)
{
  std::vector<uint8_t> buffer;
  auto url = (SERVER+file);
  curl_easy_setopt(curl, CURLOPT_URL, 
    url.c_str());
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, 
    appendToVector);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, 
    &buffer);
  if (curl_easy_perform(curl) != CURLE_OK)
    return -1;

  process(buffer);
}
curl_easy_cleanup(curl);

libcurl will now cache the active connection in the session object, provided the files are actually on the same server. This improved the download timings of our bulk transfers from 130ms-260ms down to 30ms-40ms, quite the enormous gain. The timings now fit into our 200ms time window comfortably.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.