WORD PTR

Pointed Development

Linux Daemon Lock File – a Bug Retrospective

| Comments

I wrote a daemon framework in C. It has bugs.

Two years ago I posted a tutorial on writing a Linux daemon in C.

I want to continue fleshing out the Linux daemon framework, and as a consequence, I began re-evaluating the code: It needs a little tender loving care. As part of that exercise, I started writing a retrospective on the nmc_daemonizer_set_pid_lock method: The initial tutorial glossed over the method and it’s not exactly clear what nmc_daemonizer_set_pid_lock does. It seemed like a good place to start.

However, while writing the post I discovered a bug! And what I thought was a simple correction turned out to be a rather large rework of the pid lock mechanism.

This bug has existed for two years. That’s embarrassing.

Let’s look at what I found.

The Bug

Here’s the original (bugged/bug/buggy) version of the method. I added /* <-- BUGGY */ annotations where appropriate.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
/**
 * Open the lock file name and save the PID of the daemon into it.
 * @param self pointer to an instance of the daemonizer.
 * @return NMC_SUCCESS.
 */
static nmc_status_t nmc_daemonizer_set_pid_lock(const nmc_daemonizer_t *self) {
  nmc_status_t res = NMC_SUCCESS;                         /* <-- BUGGY */
  nmc_configuration_t *config = NULL;
  char *lock_file_name = NULL;

  config = self->data->config;
  lock_file_name = config->get_lock_file_path(config);
  int lfp = open(lock_file_name, O_RDWR | O_CREAT, 0640); /* <-- BUGGY */

  if(lfp != -1) {
    if(get_set_lock(lfp, 0, SEEK_SET, 0) != -1) {         /* <-- BUGGY */
      if(ftruncate(lfp, 0) == 0) {
        char pidtext[NMC_MAX_LINE];
        snprintf(pidtext, sizeof(pidtext), "%ld\n", (long)getpid());
        int r = write(lfp, pidtext, strlen(pidtext));
        if(r != -1) {
          res = NMC_SUCCESS;
        }
      }
    }
  }

  if(res != NMC_SUCCESS) {                                /* <-- BUGGY */
    exit(EXIT_FAILURE);
  }

  return res;
}

The key line is this line:

1
  nmc_status_t res = NMC_SUCCESS;

I initialize my result variable to NMC_SUCCESS. The res variable is never subsequently changed or checked! By the time it gets to the end of the method, res will never be NMC_FAILURE and the exit(EXIT_FAILURE) line will never execute.

This means that for two years the pid lock check hasn’t actually worked. You could start up multiple instances of the daemon. That sucks.

Oh but it gets worse!

Additionally, I thought it would be clever (and that’s the problem right there) to use fcntl to acquire a file lock (get_set_lock is a wrapper around fcntl). This was simply a bad idea in general and I ended up tripping over a couple of fcntl gotchas. I simplified the lock to a basic if-exist-if-can-exclusively-create check and removed all of the cruft I added as a result of trying to use fcntl. The end result is cleaner and less error-prone.

Finally, I inadvertently left two calls to the pid lock method within the nmc_daemonizer_daemonize code: One call before forking, one call after forking. I removed the incorrect first call (before the fork) and slightly refactored the daemonizing method: It now correctly creates the pid lock file with the proper umask after forking but before it nulls out the file descriptors.

Corrected Version

After correcting the bug, I’m left with the following (much cleaner) code as of 2012-11-25 (please see the nmc_daemonizer.c file within the GitHub repository for the latest revision):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
/**
 * Open the lock file name and save the PID of the daemon into it.
 * @param self pointer to an instance of the daemonizer.
 * @return NMC_SUCCESS.
 */
static nmc_status_t nmc_daemonizer_set_pid_lock(const nmc_daemonizer_t *self) {
  nmc_status_t res = NMC_FAILURE;
  nmc_configuration_t *config = NULL;
  char *lock_file_name = NULL;

  config = self->data->config;
  lock_file_name = config->get_lock_file_path(config);
  struct stat sts;
  if(stat(lock_file_name, &sts) != 0 && errno == ENOENT) {
    int lfp = open(lock_file_name, O_WRONLY | O_CREAT | O_EXCL, 0640);
    if(lfp > -1) {
      if(ftruncate(lfp, 0) == 0) {
        char pidtext[NMC_MAX_LINE];
        snprintf(pidtext, sizeof(pidtext), "%ld\n", (long)getpid());
        int r = write(lfp, pidtext, strlen(pidtext));
        if(r > 0) {
          self->data->created_pid_lock_file = 1;
          res = NMC_SUCCESS;
        }
      }
    }
  }

  if(res != NMC_SUCCESS) {
    nmc_log(stdout, config, LOG_INFO, "Could not acquire exclusive lock %s due to %i: %m", lock_file_name, errno);
  }

  return res;
}

Lessons Learned

exsvcd has been around for two years. It’s old, but it’s not at all mature! I have no idea if anyone actually uses the code, but I suspect not: I’ve received a single bug report since I posted the code to GitHub.

Part of doing this retrospective exercise is to:

  • Go back and review the code
  • Find bugs and fix them
  • Mature the code

Finding bugs and learning – no matter how embarrassing – is one of my favorite aspects of development.

Comments