Wednesday, December 26, 2012

LXC lies about /.. inode number making FlexLM unhappy

One of our customers wanted to run a FlexLM-licensed tool on the Amazon EC2 cloud. He turned to us here at Cloud Niners and we started on the work. The approach we chose was to start an EC2 Ubuntu Precise machine, host a CentOS LXC container on it, and run the license daemon inside that. The reason for choosing CentOS is that the tools and license daemons are only certified on RHEL, and CentOS is the next closest thing.

Trying to start the license daemon, I hit the following error:

 8:59:27 (TOOLNAME) Cannot open daemon lock file

Checking the usual suspects, permissions and the like, led nowhere. Google immediately led me to similar problems from people wanting to run FlexLM tools under Solaris zones (which are quite similar to Linux LXC containers). I read about a guy facing a similar problem, and how Solaris legend Brendan Gregg wrote DTrace scripts to patch memory structures at run time to resolve the issue.

At first I was dismissive that this was what I was actually seeing. A quick "ls -lai /" showed that /. and /.. had the same inode number, so I was sure the problem was something else:

# ls -lai / | grep '\.$'
  256 dr-xr-xr-x   1 root root   212 Dec 26 09:08 .
  256 dr-xr-xr-x   1 root root   212 Dec 26 09:08 ..



But I had one of those "hmm" moments after stracing the binaries and confirming that they were failing immediately after calling getdents on "/". That sounded suspiciously close to what the Solaris folks were seeing. I grabbed gcc and built the sample code from the getdents man page (thankfully it's there!). Much to my surprise, the inode numbers for /. and /.. were actually different!

# ./a.out / | grep '\.$'
     256  directory    24            1  .
   42669  directory    24          2  ..



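Of course this makes sense, since the LXC guest is just another directory on the host: getdents reports the parent inode recorded in the directory itself, while ls -l presumably stats each entry, and a stat of /.. inside the guest resolves back to the guest's own root. Still, I didn't suspect ls -i was actually lying inside guests!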
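
For anyone who wants to see both numbers side by side without building the C sample, here is a rough Python sketch. It is not what FlexLM does internally (I don't know that); it just lists the "." and ".." entries of a directory through libc's readdir64 and compares each d_ino with what stat reports for the same name. The Dirent64 layout and the dot_inodes helper are my own assumptions for Linux with glibc, so treat it as illustrative only.

```python
import ctypes
import os


class Dirent64(ctypes.Structure):
    # Assumed layout of glibc's struct dirent64 on Linux.
    _fields_ = [
        ("d_ino", ctypes.c_uint64),
        ("d_off", ctypes.c_int64),
        ("d_reclen", ctypes.c_ushort),
        ("d_type", ctypes.c_ubyte),
        ("d_name", ctypes.c_char * 256),
    ]


# Load the C library that is already linked into the Python process.
libc = ctypes.CDLL(None, use_errno=True)
libc.opendir.argtypes = [ctypes.c_char_p]
libc.opendir.restype = ctypes.c_void_p
libc.readdir64.argtypes = [ctypes.c_void_p]
libc.readdir64.restype = ctypes.POINTER(Dirent64)
libc.closedir.argtypes = [ctypes.c_void_p]
libc.closedir.restype = ctypes.c_int


def dot_inodes(path):
    """Return {name: (d_ino from readdir, st_ino from stat)} for "." and ".."."""
    dirp = libc.opendir(os.fsencode(path))
    if not dirp:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    result = {}
    try:
        while True:
            entry = libc.readdir64(dirp)
            if not entry:  # NULL pointer: end of directory
                break
            name = entry.contents.d_name.decode()
            if name in (".", ".."):
                # Inode as stat sees it, after normal path resolution.
                st_ino = os.stat(os.path.join(path, name)).st_ino
                result[name] = (entry.contents.d_ino, st_ino)
    finally:
        libc.closedir(dirp)
    return result


if __name__ == "__main__":
    for name, (d_ino, st_ino) in sorted(dot_inodes("/").items()):
        flag = "" if d_ino == st_ino else "  <-- mismatch"
        print(f"{name:3} readdir={d_ino:<10} stat={st_ino:<10}{flag}")
```

Run inside the guest, a mismatch on ".." would be the same disagreement as between the ls and a.out outputs above.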

At this point, I'm not exactly sure how to resolve my issue apart from reinstalling the LXC guest on a separate block device (like an LVM volume), which I think should resolve it. This blog post is simply to confirm the issue and to gather feedback and potential solutions from smart people reading this. Shoot me a comment.