Tuesday, August 13, 2013

Porting xz: Part 2


Hi again. This is the second part of my description of porting xz.

Remember that xz doesn't accept filenames only as command-line arguments. It also has the --files and --files0 options, which read the filenames from a file. --files expects each filename to end with a '\n' character, while --files0 expects each one to end with a '\0' (NUL) character.
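The only difference between the two options is the terminator, so one routine can handle both. Here's a minimal sketch of that idea (not xz's own code): next_name() is a hypothetical helper that walks a buffer holding the contents of a --files/--files0 list and returns one name at a time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of xz): return the next name stored in
 * buf[*pos..len), where each name ends with `term` ('\n' for --files,
 * '\0' for --files0). Empty entries are skipped, and a trailing name with
 * no terminator is still accepted. The returned name is NOT NUL-terminated;
 * its length is stored in *name_len. Returns NULL when no names are left. */
static const char *
next_name(const char *buf, size_t len, char term, size_t *pos,
    size_t *name_len)
{
	while (*pos < len) {
		const char *start = buf + *pos;
		const char *end = memchr(start, term, len - *pos);
		size_t n = (end != NULL) ? (size_t)(end - start) : len - *pos;

		/* Step past the name and, if present, its terminator. */
		*pos += n + (end != NULL ? 1 : 0);
		if (n > 0) {
			*name_len = n;
			return start;
		}
	}
	return NULL;
}
```

Each returned name isn't NUL-terminated, so a caller would copy it out (for example with strndup()) before adding it to the same list as the command-line arguments.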

To make things easier, I think the best approach is to save the filenames coming from both --files/--files0 and the command-line arguments, and then pass all of them to the function that opens and limits them.

This way, we only have to save all the names, using malloc() and a string copy for each one.

Here's an example:

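/* Declaration of the variables */
size_t nfiles = 0;
char **files = malloc(8 * sizeof(char *));

/* When saving each filename, grow the array 8 slots at a time */
if (nfiles % 8 == 0 && nfiles != 0) {
   char **tmp = realloc(files, (nfiles + 8) * sizeof(char *));
   if (tmp == NULL) {
      message_error("%s", strerror(errno));
      exit(E_ERROR);
   }
   files = tmp;
}
[...]
size_t len = strlen(args.arg_names[i]) + 1;
files[nfiles] = malloc(len);
memcpy(files[nfiles], args.arg_names[i], len); /* len counts the '\0' */
nfiles++;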
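Here's a self-contained sketch of the same growable array, packaged as a tiny container. The struct name_list type and its functions are hypothetical, not part of xz; like the snippet above, it grows 8 slots at a time, and it takes an explicit length so it works both for command-line arguments (pass strlen()) and for names that aren't NUL-terminated.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container (not xz's code): a growable array of private
 * copies of filenames. */
struct name_list {
	char **names;
	size_t count;
	size_t cap;
};

/* Append a NUL-terminated copy of the first len bytes of name.
 * Returns 0 on success, -1 if memory runs out (the list stays valid). */
static int
name_list_push(struct name_list *l, const char *name, size_t len)
{
	if (l->count == l->cap) {
		/* Grow by 8 slots, as in the original snippet. */
		size_t new_cap = l->cap + 8;
		char **tmp = realloc(l->names, new_cap * sizeof(char *));
		if (tmp == NULL)
			return -1;
		l->names = tmp;
		l->cap = new_cap;
	}

	char *copy = malloc(len + 1);
	if (copy == NULL)
		return -1;
	memcpy(copy, name, len);
	copy[len] = '\0';
	l->names[l->count++] = copy;
	return 0;
}

/* Free every copy and the array itself, leaving an empty list. */
static void
name_list_free(struct name_list *l)
{
	for (size_t i = 0; i < l->count; i++)
		free(l->names[i]);
	free(l->names);
	l->names = NULL;
	l->count = l->cap = 0;
}
```

Growing by a fixed 8 slots mirrors the original snippet; doubling the capacity instead would keep appends cheap for very long --files lists.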

This is how I implemented it. The good thing is that, afterwards, it lets us call io_open_files(files, nfiles) (the function described in the first post).

So, how did we keep everything compartmentalized, with each file compressed inside a sandbox? Here's the main() code that does it:

for (i = 0; i < nfiles; i++) {
    if (pairs[i] == NULL)   /* this file couldn't be opened */
        continue;
#if defined(CAPSICUM)
    if ((forkpid = fork()) == -1) {
        message_error("fork: %s", strerror(errno));
        exit(E_ERROR);
    } else if (forkpid != 0) {
        /* Parent: let the child compress, then go on to the next file */
        wait(NULL);
    } else {
        /* Child: enter capability mode, then compress */
        capsicum_enter();
#endif
        run(pairs[i]);
        free(files[i]);
#if defined(CAPSICUM)
        exit(0);
    }
#endif
}

I think the code is fairly self-explanatory, but here's a short version: for every file to compress, we call fork(), and if that fails, we tell the user and exit. The parent process simply waits for the child to compress the file before moving on to the next one, so files are still processed one at a time. The child process enters capability mode, compresses the file, frees its copy of the filename, and exits.

And with this, I think the xz explanation is done.
(It really seems simpler than it was to implement it and to think through the whole program.)

The next post will be about porting zlib (the deflate and gzip compression library). But I don't think I will be writing the code there. If somebody reads this, please tell me whether or not you want code. (Writing a comment is easy, and it would be great to have someone commenting on this :P)

Edit: Thanks to oshogbo, I discovered that cap_init() is a function in Casper's public interface, so I renamed mine to capsicum_enter(). Thanks, oshogbo! :)

I would also like to include the capsicum_enter() code:

#include <sys/capsicum.h>	/* <sys/capability.h> on older FreeBSD */

/* Limit fd to the given rights. ENOSYS means the kernel was built
 * without Capsicum, in which case we carry on unsandboxed. Rights can
 * only ever be narrowed, never widened. */
static void
limit_stdio_fd(int fd, const cap_rights_t *rights)
{
	if (cap_rights_limit(fd, rights) < 0 && errno != ENOSYS) {
		message_error("%d: %s", fd, strerror(errno));
		exit(E_ERROR);
	}
}

void
capsicum_enter(void)
{
	cap_rights_t rights;

	/* stdin is only read from; stdout and stderr are only written to. */
	limit_stdio_fd(STDIN_FILENO, cap_rights_init(&rights, CAP_READ));
	limit_stdio_fd(STDOUT_FILENO, cap_rights_init(&rights, CAP_WRITE));
	limit_stdio_fd(STDERR_FILENO, cap_rights_init(&rights, CAP_WRITE));

	if (cap_enter() < 0 && errno != ENOSYS) {
		message_error("cap_enter: %s", strerror(errno));
		exit(E_ERROR);
	}
}

Friday, August 2, 2013

Porting xz: Part 1

Hello again!

As you know, I managed to port xz to the Capsicum framework. As always, you can see my project at my repo:

So, let's start explaining the porting effort on xz. This time, I'll try to get down to the code itself. The bad thing is that it's going to take a long time to explain (but of course, it's more likely to help somebody this way).

xz works with a struct called file_pair, which holds all the information about the file to compress and the compressed file. Let's see the code, and what we've added:


typedef struct {
 /// Name of the source filename (as given on the command line) or
 /// pointer to static "(stdin)" when reading from standard input.
 const char *src_name;

 /// Destination filename converted from src_name or pointer to static
 /// "(stdout)" when writing to standard output.
 char *dest_name;

#if defined(CAPSICUM)
 // File descriptor of the directory where the files are.
 int dir_fd;
#endif

 /// File descriptor of the source file
 int src_fd;

 /// File descriptor of the target file
 int dest_fd;

 /// True once end of the source file has been detected.
 bool src_eof;

        [...] <- There's some more metainfo here.

 /// Stat of the source file.
 struct stat src_st;

 /// Stat of the destination file.
 struct stat dest_st;
} file_pair;

As we can see, this struct holds all the information about the files. The lines inside #if defined(CAPSICUM) are what I've added. Let's see why.

In capability mode (after a program calls cap_enter()), we can't make any system call that uses a global OS namespace. That is, we can't call open(), unlink(), or stat(), since they take a path (and that's a global namespace!) as an argument.

Instead, we have to refer to files that are already open (through their file descriptors), ideally after calling cap_rights_limit() to further restrict the operations allowed on each fd.

Calling something like fstat() or futimes() works out easily; the problem comes when we have to delete something. And when it comes to compressing, there is always something to delete: if the compression goes wrong, we have to delete the newly created file, and if everything goes all right, we have to delete the original file.

So, if we want to delete something in capability mode, we have to call unlinkat() and refer to our file(s) relative to the directory that contains them. That's why I added a descriptor for that directory. And since xz never reads a file from one directory and writes its compressed version to another, one directory descriptor per file_pair is enough.

To continue explaining this "issue", let's also take a look at where the files are opened:


/// Opens the source file. Returns false on success, true on error.
static bool
io_open_src_real(file_pair *pair)
{
 // There's nothing to open when reading from stdin.
 if (pair->src_name == stdin_filename) {
  pair->src_fd = STDIN_FILENO;
  return false;
 }

        [...]

 // Flags for open()
 int flags = O_RDONLY | O_BINARY | O_NOCTTY;

        [...]

 // Maybe this wouldn't need a loop, since all the signal handlers for
 // which we don't use SA_RESTART set user_abort to true. But it
 // doesn't hurt to have it just in case.
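 do {
  pair->src_fd = open(pair->src_name, flags);
 } while (pair->src_fd == -1 && errno == EINTR && !user_abort);

#if defined(CAPSICUM)
 // Open the directory containing the file, so that unlinkat() can be
 // used later in capability mode. dirname() may modify its argument
 // (and src_name is const), so it gets a copy.
 if (pair->src_fd != -1) {
  char *dir_copy = strdup(pair->src_name);
  if (dir_copy == NULL)
   goto error_msg;
  pair->dir_fd = open(dirname(dir_copy), O_RDONLY | O_DIRECTORY);
  free(dir_copy);
  if (pair->dir_fd == -1)
   goto error_msg;
 }
#endif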

        [...]

 // Drop O_NONBLOCK, which is used only when we are accepting only
 // regular files. After the open() call, we want things to block
 // instead of giving EAGAIN.
 if (reg_files_only) {
  flags = fcntl(pair->src_fd, F_GETFL);
  if (flags == -1)
   goto error_msg;

  flags &= ~O_NONBLOCK;

  if (fcntl(pair->src_fd, F_SETFL, flags))
   goto error_msg;
 }

 // Stat the source file. We need the result also when we copy
 // the permissions, and when unlinking.
 if (fstat(pair->src_fd, &pair->src_st))
  goto error_msg;

        [...]
}

Here we can see all the syscalls made in the function that opens the files. Before this function is called, the file_pair is created and the original file's name is stored in pair->src_name.

Now, the main problem is the way main() calls run(), which is set to list_file() or coder_run() depending on the mode xz runs in. Before porting it to Capsicum, it worked this way:

void (*run)(const char *filename) = opt_mode == MODE_LIST
        ? &list_file : &coder_run;

// Process the files given on the command line. Note that if no names
// were given, args_parse() gave us a fake "-" filename.
for (size_t i = 0; i < args.arg_count && !user_abort; ++i) {
    if (strcmp("-", args.arg_names[i]) == 0) {
        // "-" means stdin, so we treat everything differently then.
        [...]
    }

    // Do the actual compression or decompression.
    run(args.arg_names[i]);
}

What's the problem with this approach? Each file is opened inside run(), so we can't limit the file descriptors beforehand, and we also have no good point at which to call cap_enter(). Thus, we need to restructure everything and open all the files before working on any of them. We also need to store all these opened files (as file_pairs) somewhere so we can work on them later.

That's the reason I wrote io_open_files(). It implements the part of list_file() and coder_run() that opens the files. After opening each file, it calls another new function, limitfd(), which limits all the file descriptors of a given file_pair.
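limitfd() itself isn't shown here, so here's a hypothetical sketch of what such a function could look like with the FreeBSD 10+ Capsicum API (cap_rights_init() and cap_rights_limit()). The struct pair_fds type, the function names and the exact set of rights for each descriptor are my assumptions, not the code in the repo; on systems without Capsicum the sketch compiles to a no-op.

```c
#include <errno.h>

#ifdef __FreeBSD__
#include <sys/capsicum.h>	/* <sys/capability.h> on older releases */
#endif

/* Hypothetical stand-in for the three descriptors of a file_pair. */
struct pair_fds {
	int src_fd;
	int dest_fd;
	int dir_fd;
};

#ifdef __FreeBSD__
/* Narrow one descriptor; ENOSYS means the kernel lacks Capsicum. */
static int
limit_one(int fd, cap_rights_t *rights)
{
	if (fd < 0)
		return 0;	/* nothing opened, e.g. no destination in test mode */
	if (cap_rights_limit(fd, rights) < 0 && errno != ENOSYS)
		return -1;
	return 0;
}
#endif

/* Hypothetical limitfd(): leave each descriptor only the rights that the
 * compression step is assumed to need. Returns 0 on success, -1 on error. */
static int
limit_pair(const struct pair_fds *p)
{
#ifdef __FreeBSD__
	cap_rights_t rights;

	if (limit_one(p->src_fd, cap_rights_init(&rights, CAP_READ,
	    CAP_FSTAT, CAP_SEEK)) < 0)
		return -1;
	if (limit_one(p->dest_fd, cap_rights_init(&rights, CAP_WRITE,
	    CAP_FSTAT, CAP_SEEK, CAP_FCHMOD, CAP_FCHOWN, CAP_FUTIMES)) < 0)
		return -1;
	/* The directory is only used to unlinkat() a file inside it. */
	if (limit_one(p->dir_fd, cap_rights_init(&rights, CAP_UNLINKAT)) < 0)
		return -1;
#else
	(void)p;	/* no Capsicum here: nothing to limit */
#endif
	return 0;
}
```

Since rights can only be narrowed, whatever is dropped here stays dropped for the rest of the process, so the sets have to cover every later operation on these descriptors.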

extern file_pair **
io_open_files(char *filename[], int files)
{
 int i;
 // One slot per file. calloc() leaves every slot NULL, so files that
 // are skipped or fail to open are easy to spot later.
 file_pair **pairs = calloc(files, sizeof(file_pair *));
 if (pairs == NULL)
  return NULL;

        [...]

 for (i = 0; i < files; i++) {
  if (filename[i] == NULL)
   continue;

  // Set and possibly print the filename for the progress message.
  message_filename(filename[i]);

                [...]

  // io_open_src() allocates the file_pair itself.
  if ((pairs[i] = io_open_src(filename[i])) == NULL)
   continue;

  if (opt_mode != MODE_TEST)
   io_open_dest(pairs[i]);
#if defined(CAPSICUM)
  limitfd(pairs[i]);
#endif
 }
 return pairs;
}

This way, we get a nice array of file_pairs waiting to be worked on.

And this is the end of part 1. In part 2, we'll look at the other problem involved in this, and also at how I managed everything in main(). Cya!

Friday, July 19, 2013

Porting bzip2

It's been a great summer so far. At least I've made quite a lot of progress in this GSoC. I've already ported bzip2 and xz to Capsicum successfully, but I'll talk about each one in a different post.

I'm also starting with zlib now, which is something entirely different. And so far, each program I port is more difficult than the last.

bzip2

What were the problems I had?

With bzip2, it was a fairly straightforward process, but since it was the first application I had ever ported, I ran into some basic problems.

One of them was choosing a bad design for the new program:

  1. bzip2 opens a pair of UNIX domain sockets.
  2. bzip2 forks for every file.
  3. The child process closes all its open file descriptors.
  4. The parent process sends the child the open file descriptors through the socket.
  5. The child process compresses/decompresses the file and exits.
  6. The parent process ends when all the children are done.

The other problems I had were basic newbie problems (too many to count, or to keep in a list), which luckily were solved quickly.

The good

I managed to create a pair of functions that send and receive file descriptors through a UNIX socket. (I had a reference book called "UNIX: Advanced programming", and it isn't even mentioned there... so I got this beast instead :P)
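For reference, the standard mechanism for this is an SCM_RIGHTS control message sent with sendmsg() and received with recvmsg(). Here's a minimal sketch of such a pair of functions; send_fd() and recv_fd() are illustrative names, not the code from the bzip2 port.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one file descriptor over a UNIX domain socket. At least one byte
 * of ordinary data has to accompany the control message. */
static int
send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {			/* union keeps the buffer properly aligned */
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd(). Returns it, or -1. */
static int
recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file; on FreeBSD, rights applied with cap_rights_limit() before sending travel with it.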

I'm learning a LOT (and I'm doing it every day!).

During my work with bzip2 I learned a lot about basic UNIX programming, and I also saw some of the basic problems that come up when porting any program to Capsicum.

The bad

I lost a whole week on the file descriptor passing stuff, which was half of the time allocated to porting each program. That means I had to use three weeks instead of two! Oh, well... at least I'm slowly catching up. (But did I mention all the stuff I'm learning??)

The ugly

That design is not necessary at all! 

The beauty

This is the design that I finally used:
  1. Open the files.
  2. Limit the files.
  3. Enter capability mode.
  4. Fork, and let the child inherit and compress.

If we're building on the FreeBSD version that added Capsicum, or a later one, some variables that were originally local become global, such as all the FILE*s and the file descriptor of the directory we're working in, so that we can call unlinkat() (unlink() can't be used, since it takes a path in the global file system namespace, which is forbidden in capability mode).

The end

So, this is basically the summary of all the bzip2-porting-related stuff. I hope it's not a bad read, and that you're having at least half the fun I'm having doing this!

Good luck to everybody with their projects!

PS: Oh, and by the way, I've managed to use acme as my day-to-day text editor... and it's AWESOME. If you can, take a look at it (you can find a fairly good demo here). If I get around to writing some of the programs I'd like to have for working with acme, I might mention them here and upload their code somewhere.

Tuesday, June 18, 2013

Setting everything up

I said that I would explain my setup (and the adventures I had while setting it up).

For starters, I tried checking out the whole head branch into its directories with svn, and then simply compiling it. Since I am crazy, I didn't make a backup, hoping to get it right.

Well, unluckily for me, I went through the whole process not once (a lot of time), but twice (a lot of time * 2), deleting the code and downloading it again each time. That's about two or three days without a FreeBSD laptop, so I just used the other machine I have at home and waited for it to finish.

I couldn't get it to work properly: some very weird, vague and general bugs started to appear concerning the ports (the configure script failed but carried on, and I was unable to install anything).

So I had to use the snapshots. The problem? I had to back up everything, since it was all in one slice, mounted as /. *facepalm*

Once everything was backed up, I installed the snapshot and made sure that I had at least one partition reserved for /usr (the biggest one).

As for the ports, I don't want to compile everything from source, but since there are no official packages for pkgng (the default package system in FreeBSD 10.0-CURRENT) due to a security issue, the only other option is a non-official package server, like exonetric. (Of course, when I really need to, I compile from the port, as with rxvt-unicode, which has no 256-color support by default.)

It's just a matter of setting PACKAGESITE to: http://mirror.exonetric.net/pub/pkgng/${ABI}/latest

And then running: sudo pkg update
That's if you already have sudo; if not, just use su, or log in as root.

And, voilà :D

I also said that I wanted to use devel/plan9port, but there seems to be a problem building it from the port. I already sent an email to the maintainer, and he's investigating the issue.

For now, I'm using X11, spectrwm, rxvt-unicode, xombrero, the default sh, and the best combo when working from a terminal: tmux and vim (with the awesome and minimal eink colorscheme).

My next steps for the environment? Switching to devel/plan9port and using acme(1).
My next steps for programming/results? We'll see those in the weekly status report :)

Sunday, June 9, 2013

Starting the summer of code!

So, first things first.

I opened this blog to talk about my GSoC project during the summer of 2013, so I'll try to keep things focused and talk only about that here.

For now, I'm just updating my system to FreeBSD-CURRENT, compiling my ports and doing my homework, which involves reading the Capsicum manual pages and the mailing lists.


For anyone interested, here's the link to my FreeBSD wiki page.

I will surely talk about my environment as well, since I will probably use devel/plan9port, and it will be quite fun to set up.