Setting up a File and Print Server under Crunchbang Linux

Overview

I have an old computer that I'm not using much: it's occasionally a media player, but mostly just sits there. It occurred to me recently that I could use it to serve files to other computers in my apartment, and that this would have several benefits for me: for instance, that my nightly backup script, which previously required that my primary laptop be physically connected to the hard drive to which the files were backed up, could run if both computers were just on the same network. Too, there are other files on external hard drives (movies, for instance) that I need access to occasionally, but not enough that I'm willing to tether myself to the study in order to get at them. I'd prefer that they just be available over the air wherever I am.

At the same time, I'd love to share at least one of my printers over the air so that I can print from the living room. I've got an old Okidata laser printer and a much less old HP Deskjet F4480. Both work under Crunchbang. I'll probably want to connect the F4480 physically to the computer that I primarily use, since I'll also want to scan with it, which means that I'll have to live with it only being available when it's physically connected to my computer. That's OK with me.

Finally, I want at least a subset of these resources to be accessible to my girlfriend, who runs Windows.

So here are my priorities, really, in roughly decreasing order:

  1. I want my backup script to run every night without me having to plug the computer into the external hard drive every night. If I pass out reading on the couch and the computer is sitting next to me on the coffee table, I'd still like the backup to occur.
  2. I want files on the file server, or at least the interesting ones, to be browsable from anywhere in the house by another computer running Linux.
    • A subsidiary goal: I want at least some of these files also accessible to someone running Windows. More specifically:
      • I want my girlfriend (and potentially other guests) to be able to get at my media collection.
      • I'd like to give my girlfriend access to a folder on the file server so that she can make backups, too.
      • I'd like access to her folder to involve at least nominal identity checking, so that a random user who guesses the router password can't automatically see her files.
      • Although I trust her not to use too much space, I'd like to learn how to limit the size of the directory she can write to.
  3. I want to be able to share the laser printer.
  4. I want the inkjet printer to be shared when it's physically connected to the primary computer I use, the one that's not a file server.

If you're following along, you should adapt these instructions to your own circumstances. I would guess that this would work in more or less the same way in pretty much any Debian-derived distribution, including Ubuntu, Linux Mint, and Bodhi, all of which I've used in the past. But adapting to a different distro, I think, is less work than adapting to your particular circumstances, so I'm trying hard to be explicit about why I'm doing what I'm doing and how I came to the decisions that I came to.

This document looks like a long description in comparison to a lot of other Linux howtos that are posted on people's blogs, and this might be intimidating. It shouldn't be: it's long because I'm trying to be excruciatingly complete and specific about what I'm doing and why I'm doing it, in large part because I hate that Linux how-tos online are often so laconic about explaining why they're doing what they're doing. All of which is to say that typing this description up took much longer than doing what the description describes, and that the length of this description shouldn't scare you off from trying to follow it.

Specifications

The computer I'm using as a file server is a Dell Latitude D420 (yes, really) with 1GB of RAM and a clock running at a maximum of 1.2 GHz. I have two external drives attached to it through a USB hub: one is a 320GB drive, which I call "Externa," and the other is a 3TB drive, which I call "Backups." Alas, the D420 only has a USB 1.1 controller, but honestly, network speed is a bigger bottleneck for me, as tests have shown.

The file server has the network name patrick-laptop, though I now wish I'd picked something more descriptive. The other Linux computer that will be connecting to the server is called linicious; it's a Sony VAIO VGN-FZ250E, though that's not really relevant. It's also running Crunchbang Waldorf, based on Debian Wheezy. It's my primary computer and is frequently out of the apartment. Nevertheless, that's the computer to which the inkjet/scanner is attached.

Setting up the Network

Getting Started

On the server

I installed a fresh version of Crunchbang Linux (I'm using the current release, Waldorf, which is based on the current release of Debian, called Wheezy.) I picked Crunchbang because it's lightweight and fast while giving me a usable operating system; I use it on my primary computer, and I'm familiar with it.

I installed Crunchbang over the edition of Bodhi Linux that I'd previously been using, primarily because Crunchbang is faster and less resource-intensive ... and because I really hate the Enlightenment window manager that Bodhi defaults to. This is of course a personal choice, but I found Crunchbang to work more intuitively once I got used to it. I'm keeping my existing installation of Windoze XP on the file server, just in case I ever need to boot into Windoze for anything again. After all, it's my last computer that even has a Windoze installation. So here's me showing my partition list for the internal hard drive:

patrick@patrick-laptop:~$ sudo parted
[sudo] password for patrick:
GNU Parted 2.3
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: ATA TOSHIBA MK8009GA (scsi)
Disk /dev/sda: 80.0GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
 1      32.3kB  23.1GB  23.1GB  primary   ntfs            boot
 2      23.1GB  24.1GB  1074MB  primary   linux-swap(v1)
 4      25.4GB  60.0GB  34.6GB  extended
 5      25.4GB  44.3GB  18.9GB  logical   ext4
 6      44.3GB  60.0GB  15.7GB  logical   ext4

(parted) quit

The first partition (/dev/sda1) is my Windows partition. The second is the swap partition for the Linux installation. Then there's an extended partition (#4) containing partition #5 (Crunchbang) and the unused partition #6. I'll return to disk partitioning in a bit.

Once I installed Crunchbang, I immediately added the Jessie sources to /etc/apt/sources.list, because I may want to use this computer for other things eventually, and I find that using Crunchbang with packages from the next Debian release, Jessie (at this writing, Debian testing), makes for a more usable system in my particular case, though likely not in yours, for reasons that I'm not going to go into now. I accomplished this by duplicating every line that referred to wheezy and changing one copy of each to say jessie rather than wheezy. So I wound up with an /etc/apt/sources.list file that looks like this:

## CRUNCHBANG
## Compatible with Debian Wheezy, but use at your own risk.
deb http://packages.crunchbang.org/waldorf waldorf main
deb-src http://packages.crunchbang.org/waldorf waldorf main

## DEBIAN
deb http://http.debian.net/debian jessie main contrib non-free
deb-src http://http.debian.net/debian jessie main contrib non-free

## DEBIAN SECURITY
deb http://security.debian.org/ jessie/updates main
deb http://http.us.debian.org/debian/ jessie contrib non-free main
# deb-src http://security.debian.org/ jessie/updates main

## DEBIAN (Wheezy)
deb http://http.debian.net/debian wheezy main contrib non-free
deb-src http://http.debian.net/debian wheezy main contrib non-free

## DEBIAN SECURITY (Wheezy)
deb http://security.debian.org/ wheezy/updates main
deb http://http.us.debian.org/debian/ wheezy contrib non-free main
deb http://packages.mate-desktop.org/repo/ubuntu precise main
# deb-src http://packages.mate-desktop.org/repo/ubuntu precise main
# deb-src http://security.debian.org/ wheezy/updates main

Once this was done, I ran sudo apt-get update; sudo apt-get dist-upgrade in a terminal and sorted out several package conflicts. (I would not have had to solve package conflicts if I were sticking purely with the standard repositories, in fact, but there you have it. In any case, the fact that it's unlikely to be relevant to anyone following along is the reason I'm not documenting it in detail.)
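
A side note for anyone mixing releases the same way: apt "pinning" can keep the system tracking wheezy by default while still making jessie packages available on request. This is a minimal sketch rather than something I rely on in the rest of this document; the priority numbers are conventional values, and you should read apt_preferences(5) before adopting it:

sudo tee /etc/apt/preferences > /dev/null <<'EOF'
Package: *
Pin: release n=wheezy
Pin-Priority: 900

Package: *
Pin: release n=jessie
Pin-Priority: 300
EOF

With that file in place, something like sudo apt-get -t jessie install some-package pulls an individual package from jessie while everything else stays on wheezy.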

While all packages were updating to their latest versions, I configured the router.

Configuring the Router

From any computer, ideally one connected to the router with an Ethernet cable

What I essentially need to do with the router is to give some of the devices on the network a static IP address — not relative to the Internet as a whole, of course, but relative to the local network. After all, home networking is IP-based too. It turns out that my modem/router, a Linksys E3000, assigns itself the IP address 192.168.0.1 on the local network, so typing http://192.168.0.1 in a browser address bar gets me to the router's interface. It prompts me for a username and password, then displays a set of web pages that I can use to configure it. In my case, I needed to click on the MAC filtering link to get the MAC addresses of the various pieces of hardware currently connected, then click on Lan IP to assign each individual computer (MAC address) a permanent IP address. Of course, while I was there I changed the router's name and passwords, too.

Your particular process may vary depending on what router you're using, though. If you're following along, this is one of those places where you'll have to consult your own hardware's documentation; chances are the process will be at least conceptually similar, since every router I've ever worked with has offered its configuration options via a web interface that appears when a browser is pointed at its IP address.

Initial Server Setup

On the server

Before I set up the server as a server, I need to set up the server as a computer. I've already done a lot of this by installing Crunchbang, but there are a few more things that need doing to make this setup work as easily as possible. First is to reboot the computer and make sure that it's not set to boot from an external USB drive; on the D420, this involves mashing F2 on startup and changing the order of boot devices so that the computer boots from the internal hard drive before checking USB drives. (This is probably unnecessary for virtually everyone else, even if using the same computer model; my problem is that, at some point, I've managed to install the GRUB bootloader to one of my external hard drives, and the GRUB config on that drive no longer points to useful boot information. The upshot is that booting with the drive plugged in results in a GRUB error message. I've been unable to remove GRUB and asking for help on Reddit didn't result in any working solutions. On the other hand, I really don't care that much; it's a non-problem with the drives plugged into the computer that's going to be a file server once that computer's set not to boot from an external drive. Which it now is.)

I want the computer to log in automatically when it's turned on, because I want to be able to tell someone else to turn it on and know that it'll start working for them, if they're depending on it in some way, without my having to give them a password, which I'd rather not do. (I don't give out my passwords, ever, as a security policy.) Configuring auto-login in Crunchbang can easily be done through a GUI: type gksudo slimconf in a terminal, choose the user account that should be logged into automatically, check the "Auto Login" box, then click "Save," then "Exit." Easy enough.

What I want to do next is ensure that the external drives mount in the same place on every boot. To guarantee this, I'm going to edit the /etc/fstab (file system table) file, which is just a text file describing canonical locations for where in the big filesystem tree various things should be mounted. (Here's a good Ubuntu-based tutorial ... which works just fine in Crunchbang). Every line in fstab specifies a something (what's going to be mounted?) and a somewhere (where's it going to be mounted in the file tree?), plus some other options. There are a lot of ways to specify what the something is, including block device names like /dev/sdb1 ... but these block device names can change from boot to boot. In my fstab files, I specify partitions in terms of their UUID, a long hex string with hyphens, which never changes (well, unless you change it manually; but why would I do that?). You can find the universally unique identifier for a block device from the command line by doing something like this in a terminal:

patrick@linicious:~$ blkid -o value -s UUID /dev/sda1
190e74c2-444e-4159-b2fd-2d7330c7af83

although, to be honest, I tend to just find it through GParted, which has a graphical interface and makes it easy to cut and paste this information.

In any case, I have two external drives I'd like to make available over the network: Backups and Externa. I want to mount them in the same place every time (I went, unimaginatively, with /media/Backup and /media/Externa) so that other software knows where they are. This means adding two lines to my /etc/fstab file, which now looks like this (note that it looks an awful lot like the default /etc/fstab file created during Crunchbang installation):

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sda6 during installation
UUID=a6b140d0-fbb9-4950-89f4-cd8269a38d5b /              ext4  errors=remount-ro     0 1
# /mnt/mint was on /dev/sda5 during installation
UUID=657a2404-6206-4425-9947-463b4fa34c6e /mnt/mint      ext4  defaults              0 2
# WINDOZE was on /dev/sda1 during installation
UUID=22B4F825B4F7F95D                     /mnt/Windoze   ntfs  defaults,users        0 2
# Backups was on /dev/sdb2 during installation
UUID=3b101588-3674-43cd-9112-2dc33a17d14c /media/Backup  ext4  defaults,nofail       0 3
# Externa was on /dev/sdc1 during installation
UUID=efbc36f1-42d0-4621-be91-2ba829683d61 /media/Externa ext4  defaults,users,nofail 0 3
# swap was on /dev/sda2 during installation
UUID=d39b9b40-a741-4ff6-863b-7665a1936c11 none           swap  sw                    0 0
/dev/sr0 /media/cdrom0 udf,iso9660 user,noauto 0 0

I've highlighted the lines I added above. Note that I added the nofail option to both drives because I'd like, in theory, to be able to use this computer for other things, such as a backup laptop, and want booting to continue without needing to be connected to the external hard drives if and when I ever take the D420 out of the house again. Again, all pretty straightforward. I'm just using /etc/fstab to control where in the filesystem hierarchy drives are mounted. This is exactly what it's for.
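
Before rebooting to rely on the new lines, it's worth a quick test: mount -a attempts to mount everything listed in fstab that isn't already mounted, so a typo in either new line shows up immediately as an error message rather than as a surprise at boot time:

sudo mount -a
mount | grep /media

If both drives appear in the output of the second command, the entries are good.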

Then I turned the system's swappiness value down to 10.
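
For completeness, since I mention it only in passing: the first command below changes the swappiness value for the running system, and the second appends a line to /etc/sysctl.conf so that the change survives a reboot. This is standard sysctl usage, nothing Crunchbang-specific:

sudo sysctl vm.swappiness=10
echo "vm.swappiness=10" | sudo tee -a /etc/sysctl.conf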

Sharing the external drives with NFS

Because I have an existing backup script that copies my home folder to one of the external hard drives that I just moved, my immediate priority is to make sure that the backup happens tonight without my having to do too much work. (I recently talked about my existing backup script on Reddit. It was based on this article, now gone from its original URL but happily preserved by the Internet Wayback Machine.) Setting up Linux-only sharing between the computers is one way to let this happen without modifying my existing backup script (though I'll do that later) because the Linux file system is very flexible, and I can mount the network share corresponding to the external hard drive in the same location as the external hard drive used to be mounted. Which will have the benefit of making the change to the overall file system organization transparent to the backup script. Plus, I can in other ways make the files on the file server — movies, for instance — locally available.

In what follows, I'm largely following this tutorial, which has once again been happily preserved from the vicissitudes of time and link rot by the Internet Archive.

On the server

  1. I need to install a package to support NFS, the network file system. Typing sudo apt-get install nfs-kernel-server in the terminal installs it.
  2. I also need to know the IP address of the server relative to the local router. I know this from setting up the router, of course, but if I needed to figure it out, I could use the ifconfig command, like so:

    patrick@patrick-laptop:~$ ifconfig

    It then spits out a bunch of information, ending with this:

    wlan0     Link encap:Ethernet  HWaddr 00:13:e8:7e:3f:87
              inet addr:192.168.0.2  Bcast:192.168.1.255  Mask:255.255.255.0
              inet6 addr: fe80::213:e8ff:fe7e:3f87/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:2515382 errors:0 dropped:33588 overruns:0 frame:0
              TX packets:964498 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:1601121261 (1.4 GiB)  TX bytes:173394415 (165.3 MiB)

    The IP address for the server that I want is 192.168.0.2, from the wlan0 section: wlan0 is the name of the network interface for the wireless connection. In no case should I choose 127.0.0.1 as the IP address, because that address has a special meaning: it's the loopback address, and it always refers to whatever computer it's typed on, so it can never be used to reach this computer from another one. Which is to say that I need the server's address relative to the local router instead.

    I also need to know the address for the client computer, which I can find the same way from the other computer, if I need to. I happen to have configured the client to have the IP address 192.168.0.3, but you may have configured it differently when you set up your router.

  3. Next, I'm going to create a "fake" (well ... only valid for the server) hostname for the client computer. To do this, I edit the /etc/hosts file, which maps hostnames to IP addresses. So I back up my existing /etc/hosts file (sudo cp /etc/hosts /etc/hosts.old in a terminal), then associate the hostname linicious (the name for the client) with the IP address 192.168.0.3. To do this, I type gksudo pluma /etc/hosts in a terminal, then add the line 192.168.0.3 linicious to the end of the file. (Of course, I could use an editor other than Pluma, if I wanted.) So here's my complete /etc/hosts file, which, again, you may notice is very similar to the default one that's set up by the operating system:

    127.0.0.1 localhost
    127.0.1.1 patrick-laptop

    # The following lines are desirable for IPv6 capable hosts
    ::1     ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters

    192.168.0.3 linicious

    Once again, I've highlighted the line that I've added.

  4. Theoretically, this means that I can use the name linicious to refer to the client instead of its IP address in more or less any command that normally expects an IP address. I want to test that this is working by pinging the client from the server:

    patrick@patrick-laptop:~$ ping -c 1 linicious

    I should get output that looks like this:

    PING linicious (192.168.0.3) 56(84) bytes of data.
    64 bytes from linicious (192.168.0.3): icmp_seq=1 ttl=47 time=13.9 ms

    --- linicious ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 13.883/13.883/13.883/0.000 ms

    The important thing here is that the ping command is working; if it weren't, I'd get a message that said something like unknown host, and in that case, I'd want to re-check my work in previous steps (and double-check that the computers really are both connected to the same router).

  5. I next need to allow the client to access the server for NFS-related purposes. To do so, I'm going to edit the /etc/hosts.allow file (first back it up with sudo cp /etc/hosts.allow /etc/hosts.allow.old, then gksudo pluma /etc/hosts.allow) and then add the line ALL: 192.168.0.3 (the client's address) to the end. (Note that this is not one of the places where I can use the hostname I just defined, alas.)
  6. Now I need to list the directories on the server that should be accessible over the network. To do this, I'm going to edit the /etc/exports file, which lists, naturally enough, the directories that should be "exported," i.e. made available via NFS to other computers. For each folder that I want to export, I need to add a line that says something like /media/Backup 192.168.0.3(rw,sync,subtree_check,no_root_squash) to the /etc/exports file.

    Note that there's no space between the IP address and the opening parentheses that contain the filesystem options. Here's what the options mean:

    rw
    Mount the folder read/write. This allows me to write to the shared directories on the server from the client.
    sync
    Force the server to complete all writes immediately.
    subtree_check
    Probably overkill, but this forces the server to actually verify that files that are accessed are definitely in a directory that's been exported.
    no_root_squash
    By default, accessing the server from the client while the client is signed in as the root account doesn't allow root privileges on the server. I may want to get root access from the client sometimes, so I'm disabling this.

    Here's my /etc/exports file for the server patrick-laptop:

    /media/Backup 192.168.0.3(rw,sync,subtree_check,no_root_squash)
    /media/Externa 192.168.0.3(rw,sync,subtree_check,no_root_squash)

  7. Now that I've changed the NFS setup, I'm going to restart the NFS server so it recognizes the new options. Typing sudo /etc/init.d/nfs-kernel-server restart is supposed to do this, though some people on the Internet complain that it doesn't work (it did for me). If that's the case for you, you'll need to reboot the server machine. Either way, it's worth confirming that the exports took effect; see the verification sketch just after this list.
  8. If you don't have it already, get the IP address for the server. You can do this either by logging back into your router, if you didn't write the address down earlier, or by following the directions in step 2, above.
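
Here's the verification sketch promised in step 7. Neither command is strictly necessary, but together they confirm that the exports took effect: exportfs -v prints the export table the running server is actually using, and showmount (installed alongside the NFS packages) asks the server for its export list the way a client would:

sudo exportfs -v
showmount -e localhost

Both should list /media/Backup and /media/Externa along with the client's address.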

Finally, time to do some setup on the client!

On the client

  1. I need to install a different NFS software package to support networking functions. Typing sudo apt-get install nfs-common in a terminal does that.
  2. I'm going to edit the /etc/hosts file on the client to give a pseudo-name to the server, just as I did on the server to give a pseudo-name to the client. So, sudo cp /etc/hosts /etc/hosts.old in a terminal to make a backup, then gksudo pluma /etc/hosts to open up the file. Once again, I'm just going to add a line to the end of the file; this time it's 192.168.0.2 patrick-laptop, since 192.168.0.2 is the server's address.
  3. Once again, I'm going to make sure that I can use the name patrick-laptop from the client. So ping -c 1 patrick-laptop in a terminal should result in similar success messages as it did on the server.
  4. Since I used to mount the drives accessed over the network directly on the client, I already have folders created where they used to be mounted (in my case, at /media/Externa and /media/Backup). But if I didn't have them already created, I could do this in a terminal:

    sudo mkdir /media/Externa
    sudo mkdir /media/Backup

  5. Now I want to make sure (finally!) that I can actually mount the remote drives on the local filesystem. Here's how I'm checking this in a terminal:

    sudo mount -t nfs patrick-laptop:/media/Backup /media/Backup
    sudo mount -t nfs patrick-laptop:/media/Externa /media/Externa

    Note that I'm mounting the network shares in the same place on the client as they're physically located on the server. This isn't actually necessary, but it's convenient, because I don't have to remember a set of mappings later on.

  6. Once that's successful, I want to check that I am, in fact, able to browse the server's shares from the client. Basically, I just want to make sure that the files are visible and readable from the paths at which I expect them to be, i.e. that the files I expect to see in the locations /media/Backup and /media/Externa — that is, the files on the external hard drives that I'm sharing — are actually there. There are a number of ways to do this; maybe the simplest way would be to point my file browser (I use PCManFM) at both of those locations and see whether the files are actually there.

    Alternately, I could do this in a terminal, checking to see that the output of the ls command lists the files that I expect to be there:

    cd /media/Backup
    ls -la
    cd /media/Externa
    ls -la

  7. Great! Everything's working so far! But I don't want to have to mount the network shares manually every time I reboot, so I'm going to modify the /etc/fstab file on the client so that it mounts them automatically on every boot (well ... to be more specific, every time mount -a is called; booting is one of those times). As always, I make a backup of the config file I'm about to modify (sudo cp /etc/fstab /etc/fstab.old in a terminal) in case I screw it up, then gksudo pluma /etc/fstab to open up the file.

    Here's my /etc/fstab file after I've modified it. (You'll note that I have several other Linux distributions installed, though I basically only ever run Crunchbang.) Once again, I've bolded the lines I've added.

    # /etc/fstab: static file system information.
    #
    # Use 'blkid' to print the universally unique identifier for a
    # device; this may be used with UUID= as a more robust way to name devices
    # that works even if disks are added and removed. See fstab(5).
    #
    # <file system> <mount point>   <type>  <options>       <dump>  <pass>
    # / was on /dev/sda7 during installation
    UUID=159f51c0-dc13-4f62-83bc-d9f9938cb3fe /          ext4  errors=remount-ro 0 1
    # /home was on /dev/sda3 during installation
    UUID=c4bdfaa0-d2b0-4c06-8d29-474e9cf7bf19 /home      ext4  defaults          0 2
    # swap was on /dev/sda2 during installation
    UUID=c7aa48cc-e372-4775-bcdf-d7941b4a180e none       swap  sw                0 0
    # Mint was on /dev/sda1 during installation
    UUID=190e74c2-444e-4159-b2fd-2d7330c7af83 /mnt/mint  ext4  defaults          0 2
    # Bodhi was on /dev/sda6 during installation
    UUID=c8064cf8-e633-4fe2-90aa-ffbdda30c23f /mnt/bodhi ext4  defaults          0 2
    # tmp was on /dev/sda5 during installation
    UUID=22f11a68-bfc2-4196-8b0f-c2601a0296d1 /tmp       ext4  defaults          0 2
    /dev/sr0 /media/cdrom0 udf,iso9660 user,noauto 0 0

    # network shares from the little computer used as a file server
    patrick-laptop:/media/Externa /media/Externa nfs defaults,nofail,_netdev 0 0
    patrick-laptop:/media/Backup  /media/Backup  nfs defaults,nofail,_netdev 0 0

    The manpage for fstab (here, or man fstab in a terminal) has examples of using the /etc/fstab file to automatically mount network drives, and you should look at it if you need more information. Still, some general notes on the added lines:

    • The file system field in the file should be the name of the server, a colon, and then the full path to the shared directory (beginning with a slash) as the server sees it.
    • The next field is the location on the client where the network share will be mounted. After that comes the filesystem type: nfs for the Network File System, of course.
    • The default mount options are the next field. nofail means "don't hold up booting if the network share's not available," which it won't be if I take the client elsewhere (as I frequently do). _netdev means "don't try to mount this filesystem until the network is up"; the network interface isn't yet available at the point early in the boot process when mount -a is first invoked, so this option keeps the boot from failing on these lines.
    • I use zero for both the dump and pass fields of the entries, because these control how often and in what order the mount command will run filesystem checks on the shares. But I don't want the client trying to run filesystem checks on the shares; it doesn't have access to them at the block-device level, anyway, so there's no point. If maintenance needs to be done on those drives, the server should be doing it, not the client. Setting these values to zero prevents the client from trying to run checks on mount.

    Once I've saved and closed the file, I want to check that everything's working, so I do this in a terminal:

    sudo umount /media/Backup
    sudo umount /media/Externa
    sudo mount -a

    The first two lines unmount the network shares from the client. The third line should mount them again. To check this, I'll do the same thing I did in the previous step, and, as always, if it doesn't work, I'll go back and check my earlier work. (There's also a quicker check; see the sketch just after this list.)
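
Here's the quicker check promised in step 7. mount, given only a filesystem type, lists the mounted filesystems of that type (if your shares show up as type nfs4, use -t nfs4 instead), and df confirms that the two mount points report the server's disk sizes rather than the client's:

mount -t nfs
df -h /media/Backup /media/Externa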

So the real test as to whether it works or not is whether the files are usefully accessible. My litmus test here involves two things. The first is: can I play movies that are part of the server's network share on the client? (It turns out that I can, without any noticeable lag.) The second test is: can I run my nightly backup script? After all, if the network share is mounted in the same place as the external drive that corresponds to the network share's previous location, then the backup script should work as-is. I keep my backup script in ~/.scripts/, so typing ~/.scripts/home-backup.sh (or, since I've added that directory to my $PATH, just home-backup.sh) should work. And it does: I ran my backup script manually, and the backup happened almost as normal.

Almost is a key word here: the backup runs much more slowly than normal. This is to be expected, I guess: transferring files across the network to a drive that's connected to another computer is slower than transferring them directly to a drive attached to the computer running the script. Normally, it takes several minutes to run my backup script; running it across the network took more than an hour and a half. Most of that time was spent transferring a single 3.2 GB file, which is used to store my archived emails in Thunderbird (yes, I am, in fact, an information packrat). The simple fact is that this is just too slow.

However, there's an easy way to speed it up. My backup script (which, after all, is short and simple) uses rsync to do the real work. rsync works perfectly well as a way of syncing and copying files on the local filesystem, but it really shines as a way of copying files across a network connection. The problem that I'm currently having is that rsync isn't taking advantage of those capabilities right now, because the way that I'm invoking it treats the network shares as if they were local folders. After all, that's what NFS does: it makes network shares available as if they were local folders. So rsync doesn't even realize that it's copying files over a network connection, and therefore isn't employing any of its network-copying optimizations.

So I need to invoke rsync from my script in a way that gets it to employ network-transfer optimizations. More specifically, I want it to transfer only the parts of files that have changed: after all, though the file containing my Thunderbird archive changes every time I move messages into it, these are small changes to a very large file. Why transfer the whole thing again? Only the changes really need to be transferred (and then re-applied to the existing file on the other end). Transferring only the changed bits of the file means that most of the file won't need to be transferred at all. This is the idea behind rsync's so-called delta-transfer algorithm.

For this to work, rsync needs to be installed on both machines; during a transfer, each side computes checksums for individual chunks of files that have changed, and only the chunks whose checksums differ get transferred and applied to the file in the backup. But how do I get the rsync running on the network client, my primary laptop, talking to an rsync running on the file server? There are two ways: running rsync in daemon mode on the file server, or using SSH to start up an rsync on the file server when I need it. It turns out the second one is easier to set up and meets my needs perfectly.

You can read more about rsync on its manpage (here, or man rsync in a terminal).

Setting up SSH

I'll be using SSH, the secure shell, to start rsync on the file server from the client's script. SSH is pretty easy to use; it's just a way of logging into another computer on the network, authenticating with public-key cryptography. (Well, SSH can do other things, too, but that's what I'm concerned with right now.) A side benefit of setting up SSH is that it also allows me to log into the file server from a terminal on my primary laptop, so I can get at any information that happens to be on there, or administer the system remotely, without having to be in the same room as it. (This has come in handy more than once while writing up this description.)

In this section of the description, I'm largely following this howto on the Crunchbang forums.

On the server

  1. sudo aptitude install openssh-client openssh-server will make sure everything is installed that's necessary.

On the client

  1. Again, sudo aptitude install openssh-client openssh-server will make sure everything is installed that's necessary.
  2. Next I want to make sure that I can log into the remote system with SSH:

    patrick@linicious:~$ ssh patrick-laptop

    should prompt me for my password (on the remote machine, i.e. the server), then log me into the remote machine. You'll notice that the bash prompt that Crunchbang sets up by default will change to reflect the fact that I'm now in a terminal that's connected to a different computer:

    patrick@linicious:~$ ssh patrick-laptop
    Last login: Thu Jul 24 03:53:28 2014 from linicious
    -bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
    -bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
    -bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
    patrick@patrick-laptop:~$

    There are a few stray locale warnings, but they don't seem to cause me any trouble.

    I'm going to poke around on the remote machine and verify to my own satisfaction that I'm actually logged into it:

    patrick@patrick-laptop:~$ cd ~
    patrick@patrick-laptop:~$ ls -la
    total 328
    drwxr-xr-x 35 patrick patrick   4096 Jul 16 10:58 .
    drwxr-xr-x  3 root    root      4096 Dec 14  2012 ..
    drwx------  2 patrick patrick   4096 Jul 16 10:58 .TrueCrypt
    -rw-------  1 patrick patrick     59 Jul  9 02:41 .Xauthority
    -rw-r--r--  1 patrick patrick   1723 Jul  3 08:03 .Xresources
    drwx------  2 patrick patrick   4096 Jul  7 23:55 .aptitude
    -rw-------  1 patrick patrick   4473 Jul 24 04:06 .bash_history
    -rw-r--r--  1 patrick patrick    220 Jul  3 08:03 .bash_logout
    -rw-r--r--  1 patrick patrick   3392 Jul  3 08:03 .bashrc
    drwxr-xr-x  4 patrick patrick   4096 Jul 10 13:22 .bluefish
    drwxr-xr-x 11 patrick patrick   4096 Jul 10 13:21 .cache
    drwxr-xr-x 30 patrick patrick   4096 Jul 10 13:21 .config
    -rw-r--r--  1 patrick patrick   1814 Jul  3 08:03 .conkyrc
    drwx------  3 patrick patrick   4096 Jul  3 14:38 .dbus
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 14:38 .fontconfig
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 08:03 .fonts
    lrwxrwxrwx  1 patrick patrick     43 Jul  4 19:48 .fonts.conf -> /home/patrick/.config/fontconfig/fonts.conf
    drwx------  3 patrick patrick   4096 Jul  9 02:41 .gconf
    -rw-r-----  1 patrick patrick      0 Jul 10 22:51 .gksu.lock
    -rw-r--r--  1 patrick patrick    107 Jul 16 10:56 .gmrun_history
    -rw-r--r--  1 patrick patrick    404 Jul  3 08:03 .gmrunrc
    drwx------  4 patrick patrick   4096 Jul  3 14:38 .gnome2
    drwx------  2 patrick patrick   4096 Jul  3 14:38 .gnome2_private
    drwx------  2 patrick patrick   4096 Jul  6 04:53 .gphoto
    drwxr-xr-x  2 patrick patrick   4096 Jul  9 02:41 .gstreamer-0.10
    -rw-------  1 patrick patrick    278 Jul 12 21:57 .gtk-bookmarks
    -rw-r--r--  1 patrick patrick    557 Jul  3 08:03 .gtkrc-2.0
    -rw-r--r--  1 patrick patrick      1 Jul  3 08:03 .gtkrc-2.0.mine
    drwx------  2 patrick patrick   4096 Jul  3 14:38 .gvfs
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 08:03 .icons
    drwx------  3 patrick patrick   4096 Jul  7 23:54 .kde
    drwxr-xr-x  3 patrick patrick   4096 Jul  3 08:03 .local
    drwx------  4 patrick patrick   4096 Jul  7 11:22 .mozilla
    -rw-r--r--  1 patrick patrick     98 Jul  3 08:03 .pbuilderrc
    -rw-r--r--  1 patrick patrick    808 Jul  3 08:03 .profile
    drwx------  2 patrick patrick   4096 Jul  3 14:38 .pulse
    -rw-------  1 patrick patrick    256 Jul  3 14:38 .pulse-cookie
    drwxr-xr-x  2 patrick patrick   4096 Jul 10 23:30 .ssh
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 08:03 .themes
    drwx------  4 patrick patrick   4096 Jul  7 11:03 .thumbnails
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 08:03 .xchat2
    -rw-r--r--  1 patrick patrick   7470 Jul  3 08:03 .xscreensaver
    -rw-------  1 patrick patrick 115196 Jul 16 10:58 .xsession-errors
    drwxr-xr-x  2 patrick patrick   4096 Jul 11 00:15 Desktop
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 08:03 backup
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 08:03 bin
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 08:03 documents
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 08:03 downloads
    drwxr-xr-x  3 patrick patrick   4096 Jul  3 08:03 images
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 08:03 music
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 08:03 templates
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 08:03 tmp
    drwxr-xr-x  2 patrick patrick   4096 Jul  3 08:03 videos

    If I were on my primary computer, there would be many many more files and directories in my home directory. Note that I really can do just about anything on the remote computer through the SSH connection that I could do in a terminal on the machine itself:

    patrick@patrick-laptop:~$ sudo aptitude update; sudo aptitude upgrade
    [sudo] password for patrick:
    Get: 1 http://security.debian.org jessie/updates InRelease [73.5 kB]
    Get: 2 http://http.us.debian.org jessie InRelease [155 kB]
    Ign http://packages.crunchbang.org waldorf InRelease
    Get: 3 http://security.debian.org wheezy/updates InRelease [103 kB]
    [etc.]
    Fetched 831 kB in 1min 34s (8825 B/s)
    Current status: 7 updates [+7], 356 new [+1].
    The following packages will be upgraded:
      libnet-ssleay-perl libperl5.18 libsnmp-base libsnmp30 perl perl-base perl-modules
    7 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    Need to get 11.0 MB of archives. After unpacking 455 kB will be freed.
    Do you want to continue? [Y/n/?]

    It turns out that updating the system remotely via SSH in this way is slower than walking over to the server and typing the same command, but I can do it if I want to.

    When I'm done poking around, logout or exit will end my remote connection:

    patrick@patrick-laptop:~$ exit
    logout
    Connection to patrick-laptop closed.
    patrick@linicious:~$

    Note again that the hostname — the section after the at-sign in my bash prompt — changes to reflect the fact that I'm not connected to the remote machine any more.

  3. Being able to log into the remote machine manually is helpful — I can do roughly everything through the terminal that I could do on a terminal if I were sitting in front of the server — but I don't want to have to type in my password every time I connect, since I want to be able to connect from a script, after all.

    The solution here is to set up OpenSSH so that I can connect with a keypair: that is, I need to generate a keypair, keep the private key on the client (my primary laptop), and put the public key in a known location on the file server. This is pretty easy to do. (The copying in step 4 below can also be collapsed into a single command; see the sketch after this list.)

    patrick@linicious:~$ ssh-keygen

    This will ask me for some information; every time I'm asked for something, I'm just going to hit Enter to accept the default value. In particular, I'm not going to add a passphrase to the key, because my backup script can't supply it. The whole point of setting up key-based authentication here is to avoid being prompted for a password.

    The output should look something like this:

    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/patrick/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/patrick/.ssh/id_rsa.
    Your public key has been saved in /home/patrick/.ssh/id_rsa.pub.
    The key fingerprint is:
    e2:1b:58:b7:d9:ab:eb:a9:91:ea:c9:d5:ee:a8:51:1a patrick@linicious
    The key's randomart image is:
    +--[ RSA 2048]----+
    |                 |
    |                 |
    |                 |
    |                 |
    |        E + S    |
    |       B = +     |
    |      + * + .    |
    |     . = * . .   |
    |      .*.++Bo.   |
    +-----------------+

  4. Now that I've got a public/private keypair, I need to put the public key on the file server, in a place where I can find it easily. I'm just going to put it in the home directory for now:

    scp ~/.ssh/id_rsa.pub patrick@patrick-laptop:~/

    puts it there. Then I'm going to connect to the remote machine again:

    patrick@linicious:~$ ssh patrick-laptop

    I'm now in my home directory on the file server, which is where I left the public key I generated on the client. Now what I need to do is add that public key to the list of keys that the server's SSH daemon trusts:

    patrick@patrick-laptop:~$ cat ~/id_rsa.pub >> ~/.ssh/authorized_keys

    Note that I'm appending the key to the end of the list (that's what the >> does) rather than overwriting it.

    Since I no longer need the copy of the key I dumped into the home folder on the file server, I can remove it:

    patrick@patrick-laptop:~$ rm ~/id_rsa.pub

  5. Now I need to modify the server's SSH configuration options, which involves editing a text file. I could do this by walking over to the server and opening up the relevant file for editing, which would have the advantage of letting me use a graphical text editor, but I'm lazy and don't mind text editing in a console application. Besides, I'm already logged in to the file server from my primary laptop.

    However, if I'd rather use a graphical text editor like pluma, I could walk over to the server and, instead of doing what I'm about to describe in the next paragraph, make a backup with sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.old and then open up the sshd_config file with gksudo pluma /etc/ssh/sshd_config &.

    First, sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.old to make a backup of the existing configuration in case I screw it up, then sudo nano /etc/ssh/sshd_config to open it up in a console editor (nano is plenty easy to use for a simple task).

    I want to find and modify two lines in the file to let me log in with my RSA-based keypair. They may not be found together, but each line should look like this:

    • RSAAuthentication yes
    • PubkeyAuthentication yes

    And then I'm going to turn off the automatic request for a password on login:

    • ChallengeResponseAuthentication no
    • PasswordAuthentication no
    • UsePAM no

    Once again, those lines will probably not be found all together in the sshd_config file, so I'll have to hunt for each one separately.

  6. Now I need to reload the SSH config file on the server:

    sudo /etc/init.d/ssh reload

    If it doesn't work, make sure that the server software is installed (sudo aptitude install openssh-server). If it still doesn't work, log out from your connection to the file server from the client (logout), walk over to the server and reboot it.
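
As promised back in step 3: the generate-copy-append dance in step 4 can be collapsed into a single command, because OpenSSH ships a helper called ssh-copy-id that copies the public key to the remote machine and appends it to authorized_keys in one step. (It relies on password authentication still being enabled, so it has to be run before the sshd_config changes in step 5.)

ssh-copy-id patrick@patrick-laptop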

Speeding up the backup script

Being able to log into the file server remotely may come in handy, but it's not my original point in setting up an SSH login: I want to be able to use SSH as a transport for rsync. Now that I've set up SSH in the first place, this is easy.

Several things about my setup have changed since I talked about my backup script on Reddit. In particular, I'm no longer storing some of the information on external hard drives that I was when I wrote that, so life is much simpler.

In a nutshell, here's my backup script:

#!/bin/bash

if [ -d /media/Backup/Backups ]; then
    rsync -azvhP --delete --log-file=/home/patrick/Desktop/home_backup.log \
        --exclude ".thumbnails" --exclude ".gvfs" --exclude ".cache" \
        -e ssh /home/patrick patrick@patrick-laptop:/media/Backup/Backups
fi

I save this text in the file ~/.scripts/home-backup.sh on the client, my primary laptop. Of course, the file holding the script has to be executable (chmod +x ~/.scripts/home-backup.sh).

There are only two lines that do anything. First, the script checks that the network share (patrick-laptop:/media/Backup) is actually mounted on my primary laptop, the client, at /media/Backup. (Remember that I'm using the same path for the drive that I used to use when the hard drive was mounted locally as the path at which I set it up on the file server, then just mount the network share corresponding to that drive in its old location. This makes things easier for me to remember, but perhaps a little harder to describe.) It does this by checking for the presence of a known subfolder, /media/Backup/Backups, which is, not surprisingly, the folder containing my backups.

The second line calls rsync to make the backup, manually excluding a few directories (the multiple --exclude options) that I want to go out of my way to avoid backing up, from my home directory (/home/patrick) and all its subdirectories to the folder /media/Backup/Backups shared by the computer patrick-laptop on my network. I'm keeping a log file on my desktop (--log-file=/home/patrick/Desktop/home_backup.log) so I can tell what happened if I need to (and, if I delete the log file after I read it, its reappearance is a rough heuristic for whether the script has run and done something useful since I last deleted it).

Since rsync is connecting to another computer, it needs to be told how to make the connection. Using the -e ssh option tells rsync to use SSH to make the connection. Since I've already set up SSH to log in automatically, this works silently already.
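
Incidentally, if you're curious to see the delta-transfer algorithm from the previous section actually at work, rsync's --stats option makes it visible. This is a quick experiment rather than part of my setup: bigfile.mbox is a stand-in for any large file (like the Thunderbird archive I mentioned), and I'm assuming a scratch directory like /media/Backup/tmp exists on the server:

# first run: the whole file crosses the network
rsync -azP --stats -e ssh ~/bigfile.mbox patrick@patrick-laptop:/media/Backup/tmp/
# append a little data to ~/bigfile.mbox, then run the same command again;
# on the second run, the "Literal data" figure in the --stats output (bytes
# actually sent) should be a tiny fraction of "Total file size"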

Here's what the other options mean:

-a
-a stands for archive; it's a synonym for -rlptgoD, which means recurse through the directory structure [the -r option], copying symbolic links as symbolic links instead of copying the files or directories they point to [-l], preserving permissions [-p], preserving modification times [-t] and user [-o] and group [-g] ownership, and preserving device and special files [-D].
-z
Compress file data during the transfer. This is comparatively CPU-intensive, but makes the transfer faster.
-h
Output more human-readable numbers. This is useful if I run the script manually: because the script also passes -v (--verbose), it tells me about each change it's making, and I can easily tell whether it's done by glancing at the terminal I'm using to run the script. I'd prefer to see those numbers expressed in larger units than bytes, and -h does this.
-P
A short way to type --partial --progress. I want --partial because if the transfer is interrupted for some reason, this means that the partially transferred file doesn't have to be transferred again from the beginning the next time the script is run (which may be efficient if large files are partially transferred). --progress is, again, useful if I'm running the script from a terminal.
--delete
Delete files from the backup that are no longer on my primary hard drive. I do this both because I want to keep the backup's size from growing unmanageably, and because I see a backup as primarily being useful for restoring (a portion of) my entire home folder after a catastrophic failure, not as a way of recovering accidentally deleted files. (Because --delete is potentially destructive, see the dry-run sketch just after this list.)
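
As promised in the note on --delete: because that flag makes mistakes expensive, it's worth knowing that rsync's -n (--dry-run) flag reports everything rsync would transfer and delete without actually touching anything. Whenever I change the script, I test the new invocation this way first; this is just the script's own rsync line with -n added and the log file left off:

rsync -aznvhP --delete --exclude ".thumbnails" --exclude ".gvfs" --exclude ".cache" \
    -e ssh /home/patrick patrick@patrick-laptop:/media/Backup/Backups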

So now I want to test that my backup script is running correctly: I can type ~/.scripts/home-backup.sh in a terminal on the client. If I want to save myself some typing, I can add the line export PATH=$PATH:$HOME/.scripts to ~/.bashrc. Running the script produces output that looks like:

patrick@linicious:~$ home-backup.sh
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
sending incremental file list
patrick/
patrick/.Xauthority
             54 100%    0.00kB/s    0:00:00 (xfr#1, ir-chk=1143/1147)
patrick/.bash_history
         30.13K 100%    2.87MB/s    0:00:00 (xfr#2, ir-chk=1141/1147)
patrick/.gmrun_history
          1.98K 100%  193.36kB/s    0:00:00 (xfr#3, ir-chk=1128/1147)
patrick/.xsession-errors
          4.64M 100%   13.84MB/s    0:00:00 (xfr#4, ir-chk=1109/1147)
patrick/.TrueCrypt/
patrick/.TrueCrypt/Configuration.xml
          1.71K 100%    2.63kB/s    0:00:00 (xfr#5, ir-chk=1003/1364)
patrick/.bluefish/
patrick/.bluefish/autosave_journal.6705
            124 100%    0.18kB/s    0:00:00 (xfr#6, ir-chk=1015/1390)
patrick/.bluefish/autosave/
patrick/.bluefish/autosave/crunchbang-networking.html
         61.27K 100%   90.52kB/s    0:00:00 (xfr#7, ir-chk=1009/1390)
deleting patrick/.config/google-chrome/Default/Extension Rules/MANIFEST-000335
deleting patrick/.config/google-chrome/Default/Extension Rules/000337.log
deleting patrick/.config/google-chrome/Default/Extension State/MANIFEST-000480
deleting patrick/.config/google-chrome/Default/Extension State/000484.log
deleting patrick/.config/google-chrome/Default/Extensions/Temp/
deleting patrick/.config/google-chrome/Default/Extensions/bepbmhgboaologfdajaanbcjmnhjmhfn/0.1.1.5019_0/images/speech.png
deleting patrick/.config/google-chrome/Default/Extensions/bepbmhgboaologfdajaanbcjmnhjmhfn/0.1.1.5019_0/images/off.png
deleting patrick/.config/google-chrome/Default/Extensions/bepbmhgboaologfdajaanbcjmnhjmhfn/0.1.1.5019_0/images/mic-normal.gif

And so forth. This is what I should be seeing: rsync is listing the files that are being deleted or transferred. What I want to do now is make the script run automatically every night while I'm sleeping. To do that, I modify my crontab file on the client. The crontab file should never be modified directly; it should always be edited by typing crontab -e in a terminal. If this is your first time editing crontab since you've installed the operating system, it will ask which (terminal-based) text editor you want to use to edit the file. I myself prefer nano, which is the easiest.

Here's my crontab file, with the relevant line highlighted:

# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h  dom mon dow   command
0 5 * * * /home/patrick/.scripts/home-backup.sh
30 5 * * * gpg --refresh-keys --keyserver hkp://pool.sks-keyservers.net

All this line means is at zero minutes after 5 a.m. on any [*] day of the month, in any [*] month, on any [*] day of the week, run the script at /home/patrick/.scripts/home-backup.sh. I run it at 5 a.m. because that's the time of day when I'm most likely to be sleeping (I'm an irregular sleeper but a real night person), and I specify the entire path without using ~ as an abbreviation, because cron-executed scripts don't necessarily have a full set of environment variables. Once I've typed in the relevant line, I can quit nano by hitting Ctrl+X and answering yes when nano asks me if I want to save the changes I've made.
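
Besides watching for the log file to reappear, I can confirm that cron actually fired the job by checking the system log on the client; on Debian-based systems, cron's activity lands in /var/log/syslog:

grep CRON /var/log/syslog

A line mentioning /home/patrick/.scripts/home-backup.sh stamped a little after 5:00 a.m. means cron ran the job, though the log file on the desktop remains the better indicator that the backup itself succeeded.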

And it works. Every morning when I wake up, I can tell that it's run because a file called home_backup.log has appeared in my desktop folder. (Note that, in a default Crunchbang installation, the contents of ~/Desktop aren't displayed on the actual desktop, so I need to browse to ~/Desktop in a file browser to check on this.)