Sensible KVM Networking

7 May, 2009

I wanted a networking setup I could use on my MacBook Air and on my Hetzner server (although I haven't tested it on Hetzner yet). I also wanted to learn how QEMU/KVM really works without the libvirt abstractions.

It turns out that you can have a perfectly good KVM setup without libvirt at all. In fact, if you follow these instructions you can disable it completely if you like (although you can also leave it running, because the naming convention used here doesn't conflict with it):

$ sudo /etc/init.d/libvirt-bin stop

Now look in the logs for the last time you loaded the virtual machine:

$ sudo tail -f /var/log/libvirt/qemu/grp-dev.log
/usr/bin/kvm -S -M pc -m 500 -smp 2 -name grp-dev -monitor pty -boot c -drive file=/home/james/vms/grp/ubuntu-kvm/disk0.qcow2,if=ide,index=0,boot=on -drive file=/home/james/vms/grp/ubuntu-kvm/disk1.qcow2,if=ide,index=1 -net nic,macaddr=52:54:00:39:81:49,vlan=0 -net tap,fd=11,script=,vlan=0,ifname=vnet1 -serial none -parallel none -usb -vnc 127.0.0.1:0

You'll use this command exactly as it is, except for the networking configuration. First, create a bridge:

$ brctl show
bridge name    bridge id        STP enabled    interfaces
$ sudo brctl addbr br0
$ brctl show
bridge name    bridge id            STP enabled    interfaces
br0            8000.000000000000    no
$ sudo ifconfig br0 192.168.100.254 netmask 255.255.255.0 up
$ ifconfig
br0       Link encap:Ethernet  HWaddr a6:d0:64:89:09:0d
          inet addr:192.168.100.254  Bcast:192.168.100.255  Mask:255.255.255.0
          inet6 addr: fe80::a4d0:64ff:fe89:90d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:3422 (3.4 KB)

eth1      Link encap:Ethernet  HWaddr 00:1f:5b:84:23:e2
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21f:5bff:fe84:23e2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:620635 errors:0 dropped:0 overruns:0 frame:209457
          TX packets:380237 errors:13 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:757417431 (757.4 MB)  TX bytes:47895905 (47.8 MB)
          Interrupt:16

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:100 (100.0 B)  TX bytes:100 (100.0 B)

I've got eth1 here rather than eth0 because the wireless card is actually the physical interface.

You now have a bridge with its own IP address, 192.168.100.254. You could have chosen a different IP if you preferred.
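If you'd rather use the newer iproute2 tools than brctl and ifconfig, the bridge steps above can be sketched roughly like this. This is only a sketch: it assumes the ip command is installed, and the run helper, the DRY_RUN variable and the setup_bridge function are names I've made up for illustration.

```shell
# Print each command instead of running it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a bridge and give it an address, like the brctl/ifconfig steps above.
# Arguments (both optional): bridge name, address in CIDR notation.
setup_bridge() {
    bridge="${1:-br0}"
    addr="${2:-192.168.100.254/24}"
    run ip link add name "$bridge" type bridge
    run ip addr add "$addr" dev "$bridge"
    run ip link set dev "$bridge" up
}
```

Calling DRY_RUN=1 setup_bridge just prints the three ip commands so you can check them first. To really create the bridge, call setup_bridge from a root shell.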

QUESTION: Is it possible to choose an IP on the same network as eth1 to have the guests bridged directly onto the host network? ANSWER: No! The guest can ping the bridge and the host, but not the internet.

$ tunctl -b -u root -t qtap0
Failed to open '/dev/net/tun' : Permission denied

Oops, you need to load the tun module and run as root:

$ sudo modprobe tun
$ sudo tunctl -b -u root -t qtap0
qtap0

Now add the interface to the bridge:

$ sudo brctl addif br0 qtap0
$ sudo ifconfig qtap0 up 0.0.0.0 promisc
$ ifconfig
br0       Link encap:Ethernet  HWaddr 6a:09:73:aa:5e:ea
          inet addr:192.168.100.254  Bcast:192.168.100.255  Mask:255.255.255.0
          inet6 addr: fe80::a4d0:64ff:fe89:90d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:3741 (3.7 KB)

eth1      Link encap:Ethernet  HWaddr 00:1f:5b:84:23:e2
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21f:5bff:fe84:23e2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:694903 errors:0 dropped:0 overruns:0 frame:232976
          TX packets:430352 errors:13 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:835017751 (835.0 MB)  TX bytes:55654200 (55.6 MB)
          Interrupt:16

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:100 (100.0 B)  TX bytes:100 (100.0 B)

qtap0     Link encap:Ethernet  HWaddr 6a:09:73:aa:5e:ea
          inet6 addr: fe80::6809:73ff:feaa:5eea/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:4 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
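If you plan to run several guests, each one needs its own tap device on the bridge. Here's a rough sketch of the tap steps above as a reusable function. It assumes a reasonably recent iproute2 that provides ip tuntap (otherwise stick with tunctl and brctl), and the run helper, DRY_RUN and add_tap are hypothetical names of my own.

```shell
# Print each command instead of running it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a tap device owned by root and attach it to the bridge.
# Arguments (both optional): tap name, bridge name.
add_tap() {
    tap="${1:-qtap0}"
    bridge="${2:-br0}"
    run modprobe tun
    run ip tuntap add dev "$tap" mode tap user root
    run ip link set dev "$tap" master "$bridge"
    run ip link set dev "$tap" promisc on
    run ip link set dev "$tap" up
}
```

From a root shell, something like for n in 0 1 2; do add_tap "qtap$n"; done would then give you qtap0, qtap1 and qtap2, all attached to br0.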

You can now ping the bridge:

$ ping 192.168.100.254
PING 192.168.100.254 (192.168.100.254) 56(84) bytes of data.
64 bytes from 192.168.100.254: icmp_seq=1 ttl=64 time=0.046 ms
64 bytes from 192.168.100.254: icmp_seq=2 ttl=64 time=0.072 ms
64 bytes from 192.168.100.254: icmp_seq=3 ttl=64 time=0.065 ms
^C
--- 192.168.100.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.046/0.061/0.072/0.011 ms

The VM won't have an IP address until you start it, though.

Now you can boot your VM, but replace the two -net options with the following:

-net nic,macaddr=52:54:00:39:81:49 -net tap,ifname=qtap0,script=no,downscript=no

The two "-net" switches have different meanings, but both are needed:

nic,macaddr=52:54:00:39:81:49

Specifies network interface options on the guest side. Each guest should have a different MAC address.

tap,ifname=qtap0,script=no

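Specifies network interface options on the host side. The "script=no" means do not use the scripts in /etc/kvm/kvm.ifup, which is what we want because we've configured the networking manually. The "downscript=no" in the full option does the same for the script that would otherwise run when the guest shuts down.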
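Since every guest needs its own MAC address, one way to avoid clashes is to derive the address from the guest number. This is just a sketch: it reuses the 52:54:00 prefix from the command above, and the numbering scheme and the guest_mac name are my own invention rather than anything KVM requires.

```shell
# Print a MAC address for guest number N (0 to 65535).
# The 52:54:00 prefix matches the address used in this article;
# the last two bytes encode the guest number.
guest_mac() {
    n="$1"
    printf '52:54:00:00:%02x:%02x\n' $((n / 256)) $((n % 256))
}
```

For example, guest_mac 1 prints 52:54:00:00:00:01, which you could pass as -net nic,macaddr=$(guest_mac 1).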

You'll also need to remove the -S option, which starts the VM with its CPU paused, and remove -vnc 127.0.0.1:0 if you want the guest to open in a window rather than having to connect via VNC.

Here's the final command for my 2-CPU, 512 MB system:

/usr/bin/kvm -M pc -m 500 -smp 2 -name grp-dev -monitor pty -boot c -drive file=/home/james/vms/grp/ubuntu-kvm/disk0.qcow2,if=ide,index=0,boot=on -drive file=/home/james/vms/grp/ubuntu-kvm/disk1.qcow2,if=ide,index=1 -net nic,macaddr=52:54:00:39:81:49 -net tap,ifname=qtap0,script=no,downscript=no -serial none -parallel none -usb
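If you have several libvirt-generated commands to convert, the same edits can be applied mechanically with sed. This is a sketch that only handles command lines shaped like the one in my log. The rewrite_kvm_cmd name is hypothetical, and it assumes a sed that supports -E (GNU and BSD sed both do).

```shell
# Read a libvirt-generated kvm command on stdin and print the manual version:
#   - drop -S (start paused) and the -vnc option
#   - strip vlan=... from the -net nic option, keeping the MAC address
#   - replace libvirt's -net tap option with our own tap device
# Argument (optional): tap device name.
rewrite_kvm_cmd() {
    tap="${1:-qtap0}"
    sed -E \
        -e 's/ -S / /' \
        -e 's/ -vnc [^ ]+//' \
        -e 's/(-net nic,macaddr=[0-9a-fA-F:]+)[^ ]*/\1/' \
        -e "s/-net tap,[^ ]*/-net tap,ifname=${tap},script=no,downscript=no/"
}
```

You would pipe the logged line through it, for example echo "$logged_command" | rewrite_kvm_cmd qtap0, and check the result by eye before running it.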

I also used these commands on the host to enable IP forwarding and to NAT the guests' traffic out through eth1:

echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
sudo iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE

On a normal server you'd replace eth1 with eth0. I'm using eth1 because my outbound interface is actually my wireless card!
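Rather than hard-coding eth0 or eth1, you could look up whichever interface carries the default route. Here's a minimal sketch that parses the output of ip route show default. The default_iface name is made up, and it assumes the usual "default via ... dev NAME ..." line format.

```shell
# Read routing table lines on stdin and print the interface name
# of the first default route.
default_iface() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}
```

Usage would be something like iface="$(ip route show default | default_iface)" followed by sudo iptables -t nat -A POSTROUTING -o "$iface" -j MASQUERADE.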

Each guest then needs a configuration like this in /etc/network/interfaces. The first guest gets the address 192.168.100.1, the second gets 192.168.100.2, and so on:

auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
 address 192.168.100.1
 network 192.168.100.0
 netmask 255.255.255.0
 up route add 192.168.100.254 dev eth0
 up route add default gw 192.168.100.254
 down route del default gw 192.168.100.254
 down route del 192.168.100.254 dev eth0
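Because only the address line changes between guests, the file can be generated from the guest number. This is a sketch of that idea. The guest_interfaces name is hypothetical, and the 1 to 253 limit is my assumption, based on 192.168.100.254 being taken by the bridge.

```shell
# Print an /etc/network/interfaces file for guest number N (1 to 253).
guest_interfaces() {
    n="$1"
    case "$n" in
        ''|*[!0-9]*) echo "guest number must be an integer" >&2; return 1 ;;
    esac
    if [ "$n" -lt 1 ] || [ "$n" -gt 253 ]; then
        echo "guest number must be between 1 and 253" >&2
        return 1
    fi
    cat <<EOF
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
 address 192.168.100.${n}
 network 192.168.100.0
 netmask 255.255.255.0
 up route add 192.168.100.254 dev eth0
 up route add default gw 192.168.100.254
 down route del default gw 192.168.100.254
 down route del 192.168.100.254 dev eth0
EOF
}
```

For the second guest you'd run guest_interfaces 2 and copy the output into /etc/network/interfaces on that guest.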

Then you need to run this on each guest if you performed the configuration after booting:

$ sudo ifdown eth0
$ sudo ifup eth0

Networking should now work. You can ping the guests from the host, and from the guests you can ping any address on the internet, the host, or any computer on the bridge, including other guests.

Finally, you need to edit /etc/resolv.conf on the guest so it can resolve domain names. Change it so that it uses the same nameserver address as /etc/resolv.conf on the host. In my case:

nameserver 192.168.1.1
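To save copying the address by hand, you could pull the nameserver lines out of the host's file. Here's a small sketch. The host_nameservers name is made up, and it deliberately skips 127.x.x.x entries, since a loopback address on the host would point at the guest itself once copied there.

```shell
# Print the nameserver lines from a resolv.conf file, skipping loopback
# addresses, which would not be reachable from a guest.
# Argument (optional): path to the file.
host_nameservers() {
    file="${1:-/etc/resolv.conf}"
    awk '$1 == "nameserver" && $2 !~ /^127\./ { print "nameserver", $2 }' "$file"
}
```

Running host_nameservers on the host prints lines you can paste straight into /etc/resolv.conf on each guest.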

At this point you could actually start libvirt again since we've used a different naming convention so the networks won't conflict. There's really no point in doing so though.

$ sudo /etc/init.d/libvirt-bin start
 * Starting libvirt management daemon libvirtd
$ brctl show
bridge name    bridge id            STP enabled    interfaces
br0            8000.6a0973aa5eea    no             qtap0
vnet0          8000.000000000000    yes

WARNING: Just don't load the same image via libvirt and on the command line at the same time! I'm not sure what that would do!

Making it Permanent

If you want to make it permanent you can use the kvm-manager.sh script from http://www.linux-kvm.org/page/Simple_shell_script_to_manage_your_virtual_machine_with_bridged_networking as a basis. You'll need to install socat though:

sudo apt-get install socat

The script shuts down virtual machines with a qemu quit command, which is equivalent to pulling the power out and clearly not very friendly. My approach is to change the way the virtual machine handles ctrl+alt+del so that it powers off the guest rather than restarting it. On the guest, edit this:

$ sudo vi /etc/event.d/control-alt-delete

Change it so that the command executed is /sbin/poweroff. Then change the kvm-manager.sh script so that it sends the ctrl+alt+del keys, waits 10 seconds (hopefully long enough for the machine to shut down) and then pulls the power if the guest is still running. Here's the relevant section:

stop_vm() {
        echo "stop virtual machine"

        get_vm_pid_to "stop vm"
        # check if monitor file there
        if [ ! -e ${FILE_MONITOR} ]
        then
                echo "${FILE_MONITOR} not found, can not stop vm"
                exit 1
        fi
        # if the process is still running
        # send ctrl-alt-delete to its monitor, and give the guest time to power off
        if [ -d /proc/${VM_PID} ]
        then
                send_cmd "sendkey ctrl-alt-delete"
                sleep 10
        fi
        # check if the process is still running
        if [ -d /proc/${VM_PID} ]
        then
                sleep 1
        fi
        # if the process is still running
        # send command quit to its monitor, and wait
        if [ -d /proc/${VM_PID} ]
        then
                send_cmd "quit"
                sleep 1
        fi
        if [ ! -d /proc/${VM_PID} ]
        then
                # yes, done
                rm ${FILE_PID}
                rm ${FILE_MONITOR}
                echo "vm stopped successfully"
        else
                # no, something wrong there...
                echo "failed to stop vm"
                exit 1
        fi
}
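A fixed sleep 10 either wastes time or is too short. One alternative is to poll until the process disappears or a timeout passes. This is a sketch, not part of kvm-manager.sh: the wait_for_exit name is hypothetical, and like the script above it relies on the Linux /proc filesystem.

```shell
# Wait until process PID has exited, checking once per second.
# Arguments: PID, timeout in seconds (optional, default 30).
# Returns 0 if the process exited, 1 if it was still running at the timeout.
wait_for_exit() {
    pid="$1"
    timeout="${2:-30}"
    waited=0
    while [ -d "/proc/${pid}" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In stop_vm you could then replace the sleep 10 with wait_for_exit "${VM_PID}" 60 and only fall back to send_cmd "quit" when it returns 1.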

An alternative approach is to use advanced power management options to make the machine power down, but this seemed more complicated than the above approach.

You could also do some work to make this script work as an init script to load all your VMs.

Copyright James Gardner 1996-2020 All Rights Reserved.