Dobrica Pavlinušić's random unstructured stuff
LXC: Revision 23

These are notes for my LXC workshop, in a state of flux



Cgroups

Systemd

launchd alternative, similar to inetd, but for unix sockets (mostly)

LXC

Virtual Servers and Checkpoint/Restart in Mainstream Linux http://lxc.sourceforge.net/doc/sigops/appcr.pdf

LXC inside KVM for testing

set up a KVM LXC test machine

Step 1: Create a root filesystem for the KVM system.

Step 2: Build a kernel for KVM, with container support.

Step 3: Boot the result under QEMU or KVM

Step 4: ssh into the KVM instance.

ssh root@127.0.0.1 -p 9876

Step 5: Set up a simple busybox-based container under the KVM system.

wget http://busybox.net/downloads/binaries/latest/busybox-i686 -O busybox
chmod +x busybox
echo -e "lxc.utsname = container\nlxc.network.type = empty" > container.conf
PATH=$(pwd):$PATH lxc-create -f container.conf -t busybox -n container
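
The two-line config in step 5 relies on echo -e, which not every /bin/sh supports (dash prints the -e literally). A minimal sketch of the same file written with a here-document; write_container_conf is a hypothetical helper name, not an LXC tool:

```shell
# write_container_conf NAME FILE
# Writes the same two settings as the echo -e line, portably.
write_container_conf() {
    name=$1
    out=$2
    cat > "$out" <<EOF
lxc.utsname = $name
lxc.network.type = empty
EOF
}
```

Usage: write_container_conf container container.conf, then run lxc-create as above.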

Step 6: Launch the container

lxc-start -n container

# console is broken, so start another

lxc-console -n container

network

prepare host machine

br0
dnsmasq

# sysctl -w net.ipv4.ip_forward=1
# iptables -t nat -A POSTROUTING -o wlan0 -j SNAT --to-source=WLAN0_IP

or

# iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE

macvlan

  • aliased IP at eth level
  • new device with own mac with offloading
  • can't communicate with other containers or host (< 2.6.33)

lxc.network.type=macvlan
lxc.network.link=eth0
lxc.network.flags=up

ip link add link <phys> name <vif> address <mac address> type macvlan mode (bridge|vepa|private)

ip link add link bond200 name bond200:0 address 00:aa:bb:cc:dd:ee type macvlan mode bridge

ip -d link show bond200:0

lxc.network.type = macvlan
lxc.network.macvlan.mode = bridge
lxc.network.flags = up
lxc.network.link = bond200
lxc.network.name = eth7
lxc.network.mtu = 1500
lxc.network.ipv4 = 192.168.90.11/24
lxc.network.hwaddr = 4a:49:43:49:79:0B

veth

lxc.network.type=veth
lxc.network.link=br0
lxc.network.flags=up

pseudo-random mac?

http://en.wikipedia.org/wiki/Mac_address

    x2:xx:xx:xx:xx:xx
    x6:xx:xx:xx:xx:xx
    xA:xx:xx:xx:xx:xx
    xE:xx:xx:xx:xx:xx

IP=192.168.0.50 # container nic IP
HA=$(printf "02:00:%02x:%02x:%02x:%02x" ${IP//./ }) # generate a MAC from the IP (bash)
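
The patterns above are the locally administered MAC ranges (second hex digit 2, 6, A or E). A minimal sketch of two ways to get such an address in POSIX sh; ip_to_mac and random_mac are hypothetical helper names, and the fixed 02 prefix is just one of the four valid choices:

```shell
# ip_to_mac 192.168.0.50 prints a MAC built from the four IPv4 octets.
ip_to_mac() (
    set -f          # no globbing while the argument is split
    IFS=.
    set -- $1       # the dotted quad becomes $1..$4
    # %02x zero-pads each octet to two hex digits.
    printf '02:00:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
)

# random_mac prints 02 followed by five random bytes from /dev/urandom.
random_mac() {
    set -- $(od -An -N5 -tx1 /dev/urandom)
    printf '02:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

ip_to_mac 192.168.0.50 prints 02:00:c0:a8:00:32, which can go into lxc.network.hwaddr. Deriving the MAC from the IP keeps it stable across container restarts.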

slow network?

/usr/sbin/ethtool -K br0 sg off
/usr/sbin/ethtool -K br0 tso off

host-only bridge

auto br0
iface br0 inet static
        bridge_ports dummy0
        bridge_maxwait 0
        address 172.16.16.1
        netmask 255.255.255.0


phys

kernel > 2.6.35

lxc.network.type=phys
lxc.network.link=eth1
lxc.network.name=eth1

limit container resources

cpuset.cpus

echo 1 > /cgroup/<name>/cpuset.cpus # 2nd CPU!

echo 1,2,3 > /cgroup/<name>/cpuset.cpus

echo 0-7 > /cgroup/<name>/cpuset.cpus

lxc-execute -n foo -s lxc.cgroup.cpuset.cpus="1,2,3" myforks
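
cpuset.cpus takes single CPUs, comma lists and ranges, counted from 0. A small sketch that counts how many CPUs a given list names, handy for checking a value before writing it; cpuset_count is a hypothetical helper, not an LXC command:

```shell
# cpuset_count LIST
# Counts CPUs in a cpuset.cpus style list such as "1", "1,2,3" or "0-7".
cpuset_count() {
    echo "$1" | tr ',' '\n' | awk -F- '
        NF == 1 { n += 1 }              # single CPU number
        NF == 2 { n += $2 - $1 + 1 }    # inclusive range first-last
        END     { print n + 0 }'
}
```

cpuset_count 0-7 prints 8, cpuset_count 1,2,3 prints 3.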

cpu.shares

lxc-execute -n foo -s lxc.cgroup.cpu.shares=1 /bin/bash

lxc-execute -n bar /bin/bash

while true; do echo -n . ; done

lxc-cgroup -n foo cpu.shares 1024
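
cpu.shares is a relative weight (default 1024) and only matters when the containers compete for the same CPU. A rough sketch of the arithmetic, assuming all listed groups are busy on one CPU; cpu_share_pct is a hypothetical helper:

```shell
# cpu_share_pct MINE OTHER...
# Prints the percentage of CPU time a group with MINE shares gets
# when competing with groups holding the OTHER share values.
cpu_share_pct() {
    mine=$1
    shift
    LC_ALL=C awk -v mine="$mine" 'BEGIN {
        total = mine
        for (i = 1; i < ARGC; i++) total += ARGV[i]
        printf "%.2f\n", 100 * mine / total
    }' "$@"
}
```

cpu_share_pct 1 1024 prints 0.10, so foo with cpu.shares=1 is nearly starved while bar runs the busy loop.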

memory

lxc.cgroup.memory.limit_in_bytes = 256M
lxc.cgroup.memory.memsw.limit_in_bytes = 1G
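
The memory controller accepts K, M and G suffixes but reports plain bytes when read back. A small sketch for comparing the two; to_bytes is a hypothetical helper and handles only these three suffixes:

```shell
# to_bytes VALUE
# Converts 256M, 1G, 512K (or a plain number) to bytes.
to_bytes() {
    num=${1%[KkMmGg]}       # strip one trailing suffix letter, if any
    case $1 in
        *[Kk]) echo $(( $num * 1024 )) ;;
        *[Mm]) echo $(( $num * 1024 * 1024 )) ;;
        *[Gg]) echo $(( $num * 1024 * 1024 * 1024 )) ;;
        *)     echo "$num" ;;
    esac
}
```

to_bytes 256M prints 268435456, to_bytes 1G prints 1073741824.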

disk

LVM?
quota (it can be bypassed if the container runs with CAP_SYS_ADMIN and/or CAP_SYS_RESOURCE capabilities)

network

# mkdir -p /dev/cgroup
# mount -t cgroup net_cls -o net_cls /dev/cgroup
# mkdir /dev/cgroup/A
# mkdir /dev/cgroup/B

# cd /dev/cgroup
# echo 0x100001 > A/net_cls.classid   # 10:1
# echo 0x100002 > B/net_cls.classid   # 10:2

# tc qdisc add dev eth0 root handle 10: htb

# tc class add dev eth0 parent 10: classid 10:1 htb rate 40mbit
# tc class add dev eth0 parent 10: classid 10:2 htb rate 30mbit

# tc filter add dev eth0 parent 10: protocol ip prio 10 handle 1: cgroup
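
net_cls.classid has the form 0xAAAABBBB, where AAAA is the major and BBBB the minor number of the tc class handle (both hexadecimal). A minimal sketch of the conversion; tc_classid is a hypothetical helper:

```shell
# tc_classid MAJOR:MINOR
# Converts a tc class handle such as 10:1 into a net_cls.classid value.
tc_classid() {
    major=${1%%:*}
    minor=${1##*:}
    # Major goes into the upper 16 bits, minor into the lower 16 bits.
    printf '0x%x\n' $(( (0x$major << 16) | 0x$minor ))
}
```

tc_classid 10:1 prints 0x100001.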

LXC commands

lxc-create

/usr/lib/lxc/templates/

export MIRROR=http://192.168.1.20:3142/ftp.debian.org
export SUITE=lenny

cat > /tmp/lenny.conf
lxc.network.type=veth
lxc.network.link=br0
lxc.network.flags=up

# press <ctrl+d> to end input

t61p:~# lxc-create -n lenny -t debian -f /tmp/lenny.conf

lxc-execute

application container (shares filesystem!)

lxc-ssh

lxc-execute -n foo -s lxc.utsname=foo /bin/bash
lxc-execute -n bar -s lxc.utsname=bar /bin/bash

lxc-attach

Needs kernel patch

lxc-attach -n n0 -- /usr/sbin/tcpdump -i eth0

devices

http://lwn.net/Articles/273208/

lxc.cgroup.devices.allow = <type> <major>:<minor> <perms>

<type>  : a (all), b (block), c (char)
<major> : major number (wildcard * is accepted)
<minor> : minor number (wildcard * is accepted)
<perms> : r (read), w (write), m (mknod)
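
Looking up major and minor numbers by hand gets tedious. A sketch that builds the config line from an existing device node on the host; devices_allow_line is a hypothetical helper and assumes a stat that supports -c (GNU coreutils or BusyBox):

```shell
# devices_allow_line DEVICE [PERMS]
# Prints an lxc.cgroup.devices.allow line for DEVICE (default perms: rwm).
devices_allow_line() {
    dev=$1
    perms=${2:-rwm}
    if [ -b "$dev" ]; then
        kind=b
    elif [ -c "$dev" ]; then
        kind=c
    else
        echo "$dev: not a device node" >&2
        return 1
    fi
    # stat prints major (%t) and minor (%T) in hexadecimal.
    major=$(stat -L -c '%t' "$dev")
    minor=$(stat -L -c '%T' "$dev")
    printf 'lxc.cgroup.devices.allow = %s %d:%d %s\n' \
        "$kind" "$(( 0x$major ))" "$(( 0x$minor ))" "$perms"
}
```

Usage: devices_allow_line /dev/null rwm >> /tmp/lenny.conf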

monitoring

htop

htop: cgroups support needs revision > r192

t61p:/tmp# apt-get source htop
t61p:/tmp# apt-get build-dep htop
t61p:/tmp# dpkg-source -x htop_0.9-2.dsc
t61p:/tmp# cd htop-0.9/

t61p:/tmp/htop-0.9# DEB_BUILD_OPTIONS="--enable-cgroup" fakeroot debian/rules binary

# sigh, that doesn't work; patch debian/rules to add --enable-cgroup

t61p:/tmp/htop-0.9# fakeroot debian/rules binary
t61p:/tmp/htop-0.9# dpkg -i ../htop_0.9-2_i386.deb

procfs

http://lxc.sourceforge.net/download/procfs/procfs.tar.gz (fuse, defunct)
http://www.tinola.com/lxc/ (somewhat newer)

debugging

lxc-start --logpriority=TRACE -o /tmp/trace.log --name my_container

(the log must be redirected to a file with -o!)

kernel patches

http://lxc.sourceforge.net/patches/linux/

Are we in a container?

on host:

dpavlin@stage:~$ cat /proc/$$/cgroup
1:net_cls,freezer,devices,cpuacct,cpu,ns,cpuset:/

inside container:

dpavlin@narada:~$ cat /proc/$$/cgroup
1:net_cls,freezer,devices,cpuacct,cpu,ns,cpuset:/narada
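
The difference is the last field: / on the host, /<name> inside a container. A sketch that turns this into a test; in_container is a hypothetical helper, and it assumes host processes sit in the root cgroup as shown above (hosts that place processes in their own cgroups, e.g. under systemd, would give false positives):

```shell
# in_container [FILE]
# Succeeds if any line of FILE (default /proc/self/cgroup), in the
# hierarchy:subsystems:path format, has a path other than "/".
in_container() {
    awk -F: '$3 != "/" { found = 1 } END { exit !found }' \
        "${1:-/proc/self/cgroup}"
}
```

Usage: in_container && echo "inside a container"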

32-bit guest on 64-bit kernel

(lxc >= 0.7.3)

lxc.arch=x86

Container tweaks

udev

echo udev hold | dpkg --set-selections

nfs

the kernel doesn't have NFS namespaces yet, use user-space NFS servers:

chromium

pam

pam_netns sets up a private network namespace for every user
session (comparable to pam_namespace for filesystem namespaces). This
is especially useful in multiseat environments.

X-server

Virtual PCI network cards

don't delete files

dpkg-divert --rename /etc/init/theinitfile.conf