Dobrica Pavlinušić's random unstructured stuff
LXC: Revision 33
These are notes for my LXC workshop, in a state of flux

{toc: }

^ Cgroups

* http://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
* http://www.webupd8.org/2010/11/alternative-to-200-lines-kernel-patch.html
* http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch01.html

* Resource allocation using cgroups http://blip.tv/file/4773168

^^ Systemd

* systemd, beyond init http://www.youtube.com/watch?v=TyMLi8QF6sw

launchd alternative, similar to inetd, but (mostly) for unix sockets

^ LXC

Virtual Servers and Checkpoint/Restart in Mainstream Linux http://lxc.sourceforge.net/doc/sigops/appcr.pdf

* you don't have hardware virtualization (netbooks, anyone?)
** http://lxc.teegra.net/
** http://en.gentoo-wiki.com/wiki/LXC
** http://sysadvent.blogspot.com/2010/12/day-1-linux-containers-lxc.html
* Amazon EC2
** http://www.phenona.com/blog/using-lxc-linux-containers-in-amazon-ec2/
* Running X
** http://blog.ikibiki.org/2011/04/05/Running_X_from_LXC/
* LVM integration
** http://s3hh.wordpress.com/2011/03/30/one-more-lxc-clone-update/

^ LXC inside KVM for testing

* http://sysadmin-cookbook.rot13.org/#lxc_kvm

^^ setup KVM LXC test machine

* http://www.landley.net/lxc/01-setup.html

^^^ Step 1: Create a root filesystem for the KVM system.

* http://sysadmin-cookbook.rot13.org/#01_create_kvm_root_sh 3m12.426s

^^^ Step 2: Build a kernel for KVM, with container support.

* http://sysadmin-cookbook.rot13.org/#02_build_kvm_kernel_sh 8m22.248s

^^^ Step 3: Boot the result under QEMU or KVM

* http://sysadmin-cookbook.rot13.org/#03_boot_kvm_sh

^^^ Step 4: ssh into the KVM instance.

.pre
ssh root@127.0.0.1 -p 9876
.pre

^^^ Step 5: Set up a simple busybox-based container under the KVM system.

.pre
wget http://busybox.net/downloads/binaries/latest/busybox-i686 -O busybox
chmod +x busybox
echo -e "lxc.utsname = container\nlxc.network.type = empty" > container.conf
PATH=$(pwd):$PATH lxc-create -f container.conf -t busybox -n container
.pre

^^^ Step 6: Launch the container

.pre
lxc-start -n container

# console is broken, so start another

lxc-console -n container
.pre

^^^ Step 7: Stop the container, and the KVM system.

.pre
lxc-stop -n container

# remove container
lxc-destroy -n container
.pre

^^ Setup networking

* http://www.landley.net/lxc/02-networking.html

^^^ Step 1: Add a TAP interface to the Laptop.

.pre
# FIXME change username
tunctl -u dpavlin -t kvm0
ifconfig kvm0 192.168.254.1 netmask 255.255.255.0
echo 1 > /proc/sys/net/ipv4/ip_forward

# iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
.pre

^^^ Step 2: Launch KVM with two ethernet interfaces.

.pre
kvm -m 1024 -kernel ../01-setup/linux-2.6.*/arch/x86/boot/bzImage -no-reboot \
-hda ../01-setup/squeeze.ext3 -append "root=/dev/hda rw panic=1" \
-net nic,model=e1000 -net user -redir tcp:9876::22 \
-net nic,model=e1000 -net tap,ifname=kvm0,script=no
.pre

^^^ Step 3: Set up a new container in the KVM system.

.pre
root@kvm:~# cat > busybox.conf << EOF
lxc.utsname = busybox
lxc.network.type = phys
lxc.network.flags = up
lxc.network.link = eth1
#lxc.network.name = eth0
EOF

PATH=$(pwd):$PATH lxc-create -f busybox.conf -t busybox -n busybox
lxc-start -n busybox
.pre

.pre
root@kvm:~# lxc-console -n busybox

ifconfig eth1 192.168.254.2 netmask 255.255.255.0
route add default gw 192.168.254.1
.pre

^^^ Step 4: Fun with routing.

On the host, bring up a loopback alias in the KVM network:

.pre
dpavlin@x200:~$ sudo ifconfig lo:1 10.0.2.200 netmask 255.255.255.0
.pre

The busybox container can reach it, while the KVM guest can't!

^ network hints

^^ prepare host machine

^^ macvlan

* aliased IP at the ethernet level
* new device with own mac *with* offloading
* can't communicate with other containers or host (< 2.6.33)

lxc.network.type=macvlan
lxc.network.link=eth0
lxc.network.flags=up

ip link add link <phys> name <vif> address <mac address> type macvlan mode (bridge|vepa|private)

ip link add link bond200 name bond200:0 address 00:aa:bb:cc:dd:ee type macvlan mode bridge

ip -d link show bond200:0

lxc.network.type = macvlan
lxc.network.macvlan.mode = bridge
lxc.network.flags = up
lxc.network.link = bond200
lxc.network.name = eth7
lxc.network.mtu = 1500
lxc.network.ipv4 = 192.168.90.11/24
lxc.network.hwaddr = 4a:49:43:49:79:0B

^^ veth

.pre
sudo apt-get install bridge-utils dnsmasq


# setup hints

sysctl -w net.ipv4.ip_forward=1

iptables -t nat -A POSTROUTING -o wlan0 -j SNAT --to-source=WLAN0_IP

# or for nat
iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
.pre

lxc.network.type=veth
lxc.network.link=br0
lxc.network.flags=up

# name inside container

lxc.network.name = eth0.12
lxc.network.mtu = 1500
lxc.network.ipv4 = 10.60.0.12/23
lxc.network.hwaddr = AC:DE:48:00:00:0C

# name host interface for bridge

lxc.network.veth.pair = veth12
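
With several containers on one bridge, the values above differ only by a number. A minimal sketch that prints such a fragment. `veth_conf` is a hypothetical helper, not an LXC tool, and deriving the interface name, address, MAC and pair name from one number is an assumption read off the example above (12 gives eth0.12, 10.60.0.12, ...:0C and veth12).

```shell
#!/bin/sh
# veth_conf N: print a veth config fragment for container number N.
# Hypothetical helper; the numbering scheme is an assumption, not an LXC rule.
veth_conf() {
    n=$1
    cat <<EOF
lxc.network.type = veth
lxc.network.link = br0
lxc.network.flags = up
lxc.network.name = eth0.$n
lxc.network.mtu = 1500
lxc.network.ipv4 = 10.60.0.$n/23
lxc.network.hwaddr = $(printf 'AC:DE:48:00:00:%02X' "$n")
lxc.network.veth.pair = veth$n
EOF
}

veth_conf 12
```

The output can be appended to the file passed to lxc-create with -f.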

^^^ host-only bridge

.pre
$ cat /etc/network/interfaces

auto br0
iface br0 inet static
bridge_ports dummy0
bridge_maxwait 0
address 172.16.16.1
netmask 255.255.255.0
.pre

^^^ pseudo-random mac?

http://en.wikipedia.org/wiki/Mac_address

x2:xx:xx:xx:xx:xx
x6:xx:xx:xx:xx:xx
xA:xx:xx:xx:xx:xx
xE:xx:xx:xx:xx:xx

IP=192.168.0.50 # container nic IP
HA=`printf "02:00:%02x:%02x:%02x:%02x" ${IP//./ }` # generate a MAC from the IP (bash only, zero-padded octets)
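
The same idea without bash-only substitution, as a sketch. `mac_from_ip` is a hypothetical name. The 02 prefix keeps the address in the locally administered range listed above.

```shell
#!/bin/sh
# mac_from_ip A.B.C.D: print a locally administered MAC derived from an IPv4 address.
# Hypothetical helper, not part of LXC.
mac_from_ip() {
    # Split the dotted quad on "." using IFS instead of ${var//./ }.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    # %02x zero-pads every octet, so 0 becomes 00.
    printf '02:00:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
}

mac_from_ip 192.168.0.50   # prints 02:00:c0:a8:00:32
```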

^^^ slow network?

/usr/sbin/ethtool -K br0 sg off
/usr/sbin/ethtool -K br0 tso off

^^ phys

kernel > 2.6.35

lxc.network.type=phys
lxc.network.link=eth1
lxc.network.name=eth1

^ limit container resources

^^ cpuset.cpus

echo 1 > /cgroup/<name>/cpuset.cpus # 2nd CPU!

echo 1,2,3 > /cgroup/<name>/cpuset.cpus

echo 0-7 > /cgroup/<name>/cpuset.cpus

lxc-execute -n foo -s lxc.cgroup.cpuset.cpus="1,2,3" myforks
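
cpuset.cpus accepts single CPUs, comma-separated lists and ranges, counted from 0. A small sketch for checking which CPUs a given string covers before writing it. `cpuset_expand` is a hypothetical helper.

```shell
#!/bin/sh
# cpuset_expand LIST: expand a cpuset.cpus string such as "0-2,5" into "0 1 2 5".
# Hypothetical helper for checking a value before writing it to the cgroup.
cpuset_expand() {
    out=
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        first=${part%-*}   # "0-2" -> 0, "5" -> 5
        last=${part#*-}    # "0-2" -> 2, "5" -> 5
        i=$first
        while [ "$i" -le "$last" ]; do
            out="$out${out:+ }$i"
            i=$((i + 1))
        done
    done
    IFS=$old_ifs
    echo "$out"
}

cpuset_expand 1        # prints 1 (the 2nd CPU)
cpuset_expand 0-7      # prints 0 1 2 3 4 5 6 7
```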

^^ cpu.shares

lxc-execute -n foo -s lxc.cgroup.cpu.shares=1 /bin/bash

lxc-execute -n bar /bin/bash

while true; do echo -n . ; done

lxc-cgroup -n foo cpu.shares=1024

^^ memory

lxc.cgroup.memory.limit_in_bytes = 256M
lxc.cgroup.memory.memsw.limit_in_bytes = 1G
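
The K, M and G suffixes are powers of 1024. When reading the limit back from the cgroup it is shown in plain bytes, so a conversion sketch helps with comparing. `to_bytes` is a hypothetical helper.

```shell
#!/bin/sh
# to_bytes SIZE: convert a size with an optional K, M or G suffix to bytes.
# Hypothetical helper; suffixes are treated as powers of 1024.
to_bytes() {
    num=${1%[KkMmGg]}   # strip one trailing suffix letter, if any
    case $1 in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

to_bytes 256M   # prints 268435456
to_bytes 1G     # prints 1073741824
```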

^^ disk

^^^ usage

standard Linux tools:

* LVM
* quota (it can be bypassed if the container runs with CAP_SYS_ADMIN and/or CAP_SYS_RESOURCE capabilities)

^^^ limit disk bandwidth using cgroup blkio

* http://www.mjmwired.net/kernel/Documentation/cgroups/blkio-controller.txt

Required kernel configuration

CONFIG_BLK_CGROUP=y
CONFIG_CFQ_GROUP_IOSCHED=y
CONFIG_BLK_DEV_THROTTLING=y

create containers for the test

.pre
#!/bin/sh -xe

lxc-ls | xargs -i sh -c "lxc-stop -n {} ; lxc-destroy -n {}"

echo "lxc.network.type = empty" > blkio.conf

PATH=$(pwd):$PATH lxc-create -f blkio.conf -t busybox -n disk1
PATH=$(pwd):$PATH lxc-create -f blkio.conf -t busybox -n disk2
PATH=$(pwd):$PATH lxc-create -f blkio.conf -t busybox -n disk3

lxc-ls | xargs -i dd if=/dev/zero of=/var/lib/lxc/{}/rootfs/tmp/zero bs=1M count=100

cat > /tmp/speed.sh <<EOF
#!/bin/sh
while true ; do
sync ; echo 3 > /proc/sys/vm/drop_caches
dd if=/tmp/zero of=/dev/null 2>&1
done | grep MB
EOF

chmod +x /tmp/speed.sh

lxc-ls | xargs -i cp /tmp/speed.sh /var/lib/lxc/{}/rootfs/tmp/speed.sh

lxc-ls | xargs -i lxc-start -d -n {}
.pre

log in to each container and run the test

.pre
root@kvm:~# lxc-console -n disk1

Type <Ctrl+a q> to exit the console

disk1 login: root
~ # /tmp/speed.sh
104857600 bytes (100.0MB) copied, 0.958453 seconds, 104.3MB/s
.pre
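
To compare containers it is enough to keep the rate at the end of each dd summary line. A sketch, assuming the busybox dd output format shown above. `dd_rate` is a hypothetical helper.

```shell
#!/bin/sh
# dd_rate: read dd summary lines on stdin and print only the transfer rate.
# Hypothetical helper; assumes the "..., N seconds, RATE" layout shown above.
dd_rate() {
    awk -F', ' '/copied/ { print $NF }'
}

echo '104857600 bytes (100.0MB) copied, 0.958453 seconds, 104.3MB/s' | dd_rate
# prints 104.3MB/s
```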

Test limits (be careful not to enter 1000, you might oops the kernel!)

.pre
root@kvm:~# echo 100 > /mnt/cgroup/disk1/blkio.weight
root@kvm:~# echo 200 > /mnt/cgroup/disk2/blkio.weight
root@kvm:~# echo 500 > /mnt/cgroup/disk3/blkio.weight

root@kvm:~# cat /mnt/cgroup/disk?/blkio.weight
100
200
500
.pre
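
Given the warning about 1000, a guard around the write might be worth having. A sketch only: `set_blkio_weight` is a hypothetical helper, and the accepted range of 100..999 is an assumption taken from the warning above, not from the kernel documentation.

```shell
#!/bin/sh
# set_blkio_weight CGROUP_DIR WEIGHT: write blkio.weight after a sanity check.
# Hypothetical helper; the 100..999 range is an assumption based on these notes.
set_blkio_weight() {
    cgroup_dir=$1
    weight=$2
    case $weight in
        ''|*[!0-9]*) echo "weight must be a number" >&2; return 1 ;;
    esac
    if [ "$weight" -lt 100 ] || [ "$weight" -ge 1000 ]; then
        echo "weight $weight is outside 100..999" >&2
        return 1
    fi
    echo "$weight" > "$cgroup_dir/blkio.weight"
}

# usage (as root, with the cgroup mounted as above):
# set_blkio_weight /mnt/cgroup/disk1 100
```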

Limit /dev/hda reads to 1 MB/s (1048576 bytes per second)

.pre
root@kvm:~# ls -al /dev/hda
brw-rw---- 1 root disk 3, 0 May 15 00:10 /dev/hda

root@kvm:~# echo "3:0 1048576" > /mnt/cgroup/disk1/blkio.throttle.read_bps_device
.pre

^^ network

* http://vger.kernel.org/netconf2009_slides/Network%20Control%20Group%20Whitepaper.odt

.pre
# mkdir -p /dev/cgroup
# mount -t cgroup net_cls -o net_cls /dev/cgroup
# mkdir /dev/cgroup/A
# mkdir /dev/cgroup/B

# cd /dev/cgroup
# echo 0x100001 > A/net_cls.classid # 10:1 (format is 0xAAAABBBB, major:minor in hex)
# echo 0x100002 > B/net_cls.classid # 10:2

# tc qdisc add dev eth0 root handle 10: htb

# tc class add dev eth0 parent 10: classid 10:1 htb rate 40mbit
# tc class add dev eth0 parent 10: classid 10:2 htb rate 30mbit

# tc filter add dev eth0 parent 10: protocol ip prio 10 handle 1: cgroup

.pre
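
The kernel reads net_cls.classid as 0xAAAABBBB, where AAAA is the major and BBBB the minor number of the tc handle, both in hex. A sketch for building the value from a handle. `classid_for` is a hypothetical helper.

```shell
#!/bin/sh
# classid_for MAJOR:MINOR: print the net_cls.classid value for a tc handle.
# Hypothetical helper; both parts of the handle are hex, as tc writes them.
classid_for() {
    major=${1%%:*}
    minor=${1##*:}
    # Pad each half to four hex digits: 10:1 -> 0x00100001
    printf '0x%04x%04x\n' "0x$major" "0x$minor"
}

classid_for 10:1   # prints 0x00100001
classid_for 10:2   # prints 0x00100002
```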

^ LXC commands

^^ lxc-create

/usr/lib/lxc/templates/

export MIRROR=http://192.168.1.20:3142/ftp.debian.org
export SUITE=lenny

cat > /tmp/lenny.conf
lxc.network.type=veth
lxc.network.link=br0
lxc.network.flags=up

# <ctrl+d>

t61p:~# lxc-create -n lenny -t debian -f /tmp/lenny.conf

^^ lxc-execute

application container (shares filesystem!)

lxc-ssh

lxc-execute -n foo -s lxc.utsname=foo /bin/bash
lxc-execute -n bar -s lxc.utsname=bar /bin/bash

^^ lxc-attach

Needs kernel patch

lxc-attach -n n0 -- /usr/sbin/tcpdump -i eth0

^ devices

http://lwn.net/Articles/273208/

lxc.cgroup.devices.allow = <type> <major>:<minor> <perms>

<type> : a (all), b (block), c (char)
<major> : major number
<minor> : minor number (wildcard is accepted)
<perms> : r (read), w (write), m (mknod)
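
The type and numbers can be read off an existing device node instead of looking them up by hand. A sketch, assuming GNU stat (its %t and %T print major and minor in hex). `device_allow_line` is a hypothetical helper.

```shell
#!/bin/sh
# device_allow_line NODE [PERMS]: print a devices.allow config line for a device node.
# Hypothetical helper; assumes GNU stat, where %t/%T give major/minor in hex.
device_allow_line() {
    dev=$1
    perms=${2:-rwm}
    if [ -b "$dev" ]; then
        type=b
    elif [ -c "$dev" ]; then
        type=c
    else
        echo "not a device node: $dev" >&2
        return 1
    fi
    major=$(printf '%d' "0x$(stat -c '%t' "$dev")")
    minor=$(printf '%d' "0x$(stat -c '%T' "$dev")")
    echo "lxc.cgroup.devices.allow = $type $major:$minor $perms"
}

device_allow_line /dev/null rw   # on Linux: lxc.cgroup.devices.allow = c 1:3 rw
```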

^ monitoring

^^ htop

htop - cgroups > r192

t61p:/tmp# apt-get source htop
t61p:/tmp# apt-get build-dep htop
t61p:/tmp# dpkg-source -x htop_0.9-2.dsc
t61p:/tmp# cd htop-0.9/

t61p:/tmp/htop-0.9# DEB_BUILD_OPTIONS="--enable-cgroup" fakeroot debian/rules binary

# sigh, doesn't work; patch debian/rules to add --enable-cgroup

t61p:/tmp/htop-0.9# fakeroot debian/rules binary
t61p:/tmp/htop-0.9# dpkg -i ../htop_0.9-2_i386.deb

^^ procfs

http://lxc.sourceforge.net/download/procfs/procfs.tar.gz (fuse, defunct)
http://www.tinola.com/lxc/ (somewhat newer)

^^ debugging

lxc-start --logpriority=TRACE -o /tmp/trace.log --name my_container

(the log must go to a file with -o!)

^ kernel patches

http://lxc.sourceforge.net/patches/linux/

^ Are we in container?

on host:

dpavlin@stage:~$ cat /proc/$$/cgroup
1:net_cls,freezer,devices,cpuacct,cpu,ns,cpuset:/

inside container:

dpavlin@narada:~$ cat /proc/$$/cgroup
1:net_cls,freezer,devices,cpuacct,cpu,ns,cpuset:/narada
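
The difference can be turned into a check. A heuristic sketch that only matches the layout shown above: any cgroup path other than / is taken to mean a container. That is an assumption, and hosts that put processes into sub-cgroups would be misdetected. `in_container` is a hypothetical helper.

```shell
#!/bin/sh
# in_container [FILE]: succeed if any cgroup path in FILE is not "/".
# Hypothetical helper; heuristic only, see the caveat above.
in_container() {
    cgroup_file=${1:-/proc/self/cgroup}
    [ -r "$cgroup_file" ] || return 1
    # Each line is "id:subsystems:path".
    while IFS=: read -r _id _subsys path; do
        [ "$path" != "/" ] && return 0
    done < "$cgroup_file"
    return 1
}

if in_container; then
    echo "looks like a container"
else
    echo "looks like the host"
fi
```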

^ 32-bit guest on 64-bit kernel

(lxc >= 0.7.3)

lxc.arch=x86

^ Container tweaks

^^ udev

echo udev hold | dpkg --set-selections

^^ nfs

The kernel doesn't have NFS namespaces yet, so use user-space NFS servers:

* http://unfs3.sourceforge.net
* http://sourceforge.net/apps/trac/nfs-ganesha

^^ chromium

* http://www.chromium.org/chromium-os/chromiumos-design-docs/system-hardening
* http://git.chromium.org/gitweb/?p=chromiumos/platform/minijail.git;a=summary

^^ pam

* http://pam-netns.sourceforge.net/

pam_netns allows setting up a private network namespace for every user
session (comparable with pam_namespace for filesystem namespaces). This
is especially useful on multiseat environments.

^^ X-server

* http://box.matto.nl/lxcxserver.html (Xnest example)

* https://launchpad.net/arkose - Arkose - Desktop Application Sandboxing (using aufs2)

^^ Virtual PCI network cards

* http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization/sect-Para-virtualized_Windows_Drivers_Guide-How_SR_IOV_Libvirt_Works.html

^^ don't delete files

.pre
dpkg-divert --rename /etc/init/theinitfile.conf
.pre