Ceph Placement Groups

A quick introduction to placement groups in Ceph

  • A Ceph pool is made up of PGs. No PG can be shared across pools.
  • A PG is made up of objects. No object can be shared across PGs.
  • PGs are distributed across OSDs, either as whole copies (replication) or in parts (erasure coding).
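
You can check how many PGs a pool carries with the ceph osd pool get command; ephemeral-vms is one of the pools on my cluster (listed further down):

root@node15:~# ceph osd pool get ephemeral-vms pg_num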

The CRUSH algorithm computes the placement group for each object, so technically the Ceph client itself decides where to put objects.
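
You can ask the cluster to compute this mapping for any object name (myobject below is just a made-up name, not an object that exists on my cluster):

root@node15:~# ceph osd map ephemeral-vms myobject

The output shows the PG the object hashes to and the up/acting OSD set chosen by CRUSH for it.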

I have a Ceph cluster running, so let's have a look at the PGs in it.

list pools

root@node15:~# ceph osd lspools
6 ephemeral-vms,7 cinder-volumes,

find all PGs in pool 6 (ephemeral-vms)

ceph pg dump | egrep -v '^(7\.)'

PG_STAT OBJECTS MISSING_ON_PRIMARY ………….
6.9f    323                 0  ………………

Note: The above is a snippet from “ceph pg dump”.

The first field is the PG ID, which consists of two values separated by a single dot (.):

  • the left-hand value is the pool ID (6 is the pool ID),
  • the right-hand value is the actual PG number (9f is the PG number).

Together, the pool ID and PG number form the PG ID, which is why a PG can't be shared across pools.
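
You can also map a PG ID straight to its OSDs:

root@node15:~# ceph pg map 6.9f

which prints the osdmap epoch along with the up and acting OSD sets for that PG.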

state of a pg

root@node15:~# ceph pg 6.9f query

{
    "state": "active+clean",
    "snap_trimq": "[]",
    "epoch": 6974,
    "up": [
        5,
        3,
        1
    ],
    "acting": [
        5,
        3,
        1
    ],
    "actingbackfill": [
        "1",
        "3",
        "5"
    ],
    "info": {
        "pgid": "6.9f",
        "last_update": "6952'880407",
        "last_complete": "6952'880407",
        "log_tail": "4347'877389",
        "last_user_version": 880407,
        "last_backfill": "MAX",
        "last_backfill_bitwise": 0,
        "purged_snaps": "[1~41]",
        "history": {
            "epoch_created": 130,
            "last_epoch_started": 6834,
            "last_epoch_clean": 6834,
            "last_epoch_split": 0,
            "last_epoch_marked_full": 0,
            "same_up_since": 6275,
            "same_interval_since": 6833,
            "same_primary_since": 273,
            "last_scrub": "5752'880250",
            "last_scrub_stamp": "2018-03-29 16:16:12.236973",
            "last_deep_scrub": "5752'880250",
            "last_deep_scrub_stamp": "2018-03-29 16:16:12.236973",
            "last_clean_scrub_stamp": "2018-03-29 16:16:12.236973"
        },
        ...


Let's have a look at my Ceph cluster's health.

root@node15:~# ceph -s
cluster 7c75f6e9-b858-4ac4-aa26-48ae1f33eda2
health HEALTH_WARN
34 pgs backfill_wait
1 pgs backfilling
35 pgs stuck unclean
recovery 2/423917 objects degraded (0.000%)
recovery 28282/423917 objects misplaced (6.672%)
pool cinder-volumes pg_num 300 > pgp_num 128
pool ephemeral-vms pg_num 300 > pgp_num 128

monmap e2: 3 mons at {node15=10.0.5.15:6789/0,node16=10.0.5.16:6789/0,node17=10.0.5.17:6789/0}

election epoch 858, quorum 0,1,2 node15,node16,node17

mgr active: node17 standbys: node16, node15
osdmap e6998: 7 osds: 7 up, 6 in; 35 remapped pgs
flags sortbitwise,require_jewel_osds,require_kraken_osds
pgmap v15320688: 600 pgs, 2 pools, 519 GB data, 133 kobjects
1546 GB used, 3999 GB / 5546 GB avail
2/423917 objects degraded (0.000%)
28282/423917 objects misplaced (6.672%)
564 active+clean
34 active+remapped+backfill_wait
1 active+clean+scrubbing+deep
1 active+remapped+backfilling

recovery io 120 MB/s, 30 objects/s
client io 9808 B/s rd, 201 kB/s wr, 7 op/s rd, 22 op/s wr


  • Active: Ceph will process requests to the placement group.
  • Clean: Ceph replicated all objects in the placement group the correct number of times.
  • Scrubbing: Ceph is checking the placement group metadata for inconsistencies.
  • Deep: Ceph is checking the placement group data against stored checksums.
  • Degraded: Ceph has not replicated some objects in the placement group the correct number of times.
  • Peering: The placement group is undergoing the peering process.
  • Backfilling: Ceph is scanning and synchronizing the entire contents of a placement group instead of inferring what needs to be synchronized from the logs of recent operations.
  • Backfill-wait: The placement group is waiting in line to start backfill.
  • Remapped: The placement group is temporarily mapped to a different set of OSDs from what CRUSH specified.
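
If you want to see exactly which PGs the "stuck unclean" count in the health output refers to, they can be listed directly:

root@node15:~# ceph pg dump_stuck unclean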


You can clearly see that Ceph is self-healing its PGs.
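
One warning that will not heal on its own is "pg_num 300 > pgp_num 128". Assuming you simply want pgp_num to catch up with pg_num, something like this should clear it for each pool:

root@node15:~# ceph osd pool set cinder-volumes pgp_num 300
root@node15:~# ceph osd pool set ephemeral-vms pgp_num 300

Raising pgp_num triggers some rebalancing, which is exactly the kind of backfill activity shown above.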


OpenStack-Ansible Getting Started

This blog post covers how to use the OpenStack-Ansible (OSAD) project to deploy OpenStack. OSAD lets you deploy a production-grade cloud on LXC containers. Before you start, you need to know how OSAD works and its basic architecture.

Basic things you need to know about OSAD:

OSAD deploys OpenStack components to the respective nodes using Ansible.


The hardware I used is as follows.

  • 2 servers as infrastructure nodes (4 cores, 4 GB RAM, 500 GB HDD, 2 NICs)
  • 2 servers as Swift storage nodes (12 cores, 64 GB RAM, 24 TB HDD, 2 NICs)
  • 4 servers as compute nodes (8 cores, 20 GB RAM, 500 GB HDD, 2 NICs)
  • 1 server as the deployment node
  • a VLAN-enabled switch

Steps to follow

  1. Clone the OSAD git repo and run the bootstrap script, which installs all the required Ansible roles.
# git clone -b 15.1.7 https://git.openstack.org/openstack/openstack-ansible \
  /opt/openstack-ansible

Change to /opt/openstack-ansible and run:

# scripts/bootstrap-ansible.sh
  2. Copy the configuration files to /etc/:

# cp -R etc/openstack_deploy /etc/

  3. Create a passwords file:
# cd /opt/openstack-ansible/scripts
# python pw-token-gen.py --file /etc/openstack_deploy/user_secrets.yml
  4. Write /etc/openstack_deploy/openstack_user_config.yml according to your needs. Here I will be deploying HA OpenStack with two of each infrastructure component, Swift storage, and Neutron with Open vSwitch DVR. My configuration looks like this:


cidr_networks:
  container: 10.0.4.0/22
  tunnel: 10.0.1.0/22

used_ips:
  - 10.0.4.1,10.0.5.20
  - 10.50.0.1,10.50.0.20
  - 10.0.1.1,10.0.1.20

global_overrides:
  internal_lb_vip_address: <internal_vip>
  external_lb_vip_address: <external_vip>
  mgmt_bridge: "br-mgmt"
  tunnel_bridge: "br-vxlan"

  provider_networks:
    - network:
        group_binds:
          - all_containers
          - hosts
        type: "raw"
        container_bridge: "br-mgmt"
        container_interface: "eth1"
        container_type: "veth"
        ip_from_q: "container"
        is_container_address: true
        is_ssh_address: true

    - network:
        group_binds:
          - neutron_openvswitch_agent
        container_bridge: "br-vlan"
        container_interface: "eth12"
        container_type: "veth"
        type: "vlan"
        range: "10:1000"
        net_name: "physnet"

    - network:
        container_bridge: "br-vxlan"
        container_type: "veth"
        container_interface: "eth10"
        ip_from_q: "tunnel"
        type: "vxlan"
        range: "1:1000"
        net_name: "vxlan"
        group_binds:
          - neutron_openvswitch_agent

    - network:
        container_bridge: "br-mgmt"
        container_type: "veth"
        container_interface: "eth2"
        ip_from_q: "container"
        type: "raw"
        group_binds:
          - cinder_api
          - cinder_volume
          - nova_compute

    - network:
        container_bridge: "br-mgmt"
        container_type: "veth"
        container_interface: "eth3"
        ip_from_q: "container"
        type: "raw"
        group_binds:
          - glance_api
          - swift_proxy
          - nova_compute

  swift:
    part_power: 9
    repl_number: 2
    storage_network: 'br-mgmt'
    drives:
      - name: sda
      - name: sdb
      - name: sdc
      - name: sdd
    mount_point: /srv/node
    storage_policies:
      - policy:
          name: default
          index: 0
          default: True
          repl_number: 2

swift-proxy_hosts:
  node3:
    ip: 10.0.5.3
    container_vars:
      swift_proxy_vars:
        read_affinity: "r1=100"
        write_affinity: "r1"
        write_affinity_node_count: "2 * replicas"
  node4:
    ip: 10.0.5.4
    container_vars:
      swift_proxy_vars:
        read_affinity: "r1=100"
        write_affinity: "r1"
        write_affinity_node_count: "2 * replicas"

swift_hosts:
  node6:
    ip: 10.0.5.6
    container_vars:
      swift_vars:
        storage_ip: 10.0.5.6
        repl_ip: 10.55.0.6
        limit_container_types: swift
        zone: 0
        region: 1
  node7:
    ip: 10.0.5.7
    container_vars:
      swift_vars:
        storage_ip: 10.0.5.7
        repl_ip: 10.55.0.7
        limit_container_types: swift
        zone: 0
        region: 1


shared-infra_hosts:
  node3:
    ip: 10.0.5.3
  node4:
    ip: 10.0.5.4

repo-infra_hosts:
  node3:
    ip: 10.0.5.3

os-infra_hosts:
  node3:
    affinity:
      heat_apis_container: 0
      heat_engine_container: 0
    ip: 10.0.5.3
  node4:
    affinity:
      heat_apis_container: 0
      heat_engine_container: 0
    ip: 10.0.5.4

identity_hosts:
  node3:
    ip: 10.0.5.3
  node4:
    ip: 10.0.5.4

network_hosts:
  node6:
    ip: 10.0.5.6
  node7:
    ip: 10.0.5.7

compute_hosts:
  node6:
    ip: 10.0.5.6
    host_vars:
      nova_virt_type: kvm
  node7:
    ip: 10.0.5.7
    host_vars:
      nova_virt_type: kvm
  node11:
    ip: 10.0.5.11
    host_vars:
      nova_virt_type: kvm
  node12:
    ip: 10.0.5.12
    host_vars:
      nova_virt_type: kvm
  node13:
    ip: 10.0.5.13
    host_vars:
      nova_virt_type: kvm
  node14:
    ip: 10.0.5.14
    host_vars:
      nova_virt_type: kvm

haproxy_hosts:
  node5:
    ip: 10.0.5.5
  node2:
    ip: 10.0.5.2

log_hosts:
  node3:
    ip: 10.0.5.3

dashboard_hosts:
  node3:
    ip: 10.0.5.3
  node4:
    ip: 10.0.5.4

image_hosts:
  node3:
    ip: 10.0.5.3
  node4:
    ip: 10.0.5.4

storage-infra_hosts:
  node4:
    ip: 10.0.5.4
  node3:
    ip: 10.0.5.3

storage_hosts:
  node15:
    ip: 10.0.5.15
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        rbd:
          volume_driver: cinder.volume.drivers.rbd.RBDDriver
          volume_backend_name: rbd
          rbd_pool: cinder-volumes
          rbd_ceph_conf: /etc/ceph/ceph.conf
          rbd_user: cinder
          rbd_secret_uuid: 5c618737-d4d4-4ee8-95e6-279ac54e080f
          rbd_flatten_volume_from_snapshot: 'false'
          rbd_max_clone_depth: 5
          rbd_store_chunk_size: 4
          rados_connect_timeout: -1
  node17:
    ip: 10.0.5.17
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        rbd:
          volume_driver: cinder.volume.drivers.rbd.RBDDriver
          volume_backend_name: rbd
          rbd_pool: cinder-volumes
          rbd_ceph_conf: /etc/ceph/ceph.conf
          rbd_user: cinder
          rbd_secret_uuid: 5c618737-d4d4-4ee8-95e6-279ac54e080f
          rbd_flatten_volume_from_snapshot: 'false'
          rbd_max_clone_depth: 5
          rbd_store_chunk_size: 4
          rados_connect_timeout: -1
  node16:
    ip: 10.0.5.16
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        rbd:
          volume_driver: cinder.volume.drivers.rbd.RBDDriver
          volume_backend_name: rbd
          rbd_pool: cinder-volumes
          rbd_ceph_conf: /etc/ceph/ceph.conf
          rbd_user: cinder
          rbd_secret_uuid: 5c618737-d4d4-4ee8-95e6-279ac54e080f
          rbd_flatten_volume_from_snapshot: 'false'
          rbd_max_clone_depth: 5
          rbd_store_chunk_size: 4
          rados_connect_timeout: -1


  5. Write the user variables file (/etc/openstack_deploy/user_variables.yml). Mine looks like this:

debug: false

swift_allow_all_users: true
glance_default_store: swift

glance_glance_api_conf_overrides:
  DEFAULT:
    show_multiple_locations: true

## Common Nova overrides
# When nova_libvirt_images_rbd_pool is defined, Ceph clients will be installed on the nova hosts.
nova_libvirt_images_rbd_pool: ephemeral-vms
cinder_ceph_client: cinder
cephx: true

ceph_mons:
  - 10.0.5.15
  - 10.0.5.16
  - 10.0.5.17

apt_pinned_packages:
  - { package: "lxc", version: 2.0.0 }

haproxy_use_keepalived: True
haproxy_bind_on_non_local: True

haproxy_keepalived_external_vip_cidr: "<>"
haproxy_keepalived_internal_vip_cidr: "<>"

haproxy_keepalived_external_interface: <>
haproxy_keepalived_internal_interface: <>

haproxy_keepalived_external_virtual_router_id: 10
haproxy_keepalived_internal_virtual_router_id: 11

haproxy_keepalived_priority_master: 100
haproxy_keepalived_priority_backup: 90

keepalived_ping_address: "<>"
haproxy_keepalived_vars_file: 'vars/configs/keepalived_haproxy.yml'
keepalived_use_latest_stable: True

haproxy_user_ssl_cert: '/root/certificate.crt'
haproxy_user_ssl_key: '/root/private.key'
haproxy_user_ssl_ca_cert: '/root/ca_bundle.crt'

apply_security_hardening: false

horizon_images_upload_mode: legacy
horizon_enable_ha_router: True

neutron_plugin_base:
  - router
  - metering
  - neutron_dynamic_routing.services.bgp.bgp_plugin.BgpPlugin
  - trunk
  - qos

nova_nova_conf_overrides:
  DEFAULT:
    cpu_allocation_ratio: 3.0
    reserved_host_memory_mb: 2048
    ram_allocation_ratio: 2.5

nova_libvirt_hw_disk_discard: 'unmap'
nova_libvirt_disk_cachemodes: 'network=writeback'

neutron_plugin_type: ml2.ovs.dvr
neutron_ml2_drivers_type: "flat,vlan,vxlan"
neutron_l2_population: "True"
neutron_vxlan_enabled: true
neutron_vxlan_group: "239.1.1.1"

neutron_provider_networks:
  network_flat_networks: "*"
  network_types: "flat,vlan,vxlan"
  network_vlan_ranges: "physnet:10:1000"
  network_mappings: "physnet:br-provider"
  network_vxlan_ranges: "1:1000"

  6. Run the playbooks, as shown below.
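
With the standard OSA layout, the deployment is driven by the three top-level playbooks; this is the usual sequence rather than anything specific to my environment:

# cd /opt/openstack-ansible/playbooks
# openstack-ansible setup-hosts.yml
# openstack-ansible setup-infrastructure.yml
# openstack-ansible setup-openstack.yml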

OpenStack Neutron

[Figure: Basic Neutron deployment]
Neutron is a stand-alone OpenStack project that provides network connectivity for the compute resources created by Nova.

Neutron comprises multiple services and agents running across multiple nodes. Let's look at the services in the basic Neutron deployment shown above.

  1. neutron-server provides an API layer that acts as a single point of access for managing the other Neutron services.
  2. The L2 agent runs on the compute and network nodes. It creates the various types of networks (flat, local, VLAN, VXLAN, GRE), provides isolation between tenant networks, and takes care of wiring up the VM instances. The L2 agent can use Linux bridge, Open vSwitch, or another vendor technology to perform these tasks.
  3. The L3 agent runs on the network node and lets users create routers that connect Layer 2 networks. Behind the scenes, the L3 agent uses Linux iptables to perform Layer 3 forwarding and NAT. It is possible to create multiple routers with overlapping IP ranges thanks to network namespaces: each router gets its own namespace, named after its UUID (see the quick namespace check after this list).
  4. The DHCP agent runs on the network node and allocates IP addresses to instances, using one dnsmasq instance per network.
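
A quick way to see the namespaces mentioned in point 3 is to list them on the network node; qrouter-<UUID> and qdhcp-<UUID> entries correspond to routers and DHCP-enabled networks respectively:

$ ip netns list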

Neutron Plugins

Neutron exposes a logical API that defines the network connectivity between the devices created by OpenStack Nova. Under the hood, all CRUD operations on the attributes managed by the Neutron API are handled by a Neutron plugin.

As of the Mitaka release, the core Neutron API manages three kinds of entities:

  1. Network, representing an isolated virtual Layer 2 domain; a network can also be regarded as a virtual (or logical) switch;
  2. Subnet, representing an IPv4 or IPv6 address block from which IPs are assigned to VMs on a given network;
  3. Port, representing a virtual (or logical) switch port on a given network.

All three entities support the basic CRUD operations with POST/GET/PUT/DELETE verbs and have an auto-generated unique identifier.

The Modular Layer 2 (ML2) plugin is a Python module that implements the neutron.neutron_plugin_base_v2.NeutronPluginBaseV2 class, which defines the minimum set of methods a plugin needs to implement.

  1. create_network(context, network)

def create_network(self, context, network):
    # Create the network in the database first
    result, mech_context = self._create_network_db(context, network)
    kwargs = {'context': context, 'network': result}
    # Notify registered callbacks that a network was created
    registry.notify(resources.NETWORK, events.AFTER_CREATE, self, **kwargs)
    try:
        # Let the mechanism drivers act on the committed network
        self.mechanism_manager.create_network_postcommit(mech_context)
    except ml2_exc.MechanismDriverError:
        # If any mechanism driver fails, roll back by deleting the network
        with excutils.save_and_reraise_exception():
            LOG.error(_LE("mechanism_manager.create_network_postcommit "
                          "failed, deleting network '%s'"), result['id'])
            self.delete_network(context, result['id'])
    return result

Let's spend some time understanding the above code. Its purpose is to create a network, which represents an L2 network segment that can have a set of subnets and ports associated with it.

Parameters:

  • context: the Neutron API request context
  • network: a dictionary describing the network, with keys as listed in the RESOURCE_ATTRIBUTE_MAP object in neutron/api/v2/attributes.py. All keys will be populated.

Containerizing Virtual Machines

Recently Google joined hands with Mirantis and Intel to distribute OpenStack components in Docker containers managed with Kubernetes. In that deployment scenario, every OpenStack component, such as Nova, Neutron, and Keystone, runs in a Docker container and is deployed and managed through Kubernetes. I wondered: if the Nova service is running in a container, how is it going to spawn a VM instance? Another question flashed in my mind: is it even possible to run a virtual machine inside a Docker container? The answer is yes, but with some prerequisites installed and tweaks done on the Docker host. In this post I will show you how to run a VM using KVM in a Docker container.

Docker containers don't have a kernel of their own; they use the host's kernel, so it's not possible to insert the KVM kernel module from inside a container. Instead, we pass the host's /dev/kvm and /dev/net/tun devices through to the container.
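
A quick sanity check that both devices exist on the host before passing them through:

$ ls -l /dev/kvm /dev/net/tun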

Make sure Docker and KVM are installed on the host. The KVM installation can be tested with:

$ kvm-ok

INFO: /dev/kvm exists
KVM acceleration can be used

Run an Ubuntu KVM image with the following command:

$ docker run -e "RANCHER_VM=true" --cap-add NET_ADMIN -v \
  /var/lib/rancher/vm:/vm --device /dev/kvm:/dev/kvm \
  --device /dev/net/tun:/dev/net/tun rancher/vm-ubuntu -m 1024m -smp 1

The Ubuntu VM spawned inside the container gets the IP address of your container. First, find the IP address of the Docker container with the following command:

$ docker inspect
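
If you only want the address, a format filter works too; the container ID below is a placeholder, and this assumes the container sits on the default bridge network:

$ docker inspect -f '{{ .NetworkSettings.IPAddress }}' <container-id>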

SSH into the VM you created above:

$ ssh ubuntu@

password: ubuntu