VM Story

Contents

  1. What are virtual machines?
  2. What are virtual deployments?
    1. Network functions
    2. Data center
    3. Finance applications
    4. Cloud computing
    5. Machine learning applications
  3. About modern servers
    1. Multi-core CPUs
      1. HT, multi-core, socket
    2. Memory
      1. NUMA, pNUMA, Memory banks
    3. Network cards
    4. Hard disk
      1. SSD
  4. How does virtualization work?
    1. Host OS
    2. Guest OS
    3. Virtualization stacks
  5. Virtualization tuning
    1. CPU pinning
    2. Partitioning
    3. Memory
    4. NIC card
  6. Alternative solutions to VMs
    1. Docker
  7. Multi tenancy
  8. Security aspects in VM
    1. Data security
    2. Hacking
    3. Virus attacks
    4. Information theft
  9. Networking aspects inside VM
    1. Tunnel
    2. North South traffic
    3. East west traffic
  10. Set up a VM in a Linux system using KVM
    1. virsh

Courtesy: Internet articles, blogs, books and talks from experts

What are virtual machines?

Virtualization is a way of partitioning the hardware components of a bare-metal server (CPU, memory, NIC, etc.) among different software functions.

About modern servers/computers

Multi-core CPUs

Hyper-Threading

A single physical CPU core with hyper-threading appears as two logical CPUs to an operating system. Hyper-threading allows the two logical CPU cores to share physical execution resources.

This can speed things up somewhat: if one logical CPU is stalled and waiting, the other logical CPU can borrow its execution resources.

Multiple Cores

A multi-core CPU has separate central processing units (cores) within the same socket. All cores share the same main memory over a common memory bus. A dual-core CPU literally has two central processing units on the CPU chip, a quad-core CPU has four, an octa-core CPU has eight, and so on.

Socket

A socket is the slot on the motherboard into which a CPU package is inserted. Each socket has its own power delivery, cooling mechanism and supporting hardware, such as slots for RAM, graphics cards, etc.

Multiple cores inside the same socket are much more efficient:
o) lower latency for core-to-core communication, since the cores are part of the same socket hardware and can communicate more quickly

Example

For example, from the operating system's point of view, a single-socket, quad-core, hyper-threaded CPU will show the following information:

: Socket                        1
: Core                          4
: Logical processor             8
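The relationship between these three numbers is simple arithmetic; a quick sketch using the hypothetical single-socket, quad-core, hyper-threaded host from the example above:

```python
# Hypothetical host from the example above: 1 socket, 4 cores,
# hyper-threading enabled (2 hardware threads per core).
sockets = 1
cores_per_socket = 4
threads_per_core = 2

# The OS sees sockets * cores * threads logical processors.
logical_processors = sockets * cores_per_socket * threads_per_core
print(logical_processors)  # → 8
```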

Memory

NUMA, pNUMA, Memory banks

Multi-processor systems share the same main memory. The problem is that the shared memory bus can starve several processors at the same time, because only one processor can access the memory over that bus at a time.

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor.

NUMA attempts to address this problem by providing separate memory for each processor, avoiding the performance hit when several processors attempt to address the same memory. For problems involving spread data (common for servers and similar applications), NUMA can improve the performance over a single shared memory by a factor of roughly the number of processors (or separate memory banks)

Of course, not all data ends up confined to a single task, which means that more than one processor may require the same data. To handle these cases, NUMA systems include additional hardware or software to move data between memory banks. This operation slows the processors attached to those banks, so the overall speed increase due to NUMA depends heavily on the nature of the running tasks.

NUMA is used in symmetric multiprocessing (SMP) systems. An SMP system is a tightly coupled, share-everything system in which multiple processors working under a single operating system access each other's memory over a common bus or interconnect path.

When a processor looks for data at a certain memory address, it first looks in the L1 cache on the processor itself, then in the somewhat larger L2 cache nearby, and then in a third level of cache that the NUMA configuration provides, before seeking the data in the "remote memory" located near the other processors.

NUMA can be thought of as a “cluster in a box.” The cluster typically consists of four microprocessors interconnected on a local bus to a shared memory on a single motherboard. This unit can be added to similar units to form a symmetric multiprocessing system in which a common SMP bus interconnects all of the clusters. Such a system typically contains from 16 to 256 microprocessors. To an application program running in an SMP system, all the individual processor memories look like a single memory. Each of these clusters is viewed by NUMA as a “node” in the interconnection network. NUMA maintains a hierarchical view of the data on all the nodes.

Example of configuration in a VM

An ESXi host has 2 pSockets, each with 10 Cores per Socket, and has 128GB RAM per pNUMA node, totalling 256GB per host.

If you create a virtual machine with 128GB of RAM and 1 Socket x 8 Cores per Socket, vSphere will create a single vNUMA node. The virtual machine will fit into a single pNUMA node.

If you create a virtual machine with 192GB RAM and 1 Socket x 8 Cores per Socket, vSphere will still only create a single vNUMA node, even though the requirements of the virtual machine will cross 2 pNUMA nodes, resulting in remote memory access. This is because only the compute dimension is considered.

The optimal configuration for this virtual machine would be 2 Sockets x 4 Cores per Socket, for which vSphere will create 2 vNUMA nodes and distribute 96GB of RAM to each of them.
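The sizing rule above can be sketched as a small helper. The numbers mirror the ESXi example (2 pNUMA nodes, 128GB each); the function name `vnuma_layout` is purely illustrative, not a vSphere API:

```python
def vnuma_layout(vm_sockets, vm_mem_gb, pnuma_mem_gb=128):
    """Illustrative sketch: vSphere sizes vNUMA from the compute
    dimension only, so vNUMA node count follows the virtual socket
    count, and the VM's memory is split evenly across those nodes."""
    mem_per_node = vm_mem_gb / vm_sockets
    # A vNUMA node whose memory share exceeds one pNUMA node
    # forces remote memory access on the host.
    remote_access = mem_per_node > pnuma_mem_gb
    return vm_sockets, mem_per_node, remote_access

# 1 socket x 8 cores, 192GB: one vNUMA node that spills across pNUMA nodes
print(vnuma_layout(1, 192))  # (1, 192.0, True)
# 2 sockets x 4 cores, 192GB: two vNUMA nodes of 96GB, each fits locally
print(vnuma_layout(2, 192))  # (2, 96.0, False)
```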

Examples from a few sample multi-core servers

Two sockets, each with a 10-core CPU; no hyper-threading. Cores are numbered in two blocks of five with a gap (0 to 4 and 8 to 12), reflecting the die layout or hot-pluggable core modules.

cpu_layout.py //(c) Intel Corporation
============================================================
Core and Socket Information (as reported by '/proc/cpuinfo')
============================================================

cores =  [0, 1, 2, 3, 4, 8, 9, 10, 11, 12]
sockets =  [0, 1]

        Socket 0        Socket 1
        --------        --------
Core 0  [0]             [1]
Core 1  [2]             [3]
Core 2  [4]             [5]
Core 3  [6]             [7]
Core 4  [8]             [9]
Core 8  [10]            [11]
Core 9  [12]            [13]
Core 10 [14]            [15]
Core 11 [16]            [17]
Core 12 [18]            [19]
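The layout above is derived by grouping the processor, physical id (socket) and core id fields of /proc/cpuinfo. A minimal, self-contained sketch of the same idea, fed a tiny two-CPU sample string here instead of the real file:

```python
def parse_cpuinfo(text):
    """Map (socket, core) -> list of logical CPUs from /proc/cpuinfo text."""
    layout = {}
    for block in text.strip().split("\n\n"):
        # Each block describes one logical CPU as "key : value" lines.
        fields = {k.strip(): v.strip()
                  for k, v in (line.split(":", 1) for line in block.splitlines())}
        cpu = int(fields["processor"])
        socket = int(fields["physical id"])
        core = int(fields["core id"])
        layout.setdefault((socket, core), []).append(cpu)
    return layout

# Tiny sample: logical CPU 0 on socket 0, logical CPU 1 on socket 1,
# both core id 0 -- the round-robin numbering seen in the table above.
sample = """processor: 0
physical id: 0
core id: 0

processor: 1
physical id: 1
core id: 0"""
print(parse_cpuinfo(sample))  # {(0, 0): [0], (1, 0): [1]}
```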

Core distribution across NUMA nodes

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                20
On-line CPU(s) list:   0-19
Thread(s) per core:    1
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz
Stepping:              2
CPU MHz:               3201.476
BogoMIPS:              6203.45
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19

Cores with hyper-threading enabled

cpu_layout.py  //(c) Intel Corporation
============================================================
Core and Socket Information (as reported by '/proc/cpuinfo')
============================================================
cores =  [0, 1, 2, 3, 4, 8, 9, 10, 11, 16, 17, 18, 19, 20, 24, 25, 26, 27]
sockets =  [0, 1]
        Socket 0        Socket 1
        --------        --------
Core 0  [0, 36]         [1, 37]
Core 1  [2, 38]         [3, 39]
Core 2  [4, 40]         [5, 41]
Core 3  [6, 42]         [7, 43]
Core 4  [8, 44]         [9, 45]
Core 8  [10, 46]        [11, 47]
Core 9  [12, 48]        [13, 49]
Core 10 [14, 50]        [15, 51]
Core 11 [16, 52]        [17, 53]
Core 16 [18, 54]        [19, 55]
Core 17 [20, 56]        [21, 57]
Core 18 [22, 58]        [23, 59]
Core 19 [24, 60]        [25, 61]
Core 20 [26, 62]        [27, 63]
Core 24 [28, 64]        [29, 65]
Core 25 [30, 66]        [31, 67]
Core 26 [32, 68]        [33, 69]
Core 27 [34, 70]        [35, 71]

Core distribution across NUMA nodes

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Stepping:              2
CPU MHz:               2300.170
BogoMIPS:              4604.01
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71
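The alternating even/odd pattern in the two NUMA node CPU lists above reflects the kernel enumerating logical CPUs round-robin across the two sockets; a quick sketch reproducing it for the 72-CPU host:

```python
# Round-robin enumeration across 2 sockets: logical CPU n lands
# on NUMA node n % 2 (matches the lscpu output above).
cpus = range(72)
node0 = [c for c in cpus if c % 2 == 0]
node1 = [c for c in cpus if c % 2 == 1]
print(node0[:5], node1[:5])  # [0, 2, 4, 6, 8] [1, 3, 5, 7, 9]
```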
