ADMIN Network & Security – Issue 55
FREE DVD

AWS Lambda: Scale up and save with serverless monitoring in the cloud
dm-writecache: Improve random write throughput to slow disks

WWW.ADMIN-MAGAZINE.COM
Welcome to ADMIN

WWW.ADMIN-MAGAZINE.COM    ADMIN 55    3
Table of Contents

This issue emphasizes performance tuning, tweaking, and adaptations with various tools and techniques.

Save time and simplify your workday with these useful tools for real-world systems administration.

Virtual environments are becoming faster, more secure, and easier to set up and use. Check out these tools.

16 VoIP and NAT
Secure transparent IP address transitions through NAT firewalls and gateways for Voice over IP.

22 SchedViz
Visualize how the Linux kernel scheduler allocates jobs among cores and the performance consequences.

30 Rook
Ceph distributed storage and Kubernetes container orchestration come together.

46 Prowler for AWS Security
An AWS security best practices assessment, auditing, hardening, and forensics readiness tool.

News
Find out about the latest ploys and toys in the world of information technology.

8 News
• Canonical now offers an Ubuntu Pro image for AWS
• Vulnerable Docker instance sought out by Monero malware
• Cumulus Networks enhances their network-specific Linux
• SUSE adds SUSE Linux Enterprise to the Oracle Cloud Infrastructure

Security
Use these powerful security tools to protect your network and keep intruders in the cold.

52 Regex Vulnerabilities
Regular expressions are invaluable for checking data, but a vulnerability could make them ripe for exploitation.

54 nftables
The latest packet filter implementation promises better performance and simpler syntax and operation.
64 Serverless Uptime Monitoring
Monitoring with AWS Lambda serverless technology reduces costs and scales to your infrastructure automatically.

88 Fibre Channel SAN Bottlenecks
Discover the possible bottlenecks in Fibre Channel storage area networks and learn how to resolve them.

94 Performance Tuning Dojo
Your sensei reveals three of his favorite benchmarking tools: time, hyperfine, and bench.

Service
3 Welcome
4 Table of Contents
6 On the DVD
98 Call for Papers

On the DVD: openSUSE Leap 15.1 (64-bit) – See p. 6 for details
On the DVD

OpenSUSE Leap is a community distribution that shares a common code base with SUSE Linux Enterprise (SLE) and coordinates with SLE releases (i.e., SLE is also in version 15). SUSE recommends Leap for “Sysadmins, Enterprise Developers, and ‘Regular’ Desktop Users.” Please note that the image on this DVD is not the Live version and will try to install the new operating system.

• Released December 2019
• Gnome or KDE (Plasma 5.12) desktop, as well as lightweight options
• Appropriate for traditional and software-defined infrastructure
• Comprehensively tested for hardened codebase

Resources
Tech News

Canonical Now Offers an Ubuntu Pro Image for AWS

Ubuntu rules the cloud. According to The Cloud Market (https://thecloudmarket.com/stats#/by_platform_definition), Ubuntu is the most widely deployed cloud image on the Amazon Elastic Compute Cloud (with nearly 370K images deployed). Not one to be satisfied with being at the top of the digital heap, Canonical (https://canonical.com/) – the company behind Ubuntu (https://ubuntu.com/) – has released a new version of their venerable Ubuntu platform.
Ubuntu Pro was created specifically for Amazon Web Services. This new image ships with the standard Canonical Ubuntu Amazon Machine Image and layers security and compliance subscriptions on top of it. Specifically, Ubuntu Pro includes:
• Up to 10 years of package and security updates for Ubuntu 18.04, and up to eight years for 14.04
and 16.04
• Kernel Livepatch for continuous security patching without reboots
• Customized FIPS and Common Criteria EAL-compliant components (for environments that require FedRAMP, PCI, HIPAA, and ISO compliance)
• Patch coverage for Ubuntu’s infrastructure and app repositories for all types of open source services
• System management with Landscape
• Integration with AWS security and compliance features, such as AWS Security Hub and AWS
CloudTrail (applicable from 2020)
• Subscriptions available for Ubuntu Advantage support packages (https://ubuntu.com/support)
Ubuntu Pro is available via the AWS Marketplace (https://aws.amazon.com/marketplace/search/results?x=0&y=0&searchTerms=ubuntu+pro), and prices range from free to $0.33 per hour (for software plus AWS usage fees).
Vulnerable Docker Instance Sought Out by Monero Malware

[…] (which downloads a bash script that would install the XMRig cryptocurrency miner). The issue was discovered by security firm Bad Packets LLC. Bad Packets also found that the malware contained a self-defense measure that not only disables security, but shuts down processes associated with rival cryptocurrency-mining botnets.
To avoid such a vulnerability, Troy Mursch (cofounder and chief research officer of Bad Packets LLC) says Docker container admins should immediately check to see if they are exposing API endpoints to the Internet. If so, admins should close exposed ports and stop/delete any unrecognized containers.
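Mursch's advice can be automated with a quick probe of the conventional Docker remote API ports. The sketch below is illustrative (the function names are mine, not from the article), and a successful TCP connect only shows that the port is reachable, not that the Docker API is actually listening there:

```python
import socket

def port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def exposed_docker_ports(host):
    """Probe the ports the Docker daemon conventionally uses for remote
    API access: 2375 (plaintext) and 2376 (TLS)."""
    return [port for port in (2375, 2376) if port_reachable(host, port)]
```

Run a check like this only against hosts you administer; any port found open should then be closed at the firewall or bound to localhost.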
Linux dm-writecache

Kicking It Into Overdrive

With the dm-writecache Linux kernel module, you can gain a noticeable improvement in random write throughput when writing to slower disk devices. By Petros Koutoupis

The idea of block I/O caching isn’t revolutionary, but it still is an extremely complex topic. Technically speaking, caching as a whole is complicated and a very difficult solution to implement. It all boils down to the I/O profile of the ecosystem or server on which it is being implemented. Before I dive right in, I want to take a step back, so you understand what I/O caching is and what it is intended to address.

What Is I/O Caching?

A computer cache is a component (typically leveraging some sort of performant memory) that temporarily stores data for current write and future read I/O requests. In the event of write operations, the data to be written is staged and will eventually be scheduled and flushed to the slower device intended to store it. As for read operations, the general idea is to read it from the slower device no more than once and maintain that data in memory for as long as it is still needed. Historically, operating systems have been designed to enable local (and volatile) random access memory (RAM) to act as this temporary cache. Although it performs at stellar speeds, it has its drawbacks:
[…] mentioned already, the focus of dm-writecache is strictly writeback caching and nothing more: no read caching, no write-through caching. The thought process for not caching reads is that read data should already be in the page cache, which makes complete sense.

Other Caching Tools

Tools earning honorable mention include:
• RapidDisk. This dynamically allocatable memory disk Linux module uses RAM and can also be used as a front-end write-through and write-around caching node for slower media.
• Memcached. A cross-platform userspace library with an API for applications, Memcached also relies on RAM to boost the performance of databases and other applications.
• ReadyBoost. A Microsoft product, ReadyBoost was introduced in Windows Vista and is included in later versions of Windows. Similar to dm-cache and bcache, ReadyBoost enables SSDs to act as a cache for slower HDDs.

Working with dm-writecache

The only prerequisites for using dm-writecache are to be on a Linux distribution running a 4.18 kernel or later and to have a version of Logical Volume Manager 2 (LVM2) installed at v2.03.x or above. I will also show you how to enable a dm-writecache volume without relying on the LVM2 framework and instead manually invoke dmsetup.

Identifying and Configuring Your Environment

Identifying the storage volumes and configuring them is a pretty straightforward process (Listing 1). In my example, I will be using both /dev/sdb and /dev/nvme0n1. As you might have already guessed, /dev/sdb is my slow device, and /dev/nvme0n1 is my NVMe fast device. Because I do not necessarily want to use my entire SSD (the rest could be used as a separate standalone or cached device elsewhere), I will place both the SSD and HDD into a single LVM2 volume group. To begin, I label the physical volumes for LVM2:

Listing 2: Volume Labels

$ sudo pvs
  PV           VG  Fmt  Attr PSize    PFree
  /dev/nvme0n1     lvm2 ---  <232.89g <232.89g
  /dev/sdb         lvm2 ---  <6.37t   <6.37t

Listing 3: Volume Group Created

$ sudo vgs
  VG       #PV #LV #SN Attr   VSize VFree
  vg-cache   2   0   0 wz--n- 6.59t 6.59t

Listing 4: Physical Volumes Present

$ sudo pvs
  PV           VG       Fmt  Attr PSize   PFree
  /dev/nvme0n1 vg-cache lvm2 a--  232.88g 232.88g
  /dev/sdb     vg-cache lvm2 a--  <6.37t  <6.37t
vg-cache /dev/sdb
Logical volume "slow" created.

and verify that the logical volume has been created (Listing 5). Using the fio benchmarking utility, I run a quick test with random write I/Os to the slow logical volume and get a better understanding of how poorly it performs (Listing 6).
I see an average of 1.4 mebibytes per second (MiBps) throughput. Although that number is not great, it is expected when sending a number of small random writes to an HDD. Remember, with mechanical and movable components, a large percentage of the time is spent seeking to new locations on the disk platters. If you recall, this method introduces latency and will take much longer for the disk drive to return with an acknowledgment that the write is persistent to disk.
Now, I will carve out a 10GB logical volume from the SSD and label it fast,

Listing 8: Fast Logical Volume Created from NVMe Drive

$ sudo lvs vg-cache -o+devices
  LV   VG       Attr       LSize  Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  fast vg-cache -wi-a----- 10.00g                                                   /dev/nvme0n1(0)
  slow vg-cache -wi-a----- 5.93t                                                    /dev/sdb(0)

Listing 9: fio Test

$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 \
  --filename=/dev/vg-cache/fast --rw=randwrite --numjobs=1 --name=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=654MiB/s][w=167k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1225: Sat Oct 12 19:20:18 2019
  write: IOPS=168k, BW=655MiB/s (687MB/s)(10.0GiB/15634msec); 0 zone resets
  [ ... ]
Run status group 0 (all jobs):
  WRITE: bw=655MiB/s (687MB/s), 655MiB/s-655MiB/s (687MB/s-687MB/s), io=10.0GiB (10.7GB), run=15634-15634msec

Listing 10: Conversion

$ sudo lvs -a vg-cache -o devices,segtype,lvattr,name,vgname,origin
  Devices         Type       Attr       LV            VG       Origin
  /dev/nvme0n1(0) linear     Cwi-aoC--- [fast]        vg-cache
  slow_wcorig(0)  writecache Cwi-a-C--- slow          vg-cache [slow_wcorig]
  /dev/sdd(0)     linear     owi-aoC--- [slow_wcorig] vg-cache
Now, convert both volumes into a single cache volume,

$ sudo lvconvert --type writecache --cachevol fast vg-cache/slow

activate the new volume,

$ sudo lvchange -a y vg-cache/slow

and verify that the conversion took effect (Listing 10). Now it’s time to run fio (Listing 11). At about 460MiBps, it’s almost 330 times faster than the plain old HDD. This is awesome. Remember, the NVMe is a front-end cache to the HDD, and although all writes are hitting the NVMe, a background thread (or more than one) schedules flushes to the backing store (i.e., the HDD).
If you want to remove the volume, type:

$ sudo lvconvert --splitcache vg-cache/slow

Now you are ready to map the NVMe drive as the writeback cache for the slow spinning drive with dmsetup (in the event that you do not have a proper version of LVM2 installed). To invoke dmsetup, you first need to grab the block count of the slow device:

$ sudo blockdev --getsz /dev/vg-cache/slow
12744687616

You will plug this number into the next command and create a writecache device mapper virtual node called wc with a 4K blocksize:

$ sudo dmsetup create wc --table "0 78151680 writecache s /dev/vg-cache/slow /dev/vg-cache/fast 4096 0"

Assuming that the command returns without an error, a new (virtual) device node will be accessible from /dev/mapper/wc. This is the dm-writecache mapping. Now you need to run fio again, but this time to the newly created device (Listing 12). Although it isn’t near the standalone NVMe speeds, you can see a wonderful improvement of random write operations. At 90 times the original HDD performance, you observe a throughput of 136MiBps. I am not entirely sure what parameters are not being configured for the volume during the dmsetup create to match that of the earlier LVM2 example, but this is still pretty darn good.
To remove the device mapper cache mapping, you first need to flush forcefully (and manually) all pending write data to disk:

$ sudo dmsetup message /dev/mapper/wc 0 flush

Now it is safe to enter

$ dmsetup remove /dev/mapper/wc

to remove the mapping.

Conclusion

By using the newly introduced dm-writecache device mapper Linux kernel module, you are able to achieve a noticeable improvement in random write throughput when writing to slower disk devices. Also, nothing is preventing you from using the remainder of the NVMe device in the original volume group and mapping it as a cache to other, slower devices on your system.

The Author

Petros Koutoupis is currently a senior performance software engineer at Cray for its Lustre High Performance File System division. He is also the creator and maintainer of the RapidDisk Project. Petros has worked in the data storage industry for well over a decade and has helped to pioneer many of the technologies unleashed in the wild today.
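The dmsetup invocation described in this article follows the kernel's dm-writecache table format: start, length, target name, cache type, origin device, cache device, block size, and the number of optional arguments. As a sanity check before running dmsetup, the table line can be assembled with a small script; this helper is hypothetical and simply mirrors the format used in the article:

```python
def writecache_table(sectors, origin, cache, block_size=4096, cache_type="s"):
    """Assemble a dm-writecache table line spanning the whole origin
    device: <start> <length> writecache <type> <origin> <cache>
    <blocksize> <#optional-args>. cache_type "s" means an SSD cache
    ("p" would mean persistent memory)."""
    return f"0 {sectors} writecache {cache_type} {origin} {cache} {block_size} 0"

# The sector count comes from `blockdev --getsz` on the origin device.
print(writecache_table(78151680, "/dev/vg-cache/slow", "/dev/vg-cache/fast"))
# → 0 78151680 writecache s /dev/vg-cache/slow /dev/vg-cache/fast 4096 0
```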
VoIP and NAT

Number, Please

We show you how to secure transparent IP address transitions through NAT firewalls and gateways for Voice over IP. By Mathias Hein

Mapping internal IP addresses to external IP addresses is essential for Voice over IP (VoIP) communications through network address translation (NAT) gateways and firewalls. Session Initiation Protocol (SIP) is the signaling protocol for establishing VoIP connections; however, SIP-based communications have problems working through firewalls and session border controllers, and all too often, VoIP calls or some unified communications functions fail because of NAT. In this article, I show you how IT managers can resolve these issues with the session traversal utilities for NAT (STUN), traversal using relays around NAT (TURN), and Interactive Connectivity Establishment (ICE) techniques.

NAT Characteristics

Some years ago, the limited availability of IP addresses led to the development of various strategies by the Internet Engineering Task Force (IETF) for covering a wide environment with the available addresses. One of the intermediate solutions, called NAT (RFC 3022) [1] or PAT (port and address translation), uses conversion between private and public IP addresses. NAT uses tables to assign the IP addresses of a private (internal) network to public IP addresses (Figure 1). The internal IP addresses remain hidden. NAT services exchange the sender and receiver IP addresses in

Figure 1: NAT links the internal network with the Internet through the translation of IP addresses.
the IP header. The simplest form of address conversion is known as static NAT. Address translation converts a private IP address sent from a private address space into a public IP address to be received in a public address space. In the reply packet, this conversion takes place in reverse order. The types of NAT systems include:
• Full cone NAT: IP address conversion takes place independently of a previous outbound connection on the basis of fixed address entries. Every user of the external network can send their packets to the public IP port. The packets are automatically forwarded from the NAT system to the computer with the corresponding address.
• Restricted cone: Address mapping is only performed if it was triggered by an outgoing connection. If an internal computer sends its packets to an external computer, the NAT system uses mapping to translate the client address. The external computer can then send its packets directly back to the internal client (via address mapping). However, the NAT system blocks all incoming packets from other senders.
• Port-restricted cone: Similar to restricted cone NAT, address mapping only takes place if it was triggered by an outgoing connection (identified by the IP and port address).
• Symmetric cone: Fundamentally different from the NAT mechanisms described so far, mapping from the internal to the public IP port address depends on the target IP address of the packet to be transmitted. For example, if a client with the address pair 10.0.0.1:8000 is transmitting to external computer B, address mapping is performed to the external address pair 202.123.211.25:12345. If the same client sends its packets from the same port (10.0.0.1:8000) to a different destination address (computer A), it is mapped to the address 202.123.211.25:45678. The external hosts (A and B) can only send their packets to the respective NAT mapping address. Any attempt by an external machine to send the packets to another address mapping will result in the packets being dropped.

PAT Mechanisms

The PAT mechanism maps all IP addresses of a private network to a single public IP address (Figure 2). In this way, a completely private network only needs a single registered public IP address. Some manufacturers also refer to the PAT function as “hidden NAT.” In practice, if two internal computers share an external IP address on the basis of the private IP addresses, an address conflict inevitably occurs. If both internal computers communicate simultaneously with external communication partners, the NAT component must decide to which internal computer the received packet will be forwarded. Because the routing or forwarding decision is based only on the IP addresses integrated into the IP header, this problem cannot be solved.

Figure 2: PAT translates all internal IP addresses into just one public IP address.

As with dynamic address mapping, the NAT component only has to create a corresponding mapping table during the connection setup and, with that, is able to assign the individual connections to the correct IP addresses. The NAT process simply searches the mapping table for the connection to which the packet belongs. If there is a match, the address is converted and forwarded to the right IP address on the internal network – theoretically.
In practice, however, this process is far more complicated. For example, two internal machines communicate with a common external IP address and both transmit a DNS request to the DNS server operated by the ISP for the company in question. The DNS server operated by the ISP resides on the external network from the point of view of the DNS clients, which means that all DNS queries always pass through the NAT process and address conversion always takes place.
The DNS clients transmit their DNS requests to the DNS server on the public network. The packets transmitted to the public IP network thus contain the following IP/TCP/UDP information: the same IP source address, the same IP destination address, and the same destination port number (UDP port 53 for DNS queries). Only the source port numbers differ in the DNS queries, and it is exactly this information that is used to identify the internal connections.
Most operating systems start the assignment of the sender ports with the value 1025 and then assign the source port numbers sequentially to the individual connections. Under certain circumstances, both IP transmitters can use the same source port numbers for
communication with the DNS server. In this case, a conflict is unavoidable. To avoid this statistical possibility of a perfect address equation, the PAT process not only converts the IP addresses but also the port numbers, ensuring that the internal IP components always use an individual port number to communicate with the external IP resources.

SIP Protocol Problems

SIP, according to RFC 3261 [2], is today’s standard signaling mechanism for real-time communication streams in an IP environment. However, SIP-based communication also has a flaw: A terminal device on the LAN cannot communicate directly with a communication partner if one or more NAT functions (e.g., in firewalls) exist in the communication channel for security reasons.
When NAT converts IP addresses as described above, some protocols, including SIP, communicate the endpoint addresses when establishing a connection. If the addresses do not match, the terminals do not communicate. Several NAT traversal methods can now be used to eliminate this problem – but more about that later.

NAT Traversal and VoIP

One tried and tested means for working around NAT components is manual device configuration, wherein NAT is configured to forward certain data packets to a specific local computer. NAT usually determines forwarding on the basis of the destination port in the data packet and therefore requires a port number (or port range) and the IP address of the local computer for port forwarding. With the help of fixed forwarding by port number, the local computer outside the network can be reached on a fixed port (range). The big advantage of port forwarding is that it is the only NAT traversal technique that actually works for many applications, although it is offset by a number of important disadvantages:
• Other local computers cannot use this port because of the fixed assignment of a port number to a specific computer.
• Many applications select the port dynamically, making it difficult to determine beforehand or to select a port from a port range.
The STUN mechanism for transparently routing VoIP streams across NAT systems enables a VoIP endpoint to determine the correct public IP address, provides a mechanism for checking connections between two endpoints, and provides additional mechanisms for maintaining NAT address mappings using a keepalive protocol (Figure 3).
An earlier version of STUN described in RFC 3489 [3] – now referred to as “classic STUN” – required a complete revision of the STUN concept on the basis of experience gained in practice. The new STUN (according to RFC 5389 [4]) is now just a mechanism used in conjunction with other specifications (e.g., SIP-OUTBOUND, TURN, and ICE).
The task of a standalone STUN server is to provide the correct transport addresses using the STUN binding function. A STUN server must be able to send and receive messages by the UDP and TCP protocols. A plain vanilla STUN server provides only a partial solution to the problem of correct transfer over NAT gateways. For this reason, a STUN server always collaborates with other components. STUN is more like a tool within a more comprehensive NAT gateway solution. The following STUN uses are currently defined:
• Interactive connectivity establishment (ICE)
• Client-oriented SIP connections to external resources (SIP-OUTBOUND)
• NAT behavior discovery (BEHAVE-NAT)
For VoIP endpoints, STUN provides a mechanism for correctly determining the IP address and the port currently used at the other end of a NAT gateway or router (transition between the private and a public IP address range). In contrast to classic STUN, the information can be transmitted over TCP as well as UDP. The new STUN can also be used to negotiate optional attributes and authentication with VoIP servers.

TURN as a Last Resort

STUN enables a client to determine the correct transport address on which the terminal device can be reached from the public network. If direct communication between the two SIP terminals is not possible and STUN does not provide functional address mapping, the services of a relay computer are used. This mechanism was published in RFC 5766 – “Traversal Using Relays Around NAT (TURN)” [5].
The goal of TURN is to provide the client a publicly accessible address/port tuple even in these situations. The only way to achieve this in all cases is to route the data through a TURN server that can be reached on the public network. For this purpose, a client on the TURN server can request an endpoint on which it will then be publicly accessible. The server will then forward the packets to the client.
Because TURN behaves like port-restricted NAT here, the process does not undermine the security functions of NAT and firewalls. For a client that has defined an endpoint on a server via TURN, it must first send a packet to the clients from which it wants to receive packets. Operating servers on well-known ports behind NAT is therefore not possible. The protocol is based on STUN and shares its message structure and basic mechanisms.
Although TURN always makes it possible to establish a connection, redirecting all traffic through the TURN server places a heavy load on the server. Therefore, TURN should only be considered as a last resort if other methods like STUN do not lead to success.

ICE as a Lubricant

In 2004, the IETF began to develop the ICE technique. For any type of
session protocol, ICE ensures trouble-free passage through all types of NAT and firewalls. ICE was designed so that the required addressing functions can be implemented with the SIP protocol and thus also with the Session Description Protocol (SDP). ICE acts as a uniform framework around STUN and TURN. Additionally, ICE supports TCP as well as UDP media sessions.
Instead of only STUN or TURN, an ICE client is able to determine the required addresses with both methods. Both addresses are transmitted to the communication partner along with the local interface addresses in the subsequent SIP call setup message. The elements of the address information contained in the invitation message are known as the “candidates,” which are the potential communication endpoints for the SIP agent. When an invitation message reaches the call recipient, the latter also runs the ICE address collection functions and transmits specific addresses in its SIP reply. Both agents then check the possible connections that are implemented by STUN messages from an agent to the other end of the communication path. A check is performed to discover which pair of candidates works. Once a functioning pair of candidates has been found, the media stream begins to flow between the two communication partners.
ICE goes through six steps to establish a connection:
Step 1. The call initiator collects the IP and port addresses of all potential communication candidates before the actual call. The first candidates are sought by the interfaces of the local computer (host). If the host has several interfaces, the agent obtains a candidate from each interface. The candidates of the computer interfaces (including virtual interfaces) are referred to as host candidates. The agent then directly contacts the STUN server on any host interface. The results of these tests are server-reflexive candidates, which translate to the IP and port addresses of the outermost
F E AT U R E VoIP and NAT
NAT on the path between the agent Step 5. The caller and the called party time. The caller usually confirms the
and the STUN server and is usually have exchanged the necessary SDP candidate pair found by this process
the NAT facing the public Internet. messages. The agents involved in the to the other agent, concluding the
Finally, the agent also receives all call know all candidates for transfer- selection process.
the candidates from TURN servers. ring the media streams. Note that cer- All previous processes (candidate
These IP and port addresses reside tain applications (e.g., videophones) collection and connection tests) take
on the relay servers. generate more than one media place before the phone rings at the
Step 2. Each candidate is prioritized stream. ICE then performs the most called agent’s end; consequently,
after the agent has collected its can- important part of its tasks. Each agent the connection setup is minimally
didates. The highest priority defines pair knows the possible candidates delayed by ICE. The advantage, how-
the candidate to be used. As a rule, and the corresponding candidates of ever, is that ghost calls and miscon-
relay candidates receive the lowest its peer – the list of possible candi- nections (i.e., the phone rings, but
priority because they have the high- date pairs. Each agent calculates the the called party hears nothing) are
est voice delay. priority of the candidate pairs (com- eliminated.
Step 3. According to the identified bined priority of the individual candi- If the ICE handshake reveals that
and prioritized candidates, the agent generates its SIP INVITE request to establish the call. The SDP header is part of the INVITE request, which the caller uses to transmit the connection information required for the call, including the codec, its parameters, and the IP and port addresses to be used. ICE extends SDP by adding some new attributes, the most important of which is the candidate attribute. Because the agent might know more than one possible candidate, it transmits a separate candidate attribute in the SDP header for each possible media stream. The attribute contains the IP and port addresses of the candidate concerned, its priority, and the type of candidate (host, server reflexive, or relay). Additionally, the SDP message contains information for safeguarding the STUN functions.

Step 4. SIP transmits the SIP INVITE message with the corresponding SDP information over the network. If the called agent also supports ICE, the phone will ring. The party being called collects its candidates and generates a preliminary SIP response, which signals to the caller that the SIP request is still being processed. The preliminary response contains an SDP message with the communication partner's candidates.

… candidates), and the candidate pair with the highest priority has the optimal path between the two communication partners.

Step 6. For the final review of the candidate pairs, ICE conducts connection checks on the basis of STUN transactions from each agent. The STUN transactions use the IP and port addresses of the selected candidate pairs – which grow in proportion to the square of the number of candidates – and check their bidirectional accessibility. This growth makes a parallel review of all candidate pairs impractical, so ICE checks the candidate pairs sequentially by priority: Every 20ms, each agent generates a STUN transaction for the next candidate pair in the list. If an agent receives a STUN request for a candidate pair, it immediately generates a STUN transaction in the opposite direction, known as a triggered check, which accelerates the entire ICE process. After completing the review of a candidate pair, the agent knows that it has found a connection pair for transmitting the media stream correctly. Because the checks are carried out according to the priorities of the candidate pairs, the first functioning candidate pair represents the best possible connection between the two communication partners at the given time. If the candidate pair differs from the default setting selected in the SDP message (IP and port addresses), the caller initiates an update of the default setting on the basis of a SIP re-INVITE message to synchronize all intermediate SIP elements that do not support ICE but need to know through which addresses the media streams are running.

Conclusions

Correct mapping of internal IP addresses to external IP addresses is essential to enabling unhindered VoIP communication through NAT gateways and firewalls. STUN, TURN, and ICE not only ensure a transparent transition via NAT gateways but also improve the security of the SIP environment as a whole.

Info
[1] RFC 3022: https://tools.ietf.org/html/rfc3022
[2] RFC 3261: https://tools.ietf.org/html/rfc3261
[3] RFC 3489: https://tools.ietf.org/html/rfc3489
[4] RFC 5389: https://tools.ietf.org/html/rfc5389
[5] RFC 5766: https://tools.ietf.org/html/rfc5766
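The candidate attributes described here appear in an SDP offer roughly as follows (a hand-built illustration in the style of the ICE specification; all addresses, ports, and priority values are invented):

```
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.3
t=0 0
m=audio 45664 RTP/AVP 0
a=candidate:1 1 UDP 2130706431 10.0.1.1 8998 typ host
a=candidate:2 1 UDP 1694498815 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998
```

Each a=candidate line carries a foundation, component ID, transport, priority, address, and port, followed by the candidate type (typ host or typ srflx here) – exactly the fields enumerated above.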
20 A D M I N 55 W W W. A D M I N - M AGA Z I N E .CO M
FEATURES: SchedViz

Behind Time

The Google SchedViz tool lets you visualize how the Linux kernel scheduler allocates jobs among cores and whether they are being usurped. By Samuel Bocetta
SchedViz [1] is one of a variety of open source tools recently released by Google that allows you to visualize how your programs are being handled by Linux kernel scheduling. The tool allows you to see exactly how your system is treating the various tasks it is running and allows you to fine-tune the way resources are allotted to each task.

SchedViz is designed to overcome a specific problem: The basic Linux tools available for scheduling [2] don't allow you to see very much. In practice, this means that most people guess how to schedule system resources, and given the complexity of modern systems, these guesses are often wrong.

Multiprocessing

Modern operating systems (OSs) execute multiple processes simultaneously by splitting the computing load across multiple cores and running … priority to the various tasks that your system needs to run?

A round-robin approach would assign each task processing time in such a way that each would receive equal time. In practice, however, some tasks – such as those related to the core functions of your OS – are of higher priority than others.

SchedViz makes use of a basic feature of the Linux kernel: the ability to capture data in real time about what each core of a multicore system is doing. The kernel is instrumented with hooks called tracepoints; when certain actions occur, any code hooked to the relevant tracepoint is called with arguments that describe the action. This data is referred to as a "trace."

SchedViz captures these traces and allows you to visualize them. A command-line script can capture the data over a specified time and then load it into SchedViz for as much analysis as you care to apply.
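The tracepoints described above can also be sampled by hand with the trace-cmd front end to ftrace – this is not SchedViz tooling, just a way to watch the same sched events (a sketch; it requires root and the trace-cmd package):

```shell
# record context switches and wakeups for five seconds
sudo trace-cmd record -e sched:sched_switch -e sched:sched_wakeup sleep 5

# print the first recorded events from the resulting trace.dat
sudo trace-cmd report | head -n 20
```

Each reported line shows which task left a core, which task replaced it, and on which CPU – the raw material SchedViz turns into its visualizations.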
You can also keep saved traces to compare any modification you make.

A basic example of a trace loaded into SchedViz for viewing is seen in Figure 1. Two processes are running simultaneously (green and blue). The blue process, known as the "victim thread," is likely to suffer a performance lag because it has been interrupted by the green process, which has swapped in to the blue thread's core.

In practice, behavior like that in Figure 1 is likely to result in suboptimal performance. There is no obvious reason why the green process swapped cores right at the end of its processing time, but by doing so, it interrupts another thread running on a different core. If the blue process needs to run quickly, particularly if it is a critical system process, you would like to stop this kind of behavior.

SchedViz allows you to see issues like this on a pannable, zoomable graph that shows all the cores of a multicore system. A more detailed trace of a three-core system is seen in Figure 2. Although it might seem inefficient to allocate resources in this way, with each process getting a short period of time before the core swaps to another process, this is how typical core-scheduling processes work.

Figure 2: The core at the bottom is alternating between two threads (yellow and blue).

The SchedViz visualization tool aims to achieve a number of key goals:

- Quantify task starvation caused by round-robin queueing. In the above example, it might be that the blue process is running slowly because the yellow process is assigned the same priority. This case is known as "task starvation" and can be a significant drain on performance in complex systems.
- Identify primary antagonists stealing work from critical threads. Some processes, as seen, steal a lot of resources from others that may be more important. The "primary antagonists" are the biggest drain on the performance of many systems, and finding out which processes are acting in this way is extremely useful.
- Determine when core allocation choices yield unnecessary waiting. In other situations, a process that you would like to prioritize is made to wait while another executes. SchedViz allows you to see this happening.
- Evaluate different scheduling policies. Linux has many ways of implementing scheduling policies [3] that determine which processes will run where and for how long. If you are seeking to improve system performance by manipulating these policies, SchedViz is invaluable, because it allows you to see a visual representation of how they are being applied.

At the moment, the primary use that most system administrators will have for SchedViz is to manage the way tasks are assigned across multicore processors. As Google put it in their blog, "not all cores are equal" [1], and that's because the structure of the memory hierarchy found in most modern systems can make it costly to shift a thread from one core to another, especially if that shift moves it to a new non-uniform memory access (NUMA) node [4]. This move is particularly a problem when it comes to handling modern encryption algorithms [5] that are becoming an integral part of working with web services and cloud storage.

Users can already pin threads explicitly to a CPU or a set of CPUs, or can exclude threads from specific CPUs, with features like sched_setaffinity() [6] or cgroups [7], both available in most Linux environments. However, such restrictions can also make scheduling even tougher. SchedViz allows you to see exactly how and when these rules are being enforced, allowing you to assess their effectiveness.

Installing SchedViz

SchedViz is hosted on GitHub [8], and the process for installing it will be familiar to most advanced users. To begin, clone the repository:

git clone https://github.com/google/schedviz.git

Next, install the dependencies. Because SchedViz has quite a few of these, it requires yarn, so head to the Yarn website [9] and follow the instructions there. You should also make sure your version of Node.js is later than 10.9.0.

Now you need the GNU build tools and an unzip utility, so install them now if you don't have them. On Debian, you can run:
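On Debian, the GNU build tools and an unzip utility are normally provided by the build-essential and unzip packages; a plausible invocation (assumed here, not taken from the article) is:

```shell
sudo apt-get update
sudo apt-get install -y build-essential unzip
```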
Figure 3: The collections page is the main SchedViz menu. From here, you can perform all of the core functions of the program.
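The explicit pinning mentioned above (sched_setaffinity() and cgroups) is easiest to experiment with from the shell via taskset, the util-linux wrapper around the same system call; a minimal sketch:

```shell
# run a command restricted to CPU 0 only
taskset -c 0 echo "ran pinned to CPU 0"

# show the current shell's allowed-CPU mask
taskset -p $$
```

cgroup cpusets apply the same restriction to whole groups of processes; SchedViz then shows whether the kernel ever schedules the affected threads elsewhere.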
TOOLS: Exchange Hybrid Agent

Mailbox Migration

Exchange's Hybrid Agent takes the complexity out of migrating from a local Exchange environment to Exchange Online. By Christian Schulenburg
When it comes to leveraging the full Office 365 feature set, migrating mailboxes to Exchange Online is one of the greatest challenges. Unlike migrating within an organization, moving to Exchange Online is problematic, because mailboxes are shifted between two separately managed organizations.

This connection between an on-premises Exchange instance and Exchange Online is known as a hybrid connection. Microsoft refers to this connection as the Exchange Modern Hybrid and has extended its Hybrid Configuration Wizard (HCW) with Hybrid Agent (Figure 1) to facilitate the connection. With HCW, Hybrid Agent establishes a connection between the local Exchange and Exchange Online, reducing the requirements for external DNS records, certificate updates, and incoming firewall network connections – all of which made the task complex in the past.

… uses Hybrid Modern Authentication, you need to keep on using the classic Exchange Hybrid topology. Additionally, Hybrid Agent does not cover MailTips, Message Tracking, and Multi-Mailbox Search. If your setup uses these functions across the board, again, keep on using the classic model.

Hybrid Agent is constantly being optimized – improvements to the preview were delivered just two months after the first launch. In its first release in February 2019, Hybrid Agent only supported a single installation, which was a big limitation because it offered no redundancy options, free/busy information could not be viewed in an offline scenario, and move actions were not carried out. With the April 2019 updated version, several agents can now be installed in a local organization, and you can now view status information for Hybrid Agent and use Hybrid Agent instead of specific Exchange servers to address load …

… Client Access Server (CAS) role. Exchange 2010 or newer is required. It must be installed on Windows Server 2012 R2 or 2016 with .NET Framework 4.6.2 or higher. If Hybrid Agent and Exchange are set up on a server, you need to ensure compatibility between Exchange and .NET [1] to avoid the use of an unsupported combination. Beyond this, the server only needs to be a domain member and have access to the Internet.

The only required outbound connections are ports 443 and 80; the latter is only used for certificate revocation list checks. The agent communicates with Azure Application Proxy, an Azure proxy service with a client-specific endpoint that leads to your online environment. Availability information and mailbox migrations are managed by the Azure Application Proxy. If the agent is not installed on an Exchange server with CAS, you also need to enable ports 5985 and 5986 to the CAS servers …
…before installation. Start by integrating the script as follows:

Import-Module .\HybridManagement.psm1

The following call runs the actual test:

Test-HybridConnectivity -testO365Endpoints

For everything to run smoothly, you need to make sure that at least one identical email domain is set up as the accepted domain in each Exchange organization.

…lessly, I am selecting the minimal configuration here. If you do not see the Hybrid configuration window, you have already successfully set up a hybrid topology.

Next, you need to check the domain ownership. Verification is similar to domain verification in Office 365: Enter the displayed DNS TXT record in your DNS zone and confirm ownership. Now select the topology. Hybrid Agent is offered to you as part of the Exchange Modern Hybrid topology, which you can download after confirming.

Once this is done, set up the send and receive connectors. Email traffic is secured by TLS; you need to select a valid certificate for this in the next step. The external hostname must be entered in the certificate; it must be possible to resolve this name externally, and it must be accessible over port 25. Hybrid Agent is not responsible for routing email, only for making the appropriate configurations. You …
TOOLS: Rook

Castling

Ceph distributed storage and Kubernetes container orchestration come together with Rook. By Martin Loschwitz
Hardly a year goes by that does not see some kind of disruptive technology unfold and existing traditions swept away. That's what the two technologies discussed in this article have in common. Ceph captured the storage solutions market in a flash. Meanwhile, Kubernetes shook up the market for virtualization solutions, not only grabbing market share off KVM and others, but also off industry giants such as VMware.

When two disruptive technologies such as containers and Ceph are mixing up the same market, collisions can hardly be avoided. The tool that brings Ceph and Kubernetes together is Rook, which lets you roll out a Ceph installation to a cluster with Kubernetes and offers advantages over a setup where Ceph sits "under" Kubernetes.

The most important advantage is undoubtedly that Ceph integrated into the Kubernetes workflow with the help of Rook [1] can be controlled and monitored just like any other Kubernetes resource. Kubernetes is aware of Ceph and its topology and can adapt it if necessary. However, a setup in which Ceph sits under Kubernetes and only passes persistent volumes through to it, knowing nothing about Ceph, is not integrated and lacks homogeneity.

Getting Started with Rook

ADMIN introduced Rook in detail some time ago [2], looking into its advantages and disadvantages. Since then, much has happened. For example, if you work with OpenStack, Rook will be available automatically in almost every scenario. Many OpenStack vendors are migrating their distributions to Kubernetes, and because OpenStack almost always comes with Ceph in tow, Kubernetes will also include Ceph. However, I'll show you how to get started if you don't have a ready-made OpenStack distribution – and don't want one – with a manual integration.

The Beginnings: Kubernetes

Any admin wanting to work with Rook in a practical way first needs a working Kubernetes, for which Rook needs only a few basic necessities. In this article, I do not assume that you already have a running Kubernetes available, which gives those who have had little or no practical experience with Kubernetes the chance
to get acquainted with the subject. Setting up Kubernetes is not complicated. Several tools promise to handle this task quickly and well.

… recently, this was almost automatically Docker, but not all of the Linux community is Docker friendly. An alternative to Docker is CRI-O [3],

Listing 1: Installing CRI-O

# modprobe overlay
# modprobe br_netfilter
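Listing 1 opens with the two module loads; in the standard CRI-O installation guidance they are paired with a few bridge/netfilter kernel parameters, sketched here from that upstream documentation (an assumption, not a continuation of the listing):

```shell
# modules required for overlay storage and bridged pod traffic
sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl settings that normally accompany the modules
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo sysctl --system
```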
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml

This command loads the Flannel definitions directly into the running Kubernetes Control Plane, making them available for use. As a final step, run the kubeadm join command (generated previously by kubeadm init) on all nodes of the setup except the Control Plane. Kubernetes is now ready to run Rook.

Rolling Out Rook

Thanks to various preparations by the Rook developers, Rook is just as easy to get up and running as Kubernetes. Before doing so, however, it makes sense to review the basic architecture of a Ceph cluster. Rolling out the necessary containers with Rook is not rocket science, but knowing what is actually happening is beneficial. To review the basics of Ceph and how it relates to Rook, see the box "How Ceph Works."

To roll out Rook (Figure 2) in Kubernetes, you need OSDs and MONs. Rook makes it easy, because the required resource definitions can be taken from the Rook source code in a standard configuration. Custom Resource Definitions (CRDs) are used in Kubernetes to convert the local hard drives of a system into OSDs without further action by the administrator. In other words, by applying the ready-made Rook definitions from the Rook Git repository to your Kubernetes instance, you automatically create a Rook cluster with a working Ceph that utilizes the unused disks on the target systems.

Experienced admins might now be thinking of using the Kubernetes Helm package manager for a fast rollout of the containers and solutions. However, it would fail, because Rook only packages the operator for Helm, but not the actual cluster. Therefore, your best approach is to check out Rook's Git directory locally (Listing 3). In the newly created ceph/ subfolder are two files worthy of note: operator.yaml and cluster.yaml. (See also "The Container Stor…
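The checkout and the two files just mentioned come together in a handful of commands; a sketch of the flow (the paths follow the Rook 1.x example layout and may differ in other releases):

```shell
# fetch the Rook sources with the ready-made definitions
git clone https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph

# create the operator first, then the Ceph cluster definition
kubectl create -f operator.yaml
kubectl create -f cluster.yaml

# watch the MON and OSD pods appear
kubectl -n rook-ceph get pods -w
```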
… replicated. In fact, you determine the replication level with the size entry (1 would not be enough here). The mystery remains as to why the Rook developers do not simply adopt 3 as the default.

As soon as you have edited the file, issue the create command; then, display the new rook-block storage class:

kubectl create -f storageclass.yaml
kubectl get sc -a

From now on, you have the option of organizing a Ceph block device from within the working Ceph cluster, which relies on a persistent volume claim (PVC) (Listing 4). In a pod definition, you then only reference the storage claim (lm-example-volume-claim) to make the volume available locally.

Using CephFS

In the same directory is the filesystem.yaml file, which you will need if you want to enable CephFS in addition to the Ceph block device; the setup is pretty much the same for both. As the first step, you need to edit filesystem.yaml and correct the value for the size parameter again, which – as you know – should be set to 3 for both dataPools and metadataPool (Figure 4).

Figure 4: When creating the storage class for CephFS, you again need to change the size parameter from 1 to 3 for production.

To create the custom resource definition for the CephFS service, type:

kubectl create -f filesystem.yaml

To demonstrate that the pods are now running with the Ceph MDS component, look at the output from the command:

# kubectl -n rook-ceph get pod -l app=rook-ceph-mds

Like the block device, CephFS can be mapped to its own storage class, which then acts as a resource for Kubernetes instances in the usual way.

What's Going On?

If you are used to working with Ceph, the various tools that provide insight into a running Ceph cluster can be used with Rook, too. However, you do need to launch a pod especially for these tools in the form of the Rook Toolbox. A CRD definition for this is in the Rook examples, which makes getting the Toolbox up and running very easy before connecting to Rook:

Conclusions

Rook in Kubernetes provides a quick and easy way to get Ceph up and running and use it for container workloads. Unlike OpenStack, Kubernetes is not multiclient-capable, so the "one big Ceph for all" approach is far more difficult to implement than with OpenStack. For this reason, admins tend to roll out many individual Kubernetes instances instead of one large one. Rook is ideal for exactly this scenario, because it relieves the admin of a large part of the work: maintaining the Ceph cluster.

Rook version 1.x [5] is now available and is considered mature for deployment in production environments. Moreover, Rook is now an official Cloud Native Computing Foundation (CNCF) project; thus, it is safe to assume that many more practical features will be added in the future.

Info
[1] Rook: https://rook.io
[2] "Cloud-native storage for Kubernetes with Rook" by Martin Loschwitz, ADMIN, issue 49, 2019, pg. 47, http://www.admin-magazine.com/Archive/2019/49/Cloud-native-storage-for-Kubernetes-with-Rook/
[3] CRI-O: https://cri-o.io
[4] Flannel: https://github.com/coreos/flannel
[5] Rook versions: https://github.com/rook/rook/releases
CONTAINERS AND VIRTUALIZATION: FAI.me

Assembly Line

If you are looking for a way to build images quickly and easily, FAI.me is the place to go. By Martin Loschwitz
In popular clouds, the providers usually roll out standard distribution images. SUSE, Red Hat, and Canonical offer these explicitly, and there is no reason why you should not use them. However, these images may have one or two annoying features, such as missing packages, wrong configurations, or other everyday difficulties.

Changing a finished image is not trivial. Instead, many admins start rebuilding from the source and, sooner or later, give up. In most cases it is not possible to achieve the same image quality as that of the distributors. Either the DIY images are bulky and far too big, or they don't work well.

This problem is exactly what FAI.me addresses: The tool is an extension of the Fully Automated Installer (FAI; see also the "FAI Review" box) [1] that builds operating system (OS) images on demand, for both bare metal and use in the cloud. In this article, I introduce FAI.me and explain what happens in the background.

Instant Images

FAI.me provides the functionality of FAI without the kind of tinkering that's otherwise necessary. Basically, it's not much more than a graphical web-based interface for fai-diskimage, which assembles bootable OS images on demand. Images for bare metal installations as well as for clouds are included, but FAI.me offers a whole host of extremely practical functions.

Naturally, the bootable FAI images for bare metal differ significantly from the cloud images. The one contains the normal FAI installer, which starts its work after launching from the boot medium, whereas the cloud version comes with a pre-installed operating system. To take this into account, Lange has implemented FAI.me on two subpages of the FAI website, for cloud images [3] and bare metal [4]. Both pages are quite straightforward.

Clouds

If you look at the cloud page, you only have a few – really important – parameters to set. At the very top of the form, for example, you need to enter both the target size of the image and its format. The background to this is that if you build an image for AWS, it needs a different format than for KVM, which usually wants the QCOW2 format for hard drives.

You can define the hostname, but it is usually overwritten by the software-defined networks in clouds and their name resolution. Practically, if you add your public SSH key
FAI Review

A short review of FAI will help you understand FAI.me. Although FAI is not new, the author Thomas Lange is continuously adding new features. Moreover, a small but hard-working community has gathered around the tool, keeping it up to date and ensuring that it can install Ubuntu and CentOS in addition to Debian.

The original purpose of FAI was clearly defined: After unpacking, new servers install autonomously to the extent possible and without too much manual intervention. Quite remarkably, FAI was created back in the late 1990s, long before automation tools such as Puppet or Ansible existed.

FAI offered the ability to roll out an OS automatically at an early stage. In the standard configuration, it combines a number of different protocols. A DHCP server is supported by a TFTP server. Clients use the PXE protocol to obtain an IP address and then load a bootloader via TFTP. A kernel, usually one from the installation routine, is responsible for taking care of the rest.

The program needs PXE to boot into a custom environment where it can roll out its various operating systems. Lange and volunteer FAI developers have implemented many features for this purpose. Scripts can be executed at different stages of the installation, which then implement certain functions not provided out of the box in FAI. All FAI components can be loaded from a central network server for this purpose.

The highlight is that FAI does not depend on the installation routine of an installer. If you want to implement automation for SUSE, CentOS, and Debian, you would theoretically have to create three boot environments: for AutoYaST, Kickstart, and preseeding. FAI offers a mostly generic interface. Only local modifications, such as the selection of the packages to be installed automatically, add pitfalls, because not all packages for the same components have identical names across distributions.

Lange recognized early on that dependence on the network can be quite disadvantageous. Conceivably a DHCP server may exist, but it then takes some time to integrate it with FAI – or DHCP is not allowed at all. Maybe the systems you want to install just can't use PXE – not all network cards come with support for this protocol. However, the network boot in FAI will not work without a network either.

For many years, FAI has supported the possibility of generating a static image from a precomposed FAI configuration, which you can then burn to a CD-ROM or DVD or write to a USB stick. The local boot medium then behaves exactly like a network-based FAI, but with a few system-related limitations: If you change the FAI configuration, you have to create new images afterward.

In the first years of FAI's existence, this function was limited to generating images for bare metal, but now FAI also provides functions for building images for cloud environments. Taking Debian as an example, the command

fai-diskimage -u cloudhost -S900M \
  -cDEFAULT,DEBIAN,AMD64,FAIBASE,DEMO,GRUB_PC,CLOUD \
  /tmp/disk

creates an image of an installed Debian system built for the AMD64 architecture in /tmp/disk; it contains the GRUB bootloader and needs 900MB of storage space (Figure 1). If you have fast Internet access on the system on which you call the fai-diskimage command, the process is also quick. It hardly takes a minute for the finished image to become available. Debian is happy to have the tool, because, among other things, the project uses it to create its official images for the cloud [2].

Figure 1: fai-diskimage creates a cloud image based on an existing FAI configuration.
to a cloud image, you don't have to specify it when starting the virtual machine.

If you want to set a password for root, you can, but I strongly advise against it. Leaving the field empty is one less attack vector, and it doesn't mean having to do without root rights, thanks to sudo. If you then set the desired language and the release you want to use, you are virtually ready to start.

FAI.me just wants to know which packages it should integrate into the image. Best be economical here: As a rule, clouds are connected by fast lines, so it is advisable to keep the basic image as small as possible and load the rest off the network or a local mirror as needed. The big distributors demonstrate this vividly. The Ubuntu images, for example, which Canonical makes available for use in OpenStack, have managed with around 260MB for years.

Bare Metal

If you want to build an image for use on bare metal instead, the effort is not much greater. Although a separate option displays the advanced settings, it only takes you to the settings for the root password and lets you add a public SSH key.
FAI assumes by default that it will create a user with a password, who then becomes root with sudo.

You can specify the partition scheme in a drop-down menu. FAI.me provides several suggestions for the use of the Logical Volume Manager or /home on your own partition. The remaining settings largely correspond to those of the cloud variant.

Push-Button Image

Whether you want an image for bare metal or for the cloud, at the end of the process, pressing the button at bottom left for creating the image is all it takes to start the automatic image building process (Figures 2 and 3). After a short wait, the browser then starts downloading the image, which can then be used on a USB stick, on a CD/DVD, or in the cloud.

As mentioned, no hocus-pocus is taking place in the background; instead, the web interface calls fai-cd and fai-mirror or fai-diskimage behind the scenes and creates a matching image on the fly. Therefore, you can be absolutely sure that you always get the packages for the latest Debian GNU/Linux.

Unlike the big distributors, you decide when to build the image, although it means not using an official image, but one you build yourself with FAI.me. What Lange originally intended as a showcase for FAI and to give users an understanding of FAI's range of functions is itself a very practical tool.

How It Works

To begin, you first set up FAI as if you wanted to use it for the live installation of nodes. Factors like DHCP can

Figure 3: … installation images that equip a physical host with an operating system.
be ignored – the purpose is to create bootable media. After that, you already have the option to create your own images with fai-cd and fai-diskimage. But that's only half the battle. Users actually want to have this file embedded in a CI/CD process to ensure that images are automatically built when changes are made to the FAI configuration and that the images are then available for download from a central location.

Therefore, connecting FAI to a CI/CD tool such as Jenkins is a good idea, and this is exactly what the Debian project does. It stores its FAI configuration in Debian GitLab and uses hooks to wire it to an FAI installation in such a way that the described mechanism is implemented. When a commit ends up in the master branch of the repository, GitLab then ensures that new images are created automatically.

If you prefer not to overwrite the old images automatically, the recommendation is to encode the date in the name. The example with GitLab, in particular, is not difficult to set up if you make sure that GitLab has a virtual machine on which FAI is executable and that can access the GitLab repository itself to build images according to FAI rules.

Instead of laboriously developing an image factory yourself, it could be a good idea to turn to FAI, especially if the target system is Debian, with which FAI is particularly connected through its author.
Conclusions

For many admins, building operating system images is an unnecessarily complicated exercise that requires a huge amount of preparation. FAI shows another way: By combining the appropriate parameters for fai-diskimage or fai-cd and fai-mirror, it builds generic disk images at the command line in a very short time.

However, FAI itself cannot be set up easily and quickly. Anyone planning to install dozens, hundreds, or even thousands of systems automatically with this solution will be happy to put up with the overhead of the initial FAI installation: It's guaranteed to pay dividends. Each new server that is installed in this way then reduces the total overhead and pays for itself.

If you just want a sample of the FAI atmosphere, FAI.me is the right place to start. In a very short time, you can build disk images for Debian that still offer some leeway for local …

Images or Automation?

On the basis of my own experience, FAI.me triggers two reactions that could hardly be more different. On the one hand, enthusiastic admins have needed a tool like this and had not yet found it. On the other hand, more conventional admins with backgrounds in automation have turned up their noses.

A conflict comes to light that plays an important role in contemporary IT. Does it make more sense to work with operating system images, or should you instead rely on the vendor's installation tools and use automation to make the required adjustments? Although this discussion is undoubtedly still in full sway, many assumptions and fears are based on obsolete knowledge.

Admins are absolutely right when they warn against monster images that cannot be regenerated when you need them. Companies commonly find that a golden master image for the installation of new systems has "grown historically": It works, but nobody in the company knows exactly what it contains. When a new image has to be built, it often involves massive overhead and consumes a huge amount of time. … Linux on your hard disk with AutoYaST, Kickstart, the Debian preseeding method, or whatever your distribution uses as an automatic installation tool. According to this narrative, then, the automation engineer handles the rest of the work.

However, this problem is easy to work around: Continuous integration and continuous delivery/deployment (CI/CD) environments based on Jenkins offer the ability to build OS images completely automatically. Of course, FAI.me is also an approach to circumventing precisely the problem described. If you use FAI.me to build your images, you can understand the process in detail, and if you so desire, you can also run FAI.me in an instance of its own, which then contains local modifications – but in a comprehensible way.

The images built with FAI.me can just as easily be frugal operating system images that simply prepare a host for use with Puppet, Ansible, or some other automation system. By the way, this is more elegant by several orders of magnitude than the automation structures
The same applies to images you can pick up that some administrators build themselves customizations. FAI.me is therefore a
from alternative “black box” sources from with scripting in Kickstart or AutoYaST or by very useful extension to FAI itself and
the Internet. One thing you do not want in preseeding. worth exploring. Q
your data center is a pre-owned image with a One thing should be clear by now: Nothing
built-in Bitcoin miner, although this is mostly works without operating system images. Info
discovered in the context of container images. They are essential in clouds because virtual [1] “Automatically install and configure sys-
However, the same caveat naturally also ap- instances cannot be built and started without tems” by Martin Loschwitz, ADMIN, issue
plies to images of entire operating systems. them. Installers from distributions are simply
By the way, when many admins think of im- 52, 2019, pg. 62:
not viable alternatives, because the current [http://www.admin-magazine.com/
ages, they think of bare metal deployments.
clouds do not support the PXE boot functional- Archive/2019/52/Automatically-install-and-
Because the local variance in this area is much
ity required in the first place. configure-systems]
higher than in defined environments such as
KVM or VMware, many people in the past be- In the end, as is often the case, a whole range [2] Debian cloud images: [https://salsa.debian.
lieved that monster images were legitimate or of gray tones exist, and those admins who find org/cloud-team/debian-cloud-images]
even necessary. a perfect mix of images on the one hand and
[3] FAI.me for cloud images:
Like with a pendulum, a countermovement automation on the other, will have a pleasing
[https://fai-project.org/FAIme/cloud]
of tinkerers categorically reject OS images. result. FAI.me is a promising and well-proven
[4] FAI.me for installation images:
Instead, its proponents say you should install component in such a context.
[https://fai-project.org/FAIme]
WWW.ADMIN-MAGAZINE.COM    ADMIN 55    39
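The conclusion's point about combining the right fai-diskimage parameters can be sketched in a few lines. The following Python fragment only assembles the command line rather than running it; the flags (-u for the hostname, -S for the image size, -c for the class list) follow fai-diskimage's documented usage, while the hostname, class names, and image file name here are made-up placeholders, not values from the article.

```python
import shlex

def fai_diskimage_cmd(hostname, size, classes, image_path):
    """Assemble a fai-diskimage invocation as an argument list.

    Flags per fai-diskimage(8): -u sets the hostname baked into the
    image, -S the raw image size, -c the comma-separated FAI classes.
    """
    return [
        "fai-diskimage",
        "-u", hostname,
        "-S", size,
        "-c", ",".join(classes),
        image_path,
    ]

# Placeholder values for illustration only.
cmd = fai_diskimage_cmd(
    "cloudhost", "3G",
    ["DEBIAN", "BUSTER", "AMD64", "GRUB_PC", "CLOUD"],
    "debian-cloud.raw",
)
print(shlex.join(cmd))
# → fai-diskimage -u cloudhost -S 3G -c DEBIAN,BUSTER,AMD64,GRUB_PC,CLOUD debian-cloud.raw
```

A CI job, such as the GitLab pipeline described above, would run exactly such a command (e.g., via subprocess) whenever a commit lands in the master branch.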
CONTAINERS AND VIRTUALIZATION: New S3 Services at Amazon

Class Society

Each Amazon storage class addresses a different usage profile; we examine the new classes to help you make the right choice. By Thomas Drilling
AWS introduced several new storage services and databases at re:Invent 2018, including new storage classes for Amazon Simple Storage Service (S3). In the meantime, new releases (S3 Intelligent-Tiering and S3 Glacier Deep Archive) have become available that quickly boost the number of storage classes in the oldest and most popular of all AWS services from three to six. In this article, I present the newcomers and their characteristics.
Amazon's Internet storage has always supported storage classes, between which users can choose when uploading an object and between which they can also switch automatically later using lifecycle rules. The individual storage classes have different price models and availability classes, each of which optimally addresses a different usage profile. So, if you know the most common access patterns to your data stored in S3, you can optimize costs by intelligently choosing the right storage class.

High-Availability SLAs

The individual storage classes differ in terms of availability and durability. Amazon S3 is basically a simple, key-based object store whose data AWS generally replicates across all availability zones within a region (with the exception of the S3 One Zone-IA class). Amazon S3, for example, offers 99.99 percent availability in the standard storage class and 99.999999999 percent durability, which means that, of 10,000 stored files, one file is lost every 10 million years, on average. AWS even guarantees this under its Amazon S3 Service Level Agreement [1]. By the way, such a guarantee is by no means available for all AWS services.
The new S3 Intelligent-Tiering storage class also has a durability of 99.999999999 percent, with an availability of 99.9 percent, just as in the S3 Standard-IA class. In the case of the S3 One Zone-IA storage class, however, replication only takes place within a single availability zone, resulting in a reduced availability of 99.5 percent. Replication beyond regions does not take place in AWS by default to improve availability or consistency further, because this would contradict the corporate philosophy with regard to data protection. However, the user can configure automatic replication to another region in S3 if so desired.
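The durability arithmetic above is easy to sanity-check. Assuming the eleven-nines figure is an annual loss probability per object, which is how AWS states durability, a population of 10,000 objects loses a single object roughly every 10 million years:

```python
# 99.999999999 % durability → annual loss probability of 1e-11 per object.
annual_loss_probability = 1e-11
objects_stored = 10_000

expected_losses_per_year = objects_stored * annual_loss_probability
years_until_one_loss = 1 / expected_losses_per_year
print(f"one object lost every {years_until_one_loss:,.0f} years")
# → one object lost every 10,000,000 years
```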
Comparison of Storage Classes

Although S3 made do with three storage classes – Standard, Standard-IA, and Glacier – for many years, three additional storage classes are now available: Intelligent-Tiering, One Zone-IA, and Glacier Deep Archive, all with a durability of 99.999999999 percent. The documentation also still lists the Standard with Reduced Redundancy (RRS) storage class with a durability of 99.99 percent. Currently, AWS does not recommend the use of RRS – originally intended for non-critical, reproducible data such as thumbnails – because the standard storage class is now cheaper anyway. As Table 1 shows, the inclusion of RRS would mean that there are seven storage classes.

Amazon S3 Costs

Apart from the fact that prices for all AWS services generally vary between regions, S3 storage has four cost drivers: storage (storage prices), retrieval (retrieval prices), management (S3 storage management), and data transfer, where moving data into the cloud does not cost anything. In US East regions, for example, the S3 Standard storage class pricing (in early 2020) looks like this:
- Storage price is $0.023/GB for the first 50TB.
- Retrieval price is $0.005/1,000 PUT, COPY, POST, or LIST requests and $0.0004/1,000 for GET, SELECT, and all other requests. All data returned by S3 is charged at $0.0007/GB, all data scanned at $0.002/GB. Lifecycle transition and retrieval requests are free, as are DELETE and CANCEL requests.
- The price of S3 storage management depends on the functions included. For example, S3 object tagging costs $0.01/10,000 tags per month.
- For outgoing transmissions, AWS allows up to 1GB/month free of charge. The next 9.999TB/month is charged at $0.09/GB, the next 40TB/month at $0.085/GB, the next 100TB/month at $0.07/GB, and the next 150TB/month at $0.05/GB.
A complete price overview can be found on the S3 product page [2].

S3 with Intelligent Tiering

The new Intelligent-Tiering storage class is primarily designed to optimize costs. This approach works because AWS continuously analyzes the data for access patterns and automatically moves objects to the most cost-effective access tier. The two target storage classes involved in intelligent tiering are S3 Standard and Standard-IA. As you may know, storage is cheaper in Standard-IA, but retrieval is more expensive. Although retrieval is possible at any time with the same access time and latency, the AWS pricing for this storage class assumes that the objects are rarely read after the initial write.
For this automation, however, AWS charges an additional monthly monitoring and automation fee per object. Specifically, S3 Intelligent-Tiering monitors the access patterns of the objects and moves objects that have not been accessed for 30 days in succession to Standard-IA. If an object in Standard-IA is accessed, AWS automatically moves it back to S3 Standard. There are no retrieval fees when using S3 Intelligent-Tiering, and no additional tiering fees are charged when objects switch between access tiers. This makes the class particularly suitable for long-lived data with initially unknown or unpredictable access patterns.

Assigning Storage Classes

S3 storage classes are generally configured and applied at the object level, so the same bucket can contain objects stored in S3 Standard, S3 Standard-IA, S3 Intelligent-Tiering, or S3 One Zone-IA. Glacier Deep Archive, on the other hand, is a service in its own right. Users can upload objects to the storage class of their choice at any time or use S3 lifecycle rules to transfer objects from S3 Standard and S3 Standard-IA to S3 Intelligent-Tiering. For example, if the user uploads a new object into a bucket via the S3 GUI, they can simply select the desired storage class with the mouse (Figure 1).
When uploading from the CLI, the storage class is given as a parameter, --storage-class. The values STANDARD, REDUCED_REDUNDANCY, STANDARD_IA, ONEZONE_IA, […]
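The tiered outbound-transfer prices listed under "Amazon S3 Costs" lend themselves to a small calculator. This is a sketch of the arithmetic only, using the article's early-2020 US East numbers and treating 1TB as 1,000GB; actual AWS billing may differ.

```python
# (tier size in GB, price per GB) taken from the article's list.
TRANSFER_OUT_TIERS = [
    (1, 0.0),          # first 1 GB/month is free
    (9_999, 0.09),     # next 9.999 TB
    (40_000, 0.085),   # next 40 TB
    (100_000, 0.07),   # next 100 TB
    (150_000, 0.05),   # next 150 TB
]

def transfer_out_cost(gb):
    """Walk the tiers, charging each slice of traffic at its tier price."""
    cost, remaining = 0.0, gb
    for size, price in TRANSFER_OUT_TIERS:
        slice_gb = min(remaining, size)
        cost += slice_gb * price
        remaining -= slice_gb
        if remaining <= 0:
            break
    return round(cost, 2)

print(transfer_out_cost(500))     # 1 GB free + 499 GB * $0.09 → 44.91
print(transfer_out_cost(15_000))  # 9,999 GB * $0.09 + 5,000 GB * $0.085 → 1324.91
```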
The S3 Batch Operations feature allows object properties or metadata to be changed for any number of S3 objects. This approach also applies to copying objects between buckets, replacing tag sets, changing access controls, or retrieving/restoring objects from S3 Glacier in minutes rather than months.
Until now, companies have often had to spend months of development time writing optimized application software that could apply the required API actions to S3 objects on a massive scale. The Batch Operations feature can also be used to perform custom Lambda functions on billions or trillions of S3 objects, enabling highly complex tasks, such as image or video transcoding. Specifically, the feature takes care of retries, tracks progress, sends notifications, generates final reports, and delivers the events for all changes made and tasks performed to CloudTrail.
For a start, users can specify a list of target objects in an S3 inventory report that lists all objects of an S3 bucket or prefix. Optionally, you can specify your own list of target objects. You then select the desired API action from a prefilled options menu in the S3 Management Console. S3 Batch Operations [3] are now available in all AWS regions. Operations are charged at $0.25/job plus $1.00/million object operations performed, on top of charges associated with any operation S3 Batch Operations performs for you (e.g., data transfer, requests, and other charges).
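The batch pricing quoted above works out as a flat per-job fee plus a per-operation fee. A minimal sketch of that arithmetic (the surcharges for the underlying requests and data transfer are deliberately left out):

```python
def batch_operations_cost(jobs, object_operations):
    """$0.25 per job plus $1.00 per million object operations."""
    return jobs * 0.25 + object_operations / 1_000_000 * 1.00

# One job touching 10 million objects: $0.25 + $10.00.
print(f"${batch_operations_cost(1, 10_000_000):.2f}")  # → $10.25
```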
Conclusions
Amazon S3 is far more than a file
storage facility on the Internet, and
even experienced users often don’t
know all of its capabilities, especially
because AWS is constantly adding
new features. Those who know the
access patterns to their data can save
a lot of money. Additionally, AWS
now provides a degree of automation
with the new S3 Intelligent-Tiering
storage class (at an extra charge).
Figure 2: The lifecycle rules in Amazon S3 can be configured from the management user interface, as well.

Info

[1] Amazon S3 Service Level Agreement: [https://aws.amazon.com/s3/sla/?nc1=h_ls]
[2] Prices for Amazon S3: [https://aws.amazon.com/s3/pricing/?nc1=h_ls]
[3] S3 Batch Operations: [https://docs.aws.amazon.com/AmazonS3/latest/user-guide/batch-ops.html]
CONTAINERS AND VIRTUALIZATION Prowler for AWS Security
Prowling AWS
Snooping Around
Prowler is an AWS security best practices assessment, auditing, hardening, and forensics readiness tool. By Chris Binnie

Hearing that an external, independent organization has been commissioned to spend time actively attacking the cloud estate you have been tasked with helping to secure can be a little daunting – unless, of course, you are involved with a project at the seminal greenfield stage, and you have yet to learn what goes where and how it all fits together. To add to the complexity, if you are using Amazon Web Services (AWS), organizations can segregate departmental duties and, therefore, security controls between multiple accounts; commonly this might mean the use of 20 or more accounts. With these concerns in mind – and if you blink a little too slowly – it's quite possible that you will miss a new AWS feature or service that needs to be understood and, once deployed, secured.
Fret not, however, because a few open source tools can help mitigate the pain before an external auditor or penetration tester receives permission to attack your precious cloud infrastructure. In this article, I show you how to install and run the highly sophisticated tool Prowler [1]. With the use of just a handful of its many features, you can test against the industry-consensus benchmarks from the Center for Internet Security (CIS) [2].

What Are You Lookin' At?

When you run Prowler against the overwhelmingly dominant cloud provider AWS, you get the chance to […]

Table 1: Checks and Group Names

Description                      No./Type of Checks    Group Name
Identity and access management   22 checks             group1
Logging                          9 checks              group2
Monitoring                       14 checks             group3
Networking                       4 checks              group4
Critical priority CIS            CIS Level 1           cislevel1
Critical and high-priority CIS   CIS Level 2           cislevel2
Extras                           39 checks             extras
Forensics                        See README file [4]   forensics-ready
GDPR                             See website [5]       gdpr
HIPAA                            See website [6]       hipaa

Listing 1: Installing Prowler

$ git clone https://github.com/toniblyx/prowler.git
Prowling

To recap, you have created an AWS user and attached your newly created policy to that user. Good practice would usually be to create an IAM role, too, and then attach the policy to the new role if multiple users need to access the policy. The command aws configure lets the AWS command-line client know exactly where to find your credentials.
You can now cd to your prowler directory to run the script that fires up Prowler. You probably remember that the directory was created during the GitHub repository cloning process in the early stages.

Figure 5: Prowler sets itself up at the start of the auditing run with useful colored output for clarity as it goes.

Figure 6: The tests are extremely thorough and well considered.
$ ./prowler -p custom-profile -r eu-west-1

Although the command only points at one region, Prowler will traverse the other regions where needed to complete its auditing.

Breaking and Entering

The README file offers some other useful options in the examples I shamelessly repeat and show in this section. If you ever want to run one of the tests individually, use:

$ ./prowler -c check32

After the first Prowler run to make sure it works correctly, a handy tip is to spend some time looking through the benchmarks listed earlier to figure out what you might need to audit against, instead of running through all the many checks. It's also not a bad idea to find the check numbers from the Prowler output and focus on specific areas to speed up your report generation time. Just delimit your list of checks with commas after the -c switch.
Additionally, use the -E switch

$ ./prowler -E check17,check24

to run Prowler against lots of checks while excluding only a few.

Lookin' Oh So Pretty

As you'd expect, Prowler produces a nicely formatted text file for your auditing report, but harking back to the pip command earlier, you might remember that you also installed the ansi2html package, which allows the mighty Prowler to produce HTML by piping the output of your results:

$ ./prowler | ansi2html -la > prowler-audit.html

Similarly, you can output to JSON or CSV with the -M switch:

$ ./prowler -M json > prowler-audit.json

Just change json to csv (in the file name, too) if you prefer a CSV file. The well-written Prowler docs also offer a nice example of saving a report to an S3 bucket:

$ ./prowler -M json | aws s3 cp - s3://your-bucket/prowler-audit.json

Finally, if you've worked with security audits before, you'll know that reaching an agreed level of compliance is the norm; therefore if, for example, you only needed to meet the requirements of CIS Benchmark Level 1, you could ask Prowler to focus on those checks only:

$ ./prowler -g cislevel1

If you want to check against multiple AWS accounts at once, refer to the README file for a clever one-line command that runs Prowler across your accounts in parallel. A useful bootstrap script is offered, as well, to help you set up your AWS credentials via the AWS client and run Prowler, so it's definitely worth a read.
Additionally, a nice troubleshooting section looks at common errors and the use of multifactor authentication (MFA). Suffice it to say that the README file is comprehensive, easy to follow, and puts some other documentation to shame.

The End Is Nigh

Prowler boasts a number of checks that other tools miss, has thorough and considered documentation, and is a lightweight and reliable piece of software. I prefer the HTML reports, but running the JSON through the jq program is also useful for easy-to-read output.
Having scratched the surface of this clever open source tool, I trust you'll be tempted to do the same and to keep an eye on your security issues in an automated fashion.

Info

[1] Prowler: [https://github.com/toniblyx/prowler]
[2] CIS: [https://www.cisecurity.org]
[3] AWS Security Blog: [https://aws.amazon.com/blogs/security/tag/cis-aws-foundations-benchmark/]
[4] Prowler README: [https://github.com/toniblyx/prowler/blob/master/README.md]
[5] GDPR: [https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/]
[6] HIPAA: [https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html]
[7] Toni de la Fuente: [https://blyx.com]
[8] Git: [https://git-scm.com/book/en/v2/Getting-Started-Installing-Git]
[9] Linux package managers: [https://packaging.python.org/guides/installing-using-linux-tools]

The Author

Chris Binnie's latest book, Linux Server Security: Hack and Defend, shows how hackers launch sophisticated attacks to compromise servers, steal data, and crack complex passwords, so you can learn how to defend against such attacks. In the book, he also shows you how to make your servers invisible, perform penetration testing, and mitigate unwelcome attacks. You can find out more about DevOps, DevSecOps, Containers, and Linux security on his website: https://www.devsecops.cc.
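As a postscript to the JSON output discussed above: if jq is not to hand, the report can be filtered with a few lines of Python instead. The field names in this sample (Control, Status, Message) are illustrative stand-ins rather than Prowler's exact output schema, and the report is inlined here instead of being read from prowler-audit.json.

```python
import json

# Stand-in for a `prowler -M json` report: one JSON object per check.
# Field names are illustrative, not necessarily Prowler's real schema.
sample_report = """
{"Control": "check12", "Status": "PASS", "Message": "CloudTrail enabled"}
{"Control": "check32", "Status": "FAIL", "Message": "Bucket policy allows *"}
{"Control": "check17", "Status": "FAIL", "Message": "Root account has access keys"}
"""

def failed_checks(report_text):
    """Return the control IDs of failing records in a line-delimited JSON report."""
    records = [json.loads(line) for line in report_text.strip().splitlines()]
    return [r["Control"] for r in records if r["Status"] == "FAIL"]

print(failed_checks(sample_report))  # → ['check32', 'check17']
```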
SECURITY: Regex Vulnerabilities
… like a denial of service attack.
A regular expression describes a language, wherein you define the language that you want to accept as input for an application. Email addresses provide a simple but useful example. The World Wide Web Consortium […] quantifiers such as + (one or more times) and * (zero or more times). The plus sign that appears after the first closing square bracket and in front of the @ sign therefore says that the @ symbol of an email address must be preceded by at least one to any number of the specified characters.
A state machine decides whether a word is part of a language – for example, whether the user input is an email address. If this doesn't mean anything to you, just imagine a ticket vending machine for the subway or a train. It assumes different states (e.g., the welcome screen,
ticket selection, payment, or ticket printout) and expects different types of input from the operator that cause a transition from one state to another. If all the user input matches the previously defined input language, the machine accepts the input and prints a ticket.
State machines that check whether an input word is part of a language work in the same way. Each letter of an input word changes the state of the machine. If the state machine is in an accepting state after the input, the word is part of the language. Figure 2 shows a simple state machine with four states and the transitions between these states. State 1 is accepting; all other states are not. The state machine has an input alphabet that comprises only the letters a and b. Words are accepted in which an a is followed by one or more further a or b characters. The state machine is equivalent to the regular expression:

^a[ab]+$

If the input word starts with a b, the machine changes to state 3. From this state, it cannot transition to any other state, especially not the accepting state 1, which means the word input is not part of the language and is not accepted. If it reads an a first, the machine switches to state 2. From there, either an a or a b will cause it to transition to the accepting state 1. If this terminates the input, then the word is part of the language, and the state machine accepts it. Further a or b characters keep the machine in the accepting state 1, matching the + quantifier.
For each regular expression in your application, you can create state machines and use them to check the input. State machines can become quite large, with many states and transitions. In particular, expressions with + and * quantifiers, large character classes, or the dot (.) for all possible input symbols lead to very large state machines.

Attack with ReDoS

A state machine's performance when checking regular expressions depends on many factors: the size of the state machine (i.e., the number of states and transitions) and, of course, the input. A state machine checks each possible sequence of states in turn until it accepts one of the sequences, which means, in the worst case, the run time for a state machine that checks a regular expression can grow exponentially in relation to the input. More bad news is that you often don't even notice that a regular expression results in a large state machine with a correspondingly long run time.
The Open Web Application Security Project (OWASP) lists the regular expression denial of service (ReDoS) [1] as an attack and shows some regular expressions that have unexpectedly bad worst-case run times, such as:

^(a+)+$

At first glance this does not seem to be a problematic expression; for example, the input aaaa has only 16 (2^4) possible sequences of state transitions. However, if you enter a 16 times, you already have 2^16 = 65,536 possible sequences, all of which need to be checked. The number doubles with each additional a in the input.
Occasionally, developers also use user input to create regular expressions. Imagine you want to prevent a username from being included in the password you are using. If an attacker chooses (a+)+ as the username and types a 40 times as the password, the state machine would have to check 2^40 possible sequences. An attacker could thus deliberately cause a denial of service if they had some idea of how the application checked user input.

Conclusions

Regular expressions are useful for checking user input and are deployed in web applications and on firewalls or proxy servers. However, they also have pitfalls that are not immediately obvious. For this reason, you should always test the regular expressions you use intensively, because the damage potential is not always apparent. If you have inadvertently developed a vulnerable regular expression, sometimes simple adjustments or a tolerable inaccuracy in the recognition process can make a broken or unsafe regular expression work safely.

Figure 2: A state machine determines whether the user input is part of the specified language.

Info

[1] ReDoS: [https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS]
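The state machine for ^a[ab]+$ described above can be written down directly as a transition table. The numbering follows the article's (state 1 accepting, state 3 the dead state); labeling the initial node as state 0 is my own assumption about Figure 2's unlabeled start state.

```python
# DFA for ^a[ab]+$: an "a", then one or more further "a" or "b" characters.
# States: 0 = start, 2 = after the leading "a", 1 = accepting, 3 = dead.
TRANSITIONS = {
    (0, "a"): 2, (0, "b"): 3,
    (2, "a"): 1, (2, "b"): 1,
    (1, "a"): 1, (1, "b"): 1,  # the + quantifier keeps looping on state 1
}

def accepts(word):
    state = 0
    for ch in word:
        state = TRANSITIONS.get((state, ch), 3)  # any other input is rejected
    return state == 1

for word in ("aa", "ab", "aabab", "a", "b", "ba"):
    print(word, accepts(word))
# aa/ab/aabab are accepted; a, b, and ba are not.
```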
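The exponential blow-up of ^(a+)+$ described above is easy to reproduce with Python's backtracking re engine: a non-matching suffix forces the engine to try every way of splitting the a-run between the two + quantifiers, so each additional pair of a's roughly quadruples the run time.

```python
import re
import time

evil = re.compile(r"^(a+)+$")

for n in (14, 16, 18, 20):
    start = time.perf_counter()
    result = evil.match("a" * n + "b")  # the trailing "b" means it can never match
    elapsed = time.perf_counter() - start
    print(f"n={n}: {elapsed:.4f}s, match={result}")
```

Matching inputs, by contrast, return almost instantly, which is exactly why the problem so often goes unnoticed in testing.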
SECURITY: nftables
Screened
The latest nftables packet filter implementation, now available in the Linux kernel, promises
better performance and simpler syntax and operation. By Thorsten Scherf
The Linux kernel has already contained a variety of packet filters, starting with ipfwadm and followed by ipchains and iptables. Kernel 3.13 saw the introduction of nftables [1], which uses the nft tool to create and manage rules. With the help of its own virtual machine, nftables ensures that rulesets are converted into bytecode, which is then loaded into the kernel. Not only does it improve performance, but it also allows administrators to enable new rules dynamically without having to reload the entire ruleset.
Nftables uses parts of the old Netfilter framework, removing the need to develop new hooks, which are nothing more than certain points in the network stack of the Linux kernel at which a packet is inspected and, in the case of a match, one or more actions executed. For this purpose, tables that store chains exist at these hook points. The chains in turn contain the rules.
The way in which the individual packets are now checked against the rules is another new feature of nftables. The classification is now far more sophisticated and elegant than it was in the days of iptables. For example, address families now allow you to process different packets with a single rule. If you wanted to examine IPv4 and IPv6 packets in the past, you not only needed different rules, you even had to load them into the kernel with different tools: iptables and ip6tables. The simple nftables inet table type includes both IPv4 and IPv6. Now, you can also merge different statements with nftables; with iptables, this required separate rules.
This kind of facilitation can be found in many different places in nftables.
In addition to the hooks, nftables continues to use Netfilter code for connection tracking, network address translation (NAT), userspace queuing, and logging. The compatibility layer is very helpful if you are migrating from iptables, because it lets you continue using the iptables tool, even if the underlying framework is now nftables, not Netfilter. If you don't need this compatibility and prefer to have all the new features available, you can use the new nft tool instead.
To make sure the kernel you are using supports nftables, call the modinfo tool (Listing 1). You should then see some information about the nf_tables kernel module.
Unlike Netfilter, nftables has no predefined constructs for tables and chains in which the actual rules end up. Administrators need to create these themselves with the nft tool.

Kernel-Dependent Routing

Because the old Netfilter hooks are still used, the route of a packet through the network stack with nftables is similar to that of Netfilter (Figure 1): In prerouting, a decision is made as to whether a network packet is intended for a process on the local machine or simply needs to be forwarded in line with the routing table. In the first case, the packet reaches the local process by way of the input entry point, where it is processed. It then passes through the output and postrouting hooks before leaving the network stack again.
If the packet is not intended for a local process, though, routing is performed on the basis of existing routing entries. Make sure that the kernel supports this routing, which is enabled by the /proc/sys/net/ipv4/ip_forward file (or, for IPv6, /proc/sys/net/ipv6/conf/all/forwarding). In this case, the packet only passes through the prerouting, forward, and postrouting hooks.
In these three hooks, the packet can be rewritten by NAT in terms of the IP address and the port. Nftables allows changes to the target address for the prerouting and input hooks, and to the sender's address for the postrouting and output hooks. If you want to filter the packet instead, you can create corresponding tables in the input, forward, or output hooks and store your rulesets there. Another innovation in nftables is the new ingress entry point, which allows the filtering of packets on Layer 2, providing functions similar to the tc (traffic control) [2] tool.
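The /proc switch mentioned above can be inspected from any language; here is a small Python probe (it returns None where the file does not exist, such as on non-Linux systems). The IPv6 counterpart lives under /proc/sys/net/ipv6/conf/, e.g. conf/all/forwarding.

```python
from pathlib import Path

def forwarding_enabled(path="/proc/sys/net/ipv4/ip_forward"):
    """Read the kernel's IPv4 forwarding switch: True, False, or None if absent."""
    p = Path(path)
    if not p.exists():
        return None
    return p.read_text().strip() == "1"

print(forwarding_enabled())
```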
the arp address family in tables that are created as part of the input or output hooks, and netdev is only allowed for ingress tables. If no address family is specified when creating a table, nftables uses ip by default.
To load rulesets into the kernel that ensure that both IPv4 and IPv6 packets are checked for their properties and filtered, you need to create a table with the inet address family. In the following example, this table is named firewall:

    nft add table inet firewall

A call to nft list tables confirms that the table was created correctly:

    nft list tables inet
    table inet firewall

If needed, you can limit the output to certain address families.

Creating a New Chain

The next step is to create a new chain within this table that has the task of incorporating the rules. As with the address families, which are linked with tables, different types of chains exist: filter, nat, and route. The filter chains can be created in all hooks; nat chains are allowed in prerouting, input, output, and postrouting hooks; and route chains can only be created in output hooks. Because the purpose of this example is to filter IP packets for a local computer, you need to create a filter chain, assign it to the previously created firewall table, and specify where in the network stack it should be placed:

    nft create chain inet firewall incoming { type filter hook input priority 0\; }

You have now successfully created a base chain for filtering IP packets within the firewall table, defined a default priority, and named the chain incoming. The call to nft list chains confirms that everything worked successfully:

    nft list chains
    table inet firewall {
        chain incoming {
            type filter hook input priority 0; policy accept;
        }
    }

Within this chain you can then create rules that ensure that incoming and outgoing packets are inspected according to certain criteria, such as the source and target IP addresses, source or target ports, or state variables (e.g., membership of an existing connection). If all these criteria apply to a data packet flowing through the network stack, and thus through each Netfilter hook, a match has occurred, and a specific action is performed. This action is also defined as part of a rule. For example, you can tell nftables to accept or reject the packet on a match, or simply create a log entry.
In the following example, I present some simple rules to give you a feel for the new nftables syntax. The first rule ensures that nftables accepts all packets passing through the loopback interface:

    nft add rule inet firewall incoming iif lo accept

Furthermore, new SSH connections (ct state new) to port 22 will be allowed (tcp dport 22). Packets that belong to existing SSH connections are also allowed (ct state established,related) and are detected by nftables connection tracking. All other packets are dropped:

    nft add rule inet firewall incoming ct state established,related accept
    nft add rule inet firewall incoming tcp dport 22 ct state new accept
    nft add rule inet firewall incoming drop

The individual objects and their hierarchy are now displayed by nft list ruleset (Listing 2). The -a option ensures that the internal enumeration (handles) of the individual rules is also displayed. A new rule can be inserted later easily enough with the command:

    nft add rule inet firewall incoming position 4 tcp dport 443 ct state new accept

This rule now also allows all new connections to the secure HTTPS port 443. You do not have to worry about packets that belong to existing connections at this point, because they are already detected and accepted by the connection tracking match with handle 7. The matches already mentioned are extremely diverse in nftables and allow very complex rulesets [3]. Thanks to the tcpdump-based syntax, however, they look quite compact and can be understood intuitively.

Listing 2: nft list ruleset

    nft list ruleset -a
    table inet firewall {
        chain incoming {
            type filter hook input priority 0; policy accept;
            iif "lo" accept # handle 5
            ct state established,related accept # handle 7
            tcp dport ssh ct state new accept # handle 8
            drop # handle 9
        }
    }

Listing 3: Revised Ruleset

    nft list table inet firewall
    table inet firewall {
        chain smtp-chain {
            counter packets 1 bytes 80
        }
        chain incoming {
            type filter hook input priority 0; policy accept;
            iif "lo" accept
            ct state established,related accept
            tcp dport ssh ct state new accept
            tcp dport https ct state new accept
            tcp dport smtp ct state new jump smtp-chain
            drop
        }
    }

Flexible Sorting of Rules

If you want to add some order into your rulesets, you can do this with additional chains:
    nft add chain inet firewall smtp-chain
    nft add rule inet firewall incoming position 8 tcp dport 25 ct state new jump smtp-chain
    nft add rule inet firewall smtp-chain counter

Also important at this point is to insert the jump rule at the correct position in the incoming chain; otherwise, the rule would come after the drop catch-all statement and never be executed. After the last changes, the new set of rules looks like Listing 3.
The set is another new feature in nftables that lets you merge elements of a rule, such as an IP address or port, into an array. You can then use this array in the desired rule. The following example of a named set assigns IP address ranges to allow-smtp-set:

    nft add set inet firewall allow-smtp-set { type ipv4_addr\; flags interval\; }
    nft add element inet firewall allow-smtp-set { 10.1.0.0/24, 192.168.0.0/24 }

You can then access this named set in any rule:

    nft add rule inet firewall incoming position 8 tcp dport { 25, 587 } ip saddr @allow-smtp-set accept

In this case, a new rule is placed at a defined point in the incoming chain and uses the previously defined allow-smtp-set to specify the sender address. For the SMTP ports, on the other hand, an "anonymous set" is used, which you can apply directly in a rule without having to define it beforehand.

Combining Functions

The nice thing about nftables is that many of the new functions can easily be combined. Another function, verdict maps, demonstrates this very well. These maps are dictionaries that use the structure of a named set as the key and non-base chains as the value if a match occurs. Although that might sound complicated, it is actually quite simple. For the following example, the requirement is that access to certain auditd and HTTPD servers should only be possible from certain IP addresses. Requests from other systems should be discarded directly. For this, I create a new chain named forward in the kernel entry point of the same name:

    nft create chain inet firewall forward { type filter hook forward priority 0\; }

The auditd and httpd servers are each defined in a named set:

    nft add set inet firewall audit-servers { type ipv4_addr \; }
    nft add element inet firewall audit-servers { 10.1.0.1, 192.168.0.1 }
    nft add set inet firewall http-servers { type ipv4_addr \; }
    nft add element inet firewall http-servers { 10.1.1.1, 192.168.1.1 }

These named sets will be used in non-base chains, which I create in the next step:

    nft add chain inet firewall audit-chain
    nft add chain inet firewall http-chain

Finally, the assignment takes place; the target port for the HTTPD servers is defined as an anonymous set:
    nft add rule inet firewall audit-chain tcp dport 60 ip daddr @audit-servers
    nft add rule inet firewall http-chain tcp dport { 80, 443 } ip daddr @http-servers

…wiki [4], which also offers useful help for getting started with the new packet filter; you might also want to bookmark the nftables reference [3]. The command

    nft -f /tmp/ruleset.nft

then loads the converted rules into the nftables framework.
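The dispatch rule that finally ties the named sets and non-base chains together is not preserved in this extraction. One typical way to wire them up is a named verdict map; the map name service-dispatch below is my own, and the addresses are the example values from the sets above, so treat this as a sketch rather than the article's exact rule:

```
nft add map inet firewall service-dispatch { type ipv4_addr : verdict \; }
nft add element inet firewall service-dispatch { 10.1.0.1 : jump audit-chain, 10.1.1.1 : jump http-chain }
nft add rule inet firewall forward ip daddr vmap @service-dispatch
```

Packets traversing the forward chain are then dispatched to audit-chain or http-chain according to their destination address; a packet that matches no map element simply falls through to the chain's policy.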
MANAGEMENT | Loki
Shape Shifter
Grafana’s Loki is a good replacement candidate for the Elasticsearch, Logstash, and Kibana combination in
Kubernetes environments. By Martin Loschwitz
Lead Image © zlajo, 123RF.com

In conventional setups of the past, admins had to troubleshoot fewer nodes per setup and fewer technologies and protocols than is the case today in the cloud, with its hundreds and thousands of technologies and protocols for software-defined networking, software-defined storage, and solutions like OpenStack. In the worst case, network nodes also need to be checked separately. If you are searching for errors in this kind of environment, you cannot put the required logfiles together manually.
The Elasticsearch, Logstash, and Kibana (ELK) team has demonstrated its ability to collect logs continuously from affected systems, store them centrally, index the results, and thus make them searchable. However, ELK and its variations prove to be complex beasts. Getting ELK up and running is no mean achievement, and once it is finally running, operations and maintenance prove to be complex. A full-grown ELK cluster can massively consume resources, as well.
Unfortunately, you don't have a lot of alternatives. In the case of the popular competitor Splunk, a mere glance at the price list is bad for your blood pressure. However, the Grafana developers are sending Loki [1] into battle as a lean solution for central logging, aimed primarily at Kubernetes users who are already using Prometheus [2].
Loki claims to avoid much of the overhead that is a fixed part of ELK. In terms of functionality, the product can't keep up with ELK, but most admins don't need many of the features that bloat ELK in the first place. Unfortunately, ELK does not allow you to sacrifice part of the feature set for reduced complexity. Loki from Grafana opens up this door. In this article, I go into detail about Loki and describe which functions are available and which are missing.

The Roots of Loki: Prometheus and Cortex

If you follow Loki back to its roots, you will come across some interesting details: Loki is not a completely new development; the Grafana developers oriented their work on Prometheus – but not directly. Loki was inspired by a Prometheus fork named Cortex [3], which extends the original Prometheus, adding the horizontal scalability admins often missed.
Prometheus itself has no scale-out story. Instead, the developers recommend running many instances in parallel and sharding the systems to be monitored. Sending the incoming metric data to several Prometheus instances is intended to provide redundancy in such a setup, but this
construct forces you to tie different Prometheus instances to a single instance of the graphics drawing tool Grafana, often with unsatisfactory results.
Cortex removes this Prometheus design limitation but has not yet achieved the widespread distribution level and popularity of its ancestor. Clearly, it was well enough known to the Grafana developers, because in their search for a suitable tool for their project they used Cortex as a starting point, which also explains the slogan the Loki developers use to advertise their product: Loki is "like Prometheus, but for logs."

Log Entries as Metric Data

Both Prometheus and its derivative Cortex are tools for monitoring, alerting, and trending (MAT). However, they cannot be compared with well-known monitoring tools such as Icinga 2 or Nagios, which primarily focus on event-based monitoring. MAT systems, on the other hand, are designed to collect as many performance metrics as possible from the computers to be monitored.
From this data, the applied load can be read off and the future load can be estimated; monitoring is more or less a byproduct. If you know how many instances of the httpd process are running on a system, you can use a suitable component to raise an alert as soon as a value drops below a certain threshold. Loki's radically revolutionary approach now consists of treating the log data of the target systems exactly as if they were regular metric data.
If you have already set up a complete Prometheus for an environment, you will have dealt with labels, which are useful in Prometheus to distinguish between metrics. Admins typically use labels for certain values: An http_return_codes metric could have a value label, which in turn takes tags of 200, 403, 404, and so on. Ultimately, labels help admins keep the total number of all metrics reasonably manageable, limiting the overhead needed for storage and processing.

Different from ELK

Loki attaches itself to these labels and uses them to index the incoming log messages, which marks the biggest architectural difference from ELK. For this very reason, Loki is far more efficient and lightweight: It does not meticulously evaluate incoming log messages and store them on the basis of defined rules and keywords; rather, it works on the basis of the labels attached to them.
What sounds complicated in theory is simple and comprehensible in practice. Suppose, for example, an instance of the Cluster Consensus Manager Consul is running in a Kubernetes environment and produces log messages. If you rely on Prometheus for monitoring, you will use this tool to monitor Consul on the hosts.
One metric that Prometheus uses for Consul is consul_service_health_status, but if you are running a development instance and a production instance of the environment, you could define an Env label that can assume the value dev or prod. With Grafana linked to Prometheus, different graphs could then be drawn by label.
Loki does something very similar by classifying the stored log entries by label so you can display log entries for prod and dev.
Although not as convenient as the full-text search feature to which ELK users are accustomed, the Loki solution is far more frugal in terms of resources. Because Prometheus and its Cortex fork are easy to configure dynamically, Loki is far better suited for operation in containers, as well.

Loki in Practice

Loki can be virtualized easily, and that was even one of the core requirements of the developers. Because Loki requires fewer resources than ELK, it does not need massive hardware resources. Like Prometheus, Loki is a Go application, which you can get from GitHub [1]. However, it is not necessary to roll out and launch Loki as a Go binary. In the best cloud style, the Loki developers offer Docker images of the solution on Docker Hub, so you can deploy them locally straightaway. Therefore, the only external task is to send the configuration file to the container.

Under the Hood

What looks so easy at first glance requires a combination of several components on the inside. In the style of a cloud-native application, Loki comprises several components that need to interact to succeed. However, the architecture on which Loki is based is not that specific to Loki. It simply recycles large parts of the development work already done for Cortex (Figure 1). Because Cortex works well, there's no reason why Loki shouldn't.

Figure 1: Loki inherits much of its design from Cortex, which sees itself as a more scalable Prometheus. © Grafana

Log data that reaches Loki is grabbed
by the Distributor component; several instances of this service are usually running. With large-scale systems, the number of incoming log messages can quickly reach many millions, depending on the type of services running in the cloud, so a single Distributor instance would hardly be sufficient. However, it would also be problematic to drop these incoming log messages into a database without filtering and processing. Even if the database survived the onslaught, it would inevitably become a bottleneck in the logging setup.
The active instances of the Distributor therefore categorize the incoming data into streams on the basis of labels and forward them to Ingesters, which are responsible for processing the data. In concrete terms, processing means forming log packages (chunks) from the incoming log messages, which can be compressed with Gzip. Like the Distributors, several Ingester instances also run at the same time, forming a ring architecture over which a Distributor applies a consistent hash algorithm to calculate which of the Ingester instances is currently responsible for a particular label.
Once an Ingester has completed a chunk of a log, the final step en route to central logging then follows: storing the information in the storage system to which Loki is connected. As already mentioned, Loki differs considerably from its predecessor Prometheus, for which a time series database is a key aspect.
Loki, on the other hand, does not handle metrics, but text, so it stores the chunks and the information about where they reside separately. The index lists all known chunks of log data, but the data packets themselves are located on the storage facility configured for Loki.
What is interesting about the Loki architecture is that it almost completely separates the read and write paths. If you want to read logs from Loki via Grafana, a third service is used in Loki, the Querier, which accesses the index and stored chunks in the background. It also communicates briefly with the Ingesters to find log entries that have not yet been moved to storage. Otherwise, read and write operations function completely independently.

Scaling Works

Looking at the overall Loki construct, it becomes clear that the design of the solution fits perfectly with the requirements faced by the developers: scalable, cost-effective with regard to the required hardware, and as flexible as possible.
The index ends up with Cassandra, Bigtable, or DynamoDB, all of which are known to scale horizontally without restrictions. The chunks are stored in an object store such as Amazon S3, which also scales well. The components belonging to Loki itself, such as the Distributors and Queriers, are stateless and therefore scale to match requirements.
Only the Ingester is a bit tricky. Unlike its colleagues, it is a stateful application that simply must not fail. However, the implemented ring mechanism provides the features required for sharding, so you can deploy any number of Ingesters to suit needs. Loki scales horizontally without limits. Because it does not store the contents of the incoming log data itself, it has a noticeably smaller hardware footprint than a comparable ELK stack.
The Loki documentation contains detailed tips on scalability, but briefly, to scale horizontally, Loki needs the Consul cluster consensus mechanism to coordinate the work steps beyond the borders of nodes. If you want to use Loki in this way, it is a very good idea to read and understand the corresponding documentation, because a scaled Loki setup of this kind is far more complex than a single instance.
Loki is noticeably easier to implement than Prometheus, because Loki does not save the payload (i.e., the log data) itself at the end. This task is handled by external storage, which provides the high availability on which Loki relies.

Where Do the Logs Originate?

So far I have described how Loki works internally and how it stores and manages data. However, the question of how log messages make their way to Loki has not yet been clarified. This much is true: The Prometheus Node Exporters are not suitable here because they are tied to numeric metric data. Prometheus itself does not have the ability to process metric data other than numbers, which is why the existing Prometheus exporters cannot handle log messages.
In the setup described here, the Loki tool promtail attaches itself to existing logging sources, records the details there, and sends them to predefined instances of the Loki server. The "tail" in the name is no coincidence: Much like the tail Linux command, it outputs the ends of logs in Prometheus format.
During operation, you could also let Promtail handle and manipulate (rewrite) logfiles. Experienced Prometheus jockeys will quickly notice that Loki is fundamentally different from Prometheus in one design aspect: Whereas Prometheus collects its metric data from the monitoring targets itself, Loki follows the push principle – the Promtail instances send their data to Loki.

Graphics with Grafana

Because Loki comes from the Grafana developers, the aggregated log data is only displayed with this tool. Grafana version 6.0 or newer offers the necessary functions. The rest is simple: Set up Loki as a data source as you would for Prometheus. Grafana then displays the corresponding entries. The query language naturally has certain similarities in Loki and Cortex and therefore in Prometheus. Even complex queries can be built. At the end of the day, Grafana turns out to be a useful tool for displaying logs with the Loki back end. If you prefer a less graphical approach, the logcli command-line tool is an option, too;
as expected, however, it is not particularly convenient.

Strong Duet

In principle, the team of Loki and Promtail can be used completely independently of a container solution, just like Prometheus. However, the developers doubtlessly prefer to see them used in combination with Kubernetes, and indeed, the solution is particularly well suited to Kubernetes. On the one hand, Prometheus and Cortex have been very closely connected to Kubernetes from the very beginning – so far, in fact, that Prometheus can attach itself directly to the Kubernetes master servers to find the list of systems to monitor with fully automated node discovery. Additionally, Prometheus is perfectly capable of collecting, interpreting, and storing the metric data output by Kubernetes, processing all labels that belong to the metric data automatically.
Loki ultimately inherits all these advantages: If Loki is attached to an existing Kubernetes Prometheus setup, the same label rules can be recycled, making the setup easy to use.

Monitoring Loki

The best solution for centralized logging is useless if it is not available in a crisis. The same applies if the admin has forgotten to integrate important logfiles into the Loki cycle. However, the Loki developers offer support for both scenarios. A small service named Loki Canary systematically searches systems for logfiles that Loki does not collect.
Both Loki and Promtail can even output metric data about themselves via their Prometheus interfaces, if so required; then, you can integrate it into Prometheus accordingly. The Loki Operations Manual lists appropriate metrics.

No Multitenancy

Finally, Loki also unfortunately inherited a "feature" from Prometheus. The program does not support user administration and therefore treats all users that access it equally. Loki is not usable for multitenancy. Instead, you need to run one Loki instance per tenant and secure it such that access by external intruders is not possible.

Info
[1] Loki: [https://github.com/grafana/loki/]
[2] Prometheus: [https://github.com/prometheus/prometheus]
[3] Cortex: [https://github.com/cortexproject/cortex]
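To make the Promtail push model described above concrete, a minimal Promtail configuration might look like the following sketch. The Loki URL, file paths, and label values here are assumptions for illustration, not values from the article:

```yaml
server:
  http_listen_port: 9080

# promtail remembers how far it has read each file here
positions:
  filename: /tmp/positions.yaml

# push scraped log lines to the Loki server (push principle)
clients:
  - url: http://loki.example.com:3100/loki/api/v1/push

# attach labels to every line read from the matched files
scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          env: prod
          __path__: /var/log/*.log
```

The env label mirrors the dev/prod example from the article: in Grafana you could then select only production logs by querying the {env="prod", job="varlogs"} stream.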
MANAGEMENT | Serverless Uptime Monitoring
Light Work
Monitoring with AWS Lambda serverless technology reduces costs and
scales to your infrastructure automatically. By Chris Binnie
Lead Image © joingate, 123RF.com

For a number of reasons, it makes sense to use today's cloud-native infrastructure to run software without employing servers; instead, you can use an arms-length, abstracted serverless platform such as AWS Lambda. For example, when you create a Lambda function (source code and a run-time configuration) and execute it, the AWS platform only bills you for the execution time, also called the "compute time." Simple tasks usually book only hundreds of milliseconds, as opposed to running an Elastic Compute Cloud (EC2) server instance all month long along with its associated costs.
In addition to reducing the cost and removing the often overlooked administrative burden of maintaining a fleet of servers to run your tasks, AWS Lambda also takes care of the sometimes difficult-to-get-right automatic scaling of your infrastructure. With Lambda, AWS promises that you can sit back with your feet up and rest assured that "your code runs in parallel" and that the platform will be "scaling precisely with the size of the workload" in an efficient and cost-effective manner [1].
In this article, I show you how to get started with AWS Lambda. Once you've seen that external connectivity is working, I'll use a Python script to demonstrate how you might use a Lambda function to monitor a website all year round, without ever running a server.
For more advanced requirements, I'll also touch on how to get the internal networking set up correctly for a Lambda function to communicate with nonpublic resources (e.g., EC2 instances) hosted internally in AWS. Those Lambda functions will also be able to connect to the Internet, which can be challenging to get right.
On established AWS infrastructures, most resources are usually segregated into their own virtual private clouds (VPCs) for security and organizational requirements, so I'll look at the workflow required to solve both internal and external connectivity headaches. I assume that you're familiar with the basics of the AWS Management Console and have access to an account in which you can test.

Less Is More

As already mentioned, be warned that Lambda function networking in AWS has a few quirks. For example, Internet Control Message Protocol (ICMP) traffic isn't permitted for running pings and other such network discovery services:
Lambda attempts to impose as few restrictions as possible on normal language and operating system activities, but there are a few activities that are disabled: Inbound network connections are blocked by AWS Lambda; for outbound connections, only TCP/IP sockets are supported, and ptrace (debugging) system calls are blocked. TCP port 25 traffic is also blocked as an anti-spam measure.
Digging a little deeper …, the Lambda OS kernel lacks the CAP_NET_RAW kernel capability to manipulate raw sockets.
So, you can't do ICMP or UDP from a Lambda function [2].
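Because only outbound TCP sockets are available, any availability probe from a Lambda function has to be a TCP connect rather than a ping. A minimal sketch of such a check follows; the helper name tcp_reachable is mine, not from the article:

```python
import socket

def tcp_reachable(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds.

    A plain TCP connect() stands in for ping here, because Lambda
    blocks outbound ICMP and UDP traffic.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success instead of raising
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()
```

A result of False means either a closed port or a timeout, so the same helper doubles as a crude uptime check for any TCP service.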
(Be warned that this page is a little dated and things may have changed.) In other words, you're not dealing with the same networking stack that you might find on a friendly Debian box running in EC2. However, as I'll demonstrate in a moment, public Domain Name Service (DNS) lookups do work as you'd hope, usually with the use of the UDP protocol.

Less Said, The Better

The way to prove that DNS lookups work is, as you might have guessed, to use a short script that simply performs a DNS lookup. First, however, you should create your function. Figure 1 shows the AWS Management Console [3] Lambda service page with an orange Create function button.

Figure 1: The page where you will create a Lambda function.

If you're wearing your reading glasses, you might see that the name of the function I've typed is internet-access-function. I've also chosen Python 3.7 as the preferred run time. I leave the default Author from scratch option alone at the top.
For now, I ignore the execution role at the bottom of the page and visit that again later, because the clever gubbins behind the scenes will automatically assign an IAM profile, trimmed right down, by default: AWS wants you to log in to CloudWatch to check the execution of your Lambda function.
The next screen in Figure 2 shows the new function; you can see its name in the Designer section and that it has Amazon CloudWatch Logs permissions by default. Figure 2 is only the top of a relatively long page that includes the Designer options. Sometimes these options are hidden and you need to expand them with the arrow next to the word Designer.
Next, hide the Designer options by clicking on the aforementioned arrow. After a little scrolling down, you should see where you will paste your function code (Figure 3). A "Hello World" script, which I will run as an example, is already in the code area.

Figure 3: Your Lambda function code will go here in place of the Hello World example.

When I run the Hello World Lambda function by clicking Test, I get a big, green welcome box at the top of the screen (I had to scroll up a bit), and I can expand the details to show the output:

    {
        "statusCode": 200,
        "body": "\"Hello from Lambda!\""
    }

The DNS lookup script:

    import socket

    def lambda_handler(event, context):
        data = socket.gethostbyname_ex('www.devsecops.cc')
        print(data)
        return

Copy this script over the top of the Hello World example and click the orange Save button at the top. To run the function as it stands (using only the default configuration options and making sure the indentation in your script is correct), simply click the Test button again; you should get another green success bar at the top of the screen.
The green bar will show null, because the script doesn't actually output anything. However, if you look in the Log Output section, you can see some output (Listing 1), with the IP address next to the DNS name you looked up.

Listing 1: DNS Lookup Output

    START RequestId: 4e90b424-95d9-4453-a2f4-8f5259f5f263 Version: $LATEST
    ('www.devsecops.cc', [], ['138.68.149.181'])
    END RequestId: 4e90b424-95d9-4453-a2f4-8f5259f5f263
    REPORT RequestId: 4e90b424-95d9-4453-a2f4-8f5259f5f263
        Duration: 70.72 ms  Billed Duration: 100 ms
        Memory Size: 128 MB  Max Memory Used: 55 MB
        Init Duration: 129.20 ms
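The REPORT line in Listing 1 shows a duration of 70.72 ms billed as 100 ms: at the time the article was written, Lambda rounded billed duration up to the next 100 ms increment. That rounding can be sketched as follows (the helper name is mine):

```python
import math

def billed_duration_ms(duration_ms, increment=100):
    """Round a Lambda execution time up to the billing increment."""
    # even a sub-increment run is billed as one full increment
    return max(increment, math.ceil(duration_ms / increment) * increment)
```

For example, billed_duration_ms(70.72) gives 100, matching the REPORT line above.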
Listing 2: handler.py
    import json
    import os
    import boto3
    from time import perf_counter as pc
    import socket

    class Config:
        """Lambda function runtime configuration"""

        HOSTNAME = 'HOSTNAME'
        PORT = 'PORT'
        TIMEOUT = 'TIMEOUT'
        REPORT_AS_CW_METRICS = 'REPORT_AS_CW_METRICS'
        CW_METRICS_NAMESPACE = 'CW_METRICS_NAMESPACE'
        # referenced by the reportbody property below; missing from the
        # printed listing
        REPORT_RESPONSE_BODY = 'REPORT_RESPONSE_BODY'

        def __init__(self, event):
            self.event = event
            self.defaults = {
                self.HOSTNAME: 'google.com.au',
                self.PORT: 443,
                self.TIMEOUT: 120,
                self.REPORT_AS_CW_METRICS: '1',
                self.CW_METRICS_NAMESPACE: 'TcpPortCheck',
            }

        def __get_property(self, property_name):
            if property_name in self.event:
                return self.event[property_name]
            if property_name in os.environ:
                return os.environ[property_name]
            if property_name in self.defaults:
                return self.defaults[property_name]
            return None

        @property
        def hostname(self):
            return self.__get_property(self.HOSTNAME)

        @property
        def port(self):
            return self.__get_property(self.PORT)

        @property
        def timeout(self):
            return self.__get_property(self.TIMEOUT)

        @property
        def reportbody(self):
            return self.__get_property(self.REPORT_RESPONSE_BODY)

        @property
        def cwoptions(self):
            return {
                'enabled': self.__get_property(self.REPORT_AS_CW_METRICS),
                'namespace': self.__get_property(self.CW_METRICS_NAMESPACE),
            }

    class PortCheck:
        """Execution of a TCP port check"""

        def __init__(self, config):
            self.config = config

        def execute(self):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.settimeout(int(self.config.timeout))
            try:
                # start the stopwatch
                t0 = pc()

                connect_result = sock.connect_ex(
                    (self.config.hostname, int(self.config.port)))
                if connect_result == 0:
                    available = '1'
                else:
                    available = '0'

                # stop the stopwatch
                t1 = pc()

                result = {
                    'TimeTaken': int((t1 - t0) * 1000),
                    'Available': available
                }
                print(f"Socket connect result: {connect_result}")
                # return structure with data
                return result
            except Exception as e:
                print(f"Failed to connect to "
                      f"{self.config.hostname}:{self.config.port}\n{e}")
                return {'Available': 0, 'Reason': str(e)}

    class ResultReporter:
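The Config class in Listing 2 resolves every setting in a fixed order: the invocation event wins, then an environment variable, then a hard-coded default. That precedence can be reproduced in isolation like this (a standalone sketch, not part of the listing):

```python
import os

# the defaults mirror the values in Listing 2
DEFAULTS = {'HOSTNAME': 'google.com.au', 'PORT': 443, 'TIMEOUT': 120}

def get_property(name, event, environ=None, defaults=DEFAULTS):
    """Mimic Config.__get_property: event > environment > defaults."""
    environ = os.environ if environ is None else environ
    if name in event:
        return event[name]        # per-invocation override
    if name in environ:
        return environ[name]      # deployment-wide override
    return defaults.get(name)     # built-in fallback (None if absent)
```

This layering is what lets you reuse the same deployed function for many targets: pass HOSTNAME and PORT in the test event or the trigger payload, and fall back to environment variables for a fixed deployment.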
of the jigsaw puzzle is relatively easy to pick up if you haven't done it before. Remember to disable the CloudWatch rule once you've finished testing to avoid the potential of an email storm.

Now that you have a shiny new working Lambda function that can be scheduled to run whenever you like, I'll spend a moment looking at what a more complex workflow might look like if you were running your Lambda function inside a VPC.

Don't Be Careless

At the beginning of this article, I mentioned that Internet access is trickier if you have a more mature infrastructure and host your Lambda functions inside a VPC so that they can access nonpublic resources securely, as well as the Internet. Table 1 shows the workflow involved.

A minor caveat is that if you're testing against existing networking that is already running important services, it's possible to tie yourself in knots and break things horribly. To get started, try to create, where possible, these new resources inside a new VPC for testing purposes. Some of the resources should definitely be deleted afterward – especially the Elastic Network Interface (ENI) – to save ongoing costs for Elastic IP addresses. Consider yourself suitably warned!

Table 1: Workflow for VPCs

Step 1: Check your VPC configuration and create a new one if needed.
Step 2: Create a private subnet specifically for your Lambda function, so you can isolate your other services from potential security risks.
Step 3: Create a public subnet in your VPC if one doesn't exist.
Step 4: Ensure an Internet gateway is present in the public subnet, and add a route for outbound traffic (0.0.0.0/0) that points at it.
Step 5: Point your private subnet's NAT gateway at the public subnet and point all traffic (0.0.0.0/0) to the NAT gateway.
Step 6: Create or adjust a security group for your network rules, "self-referencing" the security group to itself in a rule, if needed by your Lambda function.
Step 7: Configure your Lambda function to use the correct VPC, subnet(s), and security group.
Step 8: Add suitable IAM permissions to your Lambda function so that it can access the resources of your VPC. Make sure these permissions are available to your IAM role: ec2:CreateNetworkInterface, ec2:DescribeNetworkInterfaces, ec2:DeleteNetworkInterface, ec2:DescribeSecurityGroups, ec2:DescribeSubnets, ec2:DescribeVpcs

If you are familiar with the innards of AWS and have looked through Table 1, I could be forgiven for summarizing it in one sentence: "To access resources inside a VPC, use a private subnet and a NAT gateway and then connect that to a public subnet, which by inference has an Internet gateway attached for external Internet access." I've had success with the above approach, so bear this workflow in mind for future reference if you foresee a need.

Endless

No doubt you'll be using serverless technologies more and more in the future. However, a few gotchas that introduce security risks still need some attention. Sadly, they don't magically disappear when using an abstracted platform, as some would hope. That said, I hope you can see the benefits of such abstraction, in terms of operational overhead and running costs. It's safe to say that with some basic scripting skills, serverless technology makes light work of numerous tasks.
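The permission list in step 8 maps directly onto an IAM policy document. As a quick illustration (my own sketch, not part of the article; attaching the policy to the role still happens in the console or CLI), the JSON can be generated with a few lines of Python:

```python
import json

# The EC2 actions a VPC-attached Lambda function needs (Table 1, step 8)
LAMBDA_VPC_ACTIONS = [
    "ec2:CreateNetworkInterface",
    "ec2:DescribeNetworkInterfaces",
    "ec2:DeleteNetworkInterface",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeSubnets",
    "ec2:DescribeVpcs",
]

def vpc_policy(actions=LAMBDA_VPC_ACTIONS):
    """Build an IAM policy document allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": "*"}
        ],
    }

print(json.dumps(vpc_policy(), indent=2))
```

In production you would scope "Resource" more tightly than "*" where the actions allow it.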
MANAGEMENT: Ansible Hybrid Cloud
Seamless
Extending your data center temporarily into the cloud during a customer rush might not be easy, but it
can be done, thanks to Ansible’s Playbooks and some AWS scripts. By Konstantin Agouros
Companies often do not exclusively use public cloud services such as Amazon Web Services (AWS) [1], Microsoft Azure [2], or Google Cloud [3]. Instead, they rely on a mix, known as a hybrid cloud. In this scenario, you connect your data centers (private cloud) with the resources of a public cloud provider. The term "private cloud" is somewhat misleading, in that the operation of many data centers has little to do with cloud-based working methods, but it looks like the name is here to stay.

The advantage of a hybrid cloud is that companies can use it to absorb peak loads or special requirements without having to procure new hardware for five- or six-digit figures.

In this article, I show how you can add a cloud extension to an Ansible [4] role that addresses local servers. To do this, you extend a local Playbook for an Elasticsearch cluster so that it can also be used in the cloud, and the resources disappear again after use.

Cloudy Servers

In classical data center operation, a server is typically used for a project and installed by an admin. It then runs through a life cycle in which it receives regular patches. At some point, it is no longer needed or is outdated. In the virtualized world, the same thing happens in principle, only with virtual servers. However, for performance reasons, you no longer necessarily retire them. With a few commands or clicks, you can simply assign more and faster resources.

Things are different in the cloud, where you have a service in mind. To operate it, you have to provide defined resources for a certain period of time, build these services in an automated process, to the extent possible (sometimes even from scratch), use them, and only pay the public cloud providers for the period of use. Then, you shut down the machines, reducing resource requirements to zero.

If these resources include virtual machines (VMs), you again build them automatically, use them, and delete them. The classic server life cycle is therefore irrelevant and is degraded to a component in an architecture that an admin brings to life at the push of a button.

Visible for Everyone?

One popular misconception about the use of public cloud services is that these services are "freely accessible on the Internet." This statement is not entirely true, because most cloud providers leave it to the admin to decide whether to provide a service or a VM with a publicly accessible IP address. Additionally, you usually have to activate explicitly all the services you want to be accessible from outside, although this usually does not apply to the services required for administration – that is, Secure Shell (SSH) for Linux VMs and the Remote Desktop Protocol (RDP) for Windows VMs. By way of an example, when an AWS admin picks a database from the Database-as-a-Service offerings, they can only access it through the IP address they use to control the AWS Console.

If you set up the virtual networks in the public cloud with private addresses only, they are just as invisible from the Internet as the servers in your own data center.

Cloudbnb

At AWS, but also in the Google and Microsoft clouds, for example, the concept of the virtual private cloud (VPC) acts as the account's backbone. With an AWS account, you can even operate several VPC instances side by side in each region.

To connect to this network, the cloud providers offer a site-to-site VPN service. Alternatively, you can set up your own VPN gateway (e.g., in the form of a VM, such as Linux with IPsec/OpenVPN) or a virtual firewall appliance, the latter of which offers a higher level of security but usually comes at a price.

This service ultimately creates a structure that, conceptually, does not differ fundamentally from the way in which you would connect branch offices to the head office – with one difference: The public cloud provider can potentially access the data on the machines and in the containers.
Protecting Data

The second major security concern relates to storing data. Especially when processing personal information for members of the European Union (EU), you have to be careful for legal reasons about which of the cloud provider's regions is used to store the data. Relocating the customer database to Japan might turn out to be a less than brilliant idea.

Even if the data is stored on servers within the EU, the question of who gets access still needs to be clarified. Encrypting data in AWS is possible [5]. If you do not have confidence in your abilities, you could equip a Linux VM with a self-encrypted volume (e.g., LUKS [6]) and not store the password on the server. With AWS, this does not work for system disks, but it does at least for data volumes. After starting the VM, you have to send the password. This process can be automated from your own data center. The only possible route of access for the provider is to read the machine RAM; this risk exists where modern hardware enables live encryption, as well.

As a last resort, you can ensure that the computing resources in the cloud only access data managed by the local data center. However, you will need a powerful Internet connection.

Solving a Problem

Assume you have a local Elasticsearch cluster of three nodes: a master node, which also houses Logstash and Kibana, and two data nodes with data on board (Figure 1). You now want to temporarily provide this cluster with two more data nodes in the public cloud. You could have several reasons for this; for example, you might want to replace the physical data nodes because of hardware problems, or you might temporarily need higher performance for data analysis. Because it is not typically worthwhile to procure new hardware on a temporary basis, the public cloud is a solution. The logic is shown in Figure 1; the machines started there must become part of the Elastic cluster.

Figure 1: A secure network architecture (here the rough structure) should connect the nodes in the local data center with those on AWS.

The following explanations assume you have already written Ansible roles for installing the Elasticsearch-Logstash-Kibana (ELK) cluster. You will find a listing for a Playbook on the ADMIN FTP site [7]. Thanks to the structure of these roles, you can add more nodes by appending parameters to the Hosts file, and it includes installing the software on the node.

The roles that Ansible calls are determined by the Hosts file (usually in /etc/ansible/hosts) and the variables set in it for each host. Listing 1 shows the original file.

Listing 1: ELK Stack Hosts File

10.0.2.25 ansible_ssh_user=root logstash=1 kibana=1 masternode=1 grafana=1 do_ela=1
10.0.2.26 ansible_ssh_user=root masternode=0 do_ela=1
10.0.2.44 ansible_ssh_user=root masternode=0 do_ela=1

Host 10.0.2.25 is the master node on which all software runs. The other two hosts are the data nodes of the cluster. The variable do_ela controls whether the Elasticsearch role can perform installations. When expanding the cluster, this ensures that Ansible does not reconfigure the existing nodes – but more about the details later.

Extending the Cluster in AWS

The virtual infrastructure in AWS comprises a VPC with two subnets. One subnet can be reached from the Internet; the other represents the internal area, which also contains the two servers on which data nodes 3 and 4 are to run. In between is a virtual firewall, by Fortinet in this case, that terminates the VPN tunnel and controls access with firewall rules.

This setup requires several configuration steps in AWS: You need to create the VPC with a main network. On this, you then assign all the subnets: one internal (inside) and one accessible from the Internet (outside). Then, you create an Internet gateway in the outside subnet. Through this, the data traffic migrating toward the Internet finds an exit from the cloud. For this purpose, you define a routing table for the outside subnet that specifies this Internet gateway as the standard route (Figure 2).

Cloud Firewall

In the next step, you create a security group that comprises host-related firewall rules for AWS. Because the firewall can protect itself, the group opens the firewall for all incoming and outgoing traffic, although this could be restricted. The next step is to create an S3 bucket that contains the starting configuration and the license for the firewall. Next, you generate the config file for the firewall and upload it with the license. For a rented, but more expensive, firewall, this license information can also be omitted.
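As an aside, the inventory format shown in Listing 1 (a host followed by key=value pairs) is simple enough to process mechanically. The following sketch is my own illustration, not part of the article's Playbook; it shows how the per-host variables such as do_ela and masternode can be parsed and filtered:

```python
def parse_hosts(text):
    """Parse inventory lines of the form 'host key=value ...' into
    a dict of host -> variables (values kept as strings)."""
    hosts = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        host, pairs = parts[0], parts[1:]
        hosts[host] = dict(p.split("=", 1) for p in pairs)
    return hosts

# The three nodes from Listing 1
inventory = """\
10.0.2.25 ansible_ssh_user=root logstash=1 kibana=1 masternode=1 grafana=1 do_ela=1
10.0.2.26 ansible_ssh_user=root masternode=0 do_ela=1
10.0.2.44 ansible_ssh_user=root masternode=0 do_ela=1
"""

hosts = parse_hosts(inventory)
data_nodes = [h for h, v in hosts.items() if v.get("masternode") == "0"]
print(data_nodes)  # → ['10.0.2.26', '10.0.2.44']
```

Ansible does all of this internally, of course; the point is only that the host variables driving the roles are ordinary key=value data.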
receives via cloud-init, and installs the software on the Linux VMs. Cloud-init could also install the software, but Ansible will set up exactly the roles that helped to configure the local servers at the beginning.

I developed the CloudFormation template from the version by firewall manufacturer Fortinet [8]. I simplified the structure, compared with their version on GitHub, so that the template in the cloud only raises a firewall and not a cluster. Additionally, the authors of the Fortinet template used a Lambda function to modify the firewall configuration. Here, this task is done by the Playbook, which in turn uses the template.

In the CloudFormation template, the process can be static. The two Linux VMs use CentOS as their operating system and should run on the internal subnet; you simply attach them to the template and the return values. Listings 2 through 4 show excerpts from the stack definition in YAML format. The complete YAML file can be downloaded from the ADMIN anonymous FTP site [7].

Listing 2: YAML Stack Definition Part 1

[...]
Resources:
  FortiVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock:
        Ref: VPCNet
      Tags:
      - Key: Name
        Value:
          Ref: VPCName

  FortiVPCFrontNet:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock:
        Ref: VPCSubnetFront
      MapPublicIpOnLaunch: true
      VpcId:
        Ref: FortiVPC

  FortiVPCBackNet:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock:
        Ref: VPCSubnetBack
      MapPublicIpOnLaunch: false
      AvailabilityZone: !GetAtt FortiVPCFrontNet.AvailabilityZone
      VpcId:
        Ref: FortiVPC

  FortiSecGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Group for FG
      GroupName: fg
      SecurityGroupEgress:
      - IpProtocol: -1
        CidrIp: 0.0.0.0/0
      SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 0
        ToPort: 65535
        CidrIp: 0.0.0.0/0
      - IpProtocol: udp
        FromPort: 0
        ToPort: 65535
        CidrIp: 0.0.0.0/0
      VpcId:
        Ref: FortiVPC

  InstanceProfile:
    Properties:
      Path: /
      Roles:
      - Ref: InstanceRole
    Type: AWS::IAM::InstanceProfile

  InstanceRole:
    Properties:
      AssumeRolePolicyDocument:
        Statement:
        - Action:
          - sts:AssumeRole
          Effect: Allow
          Principal:
            Service:
            - ec2.amazonaws.com
        Version: 2012-10-17
      Path: /
      Policies:
      - PolicyDocument:
          Statement:
          - Action:
            - ec2:Describe*
            - ec2:AssociateAddress
            - ec2:AssignPrivateIpAddresses
            - ec2:UnassignPrivateIpAddresses
            - ec2:ReplaceRoute
            - s3:GetObject
            Effect: Allow
            Resource: '*'
          Version: 2012-10-17
        PolicyName: ApplicationPolicy
    Type: AWS::IAM::Role

The objects of the AWS::EC2::Instance type are the VMs designed to extend the Elastic stack (Listings 3 and 4). Because of the firewall, the VM is more complex to configure; it has to have two dedicated interface objects so that routing can point to it (Listing 3, line 11). Importantly, the firewall instance and both generated interfaces are located in the same availability zone; otherwise, the stack will fail. To this end, the VMs contain descriptions, and the second subnet contains the reference to the availability zone of the first subnet.

The UserData part of the firewall instance (Listing 3, line 18) contains a description file that tells the VM where to find the configuration and license file previously uploaded by Ansible. The network configuration has already been described and is defined at the top of Listing 2. The finished template can now be run at the command line with the

aws cloudformation create-stack

command, which specifies the name of the YAML file created and fills the parameters at the beginning of the stack. The S3 bucket you want to pass in must already exist. Both the license and the generated configuration should be uploaded up front. All these tasks are done by the Ansible Playbook, as shown in Listings 5 through 9.

The Playbook uses multiple "plays." The first (Listing 5) creates the configuration for the firewall and, if not available, the S3 bucket (line 20) as described and uploads it together with the license. The next task creates the complete stack (Listing 6). What's new is the connection to the old Elasticsearch Playbook or Hosts file. The latter has a group named elahosts, which adds the IP addresses of the two new servers to the Playbook so that a total of five hosts are in the list for further execution of the Playbook. However, some operations will only take place on the new hosts. Listing 6 (lines 44 and 49) creates the newhosts group,
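Because a zone mismatch makes the stack fail outright, the same-availability-zone constraint is worth checking before calling create-stack. The following is a rough pre-flight sketch of my own (not part of the article's Playbook); plain dicts stand in for the parsed template, and the zone names are made up:

```python
def zones_consistent(resources):
    """Return True if all resources that declare an AvailabilityZone
    agree on a single zone (the constraint the article warns about)."""
    zones = {
        props["AvailabilityZone"]
        for props in (r.get("Properties", {}) for r in resources.values())
        if "AvailabilityZone" in props
    }
    return len(zones) <= 1

# Trimmed-down stand-in for the resources in Listings 2 and 4
template = {
    "FortiVPCFrontNet": {"Properties": {"AvailabilityZone": "eu-west-3a"}},
    "FortiVPCBackNet": {"Properties": {"AvailabilityZone": "eu-west-3a"}},
    "ServerInstance": {"Properties": {"AvailabilityZone": "eu-west-3a"}},
}

print(zones_consistent(template))  # → True
```

The template itself sidesteps the problem more elegantly with !GetAtt FortiVPCFrontNet.AvailabilityZone, which pins every dependent resource to one zone by construction.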
Listing 4: YAML Stack Definition Part 3

[...]
ServerInstance:
  Type: "AWS::EC2::Instance"
  Properties:
    ImageId: "ami-0e1ab783dc9489f34" # Centos7 for paris
    InstanceType: t3.2xlarge
    AvailabilityZone: !GetAtt FortiVPCFrontNet.AvailabilityZone
    KeyName:
      Ref: KeyName
    SubnetId:
      Ref: FortiVPCBackNet
    SecurityGroupIds:
    - !Ref ServerSecGroup

Server2Instance:
  Type: "AWS::EC2::Instance"
  Properties:
    ImageId: "ami-0e1ab783dc9489f34" # Centos7 for paris
    [...]

to which it adds the two hosts.

The next play (Listing 7) configures the firewall. In its existing configuration, the static IP address for the inside network card is missing – AWS only sets this when creating the instance. Because the data is now known, the Playbook can define the IP address.

When logging in to the firewall for the first time, the firewall requires a password change. You can use several methods to set up Fortigate in Ansible. However, the FortiOS network modules that have been included in the Ansible distribution for a while do not yet work properly. The raw approach is used here (Listing 7, line 10), which pushes the commands onto the device, as on the command line. The first two lines of the raw task set the password, which resides on the instance ID in the AWS version. Because the license has already been installed, the firewall reboots after installation. At the end, the Ansible script in Listing 7 waits for the reboot to occur and then for it to reach the firewall again.

A play now follows that teaches the local firewall what the VPN tunnel to the firewall in AWS looks like (Listing 8). The VPN definition at the other end was in the previously uploaded configuration. Because of the described problems with the Ansible modules for FortiOS (I suspect incompatibilities between the Ansible modules and the Python fosapi), the play uses Ansible's URI method to configure the firewall. Authentication for the API requires a login process; it then returns a token that is used in the following REST calls. The configuration initially consists of the key exchange phase1 and phase2 parameters. The phase1 parameter contains the password, crypto parameters, and IP address of the firewall in AWS. The phase2 parameter also provides crypto parameters
and data for the local and remote networks. The configuration also provides a route (line 62) that passes the network on the AWS side to the VPN tunnel, and two firewall rules that allow traffic from and to the private network on the AWS side (lines 71 and 85).

A bit further down (Listing 9), the Playbook sets the do_ela parameter to 1 for the new hosts so that this role will also install Elasticsearch later. It uses 0 as the value for masternode, because the new hosts are data nodes. Because it usually takes some time for the VPN connection to be ready for use, the play now waits for the master node of the Elastic cluster until it can reach a new node via SSH.

The last piece of the Playbook finally installs Elasticsearch on the new node and adapts its configuration to match the existing cluster. The role takes the major version of Elasticsearch as a parameter and a path in which the Elasticsearch server can store the data, which allows you to insert a separate mount point on a data-only disk.

Within AWS, all systems are prepared for IPv6, but this does not apply to the configuration used here. Therefore, the first task forces you to switch to IPv4. The second one updates the configuration of the system. In the third task, the Elastic cluster role finally installs and configures the software. Because Ansible only creates the Elasticsearch user, to which the elkdata/ folder should belong, during the installation, the script also has to tweak the permissions and restart Elasticsearch (starting in line 46).
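The "wait until the master node can reach the new node via SSH" step boils down to retrying a TCP connection until a deadline expires. A rough standalone equivalent (my own helper, not the Playbook's wait task) looks like this:

```python
import socket
import time

def wait_for_port(host, port, timeout=300.0, interval=5.0):
    """Retry a TCP connect (e.g., to SSH on port 22) until it succeeds
    or the overall timeout expires. Returns True on success."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # not up yet: sleep, but never past the deadline
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
    return False
```

Ansible's own wait_for module does the same job declaratively; this sketch only makes the mechanics visible.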
Figure 4: The status of the Elastic cluster after expansion into the AWS Cloud.

The Author

Konstantin Agouros is Head of Open Source Projects at matrix technology AG, where he and his team advise customers on open source and cloud topics. His latest book, Software Defined Networking: SDN-Praxis mit Controllern und OpenFlow [Practical Applications with Controllers and OpenFlow] (in German), is published by de Gruyter.
NUTS AND BOLTS: Python Code Analysis
In Profile

Profiling your Python code – as a whole or by function – shows where you should spend time speeding up your programs. By Jeff Layton

To improve the performance of your applications, you need to conduct some kind of dynamic (program, software, code) analysis, also called profiling, to measure metrics of interest. A key metric for developers is time (i.e., where is the code spending most of its time?), because it allows you to focus on areas, or hotspots, that can be made to run faster.

This might seem obvious, but if you don't profile for code optimization, you could flounder all over the code improving sections you think might be bottlenecks. I have seen people spend hours working on a particular part of their code when a simple profile showed that portion of the code contributed very little to the overall run time. I admit that I have also done this; however, once I profiled the code, I found that I had wasted my time and needed to focus elsewhere.

Different kinds of profiling (e.g., event-based, statistical, instrumented, simulation) are used in different situations. In this article, I focus on two types: deterministic and statistical. Deterministic profiling captures every computation of the code and produces very accurate profiles, but it can greatly slow down code performance. Although you achieve very good accuracy with the profile, run times are greatly increased, and you have to wonder whether the profiling didn't adversely affect how the code ran. For example, did the profiling cause the computation bottlenecks to move to a different place in the code?

Statistical profiling, on the other hand, takes periodic "samples" of the code computations and uses them as representations of the profile of the code. This method usually has very little effect on code performance, so you can get a profile that is very close to the real execution of the code. You do have to wonder about the correct time interval to get an accurate profile of the application while not affecting the run time. Usually this means setting the time intervals to smaller and smaller values to capture the profile accurately. If the interval becomes too small, however, it almost becomes deterministic profiling, and run time is greatly increased.

If your code takes a long time to execute (e.g., hours or days), deterministic profiling might be impossible because the increase in run time is unacceptable. In this case, statistical profiling is appropriate because of the longer periods of time available to sample performance.

In this article, I focus on profiling Python code, primarily because of a current lack of Python profiling tools but also because I think the process of profiling Python code, creating functions, and using Numba to then compile these functions for CPUs or GPUs is a good way to help improve performance.

To help illustrate some tools you can use to profile Python code, I will use an example of an idealized molecular dynamics (MD) application. I'll work through some profiling tools and modify the code in a reasonable manner for better profiling. The first, and probably most used and flexible, method I want to mention is "manual" profiling.
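To make the deterministic/statistical distinction concrete, here is a toy statistical profiler of my own (nothing any of the tools below ship): a background thread periodically inspects the main thread's stack and counts which function it finds there, exactly the sampling idea described above.

```python
import collections
import sys
import threading
import time

def sample_main_thread(interval, duration, counts):
    """Sampler thread: every `interval` seconds, look at the main
    thread's current stack and record whether busy_work is on it."""
    main_id = threading.main_thread().ident
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        frame = sys._current_frames().get(main_id)
        names = set()
        while frame is not None:          # walk the whole call stack
            names.add(frame.f_code.co_name)
            frame = frame.f_back
        counts["busy_work" if "busy_work" in names else "other"] += 1
        time.sleep(interval)

def busy_work(seconds):
    # Deliberately burn CPU so the sampler has something to observe
    total = 0
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        total += sum(i * i for i in range(1000))
    return total

counts = collections.Counter()
sampler = threading.Thread(target=sample_main_thread, args=(0.01, 0.3, counts))
sampler.start()
busy_work(0.4)   # keep the main thread busy while samples are taken
sampler.join()
print(dict(counts))  # busy_work should dominate the samples
```

The hot function dominates the sample counts even though the sampler never instruments it, which is why statistical profiling barely perturbs the run time.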
Manual Profiling

The manual profiling approach is fairly simple but involves inserting timing points into your code. Timing points surround a section of code and collect the total elapsed time(s) for the section, as well as how many times the section is executed. From this information, you can calculate an average elapsed time. The timing points can be spread throughout the code, so you get an idea of how much time each section of the code takes. The elapsed times are printed at the end of execution, to give you an idea of where you should focus your efforts to improve performance.

A key advantage of this approach is its generally low overhead. Additionally, you can control which portions of the code are timed (you don't have to profile the entire code). A downside is that you have to instrument your code by inserting timing points throughout. However, inserting these points is not difficult.

An easy way to accomplish this uses the Python time module. Simple code from an article on the Better Programming [1] website (example 16) is shown in Listing 1. The code simply calls the current time before and after a section of code of interest. The difference is elapsed time, or the amount of time needed to execute that section of code.

Listing 1: Time to Execute

import time

start_time = time.time()
# Code to check follows
a, b = 1, 2
c = a + b
# Code to check ends
end_time = time.time()
time_taken = end_time - start_time
print("Time taken in seconds: {0} s".format(time_taken))

If a section of code is called repeatedly, just sum the elapsed times for the section and sum the number of times that section is used; then, you can compute the average time through the code section. If the number of calls is large enough, you can do some quick descriptive statistics and compute the mean, median, variance, min, max, and deviations.

cProfile, as the name hints, is written in C as a Python extension and comes in the standard Python 3 distribution, which keeps the overhead low, so the profiler doesn't affect the amount of time much. cProfile outputs a few stats about the test code:

- ncalls – Number of calls to the portion of code.
- tottime – Total time spent in the given function (excludes time made in calls to subfunctions).
- percall – tottime divided by ncalls.
- cumtime – Cumulative time spent in the specific function, including all subfunctions.
- percall – cumtime divided by ncalls.

cProfile also outputs the file name of the code, in case multiple files are involved, as well as the line number of the function (lineno). Running cProfile is fairly simple:

$ python -m cProfile -s cumtime script.py

The first part of the command tells Python to use the cProfile module. The output from cProfile is sorted (-s) by cumtime (cumulative time). The last option on the command line is the Python code of interest. cProfile also has an option (-o) to send the stats to an output file instead of stdout. Listing 2 shows a sample of the first few lines from cProfile on a variation of the MD code.

Listing 2: cProfile Output

Thu Nov  7 08:09:57 2019
12791143 function calls (12788375 primitive calls) in 156.745 seconds

Another option, pprofile, can perform deterministic and statistical profiling (pure Python). The form of pprofile is:

$ pprofile some_python_executable arg1 ...

After the tool finishes, it prints annotated code of each file involved in the execution. By default, pprofile profiling is deterministic, which, although it slows down the code, produces a very complete profile. You can also use pprofile in a statistical manner, which uses much less time:

$ pprofile --statistic .01 code.py

With the statistic option, you also need to specify the period of time between sampling. In this example, a period of 0.01 seconds was used. Be careful when using the statistic option because, if the sample time is too long, you can miss computations, and the output will incorrectly record zero percent activity. Conversely, to get a better estimation of the time spent in certain portions of the code, you have to reduce the time between samples to the point of almost deterministic profiling.
The deterministic pprofile sample Line-by-Line Function can help. The line_profiler module
output in Listing 3 uses the same performs line-by-line profiling of
Profiling
code as the previous cProfile ex- functions, and the kernprof script al-
ample. I cut out sections of the The useful pprofile analyzes your lows you to run either line_profiler
output because it is very extensive. I entire code line by line. It can or standard Python profilers such as
do want to point out the increase in also do deterministic and statisti- cProfile.
execution time by about a factor of cal profiling. If you want to focus To have kernprof run line_profiler,
10 (i.e., it ran 10 times slower than on a specific function within your enter,
without profiling). code, line_profiler and kernprof
$ kernprof -l script_to_profile.py
Listing 3: pprofile Output
Command line: md_002.py which will produce a binary file,
Total duration: 1662.48s script_to_profile.py.lprof. To “de-
File: md_002.py code” the data, you can enter the
File duration: 1661.74s (99.96%) command:
Line #| Hits| Time| Time per hit| %|Source code
------+----------+-------------+-------------+-------+----------- $ python3 -m line_profiler U
1| 0| 0| 0| 0.00%|# md test code script_to_profile.py.lprof > results.txt
2| 0| 0| 0| 0.00%|
3| 2| 3.50475e-05| 1.75238e-05| 0.00%|import platform
and look at the results.txt file.
4| 1| 2.19345e-05| 2.19345e-05| 0.00%|from time import clock
To get line_profiler to profile only
(call)|         1| 2.67029e-05| 2.67029e-05|  0.00%|# <frozen importlib._bootstrap>:1009 _handle_fromlist
     5|         1| 2.55108e-05| 2.55108e-05|  0.00%|import numpy as np
(call)|         1|    0.745732|    0.745732|  0.04%|# <frozen importlib._bootstrap>:978 _find_and_load
     6|         1| 2.57492e-05| 2.57492e-05|  0.00%|from sys import exit
(call)|         1|  1.7643e-05|  1.7643e-05|  0.00%|# <frozen importlib._bootstrap>:1009 _handle_fromlist
     7|         1| 7.86781e-06| 7.86781e-06|  0.00%|import time
...
   234|         0|           0|           0|  0.00%|    # Compute the potential energy and forces
   235|  12525000|     51.0831| 4.07849e-06|  3.07%|    for j in range(0, p_num):
   236|  12500000|     51.6473| 4.13179e-06|  3.11%|        if (i != j):
   237|         0|           0|           0|  0.00%|            # Compute RIJ, the displacement vector
   238|  49900000|     210.704| 4.22253e-06| 12.67%|            for k in range(0, d_num):
   239|  37425000|     177.055| 4.73093e-06| 10.65%|                rij[k] = pos[k,i] - pos[k,j]
   240|         0|           0|           0|  0.00%|            # end for
   241|         0|           0|           0|  0.00%|
   242|         0|           0|           0|  0.00%|            # Compute D and D2, a distance and a truncated distance
   243|  12475000|     50.5158| 4.04936e-06|  3.04%|            d = 0.0
   244|  49900000|     209.465|  4.1977e-06| 12.60%|            for k in range(0, d_num):
   245|  37425000|     175.823| 4.69801e-06| 10.58%|                d = d + rij[k] ** 2
   246|         0|           0|           0|  0.00%|            # end for
   247|  12475000|     78.9422| 6.32803e-06|  4.75%|            d = np.sqrt(d)
   248|  12475000|     64.7463| 5.19008e-06|  3.89%|            d2 = min(d, np.pi / 2.0)
   249|         0|           0|           0|  0.00%|
   250|         0|           0|           0|  0.00%|            # Attribute half of the total potential energy to particle J
   251|  12475000|     84.7846| 6.79636e-06|  5.10%|            potential = potential + 0.5 * np.sin(d2) * np.sin(d2)
   252|         0|           0|           0|  0.00%|
   253|         0|           0|           0|  0.00%|            # Add particle J's contribution to the force on particle I.
   254|  49900000|      227.88| 4.56674e-06| 13.71%|            for k in range(0, d_num):
   255|  37425000|     244.374| 6.52971e-06| 14.70%|                force[k,i] = force[k,i] - rij[k] * np.sin(2.0 * d2) / d
   256|         0|           0|           0|  0.00%|            # end for
   257|         0|           0|           0|  0.00%|        # end if
   258|         0|           0|           0|  0.00%|    # end for
   259|         0|           0|           0|  0.00%|
...

certain functions, put an @profile decorator before the function declaration. The output is the elapsed time for the routine. The percentage of time, which is something I tend to check first, is relative to the total time for the function (be sure to remember that). The example in Listing 4 is output for some example code discussed in the next section.

Example Code

To better illustrate the process of using a profiler, I chose some MD Python code with a fair amount of arithmetic intensity that could easily be put into functions. Because I'm not a computational chemist, let me quote from the website: "The computation involves following the paths of particles which exert a distance-dependent force on each other. The particles are not constrained by any walls; if particles meet, they simply pass through each other. The problem is treated as a coupled set of differential equations. The system of differential equations is discretized by choosing a discrete time step. Given the position and velocity of each particle at one time step, the algorithm estimates these values at the next time step. To compute the next position of each particle requires the evaluation
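The line-by-line view above comes from a profiler-injected @profile decorator. As a rough standard-library sketch of the same idea — the profile decorator below is homemade, not the one a profiling tool injects — you can accumulate per-function wall-clock time yourself:

```python
import time
from functools import wraps

def profile(func):
    """Homemade stand-in for a profiler's @profile decorator:
    accumulates wall-clock time and call counts per function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.total_time += time.perf_counter() - start
            wrapper.calls += 1
    wrapper.total_time = 0.0
    wrapper.calls = 0
    return wrapper

@profile
def busy(n):
    # Stand-in workload with some arithmetic intensity
    return sum(i * i for i in range(n))

busy(100_000)
busy(100_000)
print(f"busy: {busy.calls} calls, {busy.total_time:.6f}s total")
```

A real line profiler reports per-line percentages as well; this sketch only gives per-function totals, which is often enough for a first pass.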
80 A D M I N 55 W W W. A D M I N - M AGA Z I N E .CO M
Python Code Analysis N U TS A N D B O LTS
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   126                                           @profile
   127                                           def update(d_num, p_num, rmass, dt, pos, vel, acc, force):
   128
   129                                               # Update
   130
   131                                               # Update positions
   132       200        196.0      1.0      0.1      for i in range(0, d_num):
   133     75150      29671.0      0.4      8.1          for j in range(0, p_num):
   134     75000     117663.0      1.6     32.2              pos[i,j] = pos[i,j] + vel[i,j]*dt + 0.5 * acc[i,j]*dt*dt
   135                                                   # end for
   136                                               # end for
   137
   138                                               # Update velocities
   139       200         99.0      0.5      0.0      for i in range(0, d_num):
   140     75150      29909.0      0.4      8.2          for j in range(0, p_num):
   141     75000     100783.0      1.3     27.6              vel[i,j] = vel[i,j] + 0.5*dt*( force[i,j] * rmass + acc[i,j] )
   142                                                   # end for
   143                                               # end for
   144
   145                                               # Update accelerations.
   146       200         95.0      0.5      0.0      for i in range(0, d_num):
   147     75150      29236.0      0.4      8.0          for j in range(0, p_num):
   148     75000      57404.0      0.8     15.7              acc[i,j] = force[i,j]*rmass
   149                                                   # end for
   150                                               # end for
   151
   152        50         32.0      0.6      0.0      return pos, vel, acc

When you download the Python version of the code, it already has several functions. To better illustrate profiling the code, I converted it to simple serial code and called it md_001.py (Listing 5). Then, I profiled the code with cProfile:

$ python3 -m cProfile -s cumtime md_001.py

Listing 6 is the top of the profile output ordered by cumulative time (cumtime). Notice that the profile output only lists the code itself. Because it doesn't profile the code line by line, it's impossible to learn anything about the code. I also used pprofile:

$ pprofile md_001.py

The default options cause the code to run much slower because it is tracking all computations (i.e., it is not sampling), but the code lines relative to the run time still impart some good information (Listing 7). Note that the code ran slower by
Listing 5: md_001.py

## MD is the main program for the molecular dynamics simulation.
#
#  Discussion:
#    MD implements a simple molecular dynamics simulation.
#
#    The velocity Verlet time integration scheme is used.
#    The particles interact with a central pair potential.
#
#  Licensing:
#    This code is distributed under the GNU LGPL license.
#
#  Modified:
#    26 December 2014
#
#  Author:
#    John Burkardt
#
#  Parameters:
#    Input, integer D_NUM, the spatial dimension.
#    A value of 2 or 3 is usual.
#    The default value is 3.
#
#    Input, integer P_NUM, the number of particles.
#    A value of 1000 or 2000 is small but "reasonable".
#    The default value is 500.
#
#    Input, integer STEP_NUM, the number of time steps.
#    A value of 500 is a small but reasonable value.
#    The default value is 500.
#
#    Input, real DT, the time step.
#    A value of 0.1 is large; the system will begin to move quickly but the
#    results will be less accurate.
#    A value of 0.0001 is small, but the results will be more accurate.
#    The default value is 0.1.
#
import platform
from time import clock
import numpy as np
from sys import exit
import time

def timestamp ( ):
  t = time.time ( )
  print ( time.ctime ( t ) )

  return None
# end def
about a factor of 10. Only the parts of the code with some fairly large percentages of time are shown. The output from pprofile provides an indication of where the code uses the most time:

* The loop computing rij[k].
* The loop summing d (collective operation).
* Computing the square root of d.
* Computing d2.
* Computing the potential energy.
* The loop computing the force array.

Another option is to put timing points throughout the code, focusing primarily on the section of the code computing potential energy and forces. This code produced the output shown in Listing 8. Notice that the time to compute the potential and force update values is 181.9 seconds with a total time of 189.5 seconds. Obviously, this is where you would need to focus your efforts to improve code performance.

Listing 6: cProfile Output

Sat Oct 26 09:43:21 2019
         12791090 function calls (12788322 primitive calls) in 163.299 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    148/1    0.001    0.000  163.299  163.299 {built-in method builtins.exec}
        1  159.297  159.297  163.299  163.299 md_001.py:3(<module>)
 12724903    3.918    0.000    3.918    0.000 {built-in method builtins.min}
    175/2    0.001    0.000    0.083    0.042 <frozen importlib._bootstrap>:978(_find_and_load)
    175/2    0.001    0.000    0.083    0.042 <frozen importlib._bootstrap>:948(_find_and_load_unlocked)
    165/2    0.001    0.000    0.083    0.041 <frozen importlib._bootstrap>:663(_load_unlocked)
...

First Function Creation

The potential energy and force computations are the dominant part of the run time, so to better profile them, it is best to isolate that code in a function. Perhaps a bit counterintuitively, I created a function that initializes the algorithm and a second function for the update loops and called the resulting code md_002.py. (My modified code is available online.) Because the potential energy
and force computations change very little, I won't be profiling this version of the code. All I did was make sure I was getting the same answers as in the previous version. However, feel free to practice profiling it.

Final Version

The final version of the code moves the section of code computing the potential energy and forces into a function for better profiling. The code, md_003.py, has a properties function that computes the potential energy and forces. The cProfile results don't show anything useful, so I will skip that output. On the other hand, the pprofile
Listing 9: md_003.py Output Excerpts (continued)

   210|         0|           0|           0|  0.00%|
   211|        50| 0.000211716| 4.23431e-06|  0.00%|  return force, kinetic, potential
   212|         0|           0|           0|  0.00%|
   213|         0|           0|           0|  0.00%|# end def
...

and running on all eight cores of my laptop (four "real" cores and four hyper-threading (HT) cores), it ran in about 3.6 seconds. I would call that a success.
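The timing-point approach mentioned earlier — bracketing suspect sections with wall-clock measurements — can be sketched with the standard library. The two workloads below are stand-ins for the potential/force and update sections of the MD loop:

```python
import time
from collections import defaultdict

timings = defaultdict(float)

def timed(name, func):
    """Timing point: record wall-clock time spent in one code section."""
    start = time.perf_counter()
    result = func()
    timings[name] += time.perf_counter() - start
    return result

# Stand-in workloads for the expensive and cheap sections
timed("potential_and_forces", lambda: sum(i * 0.5 for i in range(200_000)))
timed("update", lambda: sum(range(10_000)))

total = sum(timings.values())
for name, t in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {t:.4f}s ({100 * t / total:5.1f}%)")
```

Unlike pprofile, this adds almost no overhead, at the cost of only measuring the sections you thought to instrument.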
N U TS A N D B O LTS Fibre Channel SAN Bottlenecks
Tune Up
We discuss the possible bottlenecks in Fibre Channel storage area networks and how to resolve them. By Roland Döllinger

In the past, spinning hard disks were often a potential bottleneck for fast data processing, but in the age of hybrid and all-flash storage systems, the bottlenecks are shifting to other locations on the storage area network (SAN). I talk about where it makes sense to influence the data stream and how possible bottlenecks can be detected at an early stage. To this end, I will be determining the critically important performance parameters within a Fibre Channel SAN and showing optimization approaches.

The Fibre Channel (FC) protocol is connectionless and transports data packets in buffer-to-buffer (B2B) mode. Two endpoints, such as a host bus adapter (HBA) and a switch port, negotiate a number of FC frames, which are added to the input buffer as buffer credits at the other end, allowing the sender to transmit a certain number of frames to the receiver on a network without having to wait for each individual data packet to be confirmed (Figure 1).

Figure 1: The Fibre Channel frames are transferred by the SAN from the sender (server) to the receiver (storage array) over the FC switch ports by the connectionless buffer-to-buffer method.

For each data packet sent, the buffer credit is reduced by a value of one, and for each data packet confirmed by the other party, the value increases by one. The remote station sends a receive ready (R_RDY) message to the sender as soon as the frames have been processed and new data can be sent. If the sender does not receive this R_RDY message and all buffer credits are used up, no further data packets are transmitted until the sender receives the message. Actual flow control of the data is handled by the higher level SCSI protocol.

Suppose a server writes data over a Fibre Channel SAN to a remote storage system; the FC frames are forwarded to multiple locations along the way in the B2B process, as is the case whenever an HBA or a storage port communicates with a switch port or two switches exchange data with each other over one or more Inter-Switch Link (ISL) connections connected in parallel. With this FC transport layer method – service class 3 (connectionless without acknowledgement) optimized for mass storage data – many FC devices can communicate in parallel with high bandwidth. However, this type of
communication also has weaknesses, which quickly become apparent in certain constellations.

Backlog by R_RDY Messages

One example of this type of backlog is an HBA or memory port that does not return R_RDY messages to the sender because of a technical defect or driver problem or that only returns R_RDY messages to the sender after a delay. In turn, transmission of new frames is delayed. Incoming data is then stored and consumes the available buffer credits. The backlog then spreads farther back and gradually uses up the buffer credits of the other FC ports on the route.

Especially with shared connections, all SAN subscribers who communicate over the same ISL connection are negatively affected because no buffer credits are available for them during this period. A single slow-drain device can lead to a massive drop in the performance of many SAN devices (fabric congestion). Although most FC switch manufacturers have now developed countermeasures against such fabric congestions, they only take effect when the problem has already occurred and are only available for the newer generations of SAN components.

To detect fabric congestions at an early stage, you at least need to monitor the ISL ports on the SAN for such situations. One indicator of this kind of bottleneck is the increase in the zero buffer credit values at the ISL ports. These values indicate how often units had to wait 2.5μs for the R_RDY message to arrive before further frames could be sent. If this counter grows to a value in the millions within a few minutes, caution is advised. In such critical cases, the counters for "link resets" and "C3 timeouts" at the affected ISL ports usually also grow.

Data Rate Mismatches

A similar effect as in the previous case can occur if a large volume of data is transferred at different speeds between endpoints on the SAN. For example, if the HBA operates at a bandwidth of 8Gbps while the front-end port on the storage system operates at 16Gbps, the storage port can process the data almost twice as fast as the HBA. In return, at full transfer rate, the storage system returns twice the volume of data to the HBA that it could process in the same time.

Buffering the received frames also nibbles away the buffer credits there, which can cause a backlog and a fabric congestion given a continuously high data transfer volume. The situation becomes even more drastic with high data volumes at 4 and 32Gbps. Such effects typically occur at high data rates on the ports of the nodes with the lowest bandwidth in the data stream.

Additionally, too high a fan-in ratio of servers to the storage port is possible (i.e., too high a volume of data from the servers arriving at the storage port, which is no longer able to process the data). My recommendation is therefore to adapt the speed of the HBA and storage port to a uniform speed and, depending on the data transfer rates, maintain a moderate fan-in ratio between servers and the storage port, if possible.

To reduce the data traffic fundamentally over the ISLs, you will want to configure your servers such that the hosts only read locally in the case of cross-location mirroring (e.g., with the Logical Volume Manager) and only access both storage systems when writing. With a high read rate, this approach immensely reduces ISL data traffic and thus the risk of potential bottlenecks.

Overcrowded Queue Slows SAN

The SCSI protocol also has ways to accelerate data flow. The Command Queuing and I/O Queuing methods supported by SCSI-3 achieve a significant increase in performance. For example, a server connected to the SAN can send several SCSI commands in parallel to the logical unit number (LUN) of a storage system. When the commands arrive, they are put into a kind of waiting loop before it is their turn to be processed. Especially for random I/O operations, this arrangement offers significant performance gain.

The number of I/O operations that can be buffered in this queue is known as the queue depth. Important values include the maximum queue depth per LUN and per front-end port of a storage array. These values are usually fixed in the storage system and immutable. On the other hand, you can specify the maximum queue depth on the server side of the HBA or in its driver. Make sure that the sum of the queue depths of all LUNs on a front-end port does not exceed its maximum permitted queue depth. If, for example, 100 LUNs are mapped to an array port and addressed by their servers with a queue depth of 16, the maximum queue depth value at the array port must be greater than 1,600. If, on the other hand, the maximum value of a port is only 1,024, the connected servers can only work with a queue depth of 10 with these LUNs. It makes sense to ask the vendor about the limits and optimum settings for the queue depth.

If a front-end port is overloaded because of incorrect settings and too many parallel I/O operations, and all queues are used up, the storage array sends a Queue_Full or Device_Busy message back to the connected servers, which triggers a complex recovery mechanism that usually affects all servers connected to this front-end port. On the other hand, a balanced queue depth configuration can often tweak that extra share of server and storage performance out of the systems. If the mapped servers or the number of visible LUNs change significantly, you need to update the calculations to prevent gradual overloading.

Watch Out for Multipathing

Standard operating system settings often lead to an imbalance in data
traffic, so you will want to pay attention to Fibre Channel multipathing of servers, wherein only one of several connections is actively used. This imbalance then extends to the SAN and ultimately to the storage array. Potential performance bottlenecks occur far more frequently in such constellations. Modern storage systems today use active-active mode over all available controllers and ports. You will want to leverage these capabilities for the benefit of your environment.

Sometimes the use of vendor-specific multipathing drivers can be expedient. These drivers are typically slightly better suited to the capabilities of the storage array, have more specific options, and are often better suited for monitoring than standard operating system drivers. On the other hand, if you want to keep your servers regularly patched, a certain version maintenance and compatibility check overhead can be a result of such third-party software.

Optimizing Data Streams with QoS

Service providers who simultaneously support many different customers with many performance-hungry applications in their storage environments need to ensure that mission-critical applications are assigned the required storage performance in a stable manner at all times. An advantage for one application can be a disadvantage for another. A consistent quality of service (QoS) strategy allows for better planning of data streams and means that critical servers and applications can be prioritized from a performance perspective.

Vendors of storage systems, SAN components, or HBAs have different technical approaches to this problem, but they are not related. In no place here can the data flow be centrally controlled and regulated across all components. Moreover, most solutions do not make a clear distinction between normal operation and failure mode. For example, if performance problems occur within the SAN, the storage system stoically retains its prioritized settings, because it knows nothing about the problem.

Although initial approaches have been made for communication between HBAs and SAN components to act across the board, they only work with newer models and are only available for a few performance metrics. Special HBAs and their drivers support prioritization at the LUN level on the server. The drawback is that you have to set up each individual server, which can be a mammoth task with hundreds of physical servers – not to mention the effort of large-scale server virtualization.

Various options also exist for prioritizing I/Os for SAN components. Basically, the data stream could be directed through the SAN with the use of virtual fabrics or virtual SANs (e.g., to separate test and production systems or individual customers logically from each other). However, this method is not well suited for a more granular distribution of important applications, because the administrative overhead and technical limitations speak against it. For this purpose, it is possible to route servers through specially prioritized zones in the SAN data flow. In this way, the frames of high-priority zones receive the right of way and are preferred in the event of a bottleneck.

On the storage systems themselves, QoS functionalities have been established for some time and are therefore the most developed. Depending on the manufacturer or model, data throughput can be limited in terms of megabytes or I/O operations per second for individual LUNs, pools, or servers – or, in return, prioritized at the same level. Such functions require permanent performance monitoring, which is usually available under a free license with modern storage systems. Depending on the setting options, less prioritized data is then permanently throttled or only sent to the back of the queue if a bottleneck situation is looming on the horizon.

However, be aware that applications in a dynamic IT landscape lose priority during their life cycle and that you will have to adjust the settings associated with them time and time again. Whether you're prioritizing storage, SAN, or servers, you should always choose only one of these three levels at which you control the data stream; otherwise, you could easily lose track in the event of a performance problem.
Determining Critical Performance KPIs

The basis for the efficient provision of SAN capacities is good, permanent monitoring of all important SAN performance indicators. You should know your key performance indicators (KPIs) and document these values over a long period of time. Whether you work with vendor performance tools or with higher level central monitoring software that queries the available interfaces (e.g., SNMP, SMI-S, or REST API), defining KPIs for servers, SAN, and storage is decisive. On the server side, the response times or I/O wait times of the LUNs or disks are certainly an important factor, but the data throughput (MBps) for the connected HBAs also can be helpful.

Within the SAN you need to pay special attention to all ISL connections, because often a bottleneck in data throughput occurs, or, as described, buffer credits are missing. Alerts are also conceivable for all SAN ports when 80 or 90 percent of the maximum data throughput rate is reached, which you can use to monitor all HBAs and storage ports for this metric. However, you should be a little more conservative with the monitoring parameters and feel your way forward slowly. Experience has shown that approaching bottlenecks are often overlooked if too many alerts are regularly received and have to be checked manually.

Optimizing Array Performance

For a storage array, the load on the front-end processors, the cache write pending rate, and the response times of all LUNs presented to the servers are important values you will want to monitor. In the case of the LUN response times, however, you need to differentiate between random and sequential access, because the block sizes of the two access types differ considerably. For example, sequential processing within a storage array often takes far longer because of the larger block size than random processing, and this difference is reflected in response time.

Many of the values in the storage array differ depending on the system architecture and cannot be set across the board; you will need to contact the vendor to find out at which utilization level a component's performance is likely to be impaired and inquire about further critical measuring points, as well. Various vendor tools offer preset limits based on best practices, which can also be adapted to your own requirements.

Additionally, when planning the growth of your environment, make sure that if a central component (e.g., an HBA on the server, a SAN switch, or a cache or processor board) fails, the storage array can continue to work without problems and does not lead to a massive impairment of operations or even to outages of individual applications.

Equipped for Emergencies

Even if you are familiar with the SAN infrastructure and have set up appropriate monitoring at key points (Table 1), performance bottlenecks cannot be completely ruled out. A component failure, a driver problem, or a faulty Fibre Channel cable can cause sudden problems. If such an incident occurs and important applications are affected, it is important to gain a quick overview of the essential performance parameters of the infrastructure. Therefore, it is very helpful if you have the relevant values from unrestricted normal operation as a baseline to compare with the current values of the problem situation.

This comparison would reveal, for example, whether performance-hungry servers or applications are suddenly generating 30 percent more I/O operations after software updates and affecting other servers in the same environment as noisy neighbors, or whether I/O operations can no longer be processed by individual connections because of defective components or cables. However, you need to gain experience in the handling and interpretation of the performance indicators from these tools to be sufficiently prepared for genuine problems. Storage is often mistakenly suspected of being the endpoint of performance problems.

If you can make a well-founded and verifiable statement about the load situation of your SAN environment within a few minutes and precisely put your finger on the overload situation and its causes – or provide contrary evidence, backed up with well-founded figures that help to discover where the problem is arising – you will leave observers with a positive impression.

Conclusions

Given compliance with a few important rules and monitoring in the right places, even large Fibre Channel storage networks can be operated with great performance and stability. If you give priority to the most important applications at a suitable point, you can keep them available even in the event of a problem. If you are also trained in the use of performance tools and have the values from normal operation as a reference, the causes of performance problems can often be identified very quickly.
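The queue depth arithmetic from the "Overcrowded Queue Slows SAN" section is worth automating, because it must be rechecked whenever the LUN count changes. A small sketch using the article's example numbers (100 LUNs, per-LUN depth 16, a 1,024-entry port limit):

```python
def required_port_queue_depth(lun_count, per_lun_depth):
    """Sum of server-side queue depths an array port must accommodate."""
    return lun_count * per_lun_depth

def max_per_lun_depth(port_limit, lun_count):
    """Largest uniform per-LUN queue depth that stays within the port limit."""
    return port_limit // lun_count

print(required_port_queue_depth(100, 16))  # 1600: the port limit must exceed this
print(max_per_lun_depth(1024, 100))        # 10: what a 1,024-entry limit allows
```

Rerunning the calculation after mapping changes is exactly the bookkeeping the article recommends to prevent gradual overloading.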
N U TS A N D B O LTS initramfs and dracut
A New Beginning
If your Linux system is failing to boot, the dracut tool can be a convenient way to build a new ramdisk.
By Thorsten Scherf
After moving your hard disk to a new system, the Linux system suddenly fails to boot. Often this happens because of missing drivers in the ramdisk, which the kernel needs to boot the system. In this article, I take a closer look at the handling of the initramfs file and introduce dracut [1] as a practical helper.

Many users only see the initramfs (initial random access memory filesystem) archive file as yet another file in the boot directory. It is automatically created when a new kernel is installed and deleted again when the kernel is removed from the system. But this initial ramdisk plays an important role, since it ensures that the root filesystem can be accessed after the computer has been restarted, to be able to access all the tools that are necessary for the computer to continue booting.

The GRUB2 bootloader, used in most cases today, is responsible for loading the Linux kernel (vmlinuz) and a ramdisk (initramfs) into memory at boot time. The kernel then mounts the ramdisk on the system as a root volume and then starts the actual init process. On current Linux systems, this is typically systemd. The init process can then use the drivers and programs provided by initramfs to gain access to the root volume itself. The root volume is usually available on a local block device but can also be mounted over the network, if required. For this to work, all the required drivers must, of course, be available in the initramfs.

These can be drivers for LVM, RAID, the filesystem, the network, or a variety of other components. The details of this depend on the individual configuration of the system. For example, if the root filesystem is located on an encrypted partition, the tools for accessing it must be available within the ramdisk.

When installing a new kernel, the ramdisk is automatically created and installed based on the system properties. On RPM-based distributions, for example, the new-kernel-pkg tool is used; it is called automatically as part of the kernel installation. By default, the ramdisk resides alongside the kernel in the /boot directory, and a new entry for the bootloader is created so that, after a reboot, the new kernel loads with the appropriate initramfs.

You can view the contents of the disk with the cpio tool. The associated file is simply a cpio archive, but lsinitrd gives you a more elegant and convenient approach:

lsinitrd /boot/initramfs-$(uname -r).img | less

If you are only interested in the kernel drivers provided by this ramdisk, you can restrict the output:

lsinitrd /boot/initramfs-$(uname -r).img | grep -o '/kernel/drivers/.*xz'

The command in Listing 1 tells the tool to display only the available network card drivers.

Support from dracut

In some cases, you may now need to create a new ramdisk manually. For example, if you want it to support new hardware or allow access to a newly encrypted volume, you have no alternative but to create a new initramfs for the current kernel. The easiest way to do this
is to use the dracut tool, which is Listing 1: Available Network Card Drivers
a framework that provides specific
lsinitrd /boot/initramfs-$(uname -r).img | grep -o '/kernel/drivers/net/.*xz'
functions within an initial ramdisk
/kernel/drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko.xz
based on modules. On a Fedora
/kernel/drivers/net/ethernet/broadcom/cnic.ko.xz
system these modules are located /kernel/drivers/net/mdio.ko.xz
in the /usr/lib/dracut/modules.d/
directory. For Linux veterans, dra-
cut also offers a wrapper named Listing 2: File Size Comparison
mkinitrd, but it is far less flexible
ls -ls /boot/initramfs-$(uname -r)*.img
than calling dracut directly. To cre- 24350 -rw-------. 1 root root 24932655 Apr 16 16:01 /boot/initramfs-4.20.10-200.fc29.x86_64.img
ate a new initramfs archive, just 69242 -rw-------. 1 root root 70901695 Apr 16 16:04 /boot/initramfs-4.20.10-200.fc29.x86_64-new.img
run the following command in the
simplest case:
Listing 3: Specify Kernel Version
dracut --force U
dracut --kver 3.10.0-957.el7.x86_64 /boot/initramfs-$(uname -r)-other-kernel.img
/boot/initramfs-$(uname -r).img ls -l /boot/initramfs-3.10.0-957.el7.x86_64.img
-rw-------. 1 root root 22913501 Apr 14 11:00 /boot/initramfs-3.10.0-957.el7.x86_64.img
The tool uses host-only mode by
default and overwrites the existing
initramfs file if the --force option is Clevis module are now part of the In the bootloader configuration you
set. In this mode, dracut only uses the initramfs: need to remove the rhgb and quiet
modules and drivers needed for the entries, if present, to ensure that
operation of the local system. If you lsinitrd /boot/U messages are displayed on the screen
plan to use the hard disk in a new system in the future, disable host-only mode as follows:

dracut --no-hostonly /boot/initramfs-$(uname -r)-new.img

The fact that dracut now writes far more data into the initramfs file is easily seen by comparing the sizes of the two files (Listing 2).

The following command shows which modules – and thus functions – dracut provides:

dracut --list-modules

If you want to use a new ramdisk on a system on which the Clevis encryption framework is required to enable access to the root partition, the matching dracut module needs to be included in the initramfs file. The output from dracut --list-modules should first confirm that dracut is familiar with the Clevis module. If this is the case, include the module in the initramfs archive as follows:

dracut --add clevis /boot/initramfs-$(uname -r)-clevis.img

The following call should confirm that the files belonging to the Clevis module are in the archive:

lsinitrd /boot/initramfs-$(uname -r)-clevis.img | grep clevis

To include a specific kernel driver in the initramfs, you can use the command:

dracut --add-drivers bnx2x /boot/initramfs-$(uname -r)-bnx2x.img

Here, too, the call to lsinitrd should confirm that the drivers are in place in the archive. Which drivers or modules are required, of course, depends on the system on which the initramfs is to be used.

By default, dracut always creates an initramfs archive for the kernel currently in use. In some cases, it may be necessary to create the archive file for a different kernel version. This is easily done if the desired kernel version is specified with the --kver option when calling dracut (Listing 3).

Troubleshooting the Shell

If the system does not boot as usual and access to the root volume is not possible, dracut provides a shell for troubleshooting, if required. It is a good idea to make the following changes to the bootloader configuration to facilitate troubleshooting when booting. Additionally, add the rd.shell and rd.debug entries to the kernel line of the bootloader so that dracut starts a corresponding shell in case of an error and outputs further debug messages. The dracut tool also writes the messages to the /run/initramfs/rdsosreport.txt file. Both changes can be made either statically in the bootloader configuration file or by dynamically editing the boot menu entry.

Conclusions

Thanks to dracut, all the major Linux distributions provide a framework for creating an initial ramdisk. The framework is very flexible, supports booting a system from many different sources, and enables block device abstractions like RAID, LVM device mapper, FCoE, iSCSI, NBD, and NFS. Thanks to its modular structure, the tool can be easily combined with other frameworks to enable, say, automatic decryption of LUKS volumes through Clevis integration.

Info
[1] dracut: https://dracut.wiki.kernel.org/index.php/Main_Page
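The static variant of the rd.shell and rd.debug change described above can be sketched for a GRUB 2 system; the file location, variable name, and regeneration command below are assumptions that vary by distribution:

```shell
# /etc/default/grub (GRUB 2; path varies by distribution)
# rd.shell - drop to an emergency shell if booting fails
# rd.debug - verbose dracut output, also written to
#            /run/initramfs/rdsosreport.txt
GRUB_CMDLINE_LINUX="rhgb quiet rd.shell rd.debug"
```

After editing, regenerate the bootloader configuration, for example with grub2-mkconfig -o /boot/grub2/grub.cfg on Red Hat-family systems.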
www.admin-magazine.com  ADMIN 55  93
Nuts and Bolts: Performance Tuning Dojo

High Definition

We take a look at three benchmarking tool favorites: time, hyperfine, and bench. By Federico Lucifredi

At the Dragon Propulsion Laboratory, we are partial to using the simplest tool that will do the job at hand – particularly when dealing with the inherent complexity that performance measurement (and tuning) brings to the table. Yet that same complexity often requires advanced tooling to resolve the riddles posed by performance questions. I will examine my current benchmarking tool favorites from the simplest to the more sophisticated.

Tempus Fugit

The benchmark archetype is time: simple, easy to use, and well understood by most users. In its purest form, time takes a command as a parameter and times its execution in the real world (real), as well as how much CPU time was allocated in user and kernel (sys) modes:

$ time sleep 1

real  0m1.004s
user  0m0.002s
sys   0m0.001s

What not everyone knows is that the default time command is actually one of the bash-builtins [1].
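A minimal check under bash confirms this; TIMEFORMAT, which controls the builtin's output, is a bash feature rather than part of any external time binary:

```shell
#!/bin/bash
# 'time' is a shell keyword in bash, not an external command,
# which is what lets it time pipelines and shell builtins:
type -t time    # prints: keyword

# TIMEFORMAT controls the builtin's report; %R is elapsed (real)
# time in seconds:
TIMEFORMAT='real %Rs'
time sleep 1
```

Calling the command with a full path (e.g., /usr/bin/time) bypasses the keyword and runs the external binary instead.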
The sixth Dojo was dedicated to GNU time's amazing capabilities, and I invite you to read up in your prized archive of ADMIN back issues [3].

GNU time format specifiers (excerpt):

c    Number of times the process was context-switched involuntarily (time slice expired)
e    Wall clock time used by the process (seconds)
k    Number of signals delivered to the process
p    Average unshared stack size of the process
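These specifiers are used, with a leading %, in the -f format string of the external GNU time binary; the sketch below assumes GNU time is installed at /usr/bin/time (it is a separate package on many distributions):

```shell
#!/bin/bash
# Ask GNU time for elapsed seconds (%e), involuntary context
# switches (%c), and delivered signals (%k) for a short sleep.
# GNU time prints its report on stderr.
/usr/bin/time -f "elapsed=%e ctx-switches=%c signals=%k" sleep 1
```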
Lead Image © Lucy Baldwin, 123RF.com
The Author

Federico Lucifredi (@0xf2) is the Product Management Director for Ceph Storage at Red Hat and was formerly the Ubuntu Server Project Manager at Canonical and the Linux "Systems Management Czar" at SUSE. He enjoys arcane hardware issues and shell-scripting mysteries and takes his McFlurry shaken, not stirred. You can read more from him in the new O'Reilly title AWS System Administration.

Figure 4: Bench uses a pure HTML canvas to visualize results interactively.
Service: Contact Info / Authors

WRITE FOR US

Admin: Network and Security is looking for good, practical articles on system administration topics. We love to hear from IT professionals who have discovered innovative tools or techniques for solving real-world problems. Tell us about your favorite:

• interoperability solutions
• practical tools for cloud environments
• security problems and how you solved them
• ingenious custom scripts
• unheralded open source utilities
• Windows networking techniques that aren't explained (or aren't explained well) in the standard documentation

We need concrete, fully developed solutions: installation steps, configuration files, examples – we are looking for a complete discussion, not just a "hot tip" that leaves the details to the reader. If you have an idea for an article, send a 1-2 paragraph proposal describing your topic to: edit@admin-magazine.com.

Contact Info

Editor in Chief: Joe Casad, jcasad@linuxnewmedia.com
Managing Editors: Rita L Sooby, rsooby@linuxnewmedia.com; Lori White, lwhite@linuxnewmedia.com
Senior Editor: Ken Hess
Localization & Translation: Ian Travis
News Editor: Jack Wallen
Copy Editors: Amy Pettle, Megan Phelps, Amber Ankerholz
Layout: Dena Friesen, Lori White
Cover Design: Dena Friesen, illustration based on graphics by liu zishan, 123RF.com
Advertising: Brian Osborn, bosborn@linuxnewmedia.com, phone +49 89 3090 5128
Publisher: Brian Osborn
Marketing Communications: Gwen Clark, gclark@linuxnewmedia.com

Linux New Media USA, LLC, 2721 W 6th St, Ste D, Lawrence, KS 66049 USA

Customer Service / Subscription
For USA and Canada: Email: cs@linuxnewmedia.com, Phone: 1-866-247-2802 (toll free from the US and Canada)
For all other countries: Email: subs@linuxnewmedia.com
www.admin-magazine.com

Authors
Konstantin Agouros 70
Chris Binnie 46, 64
Samuel Bocetta 22
Roland Döllinger 88
Thomas Drilling 42
Mathias Hein 16
Ken Hess 3
Petros Koutoupis 10
Jeff Layton 78
Martin Loschwitz 30, 36, 60
Federico Lucifredi 94
Thorsten Scherf 54, 92
Christian Schulenburg 26
Jack Wallen 8
Matthias Wübbeling 52

While every care has been taken in the content of the magazine, the publishers cannot be held responsible for the accuracy of the information contained within it or any consequences arising from the use of it. The use of the DVD provided with the magazine or any material provided on it is at your own risk.

Copyright and Trademarks © 2020 Linux New Media USA, LLC. No material may be reproduced in any form whatsoever in whole or in part without the written permission of the publishers. It is assumed that all correspondence sent, for example, letters, email, faxes, photographs, articles, drawings, are supplied for publication or license to third parties on a non-exclusive worldwide basis by Linux New Media unless otherwise stated in writing. All brand or product names are trademarks of their respective owners. Contact us if we haven't credited your copyright; we will always correct any oversight.

Printed in Nuremberg, Germany by hofmann infocom GmbH on recycled paper from 100% post-consumer waste; no chlorine bleach is used in the production process.

Distributed by Seymour Distribution Ltd, United Kingdom.

ADMIN (ISSN 2045-0702) is published bimonthly by Linux New Media USA, LLC, 2721 W 6th St, Ste D, Lawrence, KS 66049, USA. January/February 2020. Periodicals Postage paid at Lawrence, KS. Ride-Along Enclosed. POSTMASTER: Please send address changes to ADMIN, 2721 W 6th St, Ste D, Lawrence, KS 66049, USA.

Published in Europe by: Sparkhaus Media GmbH, Zieblandstr. 1, 80799 Munich, Germany.