
Open Source For You
Volume: 05 | Issue: 12 | Pages: 112 | September 2017 | ₹120

Demystifying Serverless Computing
A Glimpse of Microservices with Kubernetes and Docker
Simplifies Web App Testing
How to Secure and Test Web Applications
A Few Tips for Scaling Up Web Performance
Getting Started with PHP, the Popular Programming Language
Interview: Karanbir Singh, Project Leader, CentOS
Case Study: Open Source Enables PushEngage to Serve 20 Million Push Notifications Each Day!


Contents

29  A Primer on Software Defined Networking (SDN) and the OpenFlow Standard
35  Taming the Cloud: Provisioning with Terraform
40  Visualising the Response Time of a Web Server Using Wireshark
42  DevOps Series: Creating a Virtual Machine for Erlang/OTP Using Ansible
47  An Introduction to govcsim (a vCenter Server Simulator)
57  A Glimpse of Microservices with Kubernetes and Docker
59  Selenium: A Cost-Effective Test Automation Tool for Web Applications
65  Splinter: An Easy Way to Test Web Applications
73  Crawling the Web with Scrapy
77  Five Friendly Open Source Tools for Testing Web
81  Developing Research Based Web Applications Using Red Hat OpenShift
85  A Few Tips for Scaling Up Web Performance
90  Regular Expressions in Programming Languages: The Story of C++

Serverless Architectures: Demystifying Serverless Computing
Using the Spring Boot Admin UI for Spring Boot Applications
Eight Top-of-the-Line Open Source Game Development

For U & Me
88  Open Source Enables PushEngage to Serve 20 Million Push Notifications Each Day!

06  FOSSBytes
18  New Products
108 Tips & Tricks

4 | SEPTEMBER 2017 | OPEN SOURCE FOR YOU | www.OpenSourceForU.com

Getting Started with PHP, the Popular Programming Language

OpenGurus
105 Communication Protocols for the Internet of Things: A Few Choices

Columns
16  CodeSport
20  Exploring Software: Importing GNUCash Accounts in GNUKhata

Interview: "CentOS Linux is built on a lot of past experience" (Karanbir Singh, Project Leader, CentOS)

Rahul Chopra
Editorial, Subscriptions & Advertising
Delhi (HQ): D-87/1, Okhla Industrial Area, Phase I, New Delhi 110020; Ph: (011) 26810602, 26810603; Fax: 26817563; E-mail: info@efy.in
Missing issues: support@efy.in
Back issues: Kits 'n' Spares, New Delhi 110020; Ph: (011) 26371661, 26371662; E-mail: info@kitsnspares.com
Newsstand distribution: Ph: 011-40596600; E-mail: efycirc@efy.in
Mumbai: Ph: (022) 24950047, 24928520; E-mail: efymum@efy.in
Bengaluru: Ph: (080) 25260394, 25260023; E-mail: efyblr@efy.in
Pune: Ph: 08800295610/09870682995; E-mail: efypune@efy.in
Ahmedabad: Ph: (079) 61344948; E-mail: efyahd@efy.in
China: Power Pioneer Group Inc., Ph: (86 755) 83729797, (86) 13923802595; E-mail: powerpioneer@efy.in
Japan: Tandem Inc., Ph: 81-3-3541-4166; E-mail: tandem@efy.in
Singapore: Publicitas Singapore Pte Ltd, Ph: +65-6836 2272; E-mail: publicitas@efy.in
Taiwan: J.K. Media, Ph: 886-2-87726780 ext. 10; E-mail: jkmedia@efy.in
United States: E & Tech Media, Ph: +1 860 536 6677; E-mail: veroniquelamarque@gmail.com

Printed, published and owned by Ramesh Chopra. Printed at Tara Art Printers Pvt Ltd, A-46,47, Sec-5, Noida, on 28th of the previous month, and published from D-87/1, Okhla Industrial Area, Phase I, New Delhi 110020. Copyright © 2017. All articles in this issue, except for interviews, verbatim quotes, or unless otherwise explicitly mentioned, will be released under the Creative Commons Attribution-NonCommercial 3.0 Unported licence a month after the date of publication. Refer to http://creativecommons.org/licenses/by-nc/3.0/ for a copy of the licence. Although every effort is made to ensure accuracy, no responsibility whatsoever is taken for any loss due to publishing errors. Articles that cannot be used are returned to the authors if accompanied by a self-addressed and sufficiently stamped envelope. But no responsibility is taken for any loss or delay in returning the material. Disputes, if any, will be settled in a New Delhi court only.

DVD of the Month (September 2017): Test and secure your applications.
• BackBox Linux 5 Live (64-bit): This distro is designed to be fast, easy to use and provide a minimal yet complete desktop environment. It is a penetration testing and security assessment oriented Linux distribution, which offers a network and systems analysis toolkit.
• Mageia 6 GNOME Live (64-bit): Mageia is a GNU/Linux-based operating system. It is a community project, supported by a non-profit organisation comprising elected contributors.
Note: In case the DVD does not work properly, write to us at support@efy.in for a free replacement.

Subscription rates:
Year   Newsstand Price (₹)   You Pay (₹)   Overseas
Five   7200                  4320          —
Three  4320                  3030          —
One    1440                  1150          US$ 120
Kindly add ₹50/- for outside Delhi cheques. Please send payments only in favour of EFY Enterprises Pvt Ltd.
Non-receipt of copies may be reported to support@efy.in; do mention your subscription number.


FOSSBytes
Compiled by Jagmeet Singh

Google launches 'Made in India' programme to showcase local developers

Google has launched its own 'Made in India' initiative. The new development is designed to promote Indian developers, giving them a chance to feature their work on the Play Store in a special section.
"At Google Play, we are committed to helping Indian developers of all levels seize this opportunity and build successful, locally relevant businesses," said Purnima Kochikar, director of business development for games and applications, Google Play.
Google highlights that more than 70 per cent of Internet users in India enter the Web primarily using smartphones. This growth in smartphone usage has prompted the company to encourage domestic developers to build more apps and games. More content could pave the way for the Android maker to strengthen its presence in India and around the globe.
Revealing some numbers, Google underlines that Indian users on Android install more than a billion apps every month from Google Play, and this number is growing by 150 per cent each year. The 'Made in India' initiative was launched as part of the App Excellence Summit that was recently hosted in Bengaluru. Google also showcased success stories from developers including Dailyhunt, Healthifyme, RailYatri and UrbanClap, which are all natively building apps and services for the Android platform. Skill-building consultation sessions and demonstration booths were available at the venue for developers.
Indian developers who want to participate in the 'Made in India' programme need to fill in a self-nomination form. The apps need to be based on Google's 'Build for Billions' guidelines, which were launched last year.

Angular 5 is out, with a focus on progressive Web apps

Google has released the next major version of its JavaScript framework, Angular. This latest version, Angular 5, is the second major update in 2017.
While the initial release is a beta build of Angular 5, the search giant is clearly aiming to introduce major support for Google-driven progressive Web apps with the latest development. The new version includes a build optimiser that helps to reduce the code of progressive apps designed through the framework.
Google is working hard at simplifying the effort that goes into building progressive Web apps. The purpose of this new innovation is to improve the experience for users accessing services through their mobile devices.
In addition to its progressive Web app focus, Google is integrating Material Design components into Angular 5. The design components in Angular 5 are now compatible with server-side rendering.
Google is not the sole enabler to have enhanced browser-based apps. Mozilla is also set to offer a native-like experience on its Firefox browser by bringing progressive Web apps to the front. The team behind PWAs (progressive Web apps) is working towards making these the technology that everyone can use.
The release cycle of Angular has been quite aggressive. Google plans to release the next major version, slated as Angular 6, sometime in March or April next year. Meanwhile, the theme for Angular 5 is 'Easier, smaller, faster'.

PiCluster v2.0 brings better container management for Docker deployments

Linux Toys has announced PiCluster 2.0. The new version of the open source container management tool is written in Node.js and is designed to deliver an upgraded experience through cleaner CSS and jQuery dialogue windows.
PiCluster 2.0 brings automatic container failover to different hosts. It fixes reported errors in the npm build dependency, and utilises enhancements on the CSS front to deliver a fresh look to the tool's Web console. Additionally, users can deploy container management without Internet access by using the Web server to deliver the required libraries.
On booting up PiCluster 2.0, you'll be welcomed with a new screen. The open source community has also contributed a lot of features to the latest PiCluster
version. One of the initial contributors worked on the fix for npm dependency errors and pm2 support. Another notable contribution improved the Web console by adding personalisation options.
Previous versions of PiCluster used to display a server icon on specific operations. However, the new build shows the operating system's or distribution's logo for each server. There is also an automatic container failover feature that migrates a container to another host after three failed attempts.
Many developers have started contributing to the PiCluster project. You can access the PiCluster 2.0 code through its GitHub repository. It also includes a detailed readme to help you deploy the tool effectively.

ActiveRuby debuts with over 40 gems and frameworks

ActiveState, the open source languages company, has graduated its Ruby release to the first beta version. The commercially supported Ruby distribution is supposedly far better than other available options.
Ruby is actively used by a diverse set of developers around the world. The language is preferred for its complete, simple, extensible and portable nature.
ActiveRuby is based on Ruby v2.3.4 and includes over 40 popular gems and frameworks, including Rails and Sinatra. There is also seamless installation and management of Ruby on Windows to reduce configuration time as well as increase developer and IT productivity.
Enterprise developers can adopt the latest Ruby distribution release internally to host Web applications. The Canadian company claims that ActiveRuby is far more secure and scalable for enterprise needs. The beta release of the language has fixed some issues of gem management to enhance security.
The new ActiveRuby version also includes non-GPL licensed gems. All major libraries for database connectors, such as MongoDB, Cassandra, Redis, PostgreSQL and MySQL, are also included. Additionally, the ActiveRuby beta introduces cloud deployment capabilities with Amazon Web Services (AWS), along with all the necessary integration features for AWS.
"For enterprises looking to accelerate innovation without compromising on security, ActiveRuby gives developers the much-needed commercial-grade distribution," said Jeff Rouse, director of product management, ActiveState, in a statement.
ActiveRuby is currently available only for Windows. The release for Mac and Linux is supposed to roll out later in 2017. You can download the beta through the official ActiveState website.

GNOME's disk utility to get large file support in v3.26

GNOME's disk utility will receive an update to version 3.26 in September, and it is expected to gain features such as disk resize and repair functions. The new version will also get large file support to handle giant files. The new disk utility will be launched as part of the GNOME 3.26 release.
Kai Lüke, the developer of GNOME Disk Utility, has published a blog post that highlights the new features in the upcoming release. The latest version is touted to offer a file system resize function. Generally, it is not possible to estimate the exact space occupied by a specific file system, so the new disk utility package will resize file systems that are in partitions. Future releases will also receive improved support for resizing both NTFS and FAT file systems.
The updated GNOME disk utility will also have the ability to update the window on power state changes. Additionally, the new version will prompt users when it stops any running jobs while closing an app. It will debut with better support for probing and unmounting of volumes.
GNOME developers will enable an app menu entry in the new disk utility. This will help you create an empty disk image. Likewise, you will get the option to check the displayed UUIDs for selected volumes. GNOME 3.26 is scheduled to go live on September 13. You can download Disk 3.25.4, which has been released for testing. Its source tarball is available for download, and you can use it with your GNU/Linux distribution.



Arduino founder plans 'sustainable' growth

Massimo Banzi, the developer of the Arduino board, has agreed to acquire 100 per cent ownership of Arduino AG, the company that owns all Arduino trademarks. The latest development is supposed to help the company generate sustainable growth through its open source hardware and software developments.
"This is the beginning of a new era for Arduino in which we will strengthen and renew our commitment to open source hardware and software, while in parallel setting the company on a sound financial course of sustainable growth," said Banzi, in an official statement.
As a result of the acquisition, Banzi, 49, has become the new chairman and CTO of Arduino. The CEO, Federico Musto, has been replaced by Dr Fabio Violante. "In the past two years, we have worked very hard to get to this point. We envision a future in which Arduino will apply its winning recipe to democratise the Internet of Things for individuals, educators, professionals and businesses," said Dr Violante.
Developed as an open source project back in 2003, Arduino is aimed at providing affordable solutions for individuals to build new devices. The boards under the Arduino range are available as open hardware and are compatible with a range of sensors and actuators. Last month, the Banzi-led company even partnered with the LoRa Alliance to start building hardware with the LoRaWAN standard.
In May this year, the Arduino Foundation began to build an open source ecosystem for sectors like education, IoT markets, makers and receivers. "Our vision remains to continue enabling anybody to innovate with electronics for a long time to come," said Banzi.

PayPal launches Technology Innovation Labs in India

PayPal has launched two of its Technology Innovation Labs in India to support developments specific to new age technology. Located at PayPal's Chennai and Bengaluru centres, the labs are the first in India, opened after the Palo Alto firm launched its US and Singapore labs.
"India is a hotbed for technology innovation given its evolving startup ecosystem, diverse merchant profiles and enormous talent pool," said Mike Todasco, director of innovation, PayPal. "To cater to their needs in the most effective manner, we are delighted to announce the launch of our newest Technology Innovation Labs in India, where the focus will be on fuelling new age technology and giving rise to unconventional ideas with the potential to transform the ecosystem we operate in," he added.
PayPal's Technology Innovation Labs will support diverse fields including machine learning, artificial intelligence, data science, Internet of Things, penetration testing, software-defined radios and wireless communication, virtual and augmented reality, computer vision and basic robotics. The company will provide equipment like Raspberry Pi boards with sensor kits, AlphaBot kits, Amazon Echo, LeapMotion and 3D printers, among others.
"Enabling innovation and creating amazing experiences for our customers is at the heart of PayPal's global success, and the Innovation Lab is another step to foster this spirit in our development centres in India," said Guru Bhat, general manager, technology and head of engineering, PayPal.
In addition to providing relevant hardware to kickstart the innovative developments, PayPal is set to integrate its native incubation centre. Called the PayPal Incubator, the centre was launched back in 2013 with an aim to support India-origin startups.

Google starts discriminating against poor quality Android apps

Google is all set to improve the user experience on Android by enhancing its search and discovery algorithms on the Play Store. This will have a direct impact on apps that have quality issues.
A new Android vitals dashboard in the Google Play Console was revealed at I/O 2017 earlier this year. The technology is designed to understand and analyse inferior app behaviour, such as excessive battery consumption, slow render times and crashes, and hence set benchmarks for what passes as a quality app.
"Developers focusing on performance can use the Play Console to help find and fix a number of quality issues," wrote Andrew Ahn, product manager of Google Play, in a blog post.
Google reports that the change in its algorithms has resulted in more users downloading quality apps. The Android maker also recommends that developers examine the ratings and reviews they receive on their apps to get additional insights about their quality.
If you are about to launch your app and test its functionality in the alpha or beta stage, you can use the pre-launch report to fix issues ahead of mass downloads. Likewise, Android vitals can be applied to identify performance issues reported by opt-in devices.



Developers ask Adobe to open source Flash Player

Many developers have not welcomed Adobe's decision to end support for the Flash Player plugin in 2020. Thus, a petition seeking the open source availability of Flash Player has been released on GitHub.
While Adobe may have plenty of reasons to kill Flash, there are a bunch of developers who want to save it. GitHub user Juha Lindstedt, the developer who has filed the petition, believes that Flash is an important part of the Internet's history. Killing support for Flash means that future generations would not be able to access old games, websites and experiments, Lindstedt has said.
"Open sourcing Flash specs would be a good solution to keep Flash projects alive safely for archival reasons," the developer wrote in the petition.
Interestingly, over 3,400 people have so far signed the petition, which compares preserving Adobe Flash content with saving and restoring old manuscripts. The developers who've signed the petition also want the interactive artwork created with Flash to be saved.
The petition clearly states that it is not requesting Adobe to release the licensed components. Instead, the petitioners are ready to volunteer either to bypass the licensed components or to replace them with open source alternatives.

World's first software-defined data centre gets launched in India

Pi Datacenters, India's native enterprise-class data centre and cloud services provider, has launched Asia's largest Tier IV-certified data centre in Amaravati, Vijayawada. The company claims the new offering, called Pi Amaravati, is the world's first software-defined data centre.
"Pi Amaravati is a major milestone for the entire team," said Kalyan Muppaneni, founder and CEO, Pi Datacenters. The new data centre uses the OpenStack virtualisation framework to deliver an advanced computing, storage and networking experience. It is capable of offering modular colocation and hosting services with a capacity of up to 5,000 racks. Also, the company's enterprise cloud platform Habour1 is powered by open source provider SUSE.
Vijayawada-based Pi Datacenters has recently been awarded the Uptime Institute Tier IV design certification, known as the highest standard for infrastructure, functionality and capacity.
"With the launch of Pi Amaravati, we will be offering highly innovative and tailored solutions with Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), Disaster-Recovery-as-a-Service (DRaaS) and a host of other cloud-enabled products and services to our esteemed partners," Muppaneni said.
Along with launching the Pi Amaravati data centre, Pi Datacenters has entered into a Memorandum of Understanding (MoU) with companies like PowerGrid, IRCTC, Mahindra and Mahindra Finance, Deutsche Bank and Unibic. These partnerships will expand open source developments in the data centre space.

LibreOffice 5.4 has 'incremental' compatibility with Microsoft Office files

The Document Foundation has released an update to the LibreOffice 5 series: version 5.4, which has new features for Writer, Calc and Impress.
In the list of major tweaks over the previous version, the Document Foundation states that there are a large number of 'incremental' improvements to Microsoft Office file compatibility. "Inspired by Leonardo da Vinci's belief that 'simplicity is the ultimate sophistication', LibreOffice has focused on file simplicity as the ultimate document interoperability sophistication," said the non-profit organisation in a blog post.
The Writer element of LibreOffice 5.4 brings improved compatibility with Microsoft Word files. The ODF and OOXML files written by the LibreOffice suite are also more robust and easier to share than before.
The simplicity concept translates into the XML description of a new document being 50 per cent smaller for ODF/ODT files and 90 per cent smaller for OOXML/DOCX files, as compared to Microsoft Office.
The other highlight of the latest LibreOffice update is the new standard colour palette based on the RYB colour model. The Document Foundation has integrated better support for embedded videos and OpenPGP keys. Also, the rendering of imported PDF documents is much better in this version.
The new version of Writer can help you import AutoText from MS Word DOTM templates. Users can preserve the file structure of exported or pasted lists
as plain text. This allows them to create custom watermarks for their documents. Additionally, a new context menu is available to help users with footnotes, endnotes, styles and sections.
The new version of Calc has support for pivot charts. Users can customise pivot tables and comment via menu commands. Impress helps users in specifying fractional angles while duplicating objects. There is also an auto save feature for settings to help in duplicating an operation; this is a part of Calc as well as Impress.
LibreOffice 5.4 is available for download for Mac OS, Linux and Windows through its official website. The organisation has also improved the LibreOffice Online package with better performance and a more responsive layout. You can access the latest LibreOffice source code as Docker images.

Linux gets a preview of Microsoft's Azure Container Instances

Microsoft is adding a new service to its cloud portfolio, dubbed Azure Container Instances. While the development is yet to receive Windows support, a public preview for Linux containers is out to help developers create and deploy containers without the hassle of managing virtual machines.
Microsoft claims that Azure Container Instances (ACI) take only a few seconds to start. The configuration window is highly customisable; users simply need to select the exact memory and count of CPUs that they need.
Designed to work with Docker and Kubernetes, the new service allows developers to utilise container instances and virtual machines simultaneously in the same cluster. Microsoft is also releasing an ACI connector for Kubernetes to help the deployment of clusters to ACIs.
"While Azure Container Instances are not orchestrators and are not intended to replace them, they will fuel orchestrators and other services as a container building block," said Corey Sanders, director of compute, Azure, in a statement.
The company executives are hoping that ACIs will be used for fast bursting and scaling. Virtual machines can be deployed alongside the cloud to deliver predictable scaling, so that workloads can migrate back and forth between the two infrastructure models.
Windows support for ACI is likely to be released in the coming weeks. In the meantime, you can test it on your Linux container system.

OpenSUSE Leap 42.3 is out with new KDE Plasma and GNOME versions

OpenSUSE has released the new version of its Leap distribution. Debuting as OpenSUSE Leap 42.3, the new release is based on SUSE Linux Enterprise (SLE) 12 Service Pack 3.
The new update includes hundreds of updated packages. The new SUSE-based version is powered by Linux kernel 4.4. The development team has spent a good eight months producing this rock-solid Leap build.
The most notable addition in OpenSUSE Leap 42.3 is the KDE Plasma 5.8 LTS desktop environment. Users have the option to either pick the latest KDE version or go with GNOME 3.20. There is also a provision to install other supported environments.
Apart from the new desktop environment options, the OpenSUSE Leap update comes with a server installation profile and includes a full-featured text mode installer. The platform also officially supports Open-Channel solid-state drives through the LightNVM full-stack initiative. Likewise, there are numerous architectural improvements for 64-bit ARM systems.
The OpenSUSE team has provided PHP5 and PHP7 support in the latest Leap distro. There is also an updated graphics stack based on Mesa 17, and GCC 4.8.5 as the default compiler. Considering the list of new changes, OpenSUSE 42.3 appears to be an advanced Linux version. It also comes preloaded with packages for streaming media, editing graphics, creating animation, playing games and building 3D printing projects.
The new OpenSUSE Leap version is available for download for both 32-bit and 64-bit systems. Existing OpenSUSE Leap users can upgrade their systems using the built-in update system.

Google blocks Android spyware family Lipizzan

Google's Android Security and Threat Analysis teams have jointly discovered a new spyware family that gets distributed through various channels, including the Play Store. Called Lipizzan, the software has been detected in 20 apps that have been downloaded on fewer than 100 devices.
Unlike some of the earlier spyware, Lipizzan is a multi-stage spyware that can be used to monitor and exfiltrate email, text messages, location, voice calls and media. It
is typically available as an innocuous-sounding app such as 'Backup' or 'Cleaner'. Once installed, the spyware downloads and loads a second 'license verification' stage that validates some abort criteria on the hardware.
"If given the all-clear, the second stage would then root the device with known exploits and begin to exfiltrate device data to a command and control server," the team, comprising Android Security's Megan Ruthven and the Threat Analysis Group's Ken Bodzak and Neel Mehta, wrote in a blog post.
The second stage of Lipizzan is capable of performing and exfiltrating the results of tasks such as call recording, VoIP, voice recording, location monitoring, screenshot capturing and taking photos. Additionally, it is capable of helping attackers retrieve data from apps like Gmail, Hangouts, KakaoTalk, LinkedIn, Messenger, Skype, Snapchat, Viber and WhatsApp, among others.
Google researchers found the presence of Lipizzan while investigating Chrysaor, a recently emerged spyware that was believed to be written by the NSO Group. Once spotted, Google's Play Protect service released a notification on all affected devices and removed the apps with Lipizzan from the online store.
Moreover, Google has enhanced Play Protect's capabilities to continuously detect and block targeted spyware on the Android platform. Developers need to use official resources only when building their apps to ensure a secure and safe experience.

GitHub adds new features to grow community engagements

Supporting open source efforts by developers, GitHub has brought out a list of new features to enhance community engagements around your projects.
"Thanks to some subtle (and not so subtle) improvements in the past few months, it's now easier to make your first contribution, launch a new project or grow your community on GitHub," the GitHub team wrote in a blog post.
First on the list of new features is contributor badges. As a maintainer, you can now see a 'first-time contributor' badge that helps you review pull requests from users who are contributing to your project for the first time. The 'first-time contributor' badge becomes a 'contributor' badge in the comments section once the pull request is merged. Furthermore, you can expose this information via an additional flag in the GraphQL API.
Apart from providing badges to your contributors, you now have the option to add a licence file to your project using a new licence picker. This new section helps you pick an appropriate licence by providing the full text. It also allows you to customise any applicable fields prior to committing the file or opening a pull request.
As privacy is one of the major factors preventing you from contributing to a new project, GitHub has added the ability to let you keep your email address private. GitHub also provides a warning that lets you make an informed decision about contributing to a project you were blocked from previously. Moreover, blocked users on the platform will not be able to comment on issues or pull requests in third-party repositories.
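The contributor flag exposed via the GraphQL API corresponds to the author's association with the repository. As a rough sketch of how a maintainer might query it (the owner, repository name and pull request number below are placeholders, and the exact fields should be checked against GitHub's published GraphQL schema):

```graphql
# Fetch how a pull request's author is associated with the repository.
# authorAssociation reports values such as FIRST_TIME_CONTRIBUTOR or CONTRIBUTOR.
query {
  repository(owner: "example-owner", name: "example-repo") {
    pullRequest(number: 1) {
      title
      authorAssociation
    }
  }
}
```

Once the pull request is merged, the same field would report the author as a regular contributor, mirroring the badge change in the Web interface.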


"We hope these improvements will help you make your first contribution, start a new project, or grow your community," GitHub concluded in its blog.
First launched in October 2007, GitHub is so far used by more than 23 million people around the globe. The platform hosts over 63 million projects with a worldwide employee base of 668 people.

Microsoft is now a part of the Cloud Native Computing Foundation

Continuing its developments around open source, Microsoft has now joined the Cloud Native Computing Foundation (CNCF). The latest announcement comes days after the Redmond company entered the board of the Cloud Foundry Foundation.
"Joining the Cloud Native Computing Foundation is another natural step on our open source journey, and we look forward to learning and engaging with the community on a deeper level as a CNCF member," said Corey Sanders, partner director, Microsoft, in a joint statement.
Microsoft has chosen the Platinum membership of the CNCF. Gabe Monroy, a lead product manager for containers on Microsoft Azure and former Deis CTO, is joining CNCF's governing board.
Led by the core team members of the Linux Foundation, the CNCF has welcomed the new move by Microsoft. The non-profit organisation considers it a "testament to the importance and growth" of cloud technologies, and believes the Windows maker's commitment to open source infrastructure is a 'significant asset' to its board.

Mozilla aims to enhance AI developments with open source human voices

While elite digital assistants like Alexa, Cortana, Google Assistant and Siri have so far been receiving inputs from users via the spoken word, Mozilla is planning to enhance all such existing artificial intelligence (AI) developments by open sourcing human voices on a mass level. The Web giant has already launched a project called Common Voice to build a large-scale repository of voice recordings for future use.
Mozilla has been capturing human voices since June to build its open source database. The database will go live later this year to "let anyone quickly and easily train voice-enabled apps" that go beyond Alexa, Google Assistant and Siri.
"Experts think voice recognition applications represent the 'next big thing'. The problem is the current ecosystem favours Big Tech and leaves out the next wave of innovators," said Daniel Kessler, senior brand manager, Mozilla, in a recent blog post.
Tech companies are presently using different voices to teach computers to understand the variety of languages for their solutions. But the data sets with the voice collections are mostly proprietary as of now. Therefore, a large number of developers have no access to voice recording samples to test their own voice recognition projects. This ultimately leads to a limited number of apps understanding our speech.
Things appear to be changing with Common Voice. "The time has come for an open source data set that can change the game. The time is right for Project Common Voice," Kessler stated. Mozilla is asking individuals to donate their voice recordings either on the Common Voice Web page or by downloading a dedicated iOS app. Once you are ready to record, you need to read a set of sentences that will be saved into the system.
The recorded voices, which would come in a variety of languages with various accents and demographics, will be provided to third-party developers.
In addition to simply receiving voice donations, Mozilla has built a model by
the world, join CNCF as a platinum which users will validate the recordings that are stored in the system. This process
member. Its membership, along will help train an app’s speech-to-text conversion capabilities.
with other global cloud providers All this will enable not just one or two but 10,000 hours of validated audio
that also belong to CNCF, is that will power tons of AI models in the near future. Notably, recordings received
a testament to the importance through the Common Voice initiative will be integrated into the Firefox browser as
and growth of cloud native well. But the main purpose of this exercise is to provide a public resource.
technologies,” stated Dan Kohn,
executive director of the Cloud
Native Computing Foundation. For more news, visit www.opensourceforu.com

14 | SEPTEMBER 2017 | OPEN SOURCE FOR YOU | www.OpenSourceForU.com

In this month’s column, we discuss a few interview questions
related to machine learning and data science.
Sandya Mannarswamy

As we have been doing over the last couple of months, we will continue to discuss a few more computer science interview questions in this column as well, particularly focusing on topics related to data science, machine learning and natural language processing. It is important to note that many of the questions are typically oriented towards practical implementation or deployment issues, rather than just concepts or theory. So it is important for interview candidates to make sure that they get adequate implementation experience with machine learning/NLP projects before their interviews. Data science platforms such as Kaggle (www.kaggle.com) host a number of competitions that candidates can attempt to practice their skills on. Also, many of the data science or machine learning related academic computer conferences host data challenge competitions such as the KDD Cup (http://www.kdd.org/kdd-cup). Data science enthusiasts can sign on for these challenges and hone their skills in solving real life problems. Let us now discuss a few interview questions.
1. You are given 100,000 movie reviews that are labelled as positive or negative. You have been told to perform sentiment analysis on the new incoming reviews by classifying each review as positive or negative, which is a simple binary classification problem. Can you explain what features you would use for this classification problem? Once you decide on your set of features, how would you go about selecting which classifier to use?
2. Let us assume that you decided to use the ‘bag of words’ approach in the above problem with each vocabulary term becoming a feature for your classifier. Essentially, you can construct a feature set where the dimensions of this set are the same as the size of your vocabulary, and each feature corresponds to a specific term in the vocabulary. The feature value can either be the count of the term or merely the presence or absence of the term in the document; or you can employ Tf-Idf count for each term-review combination, etc. You had used a random forests classifier for sentiment classification. Now you are told that your vocabulary size is 100,000. Would this change your decision about which classifier to use?
3. For problem (1), you had decided to use a support vector machine classifier. However, now you are told that instead of just doing binary classification of the reviews, you need to classify them as one of five categories, namely: (a) strongly positive, (b) weakly positive, (c) neutral, (d) weakly negative, and (e) strongly negative. You are given labelled data with these five categories now. Would you still continue to use the ‘support vector machine’ (SVM) classifier? If so, can you explain how SVM handles multi-class classification? If you decide to switch from SVM to a different classifier, explain the rationale behind your switch.
4. For the sentiment classification problem, other than the review text itself, you are now given additional data about the movies. This additional data includes the reviewers’ names, address, age, country of residence, date of review and the specific movie genre they are interested in. This additional data contains both numeric and string data, with some of the features being categorical. A country’s name is string data, and the movie genre is string data which is actually categorical. What kind of data preprocessing would you do on this additional data to use it with your classifier?
5. Generally, interviewers expect you to be familiar with some of the popular libraries that can be used for data science. So some of the questions can be library-specific as well. In question (4), you may be asked to mention how you would convert categorical data to numeric form. Can you write a piece of Python code to do this conversion?
6. Let us assume that you decided to use a SVM


Guest Column CodeSport

classifier for the sentiment classification problem. You find that your classifier takes a long time to fit the training data. How would you reduce the training time? List all the possible approaches.
7. One of our readers suggested feature scaling/data normalisation as a preprocessing step before you train your model always. Is she correct? Is feature scaling or normalisation always needed in all types of classifiers? Why do you think feature scaling can help achieve faster convergence of your learning procedure? One of the well-known methods of feature scaling is Min-Max scaling. By feature scaling, you are actually throwing away your knowledge of the maximum and minimum values that the feature can take. Wouldn’t the loss of this information affect the accuracy of your classifier on unseen data? If you are using a decision tree classifier or random forests, should you still do feature scaling? If yes, explain why.
8. Scikit-learn is a popular machine learning library available in Python, which provides ready-made implementations of several classifiers such as decision tree, support vector machine, random forests, logistic regression, multilayer perceptron, etc. These classifiers provide a ‘predict’ function, which predicts the output for a given data instance. They also provide a ‘predict_proba’ function, which returns the probability for each sample (data instance) belonging to a specific output class. For instance, in the case of the movie review sentiment prediction task, with two classes positive and negative, the ‘predict_proba’ function would return the probability of the sample belonging to the positive sentiment category and negative sentiment category. When would you use the ‘predict_proba’ function in your sentiment classification task?
9. In the sentiment classification problem on the movie reviews data, you found that some of the reviews did not have the date, country of reviewer and the movie genre. How would you handle these missing data? Note that these features were not numeric; so what kind of data imputation would make sense in this case?
10. In the movie reviews training labelled data set, you are given certain additional data features that include: (a) the star rating reviewers give to the movie, (b) whether they would like to watch it again, and (c) whether they liked the movie. Would you use these additional features in your training data to train your model? If not, explain why you wouldn’t.
11. What is the data leakage problem in machine learning and how do you avoid it? Does the scenario mentioned in question (10) fall under the data leakage category? Detailed information on data leakage and its avoidance can be found in this well-written and must-read paper ‘Leakage in data mining: formulation, detection, and avoidance’ which was presented at the KDD 2011 conference and is available at http://dl.acm.org/citation.cfm?id=2020496.
12. You are using k-fold cross validation for selecting the hyper-parameters of your model. Given that your training data has features which are on widely varying scales, you have decided to do feature scaling. Should you do data scaling once for the entire training data set and then perform the k-fold cross validation? Or should you do the feature scaling within each fold of cross-validation? Explain the reason behind your choice.
13. You are given a data set which has a large number of features. You are told that only a handful of these features are relevant in predicting the output variable. Will you use Lasso regression or ridge regression in this case? Explain the rationale behind your choice. As a follow-up question, when would you prefer ridge regression over Lasso?
14. Decision tree classifiers are very popular in supervised machine learning problems. Two well-known tree classifiers are random forests and gradient boosted decision trees. Can you explain the difference between the two of them? As a follow-up question, can you explain ensemble learning methods in general? When would you opt for an ensemble classifier over a non-ensemble classifier?
15. You are given a data set in which many of the variables are categorical string variables. You decided to encode the categorical variables with One Hot Encoding. Consider that you have a variable called ‘country’, which can take any of the 20 values. With One Hot encoding, you end up creating 20 new feature variables in place of the single ‘country’ variable. On the other hand, if you use label encoding, you convert the categorical string variable to a categorical numerical variable. Which of the two methods leads to the ‘curse of the dimensionality’ problem? When would you prefer to go for One Hot encoding vs label encoding?
Please do send me your answers to the above questions. I will discuss the solutions to these questions in next month’s column. I also wanted to alert readers about a new deep learning specialisation course by Prof. Andrew Ng coming up soon on the Coursera platform (https://www.coursera.org/specializations/deep-learning). If you are interested in becoming familiar with deep learning, there is no better teacher than Prof. Ng whose machine learning course on Coursera is now being taken by more than a million students.
If you have any favourite programming questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, wishing all our readers wonderful and productive days ahead.

By: Sandya Mannarswamy
The author is an expert in systems software and is currently working as a research scientist at Conduent Labs India (formerly Xerox India Research Centre). Her interests include compilers, programming languages, file systems and natural language processing. If you are preparing for systems software interviews, you may find it useful to visit Sandya’s LinkedIn group ‘Computer Science Interview Training India’ at http://www.linkedin.com/
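Question (5) explicitly asks for code that converts categorical string data to numeric form. As a dependency-free sketch (in practice one would more likely reach for pandas or scikit-learn encoders), here are the two encodings compared in question (15); the country values are made-up sample data:

```python
def label_encode(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Replace one categorical column with one 0/1 column per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

countries = ["India", "USA", "India", "UK"]   # sample data
codes, mapping = label_encode(countries)
print(codes)                        # [0, 2, 0, 1]
print(one_hot_encode(countries)[0]) # [1, 0, 0]
```

One Hot encoding grows the feature space linearly with the number of categories (the dimensionality concern raised in question 15), while label encoding keeps a single column but imposes an artificial ordering on the categories.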
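For questions (7) and (12), a minimal Min-Max scaling sketch (the numbers are made-up) makes both points concrete: the minimum and maximum are learned from the training data alone, and unseen values can fall outside [0, 1]:

```python
def fit_min_max(train):
    """Learn the scaling parameters from the training data only."""
    return min(train), max(train)

def min_max_transform(values, vmin, vmax):
    """Scale values using parameters learned elsewhere."""
    span = (vmax - vmin) or 1.0   # guard against a constant feature
    return [(v - vmin) / span for v in values]

vmin, vmax = fit_min_max([10, 20, 30])              # fit on training data
print(min_max_transform([10, 20, 30], vmin, vmax))  # [0.0, 0.5, 1.0]
print(min_max_transform([40], vmin, vmax))          # [1.5] -> outside [0, 1]
```

In k-fold cross-validation, the fit step should be repeated inside every fold on that fold's training split; fitting once on the full data set leaks validation-fold statistics into training, which is exactly the data leakage issue of question (11).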


Pocket-friendly Bluetooth speaker from Kodak
Super Plastronics Pvt Ltd, a brand licensee of Kodak, has launched its first Bluetooth portable speaker in India, the Kodak 68M. The device offers a complete sound experience at an affordable price, company sources claim.
Apart from Bluetooth connectivity, the speaker supports an auxiliary wire and a micro USB jack. It is equipped with a low sound output with a reach of up to 10 metres. Powered by a 3.7W battery, the speaker is capable of lasting for over five hours. For enhancing the sound experience, the device can also be connected to an additional speaker. The Kodak speaker can also be connected to any TV, with or without Bluetooth, making it a complete package for entertainment lovers.
The Kodak 68M speaker is available online and at retail stores.
Price: ₹ 3,290
Address: Super Plastronics Pvt Ltd, 1st Floor, Dani Corporate Park, 158 Dani Compound, Vidya Nagari Road, Kalina, Santacruz East, Mumbai – 400098; Ph: 022-66416300

Earbuds with heart-rate monitor from Jabra
Audio and connectivity devices manufacturer, Jabra, has introduced superior quality earbuds for music and voice calls, called Jabra Elite Sport. The device sports advanced wireless connectivity, which filters out background noise, ensuring distraction-free usage. The buds come with the ease of portable charging and deliver 4.5 hours of playtime. They have customisable fitting options, enabling users to stay connected comfortably during outdoor and sports activities.
A special button on the earbuds, ‘Hear Through’, filters out surrounding noise. The device has four microphones and offers personalised fitness analysis using an in-ear heart rate monitor. With IP67 certification, the earbuds have a three-year warranty for damage by sweat and water, enabling hassle-free usage.
The wireless Jabra Elite Sport is available in lime green, grey and black, online and at retail stores.
Price: ₹ 18,999
Address: Jabra India Pvt Ltd, Redington India Limited, New No. 27, NRS Building, Velachery Road, Saidapet, Chennai-600015.

Mechanical keyboard for gamers from Galax
Galax, the manufacturer of gaming products, has unveiled its latest HOF Black edition mechanical keyboard, specially designed for gamers. The keyboard uses a genuine Cherry MX mechanical key switch with 50 million keystrokes for long lasting and quick response, giving users a stable and long-term option.
The stylish-looking keyboard is built with an anodised (black)/baking paint (white colour) aluminium plate. It offers up to 112 lighting effects with software and 88 lighting effects without software. Its Macro keys make each key of the device programmable. The keyboard comes with media control buttons along with a die-cast volume and lighting roller. With the n-key rollover, the company claims the keyboard is 100 per cent anti-ghosting. The HOF keyboard can enter all the signals accurately, even when played faster or when pressing multiple keys. It has a USB 2.0 hub with audio-out and a mic-in jack. The hub allows the users to connect USB devices of all types quickly. It also has a magnetic, detachable, soft-touch wrist rest to make prolonged use comfortable.
The Xtreme Tuner Plus system enables users to customise the keyboard by controlling Macros. It also has per-key programming, back light setting and lighting patterns.
The HOF black edition carries a three-year warranty period and is available at Amazon.
Price: ₹ 7,000
Address: Amazon India, Brigade Gateway, 8th Floor, 26/1, Dr Rajkumar Road, Malleshwaram West, Bengaluru, Karnataka – 560055; Ph: 1800-30009009


Budget-friendly smartphone with front LED flash from Lenovo
Lenovo has launched an affordable dual camera smartphone, the Lenovo K8 Note. The device sports a 13.9cm (5.5 inch) full HD (1080x1920 pixels) display with Corning Gorilla Glass protection. It is powered by a deca-core MediaTek MT6797 SoC with four Cortex-A53 cores clocked at 1.4GHz and 1.85GHz, as well as two Cortex-A72 cores clocked at 2.3GHz. Built with 5000 series aluminium and polycarbonate, the device is supposed to be splash-resistant.
The dual SIM (Nano) device runs on Android 7.1.1 Nougat and is backed with a huge 4000mAh battery with turbo charging.
On the camera front, the smartphone sports a rear 13 megapixel primary sensor, accompanied by a 5 megapixel depth sensor with a dual LED-CCT flash module; and a 13 megapixel front camera with an LED flash module for selfies.
The connectivity options of the device include 4G VoLTE, dual band (2.4GHz and 5GHz) Wi-Fi 802.11ac, Bluetooth v4.1, GPS, micro USB and a 3.5mm audio jack.
The Lenovo K8 Note comes in two variants – 3GB RAM/32GB storage and 4GB RAM/64GB storage — both available in ‘Fine Gold’ and ‘Venom Black’ colours online and at retail stores.
Price: ₹ 12,999 for the 3GB RAM/32GB storage option and ₹ 13,999 for the 4GB RAM/64GB storage variant.
Address: Lenovo India Pvt Ltd, Vatika Business Park, 1st Floor, Badshah Pur Road, Sector-49, Sohna Road, Gurugram-122001

Water-resistant Bluetooth headphones from Motorola

Motorola has introduced its Bluetooth in-ear headphones in India – the Verve Loop. Aimed at sports and fitness enthusiasts, the headphones offer a hassle-free, comfortable fit during outdoor and workout sessions.
The device comes with an IP54 rating for water and splash resistance, enabling damage-free use. It is designed to deliver a balanced, high quality audio experience along with easy Bluetooth pairing with voice prompts. Powered by a lithium-ion battery, the device delivers up to six hours of playback with a single charge. Company sources also claim that it provides one hour of play time on just a 20-minute charge. The headphones offer balanced sound at any volume and superb noise isolation. Features include A2DP (Advanced Audio Distribution Profile), HFP (Hands Free Protocol) and AVRCP (Audio/Video Remote Control Profile), enabling hands-free calling and voice assistance.
The headphones come with in-line control buttons for volume, play/pause, etc, three sets of extra ear gels and three sets of ear hooks for stable support. The Motorola Verve Loop is compatible with all Android and Apple smartphones and tablets, apart from supporting Siri and Google Now voice assistants.
Available in combinations of charcoal grey/black and orange/black, the headphones can be purchased online and at retail stores.
Price: ₹ 2,499
Address: Motorola Solutions India, 415/2, Mehrauli-Gurugram Road, Sector 14, Near Maharana Pratap Chowk, Gurugram, Haryana – 122001; Ph: 0124-4192000; Website: www.motorola.in

The prices, features and specifications are based on information provided to us, or as available
on various websites and portals. OSFY cannot vouch for their accuracy. Compiled by: Aashima Sharma


Exploring Software Guest Column

Importing GNUCash Accounts in GNUKhata
Anil Seth

gkcore is the REST API core engine of GNUKhata. The GNUKhata app comprises two applications — gkcore and gkwebapp. The objective of this tutorial is to get to know the API.

GNUKhata is an application developed using the Pyramid Web Framework. It comprises two Web applications – a core application called gkcore, and a Web application called gkwebapp. You may easily get started with the installation and development by referring to https://gitlab.com/gnukhata/gkwebapp/wikis/home.
As a way of learning how to extend GNUKhata, you may consider importing data from GNUCash into GNUKhata. Since the core and user interface are two separate applications, a good way to learn the core application interface is to create a utility program which will add the GNUCash data.
The utility program will first need to log into the core server, and then issue the commands to add the needed data. Make sure that you are able to run the core and the Web applications; use the latter to create an organisation and an admin user for the organisation. It is important to keep in mind that the gkcore application needs to be run using the gkadmin user, assuming that you are following the steps from the wiki article; otherwise, it will not be able to access the database.

The login process
You may examine gkwebapp/views/startup.py to understand the logic of the steps needed for logging in. The process involves selecting an organisation first, and then supplying the credentials of a user for that organisation.
In order to keep the code as simple as possible, as the objective is to learn the API, select the first organisation. The login credentials are hard-coded. In case of any errors, the utility will just crash and not attempt any error handling.
Once the login is successful, a token is issued. This token will authorise all subsequent calls to the core server. You will notice that the calls to the core server are simple get or post requests. The data objects transferred between the two are JSON objects.

import requests, json

gkhost = ''

def getJsonResponse(route, hdrs=None):
    return requests.get(gkhost + route, headers=hdrs).json()

def postJsonResponse(route, jsondata, hdrs=None):
    return requests.post(gkhost + route, data=jsondata, headers=hdrs).json()

def getOrg():
    gkdata = getJsonResponse('organisations')['gkdata']
    first_org = gkdata[0]
    route = '/'.join(['orgyears', first_org['orgname'], first_org['orgtype']])
    gkdata = getJsonResponse(route)['gkdata']
    return gkdata[0]['orgcode']

def orgLogin(orgcode):
    gkdata = {'username': 'anil', 'userpassword': 'pswd', 'orgcode': orgcode}
    return postJsonResponse('login', json.dumps(gkdata))['token']

orgcode = getOrg()
gktoken = orgLogin(orgcode)

Adding accounts
GNUCash can export the accounts and transactions in CSV files. In the current article, you may extract the accounts into a file, accounts.csv. The Python CSV modules make it very easy to handle a CSV file. The first row contains the column labels and should be ignored. You may use the DictReader for more complex processing of the file. For this application, in which only a few columns are needed, the CSV reader is adequate.
There are a few differences in the top level account/group names of GNUCash and GNUKhata. So, you need to create a dictionary to map the names from GNUCash to the ones used in GNUKhata.
Some groups in the level below ‘Assets’ in GNUCash appear as top level groups in GNUKhata, e.g., ‘Current Assets’ and ‘Fixed Assets’. You may ignore ‘Assets’ from the account hierarchy when transferring the data.
As before, the code below ignores error handling and assumes ‘all is well’:

import csv

def addSubGroup(name, parent, header):
    data = json.dumps({'groupname': name, 'subgroupof': parent})



    return postJsonResponse('groupsubgroups', data, header)['gkresult']

def addAccount(name, parent, header):
    data = json.dumps({'accountname': name, 'groupcode': parent, 'openingbal': 0.00})
    res = postJsonResponse('accounts', data, header)

def createAccounts(fn, toplevel, header):
    # get existing groups and their codes
    groups = getJsonResponse('groupsubgroups?groupflatlist', header)['gkresult']
    f = open(fn)
    rows = csv.reader(f)
    skip_first = next(rows)
    for row in rows:
        fullname = row[1].split(':')
        name = row[2]
        is_group = row[-1] == 'T'
        # Map top-level to GNUKhata top level
        # Ignore Assets and use 'Current Assets' and 'Fixed Assets' as top level
        if fullname[0] == 'Assets':
            fullname = fullname[1:]
        if len(fullname) > 1 and fullname[0] in toplevel:
            fullname[0] = toplevel[fullname[0]]
        parent = groups[fullname[-2]]
        if is_group:
            if not (name in groups):
                # add this to the list of groups
                groups[name] = addSubGroup(name, parent, header)
        else:
            addAccount(name, parent, header)

login_hdr = {'gktoken': gktoken}

toplevel_mapping = {'Capital': 'Capital',
                    'Current Assets': 'Current Assets',
                    'Fixed Assets': 'Fixed Assets',
                    'Liabilities': 'Current Liabilities',
                    'Expenses': 'Direct Expense',
                    'Income': 'Direct Income'}

createAccounts('accounts.csv', toplevel_mapping, login_hdr)

There are some potential issues in transferring accounts from GNUCash to GNUKhata. For example, in GNUKhata, you either have an account or a sub-group. However, in GNUCash, a sub-group can function as a normal account as well. An account in GNUCash is consistent with a sub-group of GNUKhata if the placeholder flag is true.
However, the objective of this article is to become familiar with the communication between a client application and the core server, and no attempt is made to handle corner cases.
GNUCash transaction data may also be exported as CSV files. The above utility may be similarly extended to handle that data as well.
The Web-based frontend, gkwebapp, makes it easy to view and enter the data. The communication with the server happens as in the utility above, and it is done from the code residing in the views directory of the gkwebapp.
You may, as an exercise, extend the Web application to import GNUCash accounts into GNUKhata and learn that as well!

By: Dr Anil Seth
The author has earned the right to do what interests him. You can find him online at http://sethanil.com, http://sethanil.blogspot.com, and reach him via email at anil@sethanil.com.
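To see what the createAccounts utility above does with a single row, here is a toy trace of the hierarchy handling; the account name is a made-up sample of a GNUCash colon-separated ‘Full Account Name’:

```python
# A GNUCash full account name is colon-separated; the utility drops the
# top-level 'Assets' node and treats the second-last component as the
# parent group of the account being created.
full_name = "Assets:Current Assets:Bank"   # sample value of row[1]
fullname = full_name.split(":")
if fullname[0] == "Assets":
    fullname = fullname[1:]
print(fullname[-2])   # parent group: 'Current Assets'
print(fullname[-1])   # the account itself: 'Bank'
```

Real exports will differ in their columns and depth; this only illustrates the parent/child split used when creating groups and accounts.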

OSFY Magazine Attractions During 2017-18

March 2017 Open Source Firewall, Network security and Monitoring
April 2017 Databases management and Optimisation
May 2017 Open Source Programming (Languages and tools)
June 2017 Open Source and IoT
July 2017 Mobile App Development and Optimisation
August 2017 Docker and Containers
September 2017 Web and desktop app Development
October 2017 Artificial Intelligence, Deep learning and Machine Learning
November 2017 Open Source on Windows
December 2017 BigData, Hadoop, PaaS, SaaS, IaaS and Cloud
January 2018 Data Security, Storage and Backup
February 2018 Best in the world of Open Source (Tools and Services)


For U & Me Interview

“CentOS is built on a lot of past experience”

Managing a Linux distribution for a long time requires immense community effort. But what is the key to success in a market that includes hundreds of competitive options? Also, what are the challenges in building a brand around an open source offering? Karanbir Singh, project leader, CentOS, answers these questions and outlines the future of the platform that has been leading Web developments, in an exclusive conversation with Jagmeet Singh of OSFY.

Q How did you start your journey with the CentOS project?
It was in late 2004. I was not one of the founders of CentOS but showed up on the scene in its early days. At that time, we had a small team and a lot of machines running Red Hat Linux 7.3 and Red Hat Linux 9.
With Red Hat moving down the path towards Red Hat Enterprise Linux, a model that didn’t work well for us, I started looking at options. We explored Debian and SUSE initially, but found management and lifecycle on each of them hard to map our workflow into.
It was during this time that I came across the Whitebox Linux effort and then the CentOS Project. Both had the same goal, but the CentOS team was more inclusive and seemed more focused on its goals. So, in late September 2004, I joined the CentOS IRC channel and then, in November, I joined the CentOS mailing list as a contributor. And I am still contributing 13 years down the road!

Q What were the biggest roadblocks that emerged initially while designing CentOS for the community, and how did its core development team overcome them?
A lot of our problems were about not getting off the ground. Initially, there was no clear aim. And then, we faced the challenge that the build systems and code audit tools in 2003/2004 were either primitive, absent entirely or the contributors were unaware of them. A Linux distribution is a large collection of a lot of code, written in many languages — each with its own licence, build process and management.
Three main strategies saw us past that painful process. The first was consistency. Whatever we did, we had to be consistent and uniform across the entire distribution, and make sure all developers had a uniform understanding of the process and flow. The second was a self-use focus. Regardless of what the other people were targeting, all developers were encouraged to focus on their own use cases and their personal goals. The third was the hardest, to try and disconnect commercial interests from developer and contributor work.

Q Why was there a need for CentOS Linux when Fedora and Red Hat Enterprise Linux already existed in the market?
The Fedora project was still getting sorted out around then. Its team had a clear mandate to try and build an upstream-friendly Linux distribution that was going to move fast and help the overall Linux ecosystem mature. Red Hat Enterprise Linux, on the other hand, has been built for the commercial medium to large organisations, looking for value above the code. This left a clear gap in the ecosystem for a community-centric, manageable, predictable enough Linux distribution that the community itself, small vendors, and niche users around the mainstream could consume.
Initially, the work we did was quite focused on the specific use cases that the developers and contributors had. All of us were doing specific things, in specific ways and CentOS Linux fitted in well. But as we started to mature, we saw great success in specific verticals, starting from academia and education institutions to Web hosting, VoIP (Voice over Internet Protocol) and HPC (high performance computing).

Q What were the major outcomes of the Red Hat sponsorship?
Red Hat came on as a major sponsor for the CentOS Project in January 2014. From the CentOS Project’s perspective, this meant we were then able to start looking beyond just the platform and the Linux distribution. It allowed us to build the infrastructure and resources needed to support



Karanbir Singh, project leader, CentOS

other projects above the platform, develop a great CI (continuous integration) service as well as a much better outreach effort than we were able to earlier.
The real wins have been from the user perspective. If today you are looking for an OpenStack user side install, the CentOS Project hosted on the RDO stack is the best packaged, tested and maintained option.
In a nutshell, the Red Hat relationship has allowed the CentOS Project to dramatically expand the scope of its operations beyond just the Linux distribution and enable many more user scenarios.

Q What makes CentOS Linux a perfect choice even 13 years after its first release in May 2004?
When users install a Linux distribution, they are almost always doing so in order to achieve a goal—either to run a website or to run a mail server. Helping users achieve their end goals easily has been our constant focus. It is a key metric we still track in order to reach our goals.
This means that as the user base adapts to the new world of cloud-native, container-based and dynamically orchestrated workloads, the CentOS Project continues to deliver the same level of user focus that we have had over the years.
Protecting the user’s investment in the platform across the base without giving up on existing processes is something we deliver till date. For instance, people can choose when and how they jump on the container process, or just entirely opt out. It is not something that will be influenced by a CentOS Linux release. It is this duality, which maintains an existing base while allowing the user to seamlessly move into emerging tech, that creates a great value proposition for CentOS Linux.

Q How do you manage the diversification of different CentOS Linux versions and releases?
The way the contribution process and ownership works makes it relatively easy to manage the diversification. Primarily, the aim is to ensure that if we are doing something specific, the people doing the work are directly invested in the result of the work itself. This helps ensure quality as there are eyes scrutinising incoming patches and changes—since the developers’ own requirements could be impacted by shipping a sub-optimal release.

www.OpenSourceForU.com | OPEN SOURCE FOR YOU | SEPTMBER 2017 | 25

For U & Me Interview

Participation from the target contributors don’t always have the time most of my focus is around enablement
audience for the specific media or to work on each request, but if you look and making sure that contributors and
release is a very critical requirement. at the CentOS Forums, almost every developers have the resources they need
And because this typically comes question gets a great result. to succeed. The other 70 per cent of my
through the common project resources, There is also a lot of diversity in the time is spent as a consulting engineer
it also means that the people doing groups. The CentOS IRC channel idles at Red Hat, working with service
this work are well engaged in the core at over 600 users during the day, but a teams, helping build best practices in
project scope and Linux distribution large number of users never visit the operations roles and modern system
areas, allowing them to bridge the two forums. Similarly, the CentOS mailing patterns for online services.
sides nicely. lists include over 50,000 people, but a Additionally, I have been involved
At the moment, there are dozens large number of them are never reaching in some of the work going on in the
of different kinds of CentOS releases, the IRC channel or the forums. containers world and its user stores,
including atomic hosts, minimal which includes DevOps, SRE-Patterns,
installs, DVD ISOs, cloud images, CI and CD, among others.
vagrant images and containers. Each
of these comes through a group that is
well invested in the specific space. Q What are the major
differences you’ve observed
being a part of a corporate entity

Q What are the various efforts

that lead to a consistent
developer experience on
like Red Hat and a community
member? Which is tougher
among the two?
CentOS Linux? I have been involved with open source
Application developers working to
consume CentOS Linux as their base
can trust the API/ABI efforts where
Q Is it solely the community
feedback that helps you
build new CentOS updates, or
communities for over 15 years. During
such a long period, open source work has
never been my primary job. It’s always
content deployed today will still work do you also consider feature been something that I do in my free time
three or five years down the road. The requests from the Red Hat team? or in addition to what I was already doing,
interfaces into that don’t change (they CentOS Linux is built as a downstream similar to my move with the CentOS
can evolve, but won’t break existing from Red Hat’s sources. They are being Project. But what makes Red Hat unique
content/scripts/code). Therefore, delivered via git.centos.org, and then the in a way is that this isn’t an odd role.
working with these interfaces also additional layers and packages are built A large number of people at Red
means that they work within the same by various community members. We also Hat participate and execute their day
process that the end user is already encourage people to come and join the job via open source communities. And
aware of and already manages for development process to build, test and that makes it a lot easier, being a long-
simple things like security patching, deliver features to the audience that we term contributor.
an area often overlooked by the casual target through the open source project. All There is only one key challenge
developer. this is entirely community focused. that one needs to keep in mind when
So if someone at Red Hat wants to working on an open source project

Q How does the small team of

developers manage to offer
a long period of support for every
execute something, they would need to
join the relevant community and work
that route for engagement on CentOS.
as a part of the day job, though.
It is to set realistic expectations
around community participation, and
single release? Having said that, we have a concept recognise that the community is there
We invest heavily in automation to of Special Interest Groups that can be because its members often care about
enable long-term support. And that started by anyone, with a specific target something far more than the people
means that a very small group of people in mind. Of course, this is only above paid to work on a project. However,
can actually look after a very large the Linux distribution itself. this typically isn’t a concern when a
codebase. It changes the scope of what community comes together around
the contributors need to do. Rather
than working on the code process,
we work on the automation around it,
Q Apart from your
engagements related to being
a CentOS Project member, what
smaller pieces of the code.
The CentOS Project has quite a
widespread and extensive footprint. It
and aggressively test and monitor the are your major tasks at Red Hat? involves talking to and participating
process and code. These days, I spend around 30 per cent in a wider community where a large
The other thing is that we get of my time working on the CentOS majority is unknown personally.
community support. Developers and Project. Rather than in the project itself, Managing expectations and ensuring

26 | SEPTEMBER 2017 | OPEN SOURCE FOR YOU | www.OpenSourceForU.com

For U & Me Interview

there is respect both ways has been

a balancing act that’s never easy. It’s
something I work quite hard on, and
Q What are the biggest features
that we can see in the next
CentOS release?
but education as a whole. Moreover,
I would like to see CentOS Linux
extend its footprint as a vehicle for
hope we can keep getting it better. The Special Interest Groups (SIGs) are enablement.
constantly releasing new content all the And this is where the CentOS

Q What is your advice to those

who are planning to begin
with CentOS?
time. As an example, the Cloud SIG
released OpenStack Ocata within a few
hours of the upstream project release.
Project provides a great space for other
open source projects—building services
for CI/CD, interacting with the users,
The most important thing to remember There is also a lot of work being done in finding the niches that people care
for those planning to begin with the Atomic SIG around containers, and about and solving real-world problems.
CentOS is that you are not alone. There in the Virt SIG on existing and upcoming If you are involved today with
is lots of documentation as well as virtualisation technologies. an open source project, I strongly
tremendous support and help from a encourage you to get in touch with me
vast community. So as a new user, make
sure you engage with the existing user
base, ask questions, and spend a bit of
Q Lastly, where do you see
CentOS in the future of open
and discuss its development areas.
We measure our success on the basis
of how successful CentOS has been
time to understand what and how things CentOS Linux, due to its characteristics, for the people, the communities, the
are. In the long term, understanding the is a great fit for areas like Web hosting, open source projects and the users who
details will pay off. Web services, cloud workloads and have invested their time, resources and
CentOS Linux is built on a lot of container delivery. Also, as a platform support for the CentOS Project. And we
past experience. Anyone starting down for long-term community centric look forward to solving more problems,
the path of adoption should keep this in workloads, it is a good option in building better solutions and bridging
mind. We’ve tried to build a culture of areas like HPC and IoT. The Linux more gaps together.
helping those that need the most help, distribution also specifically suits the You can reach Karanbir directly at
but also encourage new users to learn needs of the education sector—starting kbsingh@centos.org or meet him on
and grow with the community. and supporting not only IT education Twitter at @kbsingh

28 | SEPTEMBER 2017 | OPEN SOURCE FOR YOU | www.OpenSourceForU.com

Let’s Try Admin

A Primer on Software Defined Networking (SDN) and the OpenFlow Standard
Continuous innovation and the need to adapt to the constraints of conventional networking have made software defined networking (SDN) quite popular. It is an approach that decouples the network's control plane from its data plane, which allows network administrators to program the network directly without having to worry about the hardware specifications.

OpenFlow, the first SDN standard, is a communication protocol in software defined networking (SDN). It is managed by the Open Networking Foundation (ONF). The SDN controller or the 'brain' interacts with the forwarding (data) plane of the networking devices like routers and switches via OpenFlow APIs. It empowers the network controllers to decide the path of network packets over a network of switches. The OpenFlow protocol is required to move network control out of exclusive network switches and into control programming that is open source and privately overseen.

Software defined networking uses southbound APIs and northbound APIs. The former are used to hand over information to the switches and routers. OpenFlow is the first southbound API. Applications use the northbound APIs to interact.

Porting an OpenFlow switch in ns-3
The OpenFlow 1.3 module for ns-3, widely known as the OFSwitch13 module, was intended to boost the ns-3 network simulator with SDN technology. An OpenFlow switch is a package that routes packets in the SDN environment. The data plane is referred to as the switch and the control plane is referred to as the controller. The OpenFlow switch interacts with the controller, and the switch is managed by the controller via the OpenFlow protocol.

The fundamental components of the OpenFlow switch (as shown in Figure 2) incorporate at least one flow table, a meter table, a group table and an OpenFlow channel to an exterior controller. The flow tables and group table perform the packet scanning and forwarding function based on the flow entries configured by the controller. The routing decisions made by the controller are deployed in the switch's flow table. The meter table is used for the measurement and control of the rate of packets.

Configuring the SDN OFSwitch
In this article, we have incorporated the OFSwitch13 (version 1.3) with ns-3. To benefit from the features of OFSwitch13, an


Figure 1: Traditional network architecture vs SDN architecture

ofsoftswitch13 library is used. All the commands given below have been tested on Ubuntu 16.04 and might change for other versions or distributions.

Figure 2: OpenFlow switch components
Before that, there are a few bundles to be introduced on the system:

$ sudo apt-get install build-essential gcc g++ python git mercurial unzip cmake
$ sudo apt-get install libpcap-dev libxerces-c-dev libpcre3-dev flex bison
$ sudo apt-get install pkg-config autoconf libtool libboost-dev

In order to utilise ofsoftswitch13 as a static library, you need to introduce the Netbee library, as the ofsoftswitch13 library code relies upon it.

$ wget https://bitbucket.org/ljerezchaves/ofswitch13-module/downloads/nbeesrc.zip
$ unzip nbeesrc.zip
$ cd netbee/src/
$ cmake .
$ make
$ sudo cp ../bin/libn*.so /usr/local/lib
$ sudo ldconfig
$ sudo cp -R ../include/* /usr/include/

Now, clone the repository of the ofsoftswitch13 library, as follows:

$ git clone https://github.com/ljerezchaves/ofsoftswitch13
$ cd ofsoftswitch13
$ ./boot.sh
$ ./configure --enable-ns3-lib
$ make

Integrating OFSwitch with ns-3
To install ns-3.26, use the following command:

$ hg clone http://code.nsnam.org/ns-3.26

In the ns-3.26 directory, download the repository of OFSwitch13, as follows:

$ hg clone https://bitbucket.org/ljerezchaves/ofswitch13-module src/ofswitch13
$ cd src/ofswitch13
$ hg update 3.1.0
$ cd ../..
$ patch -p1 < src/ofswitch13/utils/ofswitch13-src-3_26.patch
$ patch -p1 < src/ofswitch13/utils/ofswitch13-doc-3_26.patch

The file ofswitch13-src-3_26.patch will allow OFSwitch to get raw packets from nodes (devices). To do this, it will create a new OpenFlow receive callback at CsmaNetDevice and VirtualNetDevice. The file ofswitch13-doc-3_26.patch is optional but preferable.

After successful installation, configure the module, as follows:

$ ./waf configure --with-ofswitch13=path/to/ofsoftswitch13
$ ./waf configure --enable-examples --enable-tests

Now, we're all set. Just build the simulator using the following command:

$ ./waf

Enjoy the ns-3.26 simulator with the power of SDN, i.e., OFSwitch 1.3.

Simulating the basic network topology with SDN-based OFSwitch
In this section of the article, we'll simulate a basic network topology with three hosts, a switch and a controller. Figure 3 demonstrates the topology of the network that we want to create.


Here, host2 pings the other two hosts—host1 and host3. Whenever either of the hosts makes a ping request, it is forwarded to the switch. This is indicated by the arrows shown in blue. As this is the first request, the switch's flow table will not contain any entry. This is known as a table miss. Thus, the request will be forwarded to the controller.

The controller will instruct the switch with respect to the routing decision to be made and to also modify the flow table in the switch. This is shown by the arrows in green. The request will then be forwarded by the switch to the appropriate destination, i.e., to host1 and host3.

The next time the same request is forwarded to the switch, the switch's flow table will contain an entry for that request and, thus, the switch itself will make routing decisions based on that entry, without the controller in action.

Figure 3: Network topology

The explanation for the code to simulate the above topology is given below. Only the required extracts of the code are given. The following line includes the extra header file required:

#include <ns3/ofswitch13-module.h>

The following lines of code create an object called 'hosts' of class NodeContainer. Here, three hosts are created, along with a node for the switch, and the hosts are linked to the switch:

NodeContainer hosts;
hosts.Create (3);

Ptr<Node> switchNode = CreateObject<Node> (); //to create node for switch

CsmaHelper csmaHelper;
NetDeviceContainer hostDevices;
NetDeviceContainer switchPorts;
for (size_t i = 0; i < hosts.GetN (); i++) //linking between hosts and switch
{
  NodeContainer pair (hosts.Get (i), switchNode);
  NetDeviceContainer link = csmaHelper.Install (pair);
  hostDevices.Add (link.Get (0)); //two way linking
  switchPorts.Add (link.Get (1));
}

Ptr<Node> controllerNode = CreateObject<Node> (); //to create node for controller

Here, the ofswitch13 domain comes into action:

Ptr<OFSwitch13InternalHelper> of13Helper = CreateObject<OFSwitch13InternalHelper> ();
of13Helper->InstallController (controllerNode); //to install controller on node
of13Helper->InstallSwitch (switchNode, switchPorts); //to install OFSwitch
of13Helper->CreateOpenFlowChannels (); //for creating channels between switch and controller

Ipv4AddressHelper ipv4helpr; //set IPv4 addresses
Ipv4InterfaceContainer hostIpIfaces;
ipv4helpr.SetBase ("", ""); //IPv4 range starts from

hostIpIfaces = ipv4helpr.Assign (hostDevices);

The lines below configure ping applications between the hosts:

V4PingHelper pingHelper = V4PingHelper (hostIpIfaces.GetAddress (1));
pingHelper.SetAttribute ("Verbose", BooleanValue (true));
ApplicationContainer pingApps1 = pingHelper.Install (hosts.Get (0));
pingApps1.Start (Seconds (1));
ApplicationContainer pingApps2 = pingHelper.Install (hosts.Get (0));
pingApps2.Start (Seconds (1));

Here, two ping applications, pingApps1 and pingApps2, are created as ApplicationContainer objects. Now, the code for the simulator to work:

Simulator::Stop (Seconds (10)); //simulation time is 10 seconds
Simulator::Run ();
Simulator::Destroy ();

It is recommended that you save ofswitch13-modify.cc at this path (ns-dev/scratch/). To run the program, use the following command:

$ ./waf --run ofswitch13-modify

The output is given in Figure 4. As shown in the figure, the host pings the other two hosts, and the ping is


Figure 4: Output of ofswitch13-modify.cc

Figure 5: Output of the log file

successful, with nine packets being transmitted. The time statistics are also shown.

In the program's code, to view the log file, set trace=true. This will generate switch-stats-1.log in the ns-dev folder (see Figure 5).

Visualising the basic working of the controller switch using Net Animator
Network Animator, also known as NetAnim, is used to graphically portray projects in ns-3. It is an offline animator, which animates the XML file generated during the simulation program in ns-3. ns-2 has many default animators for use but ns-3 is furnished with no default animator. So, we have to integrate NetAnim with ns-3. NetAnim version 3.107 is used for visualising.

To visualise the above program code (ofswitch13-modify.cc) in NetAnim, add the following few lines of code:

#include <ns3/netanim-module.h> // extra header file for Network Animator in ns3
AnimationInterface::SetConstantPosition (hosts.Get (0), 50, 50);
AnimationInterface::SetConstantPosition (hosts.Get (1), 10, 60);
AnimationInterface::SetConstantPosition (hosts.Get (2), 40, 25);
AnimationInterface anim ("ofs13-modify.xml");

The above four lines of code will set the position of the hosts (nodes) at the given coordinates on the X-Y plane (refer to the screenshot in Figure 6), and then generate the ofs13-modify.xml file for ofswitch13-modify.cc.

Figure 6: Position of hosts in NetAnim

In Figure 6, the node with the IP address 10.100.1 (the upper left corner) represents OFSwitch with the SDN controller. The other three nodes are the created hosts:
• Node 0:
• Node 1:
• Node 2:
Figure 7 is a screenshot of the generated XML file with graphical simulation in NetAnim.

Figure 7: Packet flow in NetAnim

References
[1] https://www.nsnam.org/docs/release/3.26/doxygen/index.html
[2] https://www.opennetworking.org/images/stories/
[3] http://www.lrc.ic.unicamp.br/ofswitch13/ofswitch13.pdf

The source repository can be downloaded from: https://bitbucket.org/yashsquare/ns3_support

By: Radha Govani, Yash Modi and Jitendra Bhatia
Radha Govani and Yash Modi are open source enthusiasts. You can contact them at radhagovani@gmail.com and yashnimeshmodi@gmail.com. Jitendra Bhatia works as assistant professor at Vishwakarma Government Engineering College.

Let’s Try Admin

Taming the Cloud:

Provisioning with Terraform
Terraform is open source software that enables sysadmins and developers to write, plan
and create infrastructure as code. It is a no-frills software package, which is very simple
to set up. It uses a simple configuration language or JSON, if you wish.

Terraform is a tool to create and manage infrastructure that works with various IaaS, PaaS and SaaS service providers. It is very simple to set up and use, as there aren't multiple packages, agents and servers, etc, involved. You just declare your infrastructure in a single (or multiple) file using a simple configuration language (or JSON), and that's it. Terraform takes your configurations, evaluates the various building blocks from those to create a dependency graph, and presents you a plan to create the infrastructure. When you are satisfied with the creation plan, you apply the configurations and Terraform creates independent resources in parallel. Once some infrastructure is created using Terraform, it compares the current state of the infrastructure with the declared configurations on subsequent runs, and only acts upon the changed part of the infrastructure. Essentially, it is a CRUD (Create Read Update Destroy) tool and acts on the infrastructure in an idempotent manner.

Installation and set-up
Terraform is created in Golang, and is provided as a static binary without any install dependencies. You just pick the correct binary (for GNU/Linux, Mac OS X, Windows, FreeBSD, OpenBSD and Solaris) from its download site, unzip it anywhere in your executable's search path and all is ready to run. The following script could be used to download, unzip and verify the set-up on your GNU/Linux or Mac OS X nodes:

HCTLSLOC='/usr/local/bin'
HCTLSURL='https://releases.hashicorp.com'
# use latest version shown on https://www.terraform.io/downloads.html
TRFRMVER='x.y.z'

if uname -v | grep -i darwin 2>&1 > /dev/null
then
  OS='darwin'
else
  OS='linux'
fi

wget -P /tmp --tries=5 -q -L "${HCTLSURL}/terraform/${TRFRMVER}/terraform_${TRFRMVER}_${OS}_amd64.zip"
sudo unzip -o "/tmp/terraform_${TRFRMVER}_${OS}_amd64.zip" -d "${HCTLSLOC}"
rm -fv "/tmp/terraform_${TRFRMVER}_${OS}_amd64.zip"
terraform version


Concepts that you need to know
You only need to know a few concepts to start using Terraform quickly to create the infrastructure you desire. 'Providers' are some of the building blocks in Terraform which abstract different cloud services and back-ends to actually CRUD various resources. Terraform gives you different providers to target different service providers and back-ends, e.g., AWS, Google Cloud, Digital Ocean, Docker and a lot of others. You need to provide different attributes applicable to the targeted service/back-end like the access/secret keys, regions, endpoints, etc, to enable Terraform to create and manage various cloud/back-end resources. Different providers offer various resources which correspond to different building blocks, e.g., VMs, storage, networking, managed services, etc. So only a single provider is required to make use of all the resources implemented in Terraform, to create and manage infrastructure for a service or back-end. There are 'provisioners' that correspond to different resources to initialise and configure those resources after their creation. The provisioners mainly do tasks like uploading files, executing remote/local commands/scripts, running configuration management clients, etc.

You need to describe your infrastructure using a simple configuration language in single or multiple files, all with the .tf extension. The configuration model of Terraform is declarative, and it mainly merges all the .tf files in its working directory at runtime. It resolves the dependencies between various resources by itself to create the correct final dependency graph, to bring up independent resources in parallel. Terraform could use JSON as well for its configuration language, but that works better when Terraform configurations are generated by automated tools. The Terraform format is more human-readable and supports comments, so you could mix and match .tf and .json configuration files in case some things are human coded and others are tool generated. Terraform also provides the concepts of variables, and functions working on those variables, to store, assign and transform various things at runtime.

The general workflow of Terraform consists of two stages—to plan and apply. The plan stage evaluates the merged (or overridden) configs, and presents a plan before the operator about which resources are going to get created, modified and deleted. So the changes required to create your desired infrastructure are pretty clear at the plan stage itself and there are no surprises at runtime. Once you are satisfied with the plan generated, the apply stage initiates the sequence to create the resources required to build your declared infrastructure. Terraform keeps a record of the created infra in a state file (default, terraform.tfstate) and on every further plan-and-apply cycle, it compares the current state of the infra at runtime with the cached state. After the comparison of states, it only shows or applies the difference required to bring the infrastructure to the desired state as per its configuration. In this way, it creates/maintains the whole infra in an idempotent manner at every apply stage. You could mark various resources manually to get updated in the next apply phase using the taint operation. You could also clean up the infra created, partially or fully, with the destroy operation.

Working examples and usage
Our first example is to clarify the syntax for various sections in Terraform configuration files. Download the code example1.tf from http://opensourceforu.com/article_source_code/sept17/terraform.zip. The code is a template to bring up multiple instances of AWS EC2 VMs with Ubuntu 14.04 LTS and an encrypted EBS data volume, in a specified VPC subnet, etc. The template also does remote provisioning on the instance(s) brought up by transferring a provisioning script and doing some remote execution.

Now, let's dissect this example, line by line, in order to practically explore the Terraform concepts. The lines starting with the keyword variable are starting the blocks of input variables to store values. The variable blocks allow the assigning of some initial values used as default, or no values at all. In case of no default values, Terraform will prompt for the values at runtime, if these values are not set using the option -var '<variable>=<value>'. So, in our example, sensitive data like AWS access/private keys are not being put in the template as it is advisable to supply these at runtime, manually or through the command options or through environment variables. The environment variables should be in the form of TF_VAR_name to let Terraform read them. The variables could hold string, list and map types of values, e.g., storing a map of different amis and subnets for different AWS regions as demonstrated in our example. The string value is contained in double quotes, lists in square brackets and maps in curly braces. The variables are referenced, and their values are extracted through interpolation at different places using the syntax ${var.<variable name>}. You could explore everything about Terraform variables on the official variables help page.

It's easy to guess that the block starting with the keyword provider is declaring and supplying the arguments for the service/back-end. The different providers take different arguments based upon the service/back-end being used and you could explore those in detail on the official providers page. The resource keyword contains the main meat in any Terraform configuration. We are using two AWS building blocks in our example: aws_instance to bring up instances


and aws_route53_record to create cname records for the instances created. Every resource block takes up some arguments to customise the resource(s) it creates and exposes some attributes of the resource(s) created. Each resource block starts with resource <resource type> <resource id>, and the important thing is that the <resource type> <resource id> combination should be unique in the same Terraform configuration scope. The prefix of each resource is linked to its provider, e.g., all the AWS prefix resources require an AWS provider. The simple form of accessing the attribute of a resource is <resource type>.<id>.<attribute>. Our example shows that the public_ip and public_dns attributes of the created instances are being accessed in the route53 and output blocks.

Some of the resources require a few post-creation actions like connecting and running local and/or remote commands, scripts, etc, on AWS instance(s). The connection block is declared to connect to that resource, e.g., by creating a ssh connection to the created instances in our example. The provisioner blocks are the mechanisms to use the connection to upload file(s) and the directory to the created resource(s). The provisioners also run local or remote commands and configuration management clients like Chef. You could explore those aspects in detail on the official provisioners help page. Our example is uploading a provisioning script and kicking that off remotely over ssh to provision the created instances out-of-the-box. Terraform provides some meta-parameters available to all the resources, like the count argument in our example. The count.index keeps track of the current resource being created to reference that now or later, e.g., we are creating a unique name tag for each instance created, in our example. Terraform deduces the proper dependencies as we are referencing the attribute of aws_instance in aws_route53_record; so it creates the instances before creating their cname records. You could use the meta-variable depends_on in cases where there is no implicit dependency between resources and you want to ensure that explicitly. The above-mentioned variables help page provides detailed information about the meta-variables too.

The last block declared in our example configuration is the output block. As is evident by the name itself, the output could dump the raw or transformed attributes of the resources created, on demand, at any time. You can also see the usage of various functions like the format and the element in the example configuration. These functions transform the variables into other useful forms, e.g., the element function is retrieving the correct public_ip based upon the current index of the instances created. The official interpolation help page provides detailed information about the various functions provided by Terraform.

The following plan output appears when we execute the command terraform plan -var 'num_nds="3"' after exporting the TF_VAR_aws_access_key and TF_VAR_aws_secret_key, in the working directory where the first example config was created:

+ aws_instance.test.0
...
+ aws_instance.test.1
...
+ aws_instance.test.2
...
+ aws_route53_record.test.0
...
+ aws_route53_record.test.1
...
+ aws_route53_record.test.2
...

Plan: 6 to add, 0 to change, 0 to destroy.

If there is some error in the configuration, then that will come up in the plan phase only and Terraform dumps the parsing errors. You can explicitly verify the configuration for any issue using the terraform validate command. If all is good, then the plan phase dumps the resources it's going to create (indicated by the + sign before the resources' names, in green colour) to converge to the declared model of the infrastructure. Similarly, the Terraform plan output represents the resources it's going to delete in red (indicated by the – sign) and the resources it will update in yellow (indicated by the ~ sign). Once you are satisfied with the plan of resources creation, you can run terraform apply to apply the plan and actually start creating the infrastructure.

Our second example is to get you more comfortable with Terraform, and use its advanced features to create and orchestrate some non-trivial scenarios. The code example2.tf can be downloaded from http://opensourceforu.com/article_source_code/sept17/terraform.zip. It actually automates the task of bringing up a working cluster out-of-the-box. It brings up a configurable number of multi-disk instances from the cluster payload AMI, and then initiates a specific order of remote provisioners using null_resource, some provisioners on all the nodes and some only on a specific one, respectively.

In the example2.tf template, multiple null_resource are triggered in response to the various resources created, on which they depend. In this way, you can see how easily we can orchestrate some not-so-trivial scenarios. You can also see the usage of depends_on
Now let’s look at how to decipher the output being meta-variable to ensure a dependency sequence between
dumped when we invoke different phases of the Terraform various resources. Similarly, you can mark those
workflow. We’ll observe the following kind of output if resources created by Terraform that you want to destroy

www.OpenSourceForU.com | OPEN SOURCE FOR YOU | SEPTEMBER 2017 | 37

Admin Let’s Try

or those resources that you wish to create afresh, using the commands terraform destroy and terraform taint, respectively. The easy way to get quick information about the Terraform commands and their options/arguments is by typing terraform and terraform <command name> -h.

The recent versions of Terraform have started to provide data sources, which are the resources to gather dynamic information from the various providers. The dynamic information gathered through the data sources is used in the Terraform configurations, most commonly using interpolation. A simple example of a data source is to gather the ami id for the latest version of an ami, and use that in the instance provisioning configurations as shown below:

data "aws_ami" "myami" {
  most_recent = true

  filter {
    name = "name"
    values = ["MyBaseImage"]
  }
}

resource "aws_instance" "myvm" {
  ami = "${data.aws_ami.myami.id}"
  ...
}

Code organisation and reusability
Although our examples show the entire declarative configuration in a single file, we should break it into more than one file. You could break your whole config into various separate configs based upon the respective functionality they provide. So our first example could be broken into variables.tf that keeps all the variables blocks, aws.tf that declares our provider, instances.tf that declares the layout of the AWS VMs, route53.tf that declares the AWS Route 53 functionality, and output.tf for our outputs. To keep things simple to use and maintain, keep everything related to a whole task being solved by Terraform in a single directory, along with sub-directories named files, scripts, keys, etc. Terraform doesn't enforce any hierarchy of code organisation, but keeping each high-level functionality in its dedicated directory will save you from unexpected Terraform actions in spite of unrelated configuration changes. Remember, in the software world, "A little copying is better than a little dependency," as things get fragile and complicated easily with each added functionality.

Terraform provides the functionality of creating modules to reuse the configs created. The cluster creation template shown above is actually put in a module to use the same code to provision test and/or production clusters. The usage of the module is simply supplying the required variables to it in the manner shown below (after running terraform get to create the necessary link for the module code):

module "myvms" {
  source = "../modules/awsvms"
  ami_id = "${var.ami_id}"
  inst_type = "${var.inst_type}"
  key_name = "${var.key_name}"
  subnet_id = "${var.subnet_id}"
  sg_id = "${var.sg_id}"
  num_nds = "${var.num_nds}"
  hst_env = "${var.hst_env}"
  apps_pckd = "${var.apps_pckd}"
  hst_rle = "${var.hst_rle}"
  root_size = "${var.root_size}"
  swap_size = "${var.swap_size}"
  vol_size = "${var.vol_size}"
  zone_id = "${var.zone_id}"
  prov_scrpt = "${var.prov_scrpt}"
  sub_dmn = "${var.sub_dmn}"
}

You also need to create a variables.tf in the location of your module source, declaring the same variables you fill in your module. Here is the module variables.tf to pass the variables supplied from the caller of the module:

variable "ami_id" {}
variable "inst_type" {}
variable "key_name" {}
variable "subnet_id" {}
variable "sg_id" {}
variable "num_nds" {}
variable "hst_env" {}
variable "apps_pckd" {}
variable "hst_rle" {}
variable "root_size" {}
variable "swap_size" {}
variable "vol_size" {}
variable "zone_id" {}
variable "prov_scrpt" {}
variable "sub_dmn" {}

The official Terraform documentation consists of a few detailed sections on module usage and creation, which should provide you more information on everything related to modules.

Importing existing resources
As we have seen earlier, Terraform caches the properties of the resources it creates into a state file, and by default doesn't know about the resources not created through it.



But recent versions of Terraform have introduced a feature to import existing resources not created through Terraform into its state file. Currently, the import feature only updates the state file, but the user needs to create the configuration for the imported resources. Otherwise, Terraform will show the imported resources with no configuration and mark those for destruction.

Let's make this clear by importing an AWS instance, which wasn't brought up through Terraform, into some Terraform-created infrastructure. You need to run the command terraform import aws_instance.<Terraform Resource Name> <id of the instance> in the directory where a Terraform state file is located. After the successful import, Terraform gathers information about the instance and adds a corresponding section in the state file. If you see the Terraform plan now, it'll show something like what follows:

- aws_instance.<Terraform Resource Name>

So it means that now you need to create a corresponding configuration in an existing or new .tf file. In our example, the following Terraform section should be enough to not let Terraform destroy the imported resource.

resource "aws_instance" "<Terraform Resource Name>" {
  ami = "<AMI>"
  instance_type = "<Sizing info>"

  tags {
    ...
  }
}

Please note that you only need to mention the Terraform resource attributes that are required as per the Terraform documentation. Now, if you see the Terraform plan, the earlier shown destruction plan goes away for the imported resource. You could use the following command to extract the attributes of the imported resource to create its configuration:

sed -n '/aws_instance.<Terraform Resource Name>/,/}/p' terraform.tfstate | \
grep -E 'ami|instance_type|tags' | grep -v '%' | sed 's/^ *//' | sed 's/:/ =/'

Please pay attention when you import a resource into your current Terraform state and decide not to use that going forward. In that case, don't forget to rename your terraform.state.backup as the terraform.state file to roll back to the previous state. You could also delete that resource block from your state file as an alternative, but it's not a recommended approach. Otherwise, Terraform will try to delete the imported but not desired resource, and that could be catastrophic in some cases.

The official Terraform documentation provides clear examples to import the various resources into an existing Terraform infrastructure. But if you are looking to include the existing AWS resources in the AWS infra created by Terraform in a more automated way, then take a look at the Terraforming tool link in the References section.

Note: Terraform providers are no longer distributed as part of the main Terraform distribution. Instead, they are installed automatically as part of running terraform init. The import command requires that imported resources be specified in the configuration file. Please see the Terraform changelog https://github.com/hashicorp/terraform/blob/v0.10.0/CHANGELOG.md for these.

Missing bytes
You should now be feeling comfortable about starting to automate the provisioning of your cloud infrastructure. To be frank, Terraform is so feature-rich now that it can't be fully covered in a single article or even multiple articles, and deserves a dedicated book (which has already shaped up in the form of an ebook, 'Terraform Up & Running'). So you could further take a look at the examples provided in its official Git repo. Also, the References section offers a few pointers to some excellent reads to make you more comfortable and confident with this excellent cloud provisioning tool.

Creating on-demand and scalable infrastructure in the cloud is not very difficult if some very simple basic principles are adopted and implemented using some feature-rich but no-fuss, easy-to-use standalone tools. Terraform is an indispensable tool for creating and managing cloud infrastructure in an idempotent way across a number of cloud providers. It could further be glued together with some other management pieces to create an immutable infrastructure workflow that can tame any kind of modern cloud infrastructure. The 'Terraform Up and Running' ebook is already out in the form of a print book.

References
[1] Terraform examples: https://github.com/hashicorp/terraform/tree/master/examples
[2] Terraforming tool: https://github.com/dtan4/terraforming
[3] A Comprehensive Guide to Terraform: https://blog.
[4] Terraform Up & Running: http://www.comprehensive-terraform

By: Ankur Kumar
The author is a systems and infrastructure developer/architect and FOSS researcher, currently based in the US. You can find some of his other writings on FOSS at: https://github.com/richnusgeeks.



Visualising the Response Time of a Web Server Using Wireshark
The versatile Wireshark tool can be put to several uses. This article presents a tutorial on using Wireshark to discover and visualise the response time of a Web server.

Wireshark is a cross-platform network analysis tool used to capture packets in real-time. Wireshark includes filters, flow statistics, colour coding, and other features that allow you to get a deep insight into network traffic and to inspect individual packets. Discovering the delayed HTTP responses for a particular HTTP request from a particular PC is a tedious task for most admins. This tutorial will teach readers how to discover and visualise the response time of a Web server using Wireshark. OSFY has published many articles on Wireshark, which you can refer to for a better understanding of the topic.

Step 1: Start capturing the packets using Wireshark on a specified interface to which you are connected. Refer to the bounding box in Figure 1 for available interfaces.

Figure 1: Interface selection

In this tutorial, we are going to capture Wi-Fi packets, so the option 'Wi-Fi' has been selected (if you wish to capture the packets using Ethernet or any other interface, select the corresponding options).

Step 2: Here, we make a request to http://www.wikipedia.org and, as a result, Wikipedia sends an HTTP response of '200 OK', which indicates the requested action was successful. '200 OK' implies that the response contains a payload, which represents the status of the requested resource (the request is successful). Now filter all the HTTP packets as shown in Figure 2, as follows:

Syntax: http

Figure 2: Filtering HTTP
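Display filters like the one above can be combined with logical operators. A few examples in the same syntax (the IP address below is only a placeholder):

```
http                             # all HTTP packets
http && ip.addr == 192.0.2.10    # HTTP traffic to/from a single host
http.time >= 0.050000            # responses that took 50 ms or longer
```

These forms are used in the steps that follow.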



Step 3: We now filter the requests and responses sent from the local PC to Wikipedia and vice versa. Start filtering on the IP of www.wikipedia.org (a simple traceroute or pathping can reveal the IP address of any Web server) and your local PC IP (a simple ipconfig for Windows and ifconfig for Linux can reveal your local PC IP).

Syntax: ip.addr == <Web server IP> && ip.addr == <local PC IP>

Step 4: In order to view the response time of HTTP, right-click on any response packet (HTTP/1.1), go to Protocol preferences, and then uncheck the sub-dissector option to reassemble TCP streams (marked and shown in Figure 3).

Figure 3: Allow sub-dissector to reassemble TCP streams

- If the TCP preference 'Allow sub-dissector to reassemble TCP streams' is off, the http.time will be the time between the GET request and the first packet of the response, the one containing 'OK'.
- If 'Allow sub-dissector to reassemble TCP streams' is on and the HTTP reassembly preferences have been left at their defaults (on), http.time will be the time between the GET request and the last packet of the response.
- Procedure: Right-click on any HTTP response packet -> Protocol preferences -> uncheck 'Reassemble HTTP headers spanning multiple TCP segments' and 'Reassemble HTTP bodies spanning multiple TCP segments'.

Step 5: Create a filter based on the response time as shown in Figure 4, and visualise the HTTP responses using an I/O graph as shown in Figure 5.

Figure 4: Response time
Figure 5: Statistics --> I/O graph

Syntax: http.time >= 0.050000

Step 6: To calculate the delta (delay) time between request and response, use Time Reference (Ctrl-T in the GUI) for easy delta time calculation.

Step 7: In order to display only the HTTP responses, add the filter http.time >= 0.0500 in the display filter. The graph, as shown in Figure 6, depicts the result of the HTTP responses (delta time).
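Under the hood, http.time is simply the difference between a response packet's timestamp and that of its matching request. The short Python sketch below (timestamps are made up for illustration) mirrors that calculation and the http.time >= 0.05 filter used above:

```python
# Pair each HTTP request with its response on the same TCP stream and
# compute the delta, which is what Wireshark exposes as http.time.

# (timestamp_in_seconds, kind, stream_id) tuples -- hypothetical capture data
packets = [
    (0.000, "request", 1),
    (0.042, "response", 1),
    (0.100, "request", 2),
    (0.180, "response", 2),
]

def response_times(packets):
    pending = {}  # stream_id -> timestamp of the outstanding request
    deltas = {}   # stream_id -> response time in seconds
    for ts, kind, stream in packets:
        if kind == "request":
            pending[stream] = ts
        elif kind == "response" and stream in pending:
            deltas[stream] = ts - pending.pop(stream)
    return deltas

deltas = response_times(packets)
# Equivalent of the display filter 'http.time >= 0.050000':
slow = {stream: d for stream, d in deltas.items() if d >= 0.05}
```

Here stream 2 (0.08 s) would be flagged as slow, while stream 1 (0.042 s) passes the filter.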

Figure 6: Visualisation of HTTP responses

By: M. Kannan, Poomanam and Prema Latha
M. Kannan is an associate professor and head of the department of electronics engineering, Madras Institute of Technology. His research interests include computer networks, VLSI, embedded systems and wireless security. Poomanam and Prema Latha are specialists in VLSI at the Madras Institute of Technology, Anna University. Their research interests include computer networks, VLSI and embedded design.


Admin How To

DevOps Series
Creating a Virtual Machine for Erlang/OTP Using Ansible
This seventh article in the DevOps series is a tutorial on how to create a test virtual machine (VM) to compile, build, and test Erlang/OTP from its source code. You can then adapt the method to create different VMs for various Erlang releases.


Erlang is a programming language designed by Ericsson primarily for soft real-time systems. The Open Telecom Platform (OTP) consists of libraries, applications and tools to be used with Erlang to implement services that require high availability. In this article, we will create a test virtual machine (VM) to compile, build, and test Erlang/OTP from its source code. This allows you to create VMs with different Erlang release versions for testing.

The Erlang programming language was developed by Joe Armstrong, Robert Virding and Mike Williams in 1986 and released as free and open source software in 1998. It was initially designed to work with telecom switches, but is widely used today in large scale, distributed systems. Erlang is a concurrent and functional programming language, and is released under the Apache License 2.0.

Setting it up
A CentOS 6.8 virtual machine (VM) running on KVM is used for the installation. Internet access should be available from the guest machine. The VM should have at least 2GB of RAM allotted to build the Erlang/OTP documentation. Ansible runs on the host (Parabola GNU/Linux-libre x86_64). The ansible/ folder contains the following files:

ansible/inventory/kvm/inventory
ansible/playbooks/configuration/erlang.yml

The IP address of the guest CentOS 6.8 VM is added to the inventory file as shown below:

erlang ansible_host=<guest IP> ansible_connection=ssh ansible_user=bravo ansible_password=password

An entry for the erlang host is also added to the /etc/hosts file as indicated below:

<guest IP> erlang

A 'bravo' user account is created on the test VM, and is added to the 'wheel' group. The /etc/sudoers file also has the following line uncommented, so that the 'bravo' user will be able to execute sudo commands:

## Allows people in group wheel to run all commands
%wheel ALL=(ALL) ALL

We can obtain the Erlang/OTP sources from a stable tarball, or clone the Git repository. The steps involved in both these cases are discussed below.

Building from the source tarball
The Erlang/OTP stable releases are available at http://www.erlang.org/downloads. The build process is divided into many



steps, and we shall go through each one of them. The version of Erlang/OTP can be passed as an argument to the playbook. Its default value is the release 19.0, and it is defined in the variable section of the playbook as shown below:

vars:
  ERL_VERSION: "otp_src_{{ version | default('19.0') }}"
  ERL_DIR: "{{ ansible_env.HOME }}/installs/erlang"
  ERL_TOP: "{{ ERL_DIR }}/{{ ERL_VERSION }}"
  TEST_SERVER_DIR: "{{ ERL_TOP }}/release/tests/test_server"

The ERL_DIR variable represents the directory where the tarball will be downloaded, and the ERL_TOP variable refers to the top-level directory location containing the source code. The path to the test directory from where the tests will be invoked is given by the TEST_SERVER_DIR variable.

Erlang/OTP has mandatory and optional package dependencies. Let's first update the software package repository, and then install the required dependencies as indicated below:

tasks:
  - name: Update the software package repository
    become: true
    yum:
      name: '*'
      update_cache: yes

  - name: Install dependencies
    become: true
    package:
      name: "{{ item }}"
      state: latest
    with_items:
      - wget
      - make
      - gcc
      - perl
      - m4
      - ncurses-devel
      - sed
      - libxslt
      - fop

The Erlang/OTP sources are written using the 'C' programming language. The GNU C Compiler (GCC) and GNU Make are used to compile the source code. The 'libxslt' and 'fop' packages are required to generate the documentation. The build directory is then created, the source tarball is downloaded, and it is extracted to the directory mentioned in ERL_DIR.

- name: Create destination directory
  file: path="{{ ERL_DIR }}" state=directory

- name: Download and extract Erlang source tarball
  unarchive:
    src: "http://erlang.org/download/{{ ERL_VERSION }}.tar.gz"
    dest: "{{ ERL_DIR }}"
    remote_src: yes

The 'configure' script is available in the sources, and it is used to generate the Makefile based on the installed software. The 'make' command will build the binaries from the source code.

- name: Build the project
  command: "{{ item }} chdir={{ ERL_TOP }}"
  with_items:
    - ./configure
    - make
  environment:
    ERL_TOP: "{{ ERL_TOP }}"

After the 'make' command finishes, the 'bin' folder in the top-level sources directory will contain the Erlang 'erl' interpreter. The Makefile also has targets to run tests to verify the built binaries. We are remotely invoking the test execution from Ansible and hence -noshell -noinput are passed as arguments to the Erlang interpreter, as shown in the .yaml file.

- name: Prepare tests
  command: "{{ item }} chdir={{ ERL_TOP }}"
  with_items:
    - make release_tests
  environment:
    ERL_TOP: "{{ ERL_TOP }}"

- name: Execute tests
  shell: "cd {{ TEST_SERVER_DIR }} && {{ ERL_TOP }}/bin/erl -noshell -noinput -s ts install -s ts smoke_test batch -s init stop"

You need to verify that the tests have passed successfully by checking the $ERL_TOP/release/tests/test_server/index.html page in a browser. A screenshot of the test results is shown in Figure 1.

The built executables and libraries can then be installed on the system using the make install command. By default, the install directory is /usr/local.

- name: Install
  command: "{{ item }} chdir={{ ERL_TOP }}"
  with_items:
    - make install
  become: true
  environment:
    ERL_TOP: "{{ ERL_TOP }}"
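As an extra sanity check, a task like the following (not part of the article's playbook) could ask the freshly built interpreter for its OTP release; erlang:system_info(otp_release) returns the release string:

```yaml
- name: Verify the built Erlang interpreter (additional check)
  command: >
    {{ ERL_TOP }}/bin/erl -noshell -eval
    'io:format("~s~n", [erlang:system_info(otp_release)]), init:stop().'
  register: erl_release
  changed_when: false

- name: Show the detected OTP release
  debug:
    var: erl_release.stdout
```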




Figure 1: Test results

The documentation can also be generated and installed as shown below:

- name: Make docs
  shell: "cd {{ ERL_TOP }} && make docs"
  environment:
    ERL_TOP: "{{ ERL_TOP }}"
    FOP_HOME: "{{ ERL_TOP }}/fop"
    FOP_OPTS: "-Xmx2048m"

- name: Install docs
  become: true
  shell: "cd {{ ERL_TOP }} && make install-docs"
  environment:
    ERL_TOP: "{{ ERL_TOP }}"

The total available RAM (2GB) is specified in the FOP_OPTS environment variable. The complete playbook to download, compile, execute the tests, and also generate the documentation is given below:

- name: Setup Erlang build
  hosts: erlang
  gather_facts: true
  tags: [release]

  vars:
    ERL_VERSION: "otp_src_{{ version | default('19.0') }}"
    ERL_DIR: "{{ ansible_env.HOME }}/installs/erlang"
    ERL_TOP: "{{ ERL_DIR }}/{{ ERL_VERSION }}"
    TEST_SERVER_DIR: "{{ ERL_TOP }}/release/tests/test_server"

  tasks:
    - name: Update the software package repository
      become: true
      yum:
        name: '*'
        update_cache: yes

    - name: Install dependencies
      become: true
      package:
        name: "{{ item }}"
        state: latest
      with_items:
        - wget
        - make
        - gcc
        - perl
        - m4
        - ncurses-devel
        - sed
        - libxslt
        - fop

    - name: Create destination directory
      file: path="{{ ERL_DIR }}" state=directory

    - name: Download and extract Erlang source tarball
      unarchive:
        src: "http://erlang.org/download/{{ ERL_VERSION }}.tar.gz"
        dest: "{{ ERL_DIR }}"
        remote_src: yes

    - name: Build the project
      command: "{{ item }} chdir={{ ERL_TOP }}"
      with_items:
        - ./configure
        - make
      environment:
        ERL_TOP: "{{ ERL_TOP }}"

    - name: Prepare tests
      command: "{{ item }} chdir={{ ERL_TOP }}"
      with_items:
        - make release_tests
      environment:
        ERL_TOP: "{{ ERL_TOP }}"

    - name: Execute tests
      shell: "cd {{ TEST_SERVER_DIR }} && {{ ERL_TOP }}/bin/erl -noshell -noinput -s ts install -s ts smoke_test batch -s init stop"



    - name: Install
      command: "{{ item }} chdir={{ ERL_TOP }}"
      with_items:
        - make install
      become: true
      environment:
        ERL_TOP: "{{ ERL_TOP }}"

    - name: Make docs
      shell: "cd {{ ERL_TOP }} && make docs"
      environment:
        ERL_TOP: "{{ ERL_TOP }}"
        FOP_HOME: "{{ ERL_TOP }}/fop"
        FOP_OPTS: "-Xmx2048m"

    - name: Install docs
      become: true
      shell: "cd {{ ERL_TOP }} && make install-docs"
      environment:
        ERL_TOP: "{{ ERL_TOP }}"

The playbook can be invoked as follows:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/erlang.yml -e "version=19.0" --tags "release" -K

Building from the Git repository
We can build the Erlang/OTP sources from the Git repository too. The complete playbook is given below for reference:

- name: Setup Erlang Git build
  hosts: erlang
  gather_facts: true
  tags: [git]

  vars:
    GIT_VERSION: "otp"
    ERL_DIR: "{{ ansible_env.HOME }}/installs/erlang"
    ERL_TOP: "{{ ERL_DIR }}/{{ GIT_VERSION }}"
    TEST_SERVER_DIR: "{{ ERL_TOP }}/release/tests/test_server"

  tasks:
    - name: Update the software package repository
      become: true
      yum:
        name: '*'
        update_cache: yes

    - name: Install dependencies
      become: true
      package:
        name: "{{ item }}"
        state: latest
      with_items:
        - wget
        - make
        - gcc
        - perl
        - m4
        - ncurses-devel
        - sed
        - libxslt
        - fop
        - git
        - autoconf

    - name: Create destination directory
      file: path="{{ ERL_DIR }}" state=directory

    - name: Clone the repository
      git:
        repo: "https://github.com/erlang/otp.git"
        dest: "{{ ERL_DIR }}/otp"

    - name: Build the project
      command: "{{ item }} chdir={{ ERL_TOP }}"
      with_items:
        - ./otp_build autoconf
        - ./configure
        - make
      environment:
        ERL_TOP: "{{ ERL_TOP }}"

The 'git' and 'autoconf' software packages are required for downloading and building the sources from the Git repository. The Ansible Git module is used to clone the remote repository. The source directory provides an otp_build script to create the configure script. You can invoke the above playbook as follows:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/erlang.yml --tags "git" -K

You are encouraged to read the complete installation documentation at https://github.com/erlang/otp/blob/master/HOWTO/INSTALL.md.

By: Shakthi Kannan
The author is a free software enthusiast and blogs at shakthimaan.com.


Overview Admin

An Introduction to govcsim (a vCenter Server Simulator)
govcsim is a vCenter Server and ESXi API based simulator that offers a quick fix solution for prototyping and testing code. It simulates the vCenter Server model and can be used to create data centres, hosts, clusters, etc.

govcsim (a vCenter Server simulator) is an open source vCenter Server and ESXi API based simulator written in the Go language, using the govmomi library. govcsim simulates the vCenter Server model by creating various vCenter related objects like data centres, hosts, clusters, resource pools, networks and datastores.

If you are a software developer or quality engineer who works with vCenter and related technologies, then you can use govcsim for fast prototyping and for testing your code.

In this article, we will write an Ansible Playbook to gather all VMs installed on a given govcsim installation. Ansible provides many modules for managing and maintaining VMware resources. (You can find out more about Ansible modules for managing VMware at http://docs.ansible.com/ansible/list_of_cloud_modules.html#vmware.) Do note that govcsim will simulate almost identical environments to those provided by VMware vCenter and the ESXi server.

Installation
We will use Fedora 26 for the installation of govcsim. Let's assume that Ansible has already been installed using dnf or a source tree.

The requirements for installing govcsim are:
1. Golang 1.7+
2. Git

Step 1: Installing Golang
To install the Go tools, type the following command at the terminal:

$ sudo dnf install -y golang

Step 2: Configuring the Golang workspace
Use the following commands to configure the Golang workspace:

$ mkdir -p $HOME/go
$ echo 'export GOPATH=$HOME/go' >> $HOME/.bashrc
$ source $HOME/.bashrc

Check if everything is working by using the command given below:

$ go env GOPATH




Figure 1: Getting help from vcsim
Figure 2: Starting vcsim without any parameters

This should return your home directory path with the Go workspace.

Step 3: Download govcsim using the 'go get' command:

$ go get github.com/vmware/govmomi/vcsim
$ $GOPATH/bin/vcsim -h

If everything is configured correctly, you will be able to get the help options related to govcsim.

To start govcsim without any arguments, use the following command:

$ vcsim

Now, govcsim is working. You can check out the various methods available by visiting the simulator's URL in your favourite browser.

Testing govcsim with Ansible
Now, let's try to write a simple Ansible Playbook, which will list down all VMs emulated by govcsim. The complete code is given in Figure 3. You can read up more about Ansible at https://docs.ansible.com/ansible/.

Figure 3: Ansible Playbook to get details about the virtual machine

After running the playbook from Figure 3, you will get a list of virtual machine objects that are simulated by the govcsim server (see Figure 4).

Figure 4: Ansible in action

You can play around and write different playbooks to get information about govcsim simulated VMware objects.

References
[1] Ansible documentation: https://docs.ansible.com/ansible
[2] Govcsim: https://github.com/vmware/govmomi/tree/master/vcsim

By: Abhijeet Kasurde
The author works at Red Hat and is a FOSS evangelist. He loves to explore new technologies and software. You can contact him at abhijeetkasurde21@gmail.com.
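Since Figure 3 is reproduced only as an image, here is a rough textual sketch of such a playbook. It assumes the vmware_vm_facts module from Ansible's VMware module family; the hostname and credentials are placeholders that should match however you started vcsim:

```yaml
- name: List all VMs simulated by govcsim
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Gather virtual machine facts from the simulator
      vmware_vm_facts:
        hostname: 127.0.0.1   # placeholder: where vcsim is listening
        username: user        # placeholder credentials
        password: pass
      register: vm_facts

    - name: Show the simulated virtual machines
      debug:
        var: vm_facts
```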



Insight Admin

Serverless Architectures: Demystifying Serverless Computing
Serverless architectures refer to applications that depend a lot on third party services known as BaaS (Backend as a Service), or on custom code which runs on FaaS (Function as a Service).

In the 1990s, Neal Ford (now at ThoughtWorks) was working in a small company that focused on a technology called Clipper. By writing an object-oriented framework based on Clipper, DOS applications were built using dBase. With the expertise the firm had on Clipper, it ran a thriving training and consulting business. Then, all of a sudden, this Clipper-based business disappeared with the rise of Windows. So Neal Ford and his team went scrambling to learn and adopt new technologies. "Ignore the march of technology at your peril," is the lesson that one can learn from this experience.

Many of us live inside 'technology bubbles'. It is easy to get cozy and lose track of what is happening around us. All of a sudden, when the bubble bursts, we are left scrambling to find a new job or business. Hence, it is important to stay relevant. In the 90s, that meant catching up with things like graphical user interfaces (GUIs), client/server technologies and later, the World Wide Web. Today, relevance is all about being agile and leveraging the cloud, machine learning, artificial intelligence, etc.

With this background, let's delve into serverless computing, which is an emerging field. In this article, readers will learn how to employ the serverless approach in their applications and discover key serverless technologies; we will end the discussion by looking at the limitations of the serverless approach.

Why serverless?
Most of us remember using server machines of one form or another. We remember logging remotely to server machines and working with them for hours. We had cute names for the servers - Bailey, Daisy, Charlie, Ginger, and Teddy - treating them well and taking care of them fondly. However, there were many problems in using physical servers like these:
- Companies had to do capacity planning and predict their future resource requirements.
- Purchasing servers meant high capital expenses (capex) for companies.
- We had to follow lengthy procurement processes to purchase new servers.
- We had to patch and maintain the servers ... and so on.

The cloud and virtualisation provided a level of flexibility that we hadn't known with physical servers. We didn't have to follow lengthy procurement processes, or worry about who 'owns the server', or why only a particular team had 'exclusive access to that powerful server', etc. The task of procuring physical machines became obsolete with the arrival



of virtual machines (VMs) and the cloud. The architecture we used also changed. For example, instead of scaling up by adding more CPUs or memory to physical servers, we started ‘scaling out’ by adding more machines as needed, but in the cloud. This model gave us the flexibility of an opex-based (operational expenses-based) revenue model. If any of the VMs went down, we got new VMs spawned in minutes. In short, we started treating servers as ‘cattle’ and not ‘pets’.

However, the cloud and virtualisation came with their own problems and still have many limitations. We are still spending a lot of time managing them — for example, bringing VMs up and down, based on need. We have to architect for availability and fault-tolerance, size workloads, and manage capacity and utilisation. If we have dedicated VMs provisioned in the cloud, we still have to pay for the reserved resources (even if it’s just idle time). Hence, moving from a capex model to an opex one is not enough. What we need is to only pay for what we are using (and not more than that) and ‘pay as you go’. Serverless computing promises to address exactly this problem.

The other key aspect is agility. Businesses today need to be very agile. Technology complexity and infrastructure operations cannot be used as an excuse for not delivering value at scale. Ideally, much of the engineering effort should be focused on providing functionality that delivers the desired experience, and not on monitoring and managing the infrastructure that supports the scale requirements. This is where serverless shines.

Figure 1: Key serverless platforms (AWS Lambda, Apache OpenWhisk, MS Azure Functions and Google Functions)

What is serverless?
Consider a chatbot for booking movie tickets - let’s call it MovieBot. Any user can make queries about movies, book tickets, or cancel them in a conversational style (e.g., “Is ‘Dunkirk’ playing in Urvashi Theatre in Bengaluru tonight?” in voice or text).

This solution requires three elements: a chat interface channel (like Skype or Facebook Messenger), a natural language processor (NLP) to understand the user’s intentions (e.g., ‘book a ticket’, ‘ticket availability’, ‘cancellation’, etc), and then access to a back-end where the transactions and data pertaining to movies are stored. The chat interface channels are universal and can be used for different kinds of bots. NLP can be implemented using technologies like AWS Lex or IBM Watson. The question is: how is the back-end served? Would you set up a dedicated server (or a cluster of servers), an API gateway, deploy load balancers, or put in place identity and access control mechanisms? That’s costly and painful, right? That’s where serverless technology can help.

The solution is to set up some compute capacity to process data from a database and also execute this logic in a language of choice. For example, if you are using the AWS platform, you can use DynamoDB for the back-end, write the programming logic as Lambda functions, and expose them through the AWS API Gateway with a load balancer. This entire set-up does not require you to provision any infrastructure or have any knowledge about the underlying servers/VMs in the cloud. You can use a database of your choice for the back-end. Then choose any programming language supported in AWS Lambda, including Java, Python, JavaScript, and C#. There is no cost involved if there aren’t any users using the MovieBot. If a blockbuster like ‘Baahubali’ is released, then there could be a huge surge in users accessing the MovieBot at the same time, and the set-up would effortlessly scale (you have to pay for the calls, though). Phew! You essentially engineered a serverless application.

With this, it’s time to define the term ‘serverless’. Serverless architectures refer to applications that significantly depend on third-party services (known as Backend-as-a-Service or BaaS) or on custom code that’s run in ephemeral containers (Function-as-a-Service or FaaS). Hmm, that’s a mouthful of words; so let’s dissect this description.
• Backend-as-a-Service: Typically, databases (often NoSQL flavours) hold the data and can be accessed over the cloud, and a service can be used to help access that back-end. Such a back-end service is referred to as BaaS.
• Function-as-a-Service: Code that processes the requests (i.e., the ‘programming logic’ written in your favourite programming language) could be run on containers that are spun up and destroyed as needed. This is known as FaaS.

The word ‘serverless’ is misleading because it literally means there are no servers. Actually, the word implies, “I don’t care what a server is.” In other words, serverless enables us to create applications without thinking about servers, i.e., we can build and run applications or services without worrying about provisioning, managing or scaling the underlying infrastructure. Just put your code in the cloud and run it! Keep in mind that this applies to Platform-as-a-Service (PaaS) as well; although you may not deal with VMs directly with PaaS, you still have to deal with instance sizes and capacity.

Think of serverless as a piece of functionality to run — not in your machine but executed remotely. Typically, serverless functions are executed in an ‘event-driven’ fashion — the functions get executed as a response to events or to requests over HTTP. In the case of the MovieBot, the Lambda functions are invoked to serve user queries as and when users interact with it.
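The FaaS half of the MovieBot can be sketched as a single Lambda-style handler in Python. Everything here (the event fields, the intent names and the in-memory showtimes table) is an illustrative stand-in: in a real deployment, AWS Lex would supply the parsed intent and DynamoDB, queried via boto3, would replace the dict.

```python
# A minimal Lambda-style handler for the MovieBot example.
# The event format, intent names and SHOWTIMES data are illustrative
# stand-ins for what AWS Lex and DynamoDB would provide.

SHOWTIMES = {
    ("Dunkirk", "Urvashi Theatre", "Bengaluru"): {"tonight": True, "seats": 40},
}

def lambda_handler(event, context):
    """Entry point invoked once per request; no server is provisioned or managed."""
    intent = event.get("intent")
    key = (event.get("movie"), event.get("theatre"), event.get("city"))
    show = SHOWTIMES.get(key)

    if intent == "ticket_availability":
        if show and show["tonight"]:
            return {"statusCode": 200, "body": "Yes, '%s' is playing tonight." % key[0]}
        return {"statusCode": 200, "body": "Sorry, no show found."}

    if intent == "book_ticket":
        count = event.get("count", 1)
        if show and show["seats"] >= count:
            show["seats"] -= count  # a real bot would persist this in DynamoDB
            return {"statusCode": 200, "body": "Booked %d ticket(s)." % count}
        return {"statusCode": 200, "body": "Not enough seats available."}

    return {"statusCode": 400, "body": "Unknown intent."}
```

Each invocation is stateless: the platform spins the function up on demand, hands it one event and tears it down again, which is why there is no cost when nobody is talking to the bot.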


Use cases
With serverless architecture, developers can deploy certain types of solutions at scale, cost-effectively. We have already discussed developing chatbots - it is a classic use case for serverless computing. Other key use cases for the serverless approach are given below.
1) Three-tier Web applications: Conventional single page applications (SPA), which rely on REpresentational State Transfer (REST) based services to perform a given functionality, can be re-written to leverage serverless functions front-ended by an API gateway. This is a powerful pattern that helps your application scale infinitely, without concerns about configuring scale-out or infrastructure resources.
2) Scalable batch jobs: Batch jobs were traditionally run as daemons or background processes on dedicated VMs. More often than not, this approach hit scalability and reliability issues - developers would leave their critical processes with Single Points of Failure (SPoF). With the serverless approach, batch jobs can now be redesigned as a chain of mappers and reducers, each running as independent functions. Such mappers and reducers will share a common data store, something like a blob storage or a queue, and can individually scale up to meet the data processing needs.
3) Stream processing: Related to scalable batch jobs is the pattern of ingesting and processing large streams of data for near-real-time processing. Streams from services like Kafka and Kinesis can be processed by serverless functions, which can be scaled seamlessly to reduce latency and increase the throughput of the system. This pattern can elegantly handle spiky loads as well.
4) Automation/event-driven processing: Perhaps the first application of serverless computing was automation. Functions could be written to respond to certain alerts or events. These could also be periodically scheduled to augment the capabilities of the cloud service provider through extensibility.
The kinds of applications that are best suited for serverless architectures include mobile back-ends, data processing systems (real-time and batch) and Web applications. In general, serverless architecture is suitable for any distributed system that reacts to events or processes workloads dynamically, based on demand. For example, serverless computing is suitable for processing events from IoT (Internet of Things) devices, processing large data sets (in Big Data) and intelligent systems that respond to queries (chatbots).

Serverless technologies
There are many proprietary and a few open source serverless technologies and platforms available for us to choose from. AWS Lambda is the earliest (announced in late 2014 and released in 2015) and the most popular serverless technology, while other players are fast catching up. Microsoft’s Azure Functions has good support for a wider variety of languages and integrates with Microsoft’s Azure services. Google’s Cloud Functions is currently in beta. One of the key open source players in serverless technologies is Apache OpenWhisk, backed by IBM and Adobe. It is often tedious to develop applications directly on these platforms (AWS, Azure, Google and OpenWhisk). The Serverless framework is a popular solution that aims to ease application development on these platforms.

Many solutions (especially open source) focus on abstracting away the details of container technologies like Docker and Kubernetes. Hyper.sh provides a container hosting service in which you can use Docker images directly in serverless style. Kubeless from Bitnami, Fission from Platform9, and funktion from Fabric8 are serverless frameworks that provide an abstraction over Kubernetes. Given that serverless architecture is an emerging approach, technologies are still evolving and are yet to mature. So you will see a lot of action in this space in the years to come.

Join us at the India Serverless Summit 2017
These are the best of times, and these are the worst of times! There are so many awesome new technologies to catch up on. But, we simply can’t. We have seen a progression of computing models - from virtualisation, IaaS, PaaS, containers, and now, serverless - all in a matter of a few years. You certainly don’t want to be left behind. So join us at the Serverless Summit, India’s first confluence on serverless technologies, being held on October 27, 2017 at Bengaluru. It is the best place to hear from industry experts, network with technology enthusiasts, as well as learn about how to adopt serverless architecture. The keynote speaker is John Willis, director of ecosystem development at Docker and a DevOps guru (widely known for the book ‘The DevOps Handbook’ that he co-authored).
Open Source For You is the media partner and the Cloud Native Computing Foundation is the community partner for this summit. For more details, please visit the website www.inserverless.com.

Challenges in going serverless
Despite the fact that a few large businesses are already powered entirely by serverless technologies, we should keep in mind that serverless is an emerging approach. There are many challenges we need to deal with when developing serverless solutions. Let us discuss them in the context of the MovieBot example mentioned earlier.
• Debugging
Unlike in typical application development, there is no concept of a local environment for serverless functions. Even fundamental debugging operations like stepping through, breakpoints, step-over and watch points are not available with serverless functions. As of now, we need to rely on extensive logging and instrumentation for debugging.
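In practice, that means instrumenting every step of a handler. The sketch below shows the idea; the field names and the hard-coded NLP result are hypothetical stand-ins, since a real bot would log the scores returned by the NLP service.

```python
# With no local debugger, serverless handlers lean on structured logging:
# every invocation records its input, intermediate results and response,
# so failures can be diagnosed after the fact. Field names are illustrative.
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("moviebot")

def handler(event, context):
    log.info("request: %s", json.dumps(event))
    intent, score = "ticket_availability", 0.93  # stand-ins for real NLP output
    log.info("nlp: intent=%s score=%.2f", intent, score)  # instrument each step
    response = {"statusCode": 200, "body": "Yes"}
    log.info("response: %s", json.dumps(response))
    return response
```

The log lines are the only trace an invocation leaves behind, which is exactly the “detective work” material described next.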


When MovieBot provides an inconsistent response or does not understand the intent of the user, how do we debug the code that is running remotely? For situations such as this, we have to log numerous details: NLP scores, the dialogue responses, query results of the movie ticket database, etc. Then we have to manually analyse and do detective work to find out what could have gone wrong. And, that is painful.
• State management
Although serverless is inherently stateless, real-world applications invariably have to deal with state. Orchestrating a set of serverless functions becomes a significant challenge when there is a common context that has to be passed between them.
Any chatbot conversation represents a dialogue. It is important for the program to understand the entire conversation. For example, for the query, “Is ‘Dunkirk’ playing in Urvashi Theatre in Bengaluru tonight?” if the answer from MovieBot is “Yes”, then the next query from the user could be, “Are two tickets available?” If MovieBot confirms this, the user could say, “Okay, book it.” For this transaction to work, MovieBot should remember the entire dialogue, which includes the name of the movie, the theatre’s location, the city, and the number of tickets to book. This entire dialogue represents a sequence of stateless function calls. However, we need to persist this state for the final transaction to be successful. This maintenance of state external to functions is a tedious task.
• Vendor lock-in
Although we talk about isolated functions that are executed independently, we are in practice tied to the SDK (software development kit) and the services provided by the serverless technology platform. This could result in vendor lock-in, because it is difficult to migrate to other equivalent platforms.
Let’s assume that we implement the MovieBot on the AWS Lambda platform using Python. Though the core logic of the bot is written as Lambda functions, we need to use other related services from the AWS platform for the chatbot to work, such as AWS Lex (for NLP), the AWS API Gateway, DynamoDB (for data persistence), etc. Further, the bot code may need to make use of the AWS SDK to consume the services (such as S3 or DynamoDB), and that is written using boto3. In other words, for the bot to be a reality, it needs to consume many more services from the AWS platform than just the Lambda function code written in plain Python. This results in vendor lock-in, because it is harder to migrate the bot to other platforms.
• Other challenges
Each serverless function’s code will typically have third party library dependencies. When deploying the serverless function, we need to deploy the third party dependency packages as well, and that increases the deployment package size. Because containers are used underneath to execute the serverless functions, the increased deployment size increases the latency to start up and execute the serverless functions. Further, maintaining all the dependent packages, versioning them, etc, is a practical challenge as well.
Another challenge is the lack of support for widely used languages on serverless platforms. For instance, as of May 2017, you can write functions in C#, Node.js (4.3 and 6.10), Python (2.7 and 3.6) and Java 8 on AWS Lambda. How about other languages like Go, PHP, Ruby, Groovy, Rust or any others of your choice? Though there are solutions to write serverless functions in these languages and execute them, it is harder to do so. Since serverless technologies are maturing, with support for a wider number of languages, this challenge will gradually disappear with time.

Serverless is all about creating solutions without thinking or worrying about servers; think of it as just putting your code in the cloud and running it! Serverless is a game-changer because it shifts the way you look at how applications are composed, written, deployed and scaled. If you want significant agility in creating highly scalable applications while remaining cost-effective, serverless is what you need. Businesses across the world are already providing highly compelling solutions using serverless computing technologies. The applications of serverless range from chatbots to real-time stream processing from IoT (Internet of Things) devices. So it is not a question of if, but rather, when you will adopt the serverless approach for your business.

References
[1] ‘Build Your Own Technology Radar’, Neal Ford, http://
[2] ‘Serverless Architectures’, Martin Fowler, https://martinfowler.com/articles/serverless.html
[3] ‘Why the Fuss About Serverless?’ Simon Wardley, http://blog.
[4] ‘Serverless Architectural Patterns and Best Practices’, Amazon Web Services, https://www.youtube.com/watch?v=b7UMoc1iUYw

Serverless technologies
• AWS Lambda: https://aws.amazon.com/lambda/
• Azure Functions: https://functions.azure.com/
• Google Cloud Functions: https://cloud.google.com/
• Apache OpenWhisk: https://github.com/openwhisk
• Serverless framework: https://github.com/serverless/serverless
• Fission: https://github.com/fission/fission
• Hyper.sh: https://github.com/hyperhq/
• Funktion: https://funktion.fabric8.io/
• Kubeless: http://kubeless.io/

By: Ganesh Samarthyam, Manoj Ganapathi and Srushit Repakula
The authors work at CodeOps Technologies, which is a software technology, consulting and training company based in Bengaluru. CodeOps is the organiser of the upcoming India Serverless Summit, scheduled on October 27, 2017. Please check www.codeops.tech for more details.


A Glimpse of Microservices with Kubernetes and Docker
The microservices architecture is a variant of service oriented architecture. It develops
a single application as a suite of small services, each running in its own process and
communicating with lightweight mechanisms, often an HTTP resource API.

Microservices is a variant of the service-oriented
architecture (SOA) architectural style. A SOA is a style
of software design in which services are provided to the
other components by application components, through a
communication protocol over a network that structures
an application as a collection of loosely coupled services.
In the microservices architecture, services should be
fine-grained and the protocols should be lightweight. The
benefit of breaking down an application into different
smaller services is that it improves modularity and
makes the application easier to understand, develop and
test. It also parallelises development by enabling small
autonomous teams to develop, deploy and scale their
respective services independently.

‘Microservices’ is a compound word made of ‘micro’ and ‘services’. As the name suggests, microservices are the small modules that provide some functionality to a system. These modules can be anything that is designed to serve some specific function. These services can be independent or interrelated with each other, based on some contract.

The main function of microservices is to provide isolation between services — a separation of services from servers and the ability to run them independently, with the interaction between them based on a specific requirement. To achieve this isolation, we use containerisation, which will be discussed later. The idea behind choosing microservices is to avoid correlated failure in a system where there is a dependency between services. When running all microservices inside the same process, all services will be killed if the process is restarted. By running each service in its own process, only one service is killed if that process is restarted, but restarting the server will kill all services. By running each service on its own server, it is easier to maintain these isolated services, though there is a cost associated with this option.

How microservices are defined
The microservices architecture develops a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are independently deployable and scalable. Each service also provides a kind of contract, allowing different services to be written in different programming languages. They can also be managed by different teams.

The architecture of microservices
Microservices follows the service-oriented architecture, in which the services are independent of users, products and technologies. This architecture allows one to build applications as suites of services that can be used by other services. This is in contrast to the monolithic architecture, where the services are built as a single unit comprising a client-side user interface, databases and server-side applications in a single frame — all dependent on one another. The failure of one can bring down the whole system.

The microservices architecture mainly consists of the client-side user interface, databases and server-side applications as different services that are related in some way to each other but are not dependent on each other. Each layer is independent of the others, which in turn leads to easy maintenance. The architecture is represented in Figure 2.

This architecture is a form of system that is built by plugging together components, somewhat like a real-world composition, where a component is a unit of software that is independently replaceable and upgradeable. These microservices are easily deployable and integrated into one another. This gives rise to the possibility of continuous integration and continuous deployment.
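The core idea above, small services each running on their own and talking over an HTTP resource API, can be sketched with the Python standard library. The ‘ticket service’, its /tickets route and the response payload are invented for illustration; real services would live in separate processes or containers rather than a background thread.

```python
# A minimal sketch of a microservice exposing an HTTP resource API,
# plus a client calling it. The service name, route and payload are
# made up for illustration; a real service would run in its own
# process or container rather than a background thread.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class TicketService(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/tickets":
            body = json.dumps({"available": 40}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the example output quiet

server = HTTPServer(("127.0.0.1", 0), TicketService)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Another service (the client here) only depends on the HTTP contract,
# not on the ticket service's language, process or database.
url = "http://127.0.0.1:%d/tickets" % server.server_address[1]
with urllib.request.urlopen(url) as resp:
    data = json.loads(resp.read().decode())

server.shutdown()
```

Because the caller only sees the HTTP contract, the ticket service could be rewritten in another language or redeployed independently without the client changing at all, which is the modularity benefit described above.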


Figure 1: Microservices - application databases
Figure 2: Microservices architecture
Figure 3: Virtual machines vs containers (containers are isolated systems but share the OS and, where appropriate, the bins/libraries; the result is significantly faster deployment, much less overhead, easier migration and faster restarts)

What’s so good about microservices?
With the advances in software architecture, microservices have emerged as a different platform compared to other software architectures. Microservices are easily scalable and are not limited to a language; so you are free to choose any language for the services. The services are loosely coupled, which in turn results in ease of maintenance and flexibility, as well as reduced time in debugging and deployment.

Microservices with Docker and Kubernetes
Docker is a software technology that provides containers, which are a computer virtualisation method in which the kernel of an operating system allows the existence of multiple isolated user-space instances, instead of just one. Everything required to make a piece of software run is packaged into isolated containers. With microservices, containers play the same role of providing virtual environments to the different processes that are running, being deployed and undergoing testing, independently.

Docker is a bit like a virtual machine but, rather than creating a whole virtual operating system, Docker allows applications to use the same kernel as the system that it’s running on, and only requires applications to be shipped with things not already running on the host computer. The main idea behind using Docker is to eliminate the ‘works on my machine’ type of problems that occur when collaborating on code with co-workers. With Docker, one doesn’t have to install and configure complex databases, nor worry about switching between incompatible language toolchain versions. When an app is dockerised, that complexity is pushed into containers that are easily built, shared and run. It is a tool that is designed to benefit both developers and systems administrators.

How well does Kubernetes go with Docker?
Before starting the discussion on Kubernetes, we must first understand orchestration, which is to arrange various components so that they achieve a desired result. It also means the process of integrating two or more applications and/or services together to automate a process, or synchronise data in real-time. The intermediate path connecting two or more services is handled by orchestration, which refers to the automated arrangement, coordination and management of software containers. So what does Kubernetes do then? Kubernetes is an open source platform for automating deployment, scaling and operations of application containers across clusters of hosts, providing container-centric infrastructure.

Orchestration is an idea, whereas Kubernetes implements that idea. It is a tool for orchestration. It deploys containers inside a cluster. It is a helper tool that can be used to manage a cluster of containers and treat all servers as a single unit. These containers are provided by Docker. The best-known example of Kubernetes at work is the Pokémon Go app, which runs on a virtual environment of Google Cloud, in a separate container for each user. Kubernetes uses a different set-up for each OS. So if you want a tool that will overcome Docker’s limitations, you should go with Kubernetes.

To conclude, we may say that microservices is growing very fast, the reason being its features of independence and isolation, which give it the power to easily be run, tested and deployed. This is just a small summary of microservices, about which there is a lot more to learn.

By: Astha Srivastava
The author is a software developer. Her areas of expertise are C, C++, C#, Java, JavaScript, HTML and ASP.NET. She has recently started working on the basics of artificial intelligence. She can be reached at asthasri25@gmail.com.
when collaborating on code with co-workers. Docker


Let’s Try Developers

Selenium: A Cost-Effective Test Automation Tool for Web Applications
Selenium is a software testing framework. Test authors can write tests in it without
learning a test scripting language. It automates Web based applications efficiently and
provides a recording/playback system for authoring tests.

Selenium is a portable software-testing framework for Web applications that can operate across different browsers and operating systems. It is quite similar to HP Quick Test Pro (or QTP, now called UFT), except that Selenium focuses on automating Web based applications. Testing done using this tool is usually referred to as Selenium testing. Selenium is not just a single tool but a set of tools that helps the tester to automate Web based applications more efficiently. It has four components:
1. The Selenium integrated development environment (IDE)
2. The Selenium remote control (RC)
3. WebDriver
4. The Selenium grid
Selenium RC and WebDriver are merged into a single framework to form Selenium 2. Selenium 1 is also referred to as Selenium RC. Jason Huggins created Selenium in 2004. Initially, he named it JavaScriptTestRunner, and later changed this to Selenium. It is licensed under Apache License 2.0. In the following sections, we will learn about how Selenium and its components operate.

The Selenium IDE
The Selenium IDE is the simplest framework in the Selenium suite and is the easiest one to learn. It is a Firefox plugin that you can install as easily as any other plugin. It allows testers to record their actions as they go through the workflow that they need to test. But it can only be used with the Firefox browser, as other browsers are not supported. The recorded scripts can be converted into various programming languages supported by Selenium, and the scripts can be executed on other browsers as well. However, for the sake of simplicity, the Selenium IDE should only be used as a prototyping tool. If you want to create more advanced test cases, either use Selenium RC or WebDriver.

Selenium RC
Selenium RC or Selenium Remote Control (also known as Selenium 1.0) was the flagship testing framework of the whole Selenium project for a long time. It works in such a way that the client libraries communicate with the Selenium RC server, which passes each Selenium command for execution. Then the server passes the Selenium command to the browser using Selenium-Core JavaScript commands. This was the first automated Web testing tool that allowed people to use a programming language they preferred. Selenium RC components include:
1. The Selenium server, which launches and kills the


browser, interprets and runs the Selenese commands passed from the test program, and acts as an HTTP proxy, intercepting and verifying HTTP messages passed between the browser and the Application Under Test (AUT).
2. Client libraries that provide the interface between each programming language and the Selenium RC server.
Selenium RC is great for testing complex AJAX based Web user interfaces under a continuous integration system. It is also an ideal solution for users of Selenium IDE who want to write tests in a more expressive programming language than the Selenese HTML table format.

Selenese commands
Selenese is the set of Selenium commands which is used to test Web applications. The tester can test for broken links, the existence of some object on the UI, AJAX functionality, the alert window, list options and a lot more using Selenese. There are three types of commands:
1. Actions: These are commands that manipulate the state of the application. Upon execution, if an action fails, the execution of the current test is stopped. Some examples are:
click(): Clicks on a link, button, checkbox or radio button.
contextMenuAt(locator, coordString): Simulates the user by clicking the ‘Close’ button in the title bar of a popup window or tab.
2. Accessors: These evaluate the state of the application and store the results in variables, which are used in assertions. Some examples are:
assertErrorOnNext: Pings Selenium to expect an error on the next command execution, with an expected message.
storeAllButtons: Returns the IDs of all buttons on the page.
3. Assertions: These enable us to verify the state of an application and compare it against the expected state. They are used in three modes, i.e., assert, verify and waitFor. Some examples are:
waitForErrorOnNext(message): Waits for an error; used with the accessor assertErrorOnNext.
verifySelected(selectLocator, optionLocator): Verifies that the selected item of a drop-down satisfies optionSpecifier.

Selenium WebDriver
Selenium WebDriver is a tool that automates the testing of Web applications and is popularly known as Selenium 2.0. It is a Web automation framework that allows you to execute your tests against different browsers. WebDriver also enables you to use a programming language in creating your test scripts. The following programming languages are supported by Selenium WebDriver:
1. Java
2. .NET
3. PHP
4. Python
5. Perl
6. Ruby
WebDriver uses a different underlying framework, while Selenium RC uses a JavaScript Selenium-Core embedded within the browser, which has its limitations. WebDriver directly interacts with the browser without any intermediary, whereas Selenium RC depends on a server.

Architecture
The architecture of WebDriver is explained in Figure 1.

Figure 1: Architecture of Selenium WebDriver (the Selenium test, written in Java, C#, Ruby, Python, Perl, PHP or JavaScript, drives the Web application through Selenium WebDriver)

The differences between WebDriver and Selenium RC are given in Table 1.

Table 1
WebDriver | Selenium RC
Architecture is simpler, as it controls the browser from the OS level. | Architecture is complex, as it depends on the server.
It supports HtmlUnit. | It does not support HtmlUnit.
WebDriver is faster, as it interacts directly with the browser. | It is slower, as it uses JavaScript to interact with RC.
Less object-oriented APIs; cannot be used for iPhone/Android application testing. | Purely object-oriented; can be used for iPhone/Android application testing.
WebDriver is not ready to support new browsers and does not have a built-in command for the automatic generation of test results. | Selenium RC can support new browsers and has built-in commands.

Continued to page 64....
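The command semantics described under ‘Selenese commands’, where a failed action stops the test while assertions can run in assert (abort) or verify (record and continue) mode, can be modelled in a few lines of Python. This is a conceptual sketch of those semantics only, not the Selenium API itself, and the command names are invented for illustration.

```python
# A conceptual model (NOT the Selenium API) of Selenese command
# semantics: a failing action stops the test, 'assert' mode aborts,
# and 'verify' mode records the failure but lets the test continue.

class TestStopped(Exception):
    pass

class SeleneseRunner:
    def __init__(self):
        self.failures = []

    def action(self, ok, name):
        """Actions manipulate application state; a failure stops the test."""
        if not ok:
            raise TestStopped("action failed: " + name)

    def check(self, ok, name, mode="assert"):
        """Assertions compare state against expectations."""
        if ok:
            return
        if mode == "verify":
            self.failures.append(name)  # log the failure and keep going
        else:
            raise TestStopped("assertion failed: " + name)

runner = SeleneseRunner()
runner.action(True, "click login button")
runner.check(False, "title equals 'Home'", mode="verify")  # recorded, not fatal
runner.check(True, "user menu is visible")
```

The verify mode is what lets a single Selenese run report several broken checks at once instead of aborting at the first one.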


Using the Spring Boot Admin UI for Spring Boot Applications

Using Spring Boot makes it easy for developers to create standalone,
production-grade Spring based applications that can be ‘just run’. Spring Boot
is a part of microservices development.

As part of developing microservices, many of us use the features of Spring Boot along with Spring Cloud. In the microservices world, we may have many Spring Boot applications running on the same or different hosts. If we add Spring Actuator (http://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#production-ready) to the Spring Boot applications, we get a lot of out-of-the-box end points to monitor and interact with applications. The list is given in Table 1.
The end points given in Table 1 provide a lot of insights about the Spring Boot application. But if you have many applications running, then monitoring each application by hitting the end points and inspecting the JSON response is a tedious process. To avoid this hassle, the Code Centric team came up with the Spring Boot Admin (https://github.com/codecentric/spring-boot-admin) module, which provides us an Admin UI dashboard to administer Spring Boot applications. This module crunches the data from Actuator end points, and provides insights about all the registered applications in a single dashboard.
We will demonstrate the Spring Boot admin features in the following sections.
As a first step, create a Spring Boot application that will be a Spring Boot Admin Server module by adding the Maven dependencies given below:

<dependency>
	<groupId>de.codecentric</groupId>
	<artifactId>spring-boot-admin-server</artifactId>
	<version>1.5.1</version>
</dependency>
<dependency>
	<groupId>de.codecentric</groupId>
	<artifactId>spring-boot-admin-server-ui</artifactId>
	<version>1.5.1</version>
</dependency>

Figure 1: Spring Boot logo
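Alongside the dependencies above, the Admin Server's own application.properties would carry its HTTP port. The example later in this article runs the server on port 1111; assuming Spring Boot defaults, that is simply:

```properties
# Admin Server application.properties (an assumed sketch;
# port 1111 is the value used later in this article)
server.port=1111
```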



Table 1
ID | Description | Sensitive (default)
actuator | Provides a hypermedia-based 'discovery page' for the other endpoints. Requires Spring HATEOAS to be on the classpath. | True
auditevents | Exposes audit events information for the current application. | True
autoconfig | Displays an auto-configuration report showing all auto-configuration candidates and the reason why they 'were' or 'were not' applied. | True
beans | Displays a complete list of all the Spring Beans in your application. | True
configprops | Displays a collated list of all @ConfigurationProperties. | True
dump | Performs a thread dump. | True
env | Exposes properties from Spring's ConfigurableEnvironment. | True
flyway | Shows any Flyway database migrations that have been applied. | True
health | Shows application health information (when the application is secure, a simple 'status' when accessed over an unauthenticated connection, or full message details when authenticated). | False
info | Displays arbitrary application information. | False
loggers | Shows and modifies the configuration of loggers in the application. | True
liquibase | Shows any Liquibase database migrations that have been applied. | True
metrics | Shows 'metrics' information for the current application. | True
mappings | Displays a collated list of all @RequestMapping paths. | True
shutdown | Allows the application to be gracefully shut down (not enabled by default). | True
trace | Displays trace information (by default, the last 100 HTTP requests). | True
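The 'Sensitive' column in Table 1 indicates end points that, by default, require authentication when Spring Security is on the classpath. Assuming Spring Boot 1.5 (the version this article builds against), sensitivity can be relaxed for local experimentation with properties such as the following; this is a development-only sketch, not something to ship to production:

```properties
# application.properties (development only; Spring Boot 1.x property names)
# Turn off security for all actuator end points, so /metrics, /env, etc.
# can be queried without credentials.
management.security.enabled=false

# Or relax a single end point instead of all of them.
endpoints.metrics.sensitive=false
```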

Add the Spring Boot Admin Server configuration by adding @EnableAdminServer to your configuration, as follows:

package org.samrttechie;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;

import de.codecentric.boot.admin.config.EnableAdminServer;

@EnableAdminServer
@Configuration
@SpringBootApplication
public class SpringBootAdminApplication {

	public static void main(String[] args) {
		SpringApplication.run(SpringBootAdminApplication.class, args);
	}

	public static class SecurityConfig extends WebSecurityConfigurerAdapter {
		@Override
		protected void configure(HttpSecurity http) throws Exception {
			// Page with login form is served as /login.html and does a POST on /login
			http.formLogin().loginPage("/login.html").loginProcessingUrl("/login").permitAll();
			// The UI does a POST on /logout on logout
			http.logout().logoutUrl("/logout");
			// The UI currently doesn't support CSRF
			http.csrf().disable();
			// Requests for the login page and the static assets are allowed
			http.authorizeRequests()
				.antMatchers("/login.html", "/**/*.css", "/img/**", "/third-party/**")
				.permitAll();
			// ... and any other request needs to be authorized
			http.authorizeRequests().antMatchers("/**").authenticated();
			// Enable so that the clients can authenticate via HTTP basic for registering
			http.httpBasic();
		}
	}
}

Let us create more Spring Boot applications to monitor through the Spring Boot Admin Server created in the above




steps. All Spring Boot applications that we now create will act as Spring Boot Admin clients. To make the application an admin client, add the dependency given below along with the actuator dependency. In this demo, I have created three applications: Eureka Server, Customer Service and Order Service.

<dependency>
	<groupId>de.codecentric</groupId>
	<artifactId>spring-boot-admin-starter-client</artifactId>
	<version>1.5.1</version>
</dependency>

Add the property given below to the application.properties file. This property tells us where the Spring Boot Admin Server is running. Hence, the clients will register with the server:

spring.boot.admin.url=http://localhost:1111

Now, if we start the Admin Server and other Spring Boot applications, we will be able to see all the admin clients' information in the Admin Server dashboard. As we started our Admin Server on port 1111 in this example, we can see the dashboard at http://<host_name>:1111. Figure 2 shows the Admin Server UI.

Figure 2: Admin server UI

A detailed view of the application is given in Figure 3. In this view, we can see the tail end of the log file, the metrics, environment variables, and the log configuration, where we can dynamically switch the log levels at the component level, the root level or the package level, among other details.

Figure 3: Detailed view of Spring Boot Admin

Let's now look at another feature called notifications from the Spring Boot admin. This notifies the administrators when the application status is DOWN or when the application status is coming UP. Spring Boot admin supports the following channels to notify the user:
• Email notifications
• Pagerduty notifications
• Hipchat notifications
• Slack notifications
• Let's Chat notifications

In this article, we will configure Slack notifications. Add the properties given below to the Spring Boot Admin Server's application.properties file:

spring.boot.admin.notify.slack.webhook-url=https://hooks.slack.com/services/T8787879tttr/B5UM0989988L/0000990999VD1hVt7Go1eL //Slack Webhook URL of a channel
spring.boot.admin.notify.slack.message="*#{application.name}* is *#{to.status}*" //Message to appear in the channel

Since we are managing all the applications with the Spring Boot Admin, we need to secure its UI with a login feature. Let us enable the login feature on the Spring Boot Admin Server. I am going with basic authentication here. Add the Maven dependencies given below to the Admin



Server module, as follows:

<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-security</artifactId>
</dependency>
<dependency>
	<groupId>de.codecentric</groupId>
	<artifactId>spring-boot-admin-server-ui-login</artifactId>
	<version>1.5.1</version>
</dependency>

Add the properties given below to the application.properties file:

security.user.name=admin //user name to authenticate
security.user.password=admin123 //Password to authenticate

As we have added security to the Admin Server, admin clients should be able to connect to the server by authenticating. Hence, add the properties given below to the admin clients' application.properties files:

spring.boot.admin.username=admin
spring.boot.admin.password=admin123

There are additional UI features like the Hystrix and Turbine UI, which we can enable in the dashboard. You can find more details at http://codecentric.github.io/spring-boot-admin/1.5.1/#_ui_modules. The sample code created for this demonstration is available at https://github.com/2013techsmarts/SpringBoot_Admin_Demo.

By: Siva Prasad Rao Janapati
The author is a software engineer with hands-on experience in Java, JEE, Spring, Oracle Commerce, MOZU Commerce, Apache Solr and other open source/enterprise technologies. You can reach him at his blog http://smarttechie.org.

Continued from page 60....

Selenium locators
A locator is a command that tells the Selenium IDE which GUI element it needs to work on. Elements are located in Selenium WebDriver with the help of the findElement() and findElements() methods provided by the WebDriver and WebElement classes. The findElement() method returns a WebElement object based on a specified search criterion, or ends up throwing an exception. The findElements() method returns a list of WebElements matching the search criteria. If these are not found, it returns an empty list.
The different types of locators are:
1. ID
2. Name
3. Link Text
4. CSS Selector
5. DOM
6. XPath

To locate by ID, type:

driver.findElement(By.id(<element ID>));

To locate by name, type:

driver.findElement(By.name(<element name>));

To locate by Link Text, type:

driver.findElement(By.linkText(<link text>));

To locate by CSS Selector, type:

driver.findElement(By.cssSelector(<css selector>));

To locate by XPath, type:

driver.findElement(By.xpath(<xpath>));

Limitations of Selenium
Selenium does have some limitations which one needs to be aware of. First and foremost, image based testing is not clear-cut compared to some other commercial tools in the market, while the fact that it is open source also means that there is no guaranteed timely support. Another limitation of Selenium is that it supports only Web applications; therefore, it is not possible to automate the testing of non-browser based applications.
Selenium is a powerful testing framework for conducting functional and regression testing. It is open source software and supports various programming environments, OSs and popular browsers. Selenium WebDriver is used to conduct batch testing, cross-platform browser testing, data driven testing, etc. It is also very cost-effective when automating Web applications; and for the technically inclined, it provides the power and flexibility to extend its capability many times over, making it a very credible alternative to other test automation tools in the market.

By: Neetesh Mehrotra
The author works in TCS as a systems engineer. His areas of interest are Java development and automation testing. You can contact him at mehrotra.neetesh@gmail.com.
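The locator strategies discussed above (By.id, By.name, By.xpath and so on) are, in essence, queries over the page's element tree. As a standalone illustration of what such queries match, the same idea can be expressed with Python's standard-library ElementTree against a made-up HTML fragment. This is only a sketch of the concept; Selenium itself evaluates locators inside a live browser:

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a page's DOM (a hypothetical login form, not a real site).
page = ET.fromstring(
    "<html><body><form>"
    "<input id='email' name='email' type='text' />"
    "<input id='pass' name='pass' type='password' />"
    "</form></body></html>"
)

# Locate by ID, analogous to driver.findElement(By.id("email")).
email_box = page.find(".//input[@id='email']")

# Locate by name, analogous to driver.findElement(By.name("pass")).
pass_box = page.find(".//input[@name='pass']")

print(email_box.get("type"))  # text
print(pass_box.get("type"))   # password
```

The XPath expressions accepted by find() here are only a small subset of what a browser's XPath engine supports, but the matching principle is the same.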



Splinter: An Easy Way to Test Web Applications

Splinter is a Web application testing tool which is built around Python. It automates actions such as visiting specified URLs and interacting with their items. It also removes the drudgery from Web application testing by replacing manual testing with automated testing.

Every one of us makes mistakes—some of these might be trivial and be ignored, while a few that are serious can't be ignored. Hence, it's always a good practice to verify and validate what we do in order to eliminate the possibility of error. So is the case with any software application. The development of a software application is complete only when it's fully verified and validated (its functionality, performance, user interface, etc). Only then is it ready for release. Carrying out all such validations manually is quite time consuming; so, machines perform such repetitive tasks and processes. This is called automation testing. It saves a lot of time while it reduces the risk of any further error caused by human intervention.
There are different automation tools and frameworks available, of which Splinter is one. It lets us automate different manual tasks and processes associated with any Web-based software application. In a Web application, we need to automate the sequence of different actions performed, right from opening the Web browser to checking if it's loading properly, for different actions that involve interactions with the application. Splinter is quite good at automating a sequence of actions. It is an open source tool used for testing different Web applications using Python. The tasks needed to be performed by Splinter are written in Python. It lets us automate various browser actions, such as visiting URLs as well as interacting with their different items. It has got easy-to-use built-in functions for the most frequently performed tasks. A newbie can easily use Splinter and automate any specific process with just a limited knowledge of Python scripting. It acts as an easily usable abstraction layer on top of different available automation tools like Selenium and makes it easy to write automation tests. We can easily automate a plethora of tasks such as opening a browser, clicking on any specific link or accessing any link, just with one or two lines of code using Splinter, while in the case of other open source tools like Selenium, this is a long and complex process.
Splinter even allows us to find different elements of any Web application using its different properties like tag name, text or ID value, XPath, etc. Since Splinter is an open source tool, it's quite easy to get clarifications on anything that's not clear. It is supported by a large community. It even has well maintained documentation, which makes it easy for any newbie to master this tool. Apart from all this, Splinter supports various inbuilt libraries, making the task of automation easier. We can easily manage different actions performed on more than one Web window at the same time, as well as navigate through the history of the page, reload the page, etc.

Features of Splinter
1. Splinter has got one of the simplest APIs among open source tools used for automating different tasks on Web applications. This makes it easy to write automated tests



for any Web application.
2. It supports different Web drivers for various browsers. These drivers are the Firefox Web driver for Mozilla Firefox, Chrome's Web driver for Google Chrome, the PhantomJS Web driver for PhantomJS, zope.testbrowser for Zope testing, and a remote Web driver for different 'headless' (with no GUI) testing.
3. Splinter also allows us to find different elements in any Web page by their XPath, CSS, tag value, name, ID, text or value. In case we need more accurate control of the Web page, or we need to do something more, such as interacting with old «frameset» tags, Splinter even exposes the Web driver that allows us to use the low level methods used for interacting with that tag.
4. Splinter supports multiple Web automation back-ends. We can use the same set of test code for doing browser-based testing with Selenium as its back-end, and for 'headless' testing with zope.testbrowser as its back-end.
5. It has extensive support for using iframes and interacts with them by just passing the iframe's name, ID or index value. There is also Chrome support for various alerts and prompts in the Splinter 0.4 version.
6. We can easily execute JavaScript in different drivers which support Splinter. We can even return the result of the script using an inbuilt method called evaluate_script.
7. Splinter has got the ability to work with AJAX and asynchronous JavaScript using various inbuilt methods.
8. When we use Splinter to work with AJAX and asynchronous JavaScript, it's a common experience to have some elements which are not present in the HTML code (since they are created using JavaScript, dynamically). In such cases, we can use various inbuilt methods such as is_element_present or is_text_present for checking the existence of any specific element or text. Splinter will actually load the HTML and then the JavaScript in the browser, and the check will be performed before JavaScript is processed.
9. The Splinter project has full documentation for its APIs, and this is really important when we have to deal with different third party libraries.
10. We can also easily set up a Splinter development environment. We need to make sure we have some basic development tools on our machine, before setting up an entire environment with just one command.
11. There is also a provision for creating a new Splinter browser in an easy and simple way. We just need to implement a test case for this.
12. Using Splinter, it's possible to check the HTTP status code of the page that a browser visits. We can use the status_code.is_success method to do the work for us. We can also compare the status code directly.
13. Whenever we use the visit method, Splinter actually checks if the given response is a success or not, and if it is not, then Splinter raises an HttpResponseError exception. This helps to confirm if the given response is okay or not.
14. It is possible to manipulate cookies using the cookies attribute of any browser instance. The cookies attribute is actually an instance of a CookieManager class which manipulates cookies, such as adding and deleting them.
15. One can create new drivers using Splinter. For instance, if we need to create a new Splinter browser, we just need to implement a test case (extending test.base.BaseBrowsertests). All this will be present in a Python file, which will act as a driver for any future usage.

Figure 1: Flow diagram for Splinter acting as an abstraction layer (Image source: googleimages.com)

Drivers supported by Splinter
Drivers play a significant role when it comes to any Web application. In Splinter, a Web driver helps us open that specific application whose driver we are using. Different types of drivers are supported by Splinter, based on the way any specific application is accessed and tested. There are browser based drivers, which help to open specific browsers; apart from that, we have headless drivers, which help in headless testing; and then there are remote drivers, which help to connect to any Web application present on a remote machine. Here is a list of drivers that are supported by Splinter.
Browser based drivers:
• Chrome WebDriver
• Firefox WebDriver
• Remote WebDriver
Headless drivers:
• Chrome WebDriver
• PhantomJS WebDriver
• zope.testbrowser
• Django client
• Flask client
Remote driver:
• Remote WebDriver

Prerequisites and installation of Splinter
To install Splinter, Python 2.7 or above should be installed on the system. We can download Python from http://www.python.org. Make sure you have already set up your development environment.
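Several of the features listed above (is_element_present, is_text_present and the AJAX handling) come down to polling the page until a condition holds or a timeout expires. The pattern can be sketched in plain Python; this is only an illustration of the idea, not Splinter's actual implementation:

```python
import time

def wait_for(condition, timeout=2.0, interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One final check after the deadline, mirroring a last-chance lookup.
    return condition()

# Simulate an element that "appears" only after a short delay,
# the way JavaScript might inject it into the page dynamically.
appears_at = time.monotonic() + 0.3
print(wait_for(lambda: time.monotonic() >= appears_at))  # True
print(wait_for(lambda: False, timeout=0.2))              # False
```

Splinter's own checks additionally re-query the browser's DOM on each iteration; the timeout here plays the role of the wait time such checks accept.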



Git should be installed on the system. If you want to use Google Chrome as the Web browser, make sure Chrome WebDriver is set up properly.
There are two ways in which we can install Splinter.
To install the stable release: To install the official and bug-free version, run the following command from a terminal:

$ [sudo] pip install splinter

For installing the under-development source code: To get Splinter's latest and best features, just run the following set of commands from a terminal:

$ git clone git://github.com/cobrateam/splinter.git
$ cd splinter
$ [sudo] python setup.py install

Writing sample code to automate a process using Splinter
As already stated, even a newbie without much knowledge of programming can automate any specific task using Splinter. Let's discover how one can easily make Splinter perform any specific task automatically on a Web application. The credit for the ease of coding actually goes to the different inbuilt functions that Splinter possesses. We just need to incorporate all such built-in functions or library files with the help of a few lines of code. Additionally, we need to apply logic while coding to validate different scenarios from different perspectives. Let's have a look at one of the sample code snippets that has been written for Splinter. Here, we make use of the name and ID values of different elements present on the Web page to identify that specific Web element.
Scenario for sample code: Log in to a Facebook account using the user's email ID and password.

#imports the Browser library for Splinter
from splinter import Browser

# takes the email address from user as input to login to his/her Facebook account
user_email = raw_input("enter users email address ")
# takes the password from user as input to login to his/her Facebook account
user_pass = raw_input("enter users password ")

# loads the Firefox browser
browser = Browser('firefox')
# stores the URL for Facebook in url variable
url = "https://www.facebook.com/"

#navigates to facebook website and loads that in the Firefox browser
browser.visit(url)

#checks if Facebook web page is loaded, else prints an error message
if browser.is_text_present('www.facebook.com'):
    # fills the user's email ID and password in the email and password field of the facebook login section
    #Inbuilt function browser.fill uses the tag name for Email and Password input box i.e. email and pass respectively to identify it
    browser.fill('email', user_email)
    browser.fill('pass', user_pass)
    #selects the login button using its id value present on the Facebook page to click and log in with the given details
    button = browser.find_by_id('u_0_d')
    button.click()
else:
    print("Facebook web application NOT FOUND")

Some important built-in functions used in Splinter
Table 1 lists some of Splinter's significant built-in functions that can be used while automating any process for a Web application.

Setting up the Splinter development environment
When it comes to programming in Splinter, we have already seen that it's easier than other open source Web application testing tools. But we need to set up a development environment for it, wherein we can easily code or automate a specific process using Splinter. This is not a tough task. We just need to make sure that we have some basic development tools, library files and a few add-on dependencies on our machine, which will ultimately help us code in an easier and better way. We can get the required tools and set up the entire environment using just a few commands.
Let's have a look at the different development tools required to set up the environment.
Basic development tools: If you are using the Mac OS, install the Xcode tool. It can be downloaded from the Mac Application Store (on the Mac OS X Lion) or even from the Apple website.
If you are using a Linux computer, install some of the basic development libraries and the headers. On Ubuntu, you can easily install all of these using the apt-get command. Given below is the command used for this purpose:

$ [sudo] apt-get install build-essential python-dev libxml2-dev libxslt1-dev

Pip and virtualenv: First of all, we need to make sure that we have Pip installed on our system, with which we manage all the Splinter development dependencies. It lets us program our task and makes the system perform any activity


Table 1

Browser related:
Browser(): Used to instantiate any browser and create a window for it. Syntax: variable_name = Browser('name of the Web driver used')
browser.visit(): Used to navigate to any specific URL. Syntax: browser.visit('URL')
browser.reload(): Used to reload any Web page. Syntax: browser.reload()
browser.title: Displays the title of the current active Web page. Syntax: browser.title
browser.html: Used to display the HTML content of the current active Web page. Syntax: browser.html
browser.url: Used to access the URL of the current active Web page. Syntax: browser.url

For managing different actions:
browser.windows[0]: Used to access the first window. Syntax: browser.windows[numeric value representing the window to be visited]
browser.windows[window_name]: Used to access any specific window using the window_name. Syntax: browser.windows[window_name]
browser.windows.current(): Takes you to the current window. Syntax: browser.windows.current()
window.is_current(): Boolean; used to check whether the current window is active or not. Syntax: window.is_current = Boolean True or False
window.next(): Takes you to the next open window. Syntax: window.next()
window.prev(): Takes you to the previous open window. Syntax: window.prev()
window.close(): Closes the current window. Syntax: window.close()
window.close_others(): Closes all windows except the current one. Syntax: window.close_others()

For finding elements of any Web page:
browser.find_by_name(): Used to find an element using its name. Syntax: browser.find_by_name('name of element')
browser.find_by_css(): Used to find an element using its CSS value. Syntax: browser.find_by_css('css value')
browser.find_by_xpath(): Used to find an element using its XPath. Syntax: browser.find_by_xpath('xpath value')
browser.find_by_tag(): Used to find an element using its tag name. Syntax: browser.find_by_tag('name of tag')
browser.find_by_text(): Used to find an element using its text value. Syntax: browser.find_by_text('text value for the element to be accessed')
browser.find_by_id(): Used to find an element using its ID value. Syntax: browser.find_by_id('id value of the element to be accessed')
browser.find_by_value(): Used to find an element using its value. Syntax: browser.find_by_value('value of the element to be accessed')

using the code or command we write. It's advisable to choose virtualenv for a good development environment.
Once we have all the development libraries installed for the OS we are using, we just need to install all the Splinter development dependencies using the make command. Given below is the command for this:

$ [sudo] make dependencies

We will use sudo while making dependencies only if we are not using virtualenv.

References
[1] http://www.wikipedia.org/
[2] https://splinter.readthedocs.io
[3] https://github.com/cobrateam/splinter

By: Vivek Ratan
The author, who is currently an automation test engineer at Infosys, Pune, has completed his B. Tech in electronics and instrumentation engineering. He can be reached at ratanvivek14@gmail.com for any suggestions or queries.



Getting Started with PHP, the Popular Programming Language

This article provides an introduction to PHP — the development process, its history, as well as its pros and cons. At the end of the article, we learn how to install XAMPP on a computer, and to write code to add two numbers.

If you refer to any Web technology survey to check the market share of different server side scripting languages, you will be surprised to know that PHP is used by an average 70 per cent of the websites. According to w3techs.com, "PHP is used by 82.7 per cent of all the websites whose server-side programming language we know." In the early stages, even Facebook servers deployed PHP to run their social networking application. Nevertheless, we are not concerned about the Web traffic hosted by PHP these days. Instead, we will delve deep into PHP to understand its development, its history, its pros and cons and, in the end, we will have a sneak peek into some of the open source IDEs which you can use for rapid development.
First, let's understand what PHP is. It is an abbreviated form of 'Hypertext Pre-processor'. Confused about the sequence of the acronym? Actually, the earlier name of PHP was 'Personal Home Page' and hence the acronym. It is a server side programming language mainly used to enhance the look and feel of HTML Web pages. A sample PHP code embedded into HTML looks like what follows:

<!DOCTYPE HTML>
<html>
<head>
<title>Example</title>
</head>
<body>
<?php
echo "I am PHP script!";
?>
</body>
</html>

In the above example, you can see how easily PHP can be embedded inside HTML code just by enclosing it inside <?php and ?> tags, which allows very cool navigation between HTML and PHP code. It differs from client-side scripting languages like JavaScript in that PHP code is executed on the server with the help of a PHP interpreter, and only the resultant HTML is sent to the requester's computer. Though it can do a variety of tasks, ranging from creating



forms to generating dynamic Web content to sending and receiving cookies, yet there are three main areas where PHP scripts are usually deployed.
• Server-side scripting: This is the main usage and target area of PHP. You require a PHP parser, a Web browser and a Web server to make use of it, and then you will be able to view the PHP output of Web pages on your machine's browser.
• Command line scripting: PHP scripts can also be run without any server or browser but with the help of a PHP parser. This is most suited for tasks that take a lot of time — for example, sending newsletters to thousands of records, taking backups from databases, and transferring heavy files from one location to another.
• Creating desktop applications: PHP can also be used to develop desktop based applications with graphical user interfaces (GUI). Though it has a lot of pain points, you can use PHP-GTK for that, if you want to. PHP-GTK is available as an extension to PHP.

Fact: Did you know that PHP has a mascot just like sports teams? The PHP mascot is a big blue elephant named elePHPant.

PHP and HTML – similar but different
PHP is often confused with HTML. So to set things straight, let's take a look at how PHP and HTML are different and similar at the same time. As we all know, HTML is a markup language and is the backbone for front-end Web pages. On the other hand, PHP works in the background, on the server, where HTML is deployed to perform tasks. Together, they are used to make Web pages dynamic. For better understanding, let's look at an example where you display some content on a Web page using HTML. Now, if you want to do some back-end validation on the database, then you will use PHP to do it. So both HTML and PHP have different assigned roles and they complement each other perfectly. Listed below are some of the similarities and differences that will make this clear.
Similarities: Both are compatible with most of the browsers supporting their technologies. Both can be used on all operating systems.
Differences: HTML is used on the front-end whereas PHP is back-end technology. PHP is a programming language, whereas HTML is called a markup language and is not included in the category of programming languages because it can't do calculations like '1+1=2'.

History and development
The development of PHP dates back to 1994 when a Danish-Canadian programmer, Rasmus Lerdorf, created Perl scripts called 'Personal Home Page Tools' in order to maintain his personal Web pages. The succeeding year, these tools were released under the name of 'Personal Home Page/Forms Interpreter' as CGI binaries. They were enabled to provide support for databases and Web forms. Once they were released to the whole world, PHP underwent a series of developments and modifications, and the result was that the second version of 'Personal Home Page/Forms Interpreter' was released in November 1997. Moving on, PHP 3, 4 and 5 were released in 1998, 2000 and 2004, respectively.
Today, the most used version of PHP is PHP 5, with approximately 93 per cent of the websites using PHP making use of it, though PHP 7 is also available in the market. In 2010, PHP 5.4 came out with Unicode support added to it.

The pros and cons of PHP
Before going further into PHP development, let's take a look at some of the advantages and disadvantages of using it in Web development.

Advantages
• Availability: The biggest advantage of PHP is that it is available as open source, due to which one can find a large developer community for support and help.
• Stability: PHP has been in use since 1995 and thus it's quite stable compared to other server side scripting languages, since its source code is open and if any bug is found, it can be readily fixed.
• Extensive libraries: There are thousands of libraries available which enhance the abilities of PHP—for example, PDFs, graphs, Flash movies, etc. PHP makes use of modules, so you don't have to write everything from the beginning. You just need to add the required module in your code and you are good to go.
• Built-in modules: Using PHP, one can connect to the database effortlessly using its built-in modules, which drastically reduce the development time and effort of Web developers.
• Cross-platform: PHP is supported on all platforms, so you don't have to worry whether your code written in Windows OS will work on Linux or not.
• Easy to use: For beginners, learning PHP is easy because of its cool syntax, which is somewhat similar to the C programming language, making it even simpler for those familiar with C.

Disadvantages
• Not suitable for huge applications: Though PHP has a lot of advantages in Web page development, it still can't be used to build complicated and huge Web applications since it does not support modularity and, hence, the maintenance of the app will be a cumbersome task.
• Security: Security of data involved in Web pages is of paramount concern. The security of PHP can be

70 | SEPTEMBER 2017 | OPEN SOURCE FOR YOU | www.OpenSourceForU.com

Let’s Try Developers

compromised due to its open source nature, since anyone can view its source code and detect bugs in it. So you have to take extra measures to ensure the security of your Web page if you are dealing with sensitive data.

Fact: It is estimated that there are approximately 5 million PHP developers worldwide, which is a testament to its power.

Open source IDEs for PHP development
The choice of IDE plays an important role in the development of any program or application, but this aspect is often neglected. A good and robust IDE comes packed with loads of features and packages to enable rapid development. Automatic code generation, refactoring, organising imports, debugging, identifying dead code and indentation are some of the advantages a powerful IDE can provide. So let's take a look at some dominant open source IDEs that can be very useful in PHP development.
1. NetBeans: Most of you must be aware of NetBeans from Java development, but it can also be used for PHP development. The biggest advantage of NetBeans is that it supports many languages like English, Chinese, Japanese, etc., and can be installed smoothly on any operating system. Some of the features that differentiate it from the rest are smart code completion, refactoring, try/catch code completion and formatting. It also has the capability to configure various PHP frameworks like Smarty, Doctrine, etc. You can download it from netbeans.org.
2. Eclipse: Eclipse tops the list of popular IDEs. If you have worked with Eclipse earlier, then you will feel at home using Eclipse PDT for PHP development. It can be downloaded from eclipse.org/pdt. Some of its features are syntax highlighting, debugging, code templates, syntax validation and easy code management through Windows Explorer. It is a cross-platform IDE and works on Windows, Linux and Mac OS. Since it is developed in Java, you must have Java installed on your machine.
3. PHPStorm: PHPStorm, developed by JetBrains (the same company that developed IntelliJ IDEA for Java), is mainly used for professional purposes but is also available licence-free for students, teachers and open source projects. It has the most up-to-date set of features for rapid development, since it provides support for leading front-end technologies like HTML5, CoffeeScript, JavaScript and Sass. It supports all the major frameworks available in the market like Symfony, CakePHP, Laravel and Zend, and can also be integrated with databases, version control software, REST clients and command line tools to ease the work of developers. A number of MNCs, like Wikipedia, Yahoo and Cisco, are making use of PHPStorm for PHP development. You can read more about PHPStorm on its official website.
4. Sublime Text: Sublime Text is basically a text editor, but it can be converted into a PHP IDE by installing various available packages. It is known for its sleek, feature-rich and lightweight interface. It is also supported on all operating systems. Some of the packages which can be used to convert it into an IDE are Sublime PHP Companion, PHPCS, codIntel, PHPDoc, Simple PHPUnit, etc. It can be downloaded as open source from sublimetext.com.
5. PHP Designer: This IDE is only available for Windows users. It is very fast and powerful, with full support for PHP, HTML, JavaScript and CSS. It is used for fast Web development due to its features like intelligent syntax highlighting, object-oriented programming, code templates, code tips and a debug manager, which are all wrapped into a sleek and intuitive interface that can also be customised according to various available themes. It also supports various JavaScript frameworks such as jQuery, ExtJS and YUI. An open source version of it is available and you can read more about it on its official website.
6. NuSphere PHP IDE: PhpED is the IDE developed by NuSphere, a Nevada based company which entered the market way back in 2001. The current available version of PhpED is 18.0, which provides support for PHP 7.0 and almost all PHP frameworks. This tool also has the ability to run unit tests for the developed projects, and comes packaged with support for all Web based technologies. You can download PhpED from NuSphere's website, www.nusphere.com.
7. Codelobster: Codelobster also provides a free IDE for PHP development. Though it is not used too often, it is catching up fast. By downloading the free version, you get support for PHP, JS, HTML and CSS. It can be integrated with various frameworks such as Drupal, WordPress, Symfony and Yii. You can download it from www.codelobster.com.

Writing the first PHP program
Having read about PHP, its history and various IDEs, let's write our first PHP program and run it using XAMPP. Though there is no official information about the full form of XAMPP, it is usually assumed to stand for cross-platform (X), Apache (A), MariaDB (M), PHP (P) and Perl (P). XAMPP is an open source, widely used Web server package developed by apachefriends.org, which can be used to create a local HTTP server on a machine with a few clicks. We will also be using it in our tutorial below.

Figure 1: Apache service started on XAMPP control panel


• Download the latest version of XAMPP on your system from the website https://www.apachefriends.org/download.html. After the download is complete, install it on your machine.
• By default, XAMPP is installed in your machine's C drive; but if you have specified any other directory in the installation process, go to that directory. Create a folder named PHPDevelopment inside the htdocs folder in your XAMPP installation directory, for example, C:\xampp\htdocs\PHPDevelopment.
• Now start the XAMPP control panel and click on the Start button to start Apache.
• Create a text file inside the above folder named AddTwoNumbers.php and copy the following code inside it:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
<h3>Addition Of Two Numbers</h3>
<form method="get">
<div>Number 1:</div>
<input type="text" name="num1"/>
<div>Number 2:</div>
<input type="text" name="num2"/>
<div><br><input type="submit" value="CALCULATE SUM"></div><br>
</form>

<?php
if (isset($_GET['num1']) && isset($_GET['num2'])) {
    $num1 = $_GET['num1'];
    $num2 = $_GET['num2'];
    $sum = $num1 + $num2;
    echo "Sum of $num1 and $num2 is $sum";
}
?>

</body>
</html>

Figure 2: Screenshot of the list of files in your directory

• Now type localhost/PHPDevelopment in your browser and it will list all the files in your directory, as shown in Figure 2.
• Click on AddTwoNumbers.php and you will be directed to the required page, where you can perform the addition of two numbers.

Figure 3: PHP program when run on the browser

Here, you can see that the form has been created using HTML and the corresponding addition of the numbers is done using PHP. Now start your Web development using PHP. You can also make use of the various frameworks available to simplify development and lessen your coding time. Happy coding!

By: Vinayak Vaid
The author works as an automation engineer at Infosys Limited, Pune. He has worked on different testing technologies and automation tools like QTP, Selenium and Coded UI. He can be contacted at vinayakvaid91@gmail.com.



Crawling the Web with Scrapy
Web crawling or spidering is the process of systematically extracting
data from a website using a Web crawler, spider or robot. A Web scraper
methodically harvests data from a website. This article takes the reader
through the Web scraping process using Scrapy.

Scrapy is one of the most powerful and popular Python frameworks for crawling websites and extracting structured data useful for applications like data analysis, historical archival, knowledge processing, etc.
To work with Scrapy, you need to have Python installed on your system. Python can be downloaded from www.python.org.

Installing Scrapy with pip
Pip is installed along with Python, in the Python/Scripts/ folder. To install Scrapy, type the following command:

pip install scrapy

The above command will install Scrapy on your machine in the Python/Lib/site-packages folder.

Creating a project
With Scrapy installed, navigate to the folder in which you want to create your project, open cmd and type the command below to create the Scrapy project:

scrapy startproject scrapy_first

The above command will create a Scrapy project with the following file structure:

scrapy_first/
    scrapy.cfg
    scrapy_first/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/

In the folder structure given above, 'scrapy_first' is the root directory of our Scrapy project.
A spider is a class that describes how a website will be scraped, how it will be crawled and how data will be extracted from it. The customisation needed to crawl and parse Web pages is defined in the spiders.

A spider's scraping life cycle
1. You start by generating the initial request to crawl the first URL, obtained by the start_requests() method, which generates a request for the URLs specified in start_urls and parses them using the parse method as a

Scrapy commands

scrapy startproject myproject [project_dir]: Creates a Scrapy project in the project directory specified; if project_dir is not mentioned, a directory with the name of the project is used.
scrapy genspider spider_name [domain.com]: Needs to be run from the root directory of the project; creates a spider with allowed_domain as domain.com.
scrapy bench: Runs a quick benchmark test, to tell you Scrapy's maximum possible speed in crawling Web pages, given your hardware.
scrapy check: Checks spider contracts.
scrapy crawl [spider]: Instructs the spider to start crawling the Web pages.
scrapy edit [spider]: Edits the spider using the editor specified in the EDITOR environment variable or the EDITOR setting.
scrapy fetch [url]: Downloads the contents of the URL and writes them to standard output.
scrapy list: Lists the available spiders in the project.
scrapy parse [url]: Fetches the URL and parses it with the default callback used by Scrapy to process downloaded responses, when their requests don't specify a callback.
scrapy runspider file_name.py: Runs a spider self-contained in a Python file, without having to create a project.
scrapy view [url]: Opens the URL in the browser as seen by the spider.
scrapy settings: Gets a Scrapy setting value.
scrapy version: Prints the Scrapy version installed.
scrapy shell [url optional]: Opens the interactive Scrapy console for the URL.

callback to get a response.
2. In the callback, after the parsing is done, one of three kinds of content is returned: a request object, an item object or an iterable. Returned requests will also contain a callback and are downloaded by Scrapy, and each response is handled by the corresponding callback.
3. In callbacks, parsing of the page content is performed using XPath selectors or any other parser libraries like lxml, and items are generated with the parsed data.
4. The returned items are then persisted into the database or the item pipeline, or written to a file using the FeedExports service.

Scrapy is bundled with three kinds of spiders.
BaseSpider: All the spiders must inherit this spider. It is the simplest one, responsible for start_urls/start_requests() and calling of the parse method for each resulting response.
CrawlSpider: This provides a convenient method for crawling links by defining a set of rules. It can be overridden as per the project's needs. It supports all the BaseSpider's attributes as well as an additional attribute, 'rules', which is a list of one or more rules.
XMLSpider and CSVSpider: XMLSpider iterates over XML feeds through a certain node name, whereas CSVSpider is used to crawl CSV feeds. The difference between them is that XMLSpider iterates over nodes and CSVSpider iterates over rows, with the parse_rows() method.

Having understood the different types of spiders, we are ready to start writing our first spider. Create a file named myFirstSpider.py in the spiders folder of our project:

import scrapy

class MyfirstspiderSpider(scrapy.Spider):
    name = "myFirstSpider"
    allowed_domains = ["opensourceforu.com"]
    start_urls = (
        'http://opensourceforu.com/2015/10/building-a-django-app/',
    )

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

In the above code, the following attributes have been defined:
1. name: This is the unique name given to the spider in the project.


2. allowed_domains: This is the base address of the URLs that the spider is allowed to crawl.
3. start_requests(): The spider begins to crawl on the requests returned by this method. It is called when the spider is opened for scraping.
4. parse(): This handles the responses downloaded for each request made. It is responsible for processing the response and returning scraped data. In the above code, the parse method is used to save the response.body into an HTML file.

Crawling is basically following links and crawling around websites. With Scrapy, we can crawl on any website using a spider with the following command:

scrapy crawl myFirstSpider

Extraction with selectors and items
Selectors: A certain part of the HTML source can be scraped using selectors, which is achieved using CSS or XPath expressions.
XPath is a language for selecting nodes in XML documents, as well as in HTML, whereas CSS selectors are used to define selectors for associated styles. Add the following code to our previous spider code to select the title of the Web page:

def parse(self, response):
    url = response.url
    for select in response.xpath('//title'):
        title = select.xpath('text()').extract()
        self.log("title here %s" % title)

Items: Items are used to collect the scraped data. They are regular Python dicts. Before using an item, we need to define the item fields in our project's items.py file. Add the following lines to the ScrapyFirstItem class in it:

title = scrapy.Field()
url = scrapy.Field()

Our code will look like what's shown in Figure 1.

Figure 1: First item

After the changes in the item are done, we need to make some changes in our spider. Add the following lines so that it can yield the item data:

from scrapy_first.items import ScrapyFirstItem

def parse(self, response):
    item = ScrapyFirstItem()
    for select in response.xpath('//title'):
        title = select.xpath('text()').extract()
        self.log("title here %s" % title)
        item['title'] = title
    yield item

Now run the spider, and our output will look like what's shown in Figure 2.

Figure 2: Execution of the spider
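As an aside, the node-selection idea behind response.xpath('//title') can be seen with nothing but the standard library. The sketch below is not Scrapy's selector API; it uses xml.etree.ElementTree's limited XPath support, and the page markup in it is made up for the demonstration:

```python
import xml.etree.ElementTree as ET

# The spider's response.xpath('//title/text()').extract() call selects nodes
# with an XPath expression; ElementTree's findall() shows the same idea on a
# small, hand-written (well-formed) page.
html = "<html><head><title>Building a Django app</title></head><body /></html>"
root = ET.fromstring(html)
titles = [t.text for t in root.findall(".//title")]
print(titles)  # ['Building a Django app']
```

Scrapy's own selectors (built on lxml) accept full XPath expressions such as //title/text(), which ElementTree's restricted dialect does not support.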


Scraped data
After data is scraped from different sources, it can be persisted into a file using FeedExports, which ensures that data is stored properly with multiple serialisation formats. We will store the data as XML. Run the following command to store the data:

scrapy crawl myFirstSpider -o data.xml

We can find data.xml in our project's root folder, as shown in Figure 3.

Figure 3: data.xml

Our final spider will look like what's shown in Figure 4.

Figure 4: Final spider

Built-in services
1. Logging: Scrapy uses Python's built-in logging system for event tracking. It allows us to include our own messages along with third party APIs' logging messages in our application's log.

import logging
logging.warning('this is a warning')
logging.log(logging.WARNING, "Warning Message")
logging.error("error goes here")
logging.critical("critical message goes here")
logging.info("info goes here")
logging.debug("debug goes here")

2. Stats collection: This facilitates the collection of stats in key-value pairs, where the values are often counters. This service is always available even if it is disabled, in which case the API will be called but will not collect anything. The stats collector can be accessed using the stats attribute. For example:

class ExtensionThatAccessStats(object):
    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.stats)

Set a stat value: stats.set_value('hostname', socket.gethostname())
Increment a stat value: stats.inc_value('count_variable')
Get the stat values: stats.get_stats()

3. Sending email: Scrapy comes with an easy-to-use service for sending email, implemented using Twisted non-blocking IO of the crawler. For example:

from scrapy.mail import MailSender
mailer = MailSender()
mailer.send(to=['abc@xyz.com'], subject="Test Subject", body="Test Body", cc=['cc@abc.com'])

4. Telnet console: All the running processes of Scrapy are controlled and inspected using this console. It comes enabled by default and can be accessed using the following command:

telnet localhost 6023

5. Web services: This service is used to control Scrapy's Web crawler via the JSON-RPC 2.0 protocol. It needs to be installed separately using the following command:

pip install scrapy-jsonrpc

The following line should be included in our project's settings.py file:

EXTENSIONS = {'scrapy_jsonrpc.webservice.WebService': 500,}

Also set JSONRPC_ENABLED to True.

Scrapy vs BeautifulSoup
Scrapy: Scrapy is a full-fledged spider library, capable of applying load balancing restrictions, and parsing a wide range of data types with minimal customisation. It is a Web scraping framework and can be used to crawl numerous URLs by providing constraints. It is best suited to situations where you have proper seed URLs. Scrapy supports both CSS selectors and XPath expressions for data extraction. In fact, you could even use BeautifulSoup or PyQuery as the data extraction mechanism in your Scrapy spiders.
BeautifulSoup: This is a parsing library which provides easy-to-understand methods for navigating, searching and finally extracting the data you need; i.e., it helps us to navigate through HTML and can be used to fetch data and parse it into any specific format. It can be used if you'd rather implement the HTML fetching part yourself and want to easily navigate through the HTML DOM.

Reference
https://doc.scrapy.org

By: Shubham Sharma
The author is an open source activist working as a software engineer at KPIT Technologies, Pune. He can be contacted at shubham.ks494@gmail.com.
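To round off the walkthrough, here is a rough sketch of reading an exported feed such as data.xml back into Python using only the standard library. The layout assumed below (each yielded item wrapped in an <item> element under a root <items> element, with list values nested as <value> elements) follows Scrapy's XML item exporter, but the sample document itself is hand-written for illustration:

```python
import xml.etree.ElementTree as ET

# A hand-written stand-in for a real data.xml produced by
# "scrapy crawl myFirstSpider -o data.xml"; the title value is illustrative.
sample = """<?xml version="1.0" encoding="utf-8"?>
<items>
  <item><title><value>Building a Django app</value></title></item>
</items>"""

root = ET.fromstring(sample)
titles = [v.text for v in root.iter("value")]
print(titles)  # ['Building a Django app']
```

The same feed could equally be exported as JSON or CSV by changing the output file extension, in which case the json or csv modules would take the place of ElementTree.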


Five Friendly Open Source Tools for Testing Web Applications
Web application testing helps ensure that the apps conform to certain set
standards. It is a means to check for any bugs in the application before the latter
goes live or the code is released on the Internet. Various aspects of the application
and its behaviour under different conditions are checked. Here’s a brief introduction
to five popular open source tools you can use for this job.

The term 'Web application' or 'Web app' is often confused with 'website'. So let's get that doubt cleared: a Web application is a computer app that is hosted on a website. A website has some fixed content, while a Web application performs various definite actions based on the users' inputs and actions.

Web application testing
Web application testing involves all those activities that software testers perform to certify a Web app. This testing has its own set of criteria and checkpoints, based on the development model, to decide whether the actions are part of expected behaviour or not.

Types of testing
1. Functional testing: Functional testing is a superset validating all those features and functionalities that the application is meant to perform. It includes testing the business logic around the set rules. Listed below are some of the common checkpoints:
• Tests links to a page from external pages.
• Validates the response to a form submission.
• Checks create, read, update and delete (CRUD) tasks.
• Verifies that the data retrieved is correct.
• Identifies database connectivity and query errors.
2. Browser compatibility testing: Because of the availability of cross-platform browser versions, it


has become necessary to validate if the application is supported on other browser versions without compatibility issues. If the application is not behaving properly on certain browsers, it is good to mention the supported versions to avoid customer complaints. Below are some of the common checkpoints:
• Checks browser rendering of your application's user interface.
• Checks the browser's security settings for cross-domain access and hacks.
• Verifies consistent functioning of the app across multiple versions of a browser.
• Checks user interface rendering on different-sized mobile device screens, including screen rotation.
• Verifies that the application operates correctly when the device moves in and out of the range of network services.
3. Performance testing: Performance testing focuses on checking how an application behaves under extra load, which refers to the number of users accessing the application simultaneously. It is good to see which particular feature breaks down under the given load. Listed below are some of the common checkpoints:
• Checks the server's response to browser form submit requests.
• Identifies changes in performance over a period of time.
• Tests for functions that stop working at higher loads.
• Identifies how an application functions after a system crash or component failure.
• Identifies forms and links that operate differently under higher loads.
4. Security testing: Securing user data is a critical task, and Web apps should not leak data. Testing ensures that the app works only with a valid login, and that after logout the data remains secure and pressing the 'back' key does not resume the session. Given below are some of the common checkpoints:
• Checks whether the app operates on certain URLs without logging in.
• Tests basic authentication using false user names and password credentials.
• Tests if the app functions correctly upon invalid URL attribute values.
• Checks how the app functions with invalid input fields, including text fields.
• Tests CAPTCHA fields for Web forms and logins.
5. Usability testing: Any Web app is considered user friendly if accessibility is easy and navigation is smooth. If there are ambiguities in representations, then these should be corrected. Users want clear descriptions and representations. Shown below are some of the common checkpoints:
• Tests that the content is logically arranged and easy for users to understand.
• Checks for spelling errors.
• Checks that pages adhere to colour and pattern style guidelines, including fonts, frames and borders.
• Checks that images load correctly and in their proper size.

With the increasing need to analyse the performance of your Web app, it is a good idea to evaluate some of the popular open source performance testing tools.

Why choose open source performance test tools?
1. No licensing costs: a commercial load testing tool can really burn a hole in your pocket when you want to test with a large number of virtual users.
2. Generates (almost) an infinite amount of load on the Web app without charging users any additional licensing costs. The only limitation would be the resources available.
3. Enables you to create your own plugins to extend the analysis and reporting capabilities.
4. Integrates with other open source and commercial tools to drive end-to-end test cycles.

Popular open source Web application test tools
Licensed tools have their own benefits, but open source always stands out because of the ease of use. Here are some popular open source Web app test tools that are easily available and simple to use as well.

1. JMeter: Load and performance tester
JMeter is a pure Java desktop application designed to load-test functional behaviour and measure performance. It can be used to test performance both on static and dynamic resources (files, Servlets, Perl scripts, Java objects, databases and queries, FTP servers and more). It can be used to simulate a heavy load on a server, network or object to test its strength, or to analyse the overall performance under different load types. JMeter was originally used for testing Web and FTP applications. Nowadays, it is used for functional tests, database server tests, etc.

Figure 1: JMeter

The pros of JMeter
• A very lightweight tool that can be installed easily.
• As it is an open source tool, you need not be worried about the licence.
• There are multiple plugins that are available in the market


and can be installed easily, according to requirements.
• Offers caching and offline analysis/replaying of test results.

The cons of JMeter
• It can be used only on Web applications.
• Consumption of memory is high in GUI mode; load, stress and endurance testing with high user loads should preferably be run in non-GUI mode.
• Complex scenarios cannot be checked using the JMeter thread group.
• Recording is complex, as we need to set up the proxy manually.
• It supports only Java for custom coding.

2. Capybara: Acceptance test framework for Web applications
Capybara is a Web based automation framework used for creating functional tests that simulate how users interact with your application. It is a library built to be used on top of an underlying Web based driver. It offers a user friendly DSL (domain specific language), which is used to describe actions that are executed by the underlying Web driver. When the page is loaded using the DSL (and underlying Web driver), Capybara will try to locate the relevant element in the DOM (Document Object Model) and execute the action, such as clicking a button, a link, etc.

The pros of Capybara
• No set-up necessary for Rails and Rack applications. It works out-of-the-box.
• Intuitive API, which mimics the language an actual user would use.
• Powerful synchronisation features mean you never have to manually wait for asynchronous processes to complete.
• Capybara uses the same DSL to drive a variety of browsers and headless drivers.

The cons of Capybara
• The only con of this tool is that its framework adds a layer on top of the actual implementation, which makes it tough to debug what is actually happening.

3. Selenium: Web app testing tool
Selenium is a suite of tools such as Selenium IDE, Selenium Remote Control and Selenium Grid to test the Web application. Selenium IDE is an integrated development environment for Selenium scripts. It is implemented as a Firefox extension, and allows you to record, edit, and debug tests. It supports record and playback.

The pros of Selenium
• It is a low cost tool.
• It can carry out browser compatibility testing.
• It offers a choice of languages.
• It has multiple testing frameworks.
• It is easy to integrate with the testing ecosystem.
• It is open for enhancement.
• It supports test-driven development.
• It's useful for comprehensive testing.

The cons of Selenium
• There are a few problems while testing.
• There are issues with finding locators.
• There are limitations in browser support.
• Manual scripts are not allowed.
• The performance is slow.

Figure 2: Selenium's capabilities
Figure 3: Sahi's interface

4. Sahi: An automation and testing tool
Sahi is an automation and testing tool for Web applications. It is available in both open source and proprietary versions. The open source version includes record and playback on all browsers, HTML reports, suites and batch run, and parallel playback. The Pro version includes some

Table 1: A comparison of open source Web app testing tools

JMeter
    Company: Apache
    Scope: Test automation framework; testing tool
    Application rights: Free use, open source
    User interface available: Batch mode, plugin, standalone application
    Supported technology: JDBC driver, JMS, LDAP, CORBA, IMAP, POP3, SMTP, SOAP

SeleniumHQ
    Company: GitHub project, Google Code Projects, SeleniumHQ
    Scope: Test automation framework; testing tool
    Application rights: Free use, open source
    User interface available: Integrated into ALM, Maven; standalone application; Web based
    Supported technology: Adobe Flash, Ajax, .NET, DOM, Java GUI, Android apps, Silverlight, CSS, HTML, HTTP

Capybara
    Company: GitHub project
    Scope: Test automation framework; testing tool
    Application rights: Free use, open source
    User interface available: COM API; tool extension; Web based
    Supported technology: Web, Web services

Sahi Pro
    Company: Tyto Software
    Scope: Test automation framework; testing tool
    Application rights: Commercial, trial
    User interface available: Command line
    Supported technology: Adobe Flex, Ajax, Java, PHP, RubyOnRails, HTTPS, JavaScript

WebLOAD
    Company: RadView Software
    Scope: Cloud services; testing tool
    Application rights: Commercial, demo, free use, trial
    Supported technology: Mobile applications, Web

of the enhanced features like test distribution and report customisation. Sahi runs as a proxy server; the proxy settings are configured to point to Sahi's proxy, which then injects JavaScript event handlers into Web pages.

The pros of Sahi
• Sahi can achieve most of the automation with the available functions and variables. It has all the inbuilt APIs required for complex tasks. Sahi also has multi-browser support.
• It does not require additional tools to run and execute the tests. All the tests run from the inbuilt Sahi Controller.

The cons of Sahi
• Compared to Selenium, Sahi is difficult to start with, as it involves a complex installation process. It also has a very confusing interface.
• It does not provide the same visibility that Selenium does, is less popular, and has the smallest and least developed community.

5. WebLOAD: The best LoadRunner alternative
WebLOAD is an enterprise-scale load testing tool which features a comprehensive IDE, a load generation console and a sophisticated analytics dashboard. WebLOAD has built-in flexibility, allowing QA and DevOps teams to create complex load testing scenarios thanks to native JavaScript scripting. WebLOAD supports hundreds of technologies, from Web protocols and enterprise applications to network and server technologies.

Figure 4: WebLOAD

The pros of WebLOAD
• It has native JavaScript scripting.
• UI wizards enhance the script.
• It supports many technologies.
• It offers easy-to-reach customer support.

The cons of WebLOAD
• It does not support Citrix.
• It does not support the SAP GUI.
• It does not support RDP and RTE.

Table 1 compares the merits of all the testing solutions.

By: Meghraj Singh Beniwal
The author has a B.Tech in electronics and communications, and is currently working as an automation engineer in a company located in Virginia (USA). He can be contacted at meghrajsingh01@rediffmail.com or meghrajwithandroid@gmail.com.
features a comprehensive IDE, a load generation console,


80 | SEPTEMBER 2017 | OPEN SOURCE FOR YOU | www.OpenSourceForU.com

Insight Developers

Developing Research Based Web Applications Using Red Hat OpenShift

OpenShift is a Kubernetes based container application platform, designed for container based software deployment and management. It is an application development and hosting platform, which automates management and enables the developer to focus on the app itself.

The increase in the volume, velocity and the variety of data from multiple channels demands high performance computing resources that can process heterogeneous Big Data. It is not always possible to purchase costly computing resources like high performance multi-core processors with supercomputing powers, huge memory devices and related technologies to process, visualise and make predictions on the datasets related to live streaming and real-time supercomputing applications. To cope with and work with such technologies, cloud services are used, whereby computing resources can be hired on demand and billed for as per usage.
There are a number of cloud services providers in the global market with different delivery models including Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). Nowadays, there are some new keywords in the cloud delivery space, like Network-as-a-Service (NaaS), Database-as-a-Service (DBaaS), Testing-as-a-Service (TaaS) and many others. Each of these cloud delivery approaches has different resources, which are used for different applications.

Features of Red Hat OpenShift
Red Hat OpenShift is one of the leading cloud services providers in the PaaS (Platform as a Service) paradigm. It provides multiple platforms to cloud users with the flexibility to develop, deploy and execute applications on the cloud. OpenShift has high performance data centres with enormous processing power to work with different programming languages, which include Java, PHP, Ruby, Python, Node.js, Perl, Jenkins Server, Ghost, Go and many others.
A beginner can use the Free Tier of Red Hat OpenShift for the development, deployment and execution of new cloud apps on the online platform provided by it. Any of the programming languages mentioned can be used for the development of apps with real-time implementation.

Developing PHP research based Web applications
OpenShift provides multiple programming language options to cloud users for the development of apps. With each programming language, OpenShift delivers multiple versions so that the compatibility issues can be avoided at later stages.

Figure 1: The OpenShift portal
Figure 2: Login panel for Red Hat OpenShift
Figure 3: Starter and pro plans for cloud users on OpenShift
Figure 4: Dashboard of OpenShift to create new applications

The cloud applications can be uploaded using mapping with GIT via a local command prompt (Windows CMD or Linux Terminal). OpenShift specifies the commands that should be executed on the local command prompt or in the Linux Shell.

Figure 5: Selecting the programming language on OpenShift
Figure 6: Assigning the URL to the PHP application on OpenShift

Developing a Twitter extraction using PHP on the OpenShift PaaS
To develop a Twitter extraction on the OpenShift PaaS, use the code given below:

<?php
// Requires the third-party TwitterAPIExchange library (TwitterAPIExchange.php)
require_once('TwitterAPIExchange.php');

$settings = array(
    'oauth_access_token' => "XXXXXXXXXXXXXXXXXXXXXXXXX",
    'oauth_access_token_secret' => "XXXXXXXXXXXXXXXXXXXXXXXXX",
    'consumer_key' => "XXXXXXXXXXXXXXXXXXXXXXXXX",
    'consumer_secret' => "XXXXXXXXXXXXXXXXXXXXXXXXX"
);
$twitterurl = "https://api.twitter.com/1.1/statuses/user_timeline.json";
$requestMethod = "GET";
if (isset($_GET['user'])) { $user = $_GET['user']; } else { $user = "gauravkumarin"; }
if (isset($_GET['count'])) { $count = $_GET['count']; } else { $count = 20; }
$field = "?screen_name=$user&count=$count";
$mytwitter = new TwitterAPIExchange($settings);
$str = json_decode($mytwitter->setGetfield($field)
    ->buildOauth($twitterurl, $requestMethod)
    ->performRequest(), $assoc = TRUE);
if ($str["errors"][0]["message"] != "") {
    echo "<h3>Sorry, there was a problem.</h3><p>Twitter returned the following error message:</p><p><em>".$str["errors"][0]["message"]."</em></p>";
    exit();
}
foreach ($str as $current) {
    echo "Time and Date of Tweet: ".$current['created_at']."<br />";
    echo "Tweet: ".$current['text']."<br />";
    echo "Tweeted by: ".$current['user']['name']."<br />";
    echo "Screen name: ".$current['user']['screen_name']."<br />";
    echo "Followers: ".$current['user']['followers_count']."<br />";
    echo "Friends: ".$current['user']['friends_count']."<br />";
    echo "Listed: ".$current['user']['listed_count']."<br /><hr />";
}
?>

Figure 7: Selecting the region the cloud application should be deployed in
Figure 8: Mapping of the command prompt with GIT to upload the code on the live cloud
Figure 9: Copying local files to the OpenShift cloud
Figure 10: Committing the changes as a permanent write operation on the cloud
Figure 11: View the configuration and URL of the cloud app
Figure 12: Executing the Twitter timeline extraction on the OpenShift cloud
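The upload flow captured in Figures 8 to 10 (mapping the prompt to Git, copying the files, committing and pushing) usually boils down to a short command sequence. The outline below is hypothetical: the repository URL and file names are placeholders that depend on your own OpenShift application.

```
# Hypothetical outline of the Git-based upload shown in Figures 8-10
git clone <app-git-url> myapp     # map the local prompt to the app's repository
cp twitter.php myapp/             # copy the local PHP files into the clone
cd myapp
git add .
git commit -m "Add Twitter extraction script"
git push                          # deploy the commit to the OpenShift cloud
```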

Figure 13: Portal of PHP-ML for machine learning using PHP

PHP based machine learning on the OpenShift PaaS
Machine learning is a powerful science that makes use of soft computing and meta-heuristic approaches for effectual predictive mining even from a huge dataset (https://php-ml.readthedocs.io/en/latest/). It has been traditionally used for fraud detection, market analytics, email spam filtering, malware analysis, fingerprint evaluation, face detection and many other applications. In machine learning, the algorithms are implemented in a way in which better classifications and predictions can be made, somewhat similar to the intelligence of the human brain. In traditional implementations, artificial neural networks are used with machine learning to solve complex classification problems.
A number of libraries and frameworks are available under Free and Open Source Software (FOSS) distribution for machine learning. FOSS libraries that can be integrated for research and development include PHP-ML, Apache Mahout, Shogun, Apache Singa, Apache Spark MLlib, TensorFlow, Oryx2, Accord.NET, Amazon Machine Learning, Scikit-Learn, H2O, ConvNetJS, etc.
PHP-ML is a powerful machine learning library used for R&D in the domain of machine learning for different applications. It integrates assorted algorithms in the form of classes and methods for high performance computing with the analytics from real-time datasets. PHP-ML has a rich set of algorithms implemented in PHP scripts, and these can be easily integrated on the real-time cloud of OpenShift by uploading the code and mapping with GIT.

Key features and algorithms in PHP-ML
Association rule learning: Apriori
Classification: KNearestNeighbors, Naive Bayes, SVC, etc
Clustering: k-Means, DBSCAN
Cross validation: Random split, stratified random split
Feature extraction: Token count vectoriser, Tf-idf
Metric: Accuracy, confusion matrix, classification report
Models management: Persistency
Math: Distance, matrix, set, statistic
Neural network: Multi-layer perceptron classifier
Preprocessing: Normalisation, imputation of missing values
Regression: Least squares, SVR
Workflow: Pipeline
Datasets: Array, CSV, files; datasets for research: Iris, Wine and Glass

KNearestNeighbors Classifier in PHP-ML
KNearestNeighbors implements the k-nearest neighbours (k-NN) algorithm for solving the classification problems for a specific set of data items.
In the following example, inputs with their corresponding targets are specified in terms of classes '0' or '1'. If these values are carefully analysed, the corresponding classes can be mapped. In the dataset of [2, 5], [3, 6], [4, 7], the values in each set are in increasing order and there is a difference of +3, which is assigned to the class '0'. Similarly, in [4, 2], [5, 3], [7, 5], the values in each set are decreasing and assigned class '1'. This data can be trained using k-NN with the implementation of the train() function.

use Phpml\Classification\KNearestNeighbors;

$input = [[2, 5], [3, 6], [4, 7], [4, 2], [5, 3], [7, 5]];
$target = ['0', '0', '0', '1', '1', '1'];
$classifier = new KNearestNeighbors();
$classifier->train($input, $target);

For the prediction of new input data, the predict() function is implemented. If, as in the following example, predict([5, 7]) is passed as input, the output will be returned as class '0' because the values in [5, 7] are in increasing order and almost of the same behaviour as class '0'. The exact difference of +3 is not mandatory, because machine learning approaches make use of results with a higher degree of approximation, probability and optimisation.

$classifier->predict([[10, 6], [1, 3]]);
// The function will return ['1', '0'] depending upon the pattern and behaviour of the input

Scope of R&D
As there are many applications for which classification and predictive mining can be used, the free and open source libraries can be integrated on the real-time clouds of Red Hat OpenShift, IBM Bluemix, Amazon, Google Apps Engine and many others, depending upon the algorithms to be used. The aspects and logs associated with performance, complexity, security and integrity can be analysed with the implementation of algorithms on real-time clouds.

By: Dr Gaurav Kumar
The author is the MD of Magma Research and Consultancy, Ambala. He is associated with various academic and research institutes, where he delivers lectures and conducts technical workshops on the latest technologies and tools. You can contact him at kumargaurav.in@gmail.com. Website: www.gauravkumarindia.com.

A Few Tips for Scaling Up Web Performance

Every millisecond counts when it comes to loading Web pages and their responsiveness. It has become critical to optimise the performance of Web applications/pages to retain existing visitors and bring in new customers. If you are eager to explore the world of Web optimisation, then this article is the right place to start.

The World Wide Web has evolved into the primary channel to access both information and services in the digital era. Though network speed has increased many times over, it is still very important to follow best practices when designing and developing Web pages to provide optimal user experiences. Visitors of Web pages/applications expect the page to load as quickly as possible, irrespective of the speed of their network or the capability of their device.
Along with quick loading, another important parameter is to make Web applications more responsive. If a page doesn't meet these two criteria, then users generally move out of it and look for better alternatives. So, both from the technical and economical perspectives, it becomes very important to optimise the responsiveness of Web pages.
Optimisation cannot be thought of just as an add-on after completing the design of the page. If certain optimisation practices are followed during each stage of Web page development, these will certainly result in a better performance. This article explores some of these best practices to optimise the performance of the Web page/application.
Web page optimisation is an active research domain in which there are contributions from so many research groups. An easy-to-use Web resource to start with the optimisation of Web pages is provided by Yahoo (https://developer.yahoo.com/performance/rules.html). There are other informative resources, too, such as BrowserDiet (https://browserdiet.com/en/#html). Various other factors that contribute to Web page optimisation are shown in Figure 1.

Content optimisation
When responding to end user requests, the most time is taken up by the downloading of components such as images, scripts, Flash and style sheets.
• The greater the number of HTTP requests, the more the time required for the page to load and its responsiveness lessens. A critical mechanism to reduce the number of HTTP requests is to reduce the number of components in the Web page. This may be achieved by combining several components. For example, all scripts can be combined, many CSS style sheets can be merged together, etc.
• Minimising the DNS lookup is another important factor in optimisation. The primary role of Domain Name Systems is the mapping of human readable domain names to IP addresses. DNS lookups generally take somewhere between 20 and 120 milliseconds. Minimising the number of unique host names will reduce the number of DNS

resolution attempts.
• Reducing the redirects can increase speed. These redirects are performed with 301 and 302 status codes.
• With respect to Web 2.0 applications, caching of AJAX (Asynchronous JavaScript And XML) requests is an important step.
• The number of DOM (Document Object Model) elements should be kept under control.

Figure 1: Dimensions of Web page optimisations (Content, Images, JavaScript, CSS, Cookie, Server)

Server optimisation
• Using a Content Delivery Network (CDN) can help in optimising the Web page's performance. Geographical proximity to the user has a positive impact on the time required to fetch content.
• A cache-control header can help. If the content is static, then the expiry should be set as Never Expire. For dynamic content, the time up to when the component is valid should be set. This will minimise HTTP requests.
• Compressing the components is another great step in optimisation. This can be achieved with 'Gzip'. Experts estimate that the compression minimises the time required for responses by 70 per cent.
• With respect to AJAX applications, the GET method is preferable. So, along with XMLHttpRequest, as far as possible use the GET method.

Cookies
Cookies are one of the most used mechanisms by Web developers to store tiny pieces of information. With respect to cookies, the following factors should be considered:
• Size of the cookies should be kept minimal.
• Cookies should be set at the appropriate level in the domain hierarchy. This is done to reduce the impact on sub-domains.
• Don't forget to set a proper expiry date for the cookie.

Style sheets
Professionally designed style sheets make Web pages look elegant. The following factors must be considered in handling style sheets:
• It is better to keep the style sheets in the HEAD section of the Web pages. This is done to permit the pages to render incrementally.
• Care should be taken to use expressions in CSS. Mathematical expressions in CSS are evaluated a lot more times than the developer might actually expect. Avoid them as far as possible.
• If you have to include multiple CSS files, merge them all into one file. This reduces the number of HTTP requests. For example, instead of the following code…

<link rel="stylesheet" href="1.css" media="all">
<link rel="stylesheet" href="2.css" media="all">
<link rel="stylesheet" href="3.css" media="all">
<link rel="stylesheet" href="4.css" media="all">
<link rel="stylesheet" href="5.css" media="all">

…use the command given below:

<link rel="stylesheet" href="complete.css" media="all">

• Opt to use <link> over the @import when using CSS in a page.

JavaScript
JavaScript has become the de-facto client-side scripting language. So the way in which JavaScript components are built does have a significant impact on the performance of Web pages.
• If possible, move the script to the bottom of the page. This cannot be done always (for example, if your page's critical contents are rendered through the document.write() function).
• Using external JavaScript and style sheet files will enable better caching. So, it would be better in many scenarios to put CSS and JavaScript through the external mode.
• Minifying and Obfuscation are two effective mechanisms to improve the performance by tweaking the code. One survey indicates that obfuscation can achieve a 25 per cent reduction in size.
• Crowding of events needs to be avoided. Delegating events properly improves the performance of the page.
• The usage of async (asynchronous) must be encouraged, as shown below:

<script async src="example.js"></script>

If you don't use the async keyword then the page has to wait till the example.js is fully downloaded. The async keyword makes page parsing happen even before the downloading of the script is completed. Once the script is downloaded, it is activated. However, when using multiple async, the order of execution becomes a concern.

Optimising images
Images are an integral part of most Web pages. Hence, the way images are handled defines the performance of the application. The following factors should be considered:

• Scaling down of images using HTML tags should be avoided. There is no point in using a bigger image and resizing it using the width and height attributes of the <img> tag.
• When using Data URI, the contents can be given in inline mode. This can be done for smaller sized images. So, instead of the following command…

.icon-test { background-image: url('test.png'); }

…use the code given below:

.icon-test { background-image: url('data:image/png;base64,MVEUAAACnej3aAAAAAXRSTlMAQObYZgAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII%3D'); }

• Images generally contain data that are not required in Web usage. For example, the EXIF metadata can be stripped before uploading to the server.
• There are many tools to help you optimise images, such as TinyPNG, Compressor.io, etc. There are command line based tools also, such as jpegtran, imgopt, etc.

Performance analysis tools
There are many tools available to analyse the performance of Web pages. Some of these tools are illustrated in Figure 2.

Figure 2: Performance analysis tools (Confess, YSlow, Yellow Lab Tools, PageSpeed, Lighthouse, WebPageTest, DareBoost, SpeedCurve)

There are component-specific tools, too. For example, for benchmarking JavaScript, the following tools may be used:
• JSPerf
• Benchmark.js
• JSlitmus
• Matcha
• Memory-stats.js
For PHP, tools such as PHPench and php-bench could be harnessed.
As stated earlier, minifying is one of the optimisation techniques, for which there are many tools. For HTML, the following Minifiers could be tried out:
• HTMLCompressor
• HTMLMinifier
• HTML_press
• Minimize
Some of the tools used for Minifying JavaScript and CSS are listed below:
• Uglifyjs2
• CSSmin.js
• Clean-css
• JShrink
• JSCompress
• YUI Compressor

Benchmarking Web servers
Benchmarking of Web servers is an important mechanism in Web page/application optimisation. Table 1 provides a sample list of tools available for benchmarking Web servers.

Table 1
Tool | Description
Apache JMeter | This load testing tool is popular in the Java community.
Locust | This load testing tool can be used to specify user behaviour with Python. The capability to handle millions of simultaneous user requests can be tested with this tool.
Wrk | This is an HTTP benchmarking tool.
HTTPerf | Different types of HTTP workloads shall be generated and tested. There are various ports available for HTTPerf: HTTPerf.rb (Ruby interface), HTTPerf.py (Python port), HTTPerf.js (JavaScript port), Gohttperf (Go port).

The Web optimisation domain is really huge. This article has just provided a few start-up pointers, using which interested developers can proceed further in understanding the advanced technicalities of the topic.

References
[1] Yahoo Best Practices: https://developer.yahoo.com/performance/rules.html
[2] Browser Diet: https://browserdiet.com/en/#html
[3] https://github.com/davidsonfellipe/awesome-wpo#analyzers
[4] https://github.com/zenorocha/browser-diet/wiki/

By: Dr K.S. Kuppusamy
The author is assistant professor of computer science, School of Engineering and Technology at Pondicherry Central University. He has vast experience in teaching and research (in academia and industry). He can be reached via mail at kskuppu@gmail.com.
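Returning to the server optimisation tips earlier in this article (far-future Expires headers for static content and Gzip compression), the same advice can be expressed as server configuration. The fragment below is a hypothetical Nginx sketch; the file extensions and MIME types are illustrative and should be tuned per site.

```
# Hypothetical Nginx sketch: compress text responses and give static
# assets a far-future expiry, as suggested in the server optimisation tips.
gzip on;
gzip_types text/css application/javascript application/json;

location ~* \.(png|jpg|jpeg|gif|css|js)$ {
    expires max;                       # "Never Expire" for static content
    add_header Cache-Control "public";
}
```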

For U & Me Success Story

Open Source Enables PushEngage to Serve 20 Million Push Notifications Each Day!

Bengaluru-based PushEngage has deployed open source technologies such as Apache, Bootstrap, MongoDB, MySQL, Node.js and Nginx to handle its push notification service. More than half of its clients are based outside the country.

The digital marketing world has been ruled by the electronic direct mailer (eDM) for quite some time. But the eDM is now making way for push notifications. Be it an e-commerce site or an online news publication, Web masters are deploying push notifications to grow their traffic as well as enhance their brand. But what does a push notification provider deploy to serve a bulk of notifications? Well, it is usually an open source solution!
Bengaluru-based PushEngage is among the few early adopters of push notifications. The company had built an in-house product to test the success rate of notifications circulated over-the-air at the time when Google offered the same support to Chrome in April 2015. The initial results were strong enough to commercialise the product.
"We saw robust results even at the early stage, which is when we started considering building PushEngage and went on to create an automated marketing platform for browser-based push notifications, available to all," recalls Ravi Trivedi, founder and CEO, PushEngage.
With a small team of just 10 employees, Trivedi's PushEngage handles marketing automation through notifications for more than 6,000 clients around the world. The total client base sends over 20 million notifications on a daily basis. All of that comes from 40 servers that run in the cloud, and use a mix of proprietary and open source solutions at the back-end.

A 10-member team handles more than 6,000 global clients to serve over 20 million push notifications on a daily basis!

The prime reason behind the mountainous growth of PushEngage is the ease of its deployment on any website. Local search site AskLaila, which receives over a million monthly unique visits, claims that notifications through PushEngage can be deployed in as early as ten minutes. The service has also helped the company retain its users. "With PushEngage notifications, we have been able to reach out to users who are not active on the site and provide them with helpful offers or information," says Nitin Agrawal, director of engineering, Asklaila.com.

Bringing community offerings to the mainstream
Trivedi tells Open Source For You that while his company had initially chosen components that helped to scale better, along with a faster development time,

the team has now planned to look for community-based alternatives once the product and its technology stack mature. "We began with proprietary solutions so that someone could handle the complexity and scale in the beginning. For instance, using AWS (Amazon Web Services) Kinesis, instead of Apache Kafka, provides multiple shards and easy scalability," the founder mentions.
Being a great fan of open source technologies since the time he did his master's in computer science at the Indian Institute of Science in Bengaluru, Trivedi wants to rely entirely on open source. PushEngage already uses the Nginx server to scale connections for sending push notifications to Web browsers, and there are plans to switch to Apache Kafka from the proprietary Kinesis solution, to use a server architecture solely based on open source technologies.

The process of enabling automation
PushEngage uses open APIs provided by Web browsers such as Chrome, Firefox and Safari that adhere to the W3C-Push standards. In addition to leveraging the available APIs, the company uses its internally built libraries that have Node.js as a programming language and MongoDB as a data store.
"The open source technologies behind PushEngage help us become more competitive as these solutions are tested thoroughly for scale, security and bugs. The deployment of open source also reduces our maintenance tasks and offers us the freedom to move across any cloud provider without a vendor lock-in," Trivedi says.

Challenges of mass adoption
Even though open source has made PushEngage capable of delivering push notifications on a massive scale, catering to thousands of clients was not easy for the company in the early stages. "Push notifications send and receive messages in bursts, and hence the peak capacities are very high but of a short duration. Building a scalable solution around these issues requires well thought out architecture and solid engineering," explains Trivedi.

The team at PushEngage that handles millions of notifications each day

The first version of the automated model was a minimally viable product. However, subsequent versions enabled the engineers to incorporate a highly scalable and efficient combination that helps to send and receive messages on a large scale.
Apart from building just a scalable solution, the PushEngage team was required to set up a datastore that could generate queries on several attributes in real-time and push messages at a fast pace. There was also a need for datastores that had high read characteristics.

PushEngage enables automation on the Web

PushEngage moved to a microservice-powered, asynchronous message-based architecture to overcome the primary challenges. "The deployment of appropriate architecture provided us with the desired burst-mode scalability as well as fast enough results for our customers," says Trivedi.

The role of the community
To build a secure and advanced logging system, as well as testing frameworks for automating notifications, PushEngage took help from the open source community, accessing libraries on GitHub and SourceForge. "Online listings of open source developments are quite handy in finding good libraries, so we don't have to re-invent the wheel," says Trivedi.
PushEngage is also committed to giving back to the community and developers worldwide. The company has already provided a push notification API free to developers. This allows any developer to integrate all the rich features of customer segmentation, automation, scheduling and triggering notifications, as well as geotargeting.

Open source adoption cuts costs
According to Trivedi, deploying open source helped his company to reduce the cost of the project by 25 per cent. "Open source adoption enabled PushEngage to reduce not just initial operational costs but also our ongoing cost of product development," he notes.

Future-proof
Moreover, open source is making the automation process through PushEngage future-proof. "We have a microservice-based architecture built for horizontal scaling. We also use queue-based message passing, as well as datastores that are strong in both read and write characteristics and have a scalable database service. All this ensures our scaling needs are well met," says Trivedi.
The company also uses relational and non-relational databases, depending on the need of the architecture. All this makes the push notification technology capable of expanding, with the addition of new components and features in the future.

By: Jagmeet Singh
The author is an assistant editor at EFY.

Regular Expressions in Programming Languages: The Story of C++

In this issue of OSFY, we present the third article on regular expressions in programming languages. The earlier articles covered the use of regular expressions in general, in Python and then in Perl. Read on to discover the intricacies of regular expressions in C++.

Interpreted languages often have weakly typed variables which don't require prior declaration. The additional benefit of weakly typed variables is that they can be used to hold different types of data. For example, the same variable can hold an integer, a character, or a string. Due to these qualities, scripts written in such languages are often very compact. But this is not the case with compiled languages, for which you need a lot of initialisation; and with strongly typed variables, the code is often longer. Even if the regular expression syntax for interpreted and compiled languages is the same, how they are used in real programs is different. So, I believe it is time to discuss regular expressions in compiled languages. In this article, I will discuss the regular expression syntax of C++.

Standards of C++
People often fail to notice the fact that programming languages like C and C++ have different standards. This is quite unlike languages like Perl and Python, for which the use of regular expressions is highly warranted due to the very nature of these programming languages (they are scripting languages widely used for text processing and Web application development).
For a language like C++, heavily used for high-performance computing applications, system programming, embedded system development, etc, many felt that the inclusion of regular expressions was unnecessary. Many of the initial standards of C++ didn't have a natural way for handling regular expressions. I will briefly discuss the different standards of C++ and which among them has support for regular expressions.
C++ was invented by Bjarne Stroustrup in 1979 and was initially known as 'C with Classes' and later renamed C++ in 1983. A book titled 'The C++ Programming Language', first published in 1985 by Stroustrup himself, and its subsequent editions became the de facto standard for C++ until 1998, when the language was standardised by the International Standards Organization (ISO) and the International Electrotechnical Commission (IEC) as ISO/IEC 14882:1998, informally called C++98. The next three standards of C++ are informally called C++03, C++11 and C++14. Hopefully, by the time this article gets published, the latest standard of C++, informally called C++17, would have been released and will have some major changes to C++. After this, the next big changes in C++ will take place with a newer standard, informally known as C++20, which is set to be released in 2020.
The first three standards of C++, namely the de facto standard of C++, C++98 and C++03, do not have any inbuilt mechanism for handling regular expressions. Things changed with C++11 when native support for regular expressions was added with the help of a new header file called <regex>. In fact, the support for regular expressions was one of the most important changes brought in by this standard. C++14 also has provision for native support of regular expressions, and it is highly unlikely that C++17 or any future standards of C++ will quash the support for handling regular expressions. One problem we might face in this regard is that the academic community in India mostly revolves around the C++98 standard, which doesn't support regular expressions. But this is just a personal opinion and I don't have any documented evidence to prove my statement.

The C++11 standard
Unlike C++03 and C++14, for which the changes were minimal, C++11 was a major revision of C++. GCC 5 fully

Insight Developers

supports the features of C++11 and C++14. The latter has become the default standard for GCC 6. There were many changes made to the core language by the standard C++11. The inclusion of a new 64-bit integer data type, called long long int, is a change made in C++ by the C++11 standard. Earlier, C++ only had 32-bit integers called long int. External templates were also added to C++ by this standard.
Many more changes were made to the core of the C++ language by the C++11 standard, but the changes were not limited to the core alone — the C++ standard library was also enhanced by the C++11 standard. Changes were made to the C++ standard library in such a way that multiple threads could be created very easily. New methods for generating pseudo-random numbers were also provided by the C++11 standard. A uniform method for computing the return type of function objects was also included by the C++11 standard. Though a lot of changes have been made to the standard library in C++11, the one that concerns us the most is the inclusion of a new header file called <regex>.

Regular expressions in C++11
In C++, support for regular expressions is achieved by making changes to the standard library of C++. The header file called <regex> is added to the C++ standard library to support regular expressions. The header file <regex> is also available in C++14 and, hence, what we learn for C++11 also applies to C++14. There are some additions to the header file <regex> in C++14, which will be discussed later in this article. There are three functions provided by the header file <regex>. These are regex_match( ), regex_search( ) and regex_replace( ). The function regex_match( ) returns a match only if the regular expression matches the entire string, whereas regex_search( ) searches for a match anywhere within the string. The function regex_replace( ) not only finds a match, but also replaces the matched string with a replacement string. All these functions use a regular expression to denote the string to be matched.
Other than these three functions, the header file <regex> also defines a number of classes like regex, wregex, etc, and a few iterator types like regex_iterator and regex_token_iterator. But to simplify and shorten our discussion, I will only cover the class regex and the three functions, regex_search( ), regex_match( ) and regex_replace( ). I believe it is impossible to discuss all the features of the header file <regex> in a short article like this, but the topics I will cover are a good starting point for any serious C++ programmer to catch up with professional users of regular expressions. Now let us see how regular expressions are used in C++ with the help of a small C++ program.

A simple C++ program using regular expressions
The code below shows a C++ program called regex1.cc. I am sure you are all familiar with the .cc extension of C++ programs. This and all the other C++ programs and text files used in this article can be downloaded from opensourceforu.com/article_source_code/September17C++.zip.

#include <iostream>
#include <regex>

using namespace std;

int main( )
{
    char str[ ] = "Open Source For You";
    regex pat("Source");
    if( regex_search(str,pat) )
    {
        cout << "Match Found\n";
    }
    else
    {
        cout << "No Match Found\n";
    }
    return 0;
}

I’m assuming that the syntax of C is quite well known to readers, who will understand the simple C++ programs we discuss in this article, so no further skills are required. Now let us study and analyse the program. The first two lines #include <iostream> and #include <regex> include the two header files <iostream> and <regex>. The next line of code using namespace std; adds the std namespace to the program so that cout, cin, etc, can be used without the help of the scope resolution operator (::). The line int main( ) declares the only function in this program, the main( ) function.
This is one problem we face when programming languages like C++ or Java are used. You need to write a lot of code to set up the environment and get things moving. This is one reason why you should stick with languages like Perl or Python rather than C++ or Java if your whole aim is to process a text file. But if you are writing system software and want to analyse a system log file, then using regular expressions in C++ is a very good idea.
The next line of code char str[ ] = "Open Source For You"; initialises a character array called str[ ] with a string in which we will search for a pattern. In this particular case, the character array is initialised with the string Open Source For You. If you wish to replace the line of code char str[ ] = "Open Source For You"; with string str = "Open Source For You"; and thereby use an object str of the string class of C++ instead of a character array, the program will still work equally well. Remember that the string class of C++ is just an instance of the template class basic_string. This modified program, called string.cc, is also available for downloading. On execution with the commands g++ string.cc and ./a.out, the program string.cc will also behave


Figure 1: Output of regex_search( ) and regex_match( )

just like the program regex1.cc. Since I am expecting a mixed group of readers with expertise in different programming languages, I tried to make the C++ programs look as much as possible like C programs, in the belief that C is the language of academia and everybody has had a stint with it as a student. I could have even used the printf( ) and scanf( ) functions instead of cout and cin. But a line should be drawn somewhere, and this is where I have stopped making C++ programs look like C.
The next line of code regex pat("Source"); is very important. It is responsible for setting up the regular expression pattern that should be searched for in the string Open Source For You. Here the pattern searched for is the word ‘Source’, which is stored in an object called pat of the class regex.
The next few lines of code contain an if-else statement. The line of code if( regex_search(str,pat) ) uses the function regex_search( ) provided by the header file <regex> to search for the pattern stored in the object pat of the class regex in the string stored inside the character array str[ ]. If a match is found, the line of code cout << "Match Found\n"; is executed and prints the message Match Found. If a match is not found, the else part of the code cout << "No Match Found\n"; is executed and prints the message No Match Found. This program can be compiled with the command g++ regex1.cc, where g++ is the C++ compiler provided by GCC (GNU Compiler Collection). This will produce an executable called a.out. This is then executed with the command ./a.out. On execution, the program prints the message Match Found on the screen because the function regex_search( ) searches the entire string to find a match. Since the word Source is present in the string Open Source For You, a match is found.
Now it is time for us to revisit the difference between the functions regex_search( ) and regex_match( ). To do this, the line of code if( regex_search(str,pat) ) in the program regex1.cc is replaced with the line if( regex_match(str,pat) ). This modified code is available as a program named regex2.cc, which can be compiled with the command g++ regex2.cc to produce an executable called a.out. This is then executed with the command ./a.out. Now the output printed on the screen is No Match Found. Why? As mentioned earlier, this is due to a difference between the functions regex_search( ) and regex_match( ). The function regex_search( ) searches the entire string for a match, while the function regex_match( ) returns true only if the regular expression matches the whole string. In this case, the pattern Source covers only a part of the string Open Source For You, and hence no match is found by the function regex_match( ). Figure 1 shows the output of the programs regex1.cc and regex2.cc.

Pattern replacement in C++
Let’s now study the working of the function regex_replace( ). Consider the program regex3.cc, which uses the function regex_replace( ). This function will search for a match and, if it finds one, will replace the matched string with a replacement string.

#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main( )
{
    char str[ ] = "Open Source Software is Good";
    regex pat("Open Source");
    char rep[ ] = "Free";
    cout << regex_replace(str,pat,rep) << '\n';
    return 0;
}

Except for the line of code cout << regex_replace(str,pat,rep) << '\n'; I don’t think any further explanation is required. This is the line in which the function regex_replace( ) is called with three parameters: the character array str[ ] in which the string to be searched is stored, the regular expression pattern to be matched, stored in the object pat of the class regex, and the replacement pattern, stored in the character array rep[ ]. Execute the program regex3.cc with the commands g++ regex3.cc and ./a.out. You will see the message Free Software is Good on the screen. Nothing surprising there, because the string in the character array str[ ] contains Open Source Software is Good, the pattern to be searched for is Open Source and the replacement pattern is Free. Hence, a match is found and a replacement is done by the function regex_replace( ).
The next question to be answered is whether the function regex_replace( ) behaves like the function regex_search( ) or the function regex_match( ). In order to understand the behaviour of the function regex_replace( ) clearly, let us modify the program regex3.cc slightly to get a program called regex4.cc, as shown in the following code:

#include <iostream>
#include <regex>
#include <string>

using namespace std;


int main( )
{
    char str[ ] = "Open Source Software is Good";
    regex pat("Good");
    char rep[ ] = "Excellent";
    cout << regex_replace(str,pat,rep) << '\n';
    return 0;
}

On execution with the commands g++ regex4.cc and ./a.out, the program regex4.cc prints the message Open Source Software is Excellent. This clearly tells us that the function regex_replace( ) behaves like the function regex_search( ), whereby the whole string is searched for a possible match for the given regular expression, unlike the function regex_match( ), which succeeds only when the regular expression matches the string in its entirety. Figure 2 shows the output of the two programs, regex3.cc and regex4.cc.

Figure 2: The function regex_replace( ) in C++

File processing in C++ with regular expressions
The next question that needs to be answered is: How do we process data inside a text file with a regular expression? To test the working of such a program, a text file called file1.txt is used, which is the same one used in the previous articles in this series on regular expressions.

unix is an operting system
Unix is an Operating System
UNIX IS AN OPERATING SYSTEM
Linux is also an Operating System

Now let us consider the following C++ program, called regex5.cc, that reads and processes the text file file1.txt line by line, to print all the lines that contain the word ‘UNIX’.

#include <iostream>
#include <string>
#include <fstream>
#include <regex>

using namespace std;

int main( )
{
    ifstream file("file1.txt");
    string str;
    regex pat("UNIX");
    while (getline(file, str))
    {
        if( regex_search(str,pat) )
        {
            cout << str << "\n";
        }
    }
    return 0;
}

When the program regex5.cc is executed with the commands g++ regex5.cc and ./a.out, the message printed on the screen is UNIX IS AN OPERATING SYSTEM. So, a case-sensitive pattern match is carried out here. The next question is: How do we carry out a case-insensitive pattern match? For this purpose, we use a regex constant called icase. When the line of code regex pat("UNIX"); is replaced with the line regex pat("UNIX", regex_constants::icase); a case-insensitive match is carried out, and this results in a match for three lines in the text file file1.txt. Figure 3 shows the results of the case-sensitive and case-insensitive matches. There are many other regex constants defined in the namespace regex_constants. Some of them are nosubs, optimize, collate, etc. Use of these regex constants will add more power to your regular expressions. It is a good idea to learn more about them as you gain more information about C++ regular expressions.

Figure 3: Case-sensitive and case-insensitive matches

Regular expressions in C++14 and C++17
It is now time for us to discuss regular expressions in C++14. Luckily, except for a few minor additions, the <regex> header file of C++11 has remained largely unchanged even after the introduction of the later standard C++14. For example, the definitions of the functions regex_match( ) and regex_search( ) are slightly modified in C++14 so that additional processing with these functions is possible. But these changes only add more power to the existing functions and do not affect their basic working. And finally, what are the changes that will be brought in by C++17? Hopefully, nothing major. So far, there have been no rumours about a major revision to the header file <regex>. Therefore, whatever we have learnt from this article can be used for a long time.

Regular expression style of C++
Unlike the previous two articles in this series, in this article I have started by explaining C++ code snippets using regular expressions directly, without providing details regarding the


kind of regular expression syntax being used in C++. Sometimes it is better to attack the problem directly than beat around the bush. But even then, it is absolutely essential to know the regular expression syntax used with C++. Otherwise, this article may just be a set of ‘Do-It-Yourself’ instructions. C++11 regular expressions support multiple regular expression styles like ECMAScript syntax, AWK script syntax, Grep syntax, etc.
ECMAScript is a scripting language and JavaScript is the most well-known implementation of ECMAScript. The syntax used by ECMAScript is not much different from the other regular expression flavours. There are some minor differences, though. For example, the default ECMAScript style in C++ accepts the POSIX-style notation [[:digit:]] to denote decimal digits, in addition to the notation \d used in Perl style regular expressions. I am not going to point out any other such differences, but just keep in mind that C++11 supports multiple regular expression styles and some of the styles differ from the others slightly.

A practical regular expression for C++
Now let us discuss a practical regular expression with which we can find some real data, rather than finding ‘strings starting with abc and ending with xyz’. Our aim is to identify those lines that contain only numbers. Consider the text file file2.txt with the following data to test our regular expressions:

aaaaaaa
AA111
111
2222
33333
22.22
BBBB

Now consider the program regex7.cc with the following code:

#include <iostream>
#include <string>
#include <fstream>
#include <regex>

using namespace std;

int main()
{
    ifstream file("file2.txt");
    string str;
    regex pat("^[[:digit:]]+$");
    while (getline(file, str))
    {
        if( regex_search(str,pat) )
        {
            cout << str << "\n";
        }
    }
    return 0;
}

On execution with the commands g++ regex7.cc and ./a.out, the program prints those lines containing numbers alone. Figure 4 shows the output of the program regex7.cc. Except for the line of code regex pat("^[[:digit:]]+$"); which defines the regular expression pattern to be searched for, there is no difference between the working of the programs regex5.cc and regex7.cc. The caret symbol ^ is used to denote that the match should happen at the very beginning of the line and the dollar symbol $ is used to denote that the match should occur at the end. In the middle there is the regular expression [[:digit:]]+, which implies one or more occurrences of decimal digits, the same as [0-9]+. So, the given regular expression finds a match only if the line of text contains decimal digits and nothing more. Due to this reason, lines of text like AA111, 22.22, a1234z, etc, are not selected.

Figure 4: Regular expressions for numbers

Now it is time for us to wind up the discussion. Like the previous two articles in this series, I have covered the use of regular expressions in a particular programming language as well as some other aspects of the programming language that will affect the usage of regular expressions. In this article, the lengthy discussion about the standards of C++ was absolutely essential, without which you might blindly apply regular expressions on all standards of C++ without considering the subtle differences between them. The topics on regular expressions discussed in this article may not be comprehensive, but they provide an adequate basis for any good C++ programmer to build up from. In the next part of this series we will discuss the use of regular expressions in yet another programming language, maybe one that is much used on the Internet and the World Wide Web.

By: Deepu Benson
The author has nearly 16 years of programming experience. He is a free software enthusiast and his area of interest is theoretical computer science. The author maintains a technical blog at www.computingforbeginners.blogspot.in and can be reached at deepumb@hotmail.com.


For U & Me Insight

Eight Top-of-the-Line Open Source Game Development Tools
The open source game development tools presented in this article give developers
numerous options to explore and choose from, as per their requirements.

Open source game development is generally looked upon as a tech enthusiast’s hobby. Rapid advancements in technology, combined with the various innovations being launched every day, have put tech experts and gamers in a win-win situation. Open source provides interoperability, high quality and good security in game development. Little wonder then that open source platforms are already being used for quite a few successful and complex games.
The following points highlight some of the advantages of open source gaming platforms.
• Better quality and more customised software: With the source code being available on open source gaming platforms, professional developers can customise features and add varied plugins as per their own requirements, which is beneficial for game development companies.
• Say goodbye to licensing: With completely open source platforms, there is no requirement for any sort of licensing. So apart from zero licence costs, other issues like tracking and monitoring are also avoided.
• Lower cost of hardware: Open source gaming platforms on Linux involve lower hardware costs compared to Windows. With the advantages of easy portability and high compression, Linux requires low hardware configurations. So game development costs are lower, and even legacy hardware systems can be used for game development.
Let’s take a look at the top open source game development platforms, which give developers numerous options to explore and choose from, as per their requirements.

GDevelop
GDevelop is an open source, cross-platform game creator platform designed for novices. There is no requirement for any sort of programming skills. GDevelop is a great platform to develop all sorts of 2D and 3D games. It consists of several editors in which games can be created. The list is as follows.
• Project manager: This displays the open games in the editor, allowing developers to set and organise the scenes. Users can select the scene to be edited and modify parameters like the title, background colour, text, etc. It also gives access to the image bank editor of the games, and allows the user to select the extensions to be utilised by the game.
• Image bank editor: This allows the user to manage all sorts of images via objects. It supports transparency integrated in the image.
• Scene editor: This allows users to organise the scene at the start, positioning the objects in the scene.
• Object editor: This allows the creation of objects to be displayed on the stage, like text and 3D box objects. It also has the ‘Particle Transmitter’ object, which allows developers to use particles in the game with ease.
• Layer editor: This allows users to manage the interface that remains motionless, while allowing the camera of the rest of the game to move or zoom.
• Event editor: This allows users to animate the scene, depending on the conditions and actions that will be performed on the objects of the scene.
The events are compiled by GDevelop into machine code — the mechanism is simple and similar to writing C++ code.


Features
• It comprises various objects which can be used readily — text objects, 3D boxes, customised shapes via Shape Painter, the particle engine, dynamic lights and shadows, custom collision masks, etc.
• Adds behaviours to objects through the physics engine, pathfinding, top-down movement, the platformer engine, draggable objects and the automation of tasks.
• Offers advanced design features and interfaces through the scene editor, multiple layers, the debugger and performance profilers.
• Other features include HTML 5 support, sound and music effects, and integration with the joystick and keyboard.

Latest version: 4.0.94
Official website: http://compilgames.net/

Figure 1: GDevelop user interface

Godot Engine
The Godot Engine is a highly powerful cross-platform game development engine that supports 2D and 3D game development from a unified interface. The platform supports Windows, OS X, Linux and BSD for developing games for the PC, console and even mobile-cum-Web platforms. It is integrated with a wide variety of tools, providing developers with tons of options and avoiding the need for even a single third party tool. The engine is built on the concept of a tree of nested scenes.
The games created with Godot are written either in C++ or in a customised scripting language called GDScript, which is a high level, dynamically typed language with many similarities to Python. GDScript is greatly customised and optimised for the Godot engine. Godot has a powerful text editor, which provides developers various features like auto indentation, syntax highlighting and even code completion. It also has a debugger that provides breakpoints and program stepping.
Godot makes use of the OpenGL ES 2.0 graphics engine, which has many features like transparency, normal mapping, dynamic shadows using shadow maps, and various post-processing effects like FXAA, bloom, DOF, HDR, gamma correction and fog.

Features
• Nice and clean interface: Has a visual editor, a dynamic scene system, a user friendly content creation interface, a visual shader editing tool and live editing on mobile devices.
• Efficient in 2D game design because of a dedicated 2D engine, a custom 2D physics engine, and a flexible kinematic controller.
• High-end 3D game development by importing animated models from 3DS Max, Maya and Blender; has skeleton deforms and blend shapes, lighting and shadow mapping, HDR rendering, anti-aliasing, etc.
• Flexible animation engine for games, enabled by the visual animation editor, frame-based or cut-out animation, custom transition curves and tweens, and animation tree support.
• Other features include a Python-like scripting language, a powerful debugger and an easy C++ API.

Latest version: 2.1.3
Official website: https://godotengine.org

Figure 2: Godot Engine interface

Cocos2d-x
Cocos2d-x is an open source game development platform available under the MIT License. It allows developers to build games, apps and various interactive programs. It enables developers to make use of C++, Lua and JavaScript for cross-platform deployment on iOS, Android, Windows Phone, OS X, Windows and Linux devices.
The Cocos2d-x renderer engine is highly optimised for 2D graphics with OpenGL support. It is packed with tons of features like skeletal animation, sprite sheet animation, coordinate systems, visual effects, textures, tile maps, multi-resolution devices, etc.
It is maintained by developers at Chukong Technologies, which is also developing Cocostudio, a WYSIWYG editor.

Features
• Animation: It provides numerous animation options that work on sprites using a set of actions and timers. It supports animation of particle effects, image filtering effects through shaders, etc.
• Easy GUI: It includes an easy GUI interface with text




Profit from IoT India’s Electronics Showcasing Technology

Manufacturing Show that Powers the Light
India’s #1 IoT Show. At Electronics For You, Is there a show in India that showcases the Our belief is that the LED Bulb itself is a
we strongly believe that India has the potential latest in electronics manufacturing such as culmination of advancement in technology. And,
to become a super power in the IoT space in Rapid Prototyping, Rapid Production and such a product category and its associated
the upcoming years. All that's needed are Table Top Manufacturing? industry cannot grow without focusing on latest
platforms for different stake-holders of the technologies. But, while there are some good
eco-system to come together. Yes, there is now - EFY Expo 2018. EFY B2B shows for LED Lighting in India, none has
Expo 2018, with its focus on the said areas a focus on “the technology that powers the light”.
We’ve been building one such platform: and being co-located at India Electronics Thus, the need for LEDasia.in.
IoTshow.in--an event for the creators, the Week, has emerged as India's leading show
enablers and customers of IoT. In Feb 2018, the on latest manufacturing technologies and Who Should Attend?
3rd edition of IoTshow.in will bring together a electronics components • Tech Decision Makers: CEOs, CTOs, R&D,
B2B expo, technical and business conferences, Design Engineers, etc—who are developing
Start-up Zone, demo sessions of innovative Who Should Attend? latest LED-based products
products, and more. • Manufacturers: CEOs, MDs etc—who are • Purchase Decision Makers: CEOs,
manufacturing electronics & technology Purchase Managers, Production Managers,
Who Should Attend? products etc from manufacturers that use LEDs
• Creators of IoT Solutions: OEMs, Design • Purchase Decision Makers: CEOs, • Channel Partners: Importers, distributors,
Houses, CEOs, CTOs, Design Engineers, Purchase Managers, Production resellers of LEDs & LED Lighting products
Software Developers, IT Managers, etc Managers, etc involved in electronics • Investors: Start-ups, Entrepreneurs,
• Enablers of IoT Solutions: System manufacturing Investment Consultants interested in this
Integrators, solution providers, distributors, • Technology decision makers: Design sector
resellers, etc engineers, R&D heads etc. – involved in • Enablers: System Integrators, Lighting
• Business Customers: Enterprise, SMEs, electronics manufacturing consultants, etc interested in Smarter
Government, Defence, Academia, etc • Channel Partners: Importers, distributors, Lighting Solutions (thanks to co-located
resellers of electronics components, tools IOTshow.in)
Why Should You Attend? & equipment
• Get updates on latest technology trends • Investors: Start-ups, Entrepreneurs, Why Should You Attend?
defining the IoT landscape Investment Consultants interested in • Get updates on latest technology trends
• Get a first-hand glimpse of products & electronics manufacturing defining LED & LED Lighting sector
solutions that enable development of better • Get a first-hand glimpse of latest
IoT solutions Why Should You Attend? components, equipment and tools to help
• Connect with leading IoT brands seeking • Get updates on latest technology trends in produce better lighting products
channel partners and system integrators rapid prototyping & production and table • Get connected with new suppliers from
• Connect with leading top manufacturing across India to improve your supply-chain
suppliers/service-providers of electronics, IT • Get connected with new suppliers from • Connect with OEMs, principles, lighting
and telecom services who can help you across India to improve your supply-chain brands seeking channel partners and
produce IoT solutions better and faster • Connect with OEMs, principles, brands system integrators
• Network with the who’s who of the IoT world seeking channel partners and distributors • Connect with foreign suppliers and principles
and build connects with industry peers • Connect with foreign suppliers and to represent them in India
• Find out IoT solutions that can help you principles to represent them in India • Explore new business ideas and investment
reduce costs or increase revenues • Explore new business ideas and opportunities ideas in LED & Lighting sector
• Get updates on latest business trends investment opportunities in this sector • Get a first-hand view of “IOT + Lighting”
shaping demand and supply for IoT solutions that make lighting smarter


Reasons Why You Should NOT Attend IEW 2018

We spoke to few members of the tech of the speakers are not vendors?
community to understand why they did Where most talks will not be by people
not attend past editions of India trying to sell their products? How
Electronics Week (IEW). Our aim was boring! I can't imagine why anyone
India’s Mega Tech Conference to identify the most common reasons would want to attend such an event. I
and share them with you, so that if you love sales talks, and I am sure
EFY Conferences (EFYCON) started as a tiny too had similar reasons, you may everybody else does too. So IEW is a
900-footfall community conference in 2012, going by the choose not to attend IEW 2018. This is big 'no-no' for me.
name of Electronics Rocks. Within four years it grew into "India's largest, most exciting engineering conference," and was awarded as the most important IoT global event in 2016 by Postscapes.

In 2017, 11 independent conferences on IOT, Artificial Intelligence, Cyber Security, Data Analytics, Cloud Technologies, LED Lighting, SMT Manufacturing, PCB Manufacturing, etc were held together in 3 days, as part of EFY Conferences.

Key Themes of Conferences & Workshops in 2018
• Profit from IoT: How can suppliers make money and customers save money using IoT
• IT & Telecom Tech Trends Enabling IoT Development
• Electronics Tech Trends Enabling IoT Development
• Artificial Intelligence and IoT
• Cyber-security and IoT
• Latest Trends in Test & Measurement Equipment
• What's New in Desktop Manufacturing
• The Latest in Rapid Prototyping & Production

Who Should Attend?
• Investors & Entrepreneurs in Tech
• Technical Decision Makers & Influencers
• R&D Professionals
• Design Engineers
• IOT Solution Developers
• System Integrators
• IT Managers
• Academicians
• Defence Personnel

#1. Technologies like IOT, AI, Embedded Systems Have No Future
Frankly, I have NO interest in new technologies like the Internet of Things (IOT), Artificial Intelligence, etc. I don't think these will ever take off, or become critical enough to affect my organization or my career.

#2. I See No Point in Attending Tech Events
What's the point in investing energy and resources to attend tech events? I would rather wait and watch—let others take the lead. Why take the initiative to understand new technologies, their impact and business models—beats me.

#3. My Boss Does Not Like Me
My boss is not fond of me and doesn't really want me to grow professionally. And when she came to know that IEW 2018 is an event that can help me advance my career, she cancelled my application to attend it. Thankfully, she is attending the event! Look forward to a holiday at work.

#4. I Hate Innovators
Oh my! Indian start-ups are planning to give LIVE demonstrations at IEW 2018. I find that hard to believe. Worse, if my boss sees these, he will expect me to create innovative stuff too. I better find a way to keep him from attending.

#5. I Am Way Too BUSY
I am just too busy with my ongoing projects. They just don't seem to be getting over. Once I catch up, I'll invest some time in enhancing my knowledge and skills, and figure out how to meet my deadlines.

#6. I Like Vendor Events
Can you imagine an event where most … what they shared…

#7. I Don't Need Hands-on Knowledge
I don't see any value in tech workshops being organised at IEW. Why would anyone want hands-on knowledge? Isn't browsing the Net and watching YouTube videos a better alternative?

#8. I Love My Office
Why do people leave the comfort of their office, and weave through that terrible traffic to attend a technical event? They must be crazy. What's the big deal in listening to experts or networking with peers? I'd rather enjoy the coffee and the cool comfort of my office, and learn everything by browsing the Net!

#9. I Prefer Foreign Events
While IEW's IOTshow.in was voted as the World's #1 IOT event on Postscapes.com, I don't see much value in attending an event in India—and that, too, put together by an Indian organizer. Naah! I would rather attend one in Europe or an event organized by foreigners.

Hope we've managed to convince you NOT to attend IEW 2018. Frankly, we too, have NO clue why 10,000-plus techies attended IEW in March 2017. Perhaps there's something about the event that we've not figured out yet. But, if we haven't been able to dissuade you from attending IEW 2018, then you may register at http://register.efy.in.

Conference Pass Pricing & Packages
• 1 Day Pass: INR 1999
• PRO Pass: INR 7999

Special Privileges
• Defence & Defence Electronics Personnel
• Group & Bulk Bookings


The Themes
• Profit from IOT
• Rapid Prototyping & Production
• Table Top Manufacturing
• LEDs & LED Lighting

The Co-located Shows

Why Exhibit at IEW 2018?
• More Technology Decision Makers and Influencers than Any Other Event
• It's a Technology-centric Show and Not Just a B2B Show
• 3,000+ Visitors are Conference Delegates Alone
• Besides Purchase Orders—You Can Bag 'Design Ins' and 'Design Wins' too
• Co-located Events Offer Cross Pollination of Business & Networking
• India's Only Test & Measurement Show is Also a Part of IEW
• 360 Degree Promotions via Event, Publications and Online!
• Being Held at a Venue (KTPO) That's Closer to Tech Firms
• Your Brand and Solutions Reach Out to a 500,000+ Audience, Not Just 15 to 20,000
• IEW Connects You with Customers Before the Event, At the Event, and even After the Event
• Bag year-end orders: meet prospects in early Feb, get orders before FY ends
• World's #1 IOT Show is a Part of IEW and IOT is Driving Growth
• The only show in Bengaluru in FY 2017-18
• It's an Electronics For You Group Property—And We Value Your Trust More Than Money
• Special Packages for 'Make in India', 'Design in India', 'Start-up India' and 'LED Lighting' Exhibitors

Why Should You Risk Being an Early Bird?
1. The best locations sell out first
2. The earlier you book—better are the rates, and more are the deliverables
3. We might just run out of space this year!

To get more details on how exhibiting at IEW 2018 can help you achieve your sales & marketing goals, contact us at +91-9811155335 OR write to us at growmybiz@efy.in

EFY Enterprises Pvt Ltd | D-87/1, Okhla Industrial Area, Phase-1, New Delhi – 110020
Insight For U & Me

boxes, labels, menus, buttons and common elements.
• Physics engine: It supports 2D physics engines like Box2D and Chipmunk.
• Audio: It supports sound effects and background music.
• Network support: HTTP with SSL, WebSocket API, XMLHttpRequest API, etc.

Latest version: 3.15.1
Official website: http://www.cocos2d-x.org

Figure 3: Cocos2d-x user interface

Delta Engine
Delta Engine is an open source 2D and 3D app and game development engine maintained by the Delta Engine company. Applications and games can be developed in an easy manner through Visual Studio .NET or the Delta Engine editor. Delta Engine supports various languages and frameworks like C# OpenGL, C# OpenTK, C# GLFW, C# XNA, C# SharpDX, C# SlimDX, C# MonoGame, Linux OpenGL, Mac OpenGL and WebGL. It supports various platforms like Windows, OS X, Linux, Android and Android TV.

Features
• It supports 3D features like 3D model importing, a particle effect editor, etc.
• Content like images, sounds, music and 3D models is saved directly using the Delta Engine.
• Supports physical simulation; most code is interchangeable for both 2D and 3D simulation.
• Supports integration of external libraries and frameworks like the 2D sprite animation library, Spine.
• The App Builder tool integrated in the editor supports building, deployment and launching of apps on a mobile device.

Latest version: 0.9.11
Official website: https://deltaengine.net

Starling
Starling is an open source 2D game development framework that supports both mobile and desktop platforms. It is a pure ActionScript 3 library that is very similar to the traditional Flash architecture. It recreates the Flash display list architecture on top of the accelerated graphics hardware. It is a very compact framework but comprises various packages and classes. The following are the sets of tools that are integrated with Starling for application development:
• Display programming: Every object is a display object.
• Images and textures
• Dynamic text
• Event handling
• Animation
• Asset management
• Special effects
• Utilities

Features
• It is based on Stage3D and supports multiple platforms like Android, iOS, Web browsers, OS X, etc.
• It has low configuration requirements in terms of CPU, memory and GPU.
• It has lower battery consumption.
• Has effective object organisation via hierarchical trees, i.e., a parent-child relationship.
• Highly powerful and efficient event system using ActionScript.
• Supports texture atlases, filters, stencil masks, blend modes, tweens, multi-touch, bitmap fonts and 3D effects.

Latest version: 2.2
Official website: https://gamua.com/starling

Panda 3D
Panda 3D is an open source framework for rendering and developing 3D games using C++ and Python programs. The entire gaming engine is written in C++ and makes use of automatic wrapper generators to expose the complete functionality of the engine in the Python interface. It supports OpenGL and DirectX. Panda 3D includes various tools like scene graph browsing, performance monitoring, animation optimisers and many more.

Features
• Hassle-free installation, and supports Windows, OS X and Linux. No need for any sort of compilation.
• Full Python integration, and highly optimised via C++.
• Comes with various OpenGL and DirectX features like GLSL, a powerful interface between shaders and the engine, and supports render-to-texture and multiple render targets.
• Other features include shader generation, a 3D pipeline, and support for the OpenAL Audio Engine, FMOD Audio Engine and Miles Audio Engine.
• Has support for the Bullet physics engine, ODE physics engine and PhysX physics engine.

www.OpenSourceForU.com | OPEN SOURCE FOR YOU | SEPTEMBER 2017 | 103


Latest version: 1.9.4
Official website: www.panda3d.org

Superpowers
Superpowers is a powerful, open source and free development platform enabling developers to create fully customised 2D and 3D games that are highly flexible. It is a cross-platform development tool and supports Windows, Linux and OS X operating systems. It makes use of TypeScript to write gaming logic and to highlight the syntax, which simplifies development.

Figure 4: Superpowers' GUI interface

Features
• An easy and well laid out GUI interface helps even newbies to quickly learn game development.
• Has a powerful TypeScript editor fully packed with features like syntax highlighting, auto completion of code and live error reporting.
• Comes with hundreds of inbuilt licence-free sprites, 3D models, sound effects, fonts and music.
• A built-in library of games and examples acts as a strong platform for beginners.

Latest version: 4.0
Official website: http://superpowers-html5.com

MonoGame
MonoGame is powerful free software that lets games built by Windows and Windows Phone developers run on other systems. It is a cross-platform game development tool and supports Linux, OS X, Android, PlayStation Mobile, Nintendo Switch, etc. It is basically an open source implementation of the Microsoft XNA 4 framework. The basic objective of MonoGame is to 'write once, play everywhere'.

Figure 5: MonoGame user interface

The following are the technologies that power the MonoGame API's cross-platform capabilities:
• OpenTK: A low level C# library that combines OpenGL, OpenCL and OpenAL for 3D graphics.
• SharpDX: An open source implementation of the DirectX API for .NET, which supports high performance 2D and 3D games and real-time sound.
• Lidgren.Network: This is a network library for the .NET framework, which makes use of the UDP socket to provide APIs for connecting to the client and server as well as sending and reading messages.

Features
• Via C# and .NET languages, MonoGame enables developers to write reliable and high-performance game code.
• Open source code enables changes and even porting to new platforms.
• Bundled with more than 1000 games, MonoGame can be used for high-end games development.

Latest version: 3.6
Official website: www.monogame.net

By: Prof. Anand Nayyar
The author is assistant professor in the department of computer applications and IT at KCL Institute of Management and Technology, Jalandhar, Punjab. He loves to work and research on open source technologies, cloud computing, sensor networks, hacking and network security. He can be reached at anand_nayyar@yahoo.co.in. You can watch his YouTube videos at youtube.com/anandnayyar.


Insight OpenGurus

Communication Protocols for the Internet of Things: A Few Choices

With more and more devices being connected to the Internet and millions of devices interacting with each other and the server, the need for communication protocols is critical. MQTT, CoAP and Bluetooth are some of the communication protocols covered in this article.

In 2008, the number of connected devices in operation exceeded the number of humans connected to the Internet. It is estimated that by 2025 there will be more than 50 billion connected devices, generating a revenue of US$ 11 trillion. Though the term, the Internet of Things or IoT, was first coined back in 1999, the buzzword has started becoming a feasible reality in recent years. As we can see, the consumer electronics market is already flooded with smart and connected LED bulbs, home automation solutions and intelligent vehicles. Meanwhile, the Do-It-Yourself (DIY) hobbyist sector is seeing ultra-low power and high performance SoCs with built-in Wi-Fi, LoRa or Bluetooth communication features.

The prices of radio chips are now as low as US$ 5 and there are tons of new products, unimaginable before but now a reality, as was seen at this year's Consumer Electronics Show (CES), Las Vegas and Mobile World Congress (MWC), Barcelona. Consider products like a smart toothbrush that learns your brushing habits; connected drones that can follow and record you while you are in the middle of an adventurous moment like river rafting; or a simple over-the-air (OTA) software update that can turn your car into a smart self-driving vehicle. With IoT and artificial intelligence backing it up, the possibilities are endless.

Getting started with IoT
IoT provides a great development stack, so everyone can contribute to its development and growth. It can be broadly divided into three constituents.
1. The hardware: This makes up the 'things' part of IoT and usually has a small microcontroller with sensors/actuators and firmware running on it, which is responsible for how it functions. A good example of this would be a smart fitness tracker with, say, an ARM Cortex M4 microcontroller and an Inertial Measurement Unit (accelerometers or gyroscopes) sending data to your smartphone via Bluetooth.
2. The software: Firmware running on the device, mobile applications, cloud applications, databases, device management/implementation, the frontend to display data or an algorithm which gives intelligence to your IoT project—all come under the software portion of the IoT stack.

Figure 1: IoT – billions of devices connected with each other



3. The cloud: The ability to stream and store data over the Internet, visualise it in a Web browser and control the device remotely from any part of the world is all because of the cloud, which virtually makes the data available anytime, anywhere.

There are innumerable ways to get into the IoT space, right away. In this article, I'll talk about communication protocols for the IoT space, which can be used for communication between machines or between a machine and a server. Due to constraints in processing capabilities and the low power requirements of IoT devices (which are generally meant to be deployed in environments with constant battery power) with limited bandwidth capabilities, a need was felt for dedicated standards and protocols especially designed for IoT. Since those who manufacture IoT devices and those who create the IoT platforms are different, this required industry standards and protocols that were not high on power consumption, bandwidth usage, or processing power and could be adopted easily by all IoT players—hardware manufacturers, software developers or cloud solutions/service providers.

When developing and deploying an IoT project, it's important to answer questions like:
• How do my devices talk to each other or to me?
• Do I want the stability of a wired network or the freedom of a wireless one?
• What are my constraints? Data rates, battery power or poor networks?
• What communication options do I have?

Enter the world of IoT communications
This section covers a list of IoT communication protocols.

1. MQTT (Message Queue Telemetry Transport)
MQTT is my preferred IoT protocol, which I use for almost all my IoT automation projects. It was created about 15 years back for monitoring remote sensor nodes, and is designed to conserve both power and memory. It is based on the 'Publish Subscribe' communication model, where a broker is responsible for relaying messages to MQTT clients. This allows multiple clients to post messages and receive updates on different topics from a central server known as the MQTT broker. This is similar to subscribing to a YouTube channel, where you get notified whenever a new video is posted.

Figure 2: The MQTT model

Using MQTT, a connected device can subscribe to any number of topics hosted by an MQTT broker. Whenever a different device publishes data on any of those topics, the server sends out a message to all connected subscribers of those topics, alerting them to the new available data. It is overall a lightweight protocol that runs on embedded devices and mobile platforms, while connecting to highly scalable enterprise and Web servers over wired or wireless networks. It is useful for connections with remote embedded systems, where a small code footprint is required and/or network bandwidth is at a premium or connectivity is unpredictable. It is also ideal for mobile applications that require a small size, low power usage, minimised data packets, and efficient distribution of information to one or many receivers. It is an ISO standard (ISO/IEC PRF 20922) protocol. The good performance and reliability of MQTT is demonstrated by Facebook Messenger, Amazon IoT (AWS-IoT), IBM Node-RED, etc—organisations that are using it to serve millions of people daily.

MQTT-SN, or MQTT for sensor networks, allows you to use MQTT over a wireless sensor network, which is not generally a TCP/IP based model. The MQTT broker can be run locally or deployed on the cloud. It is further enhanced with features like user name/password authentication, encryption using Transport Layer Security (TLS) and Quality of Service (QoS).

MQTT implementation: MQTT can be implemented with a broker and MQTT clients. The good news is that both can be found open sourced in the Mosquitto package, which is an open source MQTT broker available as a package for Linux, OS X or Windows machines. It runs an MQTT broker daemon, which listens for MQTT connections on TCP port 1883 (by default). To install it on Debian based machines (like Ubuntu 16.04, Raspbian Jessie, etc), simply run the following command from the terminal:

# sudo apt-get install mosquitto mosquitto-clients

This will install and run the MQTT broker on your Debian based Linux machine and provide the client utilities mosquitto_pub and mosquitto_sub, which can be used to test and use it.

On the device/client side, Eclipse IoT provides a great open sourced implementation of MQTT and MQTT-SN version 3.1.1 in the form of a library known as Eclipse Paho, which is available for almost all modern programming languages like C, C++, Java, Python, Arduino, etc, or can be used over WebSockets. For more details or the API reference, visit http://www.eclipse.org/paho/.

The table in Figure 3 compares HTTP and MQTT, clearly showing why the latter is a winner in the IoT space.

Figure 3: MQTT vs HTTP

Characteristics                        3G (HTTP)   3G (MQTT)   WiFi (HTTP)   WiFi (MQTT)
Receive messages/hour                  1,708       160,278     3,628         263,314
Percent battery/hour (receive)         18.43%      16.13%      3.45%         4.23%
Percent battery/message (receive)      0.01709     0.00010     0.00095       0.00002
Messages received (note the losses)    240/1024    1024/1024   524/1024      1024/1024
Send messages/hour                     1,926       21,685      5,229         23,184
Percent battery/hour (send)            18.79%      17.80%      5.44%         3.66%
Percent battery/message (send)         0.00975     0.00082     0.00104       0.00016

2. CoAP (Constrained Application Protocol)
Constrained Application Protocol (CoAP) is an Internet application protocol for constrained devices (defined in RFC 7228). It enables constrained devices to communicate with the wider Internet using similar protocols. CoAP is designed for use between devices on the same constrained network, between devices and general nodes on the Internet, and between devices on different constrained networks joined by the Internet. It is an application layer protocol designed for network constrained IoT devices like wireless sensor network nodes, and is often termed the lightweight version of HTTP with support for REST APIs. It can run on most devices that support UDP or a UDP analogue. It implements the REST architectural style, which can be transparently mapped to HTTP. However, CoAP also provides features that go beyond HTTP such as native push notifications and group communication. While a usual HTTP header can be around 100 bytes, a CoAP standard header can be as light as just 4 bytes. Unlike MQTT, CoAP doesn't require a broker server to function.

Figure 4: The CoAP model

On the implementation side, the Eclipse Californium project provides a Java implementation of the CoAP protocol, including support for the DTLS security layer. There's also a MicroCoAP project which provides a CoAP implementation for Arduino. Check out https://…

3. Bluetooth and Bluetooth Low Energy
While MQTT and CoAP are infrastructure-independent, which means that it doesn't matter whether you're connected to a wired or a wireless network, Bluetooth provides only wireless communication over radio frequency (the 2.4GHz spectrum in the ISM band), using an industry standard that was initially used to share files between mobile phones and is now powerful enough to play music (Advanced Audio Distribution Profile/A2DP), stream data, or build your next IoT device.

Bluetooth, generally, is divided into three categories.
Bluetooth Classic: This is meant for high data rate applications like streaming audio wirelessly.
Bluetooth Smart or Low Energy/BLE: This is meant for low powered battery-operated devices that stream small packets of data.
Bluetooth SmartReady: These are essentially the 'hub' devices such as computers, smartphones, etc. They support both the 'classic' and 'smart' devices.

Figure 5: Bluetooth flavours (Classic devices stream rich content like video and audio; Smart Ready hub devices connect both kinds and form the centre of your wireless world; Smart sensor devices send small bits of data using very little energy)

Bluetooth is a sophisticated ad hoc networking protocol, and is now designed from the ground up for IoT. It provides a stable connection and communication channel, which is extremely low profile and low powered. An obvious example is fitness trackers, which, even though powered on throughout the day, can last for months on a single charge or run on a coin cell battery, all thanks to BLE (Bluetooth Low Energy). Bluetooth Classic has fixed profiles like UART over Bluetooth and A2DP for audio streaming. On the other hand, Bluetooth Low Energy provides GATT or the Generic Attribute Profile, which allows users to define their own profile using Bluetooth, like in the case of a heart rate monitor. BLE is extremely flexible and useful in the IoT space. Bluetooth 5.0 is already out and is maturing, offering more range, more data rates and double the transmission speeds.

Which protocol should I use for my next IoT project?
There are many different protocols and industry standards that are specially designed for IoT or can be used for it, such as the few mentioned above and others like Wi-Fi, WebSockets, Zigbee, LoRa, Simple RF, XMPP, RFID, NFC, etc. Yet, one's choice should be based on the project requirements and the constraints of the application you are thinking of developing. MQTT, for example, is extremely powerful when you have an actuator network which needs to respond to a common sensor. The PUB/SUB model is ideal in that case. In the case of CoAP, you can create your own constrained network environment and relay information to the Internet via a proxy. If your project does not require the Internet or long range communication, like a fitness tracker, then Bluetooth Low Energy could be the best choice. The possibilities in the IoT space are endless.

By: Ayan Pahwa
The author is an IoT, AI and automotive enthusiast working as an embedded software engineer at the Mentor Graphics (A Siemens Business) facility at Noida. He can be reached at Ayan_Pahwa@mentor.com. Website: http://iayanpahwa.github.io.
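To make the publish/subscribe model described in the MQTT section concrete, here is a small, illustrative Python sketch of how a broker decides which subscribers receive a published message, using the `+` (single-level) and `#` (multi-level) topic wildcards defined by the MQTT specification. The function name is hypothetical and a real broker such as Mosquitto does far more; this only sketches the matching rule:

```python
def topic_matches(filter_str, topic):
    """Illustrative MQTT topic matching: '+' matches one level, '#' matches the rest."""
    f_parts = filter_str.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":                      # multi-level wildcard: matches everything from here on
            return True
        if i >= len(t_parts):             # filter has more levels than the topic
            return False
        if f != "+" and f != t_parts[i]:  # a literal level must match exactly
            return False
    return len(f_parts) == len(t_parts)   # no wildcard left: level counts must agree

# A device subscribed to 'home/+/temperature' receives messages published on
# 'home/kitchen/temperature' but not on 'home/kitchen/humidity'.
print(topic_matches("home/+/temperature", "home/kitchen/temperature"))  # True
print(topic_matches("sensors/#", "sensors/floor1/node7/battery"))       # True
```

In a real deployment the broker applies exactly this kind of rule to every retained subscription whenever a client publishes, which is what makes one sensor able to fan out to many actuators.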

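The claim that a standard CoAP header fits in just 4 bytes can be verified directly from the fixed-header layout in RFC 7252 (2-bit version, 2-bit type, 4-bit token length, 8-bit code, 16-bit message ID). The helper below is a hypothetical sketch for illustration, not a full CoAP implementation:

```python
import struct

def coap_header(version, msg_type, token_len, code, message_id):
    """Pack the 4-byte CoAP fixed header defined in RFC 7252."""
    first = (version << 6) | (msg_type << 4) | token_len  # Ver|T|TKL in one byte
    return struct.pack("!BBH", first, code, message_id)   # + code byte + 16-bit message ID

# A confirmable (type 0) GET request (code 0.01), protocol version 1, no token:
header = coap_header(1, 0, 0, 0x01, 0x1234)
print(len(header))    # 4
print(header.hex())   # 40011234
```

Compare this with a typical HTTP request line and headers, which easily run to a hundred bytes or more: the difference is exactly why CoAP suits constrained, lossy networks.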


Finding out a file's size before initiating an expensive download
With super-fast connectivity, we often download gigabytes before realising that a file is bigger than our budget. To find out the size of the download, install curl and gawk. Then fill a local file sizer.sh with the following commands:

#!/usr/bin/env bash
echo -n "Enter full download URL (Ctrl-Shift-V to paste): "
read FULLURL; RET=0
while [[ RET -eq 0 ]]; do
    LEN=$(curl -Ls --head $FULLURL | grep -i "Content-Length" | tail -1)
    TYPE=$(curl -Ls --head $FULLURL | grep -i "Content-type" | tail -1)
    grep "text" <(echo $TYPE) >/dev/null 2>&1
    RET=$?
    if [[ RET -ne 0 ]]
    then
        tr -dc '[[:print:]]' <<<$TYPE; echo
        echo $(tr -dc '[[:print:]]' <<<$LEN)" bytes"
        exit
    fi
    DOMAIN=$(echo $FULLURL | cut -d/ -f1-3)
    OLDURL=$FULLURL
    FULLURL=$DOMAIN$(curl -Ls --get -r 0-10240 $FULLURL | grep -i "http-equiv" | \
        grep -i "url=" | awk 'BEGIN {FS="url="} {print $2}' | cut -d \" -f 1 | cut -d \' -f 1)
    if [ "$OLDURL" == "$FULLURL" ]
    then
        echo "Download size cannot be determined"
        exit
    fi
done

Make it executable once by using the following command:

chmod +x ./sizer.sh

Run it with ./sizer.sh and supply your URL to query its size.

—A. Datta, webmaster@aucklandwhich.org

A useful tool called Tilda
We often want to access the terminal to execute some command or to see the progress of some job running inside the terminal; to do this, we need to switch windows using Ctrl+Tab. For those who dislike switching, there is Tilda. It is a Linux terminal with no borders and is hidden from the desktop till a key is pressed. You can install it using the following command on Debian and similar systems:

sudo apt-get install tilda

Once installed, you can launch and configure Tilda by right-clicking on it and then choosing Preferences. Assign a less frequently used key combination as a shortcut to launch Tilda so that it doesn't interfere with your regular shortcut keys. Tilda is very useful when you are following some tutorial and want to code simultaneously while reading it.

—Amar Shukla, amarshukla123@gmail.com

Easily restrict untrusted applications using Linux namespaces
Firejail is a SUID sandbox program that reduces the risk of security breaches by restricting the running environment of untrusted applications. It does this by


using Linux namespaces, seccomp-bpf and Linux capabilities. Once you install Firejail, you will find it is pretty easy to use. Given below is an example that shows you how to use it. It runs Firefox with the default security profiles.

# firejail firefox

There are many options available along with Firejail. You can get all the details from the manual page of the command.

—Renjith Thankachan, mail3renjith@gmail.com

Recording the command session
We often type a sequence of commands in Linux and then forget what we have done. Here is a small tip that will help you record all the commands that you type. The recorded commands will be stored in a readable text file named file1.txt.

$ script file1.txt
Script started, file is file1.txt
$ ls
$ ps -el
$ lsblk
$ exit
Script done, file is file1.txt

$ cat file1.txt

The output will show all the commands that were executed during the script session.

—Pritam Nipane, pritamnipane@gmail.com

Creating a desktop launcher in Ubuntu Unity
Unity launchers are actually files stored in your computer with a '.desktop' extension. To create a desktop launcher, create the .desktop file using a text editor and save it under the ~/.local/share/applications/ directory. For example, given below is the .desktop file for Python IDLE.

[bash]$ cat ~/.local/share/applications/idle.desktop
[Desktop Entry]
Version=1.0
Type=Application
Name=IDLE
Exec=/usr/bin/idle3
Comment=The Drive to Develop
Categories=Development
Terminal=false
StartupWMClass=IDLE

All the fields of the above configuration are self-explanatory. Now, you can search for the IDLE application in Ubuntu Dash and also lock it to the launcher.

—Narendra Kangralkar, narendrakangralkar@gmail.com

Adding colours to the BASH prompt
To add colours to the shell prompt, use the following export command syntax…

\e[x;ym $PS1 \e[m

…where:
\e[: starts the colour scheme
x;y: indicates the colour pair to use
$PS1: your shell prompt variable
\e[m: stops the colour scheme

To set a red colour prompt, type the following command:

$ export PS1="\e[0;31m[\u@\h \W]\$ \e[m "

To set a blue colour prompt, type the following command:

$ export PS1="\e[0;34m[\u@\h \W]\$ \e[m "

Here is a list of colour codes:
Blue: 0;34
Green: 0;32
Cyan: 0;36
Red: 0;31
Purple: 0;35
Brown: 0;33

—Rajeeb Senapati, Rajeeb.koomar@gmail.com

Share Your Linux Recipes!
The joy of using Linux is in finding ways to get around problems—take them head on, defeat them! We invite you to share your tips and tricks with us for publication in OSFY so that they can reach a wider audience. Your tips could be related to administration, programming, troubleshooting or general tweaking. Submit them at www.opensourceforu.com. The sender of each published tip will get a T-shirt.
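The escape-sequence pattern from the BASH prompt colour tip can also be generated programmatically. Below is a small, hypothetical Python helper (the colour table simply mirrors the codes listed in the tip) that builds the same PS1 string you would export:

```python
# Colour codes as listed in the tip above.
COLOURS = {
    "blue": "0;34", "green": "0;32", "cyan": "0;36",
    "red": "0;31", "purple": "0;35", "brown": "0;33",
}

def coloured_ps1(colour):
    """Build the PS1 string used in the tip: \\e[x;ym [\\u@\\h \\W]\\$ \\e[m."""
    return r"\e[" + COLOURS[colour] + r"m[\u@\h \W]\$ \e[m "

# Emits the same value that the tip exports for a red prompt:
print(coloured_ps1("red"))
```

You could paste the printed string into `export PS1="…"` exactly as shown in the tip; the helper just saves looking up the x;y pair each time.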



DVD Of The Month
Test and secure your applications.

BackBox Linux 5 Live (64-bit)
This distro is designed to be fast, easy to use and provide a minimal yet complete desktop environment. It is a penetration testing and security assessment oriented Linux distribution, which offers a network and systems analysis toolkit. It includes some of the most commonly known/used security and analysis tools, ranging from Web application and network analysis to stress tests, sniffing, vulnerability assessment, computer forensic analysis, and automotive and exploitation testing.

Mageia 6 GNOME Live (64-bit)
Mageia is a GNU/Linux-based operating system. It is a community project, supported by a non-profit organisation comprising elected contributors. The latest stable release of the project, Mageia 6, was developed for over two years before being released officially. It will be supported with security and bug fix updates for 18 months, up to January 16, 2019. The bundled ISO image is for the live GNOME edition, which can also be installed on your computer. The live DVD contains all the supported locales and a preselection of software, making it a faster way to get started working with Mageia.

What is a live DVD?
A live CD/DVD or live disk contains a bootable operating system, the core program of any computer, which is designed to run your programs and manage all your hardware and software.

Live CDs/DVDs have the ability to run a complete, modern OS on a computer even without secondary storage, such as a hard disk drive. The CD/DVD directly runs the OS and other applications from the DVD drive itself. Thus, a live disk allows you to try the OS before you install it, without erasing or installing anything on your current system. Such disks are used to demonstrate features or try out a release. They are also used for testing hardware functionality, before actual installation.

To run a live DVD, you need to boot your computer using the disk in the ROM drive. To know how to set a boot device in BIOS, please refer to the hardware documentation for your computer/laptop.

September 2017
In case this DVD does not work properly, write to us at support@efy.in for a free replacement.

Earn up to ₹100,000 per hour

Some of our recent courses:

On Udemy:
• GCP: Complete Google Data Engineer and Cloud Architect
• Time Capsule: Trends in Tech, Product Strategy
• Show And Tell: Sikuli - Pattern-Matching and Automation

On Pluralsight:
• Understanding the Foundations of TensorFlow
• Working with Graphs in Python
• Building Regression Models in TensorFlow

• Step 1: You work with us to create a course proposal (a 2-10 hour course)
• Step 2: We pay you an advance of ₹5,000/hour upon course approval
• Step 3: You build the course, we help
• Step 4: We grade your work and pay according to the rate card below (rates per hour)

Grade A: ₹100,000 | B: ₹50,000 | C: ₹25,000 | F: ₹5,000

Curious? Mail us at