Blog article category for articles on this site covering DevOps, Cloud Infrastructure, Site Reliability, Technical Writing, Project Management and Commercial Writing, along with Event Management and associated areas.

Cloud VMs or Containers

Containers or Virtual Machines in your Cloud Design?

I would not be an engineer if I did not muse over what kind of infrastructure I wanted my applications to sit on. One of the biggest architectural decisions in any digital product is where your application code is housed. There are two main choices. The first is Virtual Machines (VMs), which each run their own operating system and are isolated from the other resources hosted on the same physical server, or which can sit on a dedicated physical server at a higher price point. The second is containers, which are lightweight, isolated application wrappers that share the operating system and resources of the underlying host, which in the cloud is usually a VM. Container technology has matured greatly over recent years and raced past VM technology with the help of Kubernetes and a host of cloud provider products. The big leap for digital businesses has of course been managed Kubernetes orchestration products like EKS by AWS, OpenShift by Red Hat (an IBM company) and AKS by Azure. These products abstract away most of the work of the classic kubeadm cluster build and are incredibly efficient at container orchestration, increasing application reliability and, of course, availability. Whilst containers do have their drawbacks, the benefits are considerable for SaaS companies whose use case includes the following considerations:

  • Your application requires 24x7 operation, or it runs short-burst back-end jobs that last longer than 15 or so minutes and may suit a pay-as-you-go managed option like AWS Fargate.
  • Your application is cloud native and coded for scalable container interaction, targeting specific backend services and containers for its operations.
  • Your database is cloud-native or migratable onto cloud resources. Compatible relational databases would include MariaDB, PostgreSQL, MySQL, SQL Server and Oracle. NoSQL solutions like MongoDB and others are available, but if you are using NoSQL, consider your throughput needs and the suitability of containers in meeting them. The main advantage of containers is speed, not throughput.
  • Availability is important and you want to automate certain tasks such as orchestration.
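
To make the orchestration automation in the last point a little more concrete, here is a minimal sketch of the ratio rule that the Kubernetes Horizontal Pod Autoscaler documents for choosing a replica count. The function name, the default bounds and the sample numbers are my own illustrative assumptions, and the real autoscaler applies extra refinements (such as a tolerance band) that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Replica count suggested by the ratio rule, clamped to [min, max].

    Simplified sketch: ceil(current_replicas * current_metric / target_metric).
    The bounds are hypothetical defaults, not values from any real cluster.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


# 4 pods averaging 90% CPU against a 60% target suggests scaling out.
scaled_out = desired_replicas(4, 90, 60)
# The same 4 pods averaging 30% CPU suggests scaling in.
scaled_in = desired_replicas(4, 30, 60)
```

The clamp matters in practice: without an upper bound, a runaway metric could ask for more pods than your nodes (or your budget) can carry.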

All of these are good reasons to choose containers, with Kubernetes adding an extra layer of orchestration automation that scales containers across appropriately provisioned and monitored nodes, which can themselves be scaled in Auto Scaling Groups or Machine Sets. So, I guess you can see how containers make sense, but let's not forget the beauty of virtual machines, which directly support containers and also have use cases of their own outside of containerisation. If your use case has the following elements, you may want to use VMs directly instead of containers:

  • You have legacy applications that have been running on servers for a number of years, and you don't have the time and/or access to developers to rework legacy code that is not modular or adapted for use with cloud services.
  • Your database is not compatible with containers, or it would take too much work in the time frame allotted to get it into a migratable state for advanced cloud service offerings like container products.
  • You have cost considerations and containers are just too dear. You do, however, want to avail of cloud services to provide higher resiliency than a single-AZ deployment. Your needs are met by services such as Auto Scaling Groups on AWS, where you create a steady-state group of max 1, min 1 and desired 1. This allows the instance to be relaunched in any of its host region AZs should it fail.
  • You are not deploying web applications and have massive throughput requirements, such as a big data mesh network that needs special infrastructure support: high-throughput, memory-optimized and, if using OLAP streaming, high-I/O VMs to handle huge loads.
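
The steady-state Auto Scaling Group from the cost point above can be sketched as follows. This is only an illustration under stated assumptions: the helper function, the group name, the launch template name and the subnet IDs are all hypothetical. The dictionary keys follow the parameter names of the boto3 `create_auto_scaling_group` call, but nothing here talks to AWS.

```python
def steady_state_asg_params(name, launch_template_name, subnet_ids):
    """Build arguments for a one-instance, self-healing Auto Scaling Group.

    Hypothetical helper: min, max and desired are all pinned to 1, and the
    subnets should sit in different AZs so a failed instance can be
    relaunched elsewhere in the region.
    """
    if len(subnet_ids) < 2:
        raise ValueError("provide subnets in at least two AZs")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # Comma-separated subnet IDs, one per AZ the group may use.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "HealthCheckType": "EC2",
    }


# All names and IDs below are placeholders for illustration only.
params = steady_state_asg_params(
    "legacy-app-asg",
    "legacy-app-template",
    ["subnet-aaa111", "subnet-bbb222"],
)
# With boto3 installed and AWS credentials configured, this could be sent as:
#   boto3.client("autoscaling").create_auto_scaling_group(**params)
```

Keeping the arguments in a plain dictionary makes them easy to review before anything is created in a real account.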

As you can see, VMs are not going to disappear from the cloud landscape in the foreseeable future. With the arrival of mature container technology, knowing what use case suits what technology in your stack can mean the difference between the success and failure of your digital product. Stay tuned for more on DevOps in this blog along with articles on other areas of interest in the Writing and Cloud Infrastructure arenas. To not miss out on any updates on my availability, tips on related areas or anything of interest to all, sign up for one of my newsletters in the footer of any page on Maolte. I look forward to us becoming pen pals!

HashiCorp Terraform Exam Experience

Last Wednesday (August 17th), I passed the HashiCorp Terraform Associate exam and can honestly say that I did not give this certification enough respect in my preparation for it. Many consider this an easy exam to take, but that did not stop me from sweating under the pressure of finishing it in time. Answering 60 questions in 60 minutes is no small undertaking, as the knowledge range in the exam syllabus is very wide. Below are my main takeaways, which I will bear in mind for future exams, including further HashiCorp certifications.

a) Check-in strategy - I needed a bathroom break 20 minutes from the end of the exam and it was denied under exam policy. It had taken me 30 minutes to check in. I strongly recommend taking a bathroom break after checking in and before starting the 60-minute exam period, so that you are not distracted by bodily functions.

b) Try to give yourself 2 weeks to practise mock exams. I was under extreme time pressure on questions during the exam. Some questions were harder and wordier than my pre-exam information and research had led me to expect. They also had code snippets that introduced complexity beyond any quiz I had done at any level. I finished the course 3 days prior to the exam and should have given it more respect; I thought the knowledge quiz questions at two levels of complexity would be enough, given there were no 'mock exams' available. However, I did find that my own industry experience and my lab practice made the exam doable. It would have been much easier under time pressure if I had been well drilled with 2 weeks (1 hr a day) of well-levelled mock exam questions.

c) You get 60 minutes for 57-60 questions, and I had 60 questions to do. This time pressure combined with wordy questions made it difficult going for me, as it's 1 minute per question and I got stuck twice. I flagged 14 questions and kept going to the end, making 4 changes when I revised those 14 questions. I was not able to focus properly in the final 20 minutes due to mother nature calling, which detracted from my overall score. I think a 200-250 question mock test base in an exam package would be useful for this one. The labs are key not only to abstracted troubleshooting questions but also to professional use of Terraform as an engineer.

d) Like all tech exams based on multiple-choice, single-choice and similar question types, there are various degrees of difficulty in the questions presented, which focus on architecture, practice, theory and working config/process. 12 of my 14 flagged questions came in the first 25 questions, as these were considerably more wordy and difficult. If that happens to you, take a breath and focus. Do not panic. Try to guess the best option, flag it and move on, as time is your enemy if you get stuck in the mud. If you get stuck like I did on two occasions, snap yourself out of it, offer a best guess and move on. This exam is a numbers game: you need to cover all the questions to pass. Some of them are tricky in their wording, despite reassurances to the contrary from techs in HashiCorp. I relaxed somewhat around question 27, and that certainly helped my coverage of questions and my ability to revise the flagged ones.

e) My standard advice on course lessons applies here also. Try to take notes on video lectures and revise them before the next session of videos/lectures begins, building knowledge incrementally. Also, practice, practice, practice in labs. Do them all the way through and try not to leave more than approximately 3 days between study sessions. I twice left gaps of months due to doing other courses, work, etc. and can confirm it's not recommended as best practice. Notes are key to picking up the course flow mid-stream and are also great revision tools, which I found helpful on the day before and on the morning of the exam. Also, try to get a good night's sleep and make sure you do not need food or the bathroom when doing the exam. It's only 60 minutes, but with 30 minutes to check in for this remote-only exam, my assumptions about exam length and the bathroom backfired. It materially detracted from my ability to focus on questions in the final 20 minutes of a 60-minute exam.
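
The time arithmetic behind point c) can be sketched in a few lines. This is only a rough planning aid: the function name is made up, and the 30 seconds reserved for revisiting each flagged question is my own assumption rather than anything from the exam guidance.

```python
def first_pass_seconds(total_minutes, questions, flagged,
                       review_minutes_each=0.5):
    """Seconds available per question on the first pass.

    Time for revisiting flagged questions is set aside up front.
    review_minutes_each is an assumed figure for illustration.
    """
    review_reserve = flagged * review_minutes_each
    first_pass_minutes = total_minutes - review_reserve
    if first_pass_minutes <= 0:
        raise ValueError("review reserve uses up the whole exam")
    return first_pass_minutes * 60 / questions


# With no review time held back, 60 questions in 60 minutes is 60 s each.
no_review = first_pass_seconds(60, 60, flagged=0)
# Holding back time to revisit 14 flagged questions tightens the first pass.
with_review = first_pass_seconds(60, 60, flagged=14)
```

Seen this way, every flagged question shaves time off all the others, which is why getting stuck on even one or two is so costly.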

All that said, I am delighted to have passed it and look forward to the final leg of my current certification path, which will be a quick Kubernetes refresher and completion of the OpenShift course before sitting the final exam. Stay tuned for more on DevOps in this blog along with articles on other areas of interest in the Writing and Cloud Infrastructure arenas. To not miss out on any updates on my availability, tips on related areas or anything of interest to all, sign up for one of my newsletters in the footer of any page on Maolte. I look forward to us becoming pen pals!


Will DARP replace BGP to run the internet?

Will the security and performance issues of BGP (Border Gateway Protocol) be fixed by replacing it with DARP (Distributed Autonomous Routing Protocol)?

For those in the networking, security and cloud industries, BGP is arguably the most famous networking protocol of all. It's credited with making the internet work, given its dynamic routing features as a path-vector protocol: each route advertisement carries the path of networks it has passed through, which allows BGP to make decisions based on the routes at hand. With its BGP neighbours (internet routing nodes) mapped, it can make the smart call to get your packet to its destination at a remarkable scale. It also gains convergence (learning its routes) efficiencies via multiple features, including the use of Autonomous Systems (AS, independently administered network segments) and the peering (linking) capability that sews these segments together with external BGP (eBGP) sessions.
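
As a rough illustration of the path-vector idea, the sketch below drops any route whose AS path already contains the local AS (loop prevention) and then prefers the shortest remaining AS path. Real BGP best-path selection weighs several more attributes before and after path length, so treat this as a simplified assumption. The function names are hypothetical, and the AS numbers and addresses come from the ranges reserved for documentation.

```python
def accept_route(local_as, as_path):
    """Loop prevention: reject a route whose AS path already holds our AS."""
    return local_as not in as_path


def best_route(local_as, candidates):
    """Pick the loop-free candidate with the shortest AS path.

    Ties go to the first candidate seen. Returns None if nothing is usable.
    """
    usable = [r for r in candidates if accept_route(local_as, r["as_path"])]
    if not usable:
        return None
    return min(usable, key=lambda r: len(r["as_path"]))


# Three advertisements for the same destination, all originated by AS 64511.
routes = [
    {"next_hop": "192.0.2.1", "as_path": [64501, 64510, 64511]},
    {"next_hop": "192.0.2.2", "as_path": [64502, 64511]},
    {"next_hop": "192.0.2.3", "as_path": [64503, 64500, 64511]},  # loops via us
]
chosen = best_route(64500, routes)
```

Because the whole path travels with the advertisement, each router can spot loops locally without needing a full map of the internet.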

BGP has been around since 1989, and version 4 is still a respected protocol, though it has had its day in the sun. BGP is now considered by many to be the 'broken part of the internet' and has attracted much attention for its security and performance issues. On the security side, the term 'BGP hijacking' was born when BGP route advertisements in the external BGP peering process were manipulated by bad actors, which the security industry has identified as including the People's Republic of China. In that case, the manipulation would reroute network traffic through peered BGP Autonomous Systems located in China. I recall an interesting case of this where traffic from Switzerland to the US was rerouted via China on its journey, which opens it up to interception by the Chinese authorities. This is one of many examples that led to the creation of the MANRS (Mutually Agreed Norms for Routing Security) initiative, which promotes the use of Resource Public Key Infrastructure (RPKI). RPKI lets a network cryptographically verify that an Autonomous System is authorised to originate a given route, and it substantially increases the security posture of peerings if universally adopted. Network providers like AT&T, Telia and others now use RPKI just a few years later, which is a fast pace for the telco industry. Another area of concern is BGP performance: even a fraction of a second of lag in network transit can amount to millions of dollars in lost revenue for digital companies at scale.
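
To show roughly how RPKI helps against hijacked advertisements, here is a minimal sketch of route origin validation. It assumes each authorisation record is a simple (prefix, maximum length, origin AS) tuple and returns one of three states in the style of the origin validation standard. The function name and the sample records are illustrative, using documentation prefixes and AS numbers, and signature checking of the records themselves is out of scope.

```python
import ipaddress


def validate_origin(prefix, origin_as, roas):
    """Classify an announcement as 'valid', 'invalid' or 'not-found'.

    roas is a list of (prefix, max_length, origin_as) tuples, assumed to
    have already been cryptographically verified elsewhere.
    """
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, roa_as in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # subnet_of() raises TypeError across IP versions, so check first.
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            if roa_as == origin_as and net.prefixlen <= max_length:
                return "valid"
    return "invalid" if covered else "not-found"


# One authorisation: AS 64500 may originate 192.0.2.0/24, nothing longer.
roas = [("192.0.2.0/24", 24, 64500)]

legit = validate_origin("192.0.2.0/24", 64500, roas)
wrong_origin = validate_origin("192.0.2.0/24", 64511, roas)
more_specific = validate_origin("192.0.2.0/25", 64500, roas)
unknown = validate_origin("198.51.100.0/24", 64500, roas)
```

The "not-found" state is why universal adoption matters: a prefix with no record at all cannot be told apart from a hijack on origin data alone.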

Syntropy has focused on a solution that addresses the cause in the development of DARP. Their Distributed Autonomous Routing Protocol is a blockchain-based protocol that uses AI, instead of physical neighbour mapping, to dynamically map a route based on its modelling of the internet. This software-defined network architecture, based on centralised network infrastructure information, is made whole by a one-way latency (OWL) test packet for a particular route called a "pulse packet". The pulse packet leads the route into the local pulse group and is also used to populate the OWL matrix that the algorithm uses for latency evaluation. When the receiving node responds to it, a public/private key pair is created for use in an ad-hoc VPN for each stage of the route, bearing the OWL modelling in mind. It's in essence a data mesh that secures node-to-node hops via a VPN-styled connection. Its other features provide lower latency, given that its routing path is mapped via the blockchain-based AI framework. DARP was made public recently and has been in community testing for some months now.
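
One way to picture how an OWL matrix could feed latency evaluation is the sketch below. It is purely my own assumption of the idea and not DARP's actual algorithm: given one-way latencies between every pair of nodes in a pulse group, it checks whether relaying through a third node beats the direct path. The node names and millisecond figures are invented for illustration.

```python
def best_path(owl, src, dst):
    """Return (path, latency_ms): direct, or via one relay if that is faster.

    owl[a][b] is the measured one-way latency from node a to node b.
    """
    best = ([src, dst], owl[src][dst])
    for relay in owl:
        if relay in (src, dst):
            continue
        via = owl[src][relay] + owl[relay][dst]
        if via < best[1]:
            best = ([src, relay, dst], via)
    return best


# Hypothetical pulse group of three nodes; values are made-up latencies in ms.
# One-way latency need not be symmetric, so each direction is stored.
owl = {
    "dublin":   {"dublin": 0,  "new_york": 95, "london": 12},
    "new_york": {"dublin": 90, "new_york": 0,  "london": 70},
    "london":   {"dublin": 11, "new_york": 68, "london": 0},
}
path, latency = best_path(owl, "dublin", "new_york")
```

In this made-up group the relayed route wins, which is the kind of gain a latency-aware overlay is hoping to find over the default internet path.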

Given the challenges taken on by Syntropy, it is notable that DARP's bid to replace BGP at scale is already past its proof-of-concept stage. As a cloud infrastructure engineer, I am most interested to see how this encrypted low-level routing protocol interacts at scale with other areas of the network that run a BGP/OSPF protocol mix or maybe an eBGP/iBGP mix. Also of note is how it will interact at scale with higher-level encryption protocols like IPsec in VPN tunnelling, and whether protocol isolation is proven at scale. Stay tuned for more on infrastructure in this blog along with articles on other areas of interest in the writing and DevOps arenas. To not miss out on any updates on my availability, tips on related areas or anything of interest to all, sign up for one of my newsletters in the footer of any page on Maolte. I look forward to us becoming pen pals!