Opening up the “Virtual Boozer” #vPub

With social distancing in effect here in the UK and all pubs, bars, cafés and restaurants closed for the foreseeable future, we need to find new ways to socialise.

So, Friday night saw the opening of the “Virtual Boozer”, with me in my Man Cave and my mates James, Pete and Phil all on a video conferencing link. I have to admit it was a little odd, but it worked… and, as we couldn’t meet in person, it was a good way to hook up – even if most of the conversation was COVID19/Coronavirus-related.

I’ll admit the idea is not my own:

  • For a while now, Matt Ballantine (@ballantine70) has been running a “Global Canteen” on an ad-hoc basis for some of us on the WB-40 Podcast WhatsApp group to meet virtually. Last week, the idea of a “Global Boozer” was suggested as a natural extension of this.
  • And Sharon O’Dea Tweeted about using Zoom to meet up with her friends:

So, how can you set up your own Virtual Boozer (a virtual pub for the lads – I should note that our more sophisticated female partners have asked for a virtual wine bar”)?

I thought quite hard about the platform to use:

In the end, I went with Microsoft Teams. Mostly because it’s a tool that’s familiar to me (I use it every day at work) and because it works cross-platform but partly because I have an Office 365 E1 subscription and it’s included. There is a free tier for Microsoft Teams too…

These are by no means all of the tools available though – there’s a huge list that’s been collated in the Remote Work Survival Kit.

I tried creating a Team for the Virtual Boozer and inviting external recipients. In the end, it was just too complicated – with six steps to ensure that external access was granted (and still failing to invite external people). So I fired up Outlook, created a meeting, used the Teams add-in to drop in some meeting details, and emailed my friends.

It worked, but as Matt Ballantine highlighted to me, using the same tools for work and home is perhaps not the best approach to take (he equated it to going for a drink with my mates in the office!). Next time I’ll be seriously considering using Google Hangouts, which seems to work as a mobile app or a browser add-in – and is perhaps a little more consumer-friendly than Microsoft Teams. Everyone has their own preferences – just go with what works for you!

Postscript

The Virtual Boozer opened again last night (27/3/2020); this time using Google Hangouts. It seemed to work well, in a browser or in-app but it does require that you have a Google Account. Much like Teams, scheduling involved creating a calendar appointment (which, for me, meant re-activating my dormant GMail/Google Calendar). Also, it jumps around to show you the person who is talking at the time – some of my friends would have prefered to have all faces on the screen together (which is one of the advantages of Zoom).

Another option, which my youngest son has been using with his mates is Houseparty.

Remote working: spare a thought for the managers…

With the current COVID-19/CoronaVirus health crisis, there’s a huge focus on working from home – for those who are able to.

I’ve been contractually based from home since 2005 and remote working has been the norm for a good chunk of my career. That means I kind of take it for granted and didn’t really appreciate some of the things that others struggle with – like how to focus, finding space to work, and social isolation. (I’m also an introvert, so social isolation is probably not a huge concern to me!)

Somewhat disappointingly, there also seems to be a surge of articles on “how to work from home” – mostly written by people who seem to have very little real-world experience of it. But there are some notable exceptions:

  • Last year, Matt Haughey (@mathowie) published his tips from 16 years of working from home. Whilst some of the advice about his chosen tech (wide angle lens, special lighting, Apple AirPods and Slack) might not be everyone’s cup of tea, there’s a lot of good advice in Matt’s post.
  • And, earlier this week, my colleague Thom McKiernan (@ThomMcK) wrote an excellent blog post on what he calls the six rituals of working from home. I hadn’t really twigged that working from home is a fairly new experience for Thom (since he came to work at risual) and Thom’s post also flagged some of the bad habits I’ve slipped into over the years: the 08:55 starts; the procrastination; not taking regular breaks; and going back to work in the evening). It’s definitely worth a read.

But equipping people with the tools and mindset to work remotely is only part of the problem. Whilst I believe (hope) that the current crisis will help organisations to discover the benefits of remote and flexible working, some of the major challenges are cultural – things like:

  • Trust: comments like “Oh, ‘working from home’ are you?” don’t help anyone. Sure, some people will slack off (it’s human nature) but they will soon be found out.
  • Presenteeism: linked to trust, this often comes down to a management style that is reliant on seeing people at a desk, between certain hours. It’s nothing to do with how productive they are – just that they are there!
  • Output: so many people are hung up on hours worked. Whilst we owe it to ourselves not to over-do things, or for employers to take advantage of employees, much of today’s work should be output-driven. How effective were you? Did you manage to achieve x and y, that contribute towards your personal goals (which ideally contribute towards the company’s goals)? It’s not about the 9-5 (8-6, or 6-8), it’s about delivery!

For managers, it can be difficult when you have a remote team. I struggle sometimes but in some ways, I’m fortunate that I have a relatively senior group of Architects working with me, who are (or at least should be) self-starters. If they don’t deliver, I will hear soon enough, but then I’ll have another problem in that I’ll have a dissatisfied client. So, ideally, I need to make sure that the team understands what’s expected of them.

My approach is this:

  • Trust: everyone I work with is trusted to “get on with it”. That’s not to say I don’t check in on them but there is a huge reliance on people doing what they have been asked to.
  • Empowerment: I don’t micro-manage. I’ll make sure that we have regular 1:1s, and cross-team communications (see below) but everyone in the team is empowered to make decisions. I’ll be there to support them if they want guidance but I need people to make their own choices too… which means I need to be ready to support them if those choices don’t work out the way they, or I, might have liked.
  • Clear guidance: of course, I have a manager too. And he reports to our Directors. Things get passed down. I may not like everything that “passes my desk” but sometimes you just need to get on with it. Communication to my team about what’s required/expected, why it’s needed and when it needs to be done by is vitally important.
  • Communication: related to the above but making sure there are regular team calls. Even once a fortnight for 45 minutes (because we always over-ran 30 minute meetings and an hour is too long) is an opportunity to disseminate key messages and for everyone to share recent experiences. Every few months we’ll meet face to face for a day and work on something that gets added value from being together – but that’s not really remote working (or advisable in the current climate).
  • Understanding my limits: I’m a practicing Consultant as well as a people manager, so I’m not always there (I also work part-time). This goes hand in hand with the empowerment I mentioned above.
  • Tools: everything I’ve written in this post up to now has been about the people but technology can play its part too. I use Microsoft Teams extensively (you may have another choice – email, Slack, SharePoint, Yammer, or something else):
    • I work in the open – on Teams. Email in an Inbox is hidden whereas a post on Teams is visible to all who may have an interest.
    • When I’m not at work, my out of office reply directs people to post on Teams, where another member of the team might be able to help them.
    • Voice and video calls – Teams. Even my mobile phone is directed to a phone number that ends up in Teams.
    • When I set up a project, I use a Wiki in Teams to set out what is required of people – useful information about the client; what they expect from us; what the project is delivering (and how).
    • When I want to get a message out to a team (either hierarchical or virtual/project), I use Teams to communicate (hopefully clearly).

I’m not sure how successful I am – maybe my team can tell me – but it’s an approach that seems to work reasonably well so far. Hopefully it’s of some use to other people.

If you’re a manager, struggling with managing a remote team – or if you have some advice/guidance to share – please leave a comment below.

Further reading

(The up-to-date version of the Remote Work Survival Kit can always be found in Google Docs, with most of the content on the website). A PDF extract is created periodically.

Office updates in an unfamiliar language

A few weeks ago, I spotted this, when I went to apply updates to Office 365 ProPlus on my work laptop:

It had me confused, but Colin Powers (@iamcolpow) pointed me to a Microsoft Community forum post with a potential fix.

I changed the language order in my Office settings so that English was the first option after Match Windows. Whatever was causing Windows to fall back down the list then went to English rather than Arabic.

Office Language Preferences as originally set
Office Language Preferences after the change

Now I can read the dialogue boxes on my Office updates!

Microsoft Online Services: tenants, subscriptions and domain names

I often come across confusion with clients trying to understand the differences between tenants, subscriptions and domain names when deploying Microsoft services. This post attempts to clear up some misunderstandings and to – hopefully – make things a little clearer.

Each organisation has a Microsoft Online Services tenant which has a unique DNS name in the format organisationname.onmicrosoft.com. This is unique to the tenant and cannot be changed. Of course, a company can establish multiple organisations, each with its own tenant but these will always be independent of one another and need to be managed separately.

It’s important to remember that each tenant has a single Azure Active Directory (Azure AD). There is a 1:1 relationship between the Azure AD and the tenant. The Azure AD directory uses a unique tenant ID, represented in GUID format. Azure AD can be synchronised with an existing on premises Active Directory Domain Services (AD DS) directory using the Azure AD Connect software.

Multiple service offerings (services) can be deployed into the tenant: Office 365; Intune; Dynamics 365; Azure. Some of these services support multiple subscriptions that may be deployed for several reasons, including separation of administrative control. Quoting from the Microsoft documentation:

“An Azure subscription has a trust relationship with Azure Active Directory (Azure AD). A subscription trusts Azure AD to authenticate users, services, and devices.

Multiple subscriptions can trust the same Azure AD directory. Each subscription can only trust a single directory.”

Associate or add an Azure subscription to your Azure Active Directory tenant

Multiple custom (DNS) domain names can be applied to services – so mycompany.com, mycompany.co.uk and myoldcompanyname.com could all be directed to the same services – but there is still a limit of one tenant name per tenant.

Further reading

Subscriptions, licenses, accounts, and tenants for Microsoft’s cloud offerings.

Digital transformation – it’s not about the technology

For the last few years, every IT organisation has been talking about “digital”. Digital this, digital that. “Digital transformation” has become a buzzword (OK, two words), just like “Cloud” in 2010 or “Big Data” a few years later.

But what do we mean when we talk about digital transformation? It certainly caused a stir in my recent team meeting.

To answer that, let’s look at three forms of transformation – Cloud, Business and Digital – and how they build on each other:

  • Cloud transformation is about tools and technology. It’s often IT-led (though it should involve business stakeholders too) and so it’s the domain where us techies are most happy. Often, it involves creating new platforms, using cloud services – Azure, Amazon Web Services, Office 365, G-Suite, Dynamics 365, Salesforce. But cloud transformation is just an enabler. In order to deliver value, business transformation is required.
  • Business transformation is about re-engineering internal processes to better serve the needs of the business and improve the way in which services are delivered. It’s about driving efficiencies and delivering better outcomes, but still focused on the way that a business (or other organisation) operates. Business transformation should be business-led but will often (but not always) demand new platforms and services from IT – which leads back to cloud transformation.
  • Digital transformation relates to the external interface with clients/customers/citizens/students. This is the domain of disruptive innovation. Evolve or become extinct. It’s often spoken about in terms of channel shifting – getting people to use digital services in place of older, more laborious alternatives but, ideally, its complementary, rather than replacing existing methods (because otherwise we run the risk of digital exclusion). Importantly though, it’s no good having digital transformation without business transformation and, like business transformation, digital transformation should be business-led.
Cloud, Business and Digital Transformation

Let’s take an example of digital transformation: when my bins were missed from a council waste collection, I logged a call via my local council’s website, which created an incident in a case management system and within an hour or so the bin lorry was back in my street because the driver had been alerted to the missed collection on his in-cab display. The service was excellent (OK, there was a mistake but it was quickly dealt with), the resolution was effective, and it was enabled using digital technologies.

But here’s another example. When I was held up in the neighbouring county by some defective temporary traffic lights at some roadworks, the local authority‘s out of hours phone service wanted me to channel shift to the website (not appropriate when driving a car). It also couldn’t cope with my problem – the out of hours phone service ended up at a random mobile voice mailbox. In the end, I called the Police on 101 (non-emergency) when really some basic business processes needed to be fixed. That shouldn’t necessarily require a technical solution but digital transformation of external services does rely on effective internal processes. Otherwise, what you have created is a shiny new approach on the outside, with the same clunky processes internally.

Hopefully this post has helped to describe the differences between cloud, business and digital transformation. But also consider this… digital transformation relies on business transformation – but not all business transformation needs new IT… the important thing is to identify the challenges being faced, the opportunities to innovate, and only then consider the platforms that are needed in order to move forwards.

Workaround for when Microsoft Edge Beta (v79) fails to load pages

Last week, following Microsoft’s announcement at Ignite that the (Chromium-based) Edge was nearing completion, ready for launch on 15 January 2020, my browser updated itself to the final beta – version 79.0.309.14. And it broke.

I found the original Microsoft Edge to be unreliable and for many years my default browser was Google Chrome. When my latest work PC came without Chrome, rather than ask an administrator to install Chrome for me I installed the Edge beta, which had been rock solid until this latest update.

Various people suggested a later build might fix it (my PC says Edge is up to date, but there will be faster deployment rings than the one I’m on) but the real fix came from my colleague Thom McKiernan (@thommck) who suggested adding the -no-sandbox switch to the Edge Beta shortcut:

"C:\Users\mark\AppData\Local\Microsoft\Edge Beta\Application\msedge.exe" -no-sandbox

The Edge beta complains that “You’re using an unsupported command-line flag –no-sandbox. This poses stability and security risks.” but all my tabs load without issue now. Let’s see if this is properly fixed next time my browser updates…

[Updated 14/11/2019: this post originally indicated that the use of the -no-sandbox switch was a fix for the issue. It’s not – it should be seen as a workaround and used with caution until this problem is properly resolved.]

Using Microsoft Bookings to manage device rollouts

Microsoft Bookings, showing available services

End-user computing (EUC) refreshes can place significant logistical challenges on an organisation. Whilst technologies like Windows 10 Autopilot will take us to a place where users can self-provision, often there’s more involved and some training is required to help users to adopt the technology (and potentially associated business changes).

Over the last few years, I’ve worked on projects that have used a variety of systems to manage the allocation of training/handover sessions to people but they’ve always been lacking in some way. We’ve tried using a PowerApps app, and a SharePoint calendar extension but then Microsoft made their Bookings app available on Office 365 Enterprise subscriptions (it was previously only available for Business subscriptions).

Microsoft Bookings is designed for small businesses and the example given in the Microsoft documentation is a pet grooming parlour. You could equally apply the app to other scenarios though: a hairdressing salon; bike repairs; or IT Services.

I can’t see Microsoft Bookings on my tenant!

That’s because, by default, it’s not there for Enterprise customers. Most of my customers use an E3 or E5 subscription and I was able to successfully test on a trial E3 tenant. My E1 was no good though…

The process to add the Business Apps (free) – including Bookings – to an Enterprise tenant will depend on whether it’s Credit Card (PAYG), Enterprise Agreement (EA) or Cloud Solutions Provider (CSP) licenced but it’s fully documented by Microsoft. When I enabled it on my test tenant, I received an invoice for £0.00.

So, how do I configure Microsoft Bookings?

The app is built around a calendar on a website, with a number of services and assigned staff. Each “staff member” needs to have a valid email address but they don’t need to be a real person – all of the email messages could be directed to a single mailbox, which also reduces the number of licences needed to operate the solution.

It took some thinking about how to do this for my End User Device Handover scenario but I set up:

  • A calendar for the project.
  • A service for the handover sessions. Use this to control when services are provided (e.g. available times and staff).
  • A number of dummy “staff” for the number of slots in each session (e.g. 10 people in each session, 10 slots so 10 “staff”).
Microsoft Bookings, showing confirmation of booked service Microsoft Bookings, calendar view

Once all of the staff available for a session are booked (i.e. all of the slots for a session are full), it’s no longer offered in the calendar. There’s no mechanism for preventing multiple/duplicate bookings but a simple manual check to export a .TSV file with all of the bookings each day will allow those to be identified and remediated.

(Incidentally, Excel wouldn’t open a TSV file for me. What I could do though was open the file in Notepad and copy/paste it to Excel, for sorting and identification of multiple bookings from the same email address.)

Further reading

These blog posts are a couple of years old now but helped a lot:

A logical view on a virtual datacentre services architecture

A couple of years ago, I wrote a post about a logical view of an End-User Computing (EUC) architecture (which provides a platform for Modern Workplace). It’s served me well and the model continues to be developed (although the changes are subtle so it’s not really worth writing a new post for the 2019 version).

Building on the original EUC/Modern Workplace framework, I started to think what it might look like for datacentre services – and this is something I came up with last year that’s starting to take shape.

Just as for the EUC model, I’ve tried to step up a level from the technology – to get back to the logical building blocks of the solution so that I can apply them according to a specific client’s requirements. I know that it’s far from complete – just look at an Azure or AWS feature list and you can come up with many more classifications for cloud services – but I think it provides the basics and a starting point for a conversation:

Logical view of a virtual datacentre environment

Starting at the bottom left of the diagram, I’ll describe each of the main blocks in turn:

  • Whether hosted on-premises, co-located or making use of public cloud capabilities, Connectivity is a key consideration for datacentre services. This element of the solution includes the WAN connectivity between sites, site-to-site VPN connections to secure access to the datacentre, Internet breakout and network security at the endpoints – specifically the firewalls and other network security appliances in the datacentre.
  • Whilst many of the SBBs in the virtual datacentre services architecture are equally applicable for co-located or on-premises datacentres, there are some specific Cloud Considerations. Firstly, cloud solutions must be designed for failure – i.e. to design out any elements that may lead to non-availability of services (or at least to fail within agreed service levels). Depending on the organisation(s) consuming the services, there may also be considerations around data location. Finally, and most significantly, the cloud provider(s) must practice trustworthy computing and, ideally, will conform to the UK National Cyber Security Centre (NCSC)’s 14 cloud security principles (or equivalent).
  • Just as for the EUC/Modern Workplace architecture, Identity and Access is key to the provision of virtual datacentre services. A directory service is at the heart of the solution, combined with a model for limiting the scope of access to resources. Together with Role Based Access Control (RBAC), this allows for fine-grained access permissions to be defined. Some form of remote access is required – both to access services running in the datacentre and for management purposes. Meanwhile, identity integration is concerned with integrating the datacentre directory service with existing (on-premises) identity solutions and providing SSO for applications, both in the virtual datacentre and elsewhere in the cloud (i.e. SaaS applications).
  • Data Protection takes place throughout the solution – but key considerations include intrusion detection and endpoint security. Just as for end-user devices, endpoint security covers such aspects as firewalls, anti-virus/malware protection and encryption of data at rest.
  • In the centre of the diagram, the Fabric is based on the US National Institute of Standards and Technology (NIST)’s established definition of essential characteristics for cloud computing.
  • The NIST guidance referred to above also defines three service models for cloud computing: Infrastructure as a Service (IaaS); Platform as a Service (PaaS) and Software as a Service (SaaS).
  • In the case of IaaS, there are considerations around the choice of Operating System. Supported operating systems will depend on the cloud service provider.
  • Many cloud service providers will also provide one or more Marketplaces with both first and third-party (ISV) products ranging from firewalls and security appliances to pre-configured application servers.
  • Application Services are the real reason that the virtual datacentre services exist, and applications may be web, mobile or API-based. There may also be traditional hosted server applications – especially where IaaS is in use.
  • The whole stack is wrapped with a suite of Management Tools. These exist to ensure that the cloud services are effectively managed in line with expected practices and cover all of the operational tasks that would be expected for any datacentre including: licensing; resource management; billing; HA and disaster recovery/business continuity; backup and recovery; configuration management; software updates; automation; management policies and monitoring/alerting.

If you have feedback – for example, a glaring hole or suggestions for changes, please feel free to leave a comment below.

Amazon Web Services (AWS) Summit: London Recap

I’ve written previously about the couple of days I spent at ExCeL in February, learning about Microsoft’s latest developments at the Ignite Tour and, a few weeks later I found myself back at the same venue, this time focusing on Amazon Web Services (AWS) at the London AWS Summit (four years since my last visit).

Even with a predominantly Microsoft-focused client base, there are situations where a multi-cloud solution is required and so, it makes sense for me to expand my knowledge to include Amazon’s cloud offerings. I may not have the detail and experience that I have with Microsoft Azure, but certainly enough to make an informed choice within my Architect role.

One of the first things I noticed is that, for Amazon, it’s all about the numbers. The AWS Summit had a lot of attendees – 12000+ were claimed, for more than 60 technical sessions supported by 98 sponsoring partners. Frankly, it felt to me that there were a few too many people there at times…

AWS is clearly growing – citing 41% growth comparing Q1 2019 with Q1 2018. And, whilst the comparisons with the industrial revolution and the LSE research that shows 95% of today’s startups would find traditional IT models limiting today were all good and valid, the keynote soon switched to focus on AWS claims of “more”. More services. More depth. More breadth.

There were some good customer slots in the keynote: Sainsbury’s Group CIO Phil Jordan and Group Digital Officer Clodagh Moriaty spoke about improving online experiences, integrating brands such as Nectar and Sainsbury’s, and using machine learning to re-plan retail space and to plan online deliveries. Ministry of Justice CDIO Tom Read talked about how the MOJ is moving to a microservice-based application architecture.

After the keynote, I immersed myself in technical sessions. In fact, I avoided the vendor booths completely because the room was absolutely packed when I tried to get near. My afternoon consisted of:

  • Driving digital transformation using artificial intelligence by Steven Bryen (@Steven_Bryen) and Bjoern Reinke.
  • AWS networking fundamentals by Perry Wald and Tom Adamski.
  • Creating resilience through destruction by Adrian Hornsby (@adhorn).
  • How to build an Alexa Skill in 30 minutes by Andrew Muttoni (@muttonia).

All of these were great technical sessions – and probably too much for a single blog post but, here goes anyway…

Driving digital transformation using artificial intelligence

Amazon thinks that driving better customer experience requires Artificial Intelligence (AI), specifically Machine Learning (ML). Using an old picture of London Underground workers sorting through used tickets in the 1950s to identify the most popular journeys, Steven Bryen suggested that more data leads to better analytics and better outcomes that can be applied in more ways (in a cyclical manner).

The term “artificial intelligence” has been used since John McCarthy coined it in 1955. The AWS view is that AI taking off because of:

  • Algorithms.
  • Data (specifically the ability to capture and store it at scale).
  • GPUs and acceleration.
  • Cloud computing.

Citing research from PwC [which I can’t find on the Internet], AWS claim that world GDP was $80Tn in 2018 and is expected to be $112Tn in 2030  ($15.7Tn of which can be attributed to AI).

Data science, artificial intelligence, machine learning and deep learning can be thought of as a series of concentric rings.

Machine learning can be supervised learning (betting better at finding targets); unsupervised (assume nothing and question everything); or reinforcement learning (rewarding high performing behaviour).

Amazon claims extensive AI experience through its own ML experience:

  • Recommendations Engine
  • Prime Air
  • Alexa
  • Go (checkoutless stores)
  • Robotic warehouses – taking trolleys to packer to scan and pack (using an IoT wristband to make sure robots avoid maintenance engineers).

Every day Amazon applies new AI/ML-based improvements to its business, at a global scale through AWS.

Challenges for organisations are that:

  • ML is rare
  • plus: Building and scaling ML technology is hard
  • plus: Deploying and operating models in production is time-consuming and expensive
  • equals: a lack of cost-effective easy-to-use and scalable ML services

Most time is spent getting data ready to get intelligence from it. Customers need a complete end-to-end ML stack and AWS provides that with edge technologies such as Greengrass for offline inference and modelling in SageMaker. The AWS view is that ML prediction becomes a RESTful API call.

With the scene set, Steven Bryen handed over to Bjoern Reinke, Drax Retail’s Director of Smart Metering.

Drax has converted former coal-fired power stations to use biomass: capturing carbon into biomass pellets, which are burned to create steam that drives turbines – representing 15% of the UK’s renewable energy.

Drax uses a systems thinking approach with systems of record, intelligence and engagement

System of intelligence need:

  • Trusted data.
  • Insight everywhere.
  • Enterprise automation.

Customers expect tailoring: efficiency; security; safety; and competitive advantage.

Systems of intelligence can be applied to team leaders, front line agents (so they already know that customer has just been online looking for a new tariff), leaders (for reliable data sources), and assistant-enabled recommendations (which are no longer futuristic).

Fragmented/conflicting data is pumped into a data lake from where ETL and data warehousing technologies are used for reporting and visualisation. But Drax also pull from the data lake to run analytics for data science (using Inawisdom technology).

The data science applications can monitor usage and see base load, holidays, etc. Then, they can look for anomalies – a deviation from an established time series. This might help to detect changes in tenants, etc. and the information can be surfaced to operations teams.

AWS networking fundamentals

After hearing how AWS can be used to drive insight into customer activities, the next session was back to pure tech. Not just tech but infrastructure (all be it as a service). The following notes cover off some AWS IaaS concepts and fundamentals.

Customers deploy into virtual private cloud (VPC) environments within AWS:

  • For demonstration purposes, a private address range (CIDR) was used – 172.31.0.0/16 (a private IP range from RFC 1918). Importantly, AWS ranges should be selected to avoid potential conflicts with on-premises infrastructure. Amazon recommends using /16 (65536 addresses) but network teams may suggest something smaller.
  • AWS is dual-stack (IPv4 and IPv6) so even if an IPv6 CIDR is used, infrastructure will have both IPv4 and IPv6 addresses.
  • Each VPC should be broken into availability zones (AZs), which are risk domains on different power grids/flood profiles and a subnet placed in each (e.g. 172.31.0.0/24, 172.31.1.0/24, 172.31.2.0/24).
  • Each VPC has a default routing table but an administrator can create and assign different routing tables to different subnets.

To connect to the Internet you will need a connection, a route and a public address:

  • Create a public subnet (one with public and private IP addresses).
  • Then, create an Internet Gateway (IGW).
  • Finally, Create a route so that the default gateway is the IGW (172.31.0.0/16 local and 0.0.0.0/0 igw_id).
  • Alternatively, create a private subnet and use a NAT gateway for outbound only traffic and direct responses (172.31.0.0/16 local and 0.0.0.0/0 nat_gw_id).

Moving on to network security:

  • Network Security Groups (NSGs) provide a stateful distributed firewall so a request from one direction automatically sets up permissions for a response from the other (avoiding the need to set up separate rules for inbound and outbound traffic).
    • Using an example VPC with 4 web servers and 3 back end servers:
      • Group into 2 security groups
      • Allow web traffic from anywhere to web servers (port 80 and source 0.0.0.0/0)
      • Only allow web servers to talk to back end servers (port 2345 and source security group ID)
  • Network Access Control Lists (NACLs) are stateless – they are just lists and need to be explicit to allow both directions.
  • Flow logs work at instance, subnet or VPC level and write output to S3 buckets or CloudWatch logs. They can be used for:
    • Visibility
    • Troubleshooting
    • Analysing traffic flow (no payload, just metadata)
      • Network interface
      • Source IP and port
      • Destination IP and port
      • Bytes
      • Condition (accept/reject)
  • DNS in a VPC is switched on by default for resolution and assigning hostnames (rather than just using IP addresses).
    • AWS also has the Route 53 service for customers who would like to manage their own DNS.

Finally, connectivity options include:

  • Peering for private communication between VPCs
    • Peering is 1:1 and can be in different regions but the CIDR must not overlap
    • Each VPC owner can send a request which is accepted by the owner on the other side. Then, update the routing tables on the other side.
    • Peering can get complex if there are many VPCs. There is also a limit of 125 peerings so a Transit Gateway can be used to act as a central point but there are some limitations around regions.
    • Each Transit Gateway can support up to 5000 connections.
  • AWS can be connected to on-premises infrastructure using a VPN or with AWS Direct Connect
    • A VPN is established with a customer gateway and a virtual private gateway is created on the VPC side of the connection.
      • Each connection has 2 tunnels (2 endpoints in different AZs).
      • Update the routing table to define how to reach on-premises networks.
    • Direct Connect
      • AWS services on public address space are outside the VPC.
      • Direct Connect locations have a customer or partner cage and an AWS cage.
      • Create a private virtual interface (VLAN) and a public virtual interface (VLAN) for access to VPC and to other AWS services.
      • A Direct Connect Gateway is used to connect to each VPC
    • Before Transit Gateway customers needed a VPN per VPC.
      • Now they can consolidate on-premises connectivity
      • For Direct Connect it’s possible to have a single tunnel with a Transit Gateway between the customer gateway and AWS.
  • Route 53 Resolver service can be used for DNS forwarding on-premises to AWS and vice versa.
  • VPC Sharing provides separation of resources with:
    • An Owner account to set up infrastructure/networking.
    • Subnets shared with other AWS accounts so they can deploy into the subnet.
  • Interface endpoints make an API look as if it’s part of an organisation’s VPC.
    • They override the public domain name for service.
    • Using a private link can only expose a specific service port and control the direction of communications and no longer care about IP addresses.
  • Amazon Global Accelerator brings traffic onto the AWS backbone close to end users and then uses that backbone to provide access to services.

Creating resilience through destruction

Adrian Horn presenting at AWS Summit London

One of the most interesting sessions I saw at the AWS Summit was Adrian Horn’s session that talked about deliberately breaking things to create resilience – which is effectively the infrastructure version of test-driven development (TDD), I guess…

Actually, Adrian made the point that it’s not so much the issues that bringing things down causes as the complexity of bringing them back up.

“Failures are a given and everything will eventually fail over time”

Werner Vogels, CTO, Amazon.com

We may break a system into microservices to scale but we also need to think about resilience: the ability for a system to handle and eventually recover from unexpected conditions.

This needs to consider a stack that includes:

  • People
  • Application
  • Network and Data
  • Infrastructure

And building confidence through testing only takes us so far. Adrian referred to another presentation, by Jesse Robbins, where he talks about creating resilience through destruction.

Firefighters train to build intuition – so they know what to do in the event of a real emergency. In IT, we have the concept of chaos engineering – deliberately injecting failures into an environment:

  • Start small and build confidence:
    • Application level
    • Host failure
    • Resource attacks (CPU, latency…)
    • Network attacks (dependencies, latency…)
    • Region attack
    • Human attack (remove a key resource)
  • Then, build resilient systems:
    • Steady state
    • Hypothesis
    • Design and run an experiment
    • Verify and learn
    • Fix
    • (maybe go back to experiment or to start)
  • And use bulkheads to isolate parts of the system (as in shipping).

Think about:

  • Software:
    • Certificate Expiry
    • Memory leaks
    • Licences
    • Versioning
  • Infrastructure:
    • Redundancy (multi-AZ)
    • Use of managed services
    • Bulkheads
    • Infrastructure as code
  • Application:
    • Timeouts
    • Retries with back-offs (not infinite retries)
    • Circuit breakers
    • Load shedding
    • Exception handing
  • Operations:
    • Monitoring and observability
    • Incident response
    • Measure, measure, measure
    • You build it, your run it

AWS’ Well Architected framework has been developed to help cloud architects build secure, high-performing, resilient, and efficient infrastructure for their applications, based on some of these principles.

Adrian then moved on to consider what a steady state looks like:

  • Normal behaviour of system
  • Business metric (e.g. pulse of Netflix – multiple clicks on play button if not working)
    • Amazon extra 100ms load time led to 1% drop in sales (Greg Linden)
    • Google extra 500ms of load time led to 20% fewer searches (Marissa Mayer)
    • Yahoo extra 400ms of load time caused 5-9% increase in back clicks (Nicole Sullivan)

He suggests asking questions about “what if?” and following some rules of thumb:

  • Start very small
  • As close as possible to production
  • Minimise the blast radius
  • Have an emergency stop
    • Be careful with state that can’t be rolled back (corrupt or incorrect data)

Use canary deployment with A-B testing via DNS or similar for chaos experiment (1%) or normal (99%).

Adrian then went on to demonstrate his approach to chaos engineering, including:

  • Fault injection queries for Amazon Aurora (can revert immediately)
    • Crash a master instance
    • Fail a replica
    • Disk failure
    • Disk congestion
  • DDoS yourself
  • Add latency to network
    • ~ tc qdisc add dev eth0 root netem delay 200ms
  • https://github.com/Netflix/SimianArmy
    • Shut down services randomly
    • Slow down performance
    • Check conformity
    • Break an entire region
    • etc.
  • The chaos toolkit
  • Gremin
    • Destruction as a service!
  • ToxiProxy
    • Sit between components and add “toxics” to test impact of issues
  • Kube-Money project (for Kubernetes)
  • Pumba (for Docker)
  • Thundra (for Lambda)

Use post mortems for correction of errors – the 5 whys. Also, understand that there is no isolated “cause” of an accident.

My notes don’t do Adrian’s talk justice – there’s so much more that I could pick up from re-watching his presentation. Adrian tweeted a link to his slides and code – if you’d like to know more, check them out:

How to build an Alexa Skill in 30 minutes

Spoiler: I didn’t have a working Alexa skill at the end of my 30 minutes… nevertheless, here’s some info to get you started!

Amazon’s view is that technology tries to constrain us. Things got better with mobile and voice is the next step forward. With voice, we can express ourselves without having to understand a user interface [except we do, because we have to know how to issue commands in a format that’s understood – that’s the voice UI!].

I get the point being made – to add an item to a to-do list involves several steps:

  • Find phone
  • Unlock phone
  • Find app
  • Add item
  • etc.

Or, you could just say (for example) “Alexa, ask Ocado to add tuna to my trolley”.

Alexa is a service in the AWS cloud that understands request and acts upon them. There are two components:

  • Alexa voice service – how a device manufacturer adds Alexa to its products.
  • Alexa Skills Kit – to create skills that make something happen (and there are currently more than 80,000 skills available).

An Alexa-enabled device only needs to know to wake up, then stream some “mumbo jumbo” to the cloud, at which point:

  • Automatic speech recognition with translate text to speech
  • Natural language understanding will infer intent (not just text, but understanding…)

Creating skills is requires two parts:

Alexa-hosted skills use Lambda under the hood and creating the skill involves:

  1. Give the skill a name.
  2. Choose the development model.
  3. Choose a hosting method.
  4. Create a skill.
  5. Test in a simulation environment.

Finally, some more links that may be useful:

In summary

Looking back, the technical sessions made my visit to the AWS Summit worthwhile but overall, I was a little disappointed, as this tweet suggests:

Would I recommend the AWS Summit to others? Maybe. Would I watch the keynote from home? No. Would I try to watch some more technical sessions? Absolutely, if they were of the quality I saw on the day. Would I bother to go to ExCeL with 12000 other delegates herded like cattle? Probably not…

WB-40 appearances

Several years ago, I met Matt Ballantine (@ballantine70), when he was working at Microsoft. Over the years, we’ve had many conversations in person, online and over social media and I’ve been listening to his WB-40 podcast with Chris Weston (@chrisweston) since they started it in 2016.

WB-40 Podcast logo

In recent months, I’ve been fortunate to feature a few times on the podcast:

Appearing on WB-40 (and on The Flexible Movement) has made me think a little about maybe starting a podcast of my own. James Bannan (@JamesBannan) and I had a podcast called Coalface Tech for a while a few years ago but we found working on opposite sides of the planet and recording decent audio challenging at the time. At the moment I struggle to write blog posts so, let’s see if that ever gets off the drawing board.

In the meantime, if you’re interested in the intersection of IT and business then I recommend checking out WB-40, which will also transition from its online form to the physical world next week, with a live event in London.