Do we need another as-a-service to describe functions?

Last week saw quarterly earnings reports for major cloud vendors and this tweet caught my eye:

You see, despite Azure growing by 93%, this suggests that Amazon has the cloud market sewn up. Except I’m not sure they do…

I think it would be interesting to see this separated into infrastructure-, platform- and software-as-a-service (IaaS/PaaS/SaaS). I suggest that would present three very different stories. And I’d expect that Amazon would only really be way out front for IaaS.

My friend and former colleague, Garry Martin (@GarryMartin) questioned the relevance of those “legacy” distinctions but I think they still have value today.

In the early days of what we now recognise as cloud computing, every vendor was applying their own brand of cloud-washing. It still happens today, with vendors claiming to offer IaaS when really they have a hosted service and a traditional delivery model.

Back in 2011, the US National Institute of Standards and Technology (NIST) defined cloud computing, including the service models of IaaS, PaaS and SaaS. Those service models, along with the (also abused) deployment models (public cloud, private cloud, etc.) have served us well but are they really legacy?

I don’t think they are. Six years is a long time in IT, let alone in the cloud, but I think IaaS, PaaS and SaaS are as relevant today as they were when NIST wrote their definition.

When asked how “serverless” technologies like AWS Lambda, Azure Functions or Google Cloud Functions fit in, I say they’re just PaaS. Done right.

Some people want to add another service model/definition for Function-as-a-Service (FaaS). But why? What value does it add? Functions are just PaaS but we’ve finally evolved to a place where we are moving past the point of caring about what the code runs on and letting the cloud manage that for us. That’s what PaaS was supposed to have been doing for years (after all, should I really need to define a number of instances to run my web application? That all sounds a bit like virtual machines to me…).

To my mind, “serverless” is just the ultimate platform as a service and we really don’t need another service model to describe it.

To quote a haiku from Onsi Fakhouri (@onsijoe):

“Here is my code
Run it in the cloud for me
I don’t care how”

Or, as Simon Wardley (@swardley) “fixed” this Cloud Foundry diagram:

Designing for failure does not necessarily mean multi-cloud

Earlier this week, Amazon Web Services’ S3 storage service suffered an outage that affected many websites (including popular sites to check if a website is down for everyone or just you!).

Unsurprisingly, this led to a lot of discussion about designing for failure – or not, it would seem, in many cases – including the architecture behind Amazon’s own status pages:

The Amazon and Azure models are slightly different but in the past we’ve seen outages to the Azure identity system (for example) impact on other Microsoft services (Office 365). When that happened, Microsoft’s Office 365 status page didn’t update because of a caching/CDN issue. It seems Amazon didn’t learn from Microsoft’s mistakes!

Randy Bias (@RandyBias) is a former Director at OpenStack and a respected expert on many cloud concepts. Randy and I exchanged many tweets on the topic of the AWS outage but, after multiple replies, I thought a blog post might be more appropriate. You see, I hold the view that not all systems need to be highly available. Sometimes, failure is OK. It all comes down to requirements:

And, as my colleague Tim Siddle highlighted:

I agree. 100%.

So, what does that architecture look like? Well, it will vary according to the provider:

So, if we want to make sure our application can survive a region failure, there are ways to design around this. Just be ready for the solution we sold to the business, based on using commodity cloud services, to start to look rather expensive. Just as on-premises we typically have two datacentres with resilient connections, we’ll want to do the same in the cloud. But, just as not all systems are in all datacentres on-premises, that might also be the case in the cloud. If it’s a service for which some downtime can be tolerated, then we might not need to worry about a multi-region architecture. In cases where we’re not at all concerned about downtime, we might not even use an availability set.

Other times – i.e. if the application is a web service for which an outage would cause reputational or financial damage – we may have a requirement for higher availability.  That’s where so many of the services impacted by Tuesday’s AWS outage went wrong:

Of course, we might spread resources around regions for other reasons too – like placing them closer to users – but that comes back to my point about requirements. If there’s a requirement for fast, low-latency access then we need to design in the dedicated links (e.g. AWS Direct Connect or Azure ExpressRoute) and we’ll probably have more than one of them too, each terminating in a different region, with load balancers and all sorts of other considerations.

Because a cloud provider could be one of those single points of failure, many people are advocating multi-cloud architectures. But, if you think multi-region is expensive, get ready for some seriously complex architecture and associated costs in a multi-cloud environment. Just as in the on-premises world, many enterprises use a single managed services provider (albeit with multiple datacentres), in the cloud many of us will continue to use a single cloud provider.  Designing for failure does not necessarily mean multi-cloud.

Of course, a single-cloud solution has its risks. Randy is absolutely spot on in his reply below:

It could be argued that one man’s “lock-in” is another’s “making the most of our existing technology investments”. If I have a Microsoft Enterprise Agreement, I want to make sure that I use the software and services that I’m paying for. And running a parallel infrastructure on another cloud is probably not doing that. Not unless I can justify to the CFO why I’m running redundant systems just in case one goes down for a few hours.

That doesn’t mean we can avoid designing with the future in mind. We must always have an exit strategy and, where possible, think about designing systems with a level of abstraction to make them cloud-agnostic.

Ultimately though it all comes back to requirements – and the ability to pay. We might like an Aston Martin but if the budget is more BMW then we’ll need to make some compromises – with an associated risk, signed off by senior management, of course.

[Updated 2 March 2017 16:15 to include the Mark Twomey tweet that I missed out in the original edit]

Playing around with Azure Cognitive Services

I’ve been spending quite a bit of time recently getting more familiar with some of the advanced workloads in Microsoft Azure. After all, infrastructure as a service is commodity, so I’m looking at services that can be used to drive real value for my customers (more on that in another post…).

Yesterday was our team meeting – with all but one of the risual Architects getting together, including some coaching from Microsoft around data and intelligence services. I was particularly taken with some of the demonstrations of Cognitive Services, so I set about getting some sample code to work for me…

Building the Intelligent Kiosk sample application

First up, I needed to install Visual Studio 2015 (Community Edition is fine) – it took a while, and needed admin credentials (so a visit to our support team) but eventually it installed on my PC.

Then, I downloaded the sample code for the “Intelligent Kiosk” from Github. Pressing F5 to build the solution told me that:

A project with an Output Type of Class Library cannot be started directly.

In order to debug this project, add an executable project to this solution which references the library project. Set the executable project as the startup project.

The Intelligent Kiosk sample code is a Universal Windows Platform (UWP) app, so I ignored that message, continued with the build, and tracked down the resulting IntelligentKioskSample.exe file. Trying to run that told me:

This application can only run in the context of an app container.

And StackOverflow told me that I needed to sideload the app onto my PC, by creating a package to use locally.

Installing the Intelligent Kiosk sample application

The application package comes with a PowerShell script to install it (Add-AppDevPackage.ps1), but I found I needed to follow these steps:

  1. Enable developer mode in Windows 10 Settings
  2. Restart the PC
  3. Open a PowerShell session as an Administrator and run:

Show-WindowsDeveloperLicenseRegistration
Get-WindowsDeveloperLicense
Set-ExecutionPolicy unrestricted
.\Add-AppDevPackage.ps1

Now the app is ready and available via the Start Menu…

Running the Intelligent Kiosk sample application

  1. Get some API keys (for free) from the Microsoft Cognitive Services site.
  2. Run the Intelligent Kiosk app.
  3. Go to settings and paste in your API keys.
  4. Have some fun with the demos!
The sample app includes several demos: emotion detection (from a web image or a live camera image), face detection, and a mall kiosk that detects age and gender then recommends a product.


Why are the Microsoft Azure datacentre regions in Europe named as they are?

I’m often asked why Microsoft’s datacentre regions for Azure in Europe are named as they are:

  • The Netherlands is “West Europe”
  • Ireland is “North Europe”
  • (then there are country-specific regions in the UK and Germany too…)

But Ireland and the Netherlands are (sort of) on the same latitude. So how does that work?

MVP Martina Grom explains it like this:

I suspect the backstory behind the naming is more down to the way that the United Nations divides Europe up for statistical purposes [source: Wikipedia article on Northern Europe]:

Map: Europe subregions according to the UN geoscheme

On this map, North Europe is the dark blue area (including the UK and Ireland), whilst West Europe is the cyan area from France across to Austria and Germany (including the Benelux countries).

It makes more sense when you see it like this!

Image credit: Kolja21 via Wikipedia (used under a Creative Commons Attribution 3.0 licence).

Migrating Azure virtual machines from ASM (Classic) to ARM (Resource Manager)

I was recently discussing Azure infrastructure services with a customer who has implemented a solution based on Azure Service Manager (ASM – also known as classic mode) but is now looking to move to Azure Resource Manager (ARM).

Moving to ARM has some significant benefits. For a start, we move to declarative, template-driven deployment (infrastructure as code). Under ASM we had programmatic infrastructure deployment where we wrote scripts to say “Dear Azure, here’s a list of everything I want you to do, in excruciating detail” and deployment ran in serial. With ARM we say “Dear Azure, here’s what I want my environment to look like – go and make it happen” and, because Azure knows the dependencies (they are defined in the template), it can deploy resources in parallel:

  • If a resource is not present, it will be created.
  • If a resource is present but has a different configuration, it will be adjusted.
  • If a resource is present and correctly configured, it will be used.
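
As a minimal sketch of what that looks like in practice (the resource group and template file names here are hypothetical), a deployment boils down to handing Azure a template and letting it work out the rest:

# Create (or reuse) a resource group, then deploy a template into it (AzureRM PowerShell module)
New-AzureRmResourceGroup -Name "rg-example" -Location "westeurope"
New-AzureRmResourceGroupDeployment -ResourceGroupName "rg-example" `
  -TemplateFile ".\azuredeploy.json" -TemplateParameterFile ".\azuredeploy.parameters.json"

Re-running the same deployment simply brings the resource group back into line with the template (ARM’s default incremental mode), which is exactly the behaviour described in the list above.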

ASM is not deprecated, but new features are coming to ARM and they won’t be back-ported. Even Azure AD now runs under ARM (one of the last services to come across), so there really is very little reason to use ASM.

But what if you already have an ASM infrastructure – how do you move to ARM? Christos Matskas (@christosmatskas) has a great post on options for migrating Azure VMs from ASM (v1) to ARM (v2) which talks about four methods, each with its own pros and cons:

  • ASM2ARM script (only one VM at a time; requires downtime)
  • Azure PowerShell and/or CLI (can be scripted and can roll-back; caveats and limitations around migrating whole vNets)
  • MigAz tool (comprehensive option that exports JSON templates too; some downtime required)
  • Azure Site Recovery (straightforward, good management; vs. setup time and downtime to migrate)

Full details are in Christos’ post, which is a great starting point for planning Azure VM migrations.

Short takes: what to do when Outlook won’t open HTTP(S) links; how to disable Outlook Clutter; and don’t run externally-facing mail servers in Azure!

Once again, my PC is running out of memory because of the number of open browser tabs, so I’ll convert some into a mini-blog post…

Outlook forgets how to open HTTP(S) links

I recently found that Outlook 2016 had “forgotten” what to do with HTTP(S) links – complaining that:

Something unexpected went wrong with this URL: […] Class not registered.

The fix was to reset my default browser in Windows. Even though I hadn’t changed it away from Edge, a Windows Update (I expect) had changed something and Edge needed to be reset as the default browser, after which Outlook was happy to open links to websites again.

Globally disable Outlook Clutter

I had a customer who moved to Exchange Online and then wanted to turn off the Clutter feature, because “people were complaining some of their email was being moved”.

Unfortunately, Clutter is set with a per-mailbox setting so to globally disable it you’ll need something like this:

Get-Mailbox | Set-Clutter -Enable $false

That will work for existing mailboxes but what about new ones? Well, if you want to make sure that Clutter remains “off”, then you’ll need a script to run on a regular basis and turn off Clutter for any new users that have been created – maybe using Azure Automation with Office 365? A rough sketch follows below.
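
For reference, a rough sketch of such a script (assuming a remote PowerShell session to Exchange Online is already established, and that the script runs daily) might look like this:

# Disable Clutter for any mailbox created since the last run (here, the last 24 hours)
$newMailboxes = Get-Mailbox -ResultSize Unlimited |
  Where-Object { $_.WhenMailboxCreated -gt (Get-Date).AddDays(-1) }
$newMailboxes | Set-Clutter -Enable $false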

Alternatively, you can create a transport rule to bypass Clutter.

Personally, I think this is the wrong choice – the answer isn’t to make software work the way we used to – it’s to lead the cultural change to start using new features and functionality to help us become more productive. Regardless, Clutter will soon be replaced by the Focused Inbox (as in the Outlook mobile app).

Don’t run externally-facing mail servers in Azure

I recently came across a problem when running an Exchange Hybrid server on a VM in Azure. Whilst sending mail directly outbound (i.e. not via Office 365 and hence Exchange Online Protection), consumer ISPs like TalkTalk were refusing our email. I tried adding PTR records in DNS for the mail server but then I found the real issue – Azure adds its IP addresses to public block lists in order to protect against abuse.

It turns out that Microsoft’s documentation on sending e-mail from Azure compute resource to external domains is very clear:

“[…] the Azure compute IP address blocks are added to public block lists (such as the Spamhaus PBL).  There are no exceptions to this policy”

and the recommended approach is to use a mail relay – such as Exchange Online Protection or a third party service like SendGrid. Full details can be found in the Microsoft link above.

Inside the Microsoft datacentres

A datacentre is just a datacentre isn’t it? After all, isn’t it just a bigger version of the server room in the basement? But what about the huge datacentres that run cloud services? What’s it like inside the Microsoft datacentres that host Azure, Office 365, etc.?

Last week, Microsoft’s Modern Workplace webcast titled “An Inside Look at Your Secure Cloud” gave a sneak peek inside some of the Microsoft datacentres – comparing various generations and showing the improvements along the way.  And, as you might expect, these are the very definition of operating at scale…

As Doug Hauger (General Manager for National Cloud Programs at Microsoft) explained, organisations look to use a cloud datacentre for scale and professionalism.  Anyone can run a datacentre but the Microsoft Cloud is about robustness and security – whether that’s how staff are monitored or the physical and logical security models.

Each time Microsoft moves into a new region (like the two regions that opened in the UK earlier this month) there’s not just one super-scale datacentre but multiple facilities per region, providing redundancy and disaster recovery capability. Each facility has multiple power sources and multiple network ingress and egress points. Then there’s the investment Microsoft is making in physical infrastructure around the world – for example the joint project with Facebook for a new Europe-North America undersea cable (MAREA).

Each time Microsoft considers expanding into a new market they perform a business case analysis on the potential opportunity, considering the scale that they will go in at (tens of thousands of servers). Microsoft now has more than 100 datacentres in 30 regions around the world (with four more under construction). Because of the huge range of locations covered, Microsoft is now the industry leader for compliance and certification – whether that is meeting global or local requirements. Then there is the question of meeting customer needs around data residency, compliance, etc. (for example with the German datacentres that operate under a unique data trustee model in partnership with Deutsche Telekom).

With its cloud datacentres, Microsoft is aiming to meet customer needs around digital transformation, where the question is no longer “why should I go to the cloud” but one of “how to innovate more quickly in the cloud”. That’s what drives the agenda for where to geographically expand, where to enhance scalability, etc.

Despite the question I posed in the opening paragraph of this post, a true datacentre is worlds apart from the typical server room in the basement (or wherever). The last time I got to visit a datacentre was when I was working at Fujitsu and I visited the London North facility, an Uptime Institute Tier III datacentre that won awards when it was built in 2008. Seeing the scale at which a modern datacentre operates is impressive. Then ramp it up some more for the big cloud service providers.

In the webcast, Christian Belady (General Manager Cloud Infrastructure Strategy and Architectures at Microsoft) explained that datacentres are the foundation of the Internet – they are where all the cloud services are served from (whether that is Microsoft services, or those provided by other major players).

There are several layers of physical security from the outside fence in – screening people, controlling access to parts of the buildings, right down to the cabinets themselves, with critical customer data held in locked cabinets covered by video surveillance. Used disks are destroyed, being wiped and then crushed on site! The physical security surpasses anything provided for on-premises servers and the logical security continues that defence in depth.

Each custom-built server is actually two computers, with tens of thousands of computers per room and hundreds of thousands per datacentre – each datacentre the size of 20-30 football fields. Look at the racks and you can see the attention to detail – keeping things orderly not only adds to operational efficiency but it looks good too! The enterprise servers that most of us run on-premises have plastic bezels to make them look pleasant. Instead, Microsoft’s servers are focused on eliminating anything that has no useful function…

Each iteration of datacentres becomes more industrialised – with improvements to factors such as cooling (which is one of the biggest power usage factors).

A generation 2 datacentre from around 2007 has a Power Usage Effectiveness (PUE) efficiency score of 1.4-1.6 (for comparison, the Fujitsu facility I mentioned earlier has a PUE of 1.4 but a typical enterprise datacentre from the 2000s with a normal raised floor would have a PUE of 2-3). Cool and hot aisles are used with hot air returned to coolers and recirculated. Microsoft then raised the temperature of their servers to a level that is acceptable (working with manufacturers), rather than the lower levels they used to have (reducing the cooling demands).

Moving on to generation 4, efficiency is improved further (a PUE of 1.1-1.2), eliminating chillers by removing roofs, driving down costs and using outside air to chill. Containers use the outside cooling and a system of adiabatic cooling, spraying mist into the air to cool things down – the mist evaporates before it hits the servers. Such datacentres use a lot less water too (compared with older styles of datacentre).

With the latest (generation 5) datacentres, further improvements are made, combining the lessons learned from the other generations. The PUE is now down to 1.1 (and below at certain times of year) with running costs also improved. There are still hot and cold aisles but no raised floor and, instead of outside air, the datacentres use a closed liquid loop system (no chiller – the water is cooled outside) – and that water doesn’t need to be potable.

The actual datacentre design changes for each facility, based on the geography and the environmental impact. Backup power generation is a key component in the design, with several days of fuel onsite and contracts to keep bringing more fuel in. Power is often sustainably sourced, be that cheap and carbon-free hydro-electric power, wind or solar. Microsoft Research is even working on a tidal-powered under-sea datacentre (Project Natick).

Inside, the Microsoft datacentres are very industrial. Whole racks are brought in (pre-tested), rather than single servers and, as previously mentioned, Microsoft design and build the servers for use at scale, stripping out enterprise features and retaining only what’s needed for the Microsoft environment.

Whilst I’ve worked with customers who have visited Microsoft datacentres in Dublin, it seems unlikely that I’ll ever get the chance. Watching the Modern Workplace webcast gave me a fascinating look at how Microsoft operates datacentres at scale though – and it truly is awe-inspiring. To find out more, visit the Microsoft website.

Some thoughts on naming Azure resources

During a recent project, I was caught out by a lack of consistency in naming for Azure resources (and an inability to rename some of them afterwards). Some resources had underscores in their names (_), some had hyphens (-) – and then there were the inconsistencies in case. For someone who generally pays attention to details like this, I found it all very frustrating.

So I started to look into what a standard for naming Azure resources might look like (I also asked Microsoft for advice). The general advice I received was, “stick to numbers and letters – no special characters because some resources won’t accept them”. Then, whilst trying to name a subnet starting with a number, I found that subnet names can’t start with a number or a space.

So, let’s make that “use letters and numbers only, in lower case, and always starting with a letter”.

Now, consider uniqueness – some resources have an associated DNS name (e.g. *.cloudapp.net) that needs to be available for use.

I generally advise against including organisation names in resources like server names (because resources often outlive organisation names) but, in this case, the organisation name is likely to provide some uniqueness. So, let’s try “use letters and numbers only, in lower case, prefixed with an abbreviation for the organisation name, starting with a letter”.

Then, let’s think about the naming for the resources themselves – a two letter code for the resource type (rr) and a suitable number of digits to count the instances (nn) – something like:

orgrrnn

This has two characters for the digits on the end, though three, or even four, may be better depending on the size of the organisation (remember to plan for growth!).  You’ll also need to consider the total length of the name – between 3 and 15 characters appears to be the sweet spot (some may be longer, few may be shorter).

Resource types might be:

  • ad – Active Directory
  • cs – Cloud Service
  • db – Database
  • gw – Gateway
  • ln – Local Network
  • ms – Media Service
  • rg – Resource Group
  • sg – Storage Account
  • sp – App Service Plan
  • sn – Subnet
  • tm – Traffic Manager
  • vm – Virtual Machine
  • vn – Virtual Network
  • wa – Web App (App Service)

For my recent batch of resources when I was studying for an exam, that led to names like:

  • exam70534ms01 (Media Service 01)
  • exam70534db02 (Database 02)

(illustrated here for both ASM and ARM)

Azure resource naming in ASM (Classic)

Azure resource naming in ARM

That looks to me to be unique, consistent and meaningful, but I’m sure there are other considerations too! Indeed the Azure documentation has some quite complex recommended naming conventions for Azure resources. My concerns with these are that they are not consistent (remember that not all resources can include certain characters), whereas the naming approach I’ve outlined in this post is.
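
To make the convention concrete, here’s a minimal PowerShell sketch (the helper function name, organisation prefix and resource codes are just examples) that composes names in the orgrrnn format:

# Compose a resource name from an organisation prefix, a two-letter resource code and an instance number
function New-ResourceName {
    param (
        [string]$OrganisationPrefix,   # e.g. "exam70534"
        [string]$ResourceCode,         # e.g. "vm", "db", "sg"
        [int]$Instance = 1
    )
    ("{0}{1}{2:d2}" -f $OrganisationPrefix, $ResourceCode, $Instance).ToLower()
}

New-ResourceName -OrganisationPrefix "exam70534" -ResourceCode "db" -Instance 2   # returns "exam70534db02"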

Preparation notes for Microsoft exam 70-534: Architecting Microsoft Azure Solutions

I’ve been preparing for Microsoft exam 70-534: Architecting Microsoft Azure Solutions. At the time of writing, I haven’t yet sat the exam (so this post doesn’t breach any NDA) but the notes that follow were taken as I studied.

Resources I used included:

  • Microsoft Association of Practicing Architects (MAPA) bootcamp (unfortunately the delivery suffered from issues with the streaming media platform and the practical labs are difficult to follow, partly due to changes in the platform).
  • Hands-on time with Azure – though the exam is still mostly based on the old Classic/Azure Service Manager (ASM) model, so I found myself going back to learn things in ASM that I do differently under Azure Resource Manager (ARM).
  • The Microsoft Press exam preparation book, which contains a lot more detail and is pretty readable (or it would be if I wasn’t trying to read it in PDF form – sometimes paperback books are better for flicking back and forward!).
  • A free Azure subscription (either sign up for a one-off £125 credit for a month, or you can get £20 each month for 12 months through Visual Studio Dev Essentials).


The rest of this post contains my study notes – which may be useful to others but will almost certainly not be enough to pass the exam (i.e. you’ll need to read around the topics too – the Azure documentation is generally very good).

Note that Microsoft Azure is a fast-moving landscape – these notes are based on studying the exam curriculum and may not be current – refer to the Azure documentation for the latest position.

Azure Networking

  • Virtual networks (VNets) are used to manage networking in Azure. Can only exist in one Azure region.
  • CIDR notation is used to describe networks.
  • Use different subnets to partition network – e.g. Internet-facing web servers from internal traffic; different environments.
  • Subnet has to be part of VNet range with no overlap.
  • All virtual machines (VMs) in a VNet can communicate (by default) but anything outside cannot talk (by default) – so VNet is default network boundary.
  • In ASM, every VM has an associated cloud service (with its own name @cloudapp.net). Without subnets the VMs can only communicate via a public IP. If multiple cloud services are on same VNet then VMs can communicate using private IP.
  • Endpoints are used to manage connections: internal (private) endpoint listening on a given port (e.g. for RDP on 3389); external (public) endpoint on defined port number – therefore go to a particular server, rather than just to the cloud service.
    • Public from anywhere on the Internet; private only within the cloud service/VNet.
  • Dynamic IP (DIP) is the private IP associated with a VM; only resolvable inside the VNet – external access needs a public IP. Can choose an IP address to use – and it will be reserved.
  • Virtual IP (VIP) – assigned to a cloud service – static public IP for as long as at least one VM running inside the cloud service.
  • Instance Level Public IP (ILPIP) – for direct connection to Azure VM from Internet (not via the cloud service); public IP attached to a VM. In this configuration, whatever ports open on the VM are open to the Internet – effectively bypassing the security of the VNet.
  • Use a VNet-to-VNet VPN to create a tunnel between VNets in different regions. This extends VNets to appear as if they were one.
  • Site-to-site VPN to create tunnel between on-premises network and Azure VNet. Uses persistent hardware on-premises.
  • Point-to-site VPN to create tunnel from individual computers to an Azure VNet. Software-based.
  • Multi-site VPN is a combination of the other methods, combined.
  • Azure ExpressRoute avoids routing via an ISP – effectively a dedicated link from the customer datacentre to an Azure region, bypassing the ISP (high throughput, low latency and no effect on the Internet link).
    • ExpressRoute providers provide point-to-point Ethernet or connection via a cloud exchange. BGP sessions with edge routers on the customer site. 200Mbps/500Mbps/1Gbps/10Gbps.
    • Can use for Azure computing (IaaS); Azure public services (web apps, etc. – PaaS) or Office 365 (SaaS).
  • Secure network with Network Access Control Lists (ACLs), attached to a VIP – define what traffic will be allowed/denied to/from the VIP (i.e. the cloud service). Lower number rule has higher priority. First match is executed and rest are ignored.
    • If there is no ACL – all traffic is allowed (whatever endpoints are open will allow access); if there is one or more permit, deny all others; if there is one or more deny, allow all others; combination of permit and deny to define a specific IP range.
    • Network ACL affects Incoming traffic only.
  • Network Security Groups (NSGs) are attached to a VM or a subnet and act on both inbound and outbound traffic (see the sketch after this list).
    • By default, all inbound access is blocked by the default inbound rules (allow inbound within the VNet and from the Azure load balancer; deny all other inbound – rules 65000/65001/65500).
    • The outbound defaults allow outbound within the VNet, outbound to the Internet (0.0.0.0/0) and deny all others – again rules 65000/65001/65500.
    • Default rules can’t be edited but can be overridden with higher priority rules.
  • Can only use Network ACLs or NSGs – not both together.
  • VMs can have multiple NICs in different subnets – i.e. dual-homed machine.
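
A minimal sketch of creating an NSG with a single custom rule, as described above (assuming the AzureRM module; the resource group, names and address ranges are hypothetical):

# Allow RDP from a management subnet; everything else falls through to the default rules
$rdpRule = New-AzureRmNetworkSecurityRuleConfig -Name "allow-rdp" -Description "RDP from management subnet" `
  -Access Allow -Protocol Tcp -Direction Inbound -Priority 100 `
  -SourceAddressPrefix "10.0.1.0/24" -SourcePortRange * -DestinationAddressPrefix * -DestinationPortRange 3389
New-AzureRmNetworkSecurityGroup -ResourceGroupName "rg-example" -Location "westeurope" -Name "examnsg01" -SecurityRules $rdpRule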

Azure Virtual Machines

  • Azure Hypervisor is similar to Hyper-V (but not the same).
  • Different sizes of VMs are available.
  • VMs are isolated at network and execution level – Azure customers never get access to the hypervisor – only to the VM layer.
  • Use Windows Server 2008 onwards or Linux: OpenSUSE; SUSE Enterprise Linux; CentOS; Ubuntu 12.04; Oracle Enterprise Linux; CoreOS; OpenLogic; RHEL
  • Basic and Standard service tiers – different machine types available:
    • General Purpose: A0-A4 Basic; A5-A7 Standard; A8-A9 Network Optimised (10Gbps networking); A10-A11 Compute Intensive (high end CPUs)
    • D1-D4, D11-D14 with SSD temp storage.
    • DS1-DS4, DS11-14 with premium (SSD) storage.
    • G1-G5 (and GS) with local SSD and lots of RAM.
    • F and N series too.

  • Every Azure VM has temporary storage drive (D:) – lost when VM is moved/restarted.
  • VMs may be attached to data disks that persist across VM restarts/redeployments and are locally replicated in-region (and beyond if specified).
  • Can use gallery images or create custom images (to meet custom requirements, e.g. with certain software pre-installed).
  • OS disk always has caching, default Read/Write (data disk caching is optional, default none) – changes need a reboot.
  • Can create a bootable image from an OS disk (not data disk).
  • Can change caching on data disk without reboot.
  • OS disk max 127GB, data disk max 1TB.
  • Only charged for storage used (regardless of what is provisioned).
  • Can take VHDs from on-premises: (Windows Server 2008 R2 SP1 or later), sysprep then upload with Add-AzureVhd -Destination storageaccount/container/name.vhd -LocalFilePath localfile.vhd; for Linux install WALinuxAgent (different preparation for different distributions).
  • Tell cloud service to load balance an endpoint to split load between VMs. With ARM there is the option to define a separate Load Balancer.
  • Encryption at rest for data disks requires third party applications (encryption is in preview though…).
  • Availability set: 2 or more VMs distributed across fault domains and upgrade domains for SLA of 99.95% (no SLA for single VMs).
  • Auto-scaling based on thresholds (min/max number of instances, CPU utilisation, queue length – between web and worker roles) or time schedule (also time to wait before adding/removing more instances – AKA cooldown period). Needs at least 2 VMs in an availability set.
  • Basic VMs have no load balancing or auto-scaling.

Azure Storage Service

  • Blob, table, or queue storage (plus file storage for legacy apps) encapsulated inside a storage account.
    • Two types: Standard/Premium – essentially HDD/SSD.
    • Up to 500TB per storage account – can create multiple accounts.
  • Data stored in multiple locations (minimum 3 copies).
    • LRS (Locally Redundant Storage) synchronously replicates 3 copies data in separate fault and update domains. Use for: low cost; high throughput (less replication); data sovereignty concerns re: transfer out of region. If region goes down, so do all copies.
    • ZRS (Zone Redundant Storage) also 3 copies but in at least 2 facilities (1 or 2 regions). Data durable in case of facility failure.
    • GRS (Geo-Redundant Storage) – 6 copies (3 copies in primary region asynchronously replicated to 3 more copies in a secondary region). Data still safe in a secondary region but cannot be read (unless Azure flips primary and secondary in event of catastrophic failure).
    • RA-GRS (Read Access Geo Redundant Storage) – read from secondary copy. -secondary.cloud.core.windows.net domain name.
  • More copies and more bandwidth is more cost! Also:
    • GRS ingress max 10 Gbps (20 Gbps egress) but does not impact latency of transactions made to primary location.
    • LRS ingress max 20 Gbps (30 Gbps egress).
  • File storage – mounted by servers and accessed via API. Provides shared storage for applications using SMB 2.1. Use cases:
    • On-premises apps that rely on file shares migrated to Azure VMs or cloud services without app re-write.
    • Storing shared application settings (e.g. config files) or diagnostic data like logs, metrics and crash dumps.
    • Tools and utils for developing or administering Azure VMs or cloud services.
    • Create shares inside storage accounts – up to 5TB per share, 1TB per file. Unlimited total number of files and folders.
    • https://storageaccountname.file.core.windows.net/sharename/foldername/foldername/filename
  • Blob storage: Not a file system – an object store.
    • Create containers inside storage accounts with up to 500TB data per container
    • https://storageaccountname.blob.core.windows.net/containername/blobname
    • Block blobs, with block ID; uploaded and then committed – unless committed doesn’t become part of the blob: max 64MB per upload (blocks <=4MB), max 200GB per blob; can upload in parallel, better for large blobs (generally) and for sequential streaming of data.
    • Page blobs – collection of 512-byte pages. Max size set during creation and initialisation (up to 1TB). Write by offset and range – instantly committed. Overwrite single page or up to 4MB at once; generally used for random read/write operations (e.g. disks in VMs). Page blobs can be created on premium storage for higher IOPS.
    • Access control is via 512-bit keys (secret key – used in API calls to sign requests) – two keys so can maintain connectivity whilst regenerating the other (i.e. during key rotation).
    • Can have full public read access for anonymous access to blobs in a container; public read access for blobs only (but not list the blobs in the container); no public read access (default – only signed requests allowed); shared access signature – signed URL for access including permissions, start time and expiry time (see the sketch after this list).
    • Lease blob for atomic operations – lease for 15-60 seconds (or infinite). Acquire/renew/change/release (immediately)/break (at lease end).
    • Snapshots – used to create a read-only copy of a blob (multiple snapshots possible but cannot outlive the original blob – i.e. deleting blob deletes the snapshots); charges based on difference.
    • Copy blob to any container within the same storage account (e.g. between environments).
  • Table storage:
    • Store data for simple query – NoSQL key-value store – no locks, joins, validation.
    • http://storageaccountname.table.core.windows.net/tablename
    • Generally, use row key to retrieve data.
    • Can partition tables and generate a partition key.
    • Use shared access signatures for querying/adding/updating/deleting/upserting (insert if does not already exist, else update) table entries
  • Queue storage:
    • Store and access messages through HTTP/HTTPS calls.
    • Each queue entry up to 64KB in size.
    • Store messages up to 100TB.
    • Use for an asynchronous list for processing; messaging layer between applications (avoid handshaking – just add to or consume from the queue); or messaging between web and worker roles.
    • http://storageaccountname.queue.core.windows.net/queuename
    • Operations to put (add), get (which makes message invisible), peek (get first entry without making invisible), delete, clear (all), update (visibility timeout or contents) for messages.
  • Pricing based on storage (per GB/month); replication type (LRS/ZRS/GRS/RA-GRS); bandwidth (ingress is free; egress charged per GB); requests/transactions.
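
A minimal sketch of uploading a block blob and generating a shared access signature, as mentioned above (assuming the Azure.Storage module; the account name, key and file are hypothetical):

# Build a storage context, upload a blob, then create a read-only SAS URL valid for one hour
$context = New-AzureStorageContext -StorageAccountName "examstorage01" -StorageAccountKey "<storage-account-key>"
Set-AzureStorageBlobContent -File ".\report.pdf" -Container "documents" -Blob "report.pdf" -Context $context
New-AzureStorageBlobSASToken -Container "documents" -Blob "report.pdf" -Permission r `
  -ExpiryTime (Get-Date).AddHours(1) -FullUri -Context $context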

Web Apps

  • Web Apps are available in 5 tiers: free/shared/basic/standard/premium.
  • These tiers affect: the maximum number of web/mobile/API apps (10/100/unlimited/unlimited/unlimited); logic apps (10/10/10/20 per core/20 per core); integration options (dev/test up to Basic; Standard connectors for Standard; Premium connectors and BizTalk Services for Premium); disk space (1GB/1GB/10GB/50GB/500GB); maximum instances (-/-/3/10/50); App Service Environments (Premium only); SLA (Free/Shared none; Basic 99.9%; Standard and Premium 99.95%).
  • Resource Group and Web Hosting Plan are used to group websites and other resources in a single view; can also add databases and other resources; deleting a resource group will delete all of the resources in it.
  • Instance types:
    • Free F1.
    • Shared D1.
    • Basic B1-B3: 1 core, 1.75GB RAM, 10GB storage, doubling cores and RAM at each step (2/3.5; 4/7) – VMs running web apps.
    • Standard S1-S3 same cores and RAM but more storage (50GB).
    • Premium P1-P4 same again but 500GB storage (P4 is 8 cores, 14GB RAM).
  • Other things to configure:
    • .NET Framework version.
    • PHP version (or off).
    • Java version (or off) – use web container version to chose between Tomcat and Jetty; enabling Java disables .NET, PHP and Python.
    • Python version (or off).
  • Scale web apps by moving up plans: Free-Shared-Basic-Standard – changes apply in seconds and affect all websites in the web hosting plan (see the sketch after this list). No real scaling for Free or Shared plans. Basic can change instance size and count. Standard can autoscale based on schedule or CPU – min/max instances (checked every 5 mins).
  • Scale database separately.
  • Deployment pipeline can be automated and can flip environments when move from staging to production (flips virtual IP). Can flip back if there are issues.
  • SSL certificates – can add own custom certs (2 options – server name indication with multiple SSL certs on a single VM; or IP SSL for older browsers but only one SSL cert for IP address).
  • Site extensions – no RDP access to the VM, so tools for website: Visual Studio Online for viewing code or phpMyAdmin.
  • Webjobs allow running programs or scripts on website (like cron in Linux or scheduled task in Windows) – one time, schedules or recurring.
  • Can use .cmd, .bat or .exe; .ps1, .sh, .php, .py, .js
  • Monitoring web app via metrics in the portal.
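
A minimal sketch of the plan-based scaling described above (assuming the AzureRM module; the plan and resource group names are hypothetical):

# Move the App Service plan to the Standard tier and run three instances
Set-AzureRmAppServicePlan -ResourceGroupName "rg-example" -Name "examsp01" -Tier Standard -NumberofWorkers 3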

Cloud Services

  • For more complex, multi-tier apps.
    • Web role with IIS
    • Worker role for back-end (synchronous, perpetual tasks – independent of user interaction; uses polling, listening or third party process patterns).
  • Upload code and Azure manages infrastructure (provisioning, load balancing, availability, monitoring, patch management, updates, hardware failures…)
  • 99.95% SLA (min 2 role machines)
  • Auto-scale based on CPU or queue.
  • Communicate via internal endpoints, Azure storage queues, Azure Service Bus (pub/sub model – the service bus creates a topic, the web role publishes to it and the worker role subscriber is notified).
  • Availability: fault domain (physical – power, network, etc.) – cannot control but can programmatically query to find out which domain a service is running in. In ASM, normally 0 or 1. ASM automatically distributes VMs across fault domains.
  • Upgrade domain (logical – services stopped one domain at a time) – default is 5, can be changed.
  • If there are web and worker roles, they are automatically placed in an availability set.
  • Azure Service Definition Schema (.csdef file) has definitions for cloud service (number of web/worker roles, communications, etc.), service endpoints, config for the service – changes required restart of services.
  • Azure Service Configuration Schema (.cscfg file) runtime components, number of VMs per web/worker role and size etc. – changes do not require service restart.
  • Deployment pipeline as for Web Apps.

Azure Active Directory

  • Identity and Access Management in the cloud – provided as a service.
  • Optionally integrate with on-premises AD.
  • Integrate with SaaS (e.g. Office 365).
  • Use cases: system to take care of authentication for application in the cloud; “same sign-on” for applications on-premises and cloud; federation to avoid concerns re: syncing passwords and avoid multiple logins to different apps (even with same sign-on) – provide single sign-on; SSO for 1000s of third-party applications. Effectively, if sync password then same sign-on, if no password sync then single sign-on.
  • Can also enable Multi-Factor Authentication (MFA) for Azure AD and therefore add MFA to third party apps.
  • Directory integration with Azure Active Directory Synchronization Tool (DirSync) or Azure AD Sync. Use Azure Active Directory Connect instead.
  • Can also use Forefront Identity Manager 2010 R2 (or Microsoft Identity Manager?) – originally needed when syncing multiple ADs.
  • Each directory gets a DNS name at .onmicrosoft.com. Also possible to use custom domains (verify domains in DNS).
  • Supports WS-Federation (SAML token format); OAuth 2.0; OpenID Connect; SAML 2.0.

Role-based access control

  • Role = collection of actions that can be performed on Azure resources.
  • Users for RBAC are from the associated Azure AD.
  • Roles can be assigned to external account users by invite.
  • Roles can be assigned to Azure AD security groups (recommended practice, rather than direct role assignment – see the sketch after this list).
  • Roles can also be assigned for Resource Groups (resources inherit access: subscription → resource group → resource).
  • Built-in roles: Owner (create and manage all types of resource); Reader (read all types of resource); Contributor (manage everything except access). Lots of other roles built on this construct – e.g. Virtual Network Contributor.
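
A minimal sketch of assigning a built-in role to an Azure AD group at resource group scope, as recommended above (assuming the AzureRM module; the group and resource group names are hypothetical):

# Find the group in Azure AD, then grant it Contributor rights on a resource group
$group = Get-AzureRmADGroup -SearchString "App Operators"
New-AzureRmRoleAssignment -ObjectId $group.Id -RoleDefinitionName "Contributor" -ResourceGroupName "rg-example"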

Azure SQL Database

  • Relational database as a service (PaaS) – up to 500 GB per database.
  • Easy provisioning, automatic HA, load balancing, built-in management portal, scalability, use existing skills to deploy database, patching, etc. taken care of so less time to manage, easy sync with offline data.
  • It is not same as SQL Server on a VM though!
    • Unsupported features may have corresponding features in Azure; some are just not available.
  • Performance model with different tiers: Basic, then Standard S0-S3, Premium P1-P2, P4, P6 (formerly P3).
    • Measured in Database Throughput Units (DTUs) – standardised model to help sizing (relative model [like ACU for VMs]).
    • Only committing to transactions per hour in Basic, per minute in Standard, per second in Premium.
  • Scaling Azure SQL: Federation is deprecated; Custom Sharding (create multiple databases and use application logic to separate, e.g. based on customer ID); Elastic Scale (application doesn’t need to be so smart – the endpoint is the same but there are multiple databases behind it).
  • Backups:
    • SQL database creates automatic backup for active database; at least 3 replicas at any one time – one primary replica and two or more secondaries (more if using GRS).
    • Can restore to a point-in-time (self-service capability to restore from the automated system – creates a new database on the same server – zero-cost/zero-admin – the number of days depends on service tier: 7, 14 or 35 days for Basic/Standard/Premium), or geo-restore (restore from a geo-redundant backup to any server in any region).
    • Geo-restore is automatically enabled for all tiers at no extra cost – helps when there is a region outage (estimated recovery time <12h, RPO <1h).
  • Also standard geo-replication (protect app from regional outage – one secondary database in Microsoft-defined paired region; secondary is visible but can’t connect to it until failover occurs – discount for secondary DB as offline until failover – standard/premium only with ERT <30s RPO <5s) and active geo-replication (database redundancy within different regions – up to 4 readable secondary servers – asynchronous replication of committed transactions from one DB to another; for write-intensive applications – e.g. load balancing for read-only workloads – premium only with ERT <30s RPO <5s).
    • Regional disaster – Geo Restore, Standard or Active Geo-Replication.
    • Online application upgrade – Active Geo replication.
    • Online application relocation – Active Geo replication.
    • Read load balancing – Active Geo replication.
  • Security: only available via TCP 1433 – blocked by default – define firewall rules at server and database level to open up (i.e. to own IP address). Can define firewall rules programmatically with T-SQL, the REST API or Azure PowerShell (see the sketch after this list).
  • Data encrypted on wire – SSL required all the time
  • Data encrypted at rest – encryption with transparent data encryption – real-time I/O encryption/decryption for data and log files.
  • Only supports SQL Server authentication or Azure AD authentication – i.e. no Windows authentication.
  • First user created (master database principal) cannot be altered or dropped; can configure user-level permissions by logging on to the database and issuing SQL commands.
  • Pricing: DB size plus outbound data transfers (per database, per month) – per hour pricing, so drop DTUs at quiet time.
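
A minimal sketch of a server-level firewall rule using the ARM cmdlets (the server, rule name and IP address are hypothetical):

# Open TCP 1433 to a single office IP address at the server level
New-AzureRmSqlServerFirewallRule -ResourceGroupName "rg-example" -ServerName "examsqlserver01" `
  -FirewallRuleName "office" -StartIpAddress "203.0.113.10" -EndIpAddress "203.0.113.10"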

Azure Mobile Service

  • Cross-platform app development service (PaaS).
  • Mobile apps need to be cross-platform, with cloud storage, ID management, database integration and push notifications.
  • Azure Mobile Services provides mobile back-end as a service (MBaaS).
  • Easily connect to SaaS APIs – e.g. Facebook, Salesforce, etc.
  • Auto-scaling based on incoming customer load.
  • User authentication taken care of by the service.
  • Push notifications to millions in seconds.
  • Offline-ready apps with sync capability.

Azure Content Delivery Network (CDN)

  • Caching public objects from a storage account at point of presence (POP) for faster access close to users (and to scale when a lot of traffic hits).
  • Content served from local edge location. If content not there (first serve), it fetches information from the origin and caches locally.
  • Drastic reduction in traffic on original content (so faster access and more scalable!)
    • Use a CDN for lower latency, higher throughput, improved performance!
  • POP locations separate to Azure regions – not full-fledged DCs.
  • CDN origin can be Azure Storage, Apps, Cloud Services or Media Services (including live streaming) – or a custom origin on any web server.
  • CDN Edge is a cache – not a permanent store.
  • Anycast protocol is used to route user to closest endpoint.
  • Create a CDN endpoint: http://cdnname.azureedge.net/ (see the sketch after this list).
  • Change website code to point to the CDN. Route dynamic content to origin, static to CDN.
  • Can set a custom domain too (e.g. cdn.domain.com) – avoid browser warnings about content from other domains.
  • Can also enable HTTPS – need to upload the SSL certificate.
  • Default cache is 72 hours – cache control header can be used to control (any value >300s). Use to ensure not serving stale content.
  • Use CDN to cache images, scripts, CSS from Azure Cloud Service but have to provide using HTTP on port 80.
  • Pricing based on bandwidth (between edge and origin) and requests.
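
A minimal sketch of creating a CDN profile and endpoint with a storage account as the origin (assuming the AzureRM.Cdn module; all of the names are hypothetical):

# Create a CDN profile, then an endpoint that fronts a storage account origin
New-AzureRmCdnProfile -ProfileName "examcdn01" -ResourceGroupName "rg-example" -Location "westeurope" -Sku Standard_Verizon
New-AzureRmCdnEndpoint -EndpointName "examcdnep01" -ProfileName "examcdn01" -ResourceGroupName "rg-example" `
  -Location "westeurope" -OriginName "storageorigin" -OriginHostName "examsg01.blob.core.windows.net"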

Azure Traffic Manager

  • DNS-based routing for infrastructure. Route to different regions, monitoring health of endpoints (HTTP checks) to assist with DR. Many routing policies.
  • Create a Traffic Manager endpoint and route to this via DNS (see the sketch after this list).
  • Options include failover load balancing (re-route based on availability, with priority list – 100% of traffic to one endpoint – used for DR/BC rather than scaling); round robin load balancing (shared across various endpoints in rotation – but only to healthy endpoints cf. DNS RR); Weighted round robin load balancing (use weight to distribute traffic between endpoints); performance load balancing (based on latency times).
  • Different to traditional load balancer in that it is DNS-based – user request is direct to endpoint, not through load balancer. Also, note that traffic is direct to web servers – not to Edge locations as in CDN.
  • Pay per DNS request resolved (TTL will keep this down) and per health-check configured.
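
A minimal sketch of a failover (priority) profile, as described above (assuming the AzureRM.TrafficManager module; the names are hypothetical):

# Create a Traffic Manager profile that health-checks endpoints over HTTP and fails over by priority
New-AzureRmTrafficManagerProfile -Name "examtm01" -ResourceGroupName "rg-example" `
  -TrafficRoutingMethod Priority -RelativeDnsName "examtm01" -Ttl 30 `
  -MonitorProtocol HTTP -MonitorPort 80 -MonitorPath "/"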

Azure Monitoring

  • Diagnostic tasks may include performance measurement, troubleshooting and debugging, capacity planning, traffic analysis, billing and auditing.
  • Monitor via portal; Visual Studio (plugins to parse logs, etc.) or third party tools.
  • Azure management services to manage alerts or view operational logs. Create alerts based on metrics and thresholds (and average to smooth out spikes) and send email to service admins and co-admins or to a specific address.
  • Operational logs are service requests – operation, timestamped, by whom.
  • Visual Studio 2013 has Azure SDK for managing Azure services. Some limitations: with remote debugging cannot have more than 25 role instances in a cloud service.
  • Azure Redis cache monitoring allows diagnostic data stored in storage account – enable desired chart from Redis cache blade to display the metric blade for that chart.
  • System Center 2012 R2 can also monitor, provision, configure, automate, protect and self-service Azure and on-premises.
  • Third party tools like New Relic and AppDynamics.
  • For websites there are application diagnostic logs and site diagnostic logs (3 types: web server logging; detailed error messages; failed request tracing) – access via Visual Studio, PowerShell or portal. Kudu dashboard at https://sitename.scm.azurewebsites.net.
  • View streaming log files (i.e. just see the end): Get-AzureWebsiteLog -Name "sitename" -Tail -Path http
  • View only the error logs: Get-AzureWebsiteLog -Name "sitename" -Tail -Message Error
  • Options include -ListPath (to list log paths) -Message <string> -Name <string> -Path (defaults to root) -Slot <string> -Tail (to stream instead of downloading entire log)
  • Can also turn on diagnostics on storage accounts.

Azure HD Insight

  • Microsoft Implementation of Hadoop – create clusters in minutes (Windows or Linux); pay per use (no need to leave running); use blob storage as storage layer and Excel to visualise the data.
  • Hadoop uses divide and conquer approach to solving big data problems (chunking): processes the data, then combines it again – using HDFS and MapReduce components.
  • Provision cluster, take large data set (e.g. search engine queries) on master node, distributed to processing nodes (Map). Reduce collects results and collates.
  • Hybrid Hadoop – e.g. for organisations that offer analytics services – burst to cloud…
  • Either site-to-site VPN on-premises to Azure, or ExpressRoute.
  • Supports Storm and HBase clusters natively – can install other software via custom script.
  • Connectors in WebApp (Standard and Premium) – connect to other services (e.g. Azure HDInsight).

High Performance Computing (HPC)

  • HPC not the same as big data:
    • Big data analytics is usually bounded by data volumes and so network IO.
    • HPC usually CPU-bounded.
  • HPC is good for financial modelling, media encoding, video and image rendering, smaller computer-aided engineering models, etc.
  • HPC instances are A8/9 (network optimised – high-bandwidth RDMA network 32Gbps within cloud service as well as 10Gbps Ethernet to other services) and A10/11 (compute intensive).
  • Both 8/16 cores, 56/112GB RAM, 382GiB disk.
  • Microsoft HPC Pack 2012 R2 SP1 on Windows Server (on-premises, in Azure or hybrid) – Message Passing Interface (MPI) used (over RDMA network).

Azure Machine Learning

  • Predictive analysis in cloud – as a service, no VMs etc. to manage.
  • Take existing data, analyse by running predictive models and predict future outcomes/trends.
  • Deploy in minutes; drag and drop machine learning algorithms (built-in); use data in Azure; add custom scripts; Marketplace of vendors providing custom solutions.
  • Terminology:
    • Classification (group data).
    • Regression (predict a value).
    • Ranking (order items by criteria).
    • Clustering (take a set of data, e.g. by date range).
  • Get raw data (unstructured or loosely structured) -> data cleaning -> build machine learning model -> predict results.

Azure Automation

  • Script and automate the application lifecycle; simplify cloud management; automate manual, long-running and frequently-repeated tasks (save time and increase reliability).
  • Works with Web Apps, Virtual Machines, Storage, SQL Server and other Azure services.
  • Automation account is a container for Azure Automation resources.
  • Create runbooks – a set of tasks that perform an automated process, implemented as a PowerShell workflow (see the sketch after this list).
  • Scheduler to start runbooks daily/hourly/at a defined point in time.
  • Pricing based on minutes/triggers:
    • Free = 500 minutes
    • Basic tier
    • Standard tier
  • Automation is an enabler for DevOps:
    • Dev team loves changes.
    • Ops Team loves stability.
    • Agile used for development between business and dev.
    • DevOps fills gap between dev and ops.
    • Infrastructure as code; configuration automation; automation testing.
  • Continuous integration – pipeline to delivery and deployment – cycle of integrating solution with various phases:
    • Delivery team check-in to Version Control, triggers Build and Unit Tests (with feedback). When Build and Unit Tests are clean, triggers Automated Acceptance Tests (with feedback). When approval gained, move to User Acceptance Tests, and then on Final Approval move to release.
  • Continuous Delivery – push-button deployment of any version of software to any environment, on demand – similar to CI but can feed business logic tests.
    • Need automated testing to achieve CD.
  • Continuous Deployment – natural extension to CD; every check-in ends up in a production release.
  • Chef for Configuration Automation: Configuration Management between environments: Build, Test, Release, Deploy (and automate CI/CD). Manage Windows and Linux VMs, integration via Azure Portal. Chef and DSC can be used together to manage infrastructure.
  • Puppet – integrated with Azure and VS 2013 for easy deployment of infrastructure across physical and virtual machines. Can deploy pre-configured Puppet image to create a VM.
  • Deploy Custom Script with VM configuration – run when VM is launched (one of the available config extensions).
  • VM agent is used to install and manage extensions that help interact with the VM (Chef, Puppet, Custom Script).
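
A minimal runbook sketch (a PowerShell workflow; the credential asset and resource group names are hypothetical – it assumes the automation account holds a credential that can sign in to the subscription):

workflow Stop-DevelopmentVMs
{
    # Authenticate using a credential asset stored in the automation account
    $credential = Get-AutomationPSCredential -Name "AzureRunAccount"
    Add-AzureRmAccount -Credential $credential

    # Stop every VM in a dev/test resource group (e.g. on an evening schedule)
    $vms = Get-AzureRmVM -ResourceGroupName "rg-devtest"
    foreach ($vm in $vms) {
        Stop-AzureRmVM -ResourceGroupName "rg-devtest" -Name $vm.Name -Force
    }
}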

Azure Media Services

  • Developing video on demand is challenging: cost/managing content/encoding/distribution across multiple devices/streaming experience/DRM content protection/providing high quality video for any device any time anywhere.
  • Ingest data, encode, format conversion, content protection (DRM policies), on-demand streaming, live streaming, analytics, advertising.
  • Need media service account and associated storage account.
  • Media Player is a web video player service backed by Azure Media Services: one player for all popular devices – no need to develop a device-specific player; plays the right format for that device; easy integration with web and apps; standard player controls.
  • Data caching via Azure CDN.
  • Steps:
    • In management portal, create new Media Service with name, storage account and region.
    • Start the Media Service.
    • Scale up streaming units (1 unit=200Mbps).
    • Upload a video file (from local or from Azure storage) – will be stored in storage account without encryption.
    • Publish the file.
    • Configure the encoding options, then video is uploaded into portal (can encode multiple times for different formats with different names).
    • View the media content (copy link into browser).

Azure Resource Manager

  • With ASM (the classic service management model), even a single VM sits inside a cloud service.
  • ARM deals with resources directly – there is no requirement for a cloud service container.
  • Deploy, manage and monitor services as a group; deploy repeatedly throughout the application life cycle; use declarative templates to define deployment; can have dependencies between resources; apply RBAC; organise logically by tagging.
  • ASM tightly couples to cloud service – VM in subnet, in VNet, in cloud service, in region, with VIP for DNS and public IP.
  • ARM is more loosely coupled – can have multiple VIPs, NICs, etc. All in a RG (which can span regions). Attached via reference.
  • ASM (XML) vs ARM (JSON):
    • VM deployment: ASM uses the cloud service as a container; ARM does not require a cloud service.
    • Availability set: in ASM, VMs are defined under the same availability set; in ARM, the availability set is a resource exposed by the Microsoft.Compute provider and VMs that need high availability must be included in it.
    • Fault domains: maximum of 2 in ASM; maximum of 3 in ARM.
    • Load balancing: in ASM, the cloud service provides an implicit load balancer for the VMs; in ARM, the load balancer is a resource exposed by the Microsoft.Network provider.
    • Virtual IP address: ASM provides a default static VIP as long as one VM is running in the cloud service; in ARM, the public IP is a resource exposed by Microsoft.Network and can be static (reserved) or dynamic.
    • Reserved IP address: in ASM, reserve an IP address in Azure and associate it with a cloud service; in ARM, a public IP can be created as static and assigned to a load balancer.
  • Choose the deployment model when provisioning resources – there is limited interoperability between the two, so choose the right model.
  • Deploy using
    • Portal
    • PowerShell: Switch-AzureMode -Name AzureResourceManager
    • ARM REST API
    • Azure CLI: azure config mode arm
  • Resource Manager template – a JSON document that deploys and provisions all of the related resources in a single, co-ordinated operation (see the sketch after this list).
  • Tags are key-value pairs of metadata, applied to individual ARM resources or ARM RGs – up to 15 tags per resource or RG.
  • RBAC – Owner, Reader or Contributor.
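
To make the template concept above a little more concrete, here’s a minimal sketch (under stated assumptions, not a definitive implementation): a template declaring a single storage account, deployed with the AzureResourceManager-mode cmdlets referenced earlier. The resource group, location and storage account names are made-up examples – a real storage account name would need to be globally unique:

    # A minimal ARM template - one storage account, declared rather than scripted
    # (single-quoted so that "$schema" is not treated as a PowerShell variable)
    $template = '
    {
      "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
      "contentVersion": "1.0.0.0",
      "resources": [
        {
          "type": "Microsoft.Storage/storageAccounts",
          "apiVersion": "2015-06-15",
          "name": "mwdemostorage001",
          "location": "West Europe",
          "properties": { "accountType": "Standard_LRS" }
        }
      ]
    }
    '
    Set-Content -Path .\azuredeploy.json -Value $template

    # Create the resource group (the unit of deployment, management and RBAC)...
    New-AzureResourceGroup -Name 'Demo-RG' -Location 'West Europe'

    # ...then deploy the template into it as a single, co-ordinated operation
    New-AzureResourceGroupDeployment -ResourceGroupName 'Demo-RG' -TemplateFile .\azuredeploy.json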

Azure Messaging Solutions

  • Service Bus: multi-tenant cloud service – each user creates a namespace to work within.
    • Queues – one-way communication, asynchronous queuing with guarantee of message delivery order (worker has to keep polling).
    • Topics – let each receiving application create a subscription by defining a filter (avoid polling – get a notification instead) – pub-sub model. Read with ReceiveAndDelete or PeekLock; can have multiple subscribers.
    • Relays – synchronous two-way communication between applications – won’t help with buffering.
  • Event hubs – highly scalable ingestion system that can process millions of events per second (e.g. for IoT).
  • Can also queue via Azure Storage queues – more options with Service Bus but more scalable with Storage (see the sketch after this list).
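
As a rough sketch of those two options, using the classic Azure PowerShell cmdlets (the namespace, storage account and queue names are hypothetical examples):

    # Service Bus: create a namespace to work within (queues, topics and relays then live inside it)
    New-AzureSBNamespace -Name 'mwdemo-sb' -Location 'West Europe'

    # Storage queues: the simpler, more scalable alternative
    $key = (Get-AzureStorageKey -StorageAccountName 'mwdemostorage001').Primary
    $context = New-AzureStorageContext -StorageAccountName 'mwdemostorage001' -StorageAccountKey $key
    New-AzureStorageQueue -Name 'orders' -Context $context

The queues and topics within a Service Bus namespace are then typically created and consumed from application code (or the portal) rather than from PowerShell.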

Azure Backup

  • Backup service targeted at replacing tape backup.
  • Can work with on-premises workloads or Azure workloads.
  • On-premises backup – pick a region and create a vault; download vault credential files; download and install the Azure Backup agent; can seed the initial backup through the Azure Import/Export Service; select a backup policy (start time of backup, retention policies – weekly/monthly/yearly); backups are incremental.
  • Azure VM Backup – install agent if not already installed, register VMs with Azure Backup Service (installs backup agent in extensions); select backup policy.
  • Azure Backup is used to back up data on a VM. Priced per protected instance and storage consumed (the per-instance price steps up at 50GB, then at 500GB, then for each additional 500GB).

Azure Site Recovery

  • Orchestrates failover and recovery of a VM.
  • On-premises machine replicated to vault in Azure, or to another datacentre – not Azure to Azure.
  • Protect AD and DNS, SQL Server, SharePoint, Dynamics AX, RDS, Exchange, SAP.
  • Can also perform a test failover, starting resources in Azure but not routing the traffic.
  • Use to protect VMware ESX or Hyper-V VMs or physical servers; can also be used to migrate to Azure.

Business continuity (BC) and disaster recovery (DR)

  • Scenarios: recover from local failures; loss of a region; on-premises to Azure
  • For Azure failures:
    • HA in PaaS (per region): just make sure web and worker roles have 2 or more instances each – they will then automatically be spread across fault domains.
    • For region failure need to plan across regions – more elaborate (make sure code and config is available in a second region).
  • HA in IaaS needs management of VMs in availability sets (need to define manually).
  • At region level, also think about load balancing (VIP), storage (LRS, ZRS, GRS or RA-GRS) and Azure SQL replication.
  • Recover from loss of region:
    • Redeploy on disaster (cold DR) – replicate data so it is ready to run; only suitable where RTO/RPO requirements are not demanding.
    • Warm spare (active/passive) – infrastructure in DR region but not fully available (e.g. SQL replication with secondary copy not accessed, not routing traffic to passive).
    • Hot spare (active/active) – two regions at the same time (e.g. SQL on IaaS and replicating itself).
  • Cross regional strategies for DR:
    • VNet – export settings, import in secondary region.
    • Cloud Services – create a separate cloud service in the target region; publish to the secondary region if the primary fails; use Traffic Manager to route traffic.
    • VM – use the blob copy API to duplicate VM disks (see the sketch after this list); geo-replicated VM images.
    • Storage – use GRS or RA-GRS (replicated in minutes, so tight RPOs cannot rely on this – need to write own algorithm).
    • Azure SQL:
      • Geo-restore (1 hour RPO/<12 hours RTO).
      • Standard geo-replication (5 secs RPO/30 mins RTO) – no access to secondary.
      • Active geo-replication (5 secs RPO/30 mins RTO) – read access to secondary.
      • Manually export to Azure Storage (blob) with Azure SQL database import/export service.
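
As a sketch of the blob copy approach mentioned above for VM disks (the storage account, container and VHD names are hypothetical), an asynchronous, server-side copy to a storage account in the secondary region can be started with the Azure Storage cmdlets:

    # Contexts for hypothetical storage accounts in the primary and secondary regions
    $sourceKey = (Get-AzureStorageKey -StorageAccountName 'mwprimarystore').Primary
    $sourceContext = New-AzureStorageContext -StorageAccountName 'mwprimarystore' -StorageAccountKey $sourceKey
    $targetKey = (Get-AzureStorageKey -StorageAccountName 'mwsecondarystore').Primary
    $targetContext = New-AzureStorageContext -StorageAccountName 'mwsecondarystore' -StorageAccountKey $targetKey

    # Start an asynchronous, server-side copy of the VM's VHD to the secondary region
    $copy = Start-AzureStorageBlobCopy -SrcContainer 'vhds' -SrcBlob 'appserver01.vhd' -Context $sourceContext -DestContainer 'vhds' -DestBlob 'appserver01.vhd' -DestContext $targetContext

    # The copy runs inside the storage service, not on the client - poll until it completes
    $copy | Get-AzureStorageBlobCopyState -WaitForComplete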

Securing Azure Resources

  • Cloud security model is shared security model:
    • Users are responsible for securing applications.
    • Cloud Service Provider (CSP) is responsible for providing controls; users for using them!
    • CSP is responsible for infrastructure security.
  • VNet/VM security: use endpoints (ACLs for endpoints, NSGs at VM or subnet level).
  • Storage: use shared access signatures (see the sketch after this list).
  • Role-based access control.
  • Encryption.
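
As an illustration of the shared access signature point above (the storage account and container names are hypothetical), a read-only, time-limited SAS token can be generated so that the account key itself never has to be handed out:

    # Context for a hypothetical storage account
    $key = (Get-AzureStorageKey -StorageAccountName 'mwdemostorage001').Primary
    $context = New-AzureStorageContext -StorageAccountName 'mwdemostorage001' -StorageAccountKey $key

    # Read/list-only SAS token on the 'reports' container, valid for two hours
    $sasToken = New-AzureStorageContainerSASToken -Name 'reports' -Permission rl -ExpiryTime (Get-Date).AddHours(2) -Context $context

    # Append the token to a blob URI to grant scoped, time-limited access
    "https://mwdemostorage001.blob.core.windows.net/reports/summary.csv$sasToken"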

Microsoft’s UK datacentres: what you need to know

This morning, the UK woke up to an announcement from Microsoft that the UK datacentres for Azure and Office 365 are generally available, making Microsoft the first global provider to deliver a complete cloud (IaaS, PaaS and SaaS) from UK datacentres.

That means:

  • Two new Azure regions in the UK:
    • UK West (Cardiff)
    • UK South (London)
  • Office 365 services from UK datacentres in Durham and London.

Dynamics CRM online will be offered from the UK in the first half of 2017.

That Azure location information was taken from the Azure regions page on the Microsoft website (although my sources tell me that “Cardiff” is really “Newport” – close enough as to make no difference anyway, and London is probably “near London” too).  The Office location information was taken from the Office 365 Interactive Data Maps.

Now, UK customers already using Azure or Office 365 will be asking “will my data be moved to a UK datacentre?”. There’s no official announcement from Microsoft (not that I’ve seen) but my (unofficial) answer is “no”. At least not automatically.

For Azure, it’s good practice to design across multiple regions. There are also implications around geo-replication (which regions are paired with which for business continuity and disaster recovery purposes). Moving resources from one region to another is possible but is also a project that would need to be undertaken by a customer (possibly working with a partner) as a programme of planned resource moves.

For Office 365, it’s worth reading the TechNet advice on Moving core data to new Office 365 datacenter regions. At the time of writing it hasn’t been updated to reflect UK datacentres (it was last updated 28 July 2016) but it currently says:

“Existing customers that have their core customer data stored in an already existing datacenter region are not impacted by the launch of a new datacenter region”

[…]

“The data residency option, and the availability to move customer data into the new region, is not a default for every new region we launch. As we expand into new regions in the future, we’ll evaluate the availability and the conditions of data moves on a region by region basis.”

“New customers or Office 365 tenants created after the availability of the new datacenter region will have their core customer data stored at rest in the new datacenter region automatically.”

The page goes on to state that, assuming the data residency option is made available for the UK (remember, nothing has been announced yet):

“Customers will need to request to have their data moved within a set enrollment window.”

and that:

“Data moves can take up to 24 months after the request period to complete”

There’s also a footnote on the UK interactive data map to say:

“Customers who signed up and selected the United Kingdom for their Office 365 services before September 2, 2016 will have their customer data located in the EMEA datacenter locations.”

So, in short, Office 365 (SaaS) data stays exactly where it is, unless you sign up for a new tenant, or wait for further announcements from Microsoft. Azure (IaaS and PaaS) workloads can be moved to the new regions whenever you are ready.