Microsoft Cortana Intelligence Suite Workshop Video Tutorial Series (3/5): Azure Data Factory

Machine learning, predictive analytics, web services and everything else needed to make it happen are really about one thing: acquiring, processing and acting on data. For the workshop, this is done with a Data Factory pipeline configured to automatically upload a dataset to the storage account of a Spark cluster, where Azure Machine Learning is integrated to score the dataset. Importantly, this addresses a fundamental requirement of data-centric applications involving cloud computing: moving data securely, automatically and on demand between an on-premises location and a designated one in the cloud. For IT today, cloud can be a source, a destination and a broker of data, and the ability to securely move data between an on-premises facility and a cloud destination is imperative for hybrid cloud settings and backup-and-restore scenarios. Azure Data Factory is a vehicle to achieve that ability.

The workshop video tutorial series is as listed below:

Specifically, Exercises 2-4 accomplish three things:

  • Creating an Azure Data Factory service and pairing it with a designated
    on-premises (file) server
  • Constructing an Azure Data Factory pipeline to automatically and securely
    move data from the designated on-premises server to a target Azure blob storage
    account
  • Enabling the developed Azure Machine Learning model to score the data
    provided by the Azure Data Factory pipeline

Notice that the lab VM is also employed as an on-premises file server hosting a dataset to be uploaded to Azure. At one moment you may be using the lab VM as a workstation to access Azure remotely, and the next as an on-premises file server on which you install a gateway. When following the instructions, be mindful of where a task is carried out, as the context switching is not always apparent.

Microsoft Cortana Intelligence Suite Workshop Video Tutorial Series (1/5): Introduction

This series, based on the content developed by Microsoft, offers a learning path with minimal time and effort to acquire the essential operation-level knowledge of Microsoft Cortana Intelligence Suite. The workshop steps through a process to construct and deploy a web application with predictive analytics, while introducing key functional components along the way. By specifying origin and destination airports, a future date and time, and an airline carrier, this application predicts a flight delay with probability based on the weather forecast. The video tutorial series runs about 75 minutes and captures exactly what you will see on the screen and when, and where and how to respond based on the instructions of each exercise in the workshop.

I believe this series will most benefit those who function in a technical leadership capacity, including enterprise architects, solution architects, cloud architects, application architects, DevOps leads, etc., and are interested in the solution architecture of an application of predictive analytics. Going through the recordings will provide you with an end-to-end view and clarity on how to construct and deploy a predictive analytics solution, and hence a better understanding of the processes and technologies, integration points, packaging and publishing, resource skill profiles, critical path, cost model, etc.

Cortana Intelligence Suite is a set of processes and tools. This workshop outlines an approach where analytic models, data, analysis, visualization, packaging, publishing and deployment are delivered in an integrated fashion. In my view, this is a productive and indeed the right way to start learning how to architect a predictive analytics solution. The above video is the first of five to accelerate your learning of Cortana Intelligence Suite, and it highlights a few important items before starting the workshop.

Content Repo

The content of this workshop, made available by Todd Kitta, is at http://aka.ms/CortanaManual on GitHub. The readme file of the workshop details the scenario, architecture and prerequisites, and lists links to the instructions of all eight exercises.

image

The above architecture diagram of the workshop depicts the functional components for a web application with predictive analytics. Here the lab VM is also employed as an on-premises file server, the source of a data pipeline securely connected to a created Azure Data Factory service to automatically upload data to be scored by the Azure Machine Learning model. At the center is a Spark HDInsight cluster for data analysis, while the data are visualized by Power BI. The predictive analytics model is integrated and packaged as a web service consumed by a web application.

Introduction

Let’s first pay attention to a few important items before doing the workshop. There are eight exercises in this workshop and I have grouped them into five videos: an introduction and four learning units.

I recommend reading the instructions of an exercise in their entirety before doing the exercise. This will help set the context and clarify the objectives of each exercise. To do the workshop, one will need an active Azure subscription. Notice that a free trial account does provide sufficient credit for doing the entire workshop.

image

The workshop environment is a collection of resources deployed to Azure, as shown above, including:

  • A VM with Internet connectivity for a student to log in and work on all the exercises, such that there is no need to download or install anything locally for this workshop
  • A Machine Learning workspace accessed via Microsoft Azure Machine Learning studio to develop an experiment of predictive analytics
  • A Spark cluster for hosting and analyzing data including a scored dataset and a summary table
  • A number of storage accounts for storing workshop data

These resources do incur a cost. To minimize the cost, try deploying the workshop environment only when you are ready to work on the exercises, and delete it once you have completed the workshop. The deployment will take about fifteen minutes, if not more. Do deploy all resources and create all services in the same resource group, so that everything can later be removed by simply deleting the resource group. Personally, when doing the workshop, I will set aside at least a four-hour block, find a quiet room and get a great cup of coffee. It is indeed a lot to consume.

Enjoy the workshop. Let’s get started.

US TechNet on Tour | Cloud Infrastructure – Resource Page

This wave of TechNet events focuses on Azure (IaaS) V2, namely Azure Resource Manager or ARM. It is part of the IT Innovation series currently delivered in US metros and many other geo-locations in the spring of 2016. For those outside of the US, go to http://aka.ms/ITInnovation to find events near you. Come and have some serious fun in learning.

The presentations, available in PDF format, and the following lab material are included in this zip file.

GitHub repository for Lab Files if using your own machine

If you are not using the hosted virtual machine and are using your own workstation, any custom files the lab instructions call out can be found in a GitHub repository. The repository is located here: https://github.com/AZITCAMP/Labfiles.

Required Software

Description: Required software will be called out throughout the lab.

Steps:

  1. Microsoft Azure PowerShell – http://go.microsoft.com/?linkid=9811175&clcid=0x409 (also installs the Web Platform Installer; minimum version 0.9.8)
  2. Visual Studio Code – https://code.visualstudio.com/
  3. Git – http://git-scm.com/download/win
  4. GitHub Desktop for Windows – https://desktop.github.com/
  5. Windows Credential Store for Git (if VSCode won’t authenticate with GitHub) – http://gitcredentialstore.codeplex.com/
  6. Iometer – http://sourceforge.net/projects/iometer/

Optional Software

Description: Any additional software that you require will be called out in the lab. The following software may be useful when working with Azure in general.

Software:

  1. Remote Server Administration Tools – http://support.microsoft.com/kb/2693643 (Windows 8.1) or http://www.microsoft.com/en-ca/download/details.aspx?id=45520 (Windows 10)
  2. AzCopy – http://aka.ms/downloadazcopy
  3. Azure Storage Explorer – http://azurestorageexplorer.codeplex.com/downloads/get/891668
  4. Microsoft Azure Cross-platform Command Line Tools (installed using the Web Platform Installer)
  5. Visual Studio Community 2015 with Microsoft Azure SDK – 2.8.1 (installed using the Web Platform Installer)
  6. Msysgit – http://msysgit.github.io
  7. PuTTY and PuTTYgen – (Use the Windows Installer) http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
  8. Microsoft Online Services Sign-In Assistant for IT Professionals RTW – http://go.microsoft.com/fwlink/?LinkID=286152
  9. Azure Active Directory Module for Windows PowerShell (64-bit version) – http://go.microsoft.com/fwlink/p/?linkid=236297

IT Pros’ Job Interview Cheat Sheet of Multi-Factor Authentication (MFA)

Internet Climate

Recently, as hacking has become a business model and identity theft an everyday phenomenon, there is increasing hostility on the Internet and escalating concern for PC and network security. No longer is a long and complex password sufficient to protect your assets. In addition to a strong password policy, adding MFA is now a baseline defense to better ensure the authenticity of an examined user and an effective vehicle to deter fraud.

Market Dynamics

Furthermore, the increase in online ecommerce transactions, the compliance needs of regulated verticals like financial services and healthcare, the unique business requirements of market segments like the gaming industry, the popularity of smartphones, the adoption of cloud identity services with MFA technology, etc. all contribute to the growth of the MFA market. Some market research published in August of 2015 reported that “The global multi-factor authentication (MFA) market was valued at USD 3.60 Billion in 2014 and is expected to reach USD 9.60 Billion by 2020, at an estimated CAGR of 17.7% from 2015 to 2020.”
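As a quick sanity check on the quoted figures, the arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the six annual compounding periods (2014 through 2020) are my assumption; the report itself only states the start value, the end value and the rate.

```python
def implied_cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


def project(start_value: float, rate: float, periods: int) -> float:
    """Value after compounding start_value at a fixed annual rate."""
    return start_value * (1.0 + rate) ** periods


# Figures from the quoted report, in USD billions.
value_2014 = 3.60
value_2020 = 9.60

# Assumption: six annual compounding periods, 2014 -> 2020.
rate = implied_cagr(value_2014, value_2020, 6)
print(f"Implied CAGR: {rate:.1%}")
print(f"3.60 compounded at 17.7% for 6 periods: {project(value_2014, 0.177, 6):.2f}")
```

Under that assumption the implied rate lands within a fraction of a percentage point of the quoted 17.7%, so the three numbers in the quote are consistent with one another.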

Strategic Move

While mobility becomes part of the essential business operating platform, a cloud-based authentication solution offers more flexibility and long-term benefits. This is apparent, as The Street stated that

“Availability of cloud-based multi-factor authentication technology has reduced the maintenance costs typically associated with hardware and software-based two-factor and three-factor authentication models. Companies now prefer adopting cloud-based authentication solutions because the pay per use model is more cost effective, and they offer improved reliability and scalability, ease of installation and upgrades, and minimal maintenance costs. Vendors are introducing unified platforms that provide both hardware and software authentication solutions. These unified platforms are helping authentication vendors reduce costs since they need not maintain separate platforms and modules.”

Disincentives

Depending on where IT is and where IT wants to be, the initial investment may be consequential and significant. Adopting various technologies and cloud computing may be necessary, while facing resistance to change in the corporate IT culture.

Snapshot

The following is not an exhaustive list, but some important facts, capabilities and considerations of Windows MFA.

mfa

Closing Thoughts

MFA helps ensure the authenticity of a user. MFA by itself nevertheless cannot stop identity theft, since there are various ways, such as key loggers and phishing, to steal an identity. Still, as hacking has become a business model for some underground industries, and even a military offense, and credential theft has developed into a hacking practice, it is not an option to operate without a strong authentication scheme. MFA remains arguably a direct and effective way to deter identity theft and fraud.

And the emerging trend of employing biometrics, instead of a password, with a key-based credential that leverages hardware- and virtualization-based security like Device Guard and Credential Guard in Windows 10 further minimizes the attack surface by ensuring hardware boot integrity and OS code integrity, and by allowing only trusted system applications to request a credential. Device Guard and Credential Guard together offer a new standard in preventing pass-the-hash (PtH), which is one of the most popular types of credential theft and reuse attacks seen by Microsoft so far.

Above all, going forward we must not consider MFA an afterthought or an add-on, but an immediate and imperative need of a PC security solution. IT needs to implement MFA sooner rather than later, if it has not already.

Don’t Kid Yourself to Use the Same Password with Multiple Sites

I am starting a series of Windows 10 content with much on security features. A number of topics, including Multi-Factor Authentication (MFA), hardware- and virtualization-based security like Credential Guard and Device Guard, and Windows as a Service, are all included in upcoming posts. These features are not only signature deliveries of Windows 10, but significant initiatives in addressing fundamental issues of PC security while leveraging market opportunities presented by a growing trend of BYOD. The series nevertheless starts where all security discussions should start, in my view.

Password It Is

At a very high level, I view security as encompassing two key components. Authentication determines whether a user is who he or she claims to be, while authorization grants access rights accordingly upon a successful authentication. The former starts with a presentation of user credentials or identity, i.e. user name and password, while the latter operates according to a security token, or a so-called ticket, derived from a successful authentication. The significance of this model is that a user’s identity, or more specifically a user password, since a user name is normally a displayed and unencrypted field, is essential to initiate and acquire access to a protected resource. A password, however, can be easily stolen or lost, and is arguably the weakest link of a security solution.
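The two-step model above can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: the in-memory USERS, PERMISSIONS and TOKENS stores, the sample account and the function names are my own, and a real system would rely on a directory service and signed, expiring tickets.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Set

# Hypothetical in-memory stores standing in for a real directory service.
_SALT = secrets.token_bytes(16)


def _hash_password(password: str, salt: bytes) -> bytes:
    """Derive a slow, salted hash so the clear-text password is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


USERS: Dict[str, bytes] = {"alice": _hash_password("a long and complex password", _SALT)}
PERMISSIONS: Dict[str, Set[str]] = {"alice": {"read:reports"}}
TOKENS: Dict[str, str] = {}  # issued token -> user name


def authenticate(user: str, password: str) -> Optional[str]:
    """Step 1: verify the presented credentials and issue a token on success."""
    stored = USERS.get(user)
    if stored is None:
        return None
    if not hmac.compare_digest(stored, _hash_password(password, _SALT)):
        return None
    token = secrets.token_hex(16)
    TOKENS[token] = user
    return token


def authorize(token: Optional[str], permission: str) -> bool:
    """Step 2: grant access from the token alone; the password is not seen again."""
    user = TOKENS.get(token) if token else None
    return user is not None and permission in PERMISSIONS.get(user, set())
```

Note that authorize never sees the password. Whoever can pass authenticate, legitimately or with a stolen password, gets the token and everything it unlocks, which is why the password is the weak link.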

Using the Same Password for Multiple Sites

When it comes to cyber security, the shortest distance between two points is not always a straight line. For a hacker to steal from your bank account, the quietest way is not necessarily to directly hack your bank’s web site, unless the main target is the bank itself. Institutions like banks and healthcare providers, for example, are subject to laws, regulations and mandates to protect customers’ personal information. These institutions have to commit financially and administratively to implementing security solutions, so attacking them is a high-cost operation and obviously a much more difficult effort.

image

An alternative, as illustrated above, is to attack unregulated, low-profile and lesser-known businesses, the mom-and-pop shops where you perhaps order groceries, your favorite leaf teas and neighborhood deliveries, after a hacker has learned your lifestyle from your posting, liking and commenting on subjects and among communities in social media. Many of those shops are family-owned businesses operating on a shoestring budget, with barely enough awareness and technical skills to maintain a web site built with freeware downloaded from some unknown web site. The OS is probably not patched up to date. If there is antivirus software, it may be a free trial that has expired. The point is that for those small businesses the security of the computer environment is probably not an everyday priority, let alone a commitment to protect your personal information.

The alarming fact is that many do use the same password for accessing multiple sites.

image

Bitdefender published a study in August of 2010, as shown above, revealing that more than 250,000 email addresses, usernames and passwords could be found easily online, via postings on blogs, collaboration platforms, torrents and other channels. It pointed out that “In a random check of the sample list consisting of email addresses, usernames and passwords, 87 percent of the exposed accounts were still valid and could be accessed with the leaked credentials. Moreover, a substantial number of the randomly verified email accounts revealed that 75 percent of the users rely on the same password to access both their social networking and email accounts.”

image

On April 23, 2013, Ofcom published that (as shown above) “More than half (55%) of adult internet users admit they use the same password for most, if not all, websites, according to Ofcom’s Adults’ Media Use and Attitudes Report 2013. Meanwhile, a quarter (26%) say they tend to use easy to remember passwords such as birthdays or names, potentially opening themselves up to the threat of account hacking.” As noted, this was based on interviews with 1805 adults aged 16 and over as part of the research. Although the above statistics are derived from surveying UK adult internet users, they do represent a common practice in internet surfing and raise a security concern.

Convenience at the Risk of Compromising Security

With free Wi-Fi, Internet access is available at coffee shops, bookstores, shopping malls, airports, hotels, just about everywhere. The convenience nevertheless comes with a high risk, since these free access points are also available for hackers to identify, phish and attack targets. Operating your account over public Wi-Fi or using a shared device to access protected information is essentially inviting unauthorized access to invade your privacy. Using the same password with multiple sites further increases the opportunities to compromise your high-profile accounts via a weaker account. It is a poor and potentially costly practice with devastating results, choosing convenience at the risk of compromising security.

User credentials and any Personally Identifiable Information (PII) are valuable assets and are what hackers are looking for. Identifying and protecting PII should be an essential part of a security solution.

Fundamental Issues with Password

Examining the presented facts about password use surfaces two issues. First, the security of a password relies heavily on a user’s practice, which is problematic. Second, with stolen user credentials a hacker can log in remotely from another state or country using a different device. A direct answer to these issues is to simply not use a password and instead use something else, like biometrics, to eliminate the need for a user to remember a string of strange characters, and to associate user credentials with the user’s local hardware, so that the credentials are not usable from a different device. Namely, employ the user’s device as a second factor for MFA.
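To make the idea of a device-bound credential concrete, here is a minimal challenge-response sketch in Python. It is a simplification under stated assumptions: the Device and Server classes are hypothetical, and a symmetric HMAC key stands in for the hardware-protected asymmetric key pair that a real key-based credential would use. It is not a description of how Windows 10 implements this.

```python
import hashlib
import hmac
import secrets
from typing import Dict


class Device:
    """A user's enrolled device holding a secret that stays on the device."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def enrollment_key(self) -> bytes:
        # Simplification: shared once at enrollment. A real scheme would
        # register a public key and keep the private key in hardware.
        return self._key

    def respond(self, challenge: bytes) -> bytes:
        """Prove possession of the device key for this one challenge."""
        return hmac.new(self._key, challenge, hashlib.sha256).digest()


class Server:
    """Hypothetical service that accepts logins only from enrolled devices."""

    def __init__(self) -> None:
        self._device_keys: Dict[str, bytes] = {}

    def enroll(self, user: str, device: Device) -> None:
        self._device_keys[user] = device.enrollment_key()

    def new_challenge(self) -> bytes:
        # A fresh random challenge keeps old responses from being replayed.
        return secrets.token_bytes(16)

    def verify(self, user: str, challenge: bytes, response: bytes) -> bool:
        key = self._device_keys.get(user)
        if key is None:
            return False
        expected = hmac.new(key, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)
```

In this sketch a hacker who has stolen the user name and password but logs in from a different device cannot produce a valid response, because the key never left the enrolled device.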

Closing Thoughts

Password is the weakest link in a security solution. Keep it complex and long. Exercise common sense in protecting your credentials. Whether you are supposedly winning a trip to Hawaii or $25,000 of free money, do not read those suspicious emails. Before clicking a link, read the URL in its entirety and make sure the URL is legitimate. None of this is new; it is just a review of what we learned in the grade school of computer security.
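Reading a URL in its entirety matters because the part that decides where you land is the host name, not whatever familiar text appears elsewhere in the link. The following Python sketch shows one way to check that; the function name is hypothetical, example.com is a placeholder domain, and requiring HTTPS is my own added assumption rather than something stated above.

```python
from urllib.parse import urlparse


def is_expected_host(url: str, expected_domain: str) -> bool:
    """Return True if the URL uses HTTPS and points at the expected domain.

    The host must equal the expected domain or be one of its subdomains.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    expected = expected_domain.lower()
    on_domain = host == expected or host.endswith("." + expected)
    return parsed.scheme == "https" and on_domain


# A familiar name at the start of the link does not make it legitimate.
for link in (
    "https://www.example.com/login",
    "https://example.com.evil.test/login",
    "https://example.com@evil.test/login",
):
    print(link, is_expected_host(link, "example.com"))
```

The second and third links both begin with the familiar name, yet their actual hosts are under evil.test, which is exactly the kind of detail that is missed when only the start of a link is read.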

Evidence nevertheless shows that many of us tend to use the same password for multiple sites as passwords grow in number and become complex and hard to remember. As a result, the risk of unauthorized access becomes high.

For IT, eliminating passwords and replacing them with biometrics is an emerging trend. Multi-Factor Authentication needs to be implemented sooner rather than later. Assessing hardware- and virtualization-based security to fundamentally design out rootkits and ensure hardware boot integrity and OS code integrity should be a top priority. These are the subjects to be examined as this blog post series continues.