Posts

DSC_0006 Zach Doty Intermediate GROUP BY SQL Skills Challenge Cover Photo

Intermediate SQL Skills Challenge: GROUP BY

Hey there SQL-ites! Wow, it’s been awhile since I’ve last posted. Work has been crazy busy again, and just life in general. Sure feels good to be back, learning again! I’m daringly dropping straight back into my learnings from where we left off…in April!? Crazy.

Anyway, let’s get back to brass tacks. Before life and work got really crazy for me, we covered:

Another quick recap note, we’ve been using the famous DVD rental training database for our work. On to the good stuff.

 

GROUP BY SQL Skills Challenge #1

Let’s say it’s time for quarterly reviews, who doesn’t love evaluations? ¯_(ツ)_/¯ Implement your knowledge of GROUP BY against the following problem:

We need to figure out how much revenue our employees have realized from DVD rentals, and how many transactions each handled.

 

GROUP BY SQL Skills Answer #1

Let’s talk through the problem and dissect it before presenting code. “A problem well-stated is a problem-half solved” – paraphrase of some smart person.

  1. We’re talking about revenue, so we’ll need to be dealing with the payment table.
  2. We’re evaluating employees (staff), SUM of revenue, and COUNT of transactions.
  3. If we’re aggregating this data, we’re GROUPing BY employee.
  4. We’re also ORDERing the employees BY who handled the most transactions.

That said, here’s the code:

SELECT staff_id, SUM(amount), COUNT(amount)
FROM payment
GROUP BY staff_id
ORDER BY COUNT(amount) DESC;

…with our results!

 

GROUP BY SQL Skills Challenge #2

Well done on your challenge! Here’s the second:

It’s time to do inventory, flashbacks of retail and restaurant wonder for all of us. 🙂

In the name of forecasting and planning, we need to figure out the average replacement cost of our movies, by rating.

 

GROUP BY SQL Skills Answer #2

Ok, let’s walk through our problem.

  1. We need to use the film database here, since we’re gathering information on the movies.
  2. We’re GROUPing our films BY rating
  3. We’re using an aggregate function to determine the average replacement cost of each rating.

Drum roll, here’s a winning code snippet:

SELECT rating, AVG(replacement_cost)
FROM film
GROUP BY rating
ORDER BY AVG(replacement_cost) DESC;

With the output:

 

Are there more challenges we should be covering? Yes. However, I’m trying to do better about getting more sleep these days. Unlike past SQL articles, it’s still (barely) before midnight. So we’ll take a quick breather, possibly update this article, but definitely keep moving forward. Cheers!

DSC_0511 Zach Doty Cover Photo Univariate Linear Regression

Univariate Linear Regression Concepts

Howdy, machine learning compatriots! Welcome back to our foray into getting started with machine learning. Previously, we covered some core machine learning concepts, namely supervised machine learning algorithms and unsupervised / deep learning. (For the full series to date, here’s our Machine Learning for Beginners page.)

 

Today we’re learning the concepts behind supervised machine learning algorithms. Specifically, we examine univariate (one variable) linear regression. Univariate linear regression is the beginner’s playpen in supervised machine learning problems. We endeavor to understand the “footwork” behind the flashy name, without going too far into the linear algebra weeds.

 

Quick Recap: Supervised Machine Learning Problems

If you’re just dropping into the series, we’ll quickly set today’s stage. Univariate linear regression falls under the category of regression algorithms, withing supervised learning machine learning problems.

 

2017-04-30-001-Machine-Learning-Algorithm-Types

  • Supervised learning: we provide the algorithm with pre-cleaned, pre-labeled data. The algorithm learns off the data we provide to classify or predict new data.
  • Regression: making a line of best fit.

 

When we first covered supervised machine learning concepts, regression was shown to make a line of best fit from existing data, so we could predict new data points. Below, we first used the example of an SEO team predicting how many unique linking domains a page would need to achieve a certain rank. (A supervised learning problem, using a regression algorithm for future predictions.)

 

2017-04-11-002-Regression-Problem-Linear-Quadratic-Comp

Important note: our graphic above is similar to linear, but is not quite, linear regression. Details, details. At any rate, this should take us in nicely to examining the inner workings of univariate linear regression.

 

A High Level Look at the Regression Problem Process

If I’m being brutally honest, the process of translating machine learning education to public-facing blog posts has been my toughest effort to date. In other words, I try to make my posts easy to follow, like dummy notes I’m taking as I learn.

That being said, 2 weeks into a machine learning course, and the content has already gone off the wheels, deep into linear algebra and so forth. So, instead of going into the weeds for publication, I’m trying to keep it snackable (buzzword bingo, drink!) and down to earth.

Let’s settle in slowly on the regression problem process. Also illustrated below, we need a few key steps:

  1. A cleaned training data set with correct labels
  2. A program (such as Matlab or Octave) with functionality and access to an appropriate univariate linear regression algorithm
  3. A hypothesis and prediction of new values

2017-04-30-002-Univariate-Machine-Learning-Regression-Process

 

Let’s peel back a layer and go slightly deeper. Since cleaning and correctly labeling training data is largely dependent on you & your domain, we’re skipping that step. ¯_(ツ)_/¯

 

Instead, let’s look closer at the algorithm & hypothesis portions! Our first stop is something called the cost function.

 

The Cost Function in Linear Regression Learning Problems: Squared Error

Before we jump into cost function, let’s turn over a new leaf in visual examples. Instead of our SEO example, let’s look at a problem that could be more linear-friendly. Below, let’s assume we have some data on a customer’s lifetime value plotted against the number of marketing touchpoints they’ve interacted with.

2017-04-30-003a-Univariate-Linear-Regression-Sample-Data-Viz

 

Okay, with the housekeeping complete, let’s remember our goal for linear regression: find the line of best fit. 

Let’s also tie this back to the real world. Perhaps we’re a marketing director or VP of marketing needing to convey the ideal number of marketing touchpoints to the CMO and CEO. Doing so could help guide budgeting, channel mix, and planning questions.

How do we find a line of best fit? Through linear algebra and programming, we can objectively determine the best fit by testing hypotheses and measure each hypothesis line against the actual data points for closeness of fit.

2017-04-30-004a-Univariate-Linear-Regression-Cost-Function-Hypothesis-A

Being frank, the material up to this point is pretty humdrum. However, when we start making hypotheses such as the above, things get interesting. The program “makes a guess” as to the line of best fit, perhaps like the illustration above. I’m no “eggspert”, but that doesn’t look like a great line of fit.

 

But have no fear dear reader, math/science comes to the rescue. The next portion of the algorithm calculates the distance (cost function / squared error) from the training data to the predicted line of best fit in a process called squared error function. When you plot the hypothesis against the squared error sum, you may get a distribution something like the below.

2017-04-30-005-Univariate-Linear-Regression-Cost-Function-Plot

Bare with me. Let’s say we plotted:

  1. Our illustrated hypothesis (teal plus sign)
  2. Other attempted hypotheses (tan x’s), and,
  3. The best fit hypothesis (green outlined star)

This renders a convex parabolic distribution. To get the line of best fit, we want to get as low on the X axis as possible. (Known as the global minimum.) The further magic in machine learning is how we move from a lame hypothesis (teal plus sign) to solution (green outlined star). Now, meet a technique called gradient descent. Sidebar: if we’re being more mathematical and technical about it, this really plots to a 3D conic distribution, but the above explanation should suffice for now if we’re not getting bogged down in the math.

 

Parameter Learning & Gradient Descent

Gradient descent is the iterative mathematical process of working our way down the squared error plot from a lousy hypothesis to a line of best fit. Again, we’re not delving into calculations and derivatives – there’s a TON of math that goes behind this material.

Gradient descent systematically tests increments of hypotheses against a specified learning rate. The learning rate is essentially the magnitude or speed with which you which try to move along the convex function down to zero.

2017-04-30-006-Univariate-Linear-Regression-Simplified-Gradient-Descent

 

Wrap Up

Did I mention this is one of the toughest posts for me to date? The other contender is my DIY Alexa Raspberry Pi article. It’s now 3:20 a.m. on a Saturday night/Sunday morning as I type this conclusion. (Insert horror emoji.)

So, if we were to break down all of the above into a short bulleted list:

  1. Univariate linear regression takes sample data to make a line of best fit
  2. “Best fit” is objectively measured by a squared error function, or the summed distances of the hypothesis line from the actual data points
  3. The hypothesis and squared error function plot roughly a convex parabolic graph
  4. Gradient descent is an algorithm that systematically reduces the squared error hypothesis, guided in part by the learning rate
  5. The gradient descent iteratively seeks the global minimum on the convex function, AKA the line of best fit
  6. The line of best fit is determined, (and Teh Lurd of Teh Rings finishes on your second monitor to Herb Alpert’s Spanish Flea.)

 

Next up, we’ll be installing some machine learning software (Matlab & Octave) and diving into multivariate regression. Look after each other.

DSC_0045 Zach Doty Cover Photo Group By SQL Statement Function

GROUP BY SQL Statement

Introducing the GROUP BY SQL Statement in PostgreSQL

‘Ello SQL geeks! Welcome back to our SQL learning journey. We left off with a beginner SQL skills challenge and the aggregate SQL functions: MIN, MAX, AVG and SUM. Today we’re looking at the GROUP BY statement. We’ll learn about this function in PostgreSQL and walk through usage of this handy SQL statement.

 

About the GROUP BY SQL Statement/Clause

From my simple understanding, GROUP BY functions like a hybrid of the following:

  • SELECT DISTINCT keyword (If used without an aggregate function like SUM), and,
  • An Excel pivot table, rolling up aggregate figures (Count, Sum, Average, etc.) into unique rows

If you’re familiar with Excel Pivot tables, then you’ll recognize here the power of this function.

 

Let’s take a look at some examples to clarify.

 

First Look at Using the GROUP BY Function

To better illustrate the power of GROUP BY, we’ll first show its usage without aggregate functions. Consider the following:

If we query the address table of our sample database with a generic SELECT * FROM address; we get back an atrocious 605 rows of data. Aggregate and useless!

2017-04-27-001-GROUB-BY-Generic-SELECT-Query-Pitfall

In contrast, if we call the GROUP BY function, we’ll get back a cleaner output, with fewer rows – only unique values returned. While this is an incremental improvement to analyzing data, there’s much left to be desired.

2017-04-27-002-GROUP-BY-Without-Aggregate-Function

 

What’s missing here? How about that pivot table-esque functionality? This is where the power of using GROUP BY with aggregate functions gets awesome.

Using the GROUP BY SQL Statement with Aggregate Functions

As with most analysis, a single data point or data series rarely holds significant insight value on its own. Let’s drive home that point by leveraging the GROUP BY statement with the SUM aggregate function. Below, we compare the ratings of films in our sample database in aggregate by replacement cost. Perhaps this could serve in-store strategies for loss prevention.

SELECT rating, SUM(replacement_cost)
FROM film
GROUP BY rating;

2017-04-27-003-GROUP-BY-SUM-Aggregate-Function

If we extend this functionality to more real world examples, we could use the following for GROUP BY:

  • Grouping page-level / URL data to roll up clickstream analytics data
  • Large scale analysis of CRM data for customer segmentation analysis
  • Analyzing returns for financial data

The list could (and I’m sure it does) go on.

Let’s take this one step further and reduce potential future workload, by building sort functionality into our query. Below, we add a line to get most expensive ratings to least.

SELECT rating, SUM(replacement_cost)
FROM film
GROUP BY rating
ORDER BY SUM(replacement_cost) DESC;

2017-04-27-004-GROUP-BY-SUM-ORDER-BY-SUM

By the looks of this, no need to guard Land Before Time 8. 🙂

Extending our lesson: you can use the COUNT, AVG, and other aggregate functions to analyze as desired.

 

Wrap Up

Alright, this was a relatively gentle introduction into more advanced SQL functions. GROUP BY is a rather critical function, so in our next article, we’ll be doing yet another skills challenge. Joy!

Feel free to catch up on our other articles that help you learn PostgreSQL. Also, check out some how-to’s on developing Amazon Alexa skills, and a new series on getting started with Machine Learning. As always, please share with your colleagues and share thoughts in the comments below. Cheers.

SQL Aggregate Functions: Min, Max, Avg and Sum

Welcome back, SQL nerds! We’re back in action on the journey of learning SQL, after a beginner PostgreSQL skills challenge.

We’re reaching the end of basic functions and queries with this article. Based on what we’ve learned so far, we can do basic counting, filtering, sorting and pattern matching against PostgreSQL databases.

If you’re just tuning in, here’s the page on how to learn SQL, and the previous SQL article on the LIKE Statement.

Okay, enough jabber. Let’s jump in. The aggregate functions of MIN, MAX, AVG and SUM are our turning point into more complex SQL queries that involve concepts such as GROUP BY, among others.

At the same time, the functions on their own aren’t super complicated. Because we’re ass-u-ming you’re familiar with the general concepts of minimum, maximum, etc., we’re going to forgo conceptual and syntax explanations for demos.

 

AVG Aggregate Function

As we level up in SQL, we’re going to do less and less explaining / handholding / screenshots. That said, we’re going to explore our DVD rental data set for a table with a nice numerical component that would make good use of the functions.

We did a SELECT * FROM film LIMIT 15; to get a peek at the columns. For the purposes of this exercise, the replacement_cost column will do nicely.

To get the AVG:

SELECT AVG(replacement_cost) FROM film;

2017-04-17-001-AVG-Aggregate-PostgreSQL-Function

Using ROUND for Decimal Place Control

You’ll notice in the example above that we got 3 decimal places on what’s supposed to be a dollar amount. How do we fix that? Glad you asked. Meet the ROUND function.

We pass in the target value (average of replacement_cost) and mandate the number of decimal places we’d like returned. Below:

SELECT ROUND( AVG(replacement_cost), 2) FROM film;

2017-04-17-002-ROUND-for-AVG-Aggregate-PostgreSQL-Function

 

MIN Aggregate Function

To find the minimum value in a given column:

SELECT MIN(replacement_cost) FROM film;

2017-04-17-004-MIN-Aggregate-PostgreSQL-Function

 

MAX Aggregate Function

To find the maximum value in a given column:

SELECT MAX(replacement_cost) FROM film;

2017-04-17-003-MAX-Aggregate-PostgreSQL-Function

 

SUM Aggregate Function

To find the maximum value in a given column:

SELECT SUM(replacement_cost) FROM film;

2017-04-17-005-SUM-Aggregate-PostgreSQL-Function

 

Wrap-Up

Alright, that was a relatively quick article! Hopefully this was a reprieve from more involved sections in the past. We should find that as we continue to strengthen our core PostgreSQL  capabilities, these articles and our SQL queries should be easier and easier. If you found this article interesting, you might enjoy a new section on how to get started in machine learning. Cheers.

 

 

 

DSC_0007 Zach Doty Cover Photo for Beginner PostgreSQL Skills Challenge

Beginner SQL Skills Challenge!

Howdy, SQL geeks. Hope this post finds you swell!

Over the past few months, we’ve gained a ton of ground in learning SQL, or at least I have. 🙂

Let’s take a moment to:

  1. Test our knowledge of SQL skills learned thus far
  2. Start seeing SQL queries less as statements of code, and more as real world business challenges

In this article, we’ll have a recap plus three sections:

  • Recap of the training database we’ve been working with
  • General statements of each business problem
  • Hints and thoughts about how to approach each problem
  • Solutions to each problem

 

Recap: Our Training Database

Our training database is a best/old faithful. We’ve been using the surprisingly popular DVD rental database in a .tar file for our practice database.

Contained within this databases are various tables with fictitious information, including: customer information, film production information, business/pricing information and so forth.

In our challenges, we’ll execute various SQL queries to extract pieces of insight for business tasks. It’s assumed in this article that you’ve installed PostgreSQL via pgAdmin and have followed this article series so far, using the DVD rental training database.

Without further ado, let’s begin.

 

The SQL Challenges

Alright, here we go:

  1. How rentals were returned after July 17, 2005?
  2. How many actors have a last name that starts with the letter A?
  3. How many unique districts are represented in the customer database?
  4. Can you return the actual list of districts from challenge #3?
  5.  How many films have a rating of R and a replacement cost between $5 and $15?
  6. How many films have the word Truman somewhere in the title?

 

How to Approach the Challenges

Right, then. In this section, we’ll add some color commentary (read: hints) to our challenges. This should help you understand the mechanics of the solution, while ensuring you can’t see the answers all in one screen. 😉

 

Challenge 1: Rentals Returned After July 17, 2005

As with all challenges, a problem well stated is half (or more) solved. So let’s look at the high levels of the ask, and work our way down. We need information on rentals, so this means we’ll probably be querying the rentals table.

We would want to first examine the rentals table in a concise manner by doing:

SELECT * FROM rental LIMIT 5;

Once you’ve surveyed the table, we really only need one column returned (pun not intended) and we only want the sum figure of returns, where (HINT!!) the return date was after (think a logical operator here) July 17, 2005.

 

Challenge 2: Actors That Have a Last Name Beginning with “A”

Like our first challenge, let’s work from the “top down”. We need actor information, so querying the actor table would be a great place to start. Similar to last time, we need a count of values matching a condition. The difference versus challenge #1  is we need to find match a pattern like or such as an actor’s last name that begins with the letter A.

 

Challenge 3: Number of Unique Districts in the Customer Database

The title and description could cause some confusion here. You may need to do some basic SELECT * FROM table_name LIMIT X; queries to make sure you’ve got the right table. Once you do, we’re looking for an amount of distinct values in the database. Order of operations matters.

 

Challenge 4: Returning the Actual Lists of Districts from Challenge #3

Not much to hint at here – getting challenge #3 is the key here. You’ll really only be simplifying the correct query in challenge #3 to get the correct answer here.

 

Challenge 5: Cheap, (Mildly) Naughty Films

This one might be the longest query yet in this challenge. So we’re looking up film information, thus should know which table to query. We’re returning a value where a certain rating must be returned, and (HINT) we need to layer in one more lens of qualification. That lens dictates we specify a range between (cough, hint!) two values.

 

Challenge 6: Where in the World are Films Containing “Truman”?

This challenge is more of a recency test than retention of older concepts. You’ll need to employ pattern matching again for this business challenge/query to find titles that have some match like Truman in the title.

 

 

Challenge Solutions

Is that your final answer? Below are the queries, with screenshots of what I did.

 

Solution 1: Rentals Returned After July 17, 2005

SELECT COUNT(return_date) FROM rental
WHERE return_date > ‘2005-07-17’;

2017-04-15-001-Challenge-1-Sol-Count-From-Where

 

Solution 2: Actors That Have a Last Name Beginning with “A”

SELECT COUNT(*) FROM actor
WHERE last_name LIKE ‘A%’;

2017-04-15-002-Challenge-2-Sol-Count-From-Where-Like

 

Solution 3: Number of Unique Districts in the Customer Database

SELECT COUNT(DISTINCT(district)) FROM address;

2017-04-15-003-Challenge-3-Sol-Count-Distinct

 

Solution 4: Returning the Actual Lists of Districts from Challenge #3

SELECT DISTINCT(district) FROM address;

2017-04-15-004-Challenge-4-Sol-Select-Distinct

 

Solution 5: Cheap, (Mildly) Naughty Films

SELECT COUNT(*) FROM film
WHERE rating = ‘R’
AND replacement_cost BETWEEN 5 AND 15;

2017-04-15-005-Challenge-4-Sol-Count-From-Where-Between-And

 

Solution 6: Where in the World are Films Containing “Truman”?

SELECT COUNT(*) FROM film
WHERE title LIKE ‘%Truman%’;

2017-04-15-006-Challenge-4-Sol-Count-Where-Like-Wildcard

 

Wrap Up

Well done for completing these challenges! You shall indeed pass. 🙂

Soon, we’ll be covering aggregate SQL functions, such as MIN, MAX, AVG and SUM.

If you’re just joining us, here’s a running list of our articles so to date (4/16/2017):

 

DSC_0064 Zach Doty Unsupervised Machine Learning Intro Cover Photo

Unsupervised Learning Introduction: Machine Learning Essentials

Howdy, machine learning students! Today we’re going to introduce the concept of unsupervised machine learning algorithms.

Quick Recap: Supervised Learning

Before we jump in, let’s quickly recap our last article introducing supervised machine learning algorithms. This will give us the appropriate context for unsupervised learning.

In supervised machine learning problems, we supply pre-labeled data to the algorithm. By supplying data that’s already correctly labeled, we ask the algorithm to further predict (regression) or label (classification) new data.

2017-04-11-004-Multiple-Input-Classication-Machine Learning

 

Unsupervised Machine Learning = Unlabeled Data

The most immediate and prominent difference  for unsupervised learning is the data. Above, we gave the algorithm “a boost” by supplying the intended “right” answers in the data. Below, in an unsupervised machine learning problem, there are no right answers…yet.

2017-04-15-001M-Unsupervised-Machine-Learning-Problem-Data

We’ve supplied the algorithm with data in the problem, but it’s provided without labels or “answers”. We are mandating that the algorithm discover structure and infer patterns/labels on its own. We could also compare the above example to a clustering problem.

So in unsupervised learning, we supply a large amount of unlabeled data, without explicitly identified form or structure. We ask the algorithm to come up with ideas of structure and segmentation on its own.

Some additional applications of unsupervised learning could include:

  • Market segmentation of massive transaction data
  • Large scale social networking data
  • Astronomical data analysis
  • Large scale market data
  • Mass audio/voice analysis
  • Large scale gene clustering

 

Wrap Up

That was a bit of a quick one! The challenge with some these technical subject matter areas is sometimes we have limited room to run before going off into the technical weeds. This is one of those areas. Next, we’ll be covering some key concepts in the areas of machine learning model representation, cost function and parameter learning. Don’t worry too much about those yet, we’ll take it step by step. 🙂
As always, feel free to follow my other journeys of learning PostgreSQL, learning how to develop Amazon Alexa Skills, learning how to get started in algorithmic trading, JavaScript for beginners…and more to come soon! Cheers.

DSC_0024 Zach Doty Cover Photo for What is Machine Learning

What is Machine Learning?

Hello there, fellow Machine Learning (ML) students! Welcome back to our crash course in starting machine learning from an absolute beginner’s perspective.

In our previous article, we covered an introduction to Machine Learning, answering several key questions:

  • Where is machine learning used in our lives?
  • Where did machine learning come from?
  • Where is machine learning headed?

Forging ahead in our learning journey, we’ll introduce some definitions of machine learning and look at the major types of machine learning applications.

 

Machine Learning (ML), A Casual Definition by Arthur Samuel

Our first definition, teased in the last article, follows:

Machine learning is the practice of giving computers the ability to learn without being explicitly programmed to do so.

 

More on Arthur Samuel & Why His Definition on ML Matters

If you’re like me, you might not have heard of Arthur Samuel. Who is he, and why does his opinion matter in the fields of artificial intelligence and machine learning?

Arthur Samuel was a pioneer in artificial intelligence and computer gaming fields. In 1959, he coined the term “machine learning” as a founding father in the field. That’s why he’s important! Let’s also look at a more formal / scientific definition.

 

A More Formal Machine Learning Definition

Tom Mitchell, of Carnegie Mellon, offers a definition with more structure.

  • A well defined learning problem follows
  • E * T = P
    • Note: His definition does not include mathematical operators. I’m taking a large liberty to insert them myself. ¯\_(ツ)_/¯
  • Experience (E) placed against Task (T) is measured by Performance (P)

2017-04-06-001-Machine-Learning-Definition-ETP-Framework

Here’s a further example:

Example: playing Go.

E = the experience of playing many games of Go.

T = the task of playing and winning Go.

P = the probability that the program will win the next game.

 

Major Categories of Machine Learning Algorithms

If you judge by press coverage of ML as I have, it appears to be a nebulous field. (In all fairness, it may still be.) However, there is structure we can take in learning ML. There are two types of machine learning algorithms:

  • Supervised learning algorithms
  • Unsupervised learning algorithms

There are a couple of other prominent types of machine learning algorithms as well: reinforcement learning and recommender systems.

 

 Wrap-Up

Congratulations, we’ve cleared a very gentle introduction to machine learning, and it’s novice/high level definitions. I look forward to learning more with you, dear reader! Our next articles will cover a bit more detail about the two major ML algorithm types: supervised learning, and unsupervised learning. Until then, look after each other.

DSC_0013 Zach Doty Cover Photo for Introduction to Machine Learning

An Introduction to Machine Learning

Howdy! This is an abrupt interruption to our regularly scheduled programming of SQL lessons, Amazon Alexa Skill development and Algorithmic trading. For those readers who don’t personally know me, I’m on a quest/streak to level up as a technologist.

Getting around to the point, I’ve been taking self-paced courses in varying forms to learn, apply and share new skills. However, I’ve heard the Coursera Stanford class in Machine Learning taught by Andrew Ng recommended so widely, I’m just going to doggedly sprint a marathon. I’m starting almost a week behind, working a busy job, trying to have a social life, and many other things…but darnit, I’m going to give this class my best shot. Hopefully I finish. 🙂

Okay, over-sharing complete. Let’s jump into a brief introduction of Machine Learning.

Machine learning, according to Andrew Ng (Chief Scientist at Baidu), is the science of getting computers to learn without being explicitly programmed.

Where is machine learning used in our lives?

Machine learning is employed a large number of actors. Here are a few examples:

  • Search engines, such as Bing, use machine learning to process MASSIVE amounts of data to quickly rank web pages in order of relevance, with limited human intervention.
  • Social networks, such as Facebook, use machine learning to recognize your friend’s faces for auto-tagging capabilities.
  • Email providers such as Apple Mail may employ spam filters that continuously learn to protect your inbox, your computer, and most importantly, your sanity.
  • Tech companies such as Amazon use natural language processing (NLP) to create conversational experiences and transactions with skills and services.
  • Entertainment companies such as Netflix use self-learning algorithms to recommend compelling new films and TV shows for those of you who binge watch

Where did machine learning come from?

Machine learning originated from a computer science field known as artificial intelligence. Long seen as a pipe dream from Star Trek (crass and careless reference, I know), machine learning is a practical and attainable segue to artificial intelligence, or machines and programs that contain some degree of self awareness.

This capability is a relatively new, yet a rapidly exploding field that grows as mathematical, statistical, hardware and software capabilities continue to compound and improve.

What follows is a better question still.

Where is machine learning headed?

Machine Learning in the future could look like a few different things (but not limited to this list, obviously!):

  • Predictive and preventative applications in engineering, medicine, and security
  • “Load bearing” performance in complex tasks, such as architecting, coding and programming self-driving cars
  • Coordinate machines and programs that study our behavior at our request and perform tasks, such as performing spring cleaning
  • Assistants or programs that are intelligent – able to optimize and independently solve problems on our behalf

Wrap-Up on the Machine Learning Introduction

That wasn’t so bad was it? We’ll follow soon with a more formal definition of Machine Learning and its various tranches of study. Cheers all.

How to Build Your First Amazon Alexa Skill, Pt. 2: Skill Interface

Welcome back, Alexa geeks! In the last article, we laid the groundwork for making our first Amazon Alexa skill. We covered the concepts and frameworks in Alexa skill development. We also did some work in AWS Lambda to prepare for voice requests  being made to our service.

 

Quick Recap on The Alexa Skill Service / AWS

We’ve slept a few times since covering the concepts and AWS framework, so let’s quickly recap.

2017-03-01-002-Alexa-Skill-Process-Framework

Whenever we use an Alexa skill , our voice data is processed through the hardware device, through the skill interface (what we’re looking at today) for language processing, then converted into text for a program to execute against, and back again. Simple enough, right?

In the last article, we looked at the “last” leg of this process, the skill service and AWS Lambda. Now, we’ll be working with that and the skill interface portion.

2017-03-07-001-Alexa-Skill-Interface-Develpoment-Framework

 

Setting Up the Basic Skill Information

Alright, log into the Amazon Developer Portal and Select the Alexa menu option. Once there, you should have a choice between the Alexa Skills Kit or the Alexa Voice Service. Click on the Alexa Skills Kit link to continue. You’ll want to click, “Add a New Skill”. Because I’ve already developed a skill, Silly Marketing Strategies is already there. However,  most of y’all will probably not have anything else on screen.

2017-03-07-002-Alexa-Skill-Interface-Develpoment-Framework

To start, you’ll want to select Custom Interaction Model, leave the default language to US (unless you’re developing in Espanol?). Type out the name and invocation name for your skill. The name isn’t necessarily important this moment. However, the invocation name will be extremely important!! This will be how users will call your Skill into service.

Note: the astute will notice I’ve deviated from the Space Geek example of the last article. More on that later. 🙂

When you’re ready, click next to proceed, and we’ll tackle the interaction model.

2017-03-07-003-Alexa-Skill-Interface-Develpoment-Framework

 

Setting up the Skill’s Interaction Model

Alright, time for some important things here. For the purposes of the factoid-based game, we’ll only be looking at the Intent Schema and Sample Utterances fields.

First, we need to look at the Intent Schema. Remember the files you downloaded in the first article? You’ll need to go into the SpaceGeek folder, Speech Assets sub folder and open the IntentSchema.json file. I’ve opened it up in Sublime Text 2 briefly, so we can take a quick look at the file. So this is JSON, with intent pairs. Below, we’ve got a pretty simple set, intents for retrieving a fact from the skill, getting help, stopping and cancelling a function. It’s easy right? Ha.

2017-03-07-004-Alexa-Skill-Interface-Develpoment-Intent-Schemas

Quick note: because these intents are proceeded by Amazon – it means they’re built in for Amazon. Enough babbling, copy and paste the contents of this file into the Intent Schema section of Amazon Developer Console.

2017-03-15-005-Alexa-Skill-Interface-Develpoment-Intent-Schemas-Utterances

Above, we’ve pasted in the Intent Schema. Next, we need to provide some sample utterances. Sample utterances are what you think users might say to engage your Alexa Skill. Below, we’ve provided such examples as, “Tell me a Weimaraner fact”.

Next, we need to hook up the Alexa Skill interface we’ve put together with some computing power. Specifically, we need to hook it up to AWS Lambda! (Remember the first article where we did a bunch of Lambda setup?)

If you recall, there was an Amazon Resource Name (ARN) string that we copied and saved to a text file. Retrieve it now and paste into the “Configuration” screen of the Skill interface setup.

2017-03-15-006-Alexa-Skill-Interface-Develpoment-Intent-Schemas-Utterances

Providing the ARN you’ve provided is valid, you should be able to proceed to the next step. Note: we are ignoring the account linking functionality for now. This functionality allows you, for example, to integrate Twitter sharing functionality into your skill by sharing a Tweet.

Next, we’ll move on to the Test tab. Three things (below) to take note of:

  1. Ensure you’ve completed the Interaction Model tab, so you can complete the testing in this tab.
  2. Try / type out key phrases in your skill to hear how they’ll be pronounced, via the Voice Simulator.
  3. Enter some utterances into the Service Simulator to A) make sure your skill is functioning as intended and B) Get a feel for how the end user experience will happen

2017-03-18-006-Alexa-Skill-Interface-Develpoment-Testing-Tab

Once you’ve tested your Skill, proceed to Publishing Information. Here, you’ll need to include the following:

  • Category of your Alexa SKill
  • The relevant Sub Category
  • Optional: testing instructions if your skill requires credentials or other unusual needs. You probably don’t need to include anything for this example
  • Country & Region targeting
  • Short Skill Description
  • Full Skill Description
  • Example Phrases, drawn from your sample utterances, and preceded by the Alexa wake word and skill invocation name
  • Optional: keywords that will help Alexa users find your skill in search
  • Images in 108 and 512 pixel squared dimensions

Below are a couple of screenshots for how I’ve filled out these fields.

2017-03-18-007-Alexa-Skill-Interface-Develpoment-Publishing-Information

2017-03-18-008-Alexa-Skill-Interface-Develpoment-Publishing-Information

Alright, we’re so close! The last field is Privacy & Compliance. For this example, you should be checking “No” to all the radio buttons:

  • No, skill doesn’t allow users to make purchases or spend real money
  • No, skill does not collect personal information from users
  • No, skill does not target children under the age of 13

However, do note that some of these things (except for child targeting) may change as we progress in our Alexa Skill development capabilities.

If desired (or later required by more advanced capabilities) you may specify privacy policy and terms of use URLs.

If you’ve completed the above, you should be good to Save & Submit for Certification!

 

Wrap Up

This is a deceivingly involved process. You will note that neither in the previous article, nor this article, have we changed the original source code for the SpaceGeek / Weimaraner Facts Skill. We’ll cover this in more detail in the next article. Until next time, check out my journey in learning NoSQL and keep an eye out for more content soon!

 

The IN Statement: SQL Fundamentals

Howdy, budding SQL masters! It’s been a minute since the last SQL post! I’ve taken some time to devote to Alexa Skills, some other personal housekeeping, and adopting a dog with my wife!!

Last time we talked SQL, we learned the BETWEEN statement to hone in on the exact values we wanted for a SQL query. Prior to that, we looked at the LIMIT statement to get a small return data sample and also working with ORDER BY to take sorting and filtering into our own hands.

Today we’ll be working with the IN statement.

About the IN Statement

The IN SQL operator is a companion to the WHERE clause to see if a value matches any value in a list of values. Sounds something kind of like VLOOKUP’s, yeah? The syntax of an IN statement might look like:

value IN (value1, value2)

The expression in your SQL query will return true if the value(s) you’ve specified in the query match any value in the list you’re referencing.

Sub-queries: Using the IN Statement with SELECT

The list of values is not limited to a static number or list of strings. That is, you can be a bit more dynamic and free-flowing with your query by getting the value via a SELECT statement.

This is also known as a sub-query. To illustrate, the syntax might look like:

value IN (SELECT value FROM table)

NOT IN: Not Making Fetch Happen

The IN statement has a similar corollary as the BETWEEN statement- a NOT modifier. So, we can take the inverse of a statement. (Drawing on some experience in data analysis, this can just as, if not more useful than the original statement!)

value NOT IN (SELECT value FROM table)

 

Sample SQL Queries for the NOT IN Statement

Alright, let’s run some sample queries using the IN statement! Here’s our first go, working again with the ever-present dvdrental sample database. Below, we are selecting rental information for customers matching only a certain ID cohort (think perhaps loyalty group?), with a few columns and sorting by descending order for the return date:

SELECT customer_id,rental_id,return_date

FROM rental

WHERE customer_id IN (1,2)

ORDER BY return_date DESC;

Below, we see our results returned as intended!

2017-03-13-001-IN-SQL-Statement-Example 

How about our corollary? If we want to add the NOT IN modifier, it’s as straightforward as you would imagine.

FROM rental

WHERE customer_id NOT IN (1,2)

ORDER BY return_date DESC;

2017-03-13-002-IN-SQL-Statement-Example-2

Above, we again receive the desired result. I believe we also get a lesson in the importance of clean data here, as blank values are presented first. Let’s try one more sample query, just for kicks. We’ll switch it up ever so slightly, now working with the payment table.

SELECT city_id,city,last_update

FROM city

WHERE country_id IN (44,82,60)

ORDER BY city ASC;

2017-03-13-002-NOT-IN-SQL-Statement-Example

Above, we’ve demonstrated that we don’t necessarily need to display a column we use in the query (country_id) and have sorted by city name in ascending order.

 

Wrap Up

So why the IN statement? At face value, it seems rather simplistic among SQL clauses we’ve explored thus far. Let’s consider the following.

The IN statement allows us to avoid a messy swath of values courtesy of BETWEEN (if we’re working with a proper large data set), or daisy chain list of a bunch of equals OR statements. Here’s another plus, it’s thought by some that pgAdmin / PostgreSQL will execute the IN query faster than the list of OR statements.

Alright, that’s it for today’s article. If you’re like me and need a quick refresh, revisit our page on Learning SQL that contains all the topics we’ve covered so far.