Posts

DSC_0050 Zach Doty AS PostgreSQL Statement Cover Photo

JOINS Foundations: The AS PostgreSQL Statement

Intro to JOINS: the AS PostgreSQL Statement

What’s up SQL people? We’re back, and better than ever, in our foray to learn PostgreSQL. Since we’ve completed some intermediate skills challenges & learned GROUP BY, it’s time to examine JOINS.

Before we examine JOINS, there’s a key foundation piece we must cover first: the AS statement in PostgreSQL. Let’s jump in.

 

About the AS Statement

The AS statement in PostgreSQL enables us to rename sections of a table or table columns with a temporary alias, or almost a variable, for manipulation.

It’s a simple statement, so let’s see it in action.

 

1. Basic AS Statement Example

Our most basic example is a basic query where perhaps a column wasn’t named to our liking. Consider the following.

SELECT rental_rate AS film_cost

FROM film

LIMIT 10;

2017-08-30-001-AS-Statement-Syntax-Example-1

Great for an introductory example, but not inherently useful. Read on as we apply the AS statement more deeply.

2. Semi-Intermediate AS Statement Example

Let’s provide an example that’s a bit more engaged. Example, if we use aggregate functions, the column output doesn’t have a clean name attached to it. But no longer! The AS statement allows us to have the summation output in a GROUP BY statement to something we’ll recognize.

 

SELECT customer_id, SUM(amount) AS customer_ltv

FROM payment

GROUP BY customer_id

ORDER BY customer_ltv DESC

LIMIT 7;

2017-08-30-002-AS-Statement-Syntax-aggregate-example-2

 

This is something more useful for intermediate PostgreSQL work!

 

Wrap Up

We aren’t spending much further time here since this is a simple application and the JOINS statement is the function we’re truly after. If you’re just joining this series, check out our home page on how to learn PostgreSQL.

DSC_0012 Zach Doty PostgreSQL cover photo

PostgreSQL Intermediate Skills Test

Hey there, SQL-ites. Time for another (yes, another!) PostgreSQL skills challenge. It’s our last stop before moving into more intermediate and advanced PostgreSQL skills, such as JOINS.

If you’re just joining this series on SQL, we previously covered the HAVING statement & GROUP BY. Here’s the home page of our journey in learning PostgreSQL. Today, we do a more rounded knowledge check of older PostgreSQL statements, such as SELECT WHERE, COUNT & ORDER BY.

We have 3 problems, and provide the winning code for each, based on our sample database. In the past, I’ve given more explanation to the problems, but because I’m trying to get some traction myself in moving forward, we’ll only have problem & solution laid out here.

Alright, let’s go!

1. Return the customer IDs of customers who have spent at least $110 with the staff member who has an ID of 2.

The answer should be customers 187 and 148.

SELECT customer_id, SUM(amount)
FROM payment
WHERE staff_id=2
GROUP BY customer_id
HAVING SUM(amount) >110;

 

2. How many films begin with the letter J?

The answer should be 20.

SELECT COUNT(*) FROM film

WHERE title LIKE ‘J%’;

 

3. What customer has the highest customer ID number whose name starts with an ‘E’ and has an address ID lower than 500?

The answer is Eddie Tomlin.

SELECT customer_id, first_name, last_name, address_id
FROM customer
WHERE address_id <500 AND first_name LIKE ‘E%’
ORDER BY customer_id DESC;

 

We’ll see you on the other side soon, for some JOINS awesomeness. 🙂

DSC_0300 Zach Doty Cover Photo for HAVING PostgreSQL Clause

The HAVING Clause in PostgreSQL

Howdy SQL-ites! Welcome back to our next step in learning SQL. After a long hiatus, we recently stumbled our way through some intermediate SQL statement challenges, namely on GROUP BY.

Today, we’re back in the flow, and learning about the HAVING clause in PostgreSQL.

 

About the HAVING Clause in PostgreSQL

The HAVING clause is most often seen as a helper with the GROUP BY statementGROUP BY was pretty awesome, right? How might we improve upon it?

Let’s consider two concepts:

  1. Our occasional parallels of PostgreSQL to Excel, and,
  2. Our previous intermediate challenge example.

Relating PostgreSQL’s GROUP BY to Excel

If we hearken back to our first encounter with GROUP BY, we compared GROUP BY to pivot tables. Specifically, if we have a big data table that records data of recurring themes, e.g., you customer base, it can be helpful to manipulate as aggregated and assembled, vs. raw.

However, GROUP BY is only an intermediate step in data analysis. If we think about our final challenge in the last post, we had to limit the results to 5. Even if we group data, it’s neither segmented nor useful toward analysis.

Thus, we need an additional method for winnowing down our GROUP BY results.

 

Meet the HAVING Clause

So, about the HAVING clause. It’s most often used in tandem with GROUP BY to filter out data rows that don’t meet a certain condition. Think of it as similar to the WHERE function, just an accompaniment to GROUP BY.

 

Let’s take a look at basic syntax:

SELECT column, aggregate(column_2)

FROM table

GROUP BY column

HAVING condition;

 

Off the bat, we should this is extremely familiar if we’ve covered GROUP BY. The differentiation is the additional HAVING condition at the end. This condition could be something like, HAVING sum(column_2) less than 50.

The Difference Between WHERE and HAVING

Some of you sharp folks may want to know, “what’s the difference between WHERE and HAVING?” That would be an excellent question. Here’s the difference:

The WHERE clause sets conditions on individual rows, before the GROUP BY clause has been applied. The HAVING clause specifies conditions on grouped rows, created by the GROUP BY clause.

Let’s run some examples.

 

Using the HAVING Clause in PostgreSQL

Here’s our first example, very similar to our previous skills challenge:

SELECT customer_id, SUM(amount)
FROM payment
GROUP BY customer_id
HAVING SUM(amount) > 150;

Above, we have added another segmentation layer with the HAVING clause.

You can see we’re pulling information about how much our customers have paid us. Further, we specify that we only want to see customers with a lifetime purchase amount of greater than $150.

 

Let’s look at another example of the HAVING clause. Say for example, we want to know which store has served the most customers. Below, we’ll execute the following code:

SELECT store_id, COUNT(customer_id)
FROM customer
GROUP BY store_id
HAVING COUNT(customer_id) >275;

Above, we’ve selected both the store and customer ID columns from the customer table. Further, we group by the store ID, because we want store-level data, but we only want to see the stores which have served more than 275 customers. Below, we can see only store has done so. 🙂

 

Combining Usage of the WHERE & HAVING Clauses

We mentioned earlier the WHERE and HAVING clauses are different, somewhat in function, but mostly in order of execution. Here’s what we didn’t say: you can actually use them in tandem. There is a great theoretical use case for this, unfortunately our sample database is a bit small, but here goes.

Let’s think about the film table (used in previous examples.) Perhaps we want to analyze the films, by rental rate, but only films with certain ratings. For example, perhaps we’re no longer interested in carrying NC-17 films, but still want to get an aggregated view of on average, how much each films rents for, by rating. Additionally, we want to see which ratings, if any, have an average rental rate of less than $3.

Here’s the code we would use:

SELECT rating, ROUND(AVG(rental_rate),2)
FROM film
WHERE rating IN (‘R’,’G’,’PG’,’PG-13′)
GROUP BY rating
HAVING AVG(rental_rate)<3;

 

 

Wrap Up

Alright, that concludes our section for today. It feels good to be back. 🙂

Hopefully you found this section on the HAVING PostgreSQL statement useful. If you need to backtrack or further explore, here are some useful/recent links:

DSC_0006 Zach Doty Intermediate GROUP BY SQL Skills Challenge Cover Photo

Intermediate SQL Skills Challenge: GROUP BY

Hey there SQL-ites! Wow, it’s been awhile since I’ve last posted. Work has been crazy busy again, and just life in general. Sure feels good to be back, learning again! I’m daringly dropping straight back into my learnings from where we left off…in April!? Crazy.

Anyway, let’s get back to brass tacks. Before life and work got really crazy for me, we covered:

Another quick recap note, we’ve been using the famous DVD rental training database for our work. On to the good stuff.

 

GROUP BY SQL Skills Challenge #1

Let’s say it’s time for quarterly reviews, who doesn’t love evaluations? ¯_(ツ)_/¯ Implement your knowledge of GROUP BY against the following problem:

We need to figure out how much revenue our employees have realized from DVD rentals, and how many transactions each handled.

 

GROUP BY SQL Skills Answer #1

Let’s talk through the problem and dissect it before presenting code. “A problem well-stated is a problem-half solved” – paraphrase of some smart person.

  1. We’re talking about revenue, so we’ll need to be dealing with the payment table.
  2. We’re evaluating employees (staff), SUM of revenue, and COUNT of transactions.
  3. If we’re aggregating this data, we’re GROUPing BY employee.
  4. We’re also ORDERing the employees BY who handled the most transactions.

That said, here’s the code:

SELECT staff_id, SUM(amount), COUNT(amount)
FROM payment
GROUP BY staff_id
ORDER BY COUNT(amount) DESC;

…with our results!

 

GROUP BY SQL Skills Challenge #2

Well done on your challenge! Here’s the second:

It’s time to do inventory, flashbacks of retail and restaurant wonder for all of us. 🙂

In the name of forecasting and planning, we need to figure out the average replacement cost of our movies, by rating.

 

GROUP BY SQL Skills Answer #2

Ok, let’s walk through our problem.

  1. We need to use the film database here, since we’re gathering information on the movies.
  2. We’re GROUPing our films BY rating
  3. We’re using an aggregate function to determine the average replacement cost of each rating.

Drum roll, here’s a winning code snippet:

SELECT rating, AVG(replacement_cost)
FROM film
GROUP BY rating
ORDER BY AVG(replacement_cost) DESC;

With the output:

 

Are there more challenges we should be covering? Yes. However, I’m trying to do better about getting more sleep these days. Unlike past SQL articles, it’s still (barely) before midnight. So we’ll take a quick breather, possibly update this article, but definitely keep moving forward. Cheers!

 

Update- 8/20/2017 —

GROUP BY SQL Skills Challenge #3!

Alright SQL-ites. After getting some rest, I’ve regrouped a few days later to cover the last challenge:

From our database, we want to get the customer ID’s of the top 5 customers, by money spent, so we can send them a thank you for their business!

 

GROUP BY SQL Skills Answer #3

Let’s diagnose the problem.

  1. If we’re gathering revenue information, we’ll need to use the payment table.
  2. If we’re getting the top spending customers, we’ll need to GROUP all transactions BY customer ID
  3. To see the top 5 paying customers, we’ll want to ORDER the results BY the SUM of payment amount.

Considering the above, here’s our code:

SELECT customer_id, SUM(amount)
FROM payment
GROUP BY customer_id
ORDER BY SUM(amount) DESC
LIMIT 5;

DSC_0069 Zach Doty Cover Photo for What is data science

What is Data Science?

Welcome back, campers! It’s been a minute (read:months) since I’ve last posted, and we’re back! (For now.)

Today’s topic, data science: supposedly the latest exploding field, critical to every enterprise.

Why is it important and relevant? The rise of big data has created a relatively untapped treasure trove of insight. However, it’s undeveloped! Further, the tapping of this insight requires a blended skill set which is currently in short supply in the market: the data scientist.

Who and what is a data scientist?

A data scientist is someone who finds new discoveries in data. They investigate hypotheses and look for meaning and knowledge within the data. They visualize the data by creating reports and looking for patterns. What distinguishes a data scientist from a traditional business analyst is the use of algorithms. Algorithms are one of the fundamental tools for data scientists. This requires mathematics knowledge, computer science savvy and domain knowledge.

What does it mean to be a data scientist?

A data scientist may handle open-ended questions such as, “Which customers are more likely to churn?” The data scientist would gather all the data, and run algorithms to find dependable patterns to improve the situation. Seems straightforward, yes? However, there are a range of misconceptions about data science and data scientist For example,  a data scientist may not necessarily be a developer-only or business intelligence analyst-only.

A data scientist will be able to combine both technical know-how and business domain knowledge into mathematics and statistics for maximum effect. That being said, true data scientists are extremely difficult to find and train. However, it may be possible to become a data scientist without expensive and time-consuming degrees, via focused tools and application training.

An oversimplified Venn diagram showing the makeup and value of a data scientist

An oversimplified Venn diagram showing the makeup and value of a data scientist

More Notes on Data Science

When considering data science from a managerial perspective, it’s important to understand the current broad allocation of the average data scientist’s time. An estimated 60-705 of a data scientist’s time is spent assembling and cleaning data, tasks which could be delegated to technical specialists, data integration specialists and so forth. (For example, text mining, SQL queries and so forth.)

If you’ve followed my site and blog lately, you’ll noticed I’ve lapsed a bit on posting. I’m trying to get back into sharing my education again, so stay tuned. Things have just been busy lately. 🙂

DSC_0045 Zach Doty Cover Photo Group By SQL Statement Function

GROUP BY SQL Statement

Introducing the GROUP BY SQL Statement in PostgreSQL

‘Ello SQL geeks! Welcome back to our SQL learning journey. We left off with a beginner SQL skills challenge and the aggregate SQL functions: MIN, MAX, AVG and SUM. Today we’re looking at the GROUP BY statement. We’ll learn about this function in PostgreSQL and walk through usage of this handy SQL statement.

 

About the GROUP BY SQL Statement/Clause

From my simple understanding, GROUP BY functions like a hybrid of the following:

  • SELECT DISTINCT keyword (If used without an aggregate function like SUM), and,
  • An Excel pivot table, rolling up aggregate figures (Count, Sum, Average, etc.) into unique rows

If you’re familiar with Excel Pivot tables, then you’ll recognize here the power of this function.

 

Let’s take a look at some examples to clarify.

 

First Look at Using the GROUP BY Function

To better illustrate the power of GROUP BY, we’ll first show its usage without aggregate functions. Consider the following:

If we query the address table of our sample database with a generic SELECT * FROM address; we get back an atrocious 605 rows of data. Aggregate and useless!

2017-04-27-001-GROUB-BY-Generic-SELECT-Query-Pitfall

In contrast, if we call the GROUP BY function, we’ll get back a cleaner output, with fewer rows – only unique values returned. While this is an incremental improvement to analyzing data, there’s much left to be desired.

2017-04-27-002-GROUP-BY-Without-Aggregate-Function

 

What’s missing here? How about that pivot table-esque functionality? This is where the power of using GROUP BY with aggregate functions gets awesome.

Using the GROUP BY SQL Statement with Aggregate Functions

As with most analysis, a single data point or data series rarely holds significant insight value on its own. Let’s drive home that point by leveraging the GROUP BY statement with the SUM aggregate function. Below, we compare the ratings of films in our sample database in aggregate by replacement cost. Perhaps this could serve in-store strategies for loss prevention.

SELECT rating, SUM(replacement_cost)
FROM film
GROUP BY rating;

2017-04-27-003-GROUP-BY-SUM-Aggregate-Function

If we extend this functionality to more real world examples, we could use the following for GROUP BY:

  • Grouping page-level / URL data to roll up clickstream analytics data
  • Large scale analysis of CRM data for customer segmentation analysis
  • Analyzing returns for financial data

The list could (and I’m sure it does) go on.

Let’s take this one step further and reduce potential future workload, by building sort functionality into our query. Below, we add a line to get most expensive ratings to least.

SELECT rating, SUM(replacement_cost)
FROM film
GROUP BY rating
ORDER BY SUM(replacement_cost) DESC;

2017-04-27-004-GROUP-BY-SUM-ORDER-BY-SUM

By the looks of this, no need to guard Land Before Time 8. 🙂

Extending our lesson: you can use the COUNT, AVG, and other aggregate functions to analyze as desired.

 

Wrap Up

Alright, this was a relatively gentle introduction into more advanced SQL functions. GROUP BY is a rather critical function, so in our next article, we’ll be doing yet another skills challenge. Joy!

Feel free to catch up on our other articles that help you learn PostgreSQL. Also, check out some how-to’s on developing Amazon Alexa skills, and a new series on getting started with Machine Learning. As always, please share with your colleagues and share thoughts in the comments below. Cheers.

SQL Aggregate Functions: Min, Max, Avg and Sum

Welcome back, SQL nerds! We’re back in action on the journey of learning SQL, after a beginner PostgreSQL skills challenge.

We’re reaching the end of basic functions and queries with this article. Based on what we’ve learned so far, we can do basic counting, filtering, sorting and pattern matching against PostgreSQL databases.

If you’re just tuning in, here’s the page on how to learn SQL, and the previous SQL article on the LIKE Statement.

Okay, enough jabber. Let’s jump in. The aggregate functions of MIN, MAX, AVG and SUM are our turning point into more complex SQL queries that involve concepts such as GROUP BY, among others.

At the same time, the functions on their own aren’t super complicated. Because we’re ass-u-ming you’re familiar with the general concepts of minimum, maximum, etc., we’re going to forgo conceptual and syntax explanations for demos.

 

AVG Aggregate Function

As we level up in SQL, we’re going to do less and less explaining / handholding / screenshots. That said, we’re going to explore our DVD rental data set for a table with a nice numerical component that would make good use of the functions.

We did a SELECT * FROM film LIMIT 15; to get a peek at the columns. For the purposes of this exercise, the replacement_cost column will do nicely.

To get the AVG:

SELECT AVG(replacement_cost) FROM film;

2017-04-17-001-AVG-Aggregate-PostgreSQL-Function

Using ROUND for Decimal Place Control

You’ll notice in the example above that we got 3 decimal places on what’s supposed to be a dollar amount. How do we fix that? Glad you asked. Meet the ROUND function.

We pass in the target value (average of replacement_cost) and mandate the number of decimal places we’d like returned. Below:

SELECT ROUND( AVG(replacement_cost), 2) FROM film;

2017-04-17-002-ROUND-for-AVG-Aggregate-PostgreSQL-Function

 

MIN Aggregate Function

To find the minimum value in a given column:

SELECT MIN(replacement_cost) FROM film;

2017-04-17-004-MIN-Aggregate-PostgreSQL-Function

 

MAX Aggregate Function

To find the maximum value in a given column:

SELECT MAX(replacement_cost) FROM film;

2017-04-17-003-MAX-Aggregate-PostgreSQL-Function

 

SUM Aggregate Function

To find the maximum value in a given column:

SELECT SUM(replacement_cost) FROM film;

2017-04-17-005-SUM-Aggregate-PostgreSQL-Function

 

Wrap-Up

Alright, that was a relatively quick article! Hopefully this was a reprieve from more involved sections in the past. We should find that as we continue to strengthen our core PostgreSQL  capabilities, these articles and our SQL queries should be easier and easier. If you found this article interesting, you might enjoy a new section on how to get started in machine learning. Cheers.

 

 

 

DSC_0007 Zach Doty Cover Photo for Beginner PostgreSQL Skills Challenge

Beginner SQL Skills Challenge!

Howdy, SQL geeks. Hope this post finds you swell!

Over the past few months, we’ve gained a ton of ground in learning SQL, or at least I have. 🙂

Let’s take a moment to:

  1. Test our knowledge of SQL skills learned thus far
  2. Start seeing SQL queries less as statements of code, and more as real world business challenges

In this article, we’ll have a recap plus three sections:

  • Recap of the training database we’ve been working with
  • General statements of each business problem
  • Hints and thoughts about how to approach each problem
  • Solutions to each problem

 

Recap: Our Training Database

Our training database is a best/old faithful. We’ve been using the surprisingly popular DVD rental database in a .tar file for our practice database.

Contained within this databases are various tables with fictitious information, including: customer information, film production information, business/pricing information and so forth.

In our challenges, we’ll execute various SQL queries to extract pieces of insight for business tasks. It’s assumed in this article that you’ve installed PostgreSQL via pgAdmin and have followed this article series so far, using the DVD rental training database.

Without further ado, let’s begin.

 

The SQL Challenges

Alright, here we go:

  1. How rentals were returned after July 17, 2005?
  2. How many actors have a last name that starts with the letter A?
  3. How many unique districts are represented in the customer database?
  4. Can you return the actual list of districts from challenge #3?
  5.  How many films have a rating of R and a replacement cost between $5 and $15?
  6. How many films have the word Truman somewhere in the title?

 

How to Approach the Challenges

Right, then. In this section, we’ll add some color commentary (read: hints) to our challenges. This should help you understand the mechanics of the solution, while ensuring you can’t see the answers all in one screen. 😉

 

Challenge 1: Rentals Returned After July 17, 2005

As with all challenges, a problem well stated is half (or more) solved. So let’s look at the high levels of the ask, and work our way down. We need information on rentals, so this means we’ll probably be querying the rentals table.

We would want to first examine the rentals table in a concise manner by doing:

SELECT * FROM rental LIMIT 5;

Once you’ve surveyed the table, we really only need one column returned (pun not intended) and we only want the sum figure of returns, where (HINT!!) the return date was after (think a logical operator here) July 17, 2005.

 

Challenge 2: Actors That Have a Last Name Beginning with “A”

Like our first challenge, let’s work from the “top down”. We need actor information, so querying the actor table would be a great place to start. Similar to last time, we need a count of values matching a condition. The difference versus challenge #1  is we need to find match a pattern like or such as an actor’s last name that begins with the letter A.

 

Challenge 3: Number of Unique Districts in the Customer Database

The title and description could cause some confusion here. You may need to do some basic SELECT * FROM table_name LIMIT X; queries to make sure you’ve got the right table. Once you do, we’re looking for an amount of distinct values in the database. Order of operations matters.

 

Challenge 4: Returning the Actual Lists of Districts from Challenge #3

Not much to hint at here – getting challenge #3 is the key here. You’ll really only be simplifying the correct query in challenge #3 to get the correct answer here.

 

Challenge 5: Cheap, (Mildly) Naughty Films

This one might be the longest query yet in this challenge. So we’re looking up film information, thus should know which table to query. We’re returning a value where a certain rating must be returned, and (HINT) we need to layer in one more lens of qualification. That lens dictates we specify a range between (cough, hint!) two values.

 

Challenge 6: Where in the World are Films Containing “Truman”?

This challenge is more of a recency test than retention of older concepts. You’ll need to employ pattern matching again for this business challenge/query to find titles that have some match like Truman in the title.

 

 

Challenge Solutions

Is that your final answer? Below are the queries, with screenshots of what I did.

 

Solution 1: Rentals Returned After July 17, 2005

SELECT COUNT(return_date) FROM rental
WHERE return_date > ‘2005-07-17’;

2017-04-15-001-Challenge-1-Sol-Count-From-Where

 

Solution 2: Actors That Have a Last Name Beginning with “A”

SELECT COUNT(*) FROM actor
WHERE last_name LIKE ‘A%’;

2017-04-15-002-Challenge-2-Sol-Count-From-Where-Like

 

Solution 3: Number of Unique Districts in the Customer Database

SELECT COUNT(DISTINCT(district)) FROM address;

2017-04-15-003-Challenge-3-Sol-Count-Distinct

 

Solution 4: Returning the Actual Lists of Districts from Challenge #3

SELECT DISTINCT(district) FROM address;

2017-04-15-004-Challenge-4-Sol-Select-Distinct

 

Solution 5: Cheap, (Mildly) Naughty Films

SELECT COUNT(*) FROM film
WHERE rating = ‘R’
AND replacement_cost BETWEEN 5 AND 15;

2017-04-15-005-Challenge-4-Sol-Count-From-Where-Between-And

 

Solution 6: Where in the World are Films Containing “Truman”?

SELECT COUNT(*) FROM film
WHERE title LIKE ‘%Truman%’;

2017-04-15-006-Challenge-4-Sol-Count-Where-Like-Wildcard

 

Wrap Up

Well done for completing these challenges! You shall indeed pass. 🙂

Soon, we’ll be covering aggregate SQL functions, such as MIN, MAX, AVG and SUM.

If you’re just joining us, here’s a running list of our articles so to date (4/16/2017):

 

DSC_0064 Zach Doty Unsupervised Machine Learning Intro Cover Photo

Unsupervised Learning Introduction: Machine Learning Essentials

Howdy, machine learning students! Today we’re going to introduce the concept of unsupervised machine learning algorithms.

Quick Recap: Supervised Learning

Before we jump in, let’s quickly recap our last article introducing supervised machine learning algorithms. This will give us the appropriate context for unsupervised learning.

In supervised machine learning problems, we supply pre-labeled data to the algorithm. By supplying data that’s already correctly labeled, we ask the algorithm to further predict (regression) or label (classification) new data.

2017-04-11-004-Multiple-Input-Classication-Machine Learning

 

Unsupervised Machine Learning = Unlabeled Data

The most immediate and prominent difference  for unsupervised learning is the data. Above, we gave the algorithm “a boost” by supplying the intended “right” answers in the data. Below, in an unsupervised machine learning problem, there are no right answers…yet.

2017-04-15-001M-Unsupervised-Machine-Learning-Problem-Data

We’ve supplied the algorithm with data in the problem, but it’s provided without labels or “answers”. We are mandating that the algorithm discover structure and infer patterns/labels on its own. We could also compare the above example to a clustering problem.

So in unsupervised learning, we supply a large amount of unlabeled data, without explicitly identified form or structure. We ask the algorithm to come up with ideas of structure and segmentation on its own.

Some additional applications of unsupervised learning could include:

  • Market segmentation of massive transaction data
  • Large scale social networking data
  • Astronomical data analysis
  • Large scale market data
  • Mass audio/voice analysis
  • Large scale gene clustering

 

Wrap Up

That was a bit of a quick one! The challenge with some these technical subject matter areas is sometimes we have limited room to run before going off into the technical weeds. This is one of those areas. Next, we’ll be covering some key concepts in the areas of machine learning model representation, cost function and parameter learning. Don’t worry too much about those yet, we’ll take it step by step. 🙂
As always, feel free to follow my other journeys of learning PostgreSQL, learning how to develop Amazon Alexa Skills, learning how to get started in algorithmic trading, JavaScript for beginners…and more to come soon! Cheers.

DSC_0013 Zach Doty Cover Photo for Introduction to Machine Learning

An Introduction to Machine Learning

Howdy! This is an abrupt interruption to our regularly scheduled programming of SQL lessons, Amazon Alexa Skill development and Algorithmic trading. For those readers who don’t personally know me, I’m on a quest/streak to level up as a technologist.

Getting around to the point, I’ve been taking self-paced courses in varying forms to learn, apply and share new skills. However, I’ve heard the Coursera Stanford class in Machine Learning taught by Andrew Ng recommended so widely, I’m just going to doggedly sprint a marathon. I’m starting almost a week behind, working a busy job, trying to have a social life, and many other things…but darnit, I’m going to give this class my best shot. Hopefully I finish. 🙂

Okay, over-sharing complete. Let’s jump into a brief introduction of Machine Learning.

Machine learning, according to Andrew Ng (Chief Scientist at Baidu), is the science of getting computers to learn without being explicitly programmed.

Where is machine learning used in our lives?

Machine learning is employed a large number of actors. Here are a few examples:

  • Search engines, such as Bing, use machine learning to process MASSIVE amounts of data to quickly rank web pages in order of relevance, with limited human intervention.
  • Social networks, such as Facebook, use machine learning to recognize your friend’s faces for auto-tagging capabilities.
  • Email providers such as Apple Mail may employ spam filters that continuously learn to protect your inbox, your computer, and most importantly, your sanity.
  • Tech companies such as Amazon use natural language processing (NLP) to create conversational experiences and transactions with skills and services.
  • Entertainment companies such as Netflix use self-learning algorithms to recommend compelling new films and TV shows for those of you who binge watch

Where did machine learning come from?

Machine learning originated from a computer science field known as artificial intelligence. Long seen as a pipe dream from Star Trek (crass and careless reference, I know), machine learning is a practical and attainable segue to artificial intelligence, or machines and programs that contain some degree of self awareness.

This capability is a relatively new, yet a rapidly exploding field that grows as mathematical, statistical, hardware and software capabilities continue to compound and improve.

What follows is a better question still.

Where is machine learning headed?

Machine Learning in the future could look like a few different things (but not limited to this list, obviously!):

  • Predictive and preventative applications in engineering, medicine, and security
  • “Load bearing” performance in complex tasks, such as architecting, coding and programming self-driving cars
  • Coordinate machines and programs that study our behavior at our request and perform tasks, such as performing spring cleaning
  • Assistants or programs that are intelligent – able to optimize and independently solve problems on our behalf

Wrap-Up on the Machine Learning Introduction

That wasn’t so bad was it? We’ll follow soon with a more formal definition of Machine Learning and its various tranches of study. Cheers all.