Ask a Data Engineer: Warby Parker Edition 👓

0
1557
Ask a Data Engineer: Warby Parker Edition 👓

wp-header

Codecademy’s very own Nick Duckwiler (left) and Ryan Tuck from Warby Parker (right) in our office. (📷: Mitch Boyer)

Last month, Codecademy and Warby Parker came together to work on a special Learn SQL from Scratch Capstone Project. It was during this time when I met Ryan Tuck, a Data Engineer at Warby, who played a major part in this partnership. So when he decided to drop by our office for the final QA round, I had to break out my notebook and ask some questions. Enjoy.


Hey Ryan, let’s start off with a question I’ve had for a while — what is a Data Engineer? (Is it similar to a Data Analyst or a Software Engineer?)

At Warby Parker, data engineers are responsible for creating and maintaining the plumbing required to support the data and reporting needs of the business. We use software engineering practices to automate the work of data cleaning, normalizing, and model building so that data is always ready to be consumed by data analysts in every department.

What languages/frameworks do you use at Warby?

On data engineering, we use Python as our general purpose programming language, as do most of the other teams in our Technology department. When it comes to databases, we use PostgreSQL for the majority of our SQL needs, and are beginning to use Amazon Athena and Google BigQuery for some of our larger datasets. We use Looker as our exclusive business intelligence entry point to all of this data.

What are some of the projects you worked on?

I’ve had the privilege of working with a lot of of smart people in every department at our company to help them solve their varied data needs, from reconciling financial data with the Accounting team to automating and modeling standardized performance metrics for our team of over 200 customer experience advisors.

As part of a team of five supporting the data needs of a rapidly growing company, I’ve tried where possible to focus on helping our analysts solve their own problems. This includes helping people learn Python and commit to our codebase, guiding the creation of data models in SQL, and encouraging people to submit pull requests to add features in Looker, our BI tool.

Seeing dozens of otherwise “non-technical” colleagues opening up PRs on a daily basis, and consequently being part of the democratization of tech that we value at Warby Parker, is probably the most rewarding “project” I’ve been a part of.

One project finished recently during our first annual “Hackweek” is called Pipes, which allows anyone at the company to easily move large amounts of data from wherever to wherever (Looker, Google Sheets, PostgreSQL, BigQuery, etc) on a regular cadence, or manually through a simple one-line chatbot interface. The adoption has been overwhelmingly positive and we’re looking to grow this sort of tooling out even more.

“We use software engineering practices to automate the work of data cleaning, normalizing, and model building so that data is always ready to be consumed by data analysts in every department.”

What got you into the data field?

I’ve always been drawn to analytical fields like math, and became pretty proficient in Excel during some internships in college. Once I had learned to program and learned more about data science and its applications in artificial intelligence, I knew that anything I could do to immerse myself in the world of data would be a step in the right direction.

Three and a half years ago, I landed a job as a junior software engineer at Warby Parker not fully knowing what I was in for, but am so glad I got the opportunity to help build tools to support an interesting and ever-changing data-driven culture here.

Where did you learn SQL and Python?

I had a background in C++, and was exposed to Python through an Intro to Data Science course. When Warby Parker hired me onto the Data team in 2015, I had never written a SQL query in my life, but picked it up quickly and within a few months started up internal SQL training classes, which I still teach on a monthly basis.

What does your tattoo say?


The ultimate cheatsheet.

This is Bayes’ Theorem, which is an equation that describes how to update probabilities given new evidence. Two summers ago I worked on building a tool to help predict weekly fantasy football performance. Some colleagues suggested a Bayesian approach would be appropriate, since there aren’t really enough data points in an NFL season to be able to use statistical approaches that require larger datasets, and I’d want to regularly update my predictions after each player’s latest performance.

I did a deep dive into understanding the (simple) math underlying Bayes’ Theorem and came out of that experience with a whole new worldview, understanding my entire knowledge of the world as a big and intricate probabilistic model that I was continuously updating with every experience I ever have. It was pretty transformative, and I figured that was worth a tattoo.

What is a concept in SQL/Python that’s essential to your work?

Donald Knuth said, “Premature optimization is the root of all evil.” I’ve generally found this to be true, and try to live by it in my work. For example, I’ll generally prefer to keep a data model simple by rebuilding it for all time on a daily basis using a single SQL query instead of making a more complicated model that requires iteratively adding to a table, keeping track of state, updated timestamps, when something last ran, etc.

A wise man once said, “Duplicating data makes things go fast,” but databases are already impressively fast to begin with, without implementing anything to improve performance. Ultimately, I almost always approach a problem thinking about optimizing for my time over machine time, for readability over performance, and for introducing as little cognitive overhead as is required by the problem at hand. Only once performance issues or readability issues present themselves will some code be worth a rewrite.

Last question! Since you wrote Warby Parker’s internal SQL training courses, I know there gotta be some inner Curriculum Developer in you. Can you teach a SQL concept in 2 minutes?

Sure! Have you ever written a query that yields some result set and you think, “I’d love to query the stuff I just produced like it was a table?” Enter the WITH clause.

Suppose I have a mega query that gives the transaction summaries:

select
    transactions.date as transaction_date,
    sum(items.price) as total_cost,
    count(*) as number_of_items
from
    transactions
inner join
    customers
    on
    customers.id = transactions.customer_id
inner join
    transaction_items
    on
    transactions.id = transaction_items.transaction_id
inner join
    items
    on
    items.id = transaction_items.item_id

Using WITH, I can create a temporary table within my query that I can SELECT from and treat it just like a regular old table.

I will put everything from the previous query in a parentheses and use WITH to give it the name transaction_summaries.

Then I’ll apply the date and customer filtering down below for a more readable query, to separate out all the JOIN logic from the actual WHERE filters that I want to apply on that data.

with transaction_summaries as (
  select
      transactions.date as transaction_date,
      sum(items.price) as total_cost,
      count(*) as number_of_items
  from
      transactions
  inner join
      customers
      on
      customers.id = transactions.customer_id
  inner join
      transaction_items
      on
      transactions.id = transaction_items.transaction_id
  inner join
      items
      on
      items.id = transaction_items.item_id
)

select 
        * 
from 
        transaction_summaries
where 
        first_name = 'beyonce'
        and 
        transaction_date > '2018–01–01'
order by 
        total_cost desc
limit 
        5

If you’re familiar with subqueries, this does a similar thing but makes the SQL far more readable, even if your query isn’t quite as performant as it would have been. This is essentially an implementation of the mantra “Don’t Repeat Yourself” that’s common in the world of programming.

Incredible. And love the SQL styling! 😍


Huge shout out to Ryan and the whole Warby Parker team for making this partnership happen. Special hat tips for behind-the-scenes support from:

  • Lon Binder, Chief Technology Officer, Warby Parker
  • Maddie Tierney, Executive Assistant, Warby Parker
  • Kayla Robbins, Executive Assistant, Warby Parker
  • Kaki Read, Senior Communications Manager, Warby Parker
  • Isabel Seely, Senior Brand Manager, Warby Parker

It’s been an absolute pleasure. And of course, the fam at Codecademy. You know who you are. Couldn’t do it without you.