Areas of Expertise

Use Case Discovery: Identify the most relevant opportunities for your business.

Use Case Validation: Analyze what solutions can work, and how they fit into your overall plan.

Corporate Training: Training courses to help your business grasp the fundamentals of data science, deep learning, and AI.

How We Work

We use an internally developed, industry-proven framework, with the goal of quickly delivering impact and value.

Our projects and workshops are:

• Fully remote

• Weeks, not months

• Collaborative in nature

• Business-centric

Course Modules

Audience

This course is suitable for a general audience including decision makers, business people, managers, system architects/engineers, business/systems analysts, and data scientists.

Prerequisites

No specific technical skills are expected.

Duration

1 Day

Learning Outcomes

The goal of this course is to provide a survey of current data analysis techniques, capabilities, and approaches so that students can integrate advanced analytics and data science techniques into their organization. The course discusses a brief history of data analytics and data science techniques, including common data challenges. It explains common machine learning algorithms, including support vector machines, decision trees, and random forests, and explores applications of unsupervised learning, including clustering, dimensionality reduction, and anomaly detection.

Course Topics

  • A brief history of computing and data science
  • Setting up data for a data science project
    • The 3 Vs of big data: volume, velocity, variety
    • Structured vs. unstructured data
    • Data acquisition
    • Data quality considerations
    • Data cleaning
  • Overview of machine learning
    • What is machine learning?
    • Supervised learning problems and algorithms
    • Unsupervised learning and algorithms: clustering, dimensionality reduction, anomaly detection
    • Current and emerging trends in data analytics

Audience

This course is suitable for a general audience including decision makers, business people, managers, system architects/engineers, business/systems analysts, and data scientists.

Prerequisites

No specific technical skills are expected.

Duration

1 Day

Learning Outcomes

In this course, participants will learn the Python environment, including the use of Jupyter Notebooks, basic Python syntax, NumPy, and data cleaning and aggregation with Pandas. The course is largely framed around helping students translate their Excel tasks into Python in order to introduce automation, increase robustness, reduce errors, and enable more in-depth and sophisticated analyses.

Course Topics

  • Introduction to basic Python syntax
    • Why we code and its usefulness
    • Basic control flow, e.g. “for” loops and “if” statements
    • The object-oriented paradigm
    • Python data structures
  • Introduction to NumPy and array manipulations
  • Introduction to Pandas
    • The Pandas DataFrame
    • I/O pathways
    • Descriptive statistics in Pandas
    • Merging, concatenating, and joining data
    • Data imputation
    • How to filter data in Pandas and filtering strategies
    • GroupBy operations and aggregations
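
As a rough illustration of how a spreadsheet task might translate into Pandas, the sketch below adds a derived column, filters rows, and aggregates by group. The table, its column names, and the threshold are hypothetical and are not taken from the course material.

```python
import pandas as pd

# Hypothetical sales records, standing in for rows of a spreadsheet.
sales = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "West"],
        "units": [10, 15, 7, 3, 20],
        "price": [2.0, 2.0, 3.0, 3.0, 3.0],
    }
)

# Derived column, similar to a formula filled down a spreadsheet column.
sales["revenue"] = sales["units"] * sales["price"]

# Row filter, similar to a spreadsheet filter (the threshold is arbitrary).
large_orders = sales[sales["units"] >= 7]

# GroupBy and aggregation, similar to a pivot table.
revenue_by_region = large_orders.groupby("region")["revenue"].sum()
print(revenue_by_region)
```

Because every step is written down as code, the same analysis can be rerun on a new export without repeating manual edits.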

Audience

This course is suitable for a general audience including business people, managers, system architects/engineers, business/systems analysts, and data scientists who need to learn SQL for their role.

Prerequisites

  • An understanding of basic programming and computer science concepts.
  • Hands-on experience with data manipulation

Duration

1 Day

Learning Outcomes

In this course, participants will learn:

  • About relational database management systems and relational tables
  • The types of SQL statements and table operations
  • How to build SQL queries, including complex multi-level nested queries
  • How to apply aggregate functions in querying
  • How to use the GROUP BY operation in queries
  • The types and execution of join and union operations
  • Advanced topics including indexing, triggers, date manipulation, and window functions

Course Topics

  • What is SQL?
  • (R)DBMS
  • What you get with RDBMS (object types)
  • Types of database systems
  • Types of SQL statements
  • Building Queries
  • Joins & Unions
  • Subqueries
  • HAVING
  • Indexing
  • SQL & Python
  • Triggers
  • Date & String Manipulations
  • Window Functions
  • Describing a database & SQL-like frameworks
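
To give a flavor of how several of these topics combine, here is a minimal sketch that runs a join, an aggregate, GROUP BY, and HAVING from Python. It assumes SQLite through Python's standard `sqlite3` module purely for convenience; the tables, names, and amounts are hypothetical, and the course itself may use a different database system.

```python
import sqlite3

# In-memory database so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and data, for illustration only.
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES
        (1, 1, 40.0), (2, 1, 70.0), (3, 2, 15.0), (4, 3, 90.0), (5, 3, 30.0);
    """
)

# Join the two tables, aggregate per customer, and keep only
# customers whose total spend exceeds 100.
query = """
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    HAVING SUM(o.amount) > 100
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for name, n_orders, total in rows:
    print(name, n_orders, total)

conn.close()
```

WHERE filters individual rows before grouping, while HAVING filters the groups after aggregation, which is why the condition on the total appears in the HAVING clause.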

Audience

This course is suitable for software developers, data scientists, and others who have some programming experience.

Prerequisites

  • Familiarity with the Python programming language.
  • An understanding of basic programming and computer science concepts, including program flow, loops, and functions.

Duration

1 Day

Learning Outcomes

In this course, participants will learn:

  • What machine learning is and how it is different from traditional programming approaches to solving problems.
    • We will cover an overview of the field as a whole and the subtypes of problems studied in machine learning
  • How to evaluate machine learning models.
    • We will go over common evaluation metrics and discuss when and why each one is appropriate.
  • How to set up data for training and testing, including different ways of splitting the data and what pre-processing steps are commonly needed.

Course Topics

  • The machine learning workflow
  • How features are used in machine learning
  • The difference between supervised and unsupervised learning
  • The difference between regression and classification
  • Evaluation metrics for regression and classification
  • Introduction to scikit-learn
  • Common cleaning and pre-processing steps
  • Splitting data into train and test sets
  • Error Analysis
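
The idea behind splitting data and scoring predictions can be sketched with the standard library alone. The helper names `split_train_test`, `accuracy`, and `mean_absolute_error` below are hypothetical, written for illustration; in practice, scikit-learn provides `train_test_split` and a range of metrics for the same purpose.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and hold out a fraction for testing."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Classification metric: share of predictions that match the labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Regression metric: average absolute gap between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical dataset of (feature, label) pairs.
data = [(x, int(x >= 10)) for x in range(20)]
train, test = split_train_test(data)

print(len(train), len(test))
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
print(mean_absolute_error([3.0, 5.0], [2.0, 7.0]))
```

Holding out the test rows before any model is fitted is what lets the metrics estimate performance on data the model has not seen.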

Audience

This course is suitable for a general audience including business people, managers, system architects/engineers, business/systems analysts, and data scientists.

Prerequisites

Minimal exposure to the Python programming language

Duration

1 Day

Learning Outcomes

In this course, participants will learn:

  • How machine learning differs from standard programmed solutions.
  • The different types of machine learning algorithms, what problems they are meant to solve, and how to evaluate the different algorithms.
  • What steps are needed to begin work on a machine learning problem, including cleaning and other pre-processing steps.
  • An overview of supervised learning algorithms and examples of how to use them in the scikit-learn library.
  • An overview of unsupervised learning algorithms and examples of how to use them in the scikit-learn library.
  • How to integrate machine learning models in an enterprise environment, including an overview of complementary technologies.

Course Topics

  • What is Machine Learning?
  • Supervised and Unsupervised Learning
  • Evaluation of Machine Learning Algorithms
  • Pre-processing steps
  • Linear Regression
  • Logistic Regression
  • Decision Trees
  • Random Forest
  • K-means Clustering
  • Gaussian-based clustering
  • Dimensionality Reduction with PCA
  • DBSCAN for Anomaly Detection
  • Local Outlier Factor
  • Isolation Forest
  • Using Flask to deploy models
  • Using Spark to run models over big data

Audience

This course is suitable for software developers, data scientists, and project managers with some programming knowledge.

Prerequisites

  • Familiarity with the Python programming language.
  • An understanding of basic programming and computer science concepts, including program flow, loops, and functions.
  • Basic familiarity with the pandas Python library.

Duration

1/2 Day

Learning Outcomes

In this course, participants will learn:

  • What supervised learning is and what types of problems it can solve.
  • Common supervised learning algorithms, their differences, and how to use these algorithms.
  • Common concerns to consider when using supervised models, such as sample vs population differences, bias, and variance.

Course Topics

  • Introduction to Supervised Learning
  • Linear Regression
  • Logistic Regression
  • Support Vector Machines
  • k-Nearest Neighbors
  • Decision Trees
  • Random Forests
  • Gradient-Boosted Trees
  • Sample vs population concerns
  • The bias/variance trade-off

Audience

This course is suitable for software developers, data scientists, and project managers with some programming knowledge.

Prerequisites

  • Familiarity with the Python programming language.
  • An understanding of basic programming and computer science concepts, including program flow, loops, and functions.
  • Basic familiarity with the pandas Python library.

Duration

1/2 Day

Learning Outcomes

In this course, participants will learn:

  • The common subtypes of unsupervised learning and what type of exploration they are useful for.
  • Different clustering algorithms and distance calculations and when each one might be useful.
  • How dimensionality reduction works and its use in both visualization and feature extraction.
  • How anomaly detection can be used to find instances that stand out statistically, even without knowing any labels.

Course Topics

  • Introduction to unsupervised learning
  • Distance measures in high-dimensional spaces
  • K-Means clustering
  • Hierarchical clustering
  • Gaussian-based clustering
  • Uses of dimensionality reduction
  • Principal Component Analysis
  • DBSCAN for Anomaly detection
  • Local Outlier Factor
  • Isolation Forest

Audience

This course is suitable for a technical audience including managers, system architects/engineers, and data scientists.

Prerequisites

  • An understanding of basic programming, including control flow, loops, and functions, as well as familiarity with computer science basics such as computer architecture.
  • Familiarity with the Python programming language.

Duration

2 Days

Learning Outcomes

In this course, participants will learn:

  • The basics of PySpark including the Hadoop environment and HDFS basics, the resilient distributed dataset (RDD) and its manipulation, and DataFrames.
  • How to aggregate data at scale using SparkSQL, write User Defined Functions (UDFs), apply optimization strategies for PySpark, and configure the environment for performance.

Course Topics

Introduction to PySpark
  • Hadoop and HDFS basics
  • PySpark Overview
  • Resilient Distributed Datasets (RDDs)
  • DataFrames
  • SparkSQL
Advanced PySpark
  • User Defined Functions (UDFs)
  • Optimization Strategies
  • Data Formats
  • Configuring Spark

Audience

This course is suitable for any developer, data scientist, or software engineer who is comfortable with the basics of machine learning and vanilla neural networks.

Prerequisites

  • Familiarity with the Python programming language.
  • An understanding of basic programming and computer science concepts, including program flow, loops, and functions.
  • Basic familiarity with machine learning concepts, specifically linear regression and vanilla neural networks (multilayer perceptrons).

Duration

1 Day

Learning Outcomes

In this course, participants will learn:

  • The fundamentals of applying deep learning to computer vision. We cover common computer vision paradigms and discuss how deep convolutional neural networks can be applied to them.
    • Image classification
    • Object detection
    • Image segmentation
  • How the deep learning components fit into the overall computer vision pipeline

Course Topics

  • Convolutional Neural Network basics
  • Introduction to various computer vision paradigms
    • Image Classification
    • Single object detection
    • Multiple object detection
    • Image Segmentation
  • Real life use cases
  • Neural Network Architecture overviews

Audience

This course is suitable for any developer, data scientist, or software engineer who is interested in the basics of NLP and how, through the use of libraries, it can be applied to many real-world situations.

Prerequisites

  • Familiarity with the Python programming language.
  • An understanding of basic programming and computer science concepts, including program flow, loops, and functions.
  • Basic familiarity with machine learning concepts and exposure to the scikit-learn Python library.

Duration

1 Day

Learning Outcomes

In this course, participants will learn:

  • The fundamentals of NLP. We cover common lower-level tasks and discuss how the information gleaned from them is useful. We will use common Python libraries to show how this processing can be done.
  • Common higher-level applications in NLP, their uses, and their potential drawbacks. We will discuss what the training data looks like for each application, some applicable algorithms, and libraries that implement these algorithms.
  • The basics of deep learning for NLP and what deep learning can do for NLP. Almost all NLP uses deep learning in some form or another, but you do not have to be a deep learning expert to use the techniques!

Course Topics

Introduction to NLP Concepts and Applications
  • Part of Speech Tagging
  • Named Entity Recognition
  • Tokenization
  • Lemmatization
  • Text Classification
  • Topic Modeling
  • Sentiment Analysis
  • Information Extraction
  • Using the Python libraries spaCy and Gensim
Introduction to Deep Learning in NLP
  • Word Vectors
  • Sentence Embeddings
  • Machine Learning over Deep Representations of Language

Audience

This course is suitable for data engineers, data analysts, and project managers who would like an in-depth working knowledge of Accumulo development and management.

Prerequisites

  • Familiarity with the Java programming language.
  • An understanding of basic programming and computer science concepts, including program flow, loops, and functions.

Duration

5 Days

Learning Outcomes

This course provides an in-depth introduction to Apache Accumulo. It is aimed at developers, administrators, and managers who will be implementing solutions using Accumulo. The course uses hands-on development to reinforce the material. Accumulo’s background and storage design enable developers to create efficient data structures that allow datasets to scale to trillions of records and hundreds of petabytes. A deep dive into system administration allows administrators to create a secure and robust system with performance that scales linearly. Managers will also gain insight into the benefits of using Accumulo and the ability to transition from RDBMS to NoSQL solutions.

Course Topics

Accumulo Introduction and Basics
  • Hadoop and Accumulo architecture
  • Writing to Accumulo
  • Reading from Accumulo
  • Compactions
  • Column visibility
  • Data layout and schema design
  • Locality groups
Accumulo Systems and Administration
  • Accumulo Installation
  • Accumulo Updates
  • Accumulo-wide configuration
  • Table administration and table configuration
  • RFiles
  • Hadoop, HDFS, and YARN considerations
  • Accumulo metrics and monitoring
  • Tracing
  • Accumulo on AWS best practices
Advanced Accumulo Topics and Table Design
  • Iterators
  • Survey of Accumulo table design and index types
Spark and Accumulo
  • Reading from Accumulo with Spark
  • Writing to Accumulo from Spark
  • Architectural discussion on how Spark and Accumulo work together
Graph Indexing
  • Building, querying, and deploying an edge-node graph
  • Iterators for graphs in Accumulo
Advanced Accumulo Ingestion
  • Advanced discussion on how writes to Accumulo work
  • Basic architecture patterns for Ingestion
  • Using NiFi to build data flows into Accumulo
  • Bulk loading strategies into Accumulo

Audience

This course is suitable for a general audience including business people, managers, system architects/engineers, business/systems analysts, and data scientists.

Prerequisites

  • Familiarity with the Python programming language.
  • An understanding of basic programming and computer science concepts, including program flow, loops, and functions.

Duration

1/2 Day

Learning Outcomes

In this course, participants will learn:

  • Why proper visualizations are crucial in understanding data and sharing that understanding.
  • The roles that visualization plays in “exploration” vs. “explanation” and how to properly produce visualizations that fit different needs.
  • Visualization theory including the choice of visualization and the role that color, styling, and presentation play in conveying specific meanings.
  • How to mindfully construct visualizations around considerations like the medium through which they will be shared, color blindness, and the prioritization of their key message.

Course Topics

  • Visualization use cases
  • Demonstration of Python’s plotting capabilities
  • Comparison of Python’s plotting libraries
  • A deep dive into Matplotlib
  • Visualization theory
  • Special plotting use cases