Artificial Intelligence Lab

The AI Lab is a fully remote, collaborative work environment shared by Atos and our clients. Our goal is to discover, scope, and prototype potential AI solutions, helping clients get from where they are now to where they want to be.

Areas of Expertise

Ideation
Identify the most relevant opportunities for your business.
Problem Analysis
Analyze what solutions can work, and how they fit into your overall plan.
Corporate Training
Training courses to help your business grasp the fundamentals of data science, deep learning, and AI.

How We Work

We use an internally developed, industry-proven framework designed to deliver impact and value quickly.

Our projects and workshops are:

Fully Remote
Weeks, Not Months
Collaborative in Nature
Business-Centric

Corporate Training Courses

Introduction to Data Analytics

Audience

This course is suitable for a general audience including decision makers, business people, managers, system architects/engineers, business/systems analysts, and data scientists.

Prerequisites

No specific technical skills are expected.

Duration

1 Day

Learning Outcomes

This course surveys current data analysis techniques, capabilities, and approaches so that students can integrate advanced analytics and data science into their organization. It covers a brief history of data analytics and data science, along with common data challenges. Common machine learning algorithms, including support vector machines, decision trees, and random forests, are explained. Applications of unsupervised learning, including clustering, dimensionality reduction, and anomaly detection, are explored.

Course Topics

A brief history of computing and data science
Setting up data for a data science project
- The three Vs of big data: volume, velocity, variety
- Structured vs. unstructured data
- Data acquisition
- Data quality considerations
- Data cleaning

Overview of machine learning
- What is machine learning?
- Supervised learning problems and algorithms
- Unsupervised learning and algorithms: clustering, dimensionality reduction, anomaly detection
- Current and emerging trends of data analytics

Data Analysis with Python

Audience

This course is suitable for a general audience including decision makers, business people, managers, system architects/engineers, business/systems analysts, and data scientists.

Prerequisites

No specific technical skills are expected.

Duration

1 Day

Learning Outcomes

In this course, participants will learn to work in the Python environment, including Jupyter Notebooks, basic Python syntax, NumPy, and data cleaning and aggregation with Pandas. The course is largely framed around helping students translate their Excel tasks into Python in order to introduce automation, increase robustness, reduce errors, and enable more in-depth and sophisticated analyses.

Course Topics

Introduction to basic Python syntax
- Why we code and its usefulness
- Basic control flow, e.g. “for” loops and “if” statements
- The object-oriented paradigm
- Python data structures
Introduction to NumPy and array manipulations

Introduction to Pandas
- The Pandas DataFrame
- I/O pathways
- Descriptive statistics in Pandas
- Merging, concatenating, and joining data
- Data imputation
- How to filter data in Pandas and filtering strategies
- GroupBy operations and aggregations

Introduction to SQL

Audience

This course is suitable for a general audience including business people, managers, system architects/engineers, business/systems analysts, and data scientists who need to learn SQL for their role.

Prerequisites

An understanding of basic programming and computer science concepts.
Hands-on experience with data manipulation

Duration

1 Day

Learning Outcomes

In this course, participants will learn:

About relational database management systems and relational tables
The types of SQL statements and table operations
How to build SQL queries, including complex multi-level nested queries
How to apply aggregate functions in querying
How to use the GROUP BY operation in queries
The types and execution of join and union operations
Advanced topics including indexing, triggers, date manipulation, and window functions

Course Topics

What is SQL?
(R)DBMS
What you get with RDBMS (object types)
Types of database systems
Types of SQL statements
Building Queries
Joins & Unions
Subqueries

HAVING
Indexing
SQL & Python
Triggers
Date & String Manipulations
Window Functions
Describing a database & SQL-like frameworks
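
As a rough illustration of several topics above (joins, aggregate functions, GROUP BY, HAVING, and running SQL from Python), here is a minimal sketch using Python's built-in sqlite3 module. The tables, column names, sample rows, and the top_customers helper are hypothetical and chosen only for this example; they are not taken from the course material.

```python
import sqlite3

def top_customers(min_total):
    """Return (name, order_count, total) for customers whose total spend is at least min_total."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    # Hypothetical schema and sample data, for illustration only.
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 70.0), (3, 2, 15.0), (4, 3, 90.0);
        """
    )
    rows = conn.execute(
        """
        SELECT c.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id   -- join orders to their customer
        GROUP BY c.name                            -- one result row per customer
        HAVING SUM(o.amount) >= ?                  -- filter on the aggregated value
        ORDER BY total DESC
        """,
        (min_total,),
    ).fetchall()
    conn.close()
    return rows

print(top_customers(50))
```

WHERE filters individual rows before grouping, while HAVING filters the grouped results, which is why the spend threshold sits in the HAVING clause here.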

Introduction to Machine Learning

Audience

This course is suitable for software developers, data scientists, and others who have some programming experience.

Prerequisites

Familiarity with the Python programming language.
An understanding of basic programming and computer science concepts, including program flow, loops, and functions.

Duration

1 Day

Learning Outcomes

In this course, participants will learn:

What machine learning is and how it is different from traditional programming approaches to solving problems.
- We will give an overview of the field as a whole and the subtypes of problems studied in machine learning.
How to evaluate machine learning models.
- We will go over common evaluation metrics and discuss when and why each one is appropriate.
How to set up data for training and testing, including different ways of splitting the data and what pre-processing steps are commonly needed.

Course Topics

The machine learning workflow
How features are used in machine learning
The difference between supervised and unsupervised learning
The difference between regression and classification
Evaluation metrics for regression and classification

Introduction to scikit-learn
Common cleaning and pre-processing steps
Splitting data into train and test sets
Error Analysis
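
To make the train/test split and evaluation-metric topics above concrete, here is a minimal sketch in plain Python. It deliberately avoids scikit-learn so that it runs anywhere; the function names (split_train_test, accuracy, mean_absolute_error) and the default 25% test fraction are assumptions made for this illustration, not the scikit-learn API or the course's own code.

```python
import random

def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(y_true, y_pred):
    """Classification metric: fraction of predictions that match the labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Regression metric: average absolute difference between target and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

train, test = split_train_test(range(8), test_fraction=0.25, seed=1)
print(len(train), len(test))
print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))
print(mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```

Keeping the test rows out of training is what lets a metric computed on them estimate how the model behaves on data it has not seen.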

Introduction to Machine Learning for Business

Audience

This course is suitable for a general audience including business people, managers, system architects/engineers, business/systems analysts, and data scientists.

Prerequisites

Minimal exposure to the Python programming language

Duration

1 Day

Learning Outcomes

In this course, participants will learn:

How machine learning differs from standard programmed solutions.
The different types of machine learning algorithms, what problems they are meant to solve, and how to evaluate the different algorithms.
What steps are needed to begin work on a machine learning problem, including cleaning and other pre-processing steps.

An overview of supervised learning algorithms and examples of how to use them in the scikit-learn library.
An overview of unsupervised learning algorithms and examples of how to use them in the scikit-learn library.
How to integrate machine learning models in an enterprise environment, including an overview of complementary technologies.

Course Topics

What is Machine Learning?
Supervised and Unsupervised Learning
Evaluation of Machine Learning Algorithms
Pre-processing steps
Linear Regression
Logistic Regression
Decision Trees
Random Forest

K-means Clustering
Gaussian-based clustering
Dimensionality Reduction with PCA
DBSCAN for Anomaly Detection
Local Outlier Factor
Isolation Forest
Using Flask to deploy models
Using Spark to run models over big data

Introduction to Supervised Learning

Audience

This course is suitable for software developers, data scientists, and project managers with some programming knowledge.

Prerequisites

Familiarity with the Python programming language.
An understanding of basic programming and computer science concepts, including program flow, loops, and functions.
Basic familiarity with the pandas Python library.

Duration

1/2 Day

Learning Outcomes

In this course, participants will learn:

What supervised learning is and what types of problems it can solve.
Common supervised learning algorithms, their differences, and how to use these algorithms.
Common concerns to consider when using supervised models, such as sample vs population differences, bias, and variance.

Course Topics

Introduction to Supervised Learning
Linear Regression
Logistic Regression
Support Vector Machines
k-Nearest Neighbors

Decision Trees
Random Forests
Gradient-Boosted Trees
Sample vs population concerns
The bias/variance trade-off

Introduction to Unsupervised Learning

Audience

This course is suitable for software developers, data scientists, and project managers with some programming knowledge.

Prerequisites

Familiarity with the Python programming language.
An understanding of basic programming and computer science concepts, including program flow, loops, and functions.
Basic familiarity with the pandas Python library.

Duration

1/2 Day

Learning Outcomes

In this course, participants will learn:

The common subtypes of unsupervised learning and what type of exploration they are useful for.
Different clustering algorithms and distance calculations and when each one might be useful.
How dimensionality reduction works and its use in both visualization and feature extraction.
How anomaly detection can be used to find instances that stand out statistically, even without knowing any labels.

Course Topics

Introduction to unsupervised learning
Distance measures in high-dimensional spaces
K-Means clustering
Hierarchical clustering
Gaussian-based clustering

Uses of dimensionality reduction
Principal Component Analysis
DBSCAN for anomaly detection
Local Outlier Factor
Isolation Forest
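
As a small illustration of the distance-measure topic above, the following sketch implements three common distances in plain Python. The function names and sample points are hypothetical and chosen for this example only; which measure suits a given clustering problem depends on the data.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(euclidean_distance((0, 0), (3, 4)))
print(manhattan_distance((0, 0), (3, 4)))
print(cosine_distance((1, 0), (0, 1)))
```

Cosine distance ignores vector length and compares direction only, which is one reason it is often considered for high-dimensional data such as text features.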

Introduction to PySpark

Audience

This course is suitable for a technical audience including managers, system architects/engineers, and data scientists.

Prerequisites

An understanding of basic programming, including control flow, loops, and functions, as well as familiarity with computer science basics such as computer architecture.
Familiarity with the Python programming language.

Duration

2 Days

Learning Outcomes

In this course, participants will learn:

The basics of PySpark, including the Hadoop environment and HDFS basics, the resilient distributed dataset (RDD) and its manipulation, and DataFrames.
How to aggregate data at scale using SparkSQL, write User Defined Functions (UDFs), apply optimization strategies for PySpark, and configure the environment for performance.

Course Topics

Introduction to PySpark

Hadoop and HDFS basics
PySpark Overview
Resilient Distributed Datasets (RDDs)
DataFrames
SparkSQL

Advanced PySpark

User Defined Functions (UDFs)
Optimization Strategies
Data Formats
Configuring Spark

Introduction to Machine Vision with Deep Learning

Audience

This course is suitable for any developer, data scientist, or software engineer who is comfortable with the basics of machine learning and vanilla neural networks.

Prerequisites

Familiarity with the Python programming language.
An understanding of basic programming and computer science concepts, including program flow, loops, and functions.
Basic familiarity with machine learning concepts, specifically linear regression and vanilla neural networks (multilayer perceptrons).

Duration

1 Day

Learning Outcomes

In this course, participants will learn:

The fundamentals of applying deep learning to computer vision. We cover common computer vision paradigms and discuss how deep convolutional neural networks can be applied to them.
- Image classification
- Object detection
- Image segmentation
How the deep learning components fit into the overall computer vision pipeline

Course Topics

Convolutional Neural Network basics
Introduction to various computer vision paradigms
- Image Classification
- Single object detection
- Multiple object detection
- Image Segmentation

Real life use cases
Neural Network Architecture overviews

Introduction to Natural Language Processing

Audience

This course is suitable for any developer, data scientist, software engineer, or similar practitioner who is interested in the basics of NLP and in how, through the use of libraries, it can be applied to many real-world situations.

Prerequisites

Familiarity with the Python programming language.
An understanding of basic programming and computer science concepts, including program flow, loops, and functions.
Basic familiarity with machine learning concepts and exposure to the scikit-learn Python library.

Duration

1 Day

Learning Outcomes

In this course, participants will learn:

The fundamentals of NLP. We cover common lower-level tasks and discuss how the information gleaned from them is useful. We will use common Python libraries to show how this processing can be done.
Common higher-level applications in NLP, their uses, and their potential drawbacks. We will discuss what the training data looks like for each application, some applicable algorithms, and libraries that implement these algorithms.
The basics of deep learning for NLP and what deep learning can do for NLP. Almost all NLP uses deep learning in some form or another, but you do not have to be a deep learning expert to use the techniques!

Course Topics

Introduction to NLP Concepts and Applications

Part of Speech Tagging
Named Entity Recognition
Tokenization
Lemmatization
Text Classification
Topic Modeling
Sentiment Analysis
Information Extraction
Using the Python libraries spaCy and Gensim

Introduction to Deep Learning in NLP

Word Vectors
Sentence Embeddings
Machine Learning over Deep Representations of Language

Introduction to Accumulo

Audience

This course is suitable for data engineers, data analysts, and project managers who would like an in-depth working knowledge of Accumulo development and management.

Prerequisites

Familiarity with the Java programming language.
An understanding of basic programming and computer science concepts, including program flow, loops, and functions.

Duration

5 Days

Learning Outcomes

This course provides an in-depth introduction to Apache Accumulo. It is aimed at developers, administrators, and managers who will be implementing solutions using Accumulo, and it uses hands-on development to reinforce the material. Accumulo’s background and storage design let developers create efficient data structures that allow datasets to scale to trillions of records and hundreds of petabytes. A deep dive into system administration allows administrators to create a secure and robust system with performance that scales linearly. Managers will also gain insight into the benefits of using Accumulo and into transitioning from RDBMS to NoSQL solutions.

Course Topics

Accumulo Introduction and Basics

Hadoop and Accumulo architecture
Writing to Accumulo and compactions
Reading from Accumulo
Compactions
Column visibility
Data layout and schema design
Locality groups

Accumulo Systems and Administration

Accumulo Installation
Accumulo Updates
Accumulo-wide configuration
Table administration and table configuration
RFiles
Hadoop, HDFS, and YARN considerations
Accumulo metrics and monitoring
Tracing
Accumulo on AWS best practices

Advanced Accumulo Topics and Table Design

Iterators
Survey of Accumulo table design and index types

Spark and Accumulo

Reading from Accumulo with Spark
Writing to Accumulo from Spark
Architectural discussion on how Spark and Accumulo work together

Graph Indexing

Building, querying, and deploying an edge-node graph
Iterators for graphs in Accumulo

Advanced Accumulo Ingestion

Advanced discussion on how writes to Accumulo work
Basic architecture patterns for Ingestion
Using NiFi to build data flows into Accumulo
Bulk loading strategies into Accumulo

Data Visualization

Audience

This course is suitable for a general audience including business people, managers, system architects/engineers, business/systems analysts, and data scientists.

Prerequisites

Familiarity with the Python programming language.
An understanding of basic programming and computer science concepts, including program flow, loops, and functions.

Duration

1/2 Day

Learning Outcomes

In this course, participants will learn:

Why proper visualizations are crucial in understanding data and sharing that understanding.
The roles that visualization plays in “exploration” vs. “explanation” and how to produce visualizations that fit different needs.
Visualization theory, including the choice of visualization and the role that color, styling, and presentation play in conveying specific meanings.
How to mindfully construct visualizations around considerations such as the medium in which they will be shared, color blindness, and the prioritization of their key message.

Course Topics

Visualization use cases
Demonstration of Python’s plotting capabilities
Comparison of Python’s plotting libraries

A deep dive into Matplotlib
Visualization theory
Special plotting use cases
