Data and AI Concepts
Table of Contents
II Data and AI Foundation
Topics Covered: Data and AI Introduction, Mathematics, IT/Programming, Business Domain
In this section, I will build the foundation you need before we look at the components of a data and AI platform.
We will start from scratch: first we will cover the basic concepts of data and AI and how these fields are connected, and then we will focus on the core concepts of mathematics, IT/programming, and the business domain.
1 Data and AI Introduction
1.1 Data Concepts
Topics Covered: Data, Data Vs Information, DIKW Pyramid, Different Aspects of Data (Formats, Scope, Biases), Structured, Semi-structured and Unstructured Data, Data Usage (Scientific Research, Business Management, Finance, Governance), Data Analysis
Data is the backbone of data-driven AI, so let's first understand what data is.
Data is raw fact without any context, e.g. a number, symbol, character, word, code, or graph.
The word originated as the plural form of the Latin word ‘datum’, which means ‘a given fact’.
Broadly speaking, data can be anything captured in digital form, or the output of a sensing device or sense organ.
The terms data and information are often used interchangeably, which is not correct; we will cover the difference in an upcoming section.
Data, information, knowledge, and wisdom are closely related concepts, but each has its own meaning and its own role in relation to the others; we will touch on this soon.
1.1.2 Datum, Data and Dataset
Mostly we talk about data, but occasionally you may hear terms like datum or dataset, so let's understand the difference. A datum is a single piece of information, which can be treated as one observation. Data is the plural of datum, that is, multiple observations. A dataset is a homogeneous collection of data (each datum must have the same focus).
When data is processed and put into context, it becomes information, which humans can use in significant ways, e.g. making decisions or forecasting.
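The distinction between datum, data, and dataset can be sketched in a few lines of Python. This is only an illustration: the temperature readings, the field names, and the helper `is_homogeneous` are made up for this example, and "same focus" is interpreted here as "every record has the same fields".

```python
# A datum: one observation (a made-up temperature reading in Celsius).
datum = 21.5

# Data: several observations.
data = [21.5, 22.1, 19.8]

# Dataset: a homogeneous collection, where every record has the same fields.
dataset = [
    {"city": "Oslo", "temp_c": 21.5},
    {"city": "Rome", "temp_c": 22.1},
    {"city": "Lima", "temp_c": 19.8},
]


def is_homogeneous(records):
    """Return True if every record has the same set of fields (hypothetical check)."""
    if not records:
        return True
    fields = set(records[0])
    return all(set(record) == fields for record in records)


print(is_homogeneous(dataset))
```

Adding a record with a missing field, such as `{"city": "Pune"}`, would make the check fail, which is one simple way to notice that a collection no longer has a single focus.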
1.1.4 Knowledge and Wisdom
When we put relevant information to work in a specific domain, it becomes knowledge. And when that knowledge is enhanced with first-hand experience, it becomes wisdom.
Let's relate this to an example:
- The number ‘100’ is data.
- ‘100 miles’ is information.
- ‘100 miles is quite a long distance’ is knowledge.
- ‘100 miles is very difficult to walk’ is wisdom.
1.1.5 Different Aspects of Data
Formats of Data
We can classify data formats into three categories: structured, semi-structured, and unstructured.
- Structured data has a definite structure, like a table with rows and columns.
- Semi-structured data has some structure, like JSON, key-value pairs, or a graph database.
- Unstructured data has no specific structure, like photos, audio, and video files.
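As a rough illustration of the three formats, the sketch below loads one small example of each using only the Python standard library. The sample values (names, ages, tags) are invented for this example.

```python
import csv
import io
import json

# Structured: fixed rows and columns, here as CSV text.
csv_text = "id,name,age\n1,Asha,31\n2,Ben,27\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: JSON with nested and optional fields.
json_text = '{"id": 1, "name": "Asha", "tags": ["admin"], "address": {"city": "Pune"}}'
record = json.loads(json_text)

# Unstructured: raw bytes with no schema, e.g. the start of an image file.
blob = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of the PNG signature

print(rows[0]["name"])            # column access by header name
print(record["address"]["city"])  # nested key access
print(len(blob))                  # only the size is known without decoding
```

Note that every CSV row has the same columns, the JSON record can nest or omit fields freely, and the byte string carries no field names at all until something decodes it.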
Scope of Data
Data can be classified into two categories based on scope:
- Population, which means we have access to all the data
- Sample, which means only a portion of the data is available or feasible to collect
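The difference between a population and a sample can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the "population" is 10,000 synthetic values drawn from a normal distribution (mean 170, standard deviation 10), and the sample size of 100 is chosen arbitrarily.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Population: every value we care about (synthetic values for illustration).
population = [random.gauss(170, 10) for _ in range(10_000)]

# Sample: a random subset, drawn without replacement.
sample = random.sample(population, 100)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample mean is only an estimate of the population mean; drawing a different sample would generally give a slightly different value.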
Biases in Data
Bias in data means the over- or under-representation of a sub-population; it may not be intentional. Common forms include:
- Omission: using arguments from only one side
- Source selection: including more authoritative sources from one side
- Story selection: sharing stories that agree with one side
- Placement: unimportant stories get prominent placement on reputed media platforms
- Labelling: applying labels to one side while leaving the other side unlabelled
- Spin: stories providing only one interpretation of an event
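Over- and under-representation, as defined above, can be checked numerically by comparing each group's share in a sample with its share in the population. The sketch below is one possible way to do this; the function `representation_ratio` and the group labels "A" and "B" are hypothetical and only serve the illustration.

```python
from collections import Counter


def representation_ratio(population_labels, sample_labels):
    """Return sample share divided by population share for each group.

    A ratio above 1 suggests over-representation, below 1 under-representation.
    """
    population_counts = Counter(population_labels)
    sample_counts = Counter(sample_labels)
    ratios = {}
    for group, count in population_counts.items():
        population_share = count / len(population_labels)
        sample_share = sample_counts.get(group, 0) / len(sample_labels)
        ratios[group] = sample_share / population_share
    return ratios


# Made-up example: the population is split evenly, the sample is not.
population_labels = ["A"] * 50 + ["B"] * 50
sample_labels = ["A"] * 8 + ["B"] * 2

ratios = representation_ratio(population_labels, sample_labels)
print(ratios)
```

Here group "A" makes up half of the population but most of the sample, so its ratio is above 1, while group "B" falls below 1.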
1.1.6 Data Usage
Data is used in the following fields:
- Scientific research
- Business management
1.1.7 Data Analysis
- Data requirements
- Data collection
- Data processing
- Data cleaning
- Exploratory data analysis
- Data product
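Several of the steps listed above can be sketched as a tiny pipeline. This is an illustrative sketch only: the sales records, field names, and the functions `process`, `clean`, and `summarize` are invented for this example, and the final per-region summary stands in for a very simple data product.

```python
import statistics

# Data collection: raw records as they might arrive (made-up values).
raw = [
    {"region": "north", "sales": "120"},
    {"region": "south", "sales": ""},
    {"region": "north", "sales": "80"},
    {"region": "south", "sales": "100"},
]


def process(records):
    """Data processing: convert text fields into usable types."""
    processed = []
    for record in records:
        sales = float(record["sales"]) if record["sales"] else None
        processed.append({"region": record["region"], "sales": sales})
    return processed


def clean(records):
    """Data cleaning: drop records with missing sales values."""
    return [record for record in records if record["sales"] is not None]


def summarize(records):
    """Exploratory data analysis: average sales per region."""
    by_region = {}
    for record in records:
        by_region.setdefault(record["region"], []).append(record["sales"])
    return {region: statistics.mean(values) for region, values in by_region.items()}


summary = summarize(clean(process(raw)))
print(summary)
```

Dropping incomplete records is only one cleaning strategy; filling in missing values is another, and which one is appropriate depends on the data requirements defined in the first step.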
1.2 IT/Programming Concepts
Topics Covered: Technology, Information Technology, Data Structures and Algorithms, Data Processing and Storage, Data Models, Operational & Analytical Data, Databases, Data Warehouses, Streaming and Batch Data, ETL/ELT
1.3 AI Concepts
Topics Covered: Intelligence, Intelligent Agents, Applications (Web Search, Recommendation Systems, Self-driving Cars, Strategic Games), Aspects of AI (Search, Knowledge, Uncertainty, Optimization, Learning, Neural Networks, Language), Strong and Weak AI
1.4 From Data To AI
Topics Covered: Business Intelligence, Data Science, Machine Learning, Deep Learning, Artificial Intelligence
2.1 Linear Algebra
Topics Covered: Scalars, Vectors, Matrices and Tensors, Multiplying Matrices and Vectors, Identity and Inverse Matrices, Linear Dependence and Span, Norms, Special Kinds of Matrices and Vectors, Eigendecomposition, Singular Value Decomposition (SVD), The Moore Penrose Pseudoinverse, The Trace Operator, The Determinant, Principal Component Analysis
2.2 Multivariate Calculus
Topics Covered: Functions, Derivatives, Product Rule, Chain Rule, Integrals, Partial Derivatives, The Gradient, The Jacobian, The Hessian, Multivariate Chain Rule, Approximate Functions, Power Series, Linearization, Multivariate Taylor
2.3 Probability and Statistics
Topics Covered: Probability, Conditional Probability, Random Variables, Probability Distributions
Topics Covered: Statistics, Descriptive Statistics (Univariate, Bivariate, Multivariate Analysis, Function Models), Inferential Statistics (Sampling Distributions & Estimation, Hypothesis Testing, Correlation, Causation & Regression), Bayesian Statistics (Frequentist Vs Bayesian Statistics, Bayesian Inference, Test for Significance), Statistical Learning (Prediction & Inference, Parametric & Non-parametric methods, Prediction Accuracy and Model Interpretability, Bias-Variance Trade-Off)
3.1 Operating System Basics
*Topics Covered: *
3.2 Data Structures and Algorithms Basics
Topics Covered: Data Structures (Array, Linked List, Stack, Queue, Heap, Hashing, Binary Tree, Binary Search Tree, Graph, Matrix), Algorithms (Asymptotic Analysis, Searching and Sorting, Greedy Algorithms, Recursion, Dynamic Programming)
3.3 Programming Basics
*Topics Covered: *
3.4 Database Systems Basics
*Topics Covered: *
3. Cloud Computing
Topics Covered: Introduction, Public, Private and Hybrid Clouds, IaaS, PaaS and SaaS, Data and AI on Cloud, AWS, Azure and GCP
4 Business Domain
Topics Covered: Problem Solving, Problem Identification, Problem Definition, Prioritization, Root-Cause Analysis, Possible Solutions, Solution Evaluation, Cost-Benefit Analysis, Planning and Implementation
III Data and AI Components
Topics Covered: Data Governance, Data Architecture, Data Ingestion, Data Storage, Data Engineering, Data Science, Data Visualization, Data Operationalization
5 Data Governance
Topics Covered: Data Governance Basics, Why Data Governance is Important?, Aspects of Data Governance, How to do Data Governance?
6 Data Architecture
Topics Covered: Data Architecture Basics, Why Data Architecture is Required?, How to build Data Architecture?
7 Data Ingestion
Topics Covered: Data Ingestion Basics, Types of Data Ingestion, Tools for Data Ingestion
8 Data Storage
Topics Covered: Data Storage Basics, Types of Data Storage, Tools for Data Storage
9 Data Engineering
Topics Covered: Data Engineering Basics, Tools for Data Engineering, Building Data Pipelines
10 Data Science
Topics Covered: Data Science Basics, Overall Process, Algorithms, Tools for Data Science
11 Data Visualization
Topics Covered: Data Visualization Basics, Why Data Visualization is Important?, Tools for Data Visualization
12 Data Operationalization
Topics Covered: Operationalization Basics, Why Operationalization is Required?, Tools for Data and AI Operationalization
IV Data and AI Platforms
Topics Covered: Open Source, AWS, Azure, GCP, Databricks, Snowflake
13 Open Source
Topics Covered: Building Data and AI Platform in Open Source
Topics Covered: Building Data and AI Platform in AWS
Topics Covered: Building Data and AI Platform in Azure
Topics Covered: Building Data and AI Platform in GCP
Topics Covered: Building Data and AI Platform in Databricks
Topics Covered: Building Data and AI Platform in Snowflake
Topics Covered: SQL, Python, UNIX and Shell Scripting, Data Structure and Algorithms
Topics Covered: SQL, Data Models, ER Diagrams, Tables, Temporary Tables, Selecting (SELECT, FROM, DISTINCT), Filtering (WHERE, AND, OR, IN, NOT, BETWEEN, NULLs, Wildcards), Ordering (ORDER BY, DESC), Aggregating (GROUP BY, HAVING, AVG, COUNT, MAX, MIN), Subqueries, Joins (Cartesian, Inner, Outer <Left/Right>, Self), Sets (UNION, UNION ALL, INTERSECT), Aliases, Views, Common Table Expressions (WITH AS)
Topics Covered: Programming, Installation, Basic Syntax & Variable Types, Data Types and Conversion, Basic Operators and Loops, Functions, Exceptions and Modules, Data Science Specific Modules (NumPy, SciPy, pandas, Matplotlib, scikit-learn)
21 UNIX and Shell Scripting
Topics Covered: Operating System, Architecture, Basic UNIX Commands, Shell Scripting