Data Science Online Training

Data Science in Python Online Training Course Content


Linux OS Basics and Shell Scripting

  • Linux Commands
  • File System
  • vi editor
  • Advanced Linux Commands
  • System Administration
  • Advanced System Administration
  • Grep
  • Shell Scripting with examples (all types of loops)
  • Regular Expressions
  • SED
  • AWK

MODULE 1: Introduction To Python - Data Science


  • Installation of Anaconda setup (Data Science Development Environment)
  • Installation of PyCharm
  • Working with Python Lists, List operations, Functions
  • Python Tuples, working and functions
  • Sets and Dictionaries - operations and working with them
  • More on Python Strings
  • Python Dates and Times
  • More on functions
  • Advanced Python: Lambda functions
  • List Comprehensions
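
As a taste of the last two topics, here is a minimal sketch of a lambda used as a sort key alongside list and dictionary comprehensions. The sample data and the pass mark of 65 are assumptions made up for illustration and are not part of the course material.

```python
# Hypothetical sample data, used only for illustration.
scores = [("Asha", 82), ("Ravi", 67), ("Meena", 91), ("John", 58)]

# Lambda as a sort key: order the records by score, highest first.
ranked = sorted(scores, key=lambda record: record[1], reverse=True)

# List comprehension with a filter: names of students scoring 65 or more.
passed = [name for name, score in ranked if score >= 65]

# Dictionary comprehension: map each name to a pass/fail label.
labels = {name: ("pass" if score >= 65 else "fail") for name, score in scores}

print(ranked)
print(passed)
print(labels)
```

The lambda keeps the sort rule next to the call that uses it, which is usually the main reason to prefer it over a separately named function for one-off keys.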

MODULE 2: Data Analysis


1. Data Wrangling

  • All about files
  • Importing and exporting data with CSV files
  • XLRD module - working with .xls and .xlsx formats
  • JSON data
  • XML data
  • Relational databases
  • SQL in Python
  • Data quality analysis

2. DATA MANIPULATION - Cleaning - Munging - Cleansing Data with Python

  • Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, data type conversions, renaming, formatting, etc.)
  • Data manipulation tools (Operators, Functions, Packages, control structures, Loops, arrays, etc.)
  • Python Built-in Functions (Text, numeric, date, utility functions)
  • Python User Defined Functions
  • Stripping out extraneous information
  • Normalising data
  • Formatting data
  • Important Python modules for data manipulation (Pandas, NumPy, re, math, string, datetime, etc.)
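
Several of the manipulation steps listed above (stripping extraneous whitespace, data type conversion, removing duplicates, a derived variable, filtering, and sorting) can be sketched in a few lines. This is an illustrative sketch only: it uses the standard library rather than Pandas so that it runs without extra installs, and the data, the column names, and the threshold of 100 are all hypothetical.

```python
import csv
import io

# Hypothetical raw data with a duplicate row, stray whitespace,
# and numbers stored as text.
RAW = """name,city,sales
 Asha ,Pune,120
Ravi,Delhi,95
 Asha ,Pune,120
Meena,Chennai,210
"""

def clean_rows(text):
    """Strip whitespace, convert types, drop duplicates, add a derived column."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {
            "name": row["name"].strip(),   # strip extraneous whitespace
            "city": row["city"].strip(),
            "sales": int(row["sales"]),    # data type conversion
        }
        key = (record["name"], record["city"], record["sales"])
        if key in seen:                    # skip exact duplicates
            continue
        seen.add(key)
        record["high_value"] = record["sales"] >= 100  # derived variable
        cleaned.append(record)
    # Sort by sales, largest first.
    return sorted(cleaned, key=lambda r: r["sales"], reverse=True)

rows = clean_rows(RAW)
# Filtering: keep only the names flagged by the derived column.
high_value_names = [r["name"] for r in rows if r["high_value"]]
print(rows)
print(high_value_names)
```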

3. DATA VISUALIZATION

  • Introduction to exploratory data analysis
  • Descriptive statistics, Frequency Tables and summarization
  • Univariate Analysis (Distribution of data & Graphical Analysis)
  • Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
  • Creating Graphs (Bar, pie, line chart, histogram, boxplot, scatter, density, etc.)
  • Important Packages for Exploratory Analysis (NumPy Arrays, Matplotlib, seaborn, Pandas, scipy.stats, etc.)

4. DATA ANALYSIS WITH PANDAS

  • The Series Data Structure
  • Querying a Series
  • The DataFrame Data Structure
  • DataFrame Indexing and Loading
  • Querying a DataFrame
  • Indexing a DataFrame
  • Understanding business problem
  • Selecting columns from Pandas Data Structures
  • Treating missing values, outliers, NaN values
  • Creating new columns
  • Aggregating data (using groupby, merge, pivot, lambda)
  • Identifying unique values in data
  • Filter Data
  • Using basic functionality of Pandas API

MODULE 3: Mathematics


1. STATISTICS

  • Basic Statistics - Measures of Central Tendencies and Variance
  • Building blocks - Probability Distributions - Normal distribution - Central Limit Theorem
  • Inferential Statistics - Sampling - Concept of Hypothesis Testing
  • Statistical Methods - Z/t-tests (One sample, independent, paired), ANOVA, Correlations and Chi-square
  • Important modules for statistical methods: NumPy, SciPy, Pandas

2. PROBABILITY

  • Probability, Conditional Probability
  • Basics of Probability, Independent and Dependent events
  • Conditional Probability and Bayes' Theorem
  • Continuous Probability Distributions
  • Mean, Median, Mode, Range
  • Determination of statistical techniques
  • Standard Deviation, Variance, Covariance, Correlation
  • Outliers
  • Distribution of Data – Normal (Gaussian), Binomial
  • Different types of Data: Continuous, Categorical, Range
  • Testing of Hypothesis, which covers Level of Significance (LOS), Level of Confidence, P-Value, T-test, Z-test, ANOVA Test, Chi-Square Test
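
The descriptive measures named in this section (mean, median, mode, range, variance, standard deviation) and a simple outlier check can be illustrated with Python's built-in statistics module. This is a sketch under assumptions: the data set is made up, population (not sample) formulas are used, and the cut-off of 2 standard deviations is one common rule of thumb rather than a fixed standard.

```python
import statistics

# Hypothetical sample, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
value_range = max(data) - min(data)

# Population variance and standard deviation (divide by n, not n - 1).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Flag values whose z-score is at least 2 in absolute value.
# The threshold of 2 is an assumption chosen for this example.
outliers = [x for x in data if abs((x - mean) / std_dev) >= 2]

print(mean, median, mode, value_range, variance, std_dev)
print(outliers)
```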

MODULE 4: Machine Learning


1. SUPERVISED LEARNING AND MODEL BUILDING

  • Process of Machine Learning
  • Model Building based on Data sets
  • Splitting Data: Training and Test sets
  • Regression Analysis (Linear, Multiple, Logistic Regression)
  • Classification concepts and Distance Functions
  • K-NN Algorithm concept and demonstration with data sets
  • Bayes Classification concept and demonstration with data sets
  • Decision Tree Algorithm concept and demonstration with data sets
  • Random Forests - Ensembling Techniques and Algorithms
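
To make the train/test split and the K-nearest-neighbours idea concrete, here is a minimal pure-Python sketch. It is not how a library such as scikit-learn implements these steps. The helper names (train_test_split, knn_predict), the two-cluster data set, and the choices of k=3 and a 25% test share are all assumptions for illustration. It needs Python 3.8 or later for math.dist.

```python
import math
import random
from collections import Counter

def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]

def knn_predict(train, point, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: two well-separated clusters labelled "A" and "B".
data = [
    ((1.0, 1.1), "A"), ((1.2, 0.9), "A"), ((0.8, 1.0), "A"), ((1.1, 1.3), "A"),
    ((5.0, 5.2), "B"), ((5.3, 4.9), "B"), ((4.8, 5.1), "B"), ((5.1, 4.7), "B"),
]

train, test = train_test_split(data)
correct = sum(knn_predict(train, features) == label for features, label in test)
accuracy = correct / len(test)
print(len(train), len(test), accuracy)
```

Holding the test rows out of the training set is what makes the accuracy figure an estimate of performance on unseen data rather than a measure of memorisation.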

2. UNSUPERVISED LEARNING AND MODEL BUILDING

  • Unsupervised Learning and Clustering Techniques
  • Centroid-based Clustering: K-Means Algorithm concept and demonstration
  • Hierarchical Clustering concepts and Applications
  • Density-based Clustering: DBSCAN Algorithm concept and demonstration

3. DIMENSION REDUCTION TECHNIQUES

  • Dimension Reduction Introduction
  • Why Dimension Reduction is Required
  • LDA (Linear Discriminant Analysis) concept and applications
  • PCA (Principal Component Analysis) concept and applications

4. TIME SERIES FORECASTING: SOLVING FORECASTING PROBLEMS

  • Introduction - Applications
  • Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
  • Classification of Techniques (Pattern-based vs. Pattern-less)
  • Basic Techniques - Averages, Smoothing
  • Advanced Techniques - AR Models, ARIMA
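
The two basic techniques named above, averaging and smoothing, can be sketched as a simple moving average and simple exponential smoothing. The function names, the demand series, the window of 3, and alpha of 0.5 are assumptions chosen for illustration. AR and ARIMA models are not covered by this sketch.

```python
def moving_average(series, window):
    """Return the simple moving average over a trailing window."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # seed with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical demand series with a steady upward trend.
demand = [10, 12, 14, 16, 18, 20]

ma = moving_average(demand, 3)
es = exponential_smoothing(demand, 0.5)
print(ma)
print(es)
```

A larger window or a smaller alpha gives a smoother curve that reacts more slowly to recent changes. Both methods lag behind a trending series, which is one motivation for the advanced models.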

5. DATA SCIENCE PROJECTS WITH DATA SETS

  • Applying different algorithms to solve business problems and benchmark the results
