  • MOBILE APPLICATION DEVELOPMENT
    At iKompass, we believe great things are built by a series of small things coming together.
    We'll leave no stone unturned until we find the best way to add value to your business.
  • iPHONE | iPAD | iOS APPLICATION DEVELOPMENT
    Our fresh ideas help inspire your next app.
  • ANDROID APPLICATION DEVELOPMENT
    Our Android apps will bring out the adventurer in you, providing you with infinite possibilities.
  • MOBILE WEB
    We strive to align our vision with your business growth.
    Our Mobile Web strategy is seamlessly structured to meet your organisational goals.

Data Science/ Big Data/ Machine Learning Courses in Singapore

We offer multiple courses on data science: the 3-day Big Data Foundation course, the 3-day Data Cleaning course, the 3-day Machine Learning course, and the 3-day Artificial Intelligence Neural Networks course. The programming component of the foundation course is optional. Participants with no Python programming background should start with the 3-day Big Data Foundation course, which teaches Python programming for analytics. All other data science courses require Python programming experience.

All our data science related courses are taught by working practitioners, not academicians. The goal is to get you well versed in applying techniques to solve real world problems in the most efficient manner.

3-day Foundation Course | 3-day Data Cleaning | 3-day Machine Learning | 3-day AI Neural Networks | Do I have the aptitude for data science?

CITREP Funding

Enhanced Funding Support for Professionals aged 40 and above and SMEs

Professionals aged 40 and above (i.e. self-sponsored individuals) and SMEs sponsoring their employees for training (i.e. organisation-sponsored trainees) are entitled to CITREP enhanced funding support of up to 90% of the nett payable course and certification fees. This is applicable to Singapore Citizens and Permanent Residents (PRs).

Please find the FY17 CITREP+ funding support details below. All categories apply to Singapore Citizens and Permanent Residents (PRs).

Category | Course + Exam | Exam Only
Organisation-sponsored (Non-SMEs) | Up to 70% of the nett payable course and certification fees, capped at $3,000 per trainee | Up to 70% of the nett payable certification fees, capped at $500 per trainee
Organisation-sponsored (SMEs) | Up to 90% of the nett payable course and certification fees, capped at $3,000 per trainee | Up to 70% of the nett payable certification fees, capped at $500 per trainee
Self-sponsored professionals (Citizens and PRs) | Up to 70% of the nett payable course and certification fees, capped at $3,000 per trainee | Up to 70% of the nett payable certification fees, capped at $500 per trainee
Professionals aged 40 and above (as of 1 Jan of the current year) | Up to 90% of the nett payable course and certification fees, capped at $3,000 per trainee | Up to 70% of the nett payable certification fees, capped at $500 per trainee
Students (Citizens) and/or Full-Time National Servicemen (NSF) | Up to 100% of the nett payable course and certification fees, capped at $2,500 per trainee | Up to 100% of the nett payable certification fees, capped at $500 per trainee

CCC Big Data/ Data Science Foundation

The Big Data/Data Science Foundation course in Singapore offers participants the option of getting certified in CCC Big Data/Data Science Foundation by the Cloud Credential Council. The foundation course is non-technical and is open to managers, professionals, and decision makers.

Big Data is a process to deliver decision-making insights. The process uses people and technology to quickly analyze large amounts of data of different types (traditional table structured data and unstructured data, such as pictures, video, email, transaction data, and social media interactions) from a variety of sources to produce a stream of actionable knowledge. Organizations increasingly need to analyze information to make decisions for achieving greater efficiency, profits, and productivity.

As relational databases have grown in size to satisfy these requirements, organizations have also looked at other technologies for storing vast amounts of information. These new systems are often referred to under the umbrella term “Big Data.” Gartner has identified three key characteristics for big data: Volume, Velocity, and Variety. Traditional structured systems are efficient at dealing with high volumes and velocities of data; however, they are not the most efficient solution for handling a variety of unstructured or semi-structured data sources.

Big Data solutions can enable the processing of many different types of formats beyond traditional transactional systems. Definitions for Volume, Velocity, and Variety vary, but most big data definitions are concerned with amounts of information that are too difficult for traditional systems to handle—either the volume is too much, the velocity is too fast, or the variety is too complex.


 

Data Science Course Sample Content

Big Data/ Data Science Foundation

3 days

This course leads to the Big Data Foundation certification by the Cloud Credential Council (CCC). The CCC Big Data Foundation certification is awarded to individuals who have successfully passed the CCC Big Data Foundation exam.

Our CCC Big Data/Data Science Foundation course is a good place to start if you do not have prior experience with Big Data. It covers best practices in devising a Big Data solution for your organization.

Course features:

  • 3 days of classroom training
  • Cloud Credential Council certification
  • 6 months of online learning with weekly assignments and feedback
  • Post-course video tutorials with support

Classroom Training Outline

Big Data Foundation

Course Outline

CCC Big Data Foundation

Day 1

1. Introduction to Big Data
  • What is Big Data?
  • Usage of Big Data in real-world situations
2. Data Processing Lifecycle
  • Collection
  • Pre-processing
  • Hygiene
  • Analysis
  • Interpretation
  • Intervention
  • Visualisation
  • Sources of Data

Technical Components (Optional). The modules below will be covered at the end of the day.

Introduction to Python

  • Jupyter
  • Interactive computing
  • Functions, arguments in Python

Introduction to Pandas
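As a taste of the optional Pandas module, the basic inspection tasks take only a few lines. A minimal sketch (the dataset values are invented for illustration):

```python
import pandas as pd

# Toy dataset: illustrative values only
df = pd.DataFrame({
    "course": ["Foundation", "Data Cleaning", "Machine Learning"],
    "days": [3, 3, 3],
    "fee_sgd": [2590, 2590, 2590],
})

# Basic inspection: dimensions, column names, a simple aggregate
print(df.shape)           # (3, 3)
print(list(df.columns))   # ['course', 'days', 'fee_sgd']
print(df["days"].sum())   # 9
```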

Day 2

3. Sources of Data

Data collection is expensive and time-consuming. In some cases you will be lucky enough to have existing datasets available to support your analysis: datasets from previous analyses, access to providers, or curated datasets from your organization. In many cases, however, you will not have access to the data you require, and you will have to find alternate mechanisms. Twitter data is a good example: depending on the options selected by the Twitter user, every tweet contains not just the message or content that most users are aware of, but also a view of the person's network, home location, the location from which the message was sent, and a number of other features that can be very useful when studying networks around a topic of interest.

  • Network Data
  • Social Context Data
  • Sensor Data
  • Systems Data
  • Machine log data
  • Structured vs. Unstructured Data
4. First Order Analysis and exploration
  • Basic Statistics
  • Analyse your dataset and determine features
  • Data validation
  • Noise and bias
  • Random errors
  • Systematic errors
5. Graph Theory

Technical Components (Optional). The modules below will be covered at the end of the day.

Introduction to NetworkX

  • Adjacency Matrix
  • Clustering
  • Create a Graph
  • Measure centrality
  • Degree distribution
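The NetworkX topics above can be tried in a few lines. A minimal sketch (the toy four-node graph is invented for illustration):

```python
import networkx as nx

# Build a small undirected graph
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])

# Adjacency matrix (rows/columns follow G.nodes order)
A = nx.to_numpy_array(G)

# Degree centrality: A touches all 3 of the other nodes
centrality = nx.degree_centrality(G)
print(centrality["A"])    # 1.0

# Degree distribution
degrees = [d for _, d in G.degree()]
print(sorted(degrees))    # [1, 2, 2, 3]

# Clustering: fraction of A's neighbour pairs that are themselves connected
print(nx.clustering(G, "A"))
```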
6. Second order analysis

According to the SAS Institute, machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look. There are two main classes of machine learning algorithms: (i) supervised and (ii) unsupervised learning. What exactly does learning entail? At its most basic, learning involves specifying a model structure f that can hopefully extract regularities in the data or problem at hand, as well as an appropriate objective function to optimize using a specified loss function. Learning (or fitting) the model essentially means finding the optimal parameters of the model structure using the provided input/target data. This is also called training the model. It is common (and best practice) to split the provided data into at least two sets: training and test data sets.

  • Machine Learning
  • Meta Data
  • Training data and test data
  • Identifying Features

Technical Components (Optional). The modules below will be covered at the end of the day.

  • Introduction to Scikit-learn
  • Introduction to Mlxtend
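As a taste of the optional scikit-learn module, training a classifier and predicting a label takes only a few lines. A minimal sketch (the toy points and labels are invented for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy labelled data: points roughly below/above the line y = x
X = [[1, 2], [2, 3], [3, 1], [4, 2]]
y = [1, 1, 0, 0]   # 1 = above the line, 0 = below

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)

# The nearest training point to (0, 3) is (1, 2), which has class 1
print(model.predict([[0, 3]]))  # [1]
```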

Day 3

7. Rolling out Big Data projects

Hypothetical Big Data project use case:

Cybersecurity measures within a company in relation to insider threats. The company hosts thousands of applications for various business functions. The context will be User Behavior Analytics. Signals include login metadata for each application, location data, network data, employee data, performance appraisal data, travel data, and desktop activity data. The analytics is focused on determining a risk score for each user.

Technological component or trend:

The technology component in the insider threat context requires collection and processing of the following data:

  • User Data
  • Application logs
  • Access data
  • Business data
  • Assets, CMDB
  • User activity
  • Network data

A layered approach to data processing is ideal, starting with the implementation of an ETL (Extract, Transform, Load) layer. Processing of the data is done through tools.

  • Extract, Transform, Load
  • Data processing
  • Normalization
  • Correlations
  • Risk profiling
  • Data lake

The last layer is the data lake, which stores all structured and unstructured data. It can be accessed through tools and libraries such as pandas, Hadoop, and graph databases.

The data lake will enable building algorithms to determine risky behavior and send alerts. The objective is to prioritize the alerts based on a risk score. For example, a user accessing a certain application from a specific IP address, who has a recent low rating on his performance appraisal and has booked a long holiday, will be flagged as high risk.
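The risk-scoring idea above can be sketched as a toy model. The signal names, weights, and threshold below are illustrative assumptions for this hypothetical use case, not part of any real product:

```python
# Hypothetical insider-threat risk scoring: each observed signal adds weight
SIGNAL_WEIGHTS = {
    "unusual_ip": 3,
    "low_appraisal_rating": 2,
    "long_leave_booked": 2,
    "off_hours_access": 1,
}

def risk_score(signals):
    """Sum the weights of the signals observed for a user."""
    return sum(SIGNAL_WEIGHTS[s] for s in signals)

def flag(signals, threshold=5):
    """Prioritise alerts: flag users whose score crosses the threshold."""
    return "high risk" if risk_score(signals) >= threshold else "normal"

# The example from the text: unusual IP + low appraisal + long holiday
print(flag(["unusual_ip", "low_appraisal_rating", "long_leave_booked"]))  # high risk
```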

  • Project Management
  • Different Phases
  • Technology components
  • Privacy
  • System architecture

Technical Components (Optional). The modules below will be covered at the end of the day.

  • K-Anonymity
  • Data coarsening
  • Data suppression
Final Exam

40 Questions

Pass mark: 65%

Format of the Examination

  • Attend the 3-day class
  • Schedule the exam
  • Take the exam

[Diagram: Hadoop master nodes: Job Tracker, Name Node, Secondary Name Node]

Data Cleaning

Our 3-day data cleaning course teaches you techniques to scrub or process big data with the goal of making it ready for building models. Most algorithms require data that is cleaned and normalized. Data scientists typically end up spending more than 70% of their effort on data cleaning/wrangling. Knowledge of techniques for working with unstructured data is essential in data science.

Real-world data is frequently dirty and unstructured, and must be reworked before it is usable. Data may contain errors, have duplicate entries, exist in the wrong format, or be inconsistent. The process of addressing these types of issues is called data cleaning. Data cleaning is also referred to as data wrangling, massaging, reshaping, or munging. Data merging, where data from multiple sources is combined, is often considered to be a data cleaning activity.

We need to clean data because any analysis based on inaccurate data can produce misleading results. We want to ensure that the data we work with is quality data.

Data quality involves:

  • Validity: Ensuring that the data possesses the correct form or structure
  • Accuracy: The values within the data are truly representative of the dataset
  • Completeness: There are no missing elements
  • Consistency: Changes to data are in sync
  • Uniformity: The same units of measurement are used

There are several techniques and tools used to clean data. We will examine the following approaches to handling different types of data:

  • Cleaning and manipulating text data
  • Filling in missing data
  • Validating data

We will be using Python libraries. These libraries are often more expressive and efficient. However, there are times when using a simple string function is more than adequate to address the problem. Showing complementary techniques will improve the student's skill set.

The basic text based tasks include:

  • Data transformation
  • Data imputation (handling missing data)
  • Subsetting data
  • Sorting data
  • Validating data
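The basic tasks above map directly onto pandas operations. A minimal sketch over invented dirty toy data (a missing value, inconsistent text case, a duplicate row):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ["alice", "Bob", "Bob", "carol"],
    "age": [34, np.nan, np.nan, 28],
})

df = df.drop_duplicates()                        # remove the duplicate entry
df["name"] = df["name"].str.title()              # transform: normalise text case
df["age"] = df["age"].fillna(df["age"].mean())   # impute missing ages with the mean
df = df.sort_values("age")                       # sort the cleaned data

print(df["age"].tolist())   # [28.0, 31.0, 34.0]
```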

Learning Objectives

After completing this course, you should have the skills and be familiar with the following topics:

  • Handling various data-import scenarios: different kinds of datasets (.csv, .txt), different delimiters (comma, tab, pipe), and different methods (read_csv, read_table)
  • Getting basic information, such as dimensions, column names, and summary statistics
  • Getting basic data cleaning done: removing NAs and blank spaces, imputing values for missing data points, changing a variable's type, and so on
  • Creating dummy variables in various scenarios to aid modelling
  • Generating plots such as scatter plots, bar charts, histograms, and box plots
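For instance, dummy variables can be created with pandas' `get_dummies` (the toy data is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"city": ["SG", "KL", "SG"], "sales": [10, 7, 12]})

# One-hot encode the categorical column so models can consume it
dummies = pd.get_dummies(df, columns=["city"])
print(sorted(dummies.columns))  # ['city_KL', 'city_SG', 'sales']
```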

Who should attend

Data Analysts, Data Engineers, Data Science Enthusiasts, Business Analysts, Project Managers

Prerequisite

Foundational certificate in Big Data/Data Science. This course is meant for anyone who is comfortable developing applications in Python and now wants to enter the world of data science or build intelligent applications. Aspiring data scientists with some understanding of the Python programming language will also find this course very helpful. If you want to build efficient data science applications and bring them into the enterprise environment without changing your existing Python stack, this course is for you.

Delivery Method

Mix of Instructor-led, case study driven and hands-on for select phases

H/w, S/w Reqd

Python, Pandas, NumPy, Spark, Elasticsearch, MongoDB. A system with at least 8 GB RAM and a Windows/Ubuntu/Mac OS X operating system.

Tools covered

  • Pandas
  • NumPy
  • MongoDB
  • Apache Spark
  • Elasticsearch
  • Kafka
  • Jupyter Notebook
  • IPython
  • EC2
  • S3

Data Cleaning – Working with Data Lakes


Data Cleaning – Process


Data Cleaning Training Roadmap



Machine Learning course in Singapore


Learning Objectives

After completing this course, you should have the skills and be familiar with the following topics

  • Apply mathematical concepts regarding the most common machine learning problems, including the concept of learnability and some elements of information theory.
  • Explain the process of Machine Learning
  • Describe the most important techniques used to preprocess a dataset, select the most informative features, and reduce the original dimensionality.
  • Describe the structure of a continuous linear model, focusing on the linear regression algorithm. Explain Ridge, Lasso, and ElasticNet optimizations, and other advanced techniques.
  • Describe the concept of linear classification, focusing on logistic regression and stochastic gradient descent algorithms.
  • Describe the concept of classification algorithms including Decision Trees, Support Vector Machines, Random Forests, Naive Bayes and K Nearest Neighbors
  • Demonstrate knowledge of evaluation metrics

Who should attend

Data Analysts, Data Engineers, Data Science Enthusiasts, Business Analysts, Project Managers

Prerequisite

Foundational certificate in Big Data/Data Science. This course is meant for anyone who is comfortable developing applications in Python and now wants to enter the world of data science or build intelligent applications. Aspiring data scientists with some understanding of the Python programming language will also find this course very helpful. If you want to build efficient data science applications and bring them into the enterprise environment without changing your existing Python stack, this course is for you.

Delivery Method

Mix of Instructor-led, case study driven and hands-on for select phases

H/w, S/w Reqd

Python, Pandas, NumPy. A system with at least 8 GB RAM and a Windows/Ubuntu/Mac OS X operating system.

Duration

3 days

Sample concepts covered as part of the Machine Learning course in Singapore

The course will cover in detail both the mathematical aspects and the business applications of the algorithms.

Training data and test data

The observations in the training set comprise the experience that the algorithm uses to learn. In supervised learning problems, each observation consists of an observed response variable and one or more observed explanatory variables. The test set is a similar collection of observations that is used to evaluate the performance of the model using some performance metric. It is important that no observations from the training set are included in the test set.


Memorizing the training set is called over-fitting. A program that memorizes its observations may not perform its task well, as it could memorize relations and structures that are noise or coincidence. Balancing memorization and generalization, or over-fitting and under-fitting, is a problem common to many machine learning algorithms. In this course we will discuss regularization, which can be applied to many models to reduce over-fitting.
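A hold-out split plus L2 regularization can be sketched with scikit-learn. The synthetic data, model choice (Ridge), and settings below are illustrative assumptions, not a prescribed recipe:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge

# Synthetic data: y = 2x + noise
rng = np.random.RandomState(0)
X = rng.rand(100, 1)
y = 2 * X.ravel() + 0.1 * rng.randn(100)

# Hold out a test set the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = Ridge(alpha=1.0)   # L2 regularisation to curb over-fitting
model.fit(X_train, y_train)

print(len(X_train), len(X_test))  # 75 25
print(model.score(X_test, y_test))  # R^2 on unseen data
```

Evaluating on the held-out set, rather than the training set, is what reveals whether the model has generalized or merely memorized.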

Random Forests – Ensemble Voting

Ensembling by voting can be used efficiently for classification problems. We now have a set of classifiers, and we need to use them to predict the class of an unknown case. The combining of the predictions of the classifiers can proceed in multiple ways. The two options that we will consider are majority voting, and weighted voting. Ideas related to voting will be illustrated through an ensemble based on the homogeneous base learners of decision trees, as used in the development of bagging and random forests.
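Majority and weighted voting can be sketched without any library. The label strings and weights below are invented for illustration:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class predicted by most ensemble members."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Each classifier's vote counts with its assigned weight."""
    totals = Counter()
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return totals.most_common(1)[0][0]

print(majority_vote(["spam", "spam", "ham"]))            # spam
print(weighted_vote(["spam", "ham", "ham"], [3, 1, 1]))  # spam (3 vs 2)
```

Note how the weighted scheme can overturn a raw majority when one classifier is trusted more than the others.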


Bias Variance Trade-off

Many metrics can be used to measure whether or not a program is learning to perform its task more effectively. For supervised learning problems, many performance metrics measure the amount of prediction error. There are two fundamental causes of prediction error: a model’s bias, and its variance. Assume that you have many training sets that are all unique, but equally representative of the population.

A model with high bias will produce similar errors for an input regardless of the training set it used to learn; the model biases its own assumptions about the real relationship over the relationship demonstrated in the training data. A model with high variance, conversely, will produce different errors for an input depending on the training set that it used to learn. A model with high bias is inflexible, but a model with high variance may be so flexible that it models the noise in the training set. That is, a model with high variance over-fits the training data, while a model with high bias under-fits the training data. It can be helpful to visualize bias and variance as darts thrown at a dartboard.


Decision Trees

A decision tree partitions the feature space through a sequence of splits on individual features, choosing at each node the split that best separates the classes. Each path from the root to a leaf corresponds to a human-readable rule, which makes decision trees one of the most interpretable models.


Entropy

In statistics, entropy is the measure of the unpredictability of the information contained within a distribution. The entropy technique takes cues from information theory. The premise is that more homogeneous or pure nodes require less information to be represented.
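The entropy formula can be computed directly. A minimal sketch over invented class distributions:

```python
from math import log2

def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution.

    Terms with p = 0 or p = 1 contribute nothing, so they are skipped.
    """
    return sum(-p * log2(p) for p in probabilities if 0 < p < 1)

print(entropy([0.5, 0.5]))            # 1.0 -> maximally impure two-class node
print(entropy([1.0]))                 # 0   -> a pure node has zero entropy
print(round(entropy([0.9, 0.1]), 3))  # 0.469 -> a mostly pure node
```

This matches the premise in the text: the purer the node, the less information (fewer bits) needed to represent it.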


Support Vector Machines

Support vector machines (SVMs) are supervised learning methods that analyze data and recognize patterns. SVMs are primarily used for classification, regression analysis, and novelty detection. Given a set of training data in a two-class learning task, an SVM training algorithm constructs a model or classification function that assigns new observations to one of the two classes on either side of a hyperplane, making it a non-probabilistic binary linear classifier.


Hyperplane

A support vector machine (SVM) is a supervised machine learning model that works by identifying a hyperplane between represented data. The data can be represented in a multidimensional space. Thus, SVMs are widely used in classification models. In an SVM, the hyperplane that best separates the different classes will be used.
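A minimal linear-SVM sketch with scikit-learn; the two toy clusters are invented for illustration:

```python
from sklearn.svm import SVC

# Two linearly separable classes in 2-D
X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear")   # fit the maximum-margin separating hyperplane
clf.fit(X, y)

# New points are assigned to whichever side of the hyperplane they fall on
print(clf.predict([[0, 1], [5, 4]]))  # [0 1]
```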


Need for Applied Machine Learning


Source of Data for Machine Learning




There is obvious, visible information that one is conscious of, and there is information that comes off you. For example, from your phone one can determine which websites you visited, who you called, who your friends are, and what apps you use. Data science takes it further to reveal how close you are to someone, whether you are an introvert or an extrovert, when during the day you are most productive, how often you crave ice cream, what genre of movies you like, what aspects of social issues interest you the most, and so on.

Sensors everywhere

With the possibility of adding sensors to everything, there is now deeper insight into what is going on inside your body. Spending 10 minutes with a doctor who gives you a diagnosis based on stated or observed symptoms is less useful than a system that has data about everything going on inside your body. Your health diagnosis is likely to be more accurate with analysis of data collected through devices such as Fitbits and implantables.

The amount of data available with wearables and other devices provides for rich insight about how you live, work with others and have fun.

Digital Breadcrumbs

Big Data and analytics are made possible by the digital breadcrumbs we leave. Digital breadcrumbs include things like location data, browsing habits, information from health apps, credit card transactions, etc.

The data lets us create mathematical models of how people interact, what motivates us, what influences our decision making process and how we learn from each other.

Big Data versus Information

One can think of Big Data as raw data available in sufficient volume, variety, and velocity. Volume here refers to terabytes of data. Variety refers to the different dimensions of the data. Velocity refers to the rate of change.

A bank can use credit card information to develop models that are more predictive of future credit behavior. This provides better financial access. What you purchased, how frequently you purchase, how often you pay back, and where you spend money are better predictors of payment credibility than a simple one-dimensional credit score.


Machine Learning Process

FAQs


Data Science is a combination of the business, technical, and statistical worlds. We will cover the theoretical aspects of all three in class, so we don't require participants to have a background in all three; a background in any one of the three is sufficient. Those with a programming or statistical background can explore the practical technical aspects with the instructors from 4 to 7 pm.
http://www.ikompass.com.sg/data_science_big_data_foundation_training_singapore.php
No. The optional technical modules don’t have additional costs. However, to work through the optional technical modules, you need to have a background in either statistics or programming.
For CITREP+ funding, you must be a Singapore Citizen or Permanent Resident (PR) and pass an exam at the end of the course. The exam is held on the last day of the class. CITREP+ funding is based on a claim that you make after passing the exam: you pay us the full course fees, and IMDA reimburses 70% of the course and exam fees after you make a claim. We will assist you with the claim process.
Upon passing the exam, you will receive a certificate from the Cloud Credential Council as Certified in Big Data Foundation.
You can take the exam twice at no additional cost. Beyond the second attempt, you will need to pay the exam fees.
Yes, the funding applies to all Singapore Citizens and Permanent Residents (PRs) irrespective of industry.
The course does not have an academic minimum requirement. However, you need to be familiar with basic data analysis and have an understanding of school/college statistics. You should already know the mean, standard deviation, median, and variance, and be able to make inferences from charts and graphs. Before joining the class, we will send you some data, and you need to send us some insights about it. Your insights will determine whether you will be able to get the most value from attending the class. Below is the link to the data analysis you need to perform before attending the class.
http://www.ikompass.com.sg/data_science_form.php
The difficulty level of the concepts depends on your background. If your job involves analyzing trends from data, you are likely to find the course easy. Before joining the class, we will send you some data, and your insights about it will determine whether you will get the most value from attending.
Technology is one part of the data science world. The course covers the business, statistical, and technology aspects. For example, the business side of the course covers figuring out the factors that influence sales; the statistical aspect involves uncovering the correlations between the various factors that affect sales; and the technology aspect involves writing code to elicit predictions. We spend about 2 hours at the end of each day writing Python code for those interested in the programming aspects.
No. This is a 3-day introductory course. Data science is an extensive field, and it can take years to become an expert. Many data scientists specialize in one particular domain. This course provides you with an overview of what is involved in data science.
The course covers the theoretical aspects of a Big Data solution. The technical aspects of building a big data solution are not covered because there are many different architectures and technologies.
Most of the participants are managers in companies across different industries who are evaluating opportunities for using analytics to make decisions. These managers are either exploring the application of data science within their own domain or are already working with data scientists and analysts. Upon completion of the course, these managers are in a better position to drive data science projects in their context.  Most of these managers represent the business side of data science.
Gartner said there would be a shortage of 100,000 data scientists (US) by 2020. McKinsey put the national gap (US) in data scientists and others with deep analytical expertise at 140,000 to 190,000 people by 2017, resulting in demand that’s 60 percent greater than supply.
Accenture found that more than 90 percent of its clients planned to hire people with data science expertise, but more than 40 percent cited a lack of talent as the number one problem.

Other Courses


Check Out Our Other Professional Courses

PMP® Project Management Professional

Our Project Management Professional course in Singapore covers the best practices in the field of Project Management.


S$ 1390

iOS Application Development

We teach you everything you need to know to build great iOS apps for iPhone and iPad devices.

S$ 1970

CCC Big Data Foundation
3 Days

We cover Big Data concepts including the business aspects, the technical aspects, as well as the deployment and maintenance aspects.

S$ 2590

Data Science Bootcamp
3 Weeks

Intensive bootcamp covering in-depth concepts in data science.

S$ 4990

Android Application Development

We cover Java programming language and then teach you the skills to build apps for devices running Android OS.

S$ 1970

CCC Cloud Technology Associate

We cover cloud concepts related to application development.

S$ 2970

Web Developer Bootcamp

We cover tools and techniques for full stack development which includes front end, back end and business layer.

S$ 3990

PMI-ACP® Agile Certified Practitioner

Our Agile course covers SCRUM, XP, and Lean. We teach you the most current Agile tools and techniques.

Call for monthly offer

Develop iOS Mobile Applications - School Program

We teach you everything you need to know to build great iOS apps for iPhone and iPad devices.

Call for monthly offer

iOS Application Development Short Course

We teach you everything you need to know to build great iOS apps for iPhone and iPad devices.

S$ 990

Android Application Development Short Course

We cover Java programming language and then teach you the skills to build apps for devices running Android OS.

S$ 990

JavaScript Programming Short Course

In this course, you will learn the fundamental programming concepts and syntax of the JavaScript programming language.

S$ 490

Programming, Coding Basics for Non-IT Professionals

In this course, you will learn the basics of programming and apply Object Oriented Programming concepts.

S$ 450

Copyright 2015 iKompass. All rights reserved.