Saturday, September 30, 2017

Decision Tree Models Based on Matthew North’s Book, for the eReader-Training and eReader-Scoring Datasets (Assignment 4)

Abstract

     This analysis was prepared to fulfill an assignment for the Big Data course and to uncover useful information from the example datasets.


The Attributes:

  • User_ID: A numeric, unique identifier assigned to each person who has an account on the company’s web site.
  • Gender: The customer’s gender, as identified in their customer account. In this data set, it is recorded as ‘M’ for male and ‘F’ for female. The Decision Tree operator can handle non-numeric data types.
  • Age: The person’s age at the time the data were extracted from the web site’s database. This is calculated to the nearest year by taking the difference between the system date and the person’s birthdate as recorded in their account.
  • Marital_Status: The person’s marital status as recorded in their account. People who indicated on their account that they are married are entered in the data set as ‘M’. Since the web site does not distinguish among types of single people, those who are divorced or widowed are included with those who have never been married (indicated in the data set as ‘S’).
  • Website_Activity: This attribute is an indication of how active each customer is on the company’s web site. Working with Richard, we used the web site database’s records of the duration of each customer’s visits to calculate how frequently, and for how long each time, the customers use the web site. This is then translated into one of three categories: Seldom, Regular, or Frequent.
  • Browsed_Electronics_12Mo: This is simply a Yes/No column indicating whether or not the person browsed for electronic products on the company’s web site in the past year.
  • Bought_Electronics_12Mo: Another Yes/No column indicating whether or not they purchased an electronic item through Richard’s company’s web site in the past year.
  • Bought_Digital_Media_18Mo: This attribute is a Yes/No field indicating whether or not the person has purchased some form of digital media (such as MP3 music) in the past year and a half. This attribute does not include digital book purchases.
  • Bought_Digital_Books: Richard believes this attribute will likely be the best indicator of buying behavior relative to the company’s new eReader. Thus, it has been set apart from the purchase of other types of digital media. Further, this attribute indicates whether or not the customer has ever bought a digital book, not just in the past year or so.
  • Payment_Method: This attribute indicates how the person pays for their purchases. In cases where the person has paid in more than one way, the mode, or most frequent method of payment, is used. There are four options:
    • Bank Transfer : payment via e-check or other form of wire transfer directly from the bank to the company.
    • Website Account : the customer has set up a credit card or permanent electronic funds transfer on their account so that purchases are directly charged through their account at the time of purchase.
    • Credit Card : the person enters a credit card number and authorization each time they purchase something through the site.
    • Monthly Billing : the person makes purchases periodically and receives a paper or electronic bill which they pay later either by mailing a check or through the company web site’s payment system.
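The “mode, or most frequent method of payment” rule is an ordinary statistical mode over one customer’s purchase history. As a small illustration in plain Python (the tie-break by first occurrence is my own assumption, since the description does not specify one):

```python
from collections import Counter

def payment_method_mode(payments):
    """Return the most frequent payment method for one customer.

    `payments` lists the method used for each of that customer's
    purchases. Ties break toward the first-seen method (an assumed
    convention, not stated in the book).
    """
    counts = Counter(payments)
    return max(counts, key=counts.get)

# Toy purchase history, not real customer data.
history = ["Credit Card", "Bank Transfer", "Credit Card", "Monthly Billing"]
print(payment_method_mode(history))  # Credit Card
```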
Steps:

1. Open RapidMiner and create a new, blank process

2. Import the data files 7 - eReaderAdoption-Training.csv and 7 - eReaderAdoption-Scoring.csv

     
   Here is an example of our data

3. Designing the process

    The first thing we need to do is drag the training and scoring data from the repository onto the process page, then pick the “Set Role” operator from the Operators menu, as in the picture below:



     In the “Set Role” operator, we set the attribute name to “User_ID” and the target role to “id”, as shown in the picture above.


     After setting the role of User_ID, add a second “Set Role” operator from the Operators menu, as before, and connect it to the training process line.

    Configure this “Set Role” operator the same way as the previous one, but set the attribute name to “eReader_Adoption” and the target role to “label”.
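What the two Set Role operators accomplish can be sketched outside RapidMiner in plain Python: the “id” column is set aside and the “label” column is separated from the regular attributes before any learning happens. The header and rows below are toy values, not the real CSV contents:

```python
def split_roles(header, rows, id_col="User_ID", label_col="eReader_Adoption"):
    """Mimic RapidMiner's Set Role: pull out the id and label columns
    so only the regular attributes are fed to the learner."""
    id_i, label_i = header.index(id_col), header.index(label_col)
    ids = [r[id_i] for r in rows]
    labels = [r[label_i] for r in rows]
    features = [[v for j, v in enumerate(r) if j not in (id_i, label_i)]
                for r in rows]
    return ids, labels, features

# Toy rows; the id numbers and class values are invented for illustration.
header = ["User_ID", "Gender", "Bought_Digital_Books", "eReader_Adoption"]
rows = [[56031, "M", "Y", "Innovator"],
        [25913, "F", "N", "Late Majority"]]
ids, labels, X = split_roles(header, rows)
print(labels)  # ['Innovator', 'Late Majority']
```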

4. Designing decision tree

    Take the “Decision Tree” operator from the Operators menu and choose “gain_ratio” as the criterion; this means we use the basic C4.5 decision tree. After placing “Decision Tree” on the training line, add an “Apply Model” operator, also from the Operators menu.
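For reference, “gain_ratio” is the C4.5 splitting criterion: information gain divided by the split’s intrinsic information, which penalizes attributes with many distinct values. A minimal standalone sketch of the computation (not RapidMiner’s implementation; the toy rows are invented):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attribute_index):
    """C4.5 gain ratio of splitting `rows` on one nominal attribute."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Intrinsic information: penalizes many-valued attributes.
    split_info = -sum(len(p) / n * math.log2(len(p) / n)
                      for p in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0

# Toy example: a Yes/No attribute that perfectly separates two classes.
rows = [("Y",), ("Y",), ("N",), ("N",)]
labels = ["Adopter", "Adopter", "Late", "Late"]
print(gain_ratio(rows, labels, 0))  # 1.0, a perfect split
```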


Results:

  • Frequent Decision Tree


  • Frequent Decision Tree Description


    • Regular Decision Tree


    • Regular Decision Tree Description


    • Seldom Decision Tree
    • Seldom Decision Tree Description

    • Adoption Graph
    • Payment Method Graph

    Conclusion:

         After collecting all the data above, we have information about people’s payment-method behavior. People seldom adopt a new payment method, and most people use Website Account and Bank Transfer as their payment methods.

    Sunday, September 24, 2017

    Prediction Model using Rapid Miner (Assignment 3)

    RapidMiner Studio is a powerful visual programming environment for rapidly building complete predictive analytic workflows. This all-in-one tool features hundreds of pre-defined data preparation and machine learning algorithms to efficiently support all your data science needs. It can be used for business and commercial applications as well as for research, education, training, rapid prototyping, and application development, and it supports all steps of the data mining process, including data preparation, results visualization, validation, and optimization. I use RapidMiner as my tool to finish my assignment: creating a prediction model from the election (pemilu) training data given by my lecturer.

    In this case, we have to predict whether the legislative candidates are elected or not, using the following algorithms:

    1. Decision Tree (C4.5)
    2. Naïve Bayes (NB)
    3. K-Nearest Neighbor (K-NN)

    And for the evaluation / accuracy testing, we are asked to use 10-fold cross-validation (X-Validation).




    This diagram shows the flow of my process, from importing the data, to setting the role, to creating the visualization.

    The steps are:
    1. Open Rapidminer and click new process and open with the blank space.
    2. Then you can start it.
    3. In operators menu search ‘Read Excel’, ‘Set Role’, and ‘Validations’ then drag and drop to the Process window.
    4. Double click on ‘Read Excel’ then input ‘datapemilukpu.xls’.
    5. Click on ‘Set Role’ and edit on parameters box, input “TERPILIH ATAU TIDAK” and change Target Role become Label.
    6. Connect the operators dot to dot.
    7. Double click on ‘Validation’ to open the Training and Testing.
    8. In operator box, search ‘Decision Tree’/’K-NN’/’Naive Bayes’ then drag and drop in the Training space.
    9. In operator box, search ‘Apply Model’ and ‘Performance’ then drag and drop in Testing space.
    10. 10. Then connect the operators.
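The X-Validation step above can be sketched outside RapidMiner as a plain 10-fold loop. The learner below is a stand-in majority-class predictor, not the actual Decision Tree, Naive Bayes, or k-NN; the point is only the fold/train/test/average mechanics:

```python
def k_fold_accuracy(X, y, train_fn, predict_fn, k=10):
    """10-fold cross-validation as in RapidMiner's X-Validation:
    train on k-1 folds, test on the held-out fold, average accuracy."""
    folds = [list(range(i, len(y), k)) for i in range(k)]
    scores = []
    for test_idx in folds:
        train_idx = [i for i in range(len(y)) if i not in test_idx]
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_fn(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k

# Stand-in learner: always predicts the training fold's majority class.
def train_majority(X, y):
    return max(set(y), key=y.count)

def predict_majority(model, x):
    return model

# Toy candidate data: 70 elected, 30 not (invented, not the real dataset).
X = [[i] for i in range(100)]
y = ["elected"] * 70 + ["not elected"] * 30
print(round(k_fold_accuracy(X, y, train_majority, predict_majority), 2))
```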


    • Decision Tree
    • Naive Bayes

    • K-NN
    • References
    https://rapidminer.com/products/studio/
    https://kevinbwstudenttelkomuniversity.wordpress.com/2016/10/23/big-data-task-evaluation-prediction-elektabilitas-caleg/

    Saturday, September 16, 2017

    IMPLEMENTING BIG DATA ANALYSIS TO GET INFORMATION ABOUT MYSELF IN DOTA 2 (Assignment 2)


    • Abstract:
    Dota 2 is one of the most played games in the world. It has 113 heroes with unique skills and dozens of items, which makes it an attractive game that everyone should try.
    With so many heroes and items in Dota 2, I tried to do some research using big data. I have played 700 games of Dota 2 so far, and I want to know what unique information comes up after playing those 700 games.

    In this case, I am using Dotabuff to help provide all the data I need to find my unique information. Here is some of the information I got.

    • Win rate by activity by day of week:

    Here is my win rate by activity by day of week. The higher the bar, the more often I played on that day. As you can see, my highest win rate is on Saturday at 53%, followed by Monday at 50%. On the other days, I lost more often than I won.

    • Win rate by activity by hour of day:

    Here is my win rate by activity by hour of day. The higher the bar, the more often I played at that hour. As you can see, my highest win rates are at 1 a.m., followed by 9 a.m., 9 p.m., and 7 p.m. At the other hours, I lost more often than I won.
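Both activity charts boil down to grouping matches and dividing wins by games played. A minimal sketch of that computation in Python, using a made-up match list rather than my real Dotabuff history:

```python
from collections import defaultdict

def win_rate_by(matches, key):
    """Win rate (%) of matches grouped by a key such as day of week,
    the way Dotabuff's activity charts summarize match history."""
    played = defaultdict(int)
    won = defaultdict(int)
    for m in matches:
        played[m[key]] += 1
        won[m[key]] += m["win"]
    return {k: round(100 * won[k] / played[k], 2) for k in played}

# Toy match history (invented for illustration).
matches = [
    {"day": "Saturday", "win": 1}, {"day": "Saturday", "win": 1},
    {"day": "Saturday", "win": 0}, {"day": "Monday", "win": 1},
    {"day": "Monday", "win": 1}, {"day": "Monday", "win": 0},
    {"day": "Monday", "win": 0},
]
print(win_rate_by(matches, "day"))  # {'Saturday': 66.67, 'Monday': 50.0}
```

The same function works for win rate by hour of day: just store an `"hour"` field on each match and group by that key instead.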

    • Most played heroes with win rates:



             This picture shows my top 10 most played heroes of all time. The top 3 in the table are my signature heroes, the ones I use the most. As you can see, my highest win-rate hero is Witch Doctor with a 67.5% win rate, followed by Warlock at 65.22%, Shadow Fiend at 52.38%, and Viper at 51.43%.

    • Conclusions:
         After gathering all this information, I now know when I should play Dota 2 and which heroes I should pick if I want to win.

    Sunday, September 10, 2017

    IMPLEMENTATION OF BIG DATA FOR DISASTER MANAGEMENT (Assignment 1)


    • Objective
    A core objective of this plan is for disaster recovery to be recorded and monitored as evaluation and analysis material, so that future disaster recovery can be carried out better.


    • Problems

    Indonesia is located in a disaster-prone area and can be considered a disaster laboratory because of its geographical, geological, and demographic conditions. Disaster intensity is increasing and becoming more complex, so disasters must be handled with a multi-sectoral, multi-disciplinary approach, in an integrated and coordinated manner. This emphasizes the need for a disaster management system. Law No. 24/2007 on Disaster Management serves as the basis for developing the national disaster countermeasure system.


    • Solution Idea

    Indonesia lies between the Asian and Australian continental plates and sits on a chain of volcanoes, which makes earthquakes likely: both earthquakes caused by continental plate shifts and tectonic earthquakes caused by volcanic activity. Because Indonesia is an archipelago, it also has the potential for tsunami disasters, such as the devastating tsunami that once struck Aceh. Data like this become very large, and they are collected as study and analysis material that can be used as a basis for decision-making.

    • Methodology
    In this case, the writer uses a case study; we can see the data that the writer presents.


    • Measurement


    1. Utilization of Big Data, combining disaster and disaster-prevention data for further analysis, modeling, and computing.
    2. Increased resilience of information technology, enabling real-time sensing, visualization, analysis, experimentation, prediction, and sensitive decision-making in all critical circumstances.
    3. Development of fundamental knowledge and innovation for the resilience and sustainability of civil infrastructure and distributed infrastructure networks.
    Source: http://suyatno.dosen.akademitelkom.ac.id/wp-content/uploads/2015/11/Master-Plan-Big-Data-dan-Manajemen-Bencana.pdf