Monday, December 21, 2015

Predicting how long it takes to replace N server nodes

My customer (a data center) gave me a small dataset that contains the time it took to replace N server nodes.
I need to analyze it and build a program that can predict how long it will take to replace N server nodes.

Input Training:

  • Data from "p027.txt", shown below:
Minutes Units
23    1 
29    2 
49    3 
64    4 
74    4 
87    5 
96    6 
97    6  
109   7  
119   8  
149   9  
145   9  
154   10  
166   10 

Example test input (number of units):


2
40
8
10

Example Output:

Predicted minutes for each number of units


35
622
128
159

Visualizing the data:


Code:
import graphlab

# The file is tab-separated rather than comma-separated, so the delimiter must be specified
sf = graphlab.SFrame.read_csv('http://www.ats.ucla.edu/stat/examples/chp/p027.txt', delimiter='\t')

# Render the plot inside the IPython notebook
graphlab.canvas.set_target('ipynb')
sf.show(view="Scatter Plot", x="Minutes", y="Units")

Finally, what I need: SFrame - Python

Resources you will need:

  • Graphlab library
  • SFrame library

Example Code:


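import graphlab

# Load the tab-separated training data (columns: Minutes, Units)
sf = graphlab.SFrame.read_csv('http://www.ats.ucla.edu/stat/examples/chp/p027.txt', delimiter='\t')

# Use the number of units as the only feature
my_features1 = ["Units"]

# Fit a linear regression that predicts Minutes from Units
my_features_model = graphlab.linear_regression.create(sf, target='Minutes', features=my_features1, validation_set=None)

# Predict the replacement time for 10 units
repair_time = {'Units': 10}

print my_features_model.predict(repair_time)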

Done. I framed the problem as a linear regression with a single input variable (Units). It's my simple way to solve it.
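
To see what a single-variable linear regression does with this data, here is a minimal sketch in plain Python 3, without GraphLab. It fits Minutes = intercept + slope * Units by ordinary least squares on the p027.txt values listed above. The names fit_line and predict are my own, just for illustration, and this is not how GraphLab computes its model internally.

```python
# Minimal sketch: ordinary least squares for Minutes = intercept + slope * Units.
# Data copied from the p027.txt listing; fit_line and predict are illustrative names.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Evaluate the fitted line at x."""
    intercept, slope = model
    return intercept + slope * x


model = fit_line(units, minutes)
for n_units in [2, 40, 8, 10]:
    print(n_units, round(predict(model, n_units), 1))
```

On this data the fitted line comes out at roughly 4.2 + 15.5 * Units. The numbers from this plain fit may differ slightly from the GraphLab output, since the library has its own defaults for fitting.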

Sunday, December 20, 2015

Predicting house prices

My boss gave me a challenge: "how to predict house prices". I wondered what to do... It's difficult because I don't know what features1 and features2 actually represent. How to start?

Input Training:

  • Content of the Training.csv file:

features1,features2,price
0.44,0.68,511.14
0.99,0.23,717.1
0.84,0.29,607.91
0.28,0.45,270.4
0.07,0.83,289.88
0.66,0.8,830.85
0.73,0.92,1038.09
0.57,0.43,455.19
0.43,0.89,640.17
0.27,0.95,511.06
0.43,0.06,177.03
0.87,0.91,1242.52
0.78,0.69,891.37
0.9,0.94,1339.72
0.41,0.06,169.88
0.52,0.17,276.05
0.47,0.66,517.43
0.65,0.43,522.25
0.85,0.64,932.21
0.93,0.44,851.25
0.41,0.93,640.11
0.36,0.43,308.68
0.78,0.85,1046.05
0.69,0.07,332.4
0.04,0.52,171.85
0.17,0.15,109.55
0.68,0.13,361.97
0.84,0.6,872.21
0.38,0.4,303.7
0.12,0.65,256.38
0.62,0.17,341.2
0.79,0.97,1194.63
0.82,0.04,408.6
0.91,0.53,895.54
0.35,0.85,518.25
0.57,0.69,638.75
0.52,0.22,301.9
0.31,0.15,163.38
0.6,0.02,240.77
0.99,0.91,1449.05
0.48,0.76,609.0
0.3,0.19,174.59
0.58,0.62,593.45
0.65,0.17,355.96
0.6,0.69,671.46
0.95,0.76,1193.7
0.47,0.23,278.88
0.15,0.96,411.4
0.01,0.03,42.08
0.26,0.23,166.19
0.01,0.11,58.62
0.45,0.87,642.45
0.09,0.97,368.14
0.96,0.25,702.78
0.63,0.58,615.74
0.06,0.42,143.79
0.1,0.24,109.0
0.26,0.62,328.28
0.41,0.15,205.16
0.91,0.95,1360.49
0.83,0.64,905.83
0.44,0.64,487.33
0.2,0.4,202.76
0.43,0.12,202.01
0.21,0.22,148.87
0.88,0.4,745.3
0.31,0.87,503.04
0.99,0.99,1563.82
0.23,0.26,165.21
0.79,0.12,438.4
0.02,0.28,98.47
0.89,0.48,819.63
0.02,0.56,174.44
0.92,0.03,483.13
0.72,0.34,534.24
0.3,0.99,572.31
0.86,0.66,957.61
0.47,0.65,518.29
0.79,0.94,1143.49
0.82,0.96,1211.31
0.9,0.42,784.74
0.19,0.62,283.7
0.7,0.57,684.38
0.7,0.61,719.46
0.69,0.0,292.23
0.98,0.3,775.68
0.3,0.08,130.77
0.85,0.49,801.6
0.73,0.01,323.55
1.0,0.23,726.9
0.42,0.94,661.12
0.49,0.98,771.11
0.89,0.68,1016.14
0.22,0.46,237.69
0.34,0.5,325.89
0.99,0.13,636.22
0.28,0.46,272.12
0.87,0.36,696.65
0.23,0.87,434.53
0.77,0.36,593.86

Test input (features1 features2):


0.49 0.18
0.57 0.83
0.56 0.64
0.76 0.18

Output:

Predicted house prices

105.22
142.68
132.94
129.71

Finally, I found what I need: SFrame - Python

Resources you will need:

  • Graphlab library
  • SFrame library

Example Code:

Bill Gates' house:

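import graphlab

# Load the training data (columns: features1, features2, price)
sf = graphlab.SFrame('Training.csv')

# Use both features as inputs
my_features1 = ["features1", "features2"]

# Fit a linear regression that predicts price from the two features
my_features_model = graphlab.linear_regression.create(sf, target='price', features=my_features1, validation_set=None)

# One house to predict, taken from the first line of the test input
bill_gates = {
    'features1': 0.49,
    'features2': 0.18
}
print my_features_model.predict(bill_gates)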

Done. I framed this challenge as a linear regression with multiple input variables. It's my simple way to solve it. What's your simple way?
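
For anyone curious what a linear regression with two inputs boils down to, here is a minimal sketch in plain Python 3, without GraphLab. It fits price = w0 + w1 * features1 + w2 * features2 by solving the normal equations. To keep it short it uses only the first eight rows of Training.csv, so its prediction is for illustration only. The helper names solve, fit and predict are my own, and this is not how GraphLab works internally.

```python
# Minimal sketch: least squares for price = w0 + w1*features1 + w2*features2,
# solved through the normal equations (X^T X) w = X^T y.
# Rows are the first eight lines of Training.csv; helper names are illustrative.
rows = [
    (0.44, 0.68, 511.14),
    (0.99, 0.23, 717.1),
    (0.84, 0.29, 607.91),
    (0.28, 0.45, 270.4),
    (0.07, 0.83, 289.88),
    (0.66, 0.8, 830.85),
    (0.73, 0.92, 1038.09),
    (0.57, 0.43, 455.19),
]


def solve(a, b):
    """Solve the square system a * w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def fit(data):
    """Return [w0, w1, w2] minimizing squared error over (f1, f2, price) rows."""
    xs = [(1.0, f1, f2) for f1, f2, _ in data]  # leading 1.0 is the intercept
    ys = [price for _, _, price in data]
    k = 3
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, f1, f2):
    """Evaluate the fitted plane at (f1, f2)."""
    return w[0] + w[1] * f1 + w[2] * f2


weights = fit(rows)
print(round(predict(weights, 0.49, 0.18), 2))
```

To use the full dataset, the rows list could be replaced by all lines of Training.csv (for example read with the csv module); the rest of the sketch stays the same.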