
Implementing a BP Neural Network with Gradient Descent (in R)

### 1. The BP neural network

I had heard about neural networks for a long time without ever finding the time to study them properly. While working through logistic regression I naturally ended up at neural networks, so I learned them along the way. The underlying idea is simple: add a hidden layer, run a logistic regression from the input layer onto each node of the hidden layer, and then use those hidden nodes as inputs for another logistic regression onto the target label. That is a two-layer neural network; a multi-layer network just adds more hidden layers.
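Written in the notation of the code below (input-to-hidden weights $\alpha$, hidden-to-output weights $\beta$, sigmoid $\sigma$), the forward pass of this two-layer network is

$$h = \sigma(X\alpha), \qquad \hat{y} = \sigma(h\beta), \qquad \sigma(z) = \frac{1}{1+e^{-z}}.$$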

The BP (back-propagation) algorithm was proposed to optimize the weights of multi-layer feed-forward networks. When the weights are fitted by gradient descent, each neuron is updated in much the same way as in logistic regression; only the derivative comes out slightly differently.

Comparing the cost functions of the two methods (as implied by the gradient updates in the code below): logistic regression uses the cross-entropy (negative log-likelihood) cost

$$J(\beta) = -\sum_i \big[\, y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i) \,\big],$$

while the network below is trained on the squared-error cost

$$J(\alpha,\beta) = \frac{1}{2}\sum_i (\hat{y}_i - y_i)^2.$$
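That is where the "slightly different derivative" comes from: with the cross-entropy cost the sigmoid derivative cancels, while with the squared-error cost it does not, which is why the network code below multiplies the error by $\hat{y}(1-\hat{y})$ (here $h$ is the hidden-layer output):

$$\text{logistic regression:}\quad \frac{\partial J}{\partial \beta} = X^{T}(\hat{y}-y), \qquad \text{network output layer:}\quad \frac{\partial J}{\partial \beta} = h^{T}\big[(\hat{y}-y)\,\hat{y}(1-\hat{y})\big].$$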

Data: download digits.csv


The code:

```r
library(data.table) # provides fread(), which quickly reads data from csv files

# the sigmoid (logistic) function
expit <- function(x)
{
  y <- 1/(1+exp(-x))
  return(y)
}

# Train a two-layer network (one hidden layer with m units, no bias terms)
# by gradient descent on the squared-error cost.
train <- function(X, Y, m=100, num_iterations=2000,
                  learning_rate=1e-2)
{
  n = dim(X)[1]
  p = dim(X)[2]
  sigma = .1
  alpha = matrix(rnorm(p*m)*sigma, nrow=p)   # input-to-hidden weights (p x m)
  beta  = matrix(rnorm(m)*sigma, nrow=m)     # hidden-to-output weights (m x 1)
  for(i in 1:num_iterations)
  {
    # forward pass
    layer_0 = X
    layer_1 = expit(layer_0%*%alpha)
    layer_2 = expit(layer_1%*%beta)
    # backward pass
    layer_2_err   = layer_2 - Y
    layer_2_delta = layer_2_err*layer_2*(1-layer_2)
    layer_1_err   = layer_2_delta%*%t(beta)
    layer_1_delta = layer_1_err*layer_1*(1-layer_1)
    # gradient-descent updates
    beta  = beta  - learning_rate*t(layer_1)%*%layer_2_delta
    alpha = alpha - learning_rate*t(layer_0)%*%layer_1_delta
  }
  model = list(beta, alpha)
  return(model)
}

# fraction of correct predictions when thresholding at 0.5
accuracy <- function(p, y)
{
  return(mean((p > 0.5) == (y == 1)))
}

getAccuracy <- function(model, X, Y)
{
  beta  = model[[1]]
  alpha = model[[2]]
  layer_1 = expit(X%*%alpha)
  layer_2 = expit(layer_1%*%beta)
  return(accuracy(layer_2, Y))
}
# load data
load_digits <- function(subset=NULL, normalize=TRUE) {
  # Load digits and labels from digits.csv.
  # Args:
  #   subset: a subset of digits from 0 to 9 to return.
  #           If not specified, all digits will be returned.
  #   normalize: whether to normalize data values to between 0 and 1.
  # Returns:
  #   digits: digits data matrix of the subset specified.
  #           The shape is (n, p), where
  #           n is the number of examples,
  #           p is the dimension of features.
  #   labels: labels of the digits in an (n, ) array.
  #           Each label[i] is the label for data[i, ].

  # load digits.csv, adopted from sklearn.
  df <- fread("digits.csv")
  df <- as.matrix(df)
  label_col <- dim(df)[2]            # the label is stored in the last column
  # only keep the numbers we want.
  if (length(subset) > 0) {
    l_col <- df[, label_col]
    index <- NULL
    for (i in seq_along(subset)) {
      number <- subset[i]
      index <- c(index, which(l_col == number))
    }
    index <- sort(index)             # keep the original row order
    df <- df[index, ]
  }
  # split into feature matrix and label vector.
  digits <- df[, -label_col]
  labels <- df[, label_col]
  # Normalize digit values to between 0 and 1.
  if (normalize == TRUE) {
    digits <- digits - min(digits)
    digits <- digits/max(digits)
  }
  # Recode the labels in subset as 0, 1, ...
  for (i in seq_along(subset)) {
    labels[labels == subset[i]] <- i-1
  }
  return(list(digits, labels))
}
split_samples <- function(digits, labels) {
  # Split the data into a training set (70%) and a testing set (30%).
  num_samples  <- dim(digits)[1]
  num_training <- round(num_samples*0.7)
  indices <- sample(1:num_samples, size = num_samples)
  training_idx <- indices[1:num_training]
  testing_idx  <- indices[-(1:num_training)]
  return(list(digits[training_idx, ], labels[training_idx],
              digits[testing_idx, ], labels[testing_idx]))
}
#====================================
# Load digits and labels (only the 3s and 5s).
result = load_digits(subset=c(3, 5), normalize=TRUE)
digits = result[[1]]
labels = result[[2]]
result = split_samples(digits, labels)
training_digits = result[[1]]
training_labels = result[[2]]
testing_digits  = result[[3]]
testing_labels  = result[[4]]
# print dimensions
dim(training_digits)
dim(testing_digits)
# Train a net and display training and testing accuracy.
model1 = train(training_digits, training_labels)
trainingaccuracy = getAccuracy(model1, training_digits, training_labels)
testingaccuracy  = getAccuracy(model1, testing_digits, testing_labels)
trainingaccuracy
testingaccuracy
```
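
The model returned by train() is just the pair of weight matrices, so getting hard 0/1 predictions for new digits takes the same forward pass used in getAccuracy(). The helper below is only a sketch; predict_net is a name I introduce here, it is not part of the code above:

```r
# Hypothetical helper, not in the original code: hard 0/1 predictions
# from the fitted two-layer network returned by train().
predict_net <- function(model, X) {
  beta  <- model[[1]]
  alpha <- model[[2]]
  layer_1 <- expit(X %*% alpha)        # hidden-layer activations
  layer_2 <- expit(layer_1 %*% beta)   # output probabilities
  as.integer(layer_2 > 0.5)            # threshold at 0.5
}

# e.g. predictions for the held-out digits:
# preds <- predict_net(model1, testing_digits)
```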

For comparison, here is the training function for plain logistic regression:

```r
# Logistic regression fitted by gradient ascent on the log-likelihood
# (the unused argument m is kept only to mirror the network's signature).
train <- function(X, Y, m=100, num_iterations=4000,
                  learning_rate=1e-1)
{
  n = dim(X)[1]
  p = dim(X)[2] + 1
  X1 = cbind(rep(1, n), X)             # add an intercept column
  sigma = .1
  beta = matrix(rnorm(p)*sigma, nrow=p)
  for(i in 1:num_iterations)
  {
    output = expit(X1%*%beta)          # predicted probabilities
    error  = Y - output                # cross-entropy: no sigmoid-derivative factor
    beta   = beta + learning_rate*t(X1)%*%error
  }
  return(beta)
}
```
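
As a rough usage sketch (none of this is in the original post; it just applies the function above to the same 3-vs-5 split, and note that this train() shadows the network's train() of the same name):

```r
# Sketch only: fit logistic regression on the same training split and
# check held-out accuracy. Assumes training_digits etc. from above.
beta_lr <- train(training_digits, training_labels)

# rebuild the intercept column that train() adds internally, then predict
X1_test <- cbind(rep(1, nrow(testing_digits)), testing_digits)
p_test  <- expit(X1_test %*% beta_lr)
mean((p_test > 0.5) == (testing_labels == 1))   # held-out accuracy
```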

For the neural network I also started with learning_rate=1e-1, and training kept going wrong: the accuracy stayed around 0.5 and every entry of layer_2 came out around 0.999. Presumably the learning step size was simply too large.
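
A quick way to see this effect is to re-run the network at a few step sizes and compare training accuracy. This is just a sketch; it assumes the network's train() and getAccuracy() from above are the ones in scope (not the logistic-regression train() that shadows the name here), and the grid of rates is arbitrary:

```r
# Sketch: training accuracy of the two-layer network for several step sizes.
for (lr in c(1e-1, 1e-2, 1e-3)) {
  m_lr <- train(training_digits, training_labels, learning_rate = lr)
  acc  <- getAccuracy(m_lr, training_digits, training_labels)
  cat("learning_rate =", lr, " training accuracy =", acc, "\n")
}
```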