Errors I've Run Into in Python (continuously updated)

Published: 2024-11-10 11:00

1. TypeError: 'dict_keys' object does not support indexing

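I hit this in Chapter 3 (decision trees) of Machine Learning in Action. It is a Python version difference. The line below is Python 2 code, where dict.keys() returns a list:

```python
firstStr = myTree.keys()[0]
```

In Python 3, keys() returns a dict_keys view, and a view cannot be indexed. Convert it to a list first:

```python
firstStr = list(myTree.keys())[0]
```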
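A dict_keys view supports iteration and membership tests, but it does not support indexing. Below is a minimal sketch of two ways to get the first key. It assumes Python 3.7+, where dicts preserve insertion order. The sample tree is made up for illustration.

```python
# Hypothetical decision tree in the nested-dict format used by the book
myTree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

keys = myTree.keys()  # a dict_keys view, not a list; keys[0] raises TypeError

# Option 1: materialize the view as a list, then index it
firstStr = list(myTree.keys())[0]

# Option 2: take the first key from an iterator without building a list
firstStrAlt = next(iter(myTree))

print(firstStr, firstStrAlt)
```

The next(iter(...)) form avoids copying every key, which matters only for large dicts.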

2. TypeError: write() argument must be str, not bytes

This was raised when saving an object with pickle.

Faulty code:

```python
import pickle

try:
    with open(fileName, 'w') as fw:  # text mode: fw.write() only accepts str
        pickle.dump(inputTree, fw)
except IOError as e:
    print("File Error : " + str(e))
```

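Cause: pickle serializes objects to bytes, so pickle.dump() needs a file opened in binary mode. A file opened with 'w' is in text mode, and its write() accepts only str.

Fix: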

```python
import pickle

try:
    with open(fileName, 'wb') as fw:  # binary mode: pickle writes bytes
        pickle.dump(inputTree, fw)
except IOError as e:
    print("File Error : " + str(e))
```
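Reading the object back has the same requirement: pickle.load() expects a file opened in binary mode ('rb'). Below is a minimal round-trip sketch. The helper names save_tree and load_tree are hypothetical, and so are the temporary file path and the sample tree; they exist only for this example.

```python
import os
import pickle
import tempfile

def save_tree(input_tree, file_name):
    """Serialize an object to disk; pickle produces bytes, so open with 'wb'."""
    with open(file_name, 'wb') as fw:
        pickle.dump(input_tree, fw)

def load_tree(file_name):
    """Load a pickled object; the file must also be opened in binary mode."""
    with open(file_name, 'rb') as fr:
        return pickle.load(fr)

tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
path = os.path.join(tempfile.mkdtemp(), 'tree.pkl')
save_tree(tree, path)
restored = load_tree(path)
print(restored == tree)
```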

3. UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 199: illegal multibyte sequence

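The file contains a byte sequence that is not valid GBK. open() is called without an encoding argument, so it falls back to the platform's default encoding. In this environment that default is GBK, which is typical of Chinese-locale Windows.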

```python
import random
from numpy import array
import bayes  # the author's module: createVocabList, setOfWords2Vec, trainNB0, classifyNB
# textParse() is assumed to be defined in the same script

def spamTest():
    docList = []
    classList = []
    fullList = []
    for i in range(1, 26):
        # No encoding given: open() uses the locale default (GBK here)
        wordList = textParse(open('email/spam/%d.txt' % i).read())
        docList.append(wordList)
        fullList.extend(wordList)
        classList.append(1)
        wordList = textParse(open('email/ham/%d.txt' % i).read())
        docList.append(wordList)
        fullList.extend(wordList)
        classList.append(0)
    vocabList = bayes.createVocabList(docList)
    trainingSet = list(range(50))
    testSet = []
    for i in range(10):
        randIndex = int(random.uniform(0, len(trainingSet)))
        testSet.append(trainingSet[randIndex])
        del trainingSet[randIndex]
    trainMat = []
    trainClasses = []
    for docIndex in trainingSet:
        trainMat.append(bayes.setOfWords2Vec(vocabList, docList[docIndex]))
        trainClasses.append(classList[docIndex])
    p0V, p1V, pSpam = bayes.trainNB0(array(trainMat), array(trainClasses))
    errorCount = 0
    for docIndex in testSet:
        wordVector = bayes.setOfWords2Vec(vocabList, docList[docIndex])
        if bayes.classifyNB(array(wordVector), p0V, p1V, pSpam) != classList[docIndex]:
            errorCount += 1
    print('the error rate is:', float(errorCount) / len(testSet))
```

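Attempt 1: switch to gb18030, which covers more characters than gbk. It still failed.

```python
wordList = textParse(open('email/ham/%d.txt' % i, encoding='gb18030').read())
```

Attempt 2: ignore decoding errors. This works, although undecodable bytes are silently dropped.

```python
wordList = textParse(open('email/ham/%d.txt' % i, encoding='gb18030', errors='ignore').read())
```

Attempt 3: open the file and find the offending character. I gave up on this one.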
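GBK treats byte 0xAE as the first half of a two-byte character, so 0xAE followed by an ASCII byte is an illegal sequence. In Latin-1 and cp1252, however, 0xAE is the single character '®'. One plausible explanation is therefore that the offending file is not GBK-encoded at all. That is an assumption; I did not check it against the actual data files. The sketch below reproduces the error on made-up bytes and compares the options. The sample content and the try_read helper are hypothetical.

```python
import os
import tempfile

# Hypothetical file content: ASCII text plus byte 0xAE ('®' in Latin-1)
raw = b'Product\xae launch notes'
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'wb') as f:
    f.write(raw)

def try_read(encoding, errors='strict'):
    """Return the decoded text, or None if decoding raises UnicodeDecodeError."""
    try:
        with open(path, encoding=encoding, errors=errors) as f:
            return f.read()
    except UnicodeDecodeError:
        return None

gbk_text = try_read('gbk')                            # fails, like the original error
gb18030_text = try_read('gb18030')                    # also fails, as in attempt 1
latin1_text = try_read('latin-1')                     # decodes 0xAE as '®'
replaced_text = try_read('utf-8', errors='replace')   # bad byte becomes U+FFFD

print(gbk_text, gb18030_text)
print(latin1_text)
print(replaced_text)
```

If ignoring errors is acceptable, errors='replace' is a reasonable alternative to errors='ignore'. It leaves a visible U+FFFD marker wherever data was lost.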

4. TypeError: 'range' object doesn't support item deletion

```python
import random
from numpy import array
import bayes  # the author's module: createVocabList, setOfWords2Vec, trainNB0, classifyNB
# textParse() is assumed to be defined in the same script

def spamTest():
    docList = []
    classList = []
    fullList = []
    for i in range(1, 26):
        wordList = textParse(open('email/spam/%d.txt' % i, encoding='gb18030', errors='ignore').read())
        docList.append(wordList)
        fullList.extend(wordList)
        classList.append(1)
        wordList = textParse(open('email/ham/%d.txt' % i, encoding='gb18030', errors='ignore').read())
        docList.append(wordList)
        fullList.extend(wordList)
        classList.append(0)
    vocabList = bayes.createVocabList(docList)
    trainingSet = range(50)  # Python 3: a range object, so the del below fails
    testSet = []
    for i in range(10):
        randIndex = int(random.uniform(0, len(trainingSet)))
        testSet.append(trainingSet[randIndex])
        del trainingSet[randIndex]  # raises the TypeError above
    trainMat = []
    trainClasses = []
    for docIndex in trainingSet:
        trainMat.append(bayes.setOfWords2Vec(vocabList, docList[docIndex]))
        trainClasses.append(classList[docIndex])
    p0V, p1V, pSpam = bayes.trainNB0(array(trainMat), array(trainClasses))
    errorCount = 0
    for docIndex in testSet:
        wordVector = bayes.setOfWords2Vec(vocabList, docList[docIndex])
        if bayes.classifyNB(array(wordVector), p0V, p1V, pSpam) != classList[docIndex]:
            errorCount += 1
    print('the error rate is:', float(errorCount) / len(testSet))
```

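Under Python 3.x this raises 'range' object doesn't support item deletion.

Cause: in Python 3.x, range() returns a range object rather than a list as in Python 2. A range object can be indexed, but it is immutable, so del trainingSet[randIndex] fails.

Fix:

Change trainingSet = range(50) to trainingSet = list(range(50)).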
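A range object still supports indexing and len(); it only rejects mutation such as del. Below is a minimal sketch of the hold-out split from spamTest. The first version uses list(range(...)) as in the fix. The second uses random.sample as an alternative, which avoids deleting elements by hand. The seed and sizes are arbitrary choices for the example.

```python
import random

random.seed(0)  # arbitrary seed so the example is reproducible

r = range(50)
print(r[3], len(r))  # indexing and len() work on a range object
try:
    del r[3]
except TypeError as e:
    print('TypeError:', e)

# Fix from the text: materialize the range as a list, then move 10 random
# indices from the training set into the test set
trainingSet = list(range(50))
testSet = []
for _ in range(10):
    randIndex = random.randrange(len(trainingSet))  # index in [0, len)
    testSet.append(trainingSet.pop(randIndex))

# Alternative: draw the test indices directly, keep the rest for training
testSet2 = random.sample(range(50), 10)
test_lookup = set(testSet2)
trainingSet2 = [i for i in range(50) if i not in test_lookup]
```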

5. TypeError: 'numpy.float64' object cannot be interpreted as an integer

Faulty code: stochastic gradient ascent

```python
def stocGradAscent0(dataMatrix, classLabels):
    m, n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        h = sigmoid(sum(dataMatrix[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights
```

Cause: the types involved are:

error: <class 'numpy.float64'>

weights: <class 'numpy.ndarray'>

dataMatrix[i]: <class 'list'>

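In Python, multiplying a list L by an integer n repeats the list, producing a list of length n*len(L). Multiplying a list by a float is therefore an error. Repetition is not what we want here anyway: the goal is to multiply every element of the current vector by error.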
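The sketch below checks the difference between multiplying a list and multiplying an ndarray. It uses a plain Python float. The exact exception raised for a numpy.float64 times a list may vary between NumPy versions, so treat this only as an illustration of list semantics.

```python
import numpy as np

row = [1.0, 2.0, 3.0]      # one row of dataMatrix as a Python list
repeated = 2 * row         # sequence repetition: length 6, not element-wise
print(repeated)

try:
    0.5 * row              # a float cannot serve as a repeat count
except TypeError as e:
    print('TypeError:', e)

row_arr = np.array(row)    # the same row as an ndarray
scaled = 0.5 * row_arr     # element-wise: every entry is multiplied by 0.5
print(scaled)
```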

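The real issue is mixing Python lists with NumPy arrays. Converting dataMatrix to an array inside the function fixes it. You could also convert it before passing it in. (One more gripe about Python's type system.)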

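```python
from numpy import array, exp, ones, shape

def sigmoid(inX):
    # Logistic function; assumed definition, not shown in the original listing
    return 1.0 / (1 + exp(-inX))

def stocGradAscent0(dataMatrix, classLabels):
    dataMatrix = array(dataMatrix)  # list of lists -> 2-D ndarray, so each row scales element-wise
    m, n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        h = sigmoid(sum(dataMatrix[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights
```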

6. copy.copy vs. copy.deepcopy

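copy.copy does not fully duplicate the sub-objects of a compound object. Sub-objects are things like a list nested inside another list, or a list stored as a value in a dict. A shallow copy creates a new outer container, but its slots still reference the same sub-objects as the original. So if one copy mutates a shared sub-object in place, the change is also visible through the other copy.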

copy.deepcopy recursively copies every level of the compound object. The result shares no mutable sub-objects with the original.
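Below is a minimal sketch of the difference on a small made-up dict containing a nested list.

```python
import copy

original = {'name': 'tree', 'children': [1, 2, [3, 4]]}

shallow = copy.copy(original)      # new outer dict, same inner list object
deep = copy.deepcopy(original)     # every level copied recursively

original['children'][2].append(5)  # mutate a nested object in place
original['name'] = 'forest'        # rebind a top-level key in the original only

print(shallow['children'])  # nested change is visible through the shallow copy
print(deep['children'])     # deep copy is unaffected
print(shallow['name'])      # top-level rebinding does not affect the shallow copy
```

Only in-place mutation of a shared sub-object leaks across a shallow copy. Rebinding a key in one dict does not affect the other.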