Artificial Intelligence: Detecting Real-World Objects with OpenCV and YOLOv3 in Python

YOLO stands for You Only Look Once. Diagram of how YOLO works:

 

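To do object detection with YOLO you can use the model and data that the YOLO authors have already prepared. First download the YOLOv3 network configuration (cfg), the weights, and the class-name file (names) from the official site and GitHub repository. Note that the two GitHub links open the file viewer page, so save the raw file contents rather than the HTML page. Download links: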

https://pjreddie.com/media/files/yolov3.weights
https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg
https://github.com/pjreddie/darknet/blob/master/data/coco.names

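YOLOv3 ships with a model that is already trained, so yolov3.weights can be used directly. coco.names lists the 80 object classes the model can recognize, yolov3.cfg defines the network architecture, and yolov3.weights holds the pre-trained weights. After downloading, put the three files into one directory; you can then run object detection with the pre-trained YOLOv3 model.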
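
Before loading the network it can help to confirm that the three files are really in place and that the class list parses as expected. The following is a minimal sketch using only the standard library; the helper names `load_class_names` and `check_model_files` are made up for this illustration and are not part of OpenCV or darknet.

```python
from pathlib import Path


def load_class_names(path):
    """Return the non-empty lines of a class-name file, in file order."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_model_files(directory, names=("yolov3.cfg", "yolov3.weights", "coco.names")):
    """Return the expected file names that are missing from `directory`."""
    folder = Path(directory)
    return [name for name in names if not (folder / name).is_file()]
```

With the real coco.names file, `len(load_class_names(...))` should come out as 80. If it does not, the file saved from GitHub is probably the HTML page rather than the raw text.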

 

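This example uses OpenCV's deep neural network (dnn) module to run a pre-trained convolutional neural network (CNN) model, so that the machine can detect 80 classes of everyday objects.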
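
For orientation, the size of the raw network output can be worked out by hand. The sketch below assumes the standard YOLOv3 layout: three prediction scales with strides 32, 16 and 8, three anchor boxes per grid cell, and a 416x416 input as used in this article. The function name `yolo_output_rows` is hypothetical.

```python
def yolo_output_rows(input_size=416, strides=(32, 16, 8), anchors_per_cell=3):
    """Number of candidate boxes produced by each output layer."""
    return [(input_size // s) ** 2 * anchors_per_cell for s in strides]


rows = yolo_output_rows()      # one entry per prediction layer
total = sum(rows)              # candidate boxes before any filtering
values_per_row = 4 + 1 + 80    # box (4) + objectness (1) + class scores (80)
print(rows, total, values_per_row)
```

Almost all of these candidates have near-zero scores, which is why the program filters by confidence first and then applies non-maximum suppression.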

Python code:

import numpy as np
import cv2 as cv


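# A typical sequential CNN produces a single output at its final layer.
# YOLOv3 has several prediction layers; each one is an output layer
# that is not connected to any following layer.
def get_output_layers(net):
    layer_names = net.getLayerNames()
    # getUnconnectedOutLayers() returns 1-based indices. Depending on the OpenCV
    # version they arrive as an Nx1 array or a flat array, so flatten first.
    out_ids = np.array(net.getUnconnectedOutLayers()).flatten()
    output_layers = [layer_names[i - 1] for i in out_ids]
    return output_layers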


# Draw the bounding box and the class label.
def draw_rec(image, x, y, width, height, color, label, number):
    # Pass the full BGR color, not just its first channel.
    cv.rectangle(img=image, pt1=(x, y), pt2=(x + width, y + height), color=color, thickness=2)
    text = "{}:{:.3f}".format(label, number)
    cv.putText(image, text, (x, y - 5), cv.FONT_HERSHEY_COMPLEX, fontScale=0.5, color=color, thickness=1)


if __name__ == '__main__':
    weightsPath = "E:/code/python/yolo/yolov3.weights"
    configPath = "E:/code/python/yolo/yolov3.cfg"
    labelsPath = "E:/code/python/yolo/coco.names"
    imagePath = "./pic.jpg"

    conf_threshold = 0.5
    nms_threshold = 0.4

    LABELS = open(labelsPath).read().strip().split("\n")
    COLORS = np.random.randint(0, 255, size=(len(LABELS), 3), dtype="uint8")

    net = cv.dnn.readNetFromDarknet(configPath, weightsPath)
    net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
    net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)

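    # Image to run detection on.
    image = cv.imread(imagePath)
    if image is None:
        # cv.imread returns None instead of raising when the file cannot be read.
        raise FileNotFoundError("Cannot read image: " + imagePath)
    (H, W) = image.shape[:2]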

    scale = 1 / 255

    blob = cv.dnn.blobFromImage(image, scale, size=(416, 416), swapRB=True, crop=False)
    net.setInput(blob)

    outputs = net.forward(get_output_layers(net))

    boxes = []
    confidences = []
    classIDs = []

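    # Loop over each output prediction layer.
    for output in outputs:
        for detection in output:
            # Each row is [center_x, center_y, width, height, objectness, 80 class scores].
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]

            # Discard low-confidence detections.
            if confidence > conf_threshold:
                confidences.append(float(confidence))
                classIDs.append(int(classID))

                # Box values are relative to the image size; scale them back to pixels.
                box = detection[0:4] * np.array([W, H, W, H])

                (centerX, centerY, width, height) = box.astype("int")

                # Convert the box center to the top-left corner.
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))

                boxes.append([x, y, int(width), int(height)])

    # Non-maximum suppression (NMS).
    idxs = cv.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)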

    print(boxes)
    if len(idxs) > 0:
        for i in idxs.flatten():
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])

            # Draw the bounding box and class label on the original image.
            color = [int(c) for c in COLORS[classIDs[i]]]
            draw_rec(image, x, y, w, h, color, LABELS[classIDs[i]], confidences[i])

    cv.imshow("image", image)
    cv.waitKey(0)
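
The coordinate handling inside the detection loop is the part that is easiest to get wrong, so here is a plain-Python sketch of the same conversion for a single detection row. It assumes the row layout used above (box center and size relative to the image, then objectness, then class scores). The function name `decode_detection` is hypothetical, and the rounding differs slightly from the NumPy version in the program.

```python
def decode_detection(row, image_w, image_h):
    """Turn one detection row into (x, y, w, h, class_id, score) in pixels.

    row = [cx, cy, w, h, objectness, score_0, score_1, ...], where the first
    four values are fractions of the image width and height.
    """
    cx, cy, w, h = row[0:4]
    scores = row[5:]
    # Index of the best class score (what np.argmax does in the program).
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    width = int(w * image_w)
    height = int(h * image_h)
    # The network reports the box center; drawing needs the top-left corner.
    x = int(cx * image_w - width / 2)
    y = int(cy * image_h - height / 2)
    return x, y, width, height, class_id, float(scores[class_id])


# A made-up row with three class scores, decoded for a 640x480 image.
print(decode_detection([0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05], 640, 480))
```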

 

Original image pic.jpg:

 

After the program runs detection, each detected object is enclosed in a box and labeled with its class name. The result:

The AI detects three kinds of objects: person, horse, and dog.

 

Now a group photo. Original image:

 

Running the program on the image above, the AI recognizes two classes of objects, person and tie:

 

 

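Here is why the program runs non-maximum suppression (NMS) as a second filtering pass over the detections:

    # Non-maximum suppression (NMS).
    idxs = cv.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)

The detection loop has already discarded every detection whose confidence is below 0.5, but the remaining boxes are still very likely to overlap, because the network often predicts several boxes for the same object. NMS keeps the highest-confidence box in each overlapping group and drops the others whose overlap with it exceeds nms_threshold. As shown in the figures: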

 

 

Of the two images above, the left one is without non-maximum suppression and the right one is with it.
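
To make the idea concrete, here is a small greedy NMS written in plain Python. It is an illustrative sketch of the general technique, not OpenCV's internal implementation of `cv.dnn.NMSBoxes`; the names `iou` and `greedy_nms` are made up, boxes are `[x, y, w, h]` as in the program above, and the sketch ignores class labels.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x, y, w, h]."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold, iou_threshold):
    """Return indices of the boxes that survive, best score first."""
    # Candidates above the score threshold, sorted by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up numbers).
boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, 0.5, 0.4))
```

In this example the second box overlaps the first with an IoU of about 0.68, which is above the 0.4 threshold, so only the first and third boxes remain.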

This example was built with Python 3.7.4, OpenCV 4.1.1.26, and YOLOv3.
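
The OpenCV version matters for one detail: the shape of the value returned by `net.getUnconnectedOutLayers()`. As far as I know, builds from around 4.5.4 onward return a flat list of layer ids, while the 4.1 series used here wraps each id in a one-element array. The sketch below shows one way to accept both shapes; `output_layer_names` is a hypothetical helper, and the layer names in the usage lines are placeholders rather than the real YOLOv3 layer list.

```python
def output_layer_names(layer_names, unconnected_ids):
    """Map 1-based layer ids to names, accepting [[id], ...] or [id, ...]."""
    names = []
    for item in unconnected_ids:
        # Older builds wrap each id in a one-element sequence.
        layer_id = item[0] if hasattr(item, "__len__") else item
        names.append(layer_names[int(layer_id) - 1])
    return names


# Placeholder layer names; both input shapes give the same result.
demo_layers = ["conv_0", "yolo_1", "conv_2", "yolo_3"]
print(output_layer_names(demo_layers, [[2], [4]]))
print(output_layer_names(demo_layers, [2, 4]))
```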

What is coco.names?

This particular model is trained on the COCO dataset (Common Objects in Context) from Microsoft. It is capable of detecting 80 common objects.

The 80 classes in coco.names:

person
bicycle
car
motorcycle
airplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
bottle
wine glass
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
couch
potted plant
bed
dining table
toilet
tv
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush

 
