Visualizing Clustering Results with Python

April 07, 2016

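For the clustering algorithm itself, see this article; this post plots the data that the algorithm produces.

The data needs a little preprocessing first. In my DBSCAN implementation, the cluster IDs in the final output are not consecutive. To make plotting easier, I renumber the cluster IDs of all points consecutively, starting from 0. The code for this step is: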

/*
deal.cpp


input: out.txt(x, y, clusterID)
 
9.000000 1.000000 5
10.000000 10.000000 16
2.100000 7.100000 1
1.100000 1.100000 1
1.100000 2.100000 1
1.100000 3.100000 1
1.100000 4.100000 1
8.000000 8.000000 6
9.000000 8.000000 6
1.100000 5.100000 1
2.100000 1.100000 1
2.100000 2.100000 1
2.100000 3.100000 1
2.100000 4.100000 1
2.100000 5.100000 1
2.100000 6.100000 1
8.000000 9.000000 6


output: pic.txt

2.1 7.1 0
1.1 1.1 0
1.1 2.1 0
1.1 3.1 0
1.1 4.1 0
2.1 6.1 0
2.1 5.1 0
1.1 5.1 0
2.1 1.1 0
2.1 2.1 0
2.1 3.1 0
2.1 4.1 0
9 1 1
8 9 2
9 8 2
8 8 2
10 10 3

*/
#include <cstdio>
#include <iostream>
#include <algorithm>
using namespace std;

// One input record: coordinates (a, b) and the original cluster ID c.
struct point{
    double a, b;
    int c;
}p[200000];
// Order points by cluster ID so that each cluster forms a contiguous run.
bool cmp(const point &x, const point &y){
    return x.c < y.c;
}
int main(){
    freopen("out.txt", "r", stdin);

    int id = 0;
    // Stop at EOF, at a malformed line, or when the array is full.
    while(id < 200000 && scanf("%lf %lf %d", &p[id].a, &p[id].b, &p[id].c) == 3){
        id++;
    }
    sort(p, p+id, cmp);
    // flag is the new consecutive ID; pre is the original ID of the previous point.
    int flag = -1, pre = -1;
    freopen("pic.txt", "w", stdout);

    for(int i = 0; i < id; i++){
        cout << p[i].a << " " << p[i].b << " ";
        // A change in the original ID starts a new cluster (i == 0 covers a first ID of -1).
        if(i == 0 || p[i].c != pre){
            cout << ++flag << endl;
            pre = p[i].c;
        }else{
            cout << flag << endl;
        }
    }
    return 0;
}
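
For readers who would rather stay in Python, the same renumbering can be sketched as follows. This is a minimal sketch rather than a port of deal.cpp: the names `parse_points` and `renumber_clusters` are made up for illustration, and it assumes each input line holds exactly three whitespace-separated fields (x, y, clusterID).

```python
def parse_points(text):
    """Parse lines of 'x y clusterID' into (float, float, int) tuples."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        points.append((float(fields[0]), float(fields[1]), int(fields[2])))
    return points


def renumber_clusters(points):
    """Map the original cluster IDs to 0, 1, 2, ... in ascending order of ID."""
    # Distinct original IDs, sorted, each paired with its new consecutive ID.
    distinct_ids = sorted({cid for _, _, cid in points})
    mapping = {old: new for new, old in enumerate(distinct_ids)}
    renumbered = [(x, y, mapping[cid]) for x, y, cid in points]
    # Group points of the same cluster together, as deal.cpp does by sorting.
    return sorted(renumbered, key=lambda p: p[2])


sample = "9.0 1.0 5\n10.0 10.0 16\n2.1 7.1 1\n8.0 8.0 6\n"
result = renumber_clusters(parse_points(sample))
```

Unlike `std::sort`, Python's `sorted` is stable, so points within a cluster keep their input order.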

Then the plot is drawn with Python.

# coding=utf-8
import os
import sys
import matplotlib.pyplot as plt

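# Supports points in 8 different colors (cluster IDs 0-7).
# Note: 'w' (white) does not show up against the default white background.
color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'w', 'y']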


def read_data(filename, xmax=11.0, ymax=11.0):
    try:
        with open(filename) as f:
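            for row in f:
                if not row.strip():
                    continue  # skip blank lines
                # split() with no argument also drops the trailing newline
                x, y, n = row.split()
                c = color_list[int(n)]
                # convert to float so the points are plotted numerically, not as categories
                draw_axes(xmax, ymax, float(x), float(y), c)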

    except FileNotFoundError as e:
        print('No such file: ', e)
        sys.exit(-1)


def draw_axes(xmax, ymax, x, y, color):
    plt.axis((0, float(xmax), 0, float(ymax)))
    plt.scatter(x, y, c=color)


if __name__ == '__main__':
    filename = input('Please enter filename:')
    xmax = input('Please input xmax:')
    ymax = input('Please input ymax:')

    # os.path.join picks the right separator on every platform
    filename = os.path.join(os.getcwd(), filename)
    read_data(filename, xmax, ymax)

    plt.show()

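Result:

The plotting code defines 8 colors. It just occurred to me that if many more colors are needed, the color can be chosen by taking the clusterID modulo the number of colors. Every cluster still gets a single color, although colors will repeat across clusters. With this approach, deal.cpp is no longer needed to preprocess the data.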
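
The modulo idea could look roughly like the sketch below. The names `PALETTE`, `color_for`, `group_by_color` and `plot_clusters` are hypothetical, and the palette leaves out white on the assumption that the plot has a white background. Points are grouped by color first, so `plt.scatter` is called once per color rather than once per point.

```python
# Assumed palette: the single-letter matplotlib colors, minus white.
PALETTE = ['b', 'c', 'g', 'k', 'm', 'r', 'y']


def color_for(cluster_id, palette=PALETTE):
    """Pick a color by taking the cluster ID modulo the palette size."""
    return palette[cluster_id % len(palette)]


def group_by_color(points, palette=PALETTE):
    """Collect (x, y, clusterID) points into {color: ([xs], [ys])}."""
    groups = {}
    for x, y, cluster_id in points:
        xs, ys = groups.setdefault(color_for(cluster_id, palette), ([], []))
        xs.append(x)
        ys.append(y)
    return groups


def plot_clusters(points, xmax=11.0, ymax=11.0, palette=PALETTE):
    """Draw all points, one scatter call per color."""
    # Imported here so the helpers above can be used without matplotlib.
    import matplotlib.pyplot as plt

    plt.axis((0, xmax, 0, ymax))
    for color, (xs, ys) in group_by_color(points, palette).items():
        plt.scatter(xs, ys, c=color)
    plt.show()
```

Calling `plot_clusters([(9.0, 1.0, 5), (10.0, 10.0, 16), (8.0, 8.0, 6)])` would then plot the raw DBSCAN output directly. Because Python's `%` never returns a negative result for a positive modulus, a negative ID (if the clustering output uses one, for example for noise) still maps to a valid palette index.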



Written by Armin Li, a venture capitalist.