The fourth lab. I agonized over it for a long time and still didn't improve much, so criticism is welcome.
Approach:
rotate:
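1. My gut feeling is that a write miss costs more than a read miss, so I changed the loop from reading sequentially to writing sequentially.
2. To make full use of the L1 cache (32 KB), I used blocking, with 32×32 blocks.
Speedup: about 6.5×.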
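Both points can be checked by hand. Assuming the lab's baseline rotation is dst[dim-1-j][i] = src[i][j] (row/column form of RIDX(i, j, n) = i*n + j; this baseline is an assumption, not shown in the post), renaming the indices gives the write-sequential form used in rotate: in the inner j1 loop, dst advances by one pixel while src advances by dim pixels. The second line is the working-set arithmetic for one source tile plus one destination tile of 6-byte pixels:

$$
\text{dst}[\,\text{dim}-1-j\,][\,i\,] = \text{src}[\,i\,][\,j\,]
\quad\xrightarrow{\;i_1 = \text{dim}-1-j,\;\; j_1 = i\;}\quad
\text{dst}[\,i_1\,][\,j_1\,] = \text{src}[\,j_1\,][\,\text{dim}-1-i_1\,]
$$

$$
32 \times 32 \times 6\ \text{B} = 6144\ \text{B per tile},\qquad 2 \times 6144\ \text{B} = 12\ \text{KiB} < 32\ \text{KiB}
$$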
/*
 * rotate - Your current working version of rotate
 * IMPORTANT: This is the version you will be graded on
 * L1 cache size: 32 KB
 * pixel size: 6 B
 */
char rotate_descr[] = "rotate: Current working version";
void rotate(int dim, pixel *src, pixel *dst)
{
    int i, j, i1, j1;
    int block = 32; /* tile the matrix into 32x32 blocks; assumes dim is a multiple of 32 */
    for (i = 0; i < dim; i += block)
        for (j = 0; j < dim; j += block)
        {
            /* rotate one block x block tile; dst is written row by row (sequential writes) */
            for (i1 = i; i1 < i + block; i1++)
                for (j1 = j; j1 < j + block; j1++)
                    dst[RIDX(i1, j1, dim)] = src[RIDX(j1, dim - i1 - 1, dim)];
        }
}
smooth:
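1. Save intermediate results that get reused and look them up instead of recomputing them (a lookup table of row sums).
Speedup: about 12×.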
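The saving can be counted directly. Writing R(i, j) for the horizontal sum of up to three neighbours in row i (the rowsum table) and N(i, j) for the matching pixel count (rowsum[i][j].num), each output channel is one division of a vertical sum of row sums, with out-of-range terms omitted:

$$
R(i,j) = \sum_{\substack{d \in \{-1,0,1\} \\ 0 \le j+d < \text{dim}}} \text{src}(i,\, j+d),
\qquad
\text{dst}(i,j) = \left\lfloor \frac{\sum_{e} R(i+e,\, j)}{\sum_{e} N(i+e,\, j)} \right\rfloor,
\quad e \in \{-1,0,1\},\ 0 \le i+e < \text{dim}
$$

For an interior pixel, the direct 3×3 sum costs 8 additions per channel. With the table it costs 2 additions to build R(i, j), which is then reused by three output rows, plus 2 additions to combine three row sums: 4 in total.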
char smooth_descr1[] = "smooth: Storing reused results.";
void smooth1(int dim, pixel *src, pixel *dst)
{
    /* rowsum[i][j]: sum of src[i][j-1..j+1] (clipped at the edges); .num = pixels summed.
     * static keeps the table (530*530*sizeof(pixel_sum), about 4.5 MB with a 16-byte
     * pixel_sum) off the stack; assumes dim <= 530. */
    static pixel_sum rowsum[530][530];
    int i, j, snum;
    /* Pass 1: horizontal sums for every row */
    for (i = 0; i < dim; i++)
    {
        rowsum[i][0].red = (src[RIDX(i, 0, dim)].red + src[RIDX(i, 1, dim)].red);
        rowsum[i][0].blue = (src[RIDX(i, 0, dim)].blue + src[RIDX(i, 1, dim)].blue);
        rowsum[i][0].green = (src[RIDX(i, 0, dim)].green + src[RIDX(i, 1, dim)].green);
        rowsum[i][0].num = 2;
        for (j = 1; j < dim - 1; j++)
        {
            rowsum[i][j].red = (src[RIDX(i, j-1, dim)].red + src[RIDX(i, j, dim)].red + src[RIDX(i, j+1, dim)].red);
            rowsum[i][j].blue = (src[RIDX(i, j-1, dim)].blue + src[RIDX(i, j, dim)].blue + src[RIDX(i, j+1, dim)].blue);
            rowsum[i][j].green = (src[RIDX(i, j-1, dim)].green + src[RIDX(i, j, dim)].green + src[RIDX(i, j+1, dim)].green);
            rowsum[i][j].num = 3;
        }
        rowsum[i][dim-1].red = (src[RIDX(i, dim-2, dim)].red + src[RIDX(i, dim-1, dim)].red);
        rowsum[i][dim-1].blue = (src[RIDX(i, dim-2, dim)].blue + src[RIDX(i, dim-1, dim)].blue);
        rowsum[i][dim-1].green = (src[RIDX(i, dim-2, dim)].green + src[RIDX(i, dim-1, dim)].green);
        rowsum[i][dim-1].num = 2;
    }
    /* Pass 2: add up to three vertically adjacent row sums, then divide the total once */
    for (j = 0; j < dim; j++)
    {
        snum = rowsum[0][j].num + rowsum[1][j].num;
        dst[RIDX(0, j, dim)].red = (unsigned short)((rowsum[0][j].red + rowsum[1][j].red) / snum);
        dst[RIDX(0, j, dim)].blue = (unsigned short)((rowsum[0][j].blue + rowsum[1][j].blue) / snum);
        dst[RIDX(0, j, dim)].green = (unsigned short)((rowsum[0][j].green + rowsum[1][j].green) / snum);
        for (i = 1; i < dim - 1; i++)
        {
            snum = rowsum[i-1][j].num + rowsum[i][j].num + rowsum[i+1][j].num;
            dst[RIDX(i, j, dim)].red = (unsigned short)((rowsum[i-1][j].red + rowsum[i][j].red + rowsum[i+1][j].red) / snum);
            dst[RIDX(i, j, dim)].blue = (unsigned short)((rowsum[i-1][j].blue + rowsum[i][j].blue + rowsum[i+1][j].blue) / snum);
            dst[RIDX(i, j, dim)].green = (unsigned short)((rowsum[i-1][j].green + rowsum[i][j].green + rowsum[i+1][j].green) / snum);
        }
        snum = rowsum[dim-1][j].num + rowsum[dim-2][j].num;
        dst[RIDX(dim-1, j, dim)].red = (unsigned short)((rowsum[dim-2][j].red + rowsum[dim-1][j].red) / snum);
        dst[RIDX(dim-1, j, dim)].blue = (unsigned short)((rowsum[dim-2][j].blue + rowsum[dim-1][j].blue) / snum);
        dst[RIDX(dim-1, j, dim)].green = (unsigned short)((rowsum[dim-2][j].green + rowsum[dim-1][j].green) / snum);
    }
}
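Problems that gave me trouble:
1. For the second kernel, I thought of saving the repeated computations right from the start yesterday, but once implemented it kept crashing with a segmentation fault. There is no way to debug inside this environment, so I really didn't know how to fix it, and after struggling for a long time I had to set it aside. Today I suddenly noticed that my original type cast did not parenthesize the whole expression after it, so the result of the remaining arithmetic was presumably still an int and something went wrong in the assignment...
2. After the segmentation fault in the second kernel was fixed, I hit another problem: many of the computed results were wrong, always by exactly 1. After thinking about it for a long time I finally understood why: I had split the division into two parts computed separately (for example, /4 became two /2's). Integer division truncates, so the remainders thrown away by the partial divisions produced rounding errors.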