The "unknown error 107" problem when deploying Gluster

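While trying to build a Gluster distributed file system on two old machines, I ran into "unknown error 107". Both machines run 32-bit Fedora 17, and port 24007 is confirmed open. Although the host names are set in /etc/hosts, stepping through the code shows entry as NULL and the list_for_each_entry() loop never running. The suspects are a memory overrun or a missing assignment to entry, and the plan is to debug the glusterd process with gdb to find the cause.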


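Today I took two old machines (Pentium D) to set up a distributed file system to play with, and to see how it differs from HDFS in day-to-day use.

Installation should be easy: the operating system is 32-bit Fedora 17, with a large chunk of disk space left unpartitioned (51 GB out of 73 GB).


Then I installed Gluster with yum.

Then it turned out that the glusterd daemon has to be started...

Fine. On to peer probe, where I was stuck for hours on unknown error 107.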


[root@gluster0 sbin]# ./gluster peer probe gluster1
Probe unsuccessful
Probe returned with unknown errno 107

* The two machines are named gluster0 and gluster1 in /etc/hosts.


I checked with netstat: port 24007 is open, so there is no obvious reason for the failure. I am not using DNS, but both hosts are registered in /etc/hosts...


The log says:

[2013-05-08 17:34:32.369306] I [glusterd-handler.c:685:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req gluster1 24007
[2013-05-08 17:34:32.371086] I [glusterd-handler.c:428:glusterd_friend_find] 0-glusterd: Unable to find hostname: gluster1
[2013-05-08 17:34:32.371129] I [glusterd-handler.c:2245:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: gluster1 (24007)
[2013-05-08 17:34:32.371776] I [rpc-clnt.c:968:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-05-08 17:34:32.380750] I [glusterd-handler.c:2227:glusterd_friend_add] 0-management: connect returned 0
[2013-05-08 17:34:32.380917] E [socket.c:1715:socket_connect_finish] 0-management: connection to  failed (No route to host)
[2013-05-08 17:34:32.381070] I [glusterd-handler.c:2423:glusterd_xfer_cli_probe_resp] 0-glusterd: Responded to CLI, ret: 0

The key line is:

0-glusterd: Unable to find hostname: gluster1
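
The log also contains one E-level line: "connection to  failed (No route to host)". A listening port on gluster1 does not by itself prove that gluster0 can reach it. One possible cause, which is an assumption and not something this log confirms, is a firewall rule on the peer that rejects the connection. The sketch below tries a plain TCP connect, independent of glusterd, and reports the errno. try_connect is a hypothetical helper; the default host and port are taken from the probe above.

```c
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Gluster.
 * Returns 0 if a TCP connection succeeded, a positive errno value if
 * socket()/connect() failed, or -1 if the name could not be resolved.
 */
static int try_connect(const char *host, const char *port)
{
        struct addrinfo  hints;
        struct addrinfo *res = NULL;
        struct addrinfo *p = NULL;
        int              last_err = -1;
        int              rc = 0;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family   = AF_UNSPEC;    /* IPv4 or IPv6 */
        hints.ai_socktype = SOCK_STREAM;  /* TCP */

        rc = getaddrinfo(host, port, &hints, &res);
        if (rc != 0) {
                fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
                return -1;
        }

        for (p = res; p != NULL; p = p->ai_next) {
                int fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
                if (fd < 0) {
                        last_err = errno;
                        continue;
                }
                if (connect(fd, p->ai_addr, p->ai_addrlen) == 0) {
                        close(fd);
                        freeaddrinfo(res);
                        return 0;
                }
                last_err = errno;  /* save before close() can change it */
                close(fd);
        }

        freeaddrinfo(res);
        return last_err;
}

int main(int argc, char **argv)
{
        const char *host = argc > 1 ? argv[1] : "gluster1";
        const char *port = argc > 2 ? argv[2] : "24007";
        int         err  = try_connect(host, port);

        if (err == 0)
                printf("%s:%s reachable\n", host, port);
        else if (err > 0)
                printf("%s:%s failed: %s (errno %d)\n", host, port,
                       strerror(err), err);
        else
                printf("%s could not be resolved\n", host);
        return 0;
}
```

If this reports "No route to host" from gluster0 while the port is listening on gluster1, the problem sits between the two machines rather than inside glusterd's peer list.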

-------------------

Damn. Time to go to the source: build it, then debug by attaching gdb to the glusterd process.

int
glusterd_friend_find_by_hostname (const char *hoststr,
                                  glusterd_peerinfo_t  **peerinfo)
{
        int                     ret = -1;
        glusterd_conf_t         *priv = NULL;
        glusterd_peerinfo_t     *entry = NULL;
        struct addrinfo         *addr = NULL;
        struct addrinfo         *p = NULL;
        char                    *host = NULL;
        struct sockaddr_in6     *s6 = NULL;
        struct sockaddr_in      *s4 = NULL;
        struct in_addr          *in_addr = NULL;
        char                    hname[1024] = {0,};
        xlator_t                *this  = NULL;


        this = THIS;
        GF_ASSERT (hoststr);
        GF_ASSERT (peerinfo);

        *peerinfo = NULL;
        priv    = this->private;

        GF_ASSERT (priv);

        list_for_each_entry (entry, &priv->peers, uuid_list) {
                if (!strncasecmp (entry->hostname, hoststr,
                                  1024)) {

                        gf_log (this->name, GF_LOG_DEBUG,
                                 "Friend %s found.. state: %d", hoststr,
                                  entry->state.state);
                        *peerinfo = entry;
                        return 0;
                }
        }

        ret = getaddrinfo (hoststr, NULL, NULL, &addr);
        if (ret != 0) {
                gf_log (this->name, GF_LOG_ERROR,
                        "error in getaddrinfo: %s\n",
                        gai_strerror(ret));
                goto out;
        }

        for (p = addr; p != NULL; p = p->ai_next) {
                switch (p->ai_family) {
                        case AF_INET:
                                s4 = (struct sockaddr_in *) p->ai_addr;
                                in_addr = &s4->sin_addr;
                                break;
                        case AF_INET6:
                                s6 = (struct sockaddr_in6 *) p->ai_addr;
                                in_addr =(struct in_addr *) &s6->sin6_addr;
                                break;
                       default: ret = -1;
                                goto out;
                }
                host = inet_ntoa(*in_addr);

                ret = getnameinfo (p->ai_addr, p->ai_addrlen, hname,
                                   1024, NULL, 0, 0);
                if (ret)
                        goto out;

                list_for_each_entry (entry, &priv->peers, uuid_list) {
                        if (!strncasecmp (entry->hostname, host,
                            1024) || !strncasecmp (entry->hostname,hname,
                            1024)) {
                                gf_log (this->name, GF_LOG_DEBUG,
                                        "Friend %s found.. state: %d",
                                        hoststr, entry->state.state);
                                *peerinfo = entry;
                                freeaddrinfo (addr);
                                return 0;
                        }
                }
        }

out:
        gf_log (this->name, GF_LOG_DEBUG, "Unable to find friend: %s", hoststr);
        if (addr)
                freeaddrinfo (addr);
        return -1;
}

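Stepping through it, I found something odd: where does the local pointer variable entry get assigned?

entry is NULL, and the first list_for_each_entry() loop is not entered even once.

Yet after stepping past

ret = getaddrinfo (hoststr, NULL, NULL, &addr);

entry suddenly has a value for no apparent reason, and that value looks wrong.



My guess: an assignment to entry may be missing, and entry should be set to the head element of the list that is passed in. Or is there a memory overrun?

To test the guess, look at the definition of this loop.

A quick Google search turned up: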

http://lxr.free-electrons.com/source/include/linux/list.h#L418

/**
 * list_for_each_entry  -       iterate over list of given type
 * @pos:        the type * to use as a loop cursor.
 * @head:       the head for your list.
 * @member:     the name of the list_struct within the struct.
 */
#define list_for_each_entry(pos, head, member)                          \
        for (pos = list_entry((head)->next, typeof(*pos), member);      \
             &pos->member != (head);    \
             pos = list_entry(pos->member.next, typeof(*pos), member))

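So it is just a macro definition; in essence it is a for loop over the list members. The for initializer is what assigns entry (the pos argument), so no separate assignment is missing.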



