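Today I took two old machines (Pentium D) and decided to set up a distributed file system (GlusterFS) to play with, to see how it differs from HDFS in practice.
Installation should be easy: the OS is 32-bit Fedora 17, and I left a large chunk of the disk unpartitioned (51 GB out of 73 GB).
Then I installed it with yum.
Then I found that the glusterd daemon has to be started first…
OK, on to peer probe. I was stuck for hours on "unknown errno 107".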
```
[root@gluster0 sbin]# ./gluster peer probe gluster1
Probe unsuccessful
Probe returned with unknown errno 107
```
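* The two machines are named gluster0 and gluster1 in /etc/hosts.
netstat shows that port 24007 is already open. There is no obvious reason for the failure. I'm not using DNS, but both hosts are registered in /etc/hosts…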
The log says:
```
[2013-05-08 17:34:32.369306] I [glusterd-handler.c:685:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req gluster1 24007
[2013-05-08 17:34:32.371086] I [glusterd-handler.c:428:glusterd_friend_find] 0-glusterd: Unable to find hostname: gluster1
[2013-05-08 17:34:32.371129] I [glusterd-handler.c:2245:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: gluster1 (24007)
[2013-05-08 17:34:32.371776] I [rpc-clnt.c:968:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-05-08 17:34:32.380750] I [glusterd-handler.c:2227:glusterd_friend_add] 0-management: connect returned 0
[2013-05-08 17:34:32.380917] E [socket.c:1715:socket_connect_finish] 0-management: connection to  failed (No route to host)
[2013-05-08 17:34:32.381070] I [glusterd-handler.c:2423:glusterd_xfer_cli_probe_resp] 0-glusterd: Responded to CLI, ret: 0
```
The key line is:
```
0-glusterd: Unable to find hostname: gluster1
```
-------------------
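Screw it, time for the source code. I built it and debugged it by attaching gdb to the glusterd process.
Here is the function in question: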
```c
int
glusterd_friend_find_by_hostname (const char *hoststr,
                                  glusterd_peerinfo_t  **peerinfo)
{
        int                     ret = -1;
        glusterd_conf_t         *priv = NULL;
        glusterd_peerinfo_t     *entry = NULL;
        struct addrinfo         *addr = NULL;
        struct addrinfo         *p = NULL;
        char                    *host = NULL;
        struct sockaddr_in6     *s6 = NULL;
        struct sockaddr_in      *s4 = NULL;
        struct in_addr          *in_addr = NULL;
        char                    hname[1024] = {0,};
        xlator_t                *this = NULL;

        this = THIS;
        GF_ASSERT (hoststr);
        GF_ASSERT (peerinfo);

        *peerinfo = NULL;
        priv = this->private;

        GF_ASSERT (priv);

        /* Pass 1: compare hoststr directly with each known peer's hostname. */
        list_for_each_entry (entry, &priv->peers, uuid_list) {
                if (!strncasecmp (entry->hostname, hoststr,
                                  1024)) {
                        gf_log (this->name, GF_LOG_DEBUG,
                                "Friend %s found.. state: %d", hoststr,
                                entry->state.state);
                        *peerinfo = entry;
                        return 0;
                }
        }

        /* Pass 2: resolve hoststr and compare every resulting address
         * (numeric form and reverse-resolved name) with the known peers. */
        ret = getaddrinfo (hoststr, NULL, NULL, &addr);
        if (ret != 0) {
                gf_log (this->name, GF_LOG_ERROR,
                        "error in getaddrinfo: %s\n",
                        gai_strerror(ret));
                goto out;
        }

        for (p = addr; p != NULL; p = p->ai_next) {
                switch (p->ai_family) {
                case AF_INET:
                        s4 = (struct sockaddr_in *) p->ai_addr;
                        in_addr = &s4->sin_addr;
                        break;
                case AF_INET6:
                        /* Note: the 16-byte IPv6 address is cast to a 4-byte
                         * struct in_addr, so inet_ntoa() below only formats
                         * its first 4 bytes. */
                        s6 = (struct sockaddr_in6 *) p->ai_addr;
                        in_addr =(struct in_addr *) &s6->sin6_addr;
                        break;
                default: ret = -1;
                        goto out;
                }
                host = inet_ntoa(*in_addr);

                ret = getnameinfo (p->ai_addr, p->ai_addrlen, hname,
                                   1024, NULL, 0, 0);
                if (ret)
                        goto out;

                list_for_each_entry (entry, &priv->peers, uuid_list) {
                        if (!strncasecmp (entry->hostname, host,
                            1024) || !strncasecmp (entry->hostname,hname,
                            1024)) {
                                gf_log (this->name, GF_LOG_DEBUG,
                                        "Friend %s found.. state: %d",
                                        hoststr, entry->state.state);
                                *peerinfo = entry;
                                freeaddrinfo (addr);
                                return 0;
                        }
                }
        }

out:
        gf_log (this->name, GF_LOG_DEBUG, "Unable to find friend: %s", hoststr);
        if (addr)
                freeaddrinfo (addr);
        return -1;
}
```
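Stepping through it, I noticed something strange: where does the local pointer variable entry actually get assigned?
entry was NULL, and the body of the first list_for_each_entry() loop never ran, not even once.
But after stepping past
```c
ret = getaddrinfo (hoststr, NULL, NULL, &addr);
```
entry inexplicably had a value, and that value looked wrong.
My guess: maybe the assignment to entry was missing, and entry should have been set to the first element of the peerinfo passed in; or was there a memory overrun?
To check this guess, let's look at how that loop is defined.
A quick Google search turned up: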
http://lxr.free-electrons.com/source/include/linux/list.h#L418
```c
/**
 * list_for_each_entry  -       iterate over list of given type
 * @pos:        the type * to use as a loop cursor.
 * @head:       the head for your list.
 * @member:     the name of the list_struct within the struct.
 */
#define list_for_each_entry(pos, head, member)                          \
        for (pos = list_entry((head)->next, typeof(*pos), member);      \
             &pos->member != (head);                                    \
             pos = list_entry(pos->member.next, typeof(*pos), member))
```
So it is just a macro: at heart, a for loop over the members of the list.