竞态通常是在对共享资源的访问的时候产生的。当两个执行线程需要访问相同的数据结构时,就会出现。
所以应该尽量少的使用共享数据。
临界区:在任意的时刻,代码只能被一个线程执行。
休眠:当linux进程到达某个时间点,此时他不能进行任何处理时,它将进入休眠状态,这将把处理器让给其他执行线程直到将来它能够继续完成自己的处理为止。
Linux信号量的实现
头文件<asm/semaphore.h>
信号量的声明和初始化:
直接创建信号量:
void sema_init(struct semaphore *sem, int val);
信号量通常被用于互斥模式:
DECLARE_MUTEX(name);初始化为1
DECLARE_MUTEX_LOCKED(name);初始化为0
在运行时初始化:
void init_MUTEX(struct semapthore *sem);
void init_MUTEX_LOCKED(struct semaphore *sem);
在linux世界中P操作被称为down,这里指的是减小信号量的值。
void dowm(struce semaphore *sem);减少信号量的值,并在必要时一直等待(进入休眠状态)。
int down_interruptible(struct semaphore *sem);允许等待某个信号量的进程可被用户中断。中断版本几乎是我们始终要用的版本。使用该函数时,如果操作被中断,该函数返回非零值,而调用者不会拥有该信号。对该函数的正确使用需要始终检查返回值,并作出相应的响应。
int down_trylock(struct semaphore *sem);如果信号量不可获得,函数立即返回非零值。
V操作,释放信号量
void up(struct semaphore *sem);
eg:scull中使用信号量
if (down_interruptible(&dev->sem))
return -ERESTARTSYS;
注意返回值的检查,如果它返回非零值,则说明操作被中断。在这种情况下,通常要做的工作是返回-ERESTARTSYS。在见到这个返回代码后,内核的高层代码要么会从头重新启动该调用,要么会将错误返回给用户。
自旋锁
<linux/spinlock.h>锁的类型为spinlock_t
初始化锁
spinlock_t my_lock = SPIN_LOCK_UNLOCKED;定义的时候初始化
void spin_lock_init(spinlock_t *lock);运行的时候初始化
void spin_lock(spinlock_t *lock);获得锁
void spin_unlock(spinlock_t *lock);释放已经获得锁
所有的自旋锁等待在本质上都是不可中断的。一旦调用了spin_lock,在获得锁之前将一直处于自旋状态。
自旋锁和原子上下文
适用于自旋锁的核心规则是:
1.任何拥有自旋锁的代码都必须是原子的,他不能休眠。
2.自旋锁必须在可能的最短时间内拥有,提高系统的效率。
自旋锁函数
void spin_lock(spinlock_t *lock);
void spin_lock_irqsave(spinlock_t *lock, unsigned long flags);
void spin_lock_irq(spinlock_t *lock);
void spin_lock_bh(spinlock_t *lock);
---------------------------------------
void spin_unlock(spinlock_t *lock);
void spin_unlock_irqsave(spinlock_t *lock, unsigned long flags);
void spin_unlock_irq(spinlock_t *lock);
void spin_unlock_bh(spinlock_t *lock);
如果自旋锁在中断处理函数中被用到,那么在获取该锁之前需要关闭本地中断,spin_lock_irqsave 只是下列动作的一个便利接口:
1 保存本地中断状态
2 关闭本地中断
3 获取自旋锁
解锁时通过 spin_unlock_irqrestore完成释放锁、恢复本地中断到之前的状态等工作
还有一对 spin_lock_irq 和 spin_unlock_irq
如果你确定在获取锁之前本地中断是开启的,那么就不需要保存中断状态,解锁的时候直接将本地中断启用就可以啦
锁陷阱
不明确的规则
1.如果某个获得锁的函数要调用其他同样试图获取这个锁的函数,代码就会死锁。不论是信号量还是自旋锁,都不允许拥有者第二次获得这个锁。
2.想象某个线程锁定了lock1,而其他线程锁定了lock2。这时他们都试图获得另外的那个锁,于是两个线程都将死锁。
解决办法是:在必须获得多个锁时,应该始终以相同的顺序获得。
有帮助的规则:
1.如果我们必须获得一个局部锁,以及一个属于内核更中心位置的锁,则应该首先获得自己的局部锁。
2.如果我们拥有信号量和自旋锁的组合,则必须首先获得信号量;因为拥有自旋锁时调用down导致休眠是个严重的错误。
A&Q:
1.为什么中断中可以使用自旋锁,而不能使用信号量。
自旋锁在自旋的时候不会休眠,不会引起进程调度。(所以可以在中断中使用)
而获取信号量时,再不能立即得到信号量时,进程会休眠,引起进程调度。(所以不能在中断中使用)
2.为什么在拥有自旋锁的时候不能休眠?而拥有信号量时可以休眠。
跟踪一下spin_lock(&mr_lock)的实现
#define spin_lock(lock) _spin_lock(lock)
#define _spin_lock(lock) __LOCK(lock)
#define __LOCK(lock) \
do { preempt_disable(); __acquire(lock); (void)(lock); } while (0)
留意到“preempt_disable()”,这个调用的功能是“关抢占”(在spin_unlock中会重新开启抢占功能)。从中可以看出,使用自旋锁保护的区域是工作在非抢占的状态;即使获取不到锁,在“自旋”状态也是禁止抢占的。自旋锁保护的区域是工作在非抢占的状态;当拥有自旋锁的进程休眠了,会发生系统调度。试想一下,假如在自旋锁保护的代码中间睡眠,此时发生进程调度,则可能另外一个进程会再次调用spinlock保护的这段代码。而我们现在知道了即使在获取不到锁的“自旋”状态,也是禁止抢占的,而“自旋”又是动态的,不会再睡眠了,也就是说在这个处理器上不会再有进程调度发生了,那么死锁自然就发生了。
同理,在信号量中休眠,另一个试图获得该信号量的进程会休眠,而不会占有CPU,所以可被唤醒休眠的进程得到CPU后唤醒。
/*
* main.c -- the bare scull char module
*
* Copyright (C) 2001 Alessandro Rubini and Jonathan Corbet
* Copyright (C) 2001 O'Reilly & Associates
*
* The source code in this file can be freely used, adapted,
* and redistributed in source or binary form, so long as an
* acknowledgment appears in derived source files. The citation
* should list that the code comes from the book "Linux Device
* Drivers" by Alessandro Rubini and Jonathan Corbet, published
* by O'Reilly & Associates. No warranty is attached;
* we cannot take responsibility for errors or fitness for use.
*
*/
#include <linux/config.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/init.h>
#include <linux/kernel.h> /* printk() */
#include <linux/slab.h> /* kmalloc() */
#include <linux/fs.h> /* everything... */
#include <linux/errno.h> /* error codes */
#include <linux/types.h> /* size_t */
#include <linux/proc_fs.h>
#include <linux/fcntl.h> /* O_ACCMODE */
#include <linux/seq_file.h>
#include <linux/cdev.h>
#include <asm/system.h> /* cli(), *_flags */
#include <asm/uaccess.h> /* copy_*_user */
#include "scull.h" /* local definitions */
/*
* Our parameters which can be set at load time.
*/
int scull_major = SCULL_MAJOR;
int scull_minor = 0;
int scull_nr_devs = SCULL_NR_DEVS; /* number of bare scull devices */
int scull_quantum = SCULL_QUANTUM;
int scull_qset = SCULL_QSET;
module_param(scull_major, int, S_IRUGO);
module_param(scull_minor, int, S_IRUGO);
module_param(scull_nr_devs, int, S_IRUGO);
module_param(scull_quantum, int, S_IRUGO);
module_param(scull_qset, int, S_IRUGO);
MODULE_AUTHOR("Alessandro Rubini, Jonathan Corbet");
MODULE_LICENSE("Dual BSD/GPL");
struct scull_dev *scull_devices; /* allocated in scull_init_module */
/*
* Empty out the scull device; must be called with the device
* semaphore held.
*/
int scull_trim(struct scull_dev *dev)
{
struct scull_qset *next, *dptr;
int qset = dev->qset; /* "dev" is not-null */
int i;
for (dptr = dev->data; dptr; dptr = next) { /* all the list items */
if (dptr->data) {
for (i = 0; i < qset; i++)
kfree(dptr->data[i]);
kfree(dptr->data);
dptr->data = NULL;
}
next = dptr->next;
kfree(dptr);
}
dev->size = 0;
dev->quantum = scull_quantum;
dev->qset = scull_qset;
dev->data = NULL;
return 0;
}
#ifdef SCULL_DEBUG /* use proc only if debugging */
/*
* The proc filesystem: function to read and entry
*/
int scull_read_procmem(char *buf, char **start, off_t offset,
int count, int *eof, void *data)
{
int i, j, len = 0;
int limit = count - 80; /* Don't print more than this */
for (i = 0; i < scull_nr_devs && len <= limit; i++) {
struct scull_dev *d = &scull_devices[i];
struct scull_qset *qs = d->data;
if (down_interruptible(&d->sem))
return -ERESTARTSYS;
len += sprintf(buf+len,"\nDevice %i: qset %i, q %i, sz %li\n",
i, d->qset, d->quantum, d->size);
for (; qs && len <= limit; qs = qs->next) { /* scan the list */
len += sprintf(buf + len, " item at %p, qset at %p\n",
qs, qs->data);
if (qs->data && !qs->next) /* dump only the last item */
for (j = 0; j < d->qset; j++) {
if (qs->data[j])
len += sprintf(buf + len,
" % 4i: %8p\n",
j, qs->data[j]);
}
}
up(&scull_devices[i].sem);
}
*eof = 1;
return len;
}
/*
* For now, the seq_file implementation will exist in parallel. The
* older read_procmem function should maybe go away, though.
*/
/*
* Here are our sequence iteration methods. Our "position" is
* simply the device number.
*/
static void *scull_seq_start(struct seq_file *s, loff_t *pos)
{
if (*pos >= scull_nr_devs)
return NULL; /* No more to read */
return scull_devices + *pos;
}
static void *scull_seq_next(struct seq_file *s, void *v, loff_t *pos)
{
(*pos)++;
if (*pos >= scull_nr_devs)
return NULL;
return scull_devices + *pos;
}
static void scull_seq_stop(struct seq_file *s, void *v)
{
/* Actually, there's nothing to do here */
}
static int scull_seq_show(struct seq_file *s, void *v)
{
struct scull_dev *dev = (struct scull_dev *) v;
struct scull_qset *d;
int i;
if (down_interruptible(&dev->sem))
return -ERESTARTSYS;
seq_printf(s, "\nDevice %i: qset %i, q %i, sz %li\n",
(int) (dev - scull_devices), dev->qset,
dev->quantum, dev->size);
for (d = dev->data; d; d = d->next) { /* scan the list */
seq_printf(s, " item at %p, qset at %p\n", d, d->data);
if (d->data && !d->next) /* dump only the last item */
for (i = 0; i < dev->qset; i++) {
if (d->data[i])
seq_printf(s, " % 4i: %8p\n",
i, d->data[i]);
}
}
up(&dev->sem);
return 0;
}
/*
* Tie the sequence operators up.
*/
static struct seq_operations scull_seq_ops = {
.start = scull_seq_start,
.next = scull_seq_next,
.stop = scull_seq_stop,
.show = scull_seq_show
};
/*
* Now to implement the /proc file we need only make an open
* method which sets up the sequence operators.
*/
static int scull_proc_open(struct inode *inode, struct file *file)
{
return seq_open(file, &scull_seq_ops);
}
/*
* Create a set of file operations for our proc file.
*/
static struct file_operations scull_proc_ops = {
.owner = THIS_MODULE,
.open = scull_proc_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release
};
/*
* Actually create (and remove) the /proc file(s).
*/
static void scull_create_proc(void)
{
struct proc_dir_entry *entry;
create_proc_read_entry("scullmem", 0 /* default mode */,
NULL /* parent dir */, scull_read_procmem,
NULL /* client data */);
entry = create_proc_entry("scullseq", 0, NULL);
if (entry)
entry->proc_fops = &scull_proc_ops;
}
static void scull_remove_proc(void)
{
/* no problem if it was not registered */
remove_proc_entry("scullmem", NULL /* parent dir */);
remove_proc_entry("scullseq", NULL);
}
#endif /* SCULL_DEBUG */
/*
* Open and close
*/
int scull_open(struct inode *inode, struct file *filp)
{
struct scull_dev *dev; /* device information */
dev = container_of(inode->i_cdev, struct scull_dev, cdev);
filp->private_data = dev; /* for other methods */
/* now trim to 0 the length of the device if open was write-only */
if ( (filp->f_flags & O_ACCMODE) == O_WRONLY) {
if (down_interruptible(&dev->sem))
return -ERESTARTSYS;
scull_trim(dev); /* ignore errors */
up(&dev->sem);
}
return 0; /* success */
}
int scull_release(struct inode *inode, struct file *filp)
{
return 0;
}
/*
* Follow the list
*/
struct scull_qset *scull_follow(struct scull_dev *dev, int n)
{
struct scull_qset *qs = dev->data;
/* Allocate first qset explicitly if need be */
if (! qs) {
qs = dev->data = kmalloc(sizeof(struct scull_qset), GFP_KERNEL);
if (qs == NULL)
return NULL; /* Never mind */
memset(qs, 0, sizeof(struct scull_qset));
}
/* Then follow the list */
while (n--) {
if (!qs->next) {
qs->next = kmalloc(sizeof(struct scull_qset), GFP_KERNEL);
if (qs->next == NULL)
return NULL; /* Never mind */
memset(qs->next, 0, sizeof(struct scull_qset));
}
qs = qs->next;
continue;
}
return qs;
}
/*
* Data management: read and write
*/
ssize_t scull_read(struct file *filp, char __user *buf, size_t count,
loff_t *f_pos)
{
struct scull_dev *dev = filp->private_data;
struct scull_qset *dptr; /* the first listitem */
int quantum = dev->quantum, qset = dev->qset;
int itemsize = quantum * qset; /* how many bytes in the listitem */
int item, s_pos, q_pos, rest;
ssize_t retval = 0;
if (down_interruptible(&dev->sem))
return -ERESTARTSYS;
if (*f_pos >= dev->size)
goto out;
if (*f_pos + count > dev->size)
count = dev->size - *f_pos;
/* find listitem, qset index, and offset in the quantum */
item = (long)*f_pos / itemsize;
rest = (long)*f_pos % itemsize;
s_pos = rest / quantum; q_pos = rest % quantum;
/* follow the list up to the right position (defined elsewhere) */
dptr = scull_follow(dev, item);
if (dptr == NULL || !dptr->data || ! dptr->data[s_pos])
goto out; /* don't fill holes */
/* read only up to the end of this quantum */
if (count > quantum - q_pos)
count = quantum - q_pos;
if (copy_to_user(buf, dptr->data[s_pos] + q_pos, count)) {
retval = -EFAULT;
goto out;
}
*f_pos += count;
retval = count;
out:
up(&dev->sem);
return retval;
}
ssize_t scull_write(struct file *filp, const char __user *buf, size_t count,
loff_t *f_pos)
{
struct scull_dev *dev = filp->private_data;
struct scull_qset *dptr;
int quantum = dev->quantum, qset = dev->qset;
int itemsize = quantum * qset;
int item, s_pos, q_pos, rest;
ssize_t retval = -ENOMEM; /* value used in "goto out" statements */
if (down_interruptible(&dev->sem))
return -ERESTARTSYS;
/* find listitem, qset index and offset in the quantum */
item = (long)*f_pos / itemsize;
rest = (long)*f_pos % itemsize;
s_pos = rest / quantum; q_pos = rest % quantum;
/* follow the list up to the right position */
dptr = scull_follow(dev, item);
if (dptr == NULL)
goto out;
if (!dptr->data) {
dptr->data = kmalloc(qset * sizeof(char *), GFP_KERNEL);
if (!dptr->data)
goto out;
memset(dptr->data, 0, qset * sizeof(char *));
}
if (!dptr->data[s_pos]) {
dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
if (!dptr->data[s_pos])
goto out;
}
/* write only up to the end of this quantum */
if (count > quantum - q_pos)
count = quantum - q_pos;
if (copy_from_user(dptr->data[s_pos]+q_pos, buf, count)) {
retval = -EFAULT;
goto out;
}
*f_pos += count;
retval = count;
/* update the size */
if (dev->size < *f_pos)
dev->size = *f_pos;
out:
up(&dev->sem);
return retval;
}
/*
* The ioctl() implementation
*/
int scull_ioctl(struct inode *inode, struct file *filp,
unsigned int cmd, unsigned long arg)
{
int err = 0, tmp;
int retval = 0;
/*
* extract the type and number bitfields, and don't decode
* wrong cmds: return ENOTTY (inappropriate ioctl) before access_ok()
*/
if (_IOC_TYPE(cmd) != SCULL_IOC_MAGIC) return -ENOTTY;
if (_IOC_NR(cmd) > SCULL_IOC_MAXNR) return -ENOTTY;
/*
* the direction is a bitmask, and VERIFY_WRITE catches R/W
* transfers. `Type' is user-oriented, while
* access_ok is kernel-oriented, so the concept of "read" and
* "write" is reversed
*/
if (_IOC_DIR(cmd) & _IOC_READ)
err = !access_ok(VERIFY_WRITE, (void __user *)arg, _IOC_SIZE(cmd));
else if (_IOC_DIR(cmd) & _IOC_WRITE)
err = !access_ok(VERIFY_READ, (void __user *)arg, _IOC_SIZE(cmd));
if (err) return -EFAULT;
switch(cmd) {
case SCULL_IOCRESET:
scull_quantum = SCULL_QUANTUM;
scull_qset = SCULL_QSET;
break;
case SCULL_IOCSQUANTUM: /* Set: arg points to the value */
if (! capable (CAP_SYS_ADMIN))
return -EPERM;
retval = __get_user(scull_quantum, (int __user *)arg);
break;
case SCULL_IOCTQUANTUM: /* Tell: arg is the value */
if (! capable (CAP_SYS_ADMIN))
return -EPERM;
scull_quantum = arg;
break;
case SCULL_IOCGQUANTUM: /* Get: arg is pointer to result */
retval = __put_user(scull_quantum, (int __user *)arg);
break;
case SCULL_IOCQQUANTUM: /* Query: return it (it's positive) */
return scull_quantum;
case SCULL_IOCXQUANTUM: /* eXchange: use arg as pointer */
if (! capable (CAP_SYS_ADMIN))
return -EPERM;
tmp = scull_quantum;
retval = __get_user(scull_quantum, (int __user *)arg);
if (retval == 0)
retval = __put_user(tmp, (int __user *)arg);
break;
case SCULL_IOCHQUANTUM: /* sHift: like Tell + Query */
if (! capable (CAP_SYS_ADMIN))
return -EPERM;
tmp = scull_quantum;
scull_quantum = arg;
return tmp;
case SCULL_IOCSQSET:
if (! capable (CAP_SYS_ADMIN))
return -EPERM;
retval = __get_user(scull_qset, (int __user *)arg);
break;
case SCULL_IOCTQSET:
if (! capable (CAP_SYS_ADMIN))
return -EPERM;
scull_qset = arg;
break;
case SCULL_IOCGQSET:
retval = __put_user(scull_qset, (int __user *)arg);
break;
case SCULL_IOCQQSET:
return scull_qset;
case SCULL_IOCXQSET:
if (! capable (CAP_SYS_ADMIN))
return -EPERM;
tmp = scull_qset;
retval = __get_user(scull_qset, (int __user *)arg);
if (retval == 0)
retval = put_user(tmp, (int __user *)arg);
break;
case SCULL_IOCHQSET:
if (! capable (CAP_SYS_ADMIN))
return -EPERM;
tmp = scull_qset;
scull_qset = arg;
return tmp;
/*
* The following two change the buffer size for scullpipe.
* The scullpipe device uses this same ioctl method, just to
* write less code. Actually, it's the same driver, isn't it?
*/
case SCULL_P_IOCTSIZE:
scull_p_buffer = arg;
break;
case SCULL_P_IOCQSIZE:
return scull_p_buffer;
default: /* redundant, as cmd was checked against MAXNR */
return -ENOTTY;
}
return retval;
}
/*
* The "extended" operations -- only seek
*/
loff_t scull_llseek(struct file *filp, loff_t off, int whence)
{
struct scull_dev *dev = filp->private_data;
loff_t newpos;
switch(whence) {
case 0: /* SEEK_SET */
newpos = off;
break;
case 1: /* SEEK_CUR */
newpos = filp->f_pos + off;
break;
case 2: /* SEEK_END */
newpos = dev->size + off;
break;
default: /* can't happen */
return -EINVAL;
}
if (newpos < 0) return -EINVAL;
filp->f_pos = newpos;
return newpos;
}
struct file_operations scull_fops = {
.owner = THIS_MODULE,
.llseek = scull_llseek,
.read = scull_read,
.write = scull_write,
.ioctl = scull_ioctl,
.open = scull_open,
.release = scull_release,
};
/*
* Finally, the module stuff
*/
/*
* The cleanup function is used to handle initialization failures as well.
* Thefore, it must be careful to work correctly even if some of the items
* have not been initialized
*/
void scull_cleanup_module(void)
{
int i;
dev_t devno = MKDEV(scull_major, scull_minor);
/* Get rid of our char dev entries */
if (scull_devices) {
for (i = 0; i < scull_nr_devs; i++) {
scull_trim(scull_devices + i);
cdev_del(&scull_devices[i].cdev);
}
kfree(scull_devices);
}
#ifdef SCULL_DEBUG /* use proc only if debugging */
scull_remove_proc();
#endif
/* cleanup_module is never called if registering failed */
unregister_chrdev_region(devno, scull_nr_devs);
/* and call the cleanup functions for friend devices */
scull_p_cleanup();
scull_access_cleanup();
}
/*
* Set up the char_dev structure for this device.
*/
static void scull_setup_cdev(struct scull_dev *dev, int index)
{
int err, devno = MKDEV(scull_major, scull_minor + index);
cdev_init(&dev->cdev, &scull_fops);
dev->cdev.owner = THIS_MODULE;
dev->cdev.ops = &scull_fops;
err = cdev_add (&dev->cdev, devno, 1);
/* Fail gracefully if need be */
if (err)
printk(KERN_NOTICE "Error %d adding scull%d", err, index);
}
int scull_init_module(void)
{
int result, i;
dev_t dev = 0;
/*
* Get a range of minor numbers to work with, asking for a dynamic
* major unless directed otherwise at load time.
*/
if (scull_major) {
dev = MKDEV(scull_major, scull_minor);
result = register_chrdev_region(dev, scull_nr_devs, "scull");
} else {
result = alloc_chrdev_region(&dev, scull_minor, scull_nr_devs,
"scull");
scull_major = MAJOR(dev);
}
if (result < 0) {
printk(KERN_WARNING "scull: can't get major %d\n", scull_major);
return result;
}
/*
* allocate the devices -- we can't have them static, as the number
* can be specified at load time
*/
scull_devices = kmalloc(scull_nr_devs * sizeof(struct scull_dev), GFP_KERNEL);
if (!scull_devices) {
result = -ENOMEM;
goto fail; /* Make this more graceful */
}
memset(scull_devices, 0, scull_nr_devs * sizeof(struct scull_dev));
/* Initialize each device. */
for (i = 0; i < scull_nr_devs; i++) {
scull_devices[i].quantum = scull_quantum;
scull_devices[i].qset = scull_qset;
init_MUTEX(&scull_devices[i].sem);
scull_setup_cdev(&scull_devices[i], i);
}
/* At this point call the init function for any friend device */
dev = MKDEV(scull_major, scull_minor + scull_nr_devs);
dev += scull_p_init(dev);
dev += scull_access_init(dev);
#ifdef SCULL_DEBUG /* only when debugging */
scull_create_proc();
#endif
return 0; /* succeed */
fail:
scull_cleanup_module();
return result;
}
module_init(scull_init_module);
module_exit(scull_cleanup_module);