How Redisson Implements Reentrancy, Lock Retry, and the Watchdog: A Source Code Walkthrough

How Redisson Makes Locks Reentrant

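In the previous article we saw that, besides the thread identifier, a reentrant lock also has to store a reentry count. Let's now look at the flowchart of how Redisson acquires and releases a lock.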

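To acquire the lock, Redisson first checks whether the lock exists. If it does not, the thread takes the lock, stores it in Redis, and sets an expiration time. If it does exist, Redisson checks whether the lock's identifier belongs to the current thread: if so, the value is incremented by 1 (one more level of reentry) and the expiration time is reset; if not, the acquisition fails. To release the lock, Redisson again checks whether the lock's identifier matches the current thread. If it does not, the lock has already expired and been released. If it does, the value is decremented by 1 and compared with 0: a non-zero value means the lock is still held through reentry, so only the expiration time is reset; a value of 0 means the lock can be deleted.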
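The flow above can be sketched without Redis at all. The class below is only an in-memory model written for this article, not Redisson code: `ReentrantLockModel` and its method names are made up, a `HashMap` stands in for the Redis hash, and the current time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for one Redis hash key: field = owner id, value = hold count. */
class ReentrantLockModel {
    private final Map<String, Integer> hash = new HashMap<>();
    private long expiresAtMs = 0;

    /** Drops the key once its lease has run out, as Redis would on expiry. */
    private void expireIfDue(long nowMs) {
        if (!hash.isEmpty() && nowMs >= expiresAtMs) {
            hash.clear();
        }
    }

    /** Returns true if owner now holds the lock (first acquisition or re-entry). */
    boolean tryLock(String owner, long nowMs, long leaseMs) {
        expireIfDue(nowMs);
        if (hash.isEmpty() || hash.containsKey(owner)) {
            hash.merge(owner, 1, Integer::sum); // hold count + 1
            expiresAtMs = nowMs + leaseMs;      // reset the expiration
            return true;
        }
        return false;                           // held by another owner
    }

    /** Returns false if owner does not hold the lock (for example, it already expired). */
    boolean unlock(String owner, long nowMs, long leaseMs) {
        expireIfDue(nowMs);
        Integer count = hash.get(owner);
        if (count == null) {
            return false;
        }
        if (count > 1) {
            hash.put(owner, count - 1);         // still re-entered: keep the lock
            expiresAtMs = nowMs + leaseMs;      // and reset the expiration
        } else {
            hash.remove(owner);                 // count reached 0: release the lock
        }
        return true;
    }

    int holdCount(String owner) {
        return hash.getOrDefault(owner, 0);
    }
}
```

Calling `tryLock("a", ...)` twice leaves a hold count of 2, and a second owner can only get the lock after two matching `unlock("a", ...)` calls or after the lease has expired.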

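Each of these flows is a check-then-act sequence that must execute atomically, which separate Redis commands issued from Java cannot guarantee. Redisson therefore implements this logic in Lua scripts.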

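Lua script for acquiring the lock

Lua script for releasing the lock

Next, let's trace the source code to see how the reentrant lock is implemented.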

@Test
public void test01() throws Exception {
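    // Get a reentrant lock by name
    RLock lock = redissonClient.getLock("lock");
    // Arguments: maximum time to wait for the lock (retrying meanwhile), lease time after which the lock is released automatically, time unit
    boolean flag = lock.tryLock(1, 10, TimeUnit.SECONDS);
    if (flag) {
        try {
            System.out.println("Lock acquired");
        } finally {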
            lock.unlock();
        }
    }
}

We start with the basic tryLock() method.

I'll present the source of tryLock() in pieces.

public boolean tryLock(long waitTime, long leaseTime, TimeUnit unit) throws InterruptedException {
    // Convert the maximum wait time to milliseconds
    long time = unit.toMillis(waitTime);
    // Current time in milliseconds
    long current = System.currentTimeMillis();
    // ID of the current thread
    long threadId = Thread.currentThread().getId();
    // Try to acquire the lock
    Long ttl = tryAcquire(waitTime, leaseTime, unit, threadId);
    // lock acquired
    // ...
}

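Next, look at tryAcquire() to see how the lock is actually obtained.

The check on leaseTime determines whether the caller supplied a lease time; if none was supplied, it defaults to -1. Both branches call tryLockInnerAsync(), so we can follow either one.

As we can see, tryLockInnerAsync() uses a Lua script to acquire the lock and handle reentry. One thing differs from the flowchart above: when acquisition fails, the script returns the lock's remaining time to live.

Next, look at the source of unlock().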

public void unlock() {
    try {
        get(unlockAsync(Thread.currentThread().getId()));
    } catch (RedisException e) {
        if (e.getCause() instanceof IllegalMonitorStateException) {
            throw (IllegalMonitorStateException) e.getCause();
        } else {
            throw e;
        }
    }
}

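Following unlockAsync() down to the lowest level, we reach the following code.

From this we can conclude that the core of Redisson's reentrancy is a hash structure that records which thread holds the lock and how many times it has re-entered.

Redisson's Lock Retry Mechanism

Before turning to the retry mechanism, note one line in the Lua script that releases the lock: when the lock is released, the script also publishes a release message so that other threads can start trying to acquire it. We will need this when tracing the source.

Let's continue tracing the source. The code that runs when lock acquisition fails, seen earlier, is as follows.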

<T> RFuture<T> tryLockInnerAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
    return evalWriteAsync(getRawName(), LongCodec.INSTANCE, command,
            "if ((redis.call('exists', KEYS[1]) == 0) " +
                        "or (redis.call('hexists', KEYS[1], ARGV[2]) == 1)) then " +
                    "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                    "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                    "return nil; " +
                "end; " +
                "return redis.call('pttl', KEYS[1]);",
            Collections.singletonList(getRawName()), unit.toMillis(leaseTime), getLockName(threadId));
}

This returns the lock's remaining time to live. Stepping back up one level, let's see what happens once we have that value.

private <T> RFuture<Long> tryAcquireAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
    // Holds the script's return value. It is a future because the Lua script runs asynchronously, so the result is not available immediately
    RFuture<Long> ttlRemainingFuture;
    if (leaseTime > 0) {
        ttlRemainingFuture = tryLockInnerAsync(waitTime, leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
    } else {
        ttlRemainingFuture = tryLockInnerAsync(waitTime, internalLockLeaseTime,
                TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);
    }
    // The code below can be skipped for now; it is enough to know that a remaining time to live is returned.
    CompletionStage<Long> f = ttlRemainingFuture.thenApply(ttlRemaining -> {
        // lock acquired
        if (ttlRemaining == null) {
            if (leaseTime > 0) {
                internalLockLeaseTime = unit.toMillis(leaseTime);
            } else {
                scheduleExpirationRenewal(threadId);
            }
        }
        return ttlRemaining;
    });
    
    return new CompletableFutureWrapper<>(f);
}

Step back up one more level to see what happens next.

private Long tryAcquire(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
    return get(tryAcquireAsync(waitTime, leaseTime, unit, threadId));
}

Here get() obtains the returned time-to-live value. One more level up brings us back to tryLock(), annotated below.

public boolean tryLock(long waitTime, long leaseTime, TimeUnit unit) throws InterruptedException {
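    // Convert the maximum wait time to milliseconds
    long time = unit.toMillis(waitTime);
    // Current time in milliseconds
    long current = System.currentTimeMillis();
    // ID of the current thread
    long threadId = Thread.currentThread().getId();
    // Try to acquire the lock; on failure ttl is a concrete value rather than null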
    Long ttl = tryAcquire(waitTime, leaseTime, unit, threadId);
    // lock acquired
    if (ttl == null) {
        // Lock acquired, return true immediately
        return true;
    }

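    // Subtract the time spent on the first attempt from the maximum wait time to get the remaining wait time
    time -= System.currentTimeMillis() - current;
    if (time <= 0) {
        // No wait time left
        acquireFailed(waitTime, unit, threadId);
        return false;
    }
    
    // There is still time left, so take the current time again
    current = System.currentTimeMillis();
    // Subscribe to the lock-release message (the one published on release, mentioned at the start of this section); this is also asynchronous, hence the Future return type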
    CompletableFuture<RedissonLockEntry> subscribeFuture = subscribe(threadId);
    try {
        // Wait for the subscription to complete; if that takes longer than the remaining wait time, a TimeoutException is thrown and the catch block below runs
        subscribeFuture.get(time, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
        if (!subscribeFuture.completeExceptionally(new RedisTimeoutException(
                "Unable to acquire subscription lock after " + time + "ms. " +
                        "Try to increase 'subscriptionsPerConnection' and/or 'subscriptionConnectionPoolSize' parameters."))) {
            subscribeFuture.whenComplete((res, ex) -> {
                if (ex == null) {
                    unsubscribe(res, threadId);
                }
            });
        }
        acquireFailed(waitTime, unit, threadId);
        return false;
    } catch (ExecutionException e) {
        acquireFailed(waitTime, unit, threadId);
        return false;
    }
    // Reaching this point means the subscription was established within the wait time
    try {
        // Update the remaining time
        time -= System.currentTimeMillis() - current;
        if (time <= 0) {
            acquireFailed(waitTime, unit, threadId);
            return false;
        }

        while (true) {
            long currentTime = System.currentTimeMillis();
            // Try to acquire the lock again; on failure ttl holds the lock's remaining time to live
            ttl = tryAcquire(waitTime, leaseTime, unit, threadId);
            // lock acquired
            if (ttl == null) {
                return true;
            }
            // Still not acquired: update the remaining time
            time -= System.currentTimeMillis() - currentTime;
            if (time <= 0) {
                acquireFailed(waitTime, unit, threadId);
                return false;
            }

            // waiting for message
            currentTime = System.currentTimeMillis();
            if (ttl >= 0 && ttl < time) {
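                // The lock expires before our wait time runs out: wait for a release message for at most ttl
                commandExecutor.getNow(subscribeFuture).getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
            } else {
                // The lock outlives our remaining wait time: wait for a release message for at most the remaining time
                commandExecutor.getNow(subscribeFuture).getLatch().tryAcquire(time, TimeUnit.MILLISECONDS);
            }
            // Update the remaining time; if any is left, the while loop tries again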
            time -= System.currentTimeMillis() - currentTime;
            if (time <= 0) {
                acquireFailed(waitTime, unit, threadId);
                return false;
            }
        }
    } finally {
        unsubscribe(commandExecutor.getNow(subscribeFuture), threadId);
    }
}

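Waiting threads block on the release notification (with a timeout) instead of repeatedly polling for the lock, which is friendly to the CPU and avoids wasting resources.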
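The shape of that loop can be isolated in a small standalone sketch. Everything here is an assumption made for illustration: `RetryLoopSketch`, the `attempt` supplier (returns null on success, otherwise the lock's remaining TTL in milliseconds), and the `released` semaphore standing in for the latch that a release message would signal. Unlike tryLock(), the sketch swallows an interrupt and returns false rather than propagating InterruptedException.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative retry loop: try, then wait for a release signal, within a time budget. */
class RetryLoopSketch {
    static boolean acquireWithin(long waitMs, Supplier<Long> attempt, Semaphore released) {
        long remaining = waitMs;
        while (true) {
            long start = System.currentTimeMillis();
            Long ttl = attempt.get();
            if (ttl == null) {
                return true;                    // lock acquired
            }
            remaining -= System.currentTimeMillis() - start;
            if (remaining <= 0) {
                return false;                   // wait budget used up
            }

            start = System.currentTimeMillis();
            // Wait for a release signal, but never longer than the lock's TTL
            // or our own remaining budget, whichever is shorter.
            long sleepMs = (ttl >= 0 && ttl < remaining) ? ttl : remaining;
            try {
                released.tryAcquire(sleepMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            remaining -= System.currentTimeMillis() - start;
            if (remaining <= 0) {
                return false;
            }
        }
    }
}
```

A thread that calls `released.release()` wakes the waiter before the timeout, which corresponds to receiving the published release message; without it, the waiter simply retries once the lock's TTL should have run out.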

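Redisson's Watchdog Mechanism

The retry logic in tryLock() relies on an important variable, ttl, which records the lock's remaining time to live. Other threads start trying to acquire the lock once ttl runs out. This creates a problem: if the thread holding the lock is blocked and the lock expires and is deleted, a second thread can acquire it while the first still believes it holds it, so two threads hold the lock at the same time. Redisson addresses this with a watchdog mechanism. When describing the retry mechanism we left one piece of code unexplained; here it is again.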

private <T> RFuture<Long> tryAcquireAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
    // Holds the script's return value. It is a future because the Lua script runs asynchronously, so the result is not available immediately
    RFuture<Long> ttlRemainingFuture;
    if (leaseTime > 0) {
        ttlRemainingFuture = tryLockInnerAsync(waitTime, leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
    } else {
        ttlRemainingFuture = tryLockInnerAsync(waitTime, internalLockLeaseTime,
                TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);
    }
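    // Callback that runs once the return value ttlRemaining is available
    CompletionStage<Long> f = ttlRemainingFuture.thenApply(ttlRemaining -> {
        // lock acquired
        // null means the lock was acquired
        if (ttlRemaining == null) {
            // A lease time was supplied: convert it to milliseconds and overwrite the default lease time (the watchdog is not started in this case)
            if (leaseTime > 0) {
                internalLockLeaseTime = unit.toMillis(leaseTime);
            } else {
                // No lease time was supplied: schedule periodic renewal of the lock's expiration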
                scheduleExpirationRenewal(threadId);
            }
        }
        return ttlRemaining;
    });
    
    return new CompletableFutureWrapper<>(f);
}

internalLockLeaseTime has a default value: if we do not specify a lease time, Redisson's default is used. The relevant source is as follows.

public class RedissonLock extends RedissonBaseLock {

    protected long internalLockLeaseTime;

    protected final LockPubSub pubSub;

    final CommandAsyncExecutor commandExecutor;

    public RedissonLock(CommandAsyncExecutor commandExecutor, String name) {
        super(commandExecutor, name);
        this.commandExecutor = commandExecutor;
        // Default lease time, that is, the watchdog timeout (lockWatchdogTimeout); 30 s by default
        this.internalLockLeaseTime = commandExecutor.getConnectionManager().getCfg().getLockWatchdogTimeout();
        this.pubSub = commandExecutor.getConnectionManager().getSubscribeService().getLockPubSub();
    }
}
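The effect of the leaseTime check can be condensed into a small pure-Java sketch. It is not Redisson code: `LeasePolicySketch`, its method names, and the hard-coded 30 000 ms constant (standing in for the default watchdog timeout) are assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative only: mirrors the leaseTime > 0 check, not Redisson's actual classes. */
class LeasePolicySketch {
    // Assumed stand-in for the default watchdog timeout (30 s).
    static final long WATCHDOG_TIMEOUT_MS = 30_000L;

    /** Expiration written to Redis when the lock is first acquired. */
    static long initialExpirationMillis(long leaseTime, TimeUnit unit) {
        return leaseTime > 0 ? unit.toMillis(leaseTime) : WATCHDOG_TIMEOUT_MS;
    }

    /** Renewal is scheduled only when no explicit lease time was given. */
    static boolean watchdogActive(long leaseTime) {
        return leaseTime <= 0;
    }
}
```

With the call from the test at the top of this article, tryLock(1, 10, TimeUnit.SECONDS), the lock would expire after 10 seconds and never be renewed; with a leaseTime of -1 it would start at the watchdog timeout and be renewed for as long as it is held.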

Next, look at scheduleExpirationRenewal(), the method that periodically renews the lock's expiration.

protected void scheduleExpirationRenewal(long threadId) {
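    // An ExpirationEntry mainly holds two things: the thread IDs and a Timeout (a timer task); think of it as the renewal record for one lock
    ExpirationEntry entry = new ExpirationEntry();
    // The map holds one entry per lock. getEntryName() is derived from the name passed to getLock(name)
    // putIfAbsent returns null the first time; if the map already has an entry for this lock, that existing entry is returned
    ExpirationEntry oldEntry = EXPIRATION_RENEWAL_MAP.putIfAbsent(getEntryName(), entry);
    if (oldEntry != null) {
        // The map already has an entry for this lock: record this thread ID on it
        oldEntry.addThreadId(threadId);
    } else {
        // First entry for this lock: record the thread ID on the new entry
        entry.addThreadId(threadId);
        try {
            // Start renewal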
            renewExpiration();
        } finally {
            if (Thread.currentThread().isInterrupted()) {
                cancelExpirationRenewal(threadId);
            }
        }
    }
}

Next, look at the renewal method renewExpiration().

private void renewExpiration() {
    // Look up the entry for this lock
    ExpirationEntry ee = EXPIRATION_RENEWAL_MAP.get(getEntryName());
    if (ee == null) {
        return;
    }
    // newTimeout takes three arguments: the task to run, the delay before it runs, and the unit of that delay
    // The delay is one third of internalLockLeaseTime
    Timeout task = commandExecutor.getConnectionManager().newTimeout(new TimerTask() {
        @Override
        public void run(Timeout timeout) throws Exception {
            // Look up the entry for this lock again
            ExpirationEntry ent = EXPIRATION_RENEWAL_MAP.get(getEntryName());
            if (ent == null) {
                return;
            }
            // Get a thread ID recorded on the entry
            Long threadId = ent.getFirstThreadId();
            if (threadId == null) {
                return;
            }
            // Renew the lock's expiration
            CompletionStage<Boolean> future = renewExpirationAsync(threadId);
            // Runs once the renewal has completed
            future.whenComplete((res, e) -> {
                // If the renewal threw, log the error and remove the entry from the map
                if (e != null) {
                    log.error("Can't update lock {} expiration", getRawName(), e);
                    EXPIRATION_RENEWAL_MAP.remove(getEntryName());
                    return;
                }
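                // Renewal succeeded
                if (res) {
                    // Call this method again to schedule the next renewal
                    renewExpiration();
                } else {
                    // Renewal returned false (the thread no longer holds the lock): cancel the timer task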
                    cancelExpirationRenewal(null);
                }
            });
        }
    }, internalLockLeaseTime / 3, TimeUnit.MILLISECONDS);
    
    ee.setTimeout(task);
}

renewExpirationAsync() resets the lock's expiration. Its source is as follows.

protected CompletionStage<Boolean> renewExpirationAsync(long threadId) {
    return evalWriteAsync(getRawName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
            "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                    "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                    "return 1; " +
                    "end; " +
                    "return 0;",
            Collections.singletonList(getRawName()),
            internalLockLeaseTime, getLockName(threadId));
}

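To summarize: when a lock is acquired successfully without an explicit lease time, Redisson keeps an internal map of the locks currently held and attaches to each entry a timer task that renews the lock's expiration. When the lock is released, that timer task has to be cancelled. Let's look at the code in unlock() that does this.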

public RFuture<Void> unlockAsync(long threadId) {
    RFuture<Boolean> future = unlockInnerAsync(threadId);
    // Runs once the unlock script has completed, whether it succeeded or failed
    CompletionStage<Void> f = future.handle((opStatus, e) -> {
        // Cancel the renewal timer task
        cancelExpirationRenewal(threadId);

        if (e != null) {
            throw new CompletionException(e);
        }
        if (opStatus == null) {
            IllegalMonitorStateException cause = new IllegalMonitorStateException("attempt to unlock lock, not locked by current thread by node id: "
                    + id + " thread-id: " + threadId);
            throw new CompletionException(cause);
        }

        return null;
    });

    return new CompletableFutureWrapper<>(f);
}

The code that cancels the renewal timer task is as follows.

protected void cancelExpirationRenewal(Long threadId) {
    // Look up the entry for this lock by key
    ExpirationEntry task = EXPIRATION_RENEWAL_MAP.get(getEntryName());
    // No entry, nothing to cancel
    if (task == null) {
        return;
    }
    
    if (threadId != null) {
        // Remove this thread ID from the entry
        task.removeThreadId(threadId);
    }

    if (threadId == null || task.hasNoThreads()) {
        Timeout timeout = task.getTimeout();
        if (timeout != null) {
            // Cancel the timer task if one is set
            timeout.cancel();
        }
        // Remove the entry from the map
        EXPIRATION_RENEWAL_MAP.remove(getEntryName());
    }
}
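The schedule and cancel bookkeeping can be tried out in isolation with the sketch below. It is a simplified model, not Redisson's implementation: `WatchdogSketch` and its methods are invented for illustration, the `renew` callback stands in for the renewal script, and a `ScheduledExecutorService` with a fixed delay replaces Redisson's approach of re-arming a one-shot timeout after each successful renewal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Illustrative watchdog: one renewal task per lock, cancelled when the last hold is gone. */
class WatchdogSketch {
    private static final class Entry {
        final Map<Long, Integer> holds = new HashMap<>(); // thread id -> hold count
        ScheduledFuture<?> task;
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "watchdog-sketch");
                t.setDaemon(true); // do not keep the JVM alive
                return t;
            });
    private final long leaseMs;

    WatchdogSketch(long leaseMs) {
        this.leaseMs = leaseMs;
    }

    /** Records a hold; the first hold on a lock starts renewal every leaseMs / 3. */
    synchronized void schedule(String lock, long threadId, BooleanSupplier renew) {
        Entry e = entries.computeIfAbsent(lock, k -> new Entry());
        e.holds.merge(threadId, 1, Integer::sum);
        if (e.task == null) {
            e.task = timer.scheduleWithFixedDelay(() -> {
                if (!renew.getAsBoolean()) {
                    cancel(lock, null); // lock no longer held: stop renewing
                }
            }, leaseMs / 3, leaseMs / 3, TimeUnit.MILLISECONDS);
        }
    }

    /** Drops one hold; a null threadId or an empty entry stops the renewal task. */
    synchronized void cancel(String lock, Long threadId) {
        Entry e = entries.get(lock);
        if (e == null) {
            return;
        }
        if (threadId != null) {
            e.holds.computeIfPresent(threadId, (k, n) -> n > 1 ? n - 1 : null);
        }
        if (threadId == null || e.holds.isEmpty()) {
            e.task.cancel(false);
            entries.remove(lock);
        }
    }

    synchronized boolean isRenewing(String lock) {
        return entries.containsKey(lock);
    }
}
```

Scheduling the same thread twice and cancelling once leaves renewal running, which matches the reentrant case: the timer only stops when the last hold is removed or a renewal reports that the lock is gone.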

Finally, a flow summary of the lock retry and watchdog mechanisms.