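Background
WeChat Pay on our platform suddenly became unavailable: whenever a user tapped Pay, the error "System busy" was shown.
Investigation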
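The logs showed that HTTP requests from the "payment aggregation service" to the "WeChat Pay service" were returning read timeout, so the problem was clearly in the "WeChat Pay service". A read timeout on an HTTP request means the connection could still be established: the application had not died, it was only slow to respond.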
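To make that distinction concrete, here is a minimal sketch that uses only the JDK's java.net sockets, not the services from this incident. The class name ReadTimeoutDemo and the method probe are made up for illustration. A local listener that never sends a byte stands in for an application that is alive but not answering: the connect succeeds, and the read then times out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /**
     * Connects to a local listener that never sends a byte and reports what happened.
     * The listener is a stand-in for an application that is alive but not responding.
     */
    public static String probe(int readTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // The OS completes the TCP handshake for a listening socket, so connecting succeeds.
            client.connect(new InetSocketAddress(server.getInetAddress(), server.getLocalPort()), 1000);
            client.setSoTimeout(readTimeoutMillis); // bound the wait for response bytes
            try {
                client.getInputStream().read();
                return "got data";
            } catch (SocketTimeoutException e) {
                return "connected, read timed out";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(probe(200));
    }
}
```

A connect failure would have pointed to a dead process or a network problem instead; here the connection exists and only the response is missing.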
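An application responds slowly for one of two reasons: either request traffic is heavy enough to "crush" it, or a component it depends on is slow and "drags it down".
Log volume showed no traffic spike, which left being "dragged down" as the explanation.
When an application is dragged down, its web container threads all end up blocked on the same operation.
We immediately dumped the application's thread stacks with jstack and found that every web container thread was blocked in com.wechat.pay.contrib.apache.httpclient.cert.CertificatesManager.putMerchant: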
"XNIO-1 task-8" #298 prio=5 os_prio=0 tid=0x00007f33b0072800 nid=0x12b waiting for monitor entry [0x00007f342051a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.wechat.pay.contrib.apache.httpclient.cert.CertificatesManager.putMerchant(CertificatesManager.java:142)
        - waiting to lock <0x00000000da83be48> (a com.wechat.pay.contrib.apache.httpclient.cert.CertificatesManager)
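When a dump contains hundreds of threads, counting BLOCKED threads by their top frame makes this kind of pile-up obvious. The following is a minimal sketch, not a tool we used during the incident: the class BlockedThreadSummary and its method are hypothetical, and it assumes the usual jstack text layout (a quoted thread name, a Thread.State line, then "at ..." frames).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BlockedThreadSummary {

    /**
     * Counts BLOCKED threads per top stack frame in jstack-style text.
     * Returns a map from the top frame (the first "at ..." line) to the number of threads.
     */
    public static Map<String, Integer> countBlockedByTopFrame(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        boolean blocked = false; // true while inside a BLOCKED thread whose top frame is not yet seen
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                blocked = false; // header line of a new thread
            } else if (line.startsWith("java.lang.Thread.State: BLOCKED")) {
                blocked = true;
            } else if (blocked && line.startsWith("at ")) {
                counts.merge(line.substring(3), 1, Integer::sum);
                blocked = false; // only the top frame is counted
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
            "\"XNIO-1 task-8\" #298 prio=5 os_prio=0 tid=0x00007f33b0072800 nid=0x12b waiting for monitor entry [0x00007f342051a000]",
            "   java.lang.Thread.State: BLOCKED (on object monitor)",
            "\tat com.wechat.pay.contrib.apache.httpclient.cert.CertificatesManager.putMerchant(CertificatesManager.java:142)",
            "\t- waiting to lock <0x00000000da83be48> (a com.wechat.pay.contrib.apache.httpclient.cert.CertificatesManager)");
        countBlockedByTopFrame(dump).forEach((frame, n) -> System.out.println(n + "  " + frame));
    }
}
```

A single frame that accounts for nearly all BLOCKED threads, as putMerchant did here, is the signature of threads queuing on one lock.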
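Root cause analysis
What caused the blocking?
com.wechat.pay.contrib.apache.httpclient.cert.CertificatesManager belongs to the wechatpay-apache-httpclient package, the official open-source dependency published by WeChat Pay. Project: https://github.com/wechatpay-apiv3/wechatpay-apache-httpclient
CertificatesManager.putMerchant requests certificates from WeChat. Reading the source shows that CertificatesManager is a singleton and putMerchant is a synchronized method that ultimately issues an HTTP request through httpclient to pull certificates from the WeChat Pay platform. That request sets no timeout and by default never times out, so if WeChat's certificate service jitters and stops responding even briefly, the call blocks right here while holding the lock.
The code is as follows: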
/**
 * Add a merchant whose platform certificates should be updated automatically
 *
 * @param merchantId merchant ID
 * @param credentials credentials
 * @param apiV3Key APIv3 key
 * @throws IOException IO error
 * @throws GeneralSecurityException general security error
 * @throws HttpCodeException HTTP status code error
 */
public synchronized void putMerchant(String merchantId, Credentials credentials, byte[] apiV3Key)
        throws IOException, GeneralSecurityException, HttpCodeException {
    ......
    initCertificates(merchantId, credentials, apiV3Key);
    ......
}
/**
 * Download and update the platform certificates
 *
 * @param merchantId merchant ID
 * @param verifier verifier
 * @param credentials credentials
 * @param apiV3Key APIv3 key
 * @throws HttpCodeException HTTP status code error
 * @throws IOException IO error
 * @throws GeneralSecurityException general security error
 */
private synchronized void downloadAndUpdateCert(String merchantId, Verifier verifier, Credentials credentials,
        byte[] apiV3Key) throws HttpCodeException, IOException, GeneralSecurityException {
    try (CloseableHttpClient httpClient = WechatPayHttpClientBuilder.create()
            .withCredentials(credentials)
            .withValidator(verifier == null ? (response) -> true
                    : new WechatPay2Validator(verifier))
            .withProxy(proxy)
            .build()) {
        HttpGet httpGet = new HttpGet(CERT_DOWNLOAD_PATH);
        httpGet.addHeader(ACCEPT, APPLICATION_JSON.toString());
        // No timeout is configured on this request
        try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
            int statusCode = response.getStatusLine().getStatusCode();
            String body = EntityUtils.toString(response.getEntity());
            if (statusCode == SC_OK) {
                Map<BigInteger, X509Certificate> newCertList = CertSerializeUtil.deserializeToCerts(apiV3Key, body);
                if (newCertList.isEmpty()) {
                    log.warn("Cert list is empty");
                    return;
                }
                ConcurrentHashMap<BigInteger, X509Certificate> merchantCertificates = certificates.get(merchantId);
                merchantCertificates.clear();
                merchantCertificates.putAll(newCertList);
            } else {
                log.error("Auto update cert failed, statusCode = {}, body = {}", statusCode, body);
                throw new HttpCodeException("下载平台证书返回状态码异常,状态码为:" + statusCode);
            }
        }
    }
}
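The effect of a synchronized method that waits on an unbounded remote call can be reproduced with the JDK alone. The sketch below is an illustration, not the library's code: Manager is a made-up stand-in for the singleton, and a CountDownLatch plays the part of the certificate service that has not replied yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class SynchronizedStallDemo {

    /** Stand-in for a singleton whose synchronized method waits on a remote call. */
    static class Manager {
        private final CountDownLatch remoteReply = new CountDownLatch(1);
        private final CountDownLatch entered = new CountDownLatch(1);

        synchronized void putMerchant() throws InterruptedException {
            entered.countDown();  // signal that some thread now holds the monitor
            remoteReply.await();  // simulates an HTTP call with no timeout
        }
    }

    /** Starts workers that all call putMerchant() and returns how many were BLOCKED on the monitor. */
    public static int countBlocked(int workers) {
        Manager manager = new Manager();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    manager.putMerchant();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            threads.add(t);
            t.start();
        }
        try {
            manager.entered.await(); // one worker is inside the synchronized method
            long deadline = System.nanoTime() + 5_000_000_000L;
            int blocked;
            do { // poll until every other worker is queued on the monitor (or 5 s pass)
                blocked = 0;
                for (Thread t : threads) {
                    if (t.getState() == Thread.State.BLOCKED) blocked++;
                }
                if (blocked < workers - 1) Thread.sleep(10);
            } while (blocked < workers - 1 && System.nanoTime() < deadline);
            manager.remoteReply.countDown(); // the "remote service" finally answers
            for (Thread t : threads) t.join();
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked workers: " + countBlocked(4));
    }
}
```

One worker waits inside the method for the reply while the rest sit in BLOCKED state on the monitor, which is the same picture the jstack output showed for the web container threads.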
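Why did every thread block?
The cause was that we were calling CertificatesManager.putMerchant incorrectly.
Our code copied the sample code from the wechatpay-apache-httpclient package directly, which in effect fetched the certificates from WeChat once for every payment request. When the WeChat Pay platform certificate service jitters, a sufficient number of concurrent payment requests is enough to block every container thread of the "WeChat Pay service". Our code, modeled on WeChat's sample, was as follows: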
public static Verifier createVerifier(WechatPayMerchant wechatPayMerchant) {
    Objects.requireNonNull(wechatPayMerchant, "merchant configuration must not be null");
    try {
        PrivateKey merchantPrivateKey = PemUtil.loadPrivateKey(
                new ByteArrayInputStream(wechatPayMerchant.getMerchantPrivateKey().getBytes(StandardCharsets.UTF_8)));
        // Get the certificate manager instance
        CertificatesManager certificatesManager = CertificatesManager.getInstance();
        // Register the merchant for automatic platform certificate updates
        // (this runs on every call, so every payment request triggers a download)
        certificatesManager.putMerchant(wechatPayMerchant.getPayUsedMchId(), new WechatPay2Credentials(wechatPayMerchant.getPayUsedMchId(),
                new PrivateKeySigner(wechatPayMerchant.getMerchantSerialNumber(), merchantPrivateKey)), wechatPayMerchant.getApiV3Key().getBytes(StandardCharsets.UTF_8));
        Verifier verifier = certificatesManager.getVerifier(wechatPayMerchant.getPayUsedMchId());
        return verifier;
    } catch (Exception e) {
        log.error("createVerifier failed", e);
        throw new ServiceException("Failed to create WechatPay Verifier");
    }
}
Reference
Issue link
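Optimization
We changed the code to rely on CertificatesManager's cache and automatic update strategy: certificates are loaded only the first time, and after that we depend on CertificatesManager's automatic update, which runs every 24 hours.
The adjusted code is as follows: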
public static Verifier createVerifier(WechatPayMerchant wechatPayMerchant) {
    Objects.requireNonNull(wechatPayMerchant, "merchant configuration must not be null");
    try {
        // Get the certificate manager instance
        CertificatesManager certificatesManager = CertificatesManager.getInstance();
        try {
            // Look for the certificates in the cache first
            Verifier verifier = certificatesManager.getVerifier(wechatPayMerchant.getPayUsedMchId());
            log.debug("Got certificates from cache: {}", wechatPayMerchant.getPayUsedMchId());
            return verifier;
        } catch (Exception e) {
            log.warn("Failed to get certificates: {}, {}", wechatPayMerchant.getPayUsedMchId(), e.getMessage());
            if (e instanceof NotFoundException) {
                // Certificates not present yet for this merchant
                PrivateKey merchantPrivateKey = PemUtil.loadPrivateKey(
                        new ByteArrayInputStream(wechatPayMerchant.getMerchantPrivateKey().getBytes(StandardCharsets.UTF_8)));
                // Register the merchant for automatic platform certificate updates
                certificatesManager.putMerchant(wechatPayMerchant.getPayUsedMchId(), new WechatPay2Credentials(wechatPayMerchant.getPayUsedMchId(),
                        new PrivateKeySigner(wechatPayMerchant.getMerchantSerialNumber(), merchantPrivateKey)), wechatPayMerchant.getApiV3Key().getBytes(StandardCharsets.UTF_8));
                Verifier verifier = certificatesManager.getVerifier(wechatPayMerchant.getPayUsedMchId());
                log.info("Fetched certificates on demand: {}", wechatPayMerchant.getPayUsedMchId());
                return verifier;
            } else {
                throw e;
            }
        }
    } catch (Exception e) {
        log.error("createVerifier failed", e);
        throw new ServiceException("Failed to create WechatPay Verifier");
    }
}
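One caveat with the adjusted code: on a cold start, several requests for the same merchant may all see NotFoundException before the first putMerchant finishes, and each would then call putMerchant in turn. A possible way to narrow that window is a load-once-per-key wrapper. The sketch below uses only the JDK; OncePerMerchant and its loader function are hypothetical names, and in our service the loader would be whatever registers the merchant and returns its Verifier.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class OncePerMerchant<V> {

    private final ConcurrentHashMap<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> loader; // hypothetical: registers the merchant and returns its verifier

    public OncePerMerchant(Function<String, V> loader) {
        this.loader = loader;
    }

    /**
     * Returns the cached value for the merchant, invoking the loader at most once per key.
     * If the loader throws, nothing is cached and a later call tries again.
     */
    public V get(String merchantId) {
        return cache.computeIfAbsent(merchantId, loader);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        OncePerMerchant<String> verifiers = new OncePerMerchant<>(id -> {
            loads.incrementAndGet(); // stands in for the expensive remote registration
            return "verifier-for-" + id;
        });
        verifiers.get("mch-1");
        verifiers.get("mch-1");
        System.out.println("loader calls: " + loads.get());
    }
}
```

Callers for the same key still wait while the first load is in flight, so this reduces duplicate downloads but does not replace a timeout on the underlying HTTP request.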