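Conclusion first
[Conclusion]
SkyWalking uses bytecode enhancement, combined with the ideas of dependency injection and inversion of control, to weave the trace identity traceId into the tracing context TraceContext in its own way.
Interesting, isn't it?
[Takeaway]
Be selective about the plugins enabled under plugins/ in skywalking-agent. The more components are enabled, the more complex the traces and topology become, the wider the impact, and the more unknown, uncontrollable factors appear.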
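Background
Finding the problem
In production, we found that the same traceId appeared on N requests from different time periods. They were all chained into one trace, which broke trace reconstruction and the topology view.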
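import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
// NamedThreadFactory comes from a utility library on the project's classpath (its import is not shown in the original).

@Configuration
public class ThreadPoolConfig {
    @Bean(name = "eventThreadPool")
    public ThreadPoolExecutor commonThreadPool() {
        // int corePoolSize = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, // deliberately set to 1 during analysis so the problem reproduces 100% of the time
                1,
                1,
                TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(50000),
                new NamedThreadFactory("wanda_event"),
                new ThreadPoolExecutor.CallerRunsPolicy());
        return executor;
    }
}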
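To see why a pool size of 1 makes the problem reproducible, here is a minimal sketch (not the project's code) showing that every task submitted to such a pool runs on the same worker thread. The class name SingleWorkerDemo and the lambda thread factory are assumptions standing in for the Spring bean and NamedThreadFactory above; the thread name is borrowed from the Arthas output later in this post.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SingleWorkerDemo {
    // Runs `tasks` jobs on a 1-thread pool and returns the thread name seen by each job.
    static List<String> runTasks(int tasks) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 1, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1000),
                r -> new Thread(r, "wanda_event-thread-1"), // hypothetical stand-in for NamedThreadFactory
                new ThreadPoolExecutor.CallerRunsPolicy());
        List<String> names = new CopyOnWriteArrayList<>();
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> names.add(Thread.currentThread().getName()));
        }
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Every entry carries the same worker thread name.
        System.out.println(runTasks(3));
    }
}
```

Because one long-lived thread serves every event, anything stored per thread (such as a tracing context) is shared by all tasks that run on it unless it is cleared between tasks.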
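Analyzing the problem
We need to find out how the traceId in the thread pool's threads is generated.
[Notes]
- The skywalking-agent.jar version in use is 8.13.0, with the default plugins/ list, which includes apm-guava-eventbus-plugin.
- The bootstrap plugins under bootstrap-plugins/, which include apm-jdk-threadpool-plugin, are not enabled (enabling them means copying them into plugins/). SkyWalking leaves bootstrap plugins disabled by default because their impact is broad: they can noticeably affect both application performance and trace data.
[Thoughts]
- The traceId is created at the root node of a request and is immutable; it is passed through unchanged for the rest of the request lifecycle. So pinning down where the traceId is generated is the key.
- Where is the traceId generated? We need to understand the generation logic at the implementation level.
- An application instance contains many threads, so the name of the thread that generates the traceId also matters.
In summary, the core troubleshooting idea is: new traceId generation + thread name.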
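How traceId generation is implemented
org.apache.skywalking:java-agent:9.1.0
The analysis below uses the source code of v9.1.0, the latest version at the time of writing; the code is almost identical between the two versions (8.13.0 and 9.1.0).
TraceContext.traceId()
org.apache.skywalking.apm.toolkit.trace.TraceContext#traceId
Application code calls TraceContext.traceId() on the tracing context TraceContext to get the traceId.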
package org.apache.skywalking.apm.toolkit.trace;
import java.util.Optional;
/**
* Try to access the sky-walking tracer context. The context is not existed, always. only the middleware, component, or
* rpc-framework are supported in the current invoke stack, in the same thread, the context will be available.
* <p>
*/
public class TraceContext {
/**
* Try to get the traceId of current trace context.
*
* @return traceId, if it exists, or empty {@link String}.
*/
public static String traceId() {
return "";
}
/**
* Try to get the segmentId of current trace context.
*
* @return segmentId, if it exists, or empty {@link String}.
*/
public static String segmentId() {
return "";
}
/**
* Try to get the spanId of current trace context. The spanId is a negative number when the trace context is
* missing.
*
* @return spanId, if it exists, or empty {@link String}.
*/
public static int spanId() {
return -1;
}
/**
* Try to get the custom value from trace context.
*
* @return custom data value.
*/
public static Optional<String> getCorrelation(String key) {
return Optional.empty();
}
/**
* Put the custom key/value into trace context.
*
* @return previous value if it exists.
*/
public static Optional<String> putCorrelation(String key, String value) {
return Optional.empty();
}
}
1. How does the traceId get set into the tracing context?
Search the GitHub repository of skywalking:java-agent for org.apache.skywalking.apm.toolkit.trace.TraceContext:
repo:apache/skywalking-java org.apache.skywalking.apm.toolkit.trace.TraceContext language:Java
Or search the skywalking:java-agent source code in IDEA for org.apache.skywalking.apm.toolkit.trace.TraceContext.
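[Conclusion]
SkyWalking uses bytecode enhancement, combined with the ideas of dependency injection and inversion of control, to weave the trace identity traceId into the tracing context TraceContext in its own way.
Doesn't that give us one more way to implement a data update?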
TraceContextActivation
org.apache.skywalking.apm.toolkit.activation.trace.TraceContextActivation
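The tracing context activation TraceContextActivation uses TraceIDInterceptor to intercept TraceContext.traceId(), so that the traceId is set into the tracing context TraceContext (in practice, it is supplied as that method's return value).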
package org.apache.skywalking.apm.toolkit.activation.trace;
import net.bytebuddy.description.method.MethodDescription;
import net.bytebuddy.matcher.ElementMatcher;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.ClassStaticMethodsEnhancePluginDefine;
import org.apache.skywalking.apm.agent.core.plugin.match.ClassMatch;
import org.apache.skywalking.apm.agent.core.plugin.match.NameMatch;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.StaticMethodsInterceptPoint;
import static net.bytebuddy.matcher.ElementMatchers.named;
/**
* Active the toolkit class "TraceContext". Should not dependency or import any class in
* "skywalking-toolkit-trace-context" module. Activation's classloader is diff from "TraceContext", using direct will
* trigger classloader issue.
* <p>
*/
public class TraceContextActivation extends ClassStaticMethodsEnhancePluginDefine {
// Interceptor class for traceId
public static final String TRACE_ID_INTERCEPT_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.TraceIDInterceptor";
public static final String SEGMENT_ID_INTERCEPT_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.SegmentIDInterceptor";
public static final String SPAN_ID_INTERCEPT_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.SpanIDInterceptor";
// Enhanced class: the trace context
public static final String ENHANCE_CLASS = "org.apache.skywalking.apm.toolkit.trace.TraceContext";
// Name of the static method that gets the traceId
public static final String ENHANCE_TRACE_ID_METHOD = "traceId";
public static final String ENHANCE_SEGMENT_ID_METHOD = "segmentId";
public static final String ENHANCE_SPAN_ID_METHOD = "spanId";
public static final String ENHANCE_GET_CORRELATION_METHOD = "getCorrelation";
public static final String INTERCEPT_GET_CORRELATION_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.CorrelationContextGetInterceptor";
public static final String ENHANCE_PUT_CORRELATION_METHOD = "putCorrelation";
public static final String INTERCEPT_PUT_CORRELATION_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.CorrelationContextPutInterceptor";
/**
* @return the target class, which needs active.
*/
@Override
protected ClassMatch enhanceClass() {
// The class to enhance
return NameMatch.byName(ENHANCE_CLASS);
}
/**
* @return the collection of {@link StaticMethodsInterceptPoint}, represent the intercepted methods and their
* interceptors.
*/
@Override
public StaticMethodsInterceptPoint[] getStaticMethodsInterceptPoints() {
// Static method intercept points
return new StaticMethodsInterceptPoint[] {
new StaticMethodsInterceptPoint() {
@Override
public ElementMatcher<MethodDescription> getMethodsMatcher() {
// Name of the static method that gets the traceId
return named(ENHANCE_TRACE_ID_METHOD);
}
@Override
public String getMethodsInterceptor() {
// Interceptor class for traceId
return TRACE_ID_INTERCEPT_CLASS;
}
@Override
public boolean isOverrideArgs() {
return false;
}
},
// ...
};
}
}
TraceIDInterceptor
org.apache.skywalking.apm.toolkit.activation.trace.TraceIDInterceptor
The trace identity interceptor TraceIDInterceptor calls ContextManager.getGlobalTraceId() to get the traceId and returns it as the result of TraceContext.traceId().
package org.apache.skywalking.apm.toolkit.activation.trace;
import java.lang.reflect.Method;
import org.apache.skywalking.apm.agent.core.logging.api.ILog;
import org.apache.skywalking.apm.agent.core.logging.api.LogManager;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.StaticMethodsAroundInterceptor;
import org.apache.skywalking.apm.agent.core.context.ContextManager;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.MethodInterceptResult;
public class TraceIDInterceptor implements StaticMethodsAroundInterceptor {
private static final ILog LOGGER = LogManager.getLogger(TraceIDInterceptor.class);
@Override
public void beforeMethod(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes,
MethodInterceptResult result) {
// Get the first global traceId and define it as the method's return value
result.defineReturnValue(ContextManager.getGlobalTraceId());
}
@Override
public Object afterMethod(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes,
Object ret) {
// Return the traceId
return ret;
}
@Override
public void handleMethodException(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes,
Throwable t) {
LOGGER.error("Failed to getDefault trace Id.", t);
}
}
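The stub-plus-interceptor pattern above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the agent's implementation: Result is a hypothetical stand-in for MethodInterceptResult, intercepted() stands in for the dispatching the agent does around the enhanced method, and the agentLookup supplier plays the role of ContextManager.getGlobalTraceId().

```java
import java.util.function.Supplier;

class InterceptSketch {
    // Hypothetical stand-in for MethodInterceptResult: lets the "before" hook short-circuit the original call.
    static final class Result {
        private boolean isContinue = true;
        private Object ret;

        void defineReturnValue(Object ret) {
            this.isContinue = false;
            this.ret = ret;
        }

        boolean isContinue() {
            return isContinue;
        }

        Object ret() {
            return ret;
        }
    }

    // The toolkit-side stub: on its own it always returns an empty string.
    static String traceIdStub() {
        return "";
    }

    // Models the enhanced call: run the "before" hook, skip the original body if a return value was defined.
    static String intercepted(Supplier<String> original, Supplier<String> agentLookup) {
        Result result = new Result();
        result.defineReturnValue(agentLookup.get());                       // "beforeMethod"
        Object ret = result.isContinue() ? original.get() : result.ret();
        return (String) ret;                                               // "afterMethod" returns ret unchanged
    }

    public static void main(String[] args) {
        System.out.println("stub alone: [" + traceIdStub() + "]");
        System.out.println("intercepted: " + intercepted(InterceptSketch::traceIdStub, () -> "demo-trace-id"));
    }
}
```

The point of the design is that application code depends only on the tiny toolkit stub, while the real value is supplied at runtime by whatever the agent has woven in.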
ContextManager.getGlobalTraceId()
org.apache.skywalking.apm.agent.core.context.ContextManager#getGlobalTraceId
ContextManager is the tracing context manager.
ContextManager.getGlobalTraceId() returns the first global traceId by calling AbstractTracerContext.getReadablePrimaryTraceId().
package org.apache.skywalking.apm.agent.core.context;
import java.util.Objects;
import org.apache.skywalking.apm.agent.core.boot.BootService;
import org.apache.skywalking.apm.agent.core.boot.ServiceManager;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.TraceSegment;
import org.apache.skywalking.apm.agent.core.logging.api.ILog;
import org.apache.skywalking.apm.agent.core.logging.api.LogManager;
import org.apache.skywalking.apm.agent.core.sampling.SamplingService;
import org.apache.skywalking.apm.util.StringUtil;
import static org.apache.skywalking.apm.agent.core.conf.Config.Agent.OPERATION_NAME_THRESHOLD;
/**
* {@link ContextManager} controls the whole context of {@link TraceSegment}. Any {@link TraceSegment} relates to
* single-thread, so this context use {@link ThreadLocal} to maintain the context, and make sure, since a {@link
* TraceSegment} starts, all ChildOf spans are in the same context. <p> What is 'ChildOf'?
* https://github.com/opentracing/specification/blob/master/specification.md#references-between-spans
*
* <p> Also, {@link ContextManager} delegates to all {@link AbstractTracerContext}'s major methods.
*/
public class ContextManager implements BootService {
private static final String EMPTY_TRACE_CONTEXT_ID = "N/A";
private static final ILog LOGGER = LogManager.getLogger(ContextManager.class);
// Thread-local variable holding the trace context
private static ThreadLocal<AbstractTracerContext> CONTEXT = new ThreadLocal<AbstractTracerContext>();
private static ThreadLocal<RuntimeContext> RUNTIME_CONTEXT = new ThreadLocal<RuntimeContext>();
private static ContextManagerExtendService EXTEND_SERVICE;
private static AbstractTracerContext getOrCreate(String operationName, boolean forceSampling) {
AbstractTracerContext context = CONTEXT.get();
if (context == null) {
if (StringUtil.isEmpty(operationName)) {
if (LOGGER.isDebugEnable()) {
LOGGER.debug("No operation name, ignore this trace.");
}
context = new IgnoredTracerContext();
} else {
if (EXTEND_SERVICE == null) {
EXTEND_SERVICE = ServiceManager.INSTANCE.findService(ContextManagerExtendService.class);
}
context = EXTEND_SERVICE.createTraceContext(operationName, forceSampling);
}
CONTEXT.set(context);
}
return context;
}
/**
* Get the first global traceId.
* @return the first global trace id when tracing. Otherwise, "N/A".
*/
public static String getGlobalTraceId() {
// The trace context
AbstractTracerContext context = CONTEXT.get();
// Get the global traceId
return Objects.nonNull(context) ? context.getReadablePrimaryTraceId() : EMPTY_TRACE_CONTEXT_ID;
}
/**
* @return the current segment id when tracing. Otherwise, "N/A".
*/
public static String getSegmentId() {
AbstractTracerContext context = CONTEXT.get();
return Objects.nonNull(context) ? context.getSegmentId() : EMPTY_TRACE_CONTEXT_ID;
}
/**
* @return the current span id when tracing. Otherwise, the value is -1.
*/
public static int getSpanId() {
AbstractTracerContext context = CONTEXT.get();
return Objects.nonNull(context) ? context.getSpanId() : -1;
}
// ...
}
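Since ContextManager keeps the context in a ThreadLocal, a context created on one thread is invisible on another. The following sketch illustrates only that property; it is a simplified model in which a plain String plays the role of AbstractTracerContext, and all names other than the "N/A" placeholder are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalContextSketch {
    private static final String EMPTY_TRACE_CONTEXT_ID = "N/A";
    // Stand-in for ContextManager.CONTEXT; the String plays the role of the tracer context.
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    static String getGlobalTraceId() {
        String context = CONTEXT.get();
        return context != null ? context : EMPTY_TRACE_CONTEXT_ID;
    }

    // Sets a context on the calling thread, then reads the id on that thread and on a pool thread.
    static String[] readFromTwoThreads(String traceId) {
        CONTEXT.set(traceId);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            String sameThread = getGlobalTraceId();
            Callable<String> task = ThreadLocalContextSketch::getGlobalTraceId;
            String otherThread = pool.submit(task).get();
            return new String[] {sameThread, otherThread};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            CONTEXT.remove();
        }
    }

    public static void main(String[] args) {
        String[] ids = readFromTwoThreads("demo-trace-id");
        System.out.println(ids[0] + " / " + ids[1]);
    }
}
```

This is why a worker thread needs its own context before it can report a traceId, and why it matters which code creates that context on the worker thread.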
AbstractTracerContext.getReadablePrimaryTraceId()
org.apache.skywalking.apm.agent.core.context.AbstractTracerContext#getReadablePrimaryTraceId
AbstractTracerContext is the interface that defines the trace context.
This method gets the global traceId.
package org.apache.skywalking.apm.agent.core.context;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
/**
* The <code>AbstractTracerContext</code> represents the tracer context manager.
*/
public interface AbstractTracerContext {
/**
* Get the global trace id, if needEnhance. How to build, depends on the implementation.
*
* @return the string represents the id.
*/
String getReadablePrimaryTraceId();
/**
* Prepare for the cross-process propagation. How to initialize the carrier, depends on the implementation.
*
* @param carrier to carry the context for crossing process.
*/
void inject(ContextCarrier carrier);
/**
* Build the reference between this segment and a cross-process segment. How to build, depends on the
* implementation.
*
* @param carrier carried the context from a cross-process segment.
*/
void extract(ContextCarrier carrier);
/**
* Capture a snapshot for cross-thread propagation. It's a similar concept with ActiveSpan.Continuation in
* OpenTracing-java How to build, depends on the implementation.
*
* @return the {@link ContextSnapshot} , which includes the reference context.
*/
ContextSnapshot capture();
/**
* Build the reference between this segment and a cross-thread segment. How to build, depends on the
* implementation.
*
* @param snapshot from {@link #capture()} in the parent thread.
*/
void continued(ContextSnapshot snapshot);
/**
* Get the current segment id, if needEnhance. How to build, depends on the implementation.
*
* @return the string represents the id.
*/
String getSegmentId();
/**
* Get the active span id, if needEnhance. How to build, depends on the implementation.
*
* @return the string represents the id.
*/
int getSpanId();
/**
* Create an entry span
*
* @param operationName most likely a service name
* @return the span represents an entry point of this segment.
*/
AbstractSpan createEntrySpan(String operationName);
/**
* Create a local span
*
* @param operationName most likely a local method signature, or business name.
* @return the span represents a local logic block.
*/
AbstractSpan createLocalSpan(String operationName);
/**
* Create an exit span
*
* @param operationName most likely a service name of remote
* @param remotePeer the network id(ip:port, hostname:port or ip1:port1,ip2,port, etc.). Remote peer could be set
* later, but must be before injecting.
* @return the span represent an exit point of this segment.
*/
AbstractSpan createExitSpan(String operationName, String remotePeer);
/**
* @return the active span of current tracing context(stack)
*/
AbstractSpan activeSpan();
/**
* Finish the given span, and the given span should be the active span of current tracing context(stack)
*
* @param span to finish
* @return true when context should be clear.
*/
boolean stopSpan(AbstractSpan span);
/**
* Notify this context, current span is going to be finished async in another thread.
*
* @return The current context
*/
AbstractTracerContext awaitFinishAsync();
/**
* The given span could be stopped officially.
*
* @param span to be stopped.
*/
void asyncStop(AsyncSpan span);
/**
* Get current correlation context
*/
CorrelationContext getCorrelationContext();
/**
* Get current primary endpoint name
*/
String getPrimaryEndpointName();
}
AbstractTracerContext has two implementations: IgnoredTracerContext and TracingContext.
IgnoredTracerContext.getReadablePrimaryTraceId()
org.apache.skywalking.apm.agent.core.context.IgnoredTracerContext#getReadablePrimaryTraceId
IgnoredTracerContext is the trace context for traces that should be ignored.
This method returns "Ignored_Trace".
package org.apache.skywalking.apm.agent.core.context;
import java.util.LinkedList;
import java.util.List;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.NoopSpan;
import org.apache.skywalking.apm.agent.core.profile.ProfileStatusContext;
/**
* The <code>IgnoredTracerContext</code> represent a context should be ignored. So it just maintains the stack with an
* integer depth field.
* <p>
* All operations through this will be ignored, and keep the memory and gc cost as low as possible.
*/
public class IgnoredTracerContext implements AbstractTracerContext {
private static final NoopSpan NOOP_SPAN = new NoopSpan();
private static final String IGNORE_TRACE = "Ignored_Trace";
private final CorrelationContext correlationContext;
private final ExtensionContext extensionContext;
private final ProfileStatusContext profileStatusContext;
private int stackDepth;
public IgnoredTracerContext() {
this.stackDepth = 0;
this.correlationContext = new CorrelationContext();
this.extensionContext = new ExtensionContext();
this.profileStatusContext = ProfileStatusContext.createWithNone();
}
// ...
@Override
public String getReadablePrimaryTraceId() {
// Get the global traceId
return IGNORE_TRACE;
}
@Override
public String getSegmentId() {
return IGNORE_TRACE;
}
@Override
public int getSpanId() {
return -1;
}
// ...
}
TracingContext.getReadablePrimaryTraceId()
org.apache.skywalking.apm.agent.core.context.TracingContext#getReadablePrimaryTraceId
TracingContext is the tracing context.
This method returns the id field of DistributedTraceId.
package org.apache.skywalking.apm.agent.core.context;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;
import java.util.concurrent.locks.ReentrantLock;
import lombok.AccessLevel;
import lombok.Getter;
import org.apache.skywalking.apm.agent.core.boot.ServiceManager;
import org.apache.skywalking.apm.agent.core.conf.Config;
import org.apache.skywalking.apm.agent.core.conf.dynamic.watcher.SpanLimitWatcher;
import org.apache.skywalking.apm.agent.core.context.ids.DistributedTraceId;
import org.apache.skywalking.apm.agent.core.context.ids.PropagatedTraceId;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractTracingSpan;
import org.apache.skywalking.apm.agent.core.context.trace.EntrySpan;
import org.apache.skywalking.apm.agent.core.context.trace.ExitSpan;
import org.apache.skywalking.apm.agent.core.context.trace.ExitTypeSpan;
import org.apache.skywalking.apm.agent.core.context.trace.LocalSpan;
import org.apache.skywalking.apm.agent.core.context.trace.NoopExitSpan;
import org.apache.skywalking.apm.agent.core.context.trace.NoopSpan;
import org.apache.skywalking.apm.agent.core.context.trace.TraceSegment;
import org.apache.skywalking.apm.agent.core.context.trace.TraceSegmentRef;
import org.apache.skywalking.apm.agent.core.logging.api.ILog;
import org.apache.skywalking.apm.agent.core.logging.api.LogManager;
import org.apache.skywalking.apm.agent.core.profile.ProfileStatusContext;
import org.apache.skywalking.apm.agent.core.profile.ProfileTaskExecutionService;
import org.apache.skywalking.apm.util.StringUtil;
import static org.apache.skywalking.apm.agent.core.conf.Config.Agent.CLUSTER;
/**
* The <code>TracingContext</code> represents a core tracing logic controller. It build the final {@link
* TracingContext}, by the stack mechanism, which is similar with the codes work.
* <p>
* In opentracing concept, it means, all spans in a segment tracing context(thread) are CHILD_OF relationship, but no
* FOLLOW_OF.
* <p>
* In skywalking core concept, FOLLOW_OF is an abstract concept when cross-process MQ or cross-thread async/batch tasks
* happen, we used {@link TraceSegmentRef} for these scenarios. Check {@link TraceSegmentRef} which is from {@link
* ContextCarrier} or {@link ContextSnapshot}.
*/
public class TracingContext implements AbstractTracerContext {
private static final ILog LOGGER = LogManager.getLogger(TracingContext.class);
private long lastWarningTimestamp = 0;
/**
* @see ProfileTaskExecutionService
*/
private static ProfileTaskExecutionService PROFILE_TASK_EXECUTION_SERVICE;
/**
* The final {@link TraceSegment}, which includes all finished spans.
* (Note: a trace segment covers all calls within the same thread.)
*/
private TraceSegment segment;
/**
* Active spans stored in a Stack, usually called 'ActiveSpanStack'. This {@link LinkedList} is the in-memory
* storage-structure. <p> I use {@link LinkedList#removeLast()}, {@link LinkedList#addLast(Object)} and {@link
* LinkedList#getLast()} instead of {@link #pop()}, {@link #push(AbstractSpan)}, {@link #peek()}
*/
private LinkedList<AbstractSpan> activeSpanStack = new LinkedList<>();
/**
* @since 8.10.0 replace the removed "firstSpan"(before 8.10.0) reference. see {@link PrimaryEndpoint} for more details.
*/
private PrimaryEndpoint primaryEndpoint = null;
/**
* A counter for the next span.
*/
private int spanIdGenerator;
/**
* The counter indicates
*/
@SuppressWarnings("unused") // updated by ASYNC_SPAN_COUNTER_UPDATER
private volatile int asyncSpanCounter;
private static final AtomicIntegerFieldUpdater<TracingContext> ASYNC_SPAN_COUNTER_UPDATER =
AtomicIntegerFieldUpdater.newUpdater(TracingContext.class, "asyncSpanCounter");
private volatile boolean isRunningInAsyncMode;
private volatile ReentrantLock asyncFinishLock;
private volatile boolean running;
private final long createTime;
/**
* profile status
*/
private final ProfileStatusContext profileStatus;
@Getter(AccessLevel.PACKAGE)
private final CorrelationContext correlationContext;
@Getter(AccessLevel.PACKAGE)
private final ExtensionContext extensionContext;
//CDS watcher
private final SpanLimitWatcher spanLimitWatcher;
/**
* Initialize all fields with default value.
*/
TracingContext(String firstOPName, SpanLimitWatcher spanLimitWatcher) {
this.segment = new TraceSegment();
this.spanIdGenerator = 0;
isRunningInAsyncMode = false;
createTime = System.currentTimeMillis();
running = true;
// profiling status
if (PROFILE_TASK_EXECUTION_SERVICE == null) {
PROFILE_TASK_EXECUTION_SERVICE = ServiceManager.INSTANCE.findService(ProfileTaskExecutionService.class);
}
this.profileStatus = PROFILE_TASK_EXECUTION_SERVICE.addProfiling(
this, segment.getTraceSegmentId(), firstOPName);
this.correlationContext = new CorrelationContext();
this.extensionContext = new ExtensionContext();
this.spanLimitWatcher = spanLimitWatcher;
}
/**
* Get the global traceId.
* @return the first global trace id.
*/
@Override
public String getReadablePrimaryTraceId() {
// Return the id field of the distributed trace id
return getPrimaryTraceId().getId();
}
private DistributedTraceId getPrimaryTraceId() {
// Get the distributed trace id related to the trace segment
return segment.getRelatedGlobalTrace();
}
@Override
public String getSegmentId() {
return segment.getTraceSegmentId();
}
@Override
public int getSpanId() {
return activeSpan().getSpanId();
}
// ...
}
DistributedTraceId
org.apache.skywalking.apm.agent.core.context.ids.DistributedTraceId#id
DistributedTraceId is the distributed trace identity; it represents one distributed call chain.
package org.apache.skywalking.apm.agent.core.context.ids;
import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.RequiredArgsConstructor;
import lombok.ToString;
/**
* The <code>DistributedTraceId</code> presents a distributed call chain.
* <p>
* This call chain has a unique (service) entrance,
* <p>
* such as: Service : http://www.skywalking.com/cust/query, all the remote, called behind this service, rest remote, db
* executions, are using the same <code>DistributedTraceId</code> even in different JVM.
* <p>
* The <code>DistributedTraceId</code> contains only one string, and can NOT be reset, creating a new instance is the
* only option.
*/
@RequiredArgsConstructor
@ToString
@EqualsAndHashCode
public abstract class DistributedTraceId {
@Getter
private final String id;
}
DistributedTraceId has two subclasses: PropagatedTraceId and NewDistributedTraceId.
PropagatedTraceId
org.apache.skywalking.apm.agent.core.context.ids.PropagatedTraceId
PropagatedTraceId is a propagated trace identity: a DistributedTraceId that was propagated from the peer.
package org.apache.skywalking.apm.agent.core.context.ids;
/**
* The <code>PropagatedTraceId</code> represents a {@link DistributedTraceId}, which is propagated from the peer.
*/
public class PropagatedTraceId extends DistributedTraceId {
public PropagatedTraceId(String id) {
// Pass the traceId through unchanged
super(id);
}
}
NewDistributedTraceId
org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId
NewDistributedTraceId is a DistributedTraceId with a newly generated id.
Its default constructor calls GlobalIdGenerator.generate() to generate a new global id, which is the traceId.
package org.apache.skywalking.apm.agent.core.context.ids;
/**
* The <code>NewDistributedTraceId</code> is a {@link DistributedTraceId} with a new generated id.
*/
public class NewDistributedTraceId extends DistributedTraceId {
public NewDistributedTraceId() {
// Generate a new global id, i.e. the traceId
super(GlobalIdGenerator.generate());
}
}
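The difference between the two subclasses can be shown with a small model. This is only a sketch: the Lombok annotations are expanded by hand, the nested class names mirror the real ones for readability, and UUID.randomUUID() is an assumed stand-in for GlobalIdGenerator.generate().

```java
import java.util.UUID;

class TraceIdKinds {
    // Immutable holder: the id is fixed at construction and can only be "changed" by creating a new instance.
    abstract static class DistributedTraceId {
        private final String id;

        DistributedTraceId(String id) {
            this.id = id;
        }

        String getId() {
            return id;
        }
    }

    // Carries an id received from an upstream caller, unchanged.
    static final class PropagatedTraceId extends DistributedTraceId {
        PropagatedTraceId(String id) {
            super(id);
        }
    }

    // Starts a new trace; UUID is a stand-in for the real generator.
    static final class NewDistributedTraceId extends DistributedTraceId {
        NewDistributedTraceId() {
            super(UUID.randomUUID().toString());
        }
    }

    public static void main(String[] args) {
        System.out.println(new PropagatedTraceId("upstream-id").getId());
        System.out.println(new NewDistributedTraceId().getId());
    }
}
```

In troubleshooting terms, every construction of the "new" variant marks the start of a new trace, which is why the constructor is a useful interception point.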
GlobalIdGenerator.generate()
org.apache.skywalking.apm.agent.core.context.ids.GlobalIdGenerator#generate
GlobalIdGenerator is the global id generator.
This method generates a new global id; it is where the traceId is actually generated.
package org.apache.skywalking.apm.agent.core.context.ids;
import java.util.UUID;
import org.apache.skywalking.apm.util.StringUtil;
public final class GlobalIdGenerator {
// Process id of the application instance
private static final String PROCESS_ID = UUID.randomUUID().toString().replaceAll("-", "");
// Per-thread context for the id sequence
private static final ThreadLocal<IDContext> THREAD_ID_SEQUENCE = ThreadLocal.withInitial(
() -> new IDContext(System.currentTimeMillis(), (short) 0));
private GlobalIdGenerator() {
}
/**
* Generate a new id, combined by three parts.
* <p>
* The first one represents application instance id.
* <p>
* The second one represents thread id.
* <p>
* The third one also has two parts, 1) a timestamp, measured in milliseconds 2) a seq, in current thread, between
* 0(included) and 9999(included)
*
* @return unique id to represent a trace or segment
*/
public static String generate() {
return StringUtil.join(
'.',
PROCESS_ID,
String.valueOf(Thread.currentThread().getId()),
String.valueOf(THREAD_ID_SEQUENCE.get().nextSeq())
);
}
private static class IDContext {
private long lastTimestamp;
private short threadSeq;
// Just for considering time-shift-back only.
private long lastShiftTimestamp;
private int lastShiftValue;
private IDContext(long lastTimestamp, short threadSeq) {
this.lastTimestamp = lastTimestamp;
this.threadSeq = threadSeq;
}
private long nextSeq() {
return timestamp() * 10000 + nextThreadSeq();
}
private long timestamp() {
long currentTimeMillis = System.currentTimeMillis();
if (currentTimeMillis < lastTimestamp) {
// Just for considering time-shift-back by Ops or OS. @hanahmily 's suggestion.
if (lastShiftTimestamp != currentTimeMillis) {
lastShiftValue++;
lastShiftTimestamp = currentTimeMillis;
}
return lastShiftValue;
} else {
lastTimestamp = currentTimeMillis;
return lastTimestamp;
}
}
private short nextThreadSeq() {
if (threadSeq == 10000) {
threadSeq = 0;
}
return threadSeq++;
}
}
}
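Because the id is built as processId.threadId.(timestamp * 10000 + seq), a traceId can be taken apart again to learn which thread generated it and when. The parser below is a hypothetical helper written for this post, not part of SkyWalking, and the sample id in main is made up. It assumes the normal code path; when the clock has been shifted back, the generator substitutes a counter for the timestamp, so the decoded time is not meaningful in that case.

```java
class TraceIdParts {
    final String processId;
    final long threadId;
    final long timestampMillis;
    final int seq;

    TraceIdParts(String processId, long threadId, long timestampMillis, int seq) {
        this.processId = processId;
        this.threadId = threadId;
        this.timestampMillis = timestampMillis;
        this.seq = seq;
    }

    // Splits "processId.threadId.(timestamp * 10000 + seq)" back into its parts.
    static TraceIdParts parse(String traceId) {
        String[] parts = traceId.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 dot-separated parts: " + traceId);
        }
        long last = Long.parseLong(parts[2]);
        return new TraceIdParts(parts[0], Long.parseLong(parts[1]), last / 10000, (int) (last % 10000));
    }

    public static void main(String[] args) {
        // Made-up sample id for illustration only.
        TraceIdParts p = parse("0123456789abcdef0123456789abcdef.140.17095574390000042");
        System.out.println("thread=" + p.threadId + " ts=" + p.timestampMillis + " seq=" + p.seq);
    }
}
```

Decoding the thread id this way is a quick cross-check for the "traceId generation + thread name" line of investigation.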
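Hands-on cases
Real knowledge comes from practice.
Without understanding the underlying implementation, it would be hard to come up with these interception points.
See the monitor/watch/trace commands in the Arthas command list.
// [Interception point] a new traceId is generated + thread name starts with wanda_event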
stack org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId <init> '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'
watch org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId <init> '{target, returnObj}' '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")' -x 6
// [Interception point] the global traceId is read + thread name starts with wanda_event
stack org.apache.skywalking.apm.agent.core.context.AbstractTracerContext getReadablePrimaryTraceId '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'
watch org.apache.skywalking.apm.agent.core.context.AbstractTracerContext getPrimaryTraceId '{target, returnObj}' '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")' -x 6
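The condition expression in these commands is OGNL that calls a static Java method, so it can be read as ordinary Java. As a rough illustration (the class and helper names are hypothetical), the filter is equivalent to the predicate below, evaluated on whichever thread executes the intercepted method.

```java
class ThreadNameFilter {
    // Java equivalent of: @java.lang.Thread@currentThread().getName().startsWith(prefix)
    static boolean matches(String prefix) {
        return Thread.currentThread().getName().startsWith(prefix);
    }

    // Evaluates the predicate on a freshly started thread with the given name.
    static boolean matchesOn(String threadName, String prefix) {
        boolean[] out = new boolean[1];
        Thread t = new Thread(() -> out[0] = matches(prefix), threadName);
        t.start();
        try {
            t.join(); // join() makes the write to out[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out[0];
    }

    public static void main(String[] args) {
        System.out.println(matchesOn("wanda_event-thread-1", "wanda_event"));
        System.out.println(matchesOn("http-nio-8080-exec-1", "wanda_event"));
    }
}
```

Filtering by thread name keeps the output limited to the event pool's worker threads instead of every request thread in the JVM.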
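[Case 1] Who generates the new traceId on the wanda event thread?
Are these operations reasonable?
Arthas's stack command shows the call stack that generates a new global traceId.
The call stack shows that the traceId is generated when the Guava EventBus subscriber method Subscriber.invokeSubscriberMethod is invoked.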
[arthas@7]$ stack org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId <init> '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 432 ms, listenerId: 5
ts=2024-03-05 11:52:45;thread_name=wanda_event-thread-1;id=f6;is_daemon=false;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@8dfe921
@org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId.<init>()
at org.apache.skywalking.apm.agent.core.context.trace.TraceSegment.<init>(TraceSegment.java:74)
at org.apache.skywalking.apm.agent.core.context.TracingContext.<init>(TracingContext.java:122)
at org.apache.skywalking.apm.agent.core.context.ContextManagerExtendService.createTraceContext(ContextManagerExtendService.java:91)
at org.apache.skywalking.apm.agent.core.context.ContextManager.getOrCreate(ContextManager.java:60)
at org.apache.skywalking.apm.agent.core.context.ContextManager.createLocalSpan(ContextManager.java:123)
// guava-eventbus-plugin
// the method interceptor is invoked
at org.apache.skywalking.apm.plugin.guava.eventbus.EventBusSubscriberInterceptor.beforeMethod(EventBusSubscriberInterceptor.java:38)
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInterWithOverrideArgs.intercept(InstMethodsInterWithOverrideArgs.java:75)
// the original method
at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:-1)
at com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:145)
at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:73)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
This is the bytecode change made by EventBusSubscriberInstrumentation in the apm-guava-eventbus-plugin plugin.
[Case 2] Within the trace segment of the wanda event thread, where is the traceId read?
Are these operations reasonable?
LeoaoJsonLayout.addCustomDataToJsonMap(LeoaoJsonLayout.java:29) calls TraceContext.traceId().
[arthas@7]$ stack org.apache.skywalking.apm.agent.core.context.AbstractTracerContext getReadablePrimaryTraceId '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'
Press Q or Ctrl+C to abort.
Affect(class count: 2 , method count: 1) cost in 423 ms, listenerId: 3
ts=2024-03-04 21:03:59;thread_name=wanda_event-thread-1;id=140;is_daemon=false;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@67fe380b
@org.apache.skywalking.apm.agent.core.context.TracingContext.getReadablePrimaryTraceId()
at org.apache.skywalking.apm.agent.core.context.ContextManager.getGlobalTraceId(ContextManager.java:77)
at org.apache.skywalking.apm.toolkit.activation.trace.TraceIDInterceptor.beforeMethod(TraceIDInterceptor.java:35)
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.StaticMethodsInter.intercept(StaticMethodsInter.java:73)
at org.apache.skywalking.apm.toolkit.trace.TraceContext.traceId(TraceContext.java:-1)
// Everything above this line is the SkyWalking core call chain
// Call to TraceContext.traceId()
at com.leoao.lpaas.logback.LeoaoJsonLayout.addCustomDataToJsonMap(LeoaoJsonLayout.java:29)
at ch.qos.logback.contrib.json.classic.JsonLayout.toJsonMap(null:-1)
at ch.qos.logback.contrib.json.classic.JsonLayout.toJsonMap(null:-1)
at ch.qos.logback.contrib.json.JsonLayoutBase.doLayout(null:-1)
at ch.qos.logback.core.encoder.LayoutWrappingEncoder.encode(LayoutWrappingEncoder.java:115)
at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:230)
at ch.qos.logback.core.rolling.RollingFileAppender.subAppend(RollingFileAppender.java:235)
at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:102)
at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
at ch.qos.logback.classic.Logger.filterAndLog_1(Logger.java:398)
at ch.qos.logback.classic.Logger.info(Logger.java:583)
// Log output
// log.info("receive event persistUserPositionEvent=[{}]", event);
at com.lefit.wanda.domain.event.listener.PersistUserPositionEventListener.change(PersistUserPositionEventListener.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.common.eventbus.Subscriber.invokeSubscriberMethod$original$ToNcZpNk(Subscriber.java:88)
at com.google.common.eventbus.Subscriber.invokeSubscriberMethod$original$ToNcZpNk$accessor$utMvob4N(Subscriber.java:-1)
at com.google.common.eventbus.Subscriber$auxiliary$8fYqzzq0.call(null:-1)
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInterWithOverrideArgs.intercept(InstMethodsInterWithOverrideArgs.java:85)
at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:-1)
at com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:145)
at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:73)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
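[Takeaway]
Be selective about the plugins enabled under plugins/ in skywalking-agent. The more components are enabled, the more complex the traces and topology become, the wider the impact, and the more unknown, uncontrollable factors appear.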
Have fun, everyone! ˇˍˇ
简放, Hangzhou