What is Distributed Tracing?
Logging is an important aspect of software development. It not only helps in troubleshooting issues but also helps us understand the behavior of our software. With multiple microservices, the logging challenge multiplies.
Consider the example we discussed in our previous exercise, Developing Service Discovery. In that exercise, our Shopping Cart Service needs to call the Product Catalog Service to get the product details. The request is distributed across multiple services (two in our case). If each service has multiple instances running, the request will follow a path determined at run time.
The picture below shows just 9 user requests getting distributed across 6 instances, and it already starts to look complex. Imagine a situation where millions of transactions are being processed across multiple participating services.

If we try to troubleshoot an issue in this environment, our monolith-style logging will be of little use. Session, Thread, or HTTP Request references do not help much when the request is processed across multiple services. We will be lost digging through the logs.
We must have a unique request/transaction reference which is passed across all the calls, to all the dependent microservices, in the call chain. This unique reference is usually referred to as a Correlation-id or Trace Id.
The Trace Id is created when the request hits the very first service in the chain. For subsequent calls, the existing Trace Id is passed along, typically as an HTTP header attribute. This pattern is called Distributed Tracing.
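To make the pattern concrete, here is a minimal, hypothetical sketch of what such propagation could look like if implemented by hand: a servlet filter that reuses an incoming trace id header, or creates a new one when it is the very first service in the chain, and stores it in the logging context (MDC). The header and attribute names here are made up for illustration. As we will see shortly, Spring Cloud Sleuth does all of this for us automatically.

import java.io.IOException;
import java.util.UUID;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import org.slf4j.MDC;

// Illustrative only; Sleuth/Brave automate this (with proper span handling).
public class TraceIdFilter implements Filter {

    static final String TRACE_HEADER = "X-Trace-Id"; // sample header name

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest httpReq = (HttpServletRequest) req;
        // Reuse the Trace Id if an upstream service already created one;
        // otherwise this is the first service in the chain, so generate it.
        String traceId = httpReq.getHeader(TRACE_HEADER);
        if (traceId == null) {
            traceId = UUID.randomUUID().toString();
        }
        // Put it in the logging context so every log line can include it.
        // Outbound calls must forward the same header to downstream services.
        MDC.put("traceId", traceId);
        try {
            chain.doFilter(req, res);
        } finally {
            MDC.remove("traceId");
        }
    }
}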

Now that we understand the need for Distributed Tracing, let's see how we can implement this pattern.
How are we implementing Distributed Tracing?
Our sample implementation consists of two parts:
Implementing Tracing: In this part we will focus on the core pattern implementation, including Trace Id generation, passing it along the service calls, and including it in the logs. We will use the Spring Cloud Sleuth library to achieve this.
Enabling Tracing System: In this part we will focus on how the traces are collected and visualized to get better insights into the service interactions. This reduces triage time by contextualizing errors and delays. We will use Zipkin to implement this.
Besides this, we will also discuss patterns like "log aggregation" and "centralized logging", as these patterns are related and further enhance the troubleshooting capabilities.
Implementing Tracing
We already developed our Product Catalog Service and Shopping Cart Service in our previous exercise. We do not need to revisit that exercise, but a brief recap is needed to set the ground for further discussion.
Our Product Catalog Service is responsible for managing products and provides an API to create, update, delete, and read a product (the source code can be found here).
Our Shopping Cart Service is responsible for managing the shopping cart. We will be focusing on the addItem method of this service (the source code can be found here).
Both of these services are based on Spring Boot. We will now update them to introduce the Trace Id.
Step 1: Let's add the spring-cloud-starter-sleuth dependency to the Product Catalog Service's pom.xml:
<project>
  ...
  <dependencies>
    ...
    <dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-starter-sleuth</artifactId>
    </dependency>
  </dependencies>
  ...
</project>
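Note that the starter is declared without a version, so it must be resolved through dependency management. If your project does not already import the Spring Cloud BOM, a sketch of the required section is below; the release train version shown is only an example and has to match your Spring Boot version.

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-dependencies</artifactId>
      <!-- sample version; align with your Spring Boot release -->
      <version>Hoxton.SR8</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>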
Step 2: Update the getProductDetails method of ProductCatalogService.java and add some sample log statements. We are using slf4j to get the Logger instance.
@RestController
public class ProductCatalogService {

    @Autowired
    private MongoTemplate mongoTemplate;

    private Logger logger = LoggerFactory.getLogger(ProductCatalogService.class);

    @GetMapping("/product/{id}")
    public Product getProductDetails(@PathVariable String id) {
        logger.info("get product details - process started");
        Product product = mongoTemplate.findById(id, Product.class);
        if (product != null)
            logger.info("get product details - product found");
        else
            logger.info("get product details - product not found");
        return product;
    }
}
Step 3: Update src/main/resources/application.properties to set the log level to TRACE and the service name to product_catalog.
spring.application.name=product_catalog
spring.data.mongodb.uri=mongodb+srv://xxx-user:xxx-pwd@cluster0.nrsv6.gcp.mongodb.net/ecommerce
logging.level.org.springframework.web.servlet.DispatcherServlet=TRACE
That's all we need to add the Trace Id to our requests and logs. Spring Cloud Sleuth is a layer over Brave, a distributed tracing instrumentation library. We do not need to write any custom code to create or propagate the trace contexts, as Brave does this automatically, typically by intercepting requests. It also configures the logging context to include the Trace Id and other variables.
Let's start the Product Catalog Service by running mvn spring-boot:run and access the get product details API. You can use a browser or an API testing tool like Postman. I am using the REST URL http://localhost:8080/product/test-product-123 (test-product-123 is my sample product id) to get the product details.

If you look at the highlighted part, the log statement is carrying the Service Name, Trace Id, and Span Id, in that order. The corresponding values are present in the TRACE logs as well as the other logs.
All these values are auto-populated by Spring Cloud Sleuth. The Trace Id is created as we access the REST API. This also creates a Span Id, whose scope is limited to a single service. In this particular case both the Trace Id and the Span Id are the same, as the request goes through only one service (the Product Catalog Service).
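A log line in this setup looks roughly like the following. This is a hand-written illustration, not actual output, and the trace id is made up; the bracketed section is what Sleuth injects into the logging pattern (service name, Trace Id, Span Id, and an export flag):

2020-09-13 10:15:02.123  INFO [product_catalog,7f3a2b1c9d8e4f50,7f3a2b1c9d8e4f50,true] 12345 --- [nio-8080-exec-1] ProductCatalogService : get product details - process started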
Let's make similar changes in the Shopping Cart Service.
Step 1: Add Spring Cloud Sleuth as a dependency.
Step 2: Update the addItem method with some sample log statements.
@RestController
public class ShoppingCartService {
@Autowired
RestTemplate restTemplate;
@Autowired
ShoppingCartDao shoppingCartDao;
Logger logger = LoggerFactory.getLogger(ShoppingCartService.class);
@PostMapping("/cart/{cartId}/item")
public Cart addItem(@PathVariable String cartId, @RequestBody CartItem item) {
logger.info("add item - process started");
if (cartId != null && item != null && item.getProductId() != null) {
logger.info("add item - calling product catalog service");
Product itemProduct = restTemplate.getForObject("http://localhost:8080/product/" + item.getProductId(),
Product.class);
if (itemProduct != null && itemProduct.id != null) {
// adding total item price in the shopping cart item
item.setTotalItemPrice(itemProduct.getUnitPrice() * item.quantity);
logger.info("add item - process completed successfully");
return shoppingCartDao.addItem(cartId, item);
}
logger.warn("add item - item product not found");
throw new ResponseStatusException(HttpStatus.NOT_FOUND, "item product not found");
}
logger.error("add item - cart or item missing");
throw new ResponseStatusException(HttpStatus.NOT_FOUND, "cart or item missing");
}
}
Step 3: Update src/main/resources/application.properties. Apart from updating the log level and service name, we will also update the port to avoid any conflicts on the local environment.
server.port=8081
spring.application.name=shopping_cart
spring.data.mongodb.uri=mongodb+srv://xxx-user:xxx-pwd@cluster0.nrsv6.gcp.mongodb.net/ecommerce
logging.level.org.springframework.web.servlet.DispatcherServlet=TRACE
Start the Shopping Cart Service and trigger its Add Item API.
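A sample request with curl could look like this (the cart id and the JSON field names are illustrative, inferred from the addItem code above):

curl -X POST http://localhost:8081/cart/test-cart-123/item \
     -H "Content-Type: application/json" \
     -d '{"productId": "test-product-123", "quantity": 2}'

Let's see how the logs of the Shopping Cart Service look.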

Again, the highlighted part displays the Service Name, Trace Id, and Span Id. And again the Trace Id and Span Id are the same, as this is the first service in the chain. This behavior will change with the subsequent call(s). Let's see the logs associated with this request in the Product Catalog Service.

We can see the magic now. The Trace Id for the Shopping Cart Service and the Product Catalog Service is the same: 3193308f99a2516a. With this id we can track all the relevant logs from both services. We can troubleshoot any issue by navigating the logs based on the Trace Id.
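Conceptually, the correlated lines from the two services look something like this (a hand-written illustration; the second Span Id is made up, since each service gets its own span while the Trace Id is carried over):

[shopping_cart,3193308f99a2516a,3193308f99a2516a,true]   : add item - calling product catalog service
[product_catalog,3193308f99a2516a,8c41f0d2a77b39e5,true] : get product details - process started

If the logs of both services are available on one machine, a simple grep on the Trace Id is enough to stitch the request back together, for example grep 3193308f99a2516a shopping_cart.log product_catalog.log (the file names are illustrative).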
Spring Cloud Sleuth helped us implement tracing with almost zero additional effort. Though we can now generate and pass the trace contexts, we still need to navigate to each service's logs separately to connect the traces. The next section will try to address this concern through a Tracing System based on Zipkin.
The complete source code of the examples can be accessed on GitHub.
Enabling Tracing System
In this section, we will see how Zipkin can provide a better view of the end-to-end communication. Zipkin receives traces from all the different services, aggregates them based on the Trace Id, and provides multiple views for lookup. We can leverage it for multiple purposes:
- To identify request status
- To identify service dependencies for a particular request
- To identify failure point(s) in a request
- To identify calls to deprecated services
- To identify service latency
All these visuals and metrics help in troubleshooting and in improving our system availability.
Enabling Zipkin is even easier than enabling tracing in the code. We need to download the Zipkin archive from its website and run it with the Java command. Other options are also available, including Docker and running from source.
curl -sSL https://zipkin.io/quickstart.sh | bash -s
java -jar zipkin.jar
This will start the Zipkin server at http://localhost:9411. Now we need to ensure our services (Shopping Cart and Product Catalog) can send their traces to it. To enable this, we need to add a new dependency, spring-cloud-sleuth-zipkin, in our services. Update pom.xml with the following change:
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>
With this library on the classpath, when a span is closed against a request, the trace information is sent to Zipkin over HTTP. By default it connects to the local address http://localhost:9411/. You can update its location in application.properties as below:
spring.zipkin.baseUrl=http://sample-hostname:sample-port/
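One more setting worth knowing about: by default Sleuth samples only a fraction of requests (10% in Sleuth 2.x), so not every trace will show up in Zipkin. While experimenting locally you can raise the sampling probability; the property below is the Sleuth 2.x name and may differ in other versions.

spring.sleuth.sampler.probability=1.0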
Let's start our services and trigger the addItem API, as we did in the last section. Go to Zipkin and refresh; you will see something like this:

The first entry reflects our recent request to the addItem API. The request is visible with its specific Trace Id, Start Time, and Total Processing Time. If the request failed, you will know it right here. The entry includes all the services called during the request processing. You can click on the row and see the nested path like this:

This displays the response status and the latency figures for each of the services. If you want to track a particular request, just type in its Trace Id and you will know the exact status and failure point. You can also view a service dependency graph to visualize the interactions.
From an operations perspective, this is very helpful. It can provide quick insight into service performance, reliability, and availability.
Log Aggregation
Zipkin helps in aggregating the traces. Operations engineers or SREs (Site Reliability Engineers) can really benefit from this. But a detailed investigation is more efficient if all the logs are aggregated, not only the traces.
Also, microservice instances are dynamic: new instances are created and unused instances are destroyed. We cannot rely on the local file system to keep the logs. We must externalize them and aggregate them at a central location. And as logs pile up in big numbers, we must have an advanced search mechanism in place to provide efficient scanning of the logs.
The whole mechanism of externalizing logs, aggregating them, and making them available for search is referred to as the "log aggregation" pattern in Microservices Architecture. Technologies like Splunk and the ELK stack (Elasticsearch, Logstash, Kibana) are the most popular choices for implementing this pattern.
This provides a robust platform for troubleshooting issues. Unfortunately, its implementation is beyond the scope of this article. But keep in mind that the puzzle of troubleshooting microservices is incomplete without this piece.
Extending Tracing Capabilities
With the sample implementation, we tried to understand the concept of Distributed Tracing, but the learning does not stop here.
Sleuth provides a rich feature set to extend the tracing capabilities. You can add custom fields, referred to as baggage, to be passed along with all the service calls. You can use the tags feature to search traces based on custom fields.
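As a rough sketch (the exact APIs differ between Sleuth/Brave versions, so treat this as an assumption to verify against your setup), tagging the current span from a Spring bean could look like this:

import brave.Tracer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class CartTracing {

    // Sleuth auto-configures a Brave Tracer bean.
    @Autowired
    private Tracer tracer;

    public void tagCart(String cartId) {
        // Attach a searchable tag to the current span;
        // it then shows up against the trace in Zipkin.
        if (tracer.currentSpan() != null) {
            tracer.currentSpan().tag("cart.id", cartId);
        }
    }
}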
By default, Sleuth integrates with Zipkin over the HTTP protocol. This can be customized in multiple ways too. You can use RabbitMQ or Kafka instead of HTTP. You can access the Zipkin server through Service Discovery, if you want. You can also integrate with other tracing systems that suit your needs.
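For example, switching the transport to Kafka is, in Sleuth 2.x (verify for your version), a matter of putting the corresponding messaging dependency on the classpath and setting a property such as:

spring.zipkin.sender.type=kafka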
You can visit https://docs.spring.io/spring-cloud-sleuth to get a view of its full set of features and customization options.
Similarly, Zipkin also provides options to customize its behavior. You can visit https://zipkin.io/ for more details.
You Can Browse
Complete Series: Spring Boot Microservices — Learning through Examples
Previous Exercise: Spring Boot Microservices — Developing Service Discovery