docs/ai-ml/guide/rag/rag-llm-evaluation-phase.md (+3 −3)
@@ -22,7 +22,7 @@ This article is part of a series. Read the [introduction](./rag-solution-design-
 ## Language model evaluation metrics

-There are several metrics that you should use to evaluate the language model's response, including groundedness, completeness, utilization, relevancy, and correctness. Because the overall goal of the RAG pattern is to provide relevant data as context to a language model when generating a response, ideally, each of the above metrics should score highly. However, depending on your workload, you may need to prioritize one over another.
+There are several metrics that you should use to evaluate the language model's response, including groundedness, completeness, utilization, relevancy, and correctness. Because the overall goal of the RAG pattern is to provide relevant data as context to a language model when generating a response, ideally, each of the above metrics should score highly. However, depending on your workload, you might need to prioritize one over another.

 > [!IMPORTANT]
 > Language model responses are nondeterministic, which means that the same prompt to a language model often returns different results. This concept is important to understand when you use a language model as part of your evaluation process. Consider using a target range instead of a single target when you evaluate language model use.
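The note's target-range idea can be sketched in a few lines. This is an illustrative Python fragment, not code from the article; the metric name, scores, and thresholds are hypothetical:

```python
import statistics

def within_target_range(scores, low, high):
    """Pass if the mean score over repeated evaluation runs falls in [low, high]."""
    return low <= statistics.mean(scores) <= high

# Hypothetical groundedness scores from four runs of the same prompt.
groundedness_runs = [0.82, 0.86, 0.79, 0.84]
print(within_target_range(groundedness_runs, 0.75, 0.90))  # True
```

Because each run scores differently, asserting on the mean of several runs against a range is more stable than asserting a single run against a point target.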
@@ -121,7 +121,7 @@ There are several ways to evaluate correctness, including:
 When correctness is low, do the following tasks:

-1. Ensure that the chunks provided to the language model are factually correct and there's no data bias. You may need to correct any issues in the source documents or content.
+1. Ensure that the chunks provided to the language model are factually correct and there's no data bias. You might need to correct any issues in the source documents or content.

 1. If the chunks are factually correct, evaluate your prompt.

 1. Evaluate whether there are inherent inaccuracies in the model that need to be overcome with additional factual grounding data or fine-tuning.
@@ -161,7 +161,7 @@ This metric combination is one where prioritizing one over the other could be ve
 ### Utilization and completeness

-Utilization and completeness metrics together help evaluate the effectiveness of the retrieval system. High utilization (0.9) with low completeness (0.3) indicates the system retrieves accurate but incomplete information. For instance, when asked about World War II causes, the system might perfectly retrieve information about the invasion of Poland but miss other crucial factors. This scenario may indicate that there are chunks with relevant information that weren't used as part of the context. To address this scenario, consider returning more chunks, evaluating your chunk ranking strategy, and evaluating your prompt.
+Utilization and completeness metrics together help evaluate the effectiveness of the retrieval system. High utilization (0.9) with low completeness (0.3) indicates the system retrieves accurate but incomplete information. For instance, when asked about World War II causes, the system might perfectly retrieve information about the invasion of Poland but miss other crucial factors. This scenario might indicate that there are chunks with relevant information that weren't used as part of the context. To address this scenario, consider returning more chunks, evaluating your chunk ranking strategy, and evaluating your prompt.
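Read together, the two metrics can be expressed with simple set arithmetic. The Python sketch below is illustrative only; the chunk IDs and the "used"/"required" judgments are assumptions, not the article's formal definitions:

```python
def utilization(used_chunks, retrieved_chunks):
    """Fraction of the retrieved chunks that the model actually used."""
    retrieved = set(retrieved_chunks)
    if not retrieved:
        return 0.0
    return len(set(used_chunks) & retrieved) / len(retrieved)

def completeness(retrieved_chunks, required_chunks):
    """Fraction of the chunks required for a full answer that were retrieved."""
    required = set(required_chunks)
    if not required:
        return 1.0
    return len(set(retrieved_chunks) & required) / len(required)

# High utilization with low completeness: everything retrieved was used,
# but most of the context needed for a full answer was never retrieved.
retrieved = ["invasion-of-poland"]
used = ["invasion-of-poland"]
required = ["invasion-of-poland", "treaty-of-versailles", "appeasement"]

print(utilization(used, retrieved))                 # 1.0
print(round(completeness(retrieved, required), 2))  # 0.33
```

In the World War II example, the single retrieved chunk is fully used (utilization 1.0) while two of the three required chunks are missing (completeness 0.33), which is the retrieval gap the paragraph describes.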
docs/antipatterns/busy-database/index.md (+4 −4)
@@ -18,10 +18,10 @@ Offloading processing to a database server can cause it to spend a significant p
 Many database systems can run code. Examples include stored procedures and triggers. Often, it's more efficient to perform this processing close to the data, rather than transmitting the data to a client application for processing. However, overusing these features can hurt performance, for several reasons:

-- The database server may spend too much time processing, rather than accepting new client requests and fetching data.
+- The database server might spend too much time processing, rather than accepting new client requests and fetching data.

 - A database is usually a shared resource, so it can become a bottleneck during periods of high use.

-- Runtime costs may be excessive if the data store is metered. That's particularly true of managed database services. For example, Azure SQL Database charges for [Database Transaction Units (DTUs)][dtu].
+- Runtime costs might be excessive if the data store is metered. That's particularly true of managed database services. For example, Azure SQL Database charges for [Database Transaction Units (DTUs)][dtu].

-- Databases have finite capacity to scale up, and it's not trivial to scale a database horizontally. Therefore, it may be better to move processing into a compute resource, such as a VM or App Service app, that can easily scale out.
+- Databases have finite capacity to scale up, and it's not trivial to scale a database horizontally. Therefore, it might be better to move processing into a compute resource, such as a VM or App Service app, that can easily scale out.

 This antipattern typically occurs because:
@@ -212,7 +212,7 @@ using (var command = new SqlCommand(...))
docs/antipatterns/busy-front-end/index.md (+1 −1)
@@ -121,7 +121,7 @@ public async Task RunAsync(CancellationToken cancellationToken)
 - This approach adds some additional complexity to the application. You must handle queuing and dequeuing safely to avoid losing requests in the event of a failure.

 - The application takes a dependency on an additional service for the message queue.

 - The processing environment must be sufficiently scalable to handle the expected workload and meet the required throughput targets.

-- While this approach should improve overall responsiveness, the tasks that are moved to the back end may take longer to complete.
+- While this approach should improve overall responsiveness, the tasks that are moved to the back end might take longer to complete.

 - Consider combining this with the [Throttling pattern](/azure/architecture/patterns/throttling) to avoid overwhelming backend systems. Prioritize certain clients. For example, if the application has free and paid tiers, throttle customers on the free tier, but not paid customers. See [Priority queue pattern](/azure/architecture/patterns/priority-queue).
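The safe queuing and dequeuing concern in the list above can be sketched as follows. This Python fragment uses an in-process `queue.Queue` as a simplified stand-in for a real message broker, and acknowledges each item only after it has been processed; the function names are hypothetical:

```python
import queue

work = queue.Queue()

def enqueue(request):
    """Front end: accept the request quickly and defer the heavy work."""
    work.put(request)

def drain(process):
    """Back end: process queued requests, acknowledging each one afterward."""
    results = []
    while not work.empty():
        item = work.get()
        results.append(process(item))
        work.task_done()  # acknowledge only after successful processing
    return results

enqueue({"id": 1})
enqueue({"id": 2})
print(drain(lambda r: r["id"] * 10))  # [10, 20]
```

With a durable broker, the same acknowledge-after-processing discipline is what prevents a crashed worker from silently losing a request.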
 - When writing data, avoid locking resources for longer than necessary, to reduce the chances of contention during a lengthy operation. If a write operation spans multiple data stores, files, or services, then adopt an eventually consistent approach. See [Data Consistency guidance][data-consistency-guidance].

-- If you buffer data in memory before writing it, the data is vulnerable if the process crashes. If the data rate typically has bursts or is relatively sparse, it may be safer to buffer the data in an external durable queue such as [Event Hubs](https://azure.microsoft.com/services/event-hubs).
+- If you buffer data in memory before writing it, the data is vulnerable if the process crashes. If the data rate typically has bursts or is relatively sparse, it might be safer to buffer the data in an external durable queue such as [Event Hubs](https://azure.microsoft.com/services/event-hubs).

 - Consider caching data that you retrieve from a service or a database. This can help to reduce the volume of I/O by avoiding repeated requests for the same data. For more information, see [Caching best practices][caching-guidance].
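The buffering trade-off above can be sketched with a flush threshold that bounds how much in-memory data a crash could lose. This is illustrative Python, not the article's code; the `durable_sink` list stands in for an external durable queue such as Event Hubs:

```python
durable_sink = []      # stand-in for a durable external queue
buffer = []            # in-memory buffer, lost if the process crashes
FLUSH_THRESHOLD = 3    # at most this many events are at risk at any moment

def write(event):
    buffer.append(event)
    if len(buffer) >= FLUSH_THRESHOLD:
        flush()

def flush():
    durable_sink.extend(buffer)  # hand the batch to durable storage
    buffer.clear()

for i in range(7):
    write(i)
flush()  # flush the remaining tail on shutdown
print(durable_sink)  # [0, 1, 2, 3, 4, 5, 6]
```

Lowering the threshold reduces the window of data at risk at the cost of more frequent writes, which is the same dial the bullet point describes.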
docs/antipatterns/extraneous-fetching/index.md (+6 −6)
@@ -23,7 +23,7 @@ Antipatterns are common design flaws that can break your software or application
 ## Examples of extraneous fetching antipattern

-This antipattern can occur if the application tries to minimize I/O requests by retrieving all of the data that it *might* need. This is often a result of overcompensating for the [Chatty I/O][chatty-io] antipattern. For example, an application might fetch the details for every product in a database. But the user may need just a subset of the details (some may not be relevant to customers), and probably doesn't need to see *all* of the products at once. Even if the user is browsing the entire catalog, it would make sense to paginate the results—showing 20 at a time, for example.
+This antipattern can occur if the application tries to minimize I/O requests by retrieving all of the data that it *might* need. This is often a result of overcompensating for the [Chatty I/O][chatty-io] antipattern. For example, an application might fetch the details for every product in a database. But the user might need just a subset of the details (some might not be relevant to customers), and probably doesn't need to see *all* of the products at once. Even if the user is browsing the entire catalog, it would make sense to paginate the results—showing 20 at a time, for example.

 Another source of this problem is following poor programming or design practices. For example, the following code uses Entity Framework to fetch the complete details for every product. Then it filters the results to return only a subset of the fields, discarding the rest.
@@ -75,7 +75,7 @@ The call to `AsEnumerable` is a hint that there is a problem. This method conver
 ## How to fix extraneous fetching antipattern

-Avoid fetching large volumes of data that may quickly become outdated or might be discarded, and only fetch the data needed for the operation being performed.
+Avoid fetching large volumes of data that might quickly become outdated or might be discarded, and only fetch the data needed for the operation being performed.

 Instead of getting every column from a table and then filtering them, select the columns that you need from the database.
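As an illustration of selecting only the needed columns and paginating results, here is a Python/sqlite3 sketch rather than the article's Entity Framework C#; the table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Id INTEGER, Name TEXT, Description TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [(1, "Widget", "A widget", 9.99),
     (2, "Gadget", "A gadget", 19.99),
     (3, "Sprocket", "A sprocket", 4.99)],
)

# Project only the needed columns in the database, so Description and Price
# never cross the wire.
rows = conn.execute("SELECT Id, Name FROM Products ORDER BY Id").fetchall()
print(rows)  # [(1, 'Widget'), (2, 'Gadget'), (3, 'Sprocket')]

# Paginate instead of fetching the whole catalog: two items per page.
page1 = conn.execute("SELECT Id, Name FROM Products ORDER BY Id LIMIT 2 OFFSET 0").fetchall()
print(page1)  # [(1, 'Widget'), (2, 'Gadget')]
```

The projection and the `LIMIT`/`OFFSET` clause both run inside the database, which is the fix the section recommends: filter at the data store, not in the application.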
@@ -107,7 +107,7 @@ public async Task<IHttpActionResult> AggregateOnDatabaseAsync()
 }
 ```

-When using Entity Framework, ensure that LINQ queries are resolved using the `IQueryable` interface and not `IEnumerable`. You may need to adjust the query to use only functions that can be mapped to the data source. The earlier example can be refactored to remove the `AddDays` method from the query, allowing filtering to be done by the database.
+When using Entity Framework, ensure that LINQ queries are resolved using the `IQueryable` interface and not `IEnumerable`. You might need to adjust the query to use only functions that can be mapped to the data source. The earlier example can be refactored to remove the `AddDays` method from the query, allowing filtering to be done by the database.

 ```csharp
 DateTime dateSince = DateTime.Now.AddDays(-7); // AddDays has been factored out.
-- In some cases, you can improve performance by partitioning data horizontally. If different operations access different attributes of the data, horizontal partitioning may reduce contention. Often, most operations are run against a small subset of the data, so spreading this load may improve performance. See [Data partitioning][data-partitioning].
+- In some cases, you can improve performance by partitioning data horizontally. If different operations access different attributes of the data, horizontal partitioning might reduce contention. Often, most operations are run against a small subset of the data, so spreading this load might improve performance. See [Data partitioning][data-partitioning].

 - For operations that have to support unbounded queries, implement pagination and only fetch a limited number of entities at a time. For example, if a customer is browsing a product catalog, you can show one page of results at a time.
 - If you see that requests are retrieving a large number of fields, examine the source code to determine whether all of these fields are necessary. Sometimes these requests are the result of a poorly designed `SELECT *` query.

-- Similarly, requests that retrieve a large number of entities may be sign that the application is not filtering data correctly. Verify that all of these entities are needed. Use database-side filtering if possible, for example, by using `WHERE` clauses in SQL.
+- Similarly, requests that retrieve a large number of entities might be a sign that the application is not filtering data correctly. Verify that all of these entities are needed. Use database-side filtering if possible, for example, by using `WHERE` clauses in SQL.

 - Offloading processing to the database is not always the best option. Only use this strategy when the database is designed or optimized to do so. Most database systems are highly optimized for certain functions, but are not designed to act as general-purpose application engines. For more information, see the [Busy Database antipattern][BusyDatabase].
@@ -191,7 +191,7 @@ For each data source, instrument the system to capture the following:
 Compare this information against the volume of data being returned by the application to the client. Track the ratio of the volume of data returned by the data store against the volume of data returned to the client. If there is any large disparity, investigate to determine whether the application is fetching data that it doesn't need.

-You may be able to capture this data by observing the live system and tracing the lifecycle of each user request, or you can model a series of synthetic workloads and run them against a test system.
+You might be able to capture this data by observing the live system and tracing the lifecycle of each user request, or you can model a series of synthetic workloads and run them against a test system.

 The following graphs show telemetry captured using [New Relic APM][new-relic] during a load test of the `GetAllFieldsAsync` method. Note the difference between the volumes of data received from the database and the corresponding HTTP responses.
docs/antipatterns/improper-instantiation/index.md (+3 −3)
@@ -16,7 +16,7 @@ keywords:
 # Improper Instantiation antipattern

-Sometimes new instances of a class are continually created, when it is meant to be created once and then shared. This behavior can hurt performance, and is called an *improper instantiation antipattern*. An antipattern is a common response to a recurring problem that is usually ineffective and may even be counter-productive.
+Sometimes new instances of a class are continually created, when it is meant to be created once and then shared. This behavior can hurt performance and is called an *improper instantiation antipattern*. An antipattern is a common response to a recurring problem that is usually ineffective and might be counter-productive.

 ## Problem description
@@ -45,7 +45,7 @@ public class NewHttpClientInstancePerRequestController : ApiController
 }
 ```

-In a web application, this technique is not scalable. A new `HttpClient` object is created for each user request. Under heavy load, the web server may exhaust the number of available sockets, resulting in `SocketException` errors.
+In a web application, this technique is not scalable. A new `HttpClient` object is created for each user request. Under heavy load, the web server might exhaust the number of available sockets, resulting in `SocketException` errors.

 This problem is not restricted to the `HttpClient` class. Other classes that wrap resources or are expensive to create might cause similar issues. The following example creates an instance of the `ExpensiveToCreateService` class. Here the issue is not necessarily socket exhaustion, but simply how long it takes to create each instance. Continually creating and destroying instances of this class might adversely affect the scalability of the system.
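The create-once-and-share fix can be shown language-agnostically. This Python sketch is illustrative; `ExpensiveClient` is a hypothetical stand-in for classes like `HttpClient` or `ExpensiveToCreateService`:

```python
class ExpensiveClient:
    """Stand-in for a class that is costly to construct or holds connections."""
    instances_created = 0

    def __init__(self):
        ExpensiveClient.instances_created += 1  # track construction cost

    def get(self, path):
        return f"response for {path}"

_shared = ExpensiveClient()  # created once, for example during startup

def handle_request(path):
    return _shared.get(path)  # every request reuses the shared instance

for p in ["/a", "/b", "/c"]:
    handle_request(p)
print(ExpensiveClient.instances_created)  # 1
```

Three requests are served but the client is constructed exactly once, which is the behavior the antipattern's fix (a single shared `HttpClient`) achieves in the article's C# example.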
@@ -106,7 +106,7 @@ public class SingleHttpClientInstanceController : ApiController
 - Be careful about setting properties on shared objects, as this can lead to race conditions. For example, setting `DefaultRequestHeaders` on the `HttpClient` class before each request can create a race condition. Set such properties once (for example, during startup), and create separate instances if you need to configure different settings.

-- Some resource types are scarce and should not be held onto. Database connections are an example. Holding an open database connection that is not required may prevent other concurrent users from gaining access to the database.
+- Some resource types are scarce and should not be held onto. Database connections are an example. Holding an open database connection that is not required might prevent other concurrent users from gaining access to the database.

 - In the .NET Framework, many objects that establish connections to external resources are created by using static factory methods of other classes that manage these connections. These objects are intended to be saved and reused, rather than disposed and re-created. For example, in Azure Service Bus, the `QueueClient` object is created through a `MessagingFactory` object. Internally, the `MessagingFactory` manages connections. For more information, see [Best Practices for performance improvements using Service Bus Messaging][service-bus-messaging].