Fix MSTest SDK eval timeout #558
The fixture csproj referenced MSTest.Sdk/3.8.0, while the prompt claimed the user had already upgraded to 4.1.0. This mismatch caused the model to edit the csproj and run dotnet build, which triggered slow MSTest SDK NuGet downloads that exceeded the 240s CI timeout.

- Update the fixture csproj to MSTest.Sdk/4.1.0 to match the prompt
- Add specific CS error codes to the prompt so the model provides direct fixes instead of building
- Update the rubric item to reflect that the project is already on 4.1.0
Reduce the scope of the 'Fix multiple v4 breaking changes' scenario:

- Remove 3 lower-signal breaking changes from the fixture (AreEqual params, IsInstanceOfType out param, Properties.Contains) to cut agent work
- Add max_turns: 5 to cap agent iterations
- Lower the timeout from 240 to 180 seconds
/evaluate
Skill Validation Results
Model: claude-opus-4.6 | Judge: claude-opus-4.6
🔍 Full Results: additional metrics and failure investigation steps
▶ Sessions Visualisation: interactive replay of all evaluation sessions
Skill Coverage Report
Uncovered:
/evaluate
Skill Validation Results
Model: claude-opus-4.6 | Judge: claude-opus-4.6
🔍 Full Results: additional metrics and failure investigation steps
▶ Sessions Visualisation: interactive replay of all evaluation sessions
Pull request overview
This PR streamlines the MSTest v3→v4 migration eval scenario to reduce complexity and help avoid evaluation timeouts by removing some fixture breakages and tightening the scenario prompt/rubric.
Changes:
- Simplifies the v3-assert-changes fixture by removing some Assert/TestContext-related examples and replacing a TestContext check with a trivial passing test.
- Updates the Goal 3 eval prompt/rubric/assertions to focus on fewer MSTest v4 breaking changes and reduces the scenario timeout/turns.
Summary per file:
| File | Description |
|---|---|
| tests/dotnet-test/migrate-mstest-v3-to-v4/fixtures/v3-assert-changes/CalculatorTests.cs | Removes some previously-included breaking-change examples and simplifies a test method. |
| tests/dotnet-test/migrate-mstest-v3-to-v4/eval.yaml | Narrows the Goal 3 prompt/rubric/assertions and adjusts max turns + timeout. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 2
pattern: "(ClassCleanup|ClassCleanupBehavior)"
- type: "output_matches"
pattern: "(ContainsKey|IDictionary)"
pattern: "(TestTimeout|int\\.MaxValue|Infinite)"
The output_matches regex includes TestTimeout and Infinite, which are present in the broken fixture code and may also appear in compiler errors. This can let the scenario pass even if the agent never replaces TestTimeout.Infinite with the MSTest v4-compatible form (e.g., Timeout(int.MaxValue)). Tighten the assertion to only match the expected replacement (such as int\.MaxValue or a numeric timeout) and avoid matching the deprecated symbol names.
pattern: "(TestTimeout|int\\.MaxValue|Infinite)"
pattern: "(int\\.MaxValue|Timeout\\(\\s*\\d+\\s*\\))"
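The false-positive risk described in the comment above can be checked with a quick regex test. This is a minimal sketch, assuming output_matches performs an unanchored regex search (like Python's re.search) over the agent output; the output snippets are hypothetical and this is not the eval harness's actual implementation. Note that the YAML double-quoted strings unescape "\\." to "\.", so the raw strings below are what the regex engine would see.

```python
import re

# Patterns after YAML unescaping ("\\." in the YAML file becomes "\." here).
OLD_PATTERN = r"(TestTimeout|int\.MaxValue|Infinite)"
NEW_PATTERN = r"(int\.MaxValue|Timeout\(\s*\d+\s*\))"

# Hypothetical agent-output snippets, for illustration only.
UNFIXED = "[TestMethod, Timeout(TestTimeout.Infinite)]"
FIXED_MAX = "[TestMethod, Timeout(int.MaxValue)]"
FIXED_NUMERIC = "[TestMethod, Timeout(5000)]"


def matches(pattern: str, output: str) -> bool:
    """Assumes output_matches behaves like an unanchored regex search."""
    return re.search(pattern, output) is not None


if __name__ == "__main__":
    cases = [("unfixed", UNFIXED), ("int.MaxValue", FIXED_MAX), ("numeric", FIXED_NUMERIC)]
    for label, text in cases:
        # Report whether each pattern would accept this output.
        print(f"{label:>12}: old={matches(OLD_PATTERN, text)} new={matches(NEW_PATTERN, text)}")
```

Under these assumptions, the unfixed TestTimeout.Infinite line satisfies only the old pattern, while a numeric timeout such as Timeout(5000) satisfies only the suggested one, so the tightened pattern both removes the false positive and accepts the alternative fix.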
- name: "Fix multiple v4 breaking changes: Assert, ClassCleanup, TestContext, Timeout"
prompt: |
I upgraded from MSTest 3.8 to MSTest 4.0 and my test project has many compilation errors.
The errors mention ThrowsException, ClassCleanupBehavior, Contains, TestTimeout, and
IsInstanceOfType. Can you review my code and fix all breaking changes?
I upgraded from MSTest 3.8 to MSTest 4.0 and my test project has compilation errors.
The errors mention ThrowsException, ClassCleanupBehavior, and TestTimeout.
Can you review my code and fix the breaking changes?
The scenario name still mentions TestContext, but the updated prompt no longer lists TestContext-related compilation errors and the fixture no longer exercises a TestContext breaking change. Consider renaming this scenario to remove TestContext (or reintroduce a concrete TestContext breaking-change example) so the name stays aligned with what the scenario actually validates.