DEV Community

Ethan Walker

404 bio not found

The stale eval fixture that passed a broken model

4 min read

My eval handed me a 0.62 and no idea why. The fix was not a better eval.

7 min read
91% pass rate. Gate green. Shipped. Worst regression we had all quarter.

Comments 1
2 min read
We stopped writing eval cases by hand. Now every prod incident becomes one.

2 min read
Your eval criteria are code. Version them like code.

3 min read
Datadog dashboards for prompt regression: the panels we actually keep

8 min read
Switching our LLM-as-judge from 5-class to binary in CI: the patterns we kept

3 min read
Promptfoo is a CI gate, not an eval framework. Treating it like one cost us $4,200

Comments 1
4 min read