Ever ask an AI to pull out key facts – like a name and email – hoping for neat, usable data like name: Jane Doe, email: jane@example.com? Instead, you often get... a rambling paragraph with the info buried inside. Sound familiar? It's like asking a chatty assistant for just a phone number and getting their life story! Trying to reliably parse that mess is frustrating. But what if you could easily get the clean output every time (like we do in *
I noticed that your article was reposted on CSDN without permission and put behind a paywall: https://blog.csdn.net/llm_way/article/details/147583065
Thank you! Not much I can do though 😔
This answered a question I had.
I had been wondering why I saw so many people using YAML to get their LLM answers.
In your experience, is generated YAML more likely to be 'correct' than generated JSON (guided by a JSON schema, then run through a JSON repair library like https://github.com/mangiucugna/json_repair)? I ask because I get reasonably good results with JSON schema + repair, but if I can truly get even better results with YAML, that would be quite valuable.
If the JSON repair is for parsing issues, YAML is definitely better.
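To make the "parsing issues" point concrete, here is a minimal sketch of the kind of failure a repair library is usually asked to fix. The model reply below is hypothetical, and the field names (`name`, `quote`) are made up for illustration. The sketch only uses Python's standard `json` module; the YAML text at the end is shown for comparison and is not parsed here, since that would need a third-party library such as PyYAML.

```python
import json

# Hypothetical model reply: the "quote" value contains a raw line break
# and unescaped inner double quotes, both of which strict JSON rejects.
raw_json = '{"name": "Jane Doe", "quote": "She said "hi"\nand left."}'


def try_parse_json(text):
    """Return (parsed, error_message); exactly one of the two is None."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


parsed, error = try_parse_json(raw_json)
print("parsed:", parsed)
print("error:", error)

# The same content written as YAML with a literal block scalar (`|`).
# Quotes and line breaks inside the block need no escaping, so there is
# less for the model to get wrong. Parsing this text would require a
# YAML library (for example PyYAML's yaml.safe_load), which is left out
# to keep the sketch dependency-free.
yaml_text = """\
name: Jane Doe
quote: |
  She said "hi"
  and left.
"""
print(yaml_text)
```

This only shows one class of error (escaping inside strings). It says nothing about whether the model fills in the right values, which is a separate question from whether the output parses.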
What's the advantage of a YAML + prompt-based approach over something like the instructor library? Most of the large LLM providers already support some form of structured output/tool calling, which instructor seems to use.
This post is about how structured output can be implemented. Instructor provides one specific implementation, and a suboptimal one. For example, it uses JSON (https://github.com/instructor-ai/instructor/blob/4779f25a28227e6975d77904f30173a7800ff963/instructor/process_response.py#L310), which causes a lot of trouble.
So this post is like telling you how the vehicle engine works. Instructor provides a built car, but not a perfect one.