Engineering Better, Faster, Stronger Prompts (Continuously) with Libretto

Today, LLMs feel equal parts magical and mystifying. They can do things that only seemed possible in science fiction, yet fumble problems an eight-year-old could solve. 

Cracking this case – so that gen AI apps and tools can serve up consistently correct, high-quality responses – has been one of the field’s hardest challenges. So you can imagine how excited we are to back the team at Libretto as they take an incredibly strong approach to it. When they succeed, the ability to build AI applications won’t just be the province of the very few and deeply technical, it’ll be possible (and generative AI will become a real resource) for more people everywhere. 

This is a mission we knew we had to get behind. Today, we couldn’t be more thrilled to partner with Founder & CEO Sasha Aickin, and co-lead Libretto’s $3.7 million Seed round with our friends at The General Partnership. 

This team is taking on a tough status quo. Some of the most brilliant people in the world are still wrestling with bad prompts. We’re all so used to computers doing exactly what we tell them, it’s pretty jarring to find ourselves in tedious negotiations with our machine helpers, just to persuade them toward what we want. Libretto is launching its Beta program to change this, starting now.

So, what is their approach? Making the empirical side of prompt engineering easy for the first time. Their hypothesis is that enabling testing, monitoring in production, and continuous optimization will help app makers across sectors and stages finally make AI work well for them and their users. Together, these capabilities automate the grunt work of getting to the best possible prompt, predictably and quickly.

Testing: Libretto takes on the labor of pairing models, prompt texts, and instructive examples to produce better and better outputs. As it goes, it tracks every test case for every prompt and keeps a large library of results so you can see how they’ve performed. And, most importantly, it provides many options for evaluating LLM responses – sentiment analysis, JSON structure checks, embedding similarity, and more – so you can tailor evaluation to your prompts’ goals. 
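To make the idea concrete, here’s a minimal sketch of what response evaluators like these can look like. This is our own illustration, not Libretto’s API: `json_structure_score` and `similarity_score` are hypothetical names, and the similarity check uses a cheap string ratio as a stand-in for true embedding similarity.

```python
import json
from difflib import SequenceMatcher

# Hypothetical evaluator sketches -- not Libretto's actual API.
# Each evaluator takes an LLM response and returns a score in [0, 1].

def json_structure_score(response: str, required_keys: set) -> float:
    """Fraction of required keys present, if the response is valid JSON."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.0
    return len(required_keys & data.keys()) / len(required_keys)

def similarity_score(response: str, reference: str) -> float:
    """Crude stand-in for embedding similarity: character-level match ratio."""
    return SequenceMatcher(None, response, reference).ratio()
```

A real system would swap in an actual embedding model and richer checks, but the shape is the same: many small scoring functions you can mix and match per prompt.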

Improving: Libretto’s most impressive feature is Experiments. It can create dozens to hundreds of variants of your prompts (and great few-shot examples at the same time) to determine which ones work best. As the team puts it: “In the time it takes to grab a cup of coffee, Libretto can power through a week’s worth of your prompt engineering to-do list.” Basically, let the tool brainstorm all the ways to argue and plead with your LLM for you. 🤖 🗯️ 
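The loop behind an experiment like this can be sketched in a few lines. Everything here is hypothetical – the LLM call and scoring function are passed in as stubs, and `run_experiment` is our own name – but it shows the shape of the work being automated: score every prompt variant against every test case, then keep the winner.

```python
# Hypothetical sketch of a prompt-variant experiment -- not Libretto's API.
# `call_llm(prompt, input)` and `score(output, expected)` are supplied by
# the caller; in practice the first would hit a model.

def run_experiment(variants, test_cases, call_llm, score):
    """Average each variant's score over all test cases; return the best."""
    results = {}
    for prompt in variants:
        scores = [score(call_llm(prompt, case["input"]), case["expected"])
                  for case in test_cases]
        results[prompt] = sum(scores) / len(scores)
    best = max(results, key=results.get)
    return best, results
```

The real value is in generating good variants and good few-shot examples automatically; the selection loop itself is the easy part.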

Monitoring & optimizing: Once your best prompt is in production, Libretto can still make it better. As your users throw all kinds of edge cases at your app, it’s there in the background, watching, recording feedback, debugging where it can, and surfacing issues to you (and great examples to use in further tests) where it can’t. 
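A monitoring loop along these lines might look like the following sketch – our illustration with a hypothetical `Monitor` class, not Libretto’s actual implementation: log every live interaction with its feedback, and flag poorly rated responses as candidate test cases for the next round of experiments.

```python
# Hypothetical sketch of production monitoring -- not Libretto's API.
# Low-rated responses are surfaced as candidate test cases.

class Monitor:
    def __init__(self, flag_below=0.5):
        self.flag_below = flag_below
        self.log = []         # every live interaction
        self.candidates = []  # edge cases worth adding to the test suite

    def record(self, prompt, response, rating):
        """rating: user feedback in [0, 1], e.g. thumbs up/down."""
        entry = {"prompt": prompt, "response": response, "rating": rating}
        self.log.append(entry)
        if rating < self.flag_below:
            self.candidates.append(entry)
```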

At the end of the day, there’s a concrete mission behind Libretto’s technical achievement here. As we all step into this new era of LLMs, Sasha and the team want to unleash their creative and productive power for more people. Today represents a leap forward in this direction, but there’s still so much to learn and distance to cover. 

If you’d like to help Libretto get there (and get regular sneak peeks of what the future of this space looks like), we highly recommend joining their Beta program. It’s one of those rare opportunities to start on the ground floor with a team really making a difference in how gen AI will look and work for all of us.


P.S. For far more in-depth detail on how Libretto works and what they’re hacking on next, check out their Beta launch blog post here, or watch their <2 min demo here 👀▶️
