Video: Spec-Driven AI Development – Episode 2 | Duration: 4043s | Summary: Spec-Driven AI Development – Episode 2 | Chapters: Welcome and Introduction (11.599999s), Spec Driven Development (108.115005s), AI Code Challenges (227.04001s), Spec-Driven Development Overview (707.97003s), Introducing Spec-Driven Development (1238.5s), Coding's Future Outlook (2263.53s), Engineering with Agents (2464.8s), Specification-Driven Development (2627.4102s), AI Code Analysis (3111.465s), Human in Loop (3318.605s), Agent-Driven Development Challenges (3586.67s), AI-Powered IDE Features (3837.0151s), Concluding Remarks (3986.0051s)
Transcript for "Spec-Driven AI Development – Episode 2":
Hello everyone, and welcome to today's live stream hosted by JetBrains. I'm Paul Everitt, Developer Advocate. We're going to have a lot of fun today. This is a follow-on to our previous live stream about spec-driven development. And I'm joined by a colleague of mine, and I'm really happy to have him: Dmitry Jemerov. Hello, Dima. Hello, Paul. Nice to see everyone here and looking forward to doing some fun AI development today. Yes. In fact, as background for everybody, this session and Dmitry joining us is because we've talked a lot about this topic. You have a bunch of ideas about where this is all going, but everyone knows you as the creator of PyCharm. Or if they don't know that, they should know that. But you've actually done a whole bunch over the years in kind of an architect role. Can you talk a little bit about that? Yeah, so basically I spent a lot of time working on the IntelliJ Platform and doing the work that was required to turn a single Java IDE into the product line that we have now: PyCharm, RubyMine, then all the other IDEs that were built on top of that. A lot of this was my work. I was also involved in the early days of Kotlin, and I wrote a book about Kotlin together with my colleague Svetlana, and I did a bunch of other stuff over the years. Yeah. So we have someone who helped design a language with us today. That's a pretty cool start. As a reminder for how we're going to operate this, we want this to be less like a presentation. We want it to be more like a dialogue, especially a dialogue between me and Dmitry, but also a dialogue with all of you. You're going to see me look down and see that someone just posted from Belgium and the Netherlands and Wales and stuff like that. So please talk back with us. We'll bring up your questions at different points. We'll put them on the screen as we get to different points. What do we want to get out of today? So the last session introduced us to some of the why and a little bit of the what.
We're going to do a lot more of the what today. We're going to show spec-driven development in action on a real project. So we want to teach you something. But we also want to have a conversation. Dmitry and I are going to have a conversation because he has exactly what we need, the human-in-the-loop experience. But we also want to talk with you and have you talk with each other. Why? Because nobody has the answer. We're all learning and discovering this as we go. And one of the points I want to make is you might wind up making your own process for spec-driven development. This all ties back to the software development life cycle. What we want to show is the best of both worlds. The agent can go create a bunch of code, but you're not going to let the agent commit to main. There's a human in the loop that now has this big problem about being overwhelmed reviewing all of this code. So we want to give the human in the loop the best tool ever created for working with large software development projects. Dmitry, I want to go ahead and start getting your opinion on this because just this morning in Slack, you brought up a point about AI fatigue and the human in the loop. Can you talk a little bit about that? Yeah, I think this is a big challenge that many of the people who start to actively adopt AI are facing. With the speed that the agents can deliver code, basically they can generate as much code as you want. Generating code is no longer the bottleneck in terms of cost or in terms of time consumption. You can have hundreds of thousands of lines of code generated for basically just the cost of tokens without spending any effort. Now, the challenge is to figure out whether those are the lines of code that meet your needs, whether they are appropriate for your application. And one thing that I've seen the AI do quite a lot is basically thinking up concerns. So what about this? What about zero-width joiners and non-breaking spaces in your Unicode strings?
What happens if you have them? Have you considered that? Probably not. Do you actually need to worry about things like that? Also not always. And the AI generates a lot of code to handle edge cases that may or may not exist in your code. And if you want to keep your code slim and manageable, you actually want to get rid of those. You want to tell it, no, you should not think of these edge cases, they are not relevant. But when the volume of code increases, you just start getting tired of making all these decisions. You basically let the AI do whatever. And then you end up with hundreds of thousands of lines of code for a very simple task that could be solved with maybe one tenth of the amount of code if it was written by a human. Just to set the stage correctly, Dmitry is helping to advise on a code base that's 25 years old. It's colossal, cross-platform, has all kinds of features. And so it's this perfect test case for the kinds of things that we're going to be talking about today. How do you steer the agent? How do you get it to generate the code you would have written if you had written it yourself? Dmitry, does that sound a little bit like what you'd like to cover? Yeah, very much so. And another thing that I think we all still need to figure out is how to manage the volume of specifications that arises when you work on a project for many years. For example, with IntelliJ IDEA, just the number of issues in the issue tracker that have been resolved over the years is a couple hundred thousand, I think. And if you turn this into the amount of specifications for all of the things that have been done, which were not always covered by issues in the issue tracker, this will be maybe half a million individual specifications. And suppose you follow the process, suppose you have half a million Markdown documents documenting every single decision ever made in your project.
How do you manage this half a million documents? How do you make this fit in the context of an agent? How do you make sure that the agent looks at the specification documents that are relevant? And I think this is still something that no one knows the answers for. Indeed. I'm very curious to explore. Yeah, and I'm curious to explore with all of you in chat. Okay. We know IntelliJ does have bugs, and that's what we're talking about today. Everyone in chat, I'm going to do a little bit of a poll kind of thing. Are any of you right now letting your agent's resulting code get committed directly with no human in the loop? There are people saying that that's coming this year. I'm just curious, are any of you doing that right now? Okay, if we could go ahead and bring my screen on. And just for fun, if you want to see the PyCharm fifteenth anniversary video and see Dmitry live, we have it on our YouTube channel. Today we're going to talk about spec-driven development, and I'm not going to pitch you on this being the one way to do it. I'm going to be showing you a framework that I use only because I want to show you some of these ideas and get you thinking for yourself. This is a pattern that is appearing in lots of places. For example, GitHub has SpecKit, which was one of the first to really get this idea out in the mainstream. I think it's how it got on my radar too. Kiro from Amazon is an editor, an IDE, built around this idea, kind of spec-driven development first. Says so on the tin, and that's what it's about. I believe Antigravity from Google has a focus on a plan mode as a first-class citizen in the user experience, in the UI. So I'm going to be talking about Agent OS from a guy named Brian Casel. It is a set of commands and skills and automations for Claude Code for doing his philosophy of spec-driven development. If you scroll down, he talks about the workflow, where you talk about a three-layer context.
All the decisions your company has made, where is this product going? And then for each one of your features, what is it about the feature that we need to capture? I've got to admit, the reason I got interested in it is because he makes great videos explaining the why behind spec-driven development, more so than just his own product. Finally, to set the scene, I'll talk in a second about the actual project that we will be using today instead of a dashboard demo or something like that. But I'd like to talk with Dmitry a little bit, and with you in the audience, about that last question I asked, the human in the loop. Gas Town has gotten big recently. Ralph has gotten big recently. And their thinking is that there's no human in the loop. A bunch of agents are just going to walk around until they get it right. Dmitry, you've probably seen this idea. Do you have a hot take on it? So it's interesting that the places where people have seen the largest projects generated by such agents are the projects where the specification is very well known. For example, Anthropic had their agents build a C compiler which can compile the Linux kernel. It's very well known what the C compiler needs to do. It's very easy to check if it's correct or not. It's very easy to check if it's efficient or not. OpenAI folks built a web browser and a Java language server, where, by the way, the number of lines of code is an order of magnitude more than I would have expected a Java language server to have, knowing how Java support is implemented in IntelliJ. But that's a different issue. But no one has let agents build something for which it's not known what we actually need to build. And I think this is the much bigger issue with a lot of projects: just understanding the right way to build things, the right approach, the right abstractions that you need to have. And when I was head of product for the IntelliJ department, this was my primary concern.
How do we make sure that we build the right thing? How do we make sure that the features we build are in a single conceptual framework, that they are discoverable by our users, that they do not confuse the user by using different paradigms for different things? And I think no one has seen the results of an agent making all of the decisions of this type just yet. Not only making decisions, but fitting it into a context window. I agree with exactly what you just said. And because I'm in charge of the questions, I get to choose the answers that validate what we want to believe. Here's a first answer saying, Hey, I'm always going to be in the loop. Another one about always being in the loop. Another one about always being in the loop. Yep. And I could go on, but here's one, starting the process. No, but maybe this might change in the future. And then there's another one about, let's see, I can't quite get it right now, but it is about these specifications and what we are really talking about here for writing it down. First, we don't yet know how models are going to react to things like specifications. But we're trying to capture the way the company works, what the company's vision is on this, and try to steer to the right conclusion. What does that mean in practice? I'll show you one way today, but you make your own decisions. Dmitry, I cut you off. You were about to say something? No. Let's see if we've got a quote from Cory Doctorow; I'm going to show it. He gave an incredible PyCon keynote this year. And then this was the question; I was able to find it now. Sorry about that. What's the amount of specification we're talking about? As Dmitry said, if you wrote specifications for all of IntelliJ, you'd have megabytes at least, maybe gigabytes, of Markdown files, too big to fit in the context window and quite a maintenance burden. Dmitry, am I getting you right on that? Yep.
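To make the context problem concrete, here is a minimal Python sketch of picking only the specification documents relevant to a task while staying under a size budget. Everything in it is an assumption made for illustration: the `Spec` class, the file paths, the four-characters-per-token estimate, and the naive word-overlap scoring. It is not how Agent OS, Claude Code, or any other tool mentioned in the stream selects context.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    path: str  # hypothetical location of a Markdown spec file
    text: str  # the contents of that file


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def select_specs(specs: list[Spec], task: str, budget: int) -> list[Spec]:
    """Pick the specs sharing the most words with the task, within a token budget."""
    task_words = set(task.lower().split())

    def overlap(spec: Spec) -> int:
        return len(task_words & set(spec.text.lower().split()))

    chosen: list[Spec] = []
    used = 0
    # Visit the most relevant specs first; skip irrelevant or oversized ones.
    for spec in sorted(specs, key=overlap, reverse=True):
        cost = estimate_tokens(spec.text)
        if overlap(spec) == 0 or used + cost > budget:
            continue
        chosen.append(spec)
        used += cost
    return chosen


specs = [
    Spec("specs/comment-query.md", "query html comments and return the comment text"),
    Spec("specs/deploy.md", "deployment pipeline and monitoring dashboards"),
    Spec("specs/role-query.md", "query html elements by accessible role"),
]
picked = select_specs(specs, "add a query for html comments", budget=50)
print([s.path for s in picked])
```

A real system would need something far better than word overlap, which is the open question raised above, but the shape stays the same: rank the documents, then stop adding them when the budget runs out.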
So we've got to get better systems, better processes, but we've been here before with software development life cycles. So we're going to switch into doing-it mode. And I'm going to try to go kind of fast to use our time wisely and to listen to you. I'm going to switch over to the best editor in the world, PyCharm. Thank you for creating it, Dmitry. I'm going to start with showing you a project that I already created using Agent OS, where every single feature went in through spec-driven development. This project has a number of Claude commands that implement a workflow. You might use this workflow. You might use a different workflow. But this is the glue that connects together all the agent machinery to get it to do things your way and to write down the results in the place you want them, the way you want them. Pretty clever approach. Really lightweight too, so you could do your own. Then we've got this folder with the specifications for this project. This is the important first part. What we'll show today is we're going to have a conversation about the product as a whole, a dialogue with the agent. What is this product? Who is it for? All that stuff. We're going to record some decisions about a tech stack, and we're going to make a provisional roadmap about the features, the directions we're going. Then I'm going to read these Markdown files, ask some more questions, and then the agent and I have an agreement. And we can proceed on each feature. Each feature looks like a planning step, an implementation step, and some artifacts that get written down afterwards. If a human walked up afterwards, they could recover all the decisions that were made. And that, in the software development life cycle, is an important point, especially if you're working on a team with other people. Dmitry, you got any comments on this before we start coding?
Yeah, I think this document that you were showing just now actually raises a very important concern: how much specification do you need? For me, this looks like it is actually more text than it would take for me to just write this as code. For example, add two optional keyword parameters. I just go in and add the parameters. I don't want to explain to the AI how to do this. Sure. And another issue is that the more spec you have, the harder it is to review, because the AI can easily generate a 100-item checklist where 98 items are very obvious and two are completely wrong and will ruin your project if you let it implement them. And finding those two items in the list of 100 is a challenge. Sure. I think it's important to be minimal in how much spec you write and not try to explain the things that are obvious. Indeed. I will say I find it valuable to scan that occasionally because agents make the same mistakes over and over. And it shows me a place where I can refine the skills or the rules or whatever. But you're right. The document for each feature that people are going to be interested in the most is the task list, as we tell it to implement one step at a time. So you're exactly right on that. Let's see if we have any questions to bring up. This is an interesting point. I'm going to bring this up, Dmitry, because maybe you've seen this in the past. If you go back to my time in the nineties, we had these things called CASE tools, where you would get into the Rational Unified Process and write it all down and generate code. Dmitry, have you seen this movie before? Have you seen this pattern before, spec? Yeah, I mean, generating code from UML was a hot topic back when I was beginning my career, in the early aughts, and this didn't really seem to catch on. But I think the interesting issue that it connects to is basically the non-text representations of code.
So for a long time, we ended up just working with code as text because that's the easiest to store in Git. That's what all tools work with. But now, if you don't actually type the code yourself, you don't need to worry about how the typing process works, and reading can possibly be much easier with the aid of more visual means of expression than just text. Sure. And I'm very interested in how this will work out. I think this is a very much unexplored area: with all this code that agents generate for you, can you use non-text representations to visualize its structure, to visualize its contents, and to make it easier to review and understand? Sean Grove from OpenAI had a good talk last year, Spec is the new code, going into that. Let me bring up some points. I'm going to bring this one up because you used the word engineering, and that's what I am interested in: bringing the engineering back into the agent, first through specs, but through some other techniques as well. Agent OS is just a framework; as long as you've got commands and skills, any of your agents, such as Junie, could support it. So those are two different things. Junie is an agent. Agent OS and SpecKit are systems to do spec-driven development. Why do I prefer Agent OS over SpecKit? Because he makes good videos. I'm not going to lie. That's really the only reason. But as I got to know it, I felt that his philosophy matched my thinking a little bit. And then this is exactly what Dmitry was just talking about. A higher-order language. I'm a Python guy. I don't know anything about managing memory because Python works at such a high level. Maybe there's a level up from that where the Python code is just an artifact. I hope I retire before that because I still like my Python code. Okay. Now we're going to get into what we're actually going to show. I have a package I've already published out to the Python world.
It's about testing HTML, but it was not written with spec-driven development. So I'm going to introduce spec-driven development into an existing project. And this is the project that I'm going to be introducing it into. I care about this project. I want it to look like code that I wrote, that I would stand behind. And so I'm going to bring engineering principles into it. It's going to have a lot of tests and documentation examples and stuff like that. One key challenge: this is written with a feature new in Python 3.14, after the model's training cutoff date. So I have to do a lot of work to teach the model about this new feature. Whenever I walk back up to a project after some period of time, I do the same thing. I go look: did I leave any work uncommitted? Am I on a branch that I need to deal with? I need a clean slate when I walk back up. And maybe I'll go look at the Git history or something like that to get an idea. But I'm about to start a unit of work and I want a clean slate. I'm going to show you Claude Code in PyCharm in two ways. I'm going to show it to you in our AI chat. And I'm going to show it to you in the terminal. As a note, we're doing a lot of work right now and in the next couple of months on this user experience. So stay tuned. In Agent OS, when you walk up to something, you can discover standards and the way you work. But then the first real step in the workflow is to have a conversation. What is this product? What am I actually making? So we're going to run that slash command. And away it goes. That slash command is nothing more than this. It is a Claude Code command that has some principles in it that reflect Brian Casel's opinion of how to do spec-driven development. And then it tells the agent to talk to you a little bit, ask follow-up questions, things like that. So as it's going along, I'm in Sonnet.
I could have switched to Opus for planning mode and then switched back to Sonnet for implementation mode. It does some analysis. It sees that I've got some information already. In Agent OS, I've got a mission, a roadmap, and a tech stack. What I'll do is, let's see, do we have enough time? Well, you know what? I'll start over. I'm going to delete that. We moved along fast enough that I can start a new plan product. Do you want to create a branch first before you do this? This is a good question about branch creation for this step, for the plan product step. Since it doesn't generate any code, I generally don't create a branch for it. It's just generating three Markdown files. Really useful point though, because I might iterate with it and make changes with separate commits. So let's do that. I'll make a new branch for 'Plan Product'. So I'm now on the 'Plan Product' branch. There's another point that Dmitry is kind of getting at here: when I go back to the commit history and stuff like that, it'll make it really clear where I'm doing things in these different workflow steps. So away it goes. And this time it's not going to know that that exists. Let's see. I've got a couple... Did you actually delete the things? Oh, you... okay. Yeah, I deleted it. Yep. Okay. Let's see. You need to allow it to run the command. It sees that they are still lying around. All right. And you're right. I will go into brave mode. That was interesting. It's hard to do live streams on this thing because it behaves differently each time. This time it detected I deleted the files, but it saw that there were unstaged changes and wanted to look at them anyway. So it's starting the planning process. What does this product solve? Let's see. Accessibility-based testing of HTML. Did you start a new chat now? I started a new chat because it had discovered a little bit about the files that I had before. Okay. Yeah. We'll say this is for developers and testers.
This, I'll say, has no dependencies and is based on accessibility. This project is basically a port of Testing Library from the world of JavaScript and React. Let's see. I want more kinds of queries, because the first feature we're going to add is a new kind of query to what's already existing. Look in TODO.md. I have a file, TODO.md, where, before getting into spec-driven development, I organized my work locally. And this is where it's going to start looking at your code base, and boy, does it find some things that you would never have thought of. I'll skip this for now. This is just a library, not an actual application. So now it's writing the files, and the human's in the loop now. I can go over to these files and see, Hey, did you get that right? Let me reformat. Did you get that right? Okay. It looks pretty good. Tech stack. Python 3.14. Wow. You discovered all of those things. That is correct. Let's look at the roadmap. More kinds of queries. In fact, it found the first thing. Is it the same thing, phase one and phase two? The by-comment query is actually the query that you... Right. Right. Right. So that was nice, that it discovered the work correctly. And what I'll do is I'll commit now, and we'll go back and take some questions. But it is important to do this and to keep it up to date because this steers all the future work. As Brian Casel says in the videos, you don't want to retype this in every prompt. You want a small amount, as Dmitry also said, to not fill up the context window. So we've got to learn some engineering on getting this right. And in fact, Claude Code and other agents have tools to help read your artifacts and rewrite them to be really context-friendly. Okay, I'm going to bring up a question while I commit. Let's see, Dmitry, if you want to take this one while I do my commit. From a software development life cycle perspective, these specs are dev-oriented.
How would you view this? This is my contract with the agent. How would you view this as the contract with business stakeholders? I don't necessarily think that the specs need to be dev-oriented. Right now we are working on a library whose target audience is developers. So very clearly the specs for that thing are developer-oriented. But if we were working on an application targeted at end users, I would not expect the specs to be in terms of views and controllers. I would much more expect them to be phrased in terms of end-user features. And yes, this will probably be a little too detailed for, I don't know, executives. Not that they couldn't understand it, but they wouldn't want to go into that much detail. So maybe there needs to be some hierarchy where there's a very high-level specification, but this is the product mission, basically. So in a way, the product mission is what's targeted at management, and then the UX engineers can very much look at the specifications that describe how individual features work. And for Ops, there's a whole separate set of things. You would also want to have separate sets of specifications describing how the application is deployed, monitored, stuff like that. One of the interesting territories for research on this is that what Dmitry just described also translates well to sub-agents. Specialized agents that get a subset of the configuration, context, and instructions and do their job really well, and don't bloat the main context. And maybe even run in parallel if you can get orchestration. What I just said was a lot of magic-wand waving. This is one that I do quite a bit. Instead of going into chat, I write a Markdown document first that is all of my thinking. And then I point the spec-driven development process at that document and get it to interview me. As it turns out, Dmitry, businesses probably have these documents lying around all over the place, right?
Yeah, but they go out of date very quickly. Microsoft Word is a write-once format. I'm going to go ahead and move on. I actually told the agent to do my merge and squash and delete the branch. So I'm back on main, and my project has the start of spec-driven development in it. I told it, this is what I'm doing with the project. This is my tech stack. Make all your decisions based on that. As you do other projects, you'll find the agent asks you great questions about the decisions you want to make, other related projects or initiatives or thinking. Then when we got out of that, I skipped out of plan mode because I needed to go adjust the permissions on it. But we're about to go and do a feature. So in this next step, we want to use a spec-driven development feature process. For this one, I'm going to switch over to show that both are fine. I'm going to switch to the terminal. I'm going to run Claude in dangerous permissions mode. And what I want to do first is to go into plan mode. Shift+Tab, Shift+Tab, Shift+Tab. And in plan mode, see, I've got it set up to switch to a different model, Opus. And then when I switch out of plan mode, it'll go to Sonnet. There's a way in Claude Code to configure that. And I want to do work in small units. There are several things that I want all of you to think about from spec-driven development. One we've already talked about, which is to have a conversation with the agent about these files and specifications. The second is to work in really small units. Dmitry at the very beginning talked about this AI fatigue and the review cycle. When you tell it, Go build the world! it's going to give you back more than you can handle. Work in smaller units on branches, with frequent commits, so that the surface area of your review is smaller. You also get better context management. Now, Dmitry, I just made a very strong assertion about working in small units. Do you have a comment about that?
I think this also depends, because if you make the unit too small, you are slowing yourself down quite a lot. You can basically ask the agent to build every individual function and then check if this function is correct. But at some point this just slows you down, especially if you try to break the development of each function into multiple steps: write me a spec for this function, write me a task list for this function, write the function. Then you are spending a lot of time and a lot of tokens for not really a lot of value. So I think the right size of chunk really depends on the project, but my intuition tells me it's on the order of magnitude of maybe a few hundred lines of code, the size of PRs that you were comfortable with reviewing before the AI was unleashed upon us. Sure, sure. Speaking of unleashed upon us, we're getting a number of questions about different kinds of agents. I'm going to be a coward and not have much of an opinion about this. Junie or Codex or Claude. At JetBrains, our opinion is we want to be an open platform. And so for example, if I went back to AI Chat, you'll see I've got some choices in here, more choices to come. And this choice is me writing my own agent, and I'm using ACP. I'm writing an agent in Python just to experiment. So we have a really open concept of plugging into other agents. These things leapfrog each other, models leapfrog each other. You shouldn't be tied to just one. And let's see, this is a provocative question. Can you get the agent to create these specs, these steering files, without me? And it fills in a lot of details, limiting what the engineer has to load into their brain, because we have a context window too. Dmitry, from a best-practices perspective, do you have an opinion about the sweet spot between human in the loop and I'm overwhelmed?
So for new projects, well, if you don't mind your project suddenly being in Rust rather than Go, for example, you can surely let an agent make all the decisions for you. And that's actually also a very interesting aspect of this agent-based development, because agents don't really care about the language they're writing. So it's much less important to make the choice of a language when you start to work on the project because, to some degree, whatever language you choose, the project is going to work. What matters is the runtime environment. Switching between the JVM, .NET, and Go still matters, but within the JVM... Sure. By the way, your camera image seems to be... maybe it's just me, but I don't see your camera image. I hadn't noticed, I'm sorry. Let me switch to my... my apologies, I predicted this would happen. My expensive Sony camera is not trustworthy. Yeah, now we see you again. Wonderful. So I'll go back to using my MacBook camera. All right, here we are. Good. Resuming, I'm working on a feature now. My roadmap has a feature. And I want to work one feature at a time, in small units. So I did a prompt telling Claude Code, Go find the roadmap. Find the next item, and then make a branch. I'm so lazy, I can't even be bothered to go up here and make a branch with the right name. Get Claude Code to go into plan mode so it has this conversation with me. And then at the end of plan mode, go run the Agent OS command where we work on the specification for the feature. The agent found the correct item on the roadmap. I want to add a new kind of query so that I can query my HTML not just by element, but also by comment. Do I have any other things that I want to say about this? I want to find elements by HTML comments. And sometimes it'll ask me another question like, do you want me to return the text or the element? And we can see Claude Code going out.
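The prompt described here, go find the roadmap, find the next item, and make a branch, can be pictured as a small script. This is only a sketch under stated assumptions: it assumes the roadmap is a Markdown checklist using `- [ ]` and `- [x]` markers, and the `feature/` branch prefix and the function names are hypothetical. The stream does not show the actual roadmap file format or how the agent names branches.

```python
import re
from typing import Optional


def next_roadmap_item(roadmap_md: str) -> Optional[str]:
    """Return the text of the first unchecked '- [ ]' item, or None if all are done."""
    for line in roadmap_md.splitlines():
        match = re.match(r"\s*[-*] \[ \] (.+)", line)
        if match:
            return match.group(1).strip()
    return None


def branch_name(item: str) -> str:
    """Turn an item title into a lowercase, hyphen-separated branch name."""
    slug = re.sub(r"[^a-z0-9]+", "-", item.lower()).strip("-")
    return f"feature/{slug}"  # the 'feature/' prefix is an assumption


# Hypothetical roadmap contents, for illustration only.
roadmap = """\
# Roadmap
- [x] Query by role
- [ ] Query by HTML comment
- [ ] Query by label text
"""
item = next_roadmap_item(roadmap)
print(item, "->", branch_name(item))
```

The value of handing this to the agent rather than a script is the follow-up conversation about the feature, not the branch name itself.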
If you haven't used Claude Code before, it collapses a bunch of stuff and you can press Ctrl+O to expand it. You can also press Ctrl+B to get it to run in the background. I can't keep track of so many things, so I don't do it that way. I'm still in Plan Mode, which means I'm using Opus, which means it's doing a lot more thinking. It's also keeping track of my context window. And as we've all said a few times, this really is the art. It isn't just running out of context. It's context rot, where the context in the middle loses importance relative to the context at the ends. Let's see if we have a question to answer. Let's see. There are two opposite questions. I will show one of them. Do we think there's a time coming where it's the apocalypse and there are no coders? I have a strong opinion about this. Dmitry, give us yours. From what I see, this is not going to happen. The role of a software developer changes, the structure of the software development industry changes. With software becoming much cheaper, there's going to be a lot of custom software being developed. Many companies would not want to deal with something like SAP, and it will be much more affordable for them to build custom systems, maybe even for things like ERP, that are tailored to their specific requirements and processes rather than trying to shoehorn a general solution into the requirements that they have. But the people who have the skills to do that, the people who know how to actually maintain software over the long term, will still be very much in demand. Peter has kind of the opposite opinion, that these tools are not currently a threat to replace serious software engineers. I think I come down where we need to learn some skills to bring the engineering to agents so that we get the best of both worlds.
I certainly find that I can do a lot more programming than I used to when I get this acceleration, but I want to make sure the results are something I would have written myself. That's where I come down on that. So, continuing, it's going to ask me if I've got more detail about this feature that I'm making. And I'm going to say no, because this doesn't really have a UI. But if you're doing a GUI or a website, you might have a little hand sketch on your notepad or a Figma document, and you feed it into the specification. It's retained over time so that when the next person walks up and wants to see the why behind a commit, they can go find the inputs that the human gave. Now, this gets interesting. We're having a conversation about the feature. It's not me just telling the agent. The agent's going to ask me some really good questions. This happens to me all the time. The agent will say something and I'm like, oh crap, three weeks from now I would've thought of that. Thank you for bringing that up. And this is one of them. It knows I've got these other similar things. It just wants to make sure: do I want the same behavior? Yes. Follow existing patterns. Those queries are good. Dmitry, were you going to ask about... This is one thing where I wouldn't actually expect the agent to ask me; of course I want it to follow the existing patterns. If I didn't want to, I would have said that explicitly. This is the one where it's a really useful thing. An HTML comment isn't really that valuable. Why am I asking for this feature? Maybe what you want, Paul, is to get the element around it or something like that. But no, I actually want you to return text, the body of that comment. I don't want the parent element. I want the actual text. And so I'll just say that: give me the text of the comment. Yeah. That was a good one. Yeah. Because that's really a choice that you needed to make, and you made it specifically.
So I want the parent, the sibling, whatever. And this gets to intent. All right. And it knows a little bit about previous ones. So Query by Text, return comment. That's what I want. The second choice. It's recording all of these things in Markdown documents. But one thing this means is that five features from now, I might change my mind about something. And it's something we talked about at the beginning, when Dmitry said IntelliJ would have a million Markdown files. If you change your mind, do you want to go back and change all the specifications, or is the code the source of truth? Everyone in the comments, how do you feel about this? Should you keep the specs as the source of truth, or is the code the source of truth? Wish I could do a poll. I'll bet that'll be an interesting... I think the direction where this is heading is clearly that the specifications become the source of truth. For example, Andrey Breslav, who was the original designer of Kotlin, is now working on a startup where he's basically designing a programming language that consists of specifications. The entire software development workflow, in his view, should just consist of writing and changing specifications. And the hard problem is, when you change the specification, figuring out what is the right amount of code that needs to be changed based on that specific change in the specification. And he's very explicit that you don't write new specifications. You change the ones that you have made before if your understanding of the task changes. I wonder where this is heading, but I think... Where is this heading? What you just described is fascinating. Think of all the layers down to the model itself: if you could train a model around what you just said, so that it understood there was this thing called a specification.
And if my tool for discovering my context wasn't grep through the code but go to the specification, you could get some pretty radical improvements. I think you don't even need to train new models for that. I think this is something very much achievable with existing models, and Agent OS is already a big step in that direction. Sure. Okay. We have a point here. I'd like to encourage this point. This is a good point about iteration. I'm guilty all the time of over-specifying and constraining the agent and filling up the context. I probably could type less and let it, quote unquote, think. God, I hate to say that word, but "let it explore" is a better way to put it. Dmitry, this one's for you. How are engineers at JetBrains approaching all of this? You talk to a lot of people. So I don't think we have any kind of formal process that demands people use AI. This is very much based on the initiative of individual developers, and a lot of people are excited about using AI. They see the results that they are getting and the results that their colleagues are getting. We have a very lively Slack channel where people discuss their experiences using AI and the specific approaches that they use. And by the way, we don't demand the use of company products, for example. So a lot of people use Claude Code on the command line, and this is something that many people enjoy, and this is fine. Sure. And that's useful for us to see what's happening in the industry. That too, yes. I just said yes to... Claude was asking me about some skills to inject, and I went ahead and said yes so that we could keep moving. It's asking me a follow-up question after it loaded some skills. Do I want to match anywhere in the comment, or the whole thing as an exact match? I don't know. It probably has to use the same policy as other queries, such as title. So I've got similar decisions I had to make in the past, but they're in code, not specification.
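The follow-up question above, match anywhere in the comment versus exact match, is easy to picture in code. What follows is only a minimal sketch, not the project's actual implementation: it uses the standard library's `html.parser` rather than whatever parser the demo project uses, and the names `find_comments` and `exact` are hypothetical.

```python
from html.parser import HTMLParser


class _CommentCollector(HTMLParser):
    """Collects the body text of every HTML comment it encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.comments: list[str] = []

    def handle_comment(self, data: str) -> None:
        # html.parser passes the text between "<!--" and "-->".
        self.comments.append(data.strip())


def find_comments(html: str, query: str, *, exact: bool = False) -> list[str]:
    """Return the text of comments matching `query` (hypothetical helper).

    exact=False matches anywhere in the comment; exact=True requires the
    whole (stripped) comment to equal the query.
    """
    collector = _CommentCollector()
    collector.feed(html)
    collector.close()
    if exact:
        return [c for c in collector.comments if c == query]
    return [c for c in collector.comments if query in c]
```

Whichever policy is chosen, the point made in the stream is that it should be the same one the existing queries use, which is why the agent is sent to read the current code first.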
So I gotta go tell the agent, go look at my code and see how I'm currently doing it. We're still in plan mode. And what I'm doing is saving time and money to get to a better result, because instead of it wandering around and me prompting it to get back on track, it's writing down a better starting point. And it's only going to be a couple of hops to get to the right destination. I wonder if you believe in this. I believe strongly in this. Dmitry, do you believe in this? Yes, absolutely. And another interesting aspect is that I believe that in order to become effective as a developer, you still have to go through all the steps of figuring out how coding works, how software works in general. You would be... Engineering. Yeah, yeah, basically. So you would not be able to just sit around prompting an agent and get all the skills that are needed to be an effective senior dev. Yeah, sure. If you want to see how much demand there is for programming, go look at the open tickets in your project. There's an infinite need for programming. Okay. We are at the point where it's finished the planning phase. It told me what it thinks is the truth for this feature. I love reading these. A lot of people are like, YOLO, let's get going. But I read through this, and this is what it's going to write to disk. By Comment Query: it needs to do these things. Here are the decisions we made. This is in a short format, so it's easy to read. I don't have to spend too much time. Once I learn this format, I can scan it really quickly. Here are the subtasks that it's going to do as it completes this work. If I spot anything from an implementation perspective that doesn't match, I can correct it. It's going to go do all these things. I get one last chance. Do I want to work, or do I want to make a change? I want to work. I hit Enter. It leaves plan mode, and this feature step in spec-driven development is now moving to implementation. I've got a feature. I planned the feature.
Now I'm implementing the feature. As it turns out, Agent OS wires in a verification step. If I went and looked at the Shape Spec validation... Oh, it invokes something that invokes those things. I have a way to tell it: before you finish a feature, do these things. And for me, it's run the Python formatter, run the Python linter, run the Python type checker, run all the tests. If that all works, you can be finished. If not, get back to work and get rid of all the errors. So at the end of this implementation step, it's going to do a step for verification and validation. Dmitry, I'll ask your opinion on this since I'm on the fence. With all of those subtasks, for a language like Python that doesn't have a compiler, would you run the type checker on each subtask, or would you only do it at the end? I don't really have an opinion on that. Most of my time actively writing Python was before we even had type annotations in the language. I remember my colleague Andrey actually participating in the group that created the PEP for type annotations. Andrey Vlasovskikh. Oh my gosh, love that guy. Team lead, ran PyCharm for a bunch of years. Let's see. There's a point we can bring up about Junie and Brave Mode on a C app. And I guess the point that this person is making is that these things can really succeed. Yep. And this is a legitimate engineering benefit. What this person is saying is, "Just go look at my code base and find things." What you're finding now with Claude Code is that people are sharing these skills and plugins that you can install. So there's an ecosystem where people are getting better together. And there are these skills for code simplification. You point it at your code and it finds all of these places where you've got mistakes beyond what a linter could do. Dmitry, you have a background, obviously not just in static analysis, but probably higher order things.
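The verification step described above (formatter, linter, type checker, tests, and only then "you can be finished") amounts to a small gate script. The sketch below is an assumption about how such a gate could look, not how Agent OS implements it. The stream names only the categories of checks, so the specific tools in `DEFAULT_CHECKS` (ruff, mypy, pytest) are stand-ins, and `run_checks` is a hypothetical helper.

```python
import subprocess
import sys
from collections.abc import Sequence


def run_checks(checks: Sequence[tuple[str, list[str]]]) -> list[str]:
    """Run each named command in order; return the names of failed checks."""
    failed: list[str] = []
    for name, command in checks:
        try:
            result = subprocess.run(command, capture_output=True, text=True)
        except FileNotFoundError:
            # The tool is not installed; treat that as a failed check.
            failed.append(name)
            continue
        if result.returncode != 0:
            failed.append(name)
    return failed


# Assumed tool choices for illustration only.
DEFAULT_CHECKS = [
    ("format", ["ruff", "format", "--check", "."]),
    ("lint", ["ruff", "check", "."]),
    ("types", ["mypy", "."]),
    ("tests", ["pytest", "-q"]),
]
```

Calling `run_checks(DEFAULT_CHECKS)` from the project root returns an empty list when everything passes. A non-empty list is the signal to send the agent back to work. Running the full list once at the end versus after each subtask is exactly the trade-off raised in the question to Dmitry.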
Does it feel like this level of tooling can go and do new kinds of code analysis? That's a really interesting question. What kinds of new code analysis would you even want to have? And what would be the benefit of doing this type of analysis? Because even with linters, we saw this a lot: a linter can report 100,000 warnings for your code base. If you run the linter on the IntelliJ code base, it will report a few tens of thousands of warnings. Do you actually care about this? In 90% of cases, no. And I think with AI there's also this danger that it will find something that it considers an issue and add code to fix it, but in the end, were they actually issues? Are you actually sure that these are things that need fixing? Even with security issues, a lot of them are like this: if you put this into a really weird context, if you call this function in a way that no one would actually call it in production, then something bad might happen. But you would never encounter this security issue in actual use. So there's a little bit of trigger-happiness around these tools. They will fix things that are not really issues, and they will bloat your code by doing that. And they may fix it in a way that doesn't understand that on the other side of the database you have a fix. Yeah, so context is very important, and this is a very big difference between an agent and a person, because you have the context of your project in your head, maybe not completely, maybe in a fuzzy way, and an agent never has it. An agent always looks at some specific pieces of code. It runs a couple of grep commands, it reads a couple of Markdown files, and that's all it sees. It just doesn't know what else is there. It has never seen it. It doesn't see the things that you know are there. Indeed.
And you can try to fight this by putting all the knowledge in CLAUDE.md or in specification files, but in the end, does this scale? Can we actually put all the knowledge there? Sure. As this is going, the mode that I use is to sit in our Commit tool window and watch the changes that it's making, so I get an early start on the review. So I'll go and look at this, and I get the diff viewer that we all know and love in JetBrains IDEs. And I scroll down through the changes and I get, okay, well, here's something. Here's where it's doing the work. Let me review it. It wrote a docstring. It returned a type. That seems to be kind of what I want. Oh, it's going to go to this Traverse Comments helper function. Okay. I've got to look at that. All the variations. I can keep going to the next file. This next file is a utils file that might have a change in it. Okay. That's the helper function that I was just on. How is it walking my tree? Okay, let's see. It's using structural pattern matching in Python 3.12 or whatever. Thank you, because that was one of the rules that I gave you: use structural pattern matching whenever it's appropriate. So I can go through all of these changes and get an early look before it's finished. And now it's finished, and it gives me a report about what it did. Implementation complete. I implemented the feature using spec-driven development. Here's what we did. We wrote these files: an implementation plan for the specification, all the stuff for the specification. And then I modified these files. I added the functions for this kind of query to surface it for the user. So that split between the helper and what the user actually uses is a good one, because it enhances testability. I can write a unit test on the smaller part of the code. I can reuse the smaller part of the code in other query helpers. It even wrote, I believe, some documentation. Let's see. It ran the quality tests.
In my project, I have examples, and it wrote examples for me to explain to the user how to use it. So this is all good. It's time for the human in the loop. I might go over and read some of this code. I might run my test runner. That's all good. I find that sometimes it names things in a way that I might not like. And so I use the refactoring tools. I do a Rename refactoring because I trust the IDE to go find all the places that need to be renamed and to do it the way I want. I'm moving the work from the non-deterministic side to the deterministic side when I put my hands on the wheel as the human in the loop. Another one that is just my way: I want other people to use this, not other agents. And so I spend a lot of time at this point in my docs, making sure that I've explained this correctly to other users. And this is a process where I really use my IDE. It's got a great Markdown editor. It's got great jump-to-symbol. I can run Sphinx, my documentation tool, and preview docs and all of that stuff. All the human stuff. When the agent does its work, the human's in the loop. Human-in-the-loop time? I've got a lot to review. I want the best possible tool. And for that, I sit in the IDE. There's no big context switch. The two things work together really well. Before taking some more questions and then wrapping up, Dmitry, I'm going to ask you the same question I asked Tricia. Do you let the agent write the tests for the code that the agent wrote? I know that Tricia thought that it might not actually be the best idea. In my view, it is actually the right thing to do. You have to make sure that the tests are done in the right way, but writing tests can be fairly cumbersome, there's a lot of setup code that you need, and I often find that I want to move quickly, for example, when I work on my personal projects. It's just something that I'm working on myself.
So there is no team, there is no required level of standards. And I often just skip writing tests, because I want the feature to work. I know that I'm going to be able to fix it if it breaks, and often I just move on. And the agent can actually help me do this step. It can figure out the cases that need to be tested and write the tests. If it has good examples, if it has a pattern to follow, then it absolutely can be very valuable here. I'm going to close with this one line and then take a couple more questions before we get out of here. This seems like a little thing, but it's actually a very big thing. I'm telling the agent to go generate the commit message, do my merge, and delete the branch, just to save me the work of coming up here, typing something, and doing some stuff here. It's cheaper and faster if I go up here and do it, but I'm so lazy, I'll just do it down here. And I think that's kind of the tip of the iceberg in 2026 about where we do our work. Do we do it up here on the fast, cheap, deterministic side, or down here on the clever, unpredictable, and expensive side? Okay, so over to some questions. Dmitry, this one is definitely for you. It's about languages and how much agents can help in different kinds of languages. I don't think that the size of the language spec is really what matters here. There's a lot of open source C++ code, so there's a lot of opportunity for agents to have learned the patterns of how C++ code is written. There are also a lot of coding standards that explain the right way to use C++ in the context of different projects. I haven't personally used agents for C++ development, but I don't see why this particular thing would be an issue.
And also the good thing about C++ is that it's a statically typed language, so the compiler can verify a lot of things for you. It's not as strict as Rust, but still, a lot of the problems that have to do specifically with the way the language is used are going to be caught by the compiler. And that is a really interesting point. The intersection between the non-deterministic side and the deterministic side: how can agents and models have a conversation via the compiler to try to get to the right result the first time instead of write code, fail, write code, fail? We will end with two things. First, the sad one, about fatigue. And this person is correct. Especially given the article you wrote this morning. Developers are switching away from building to reviewing. And it's a different skill, and the velocity is exhausting, right? Yeah, and it's not just review fatigue. In the end, you basically end up with a system that you don't understand. Sure, you can ask the agent how it works and the agent will give you an answer, but knowing something and being able to ask about something are very much not the same thing. Okay, let's close with a hopeful one, and I'll give this one to you. What excites you about 2026 and the direction that this is going? So this ties back to the issue that we talked about, the connection between the deterministic and non-deterministic tools. In my view, there's a lot of unexplored space in infusing the IDE features with AI so that it's not just two separate things. It's not just a separate IDE and a separate terminal, but rather features in the IDE that are powered by AI and do things in a smarter way because the AI is available to them. A very simple example is generating commit messages with AI. It was one of the first features we added in AI Assistant, and people still love it.
Or, for example, when you rename a method, the AI can suggest a name much more meaningfully than the heuristics that we've had before. And if you start thinking about this, there are many dozens of these small points where you can make IDE features very much smarter by going and asking AI what the right thing to do is. And this is very exciting for me. The biggest risk I see is basically this loss of understanding. You end up with a large system where you don't actually know what's inside it. You don't know how it can break, and you don't know all the assumptions that were made by the agent about how it's supposed to work. And also, as the system evolves, as the agent continues developing it, you end up with unexpected changes that affect your users. If you don't have specs, for example, or if you don't have specs for everything, the agent can very easily just go and change things, and then users get confused because the system today works in a different way than the same system worked yesterday, because the agent went in and did something that you didn't expect. And these are the big risks that I see. Sure. Before I give my closing comment, we do have something posted in the chat. We have an offer for today for JetBrains AI Ultimate: free for one month, commercial license for up to 10 seats. Look for the message and the link in the chat. Give it a try, because it gives you access to so much wonderful machinery related to this topic of keeping the human in the loop. Thank you to everyone. I said at the beginning, we're all learning this together. There are no right answers. There are a lot of wrong answers. And this year we will be exploring all of this. I will close with something that Dmitry just said. It's your users. It's your product for your users. These are tools, but make sure that you are giving your users what you want to give them.
Thank you, everyone, for the live stream today. This was wonderful. Join us again next time for more live streams. Thank you. Thank you, Paul, for inviting me, and thanks, everyone, for the questions. There were a lot of great ones. I would be happy to spend the next hour answering them, but I think we need to wrap up. Indeed. Thank you, everybody. Thank you. Bye.