Practical Considerations for Advancing AI Collaboration in Software Development


TL;DR

  • Human-in-the-loop is essential; AI offers probability, not certainty.

  • AI excels at word-smithing, so spend more time on documentation and context.

  • Leverage diverse AI models for varied research, improvements, and analysis.

  • Be wary of deskilling: if AI makes a task trivial, agents may soon replace it.

  • You should feel like you are testing the boundaries of what AI is capable of, at least for some tasks.

The Problem

AI’s proficiency in handling routine coding allows human engineers to dedicate more time to strategic activities such as system design, architectural planning, intricate requirement elicitation, and the rigorous evaluation of application performance across multifaceted metrics.

Tools often amplify underlying behaviours and failures

— Rob Lambert

Value is increasingly found not in rote knowledge, which AI can often provide, but in the capacity to frame complex problems effectively for AI, critically evaluate its probabilistic outputs, and innovatively integrate AI’s capabilities into novel solutions.

Developing with AI is not merely about adopting a new tool but about learning to collaborate with intelligence that operates on different principles, sometimes without profound contextual understanding, yet capable of processing and synthesising information at a scale and speed that surpasses human capability. However, this power comes with new challenges: ensuring the reliability of probabilistically generated outputs, managing the 'black box' nature of some AI reasoning, and navigating the ethical and IP landscapes of AI-generated content.

Effective collaboration requires developers to master prompt engineering, the art of providing precise context, critically verifying outputs, and understanding the AI’s inherent limitations to guide it effectively. This is akin to mentoring an exceptionally capable but occasionally erratic junior partner.

Moreover, "disposable" or "ephemeral" AI-generated software carries profound and often unexamined implications for software economics, business strategy, and the fundamental definition of software assets. If software can be rapidly generated for specific, transient uses and then discarded because regeneration is trivial, the traditional emphasis on long-term maintainability and total cost of ownership (TCO) may shift, demanding new economic models and strategic thinking around software value.

Does AI Save Time?

Whether AI saves time depends on your goal. It saves the most time when you run into a blocker (e.g. you don’t know how to start, you need a tool or library you don’t know how to use or didn’t know existed, or you need a solution you hadn’t considered). Many developers have described this as feeling like a 10x developer. However, it is impossible to say how much faster it makes you, given that you might otherwise never have attempted the task, or that it would have required an unquantifiable amount of research.

How does your goal change the time saved?

Goal             | Indicative wall-clock change     | Why It Feels Valuable
Zero to one      | Half the time                    | Less time spent on tasks that aren’t so important
Time-boxed tasks | More consistent delivery on time | Higher quality and completeness of tasks
Leading-edge R&D | 100 % longer first iteration     | AI enables a far more complete MVP and facilitates exploration of novel solution spaces

Zero-to-One

An AI-generated solution can be better than nothing, and AI can produce something decent in half the time.

Zero-to-one work is useful for context pieces that are not part of the deliverable product. We use AI for:

  • Early-stage prototyping

  • Generating a realistic stub instead of mocking a service or library (see the sketch after this list)

  • Building a test harness for a library you don’t yet know how to use
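
As a minimal sketch of what a realistic stub might look like, here is a hypothetical FxRateService with plausible, deterministic data; the interface and rates are illustrative, not from a real project:

```java
import java.util.Map;

// Hypothetical interface standing in for a real dependency.
interface FxRateService {
    double midRate(String currencyPair);
}

// A "realistic stub": plausible, deterministic data rather than an empty
// mock, so downstream code and tests exercise believable values.
class StubFxRateService implements FxRateService {
    private static final Map<String, Double> RATES = Map.of(
            "EURUSD", 1.0832,
            "GBPUSD", 1.2645,
            "USDJPY", 151.42);

    @Override
    public double midRate(String currencyPair) {
        Double rate = RATES.get(currencyPair);
        if (rate == null)
            throw new IllegalArgumentException("Unknown pair: " + currencyPair);
        return rate;
    }
}
```

Because the stub returns stable values, tests written against it remain deterministic, which a mock configured ad hoc in each test often isn’t.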

Time-Boxed Tasks

We time-box many tasks during planning, estimating roughly how long each is worth. AI enables higher quality within that time and makes on-time delivery more likely.

Leading-edge R&D

When developing a leading-edge solution, spend more time on the first iteration; AI will help you build a far more complete solution, shortening subsequent iterations and improving the final result. AI enables a far more complete MVP and exploration of novel solution spaces.

Effort Profiles

Making Your Life Easier

If AI can generate the code with minimal effort, autonomous agents will likely soon handle that task. You’re not adding much value and may be deskilling yourself, so reserve this approach for peripheral implementations that don’t merit significant time.

20 % Effort

You’ll often feel most productive when AI does ~80 % of the heavy lifting, and you provide the remaining 20 %. Many tasks fall into this sweet spot.

Challenging Yourself

To learn and push boundaries, you want to feel like you are making at least 50% of the effort. If AI easily gives you what you want, raise your expectations and try something harder until it breaks. You want to test what AI can do, finding different ways it breaks several times a day. This mirrors the process of understanding edge cases and failure modes in any complex system, but with AI these boundaries can be more fluid and surprising. This way, you keep learning and ensure you make the most of what AI can do today.

Multiple AIs > One AI

Using multiple AIs is more useful than relying on one: each model has different strengths and weaknesses and surfaces different ideas. I regularly use GitHub Copilot, OpenAI o3, Gemini 2.5 Pro, and Claude 3.7 Sonnet. Deep research across the latter three often uncovers insights the others miss.

Optimising Code for AI

Generative AI excels at word-smithing, so invest more in documentation. Aim for a comparable conceptual density: documentation, tests, and production code should ideally reflect a similar depth of understanding. Documentation will be more verbose, code concise, and tests clear and exhaustive. You might aim for ~10 k reviewed words of documentation before significant coding. This detailed context, including clear explanations of intent and non-obvious constraints, is crucial for LLMs to avoid 'hallucinations' and generate relevant, high-quality code. Consider adopting practices like maintaining a 'living' design document that AI can reference.

As the project grows, you will start hitting token limits. To delay this, reduce token counts and keep documentation concise. If the AI can easily generate the documentation, there is a good chance it doesn’t add any human-level differentiating value and may not significantly improve the AI’s understanding of complex tasks. Concise comments are often best, given that you will have extensive documentation and tests.

AI in the SDLC

Using AI as early as possible, in the requirements-capturing stage, can help ensure more cases are covered and considered from the start. You can use Deep Research on multiple topics across multiple AIs and combine the results. For example, using AI to analyse user stories from diverse sources can help identify conflicting requirements or unstated assumptions early on. You can use tools like NotebookLM to create audio summaries, or ask an AI to distil the key insights. For example: Diffblue Cover generates tests, JUnit 5 tags categorise them, and Claude reviews coverage gaps. Human oversight remains critical here, not just for what AI generates, but to critically assess what it misses, especially in complex or novel scenarios where training data might be sparse. However, it can be worth trying to "mine for diamonds" and look in the AI’s summaries for details it missed that might be important.
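
As a minimal sketch of that tagging step (the class under test and the tag name are illustrative), a JUnit 5 @Tag lets tooling or an AI reviewer select machine-written tests separately from hand-written ones:

```java
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// The class under test; illustrative only.
class PriceCalculator {
    static double round(double value) {
        return Math.round(value * 100) / 100.0;
    }
}

// A generated regression test, tagged so build tooling (or an AI reviewer)
// can filter machine-written tests from hand-written ones.
class PriceCalculatorTest {
    @Test
    @Tag("generated")
    void roundsToTwoDecimalPlaces() {
        assertEquals(10.46, PriceCalculator.round(10.456), 1e-9);
    }
}
```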

AI for Analysis

AI can be helpful in analysis, but it often produces indicative values based on vibes rather than actual analysis. Verify any values AI produces; they could be just sample data. AI is very poor at telling you whether it is using sample data or realistic data. For example, if you ask it whether the values given have been verified, it can claim nonsense values were verified and accurate values weren’t. This highlights the importance of not just verifying AI’s outputs but also understanding the limitations of its 'explainability': an AI can generate a plausible justification for an incorrect answer.

I have found AI especially useful in evaluating regression tests. We use regression tests extensively to detect unintended consequences of changes, but verifying lengthy YAML test data manually is error-prone. AI is effective at finding issues I missed, and each model picks up things the others overlook, so I use multiple AIs to verify the results.
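
Mechanical checks can complement that AI review. A minimal sketch, assuming the SnakeYAML library and an illustrative file name and key:

```java
import org.yaml.snakeyaml.Yaml;

import java.io.InputStream;
import java.util.Map;

// Basic structural checks on regression test data, run before (or
// alongside) an AI review. The resource name and keys are illustrative.
public class RegressionDataCheck {
    public static void main(String[] args) throws Exception {
        try (InputStream in = RegressionDataCheck.class
                .getResourceAsStream("/expected-output.yaml")) {
            Map<String, Object> expected = new Yaml().load(in);
            if (!expected.containsKey("trades"))
                throw new IllegalStateException("Missing 'trades' section");
            System.out.println("Loaded " + expected.size() + " top-level keys");
        }
    }
}
```

Checks like this catch structural breakage cheaply, leaving the AIs to focus on the subtler question of whether the values themselves look right.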

While AI analysis presents an opportunity for innovation, it also introduces the risk of "AI-induced scope creep" if the myriad insights generated by AI are pursued without rigorous prioritisation, a task that still heavily relies on human judgment and strategic alignment.

When evaluating AI analysis, be aware that, like humans, AI can determine the answer and then backfill it with a justification. This means it can come up with a plausible explanation for why something is the answer, even if you give it a wrong answer for it to justify.

One way to mitigate this is to ask for a step-by-step plan for the task, without asking for the solution, and only then ask for the solution. In my opinion, this often leads to a better solution, and empirically it can lead to a more robust one, as the AI commits more 'thought', or processing, across multiple interactions.
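
For example, the two-stage exchange might look like this (the wording and task are illustrative):

```
Prompt 1: "Give a step-by-step plan for migrating this parser to
           streaming input. Do not write any code yet."
Prompt 2: "Following that plan, now implement the solution."
```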

Considerations When Using AI

Token Limits

The models do well up to ~60 k input tokens and ~2.5 k output tokens. Above that, they seem to degrade. In particular, OpenAI o3 struggles over ~250 lines (at roughly 10 tokens per line, that is about 2.5 k tokens, in line with the output figure above). Gemini 2.5 Pro seems to have higher limits, but often just produces longer code and text with no obvious increase in value, i.e. more fluff. For these reasons, I favour using Gemini in earlier stages to create volume and broad coverage, but I prefer o3 for concise refinement of requirements and code.

As models evolve, these specific limits will change, but the principle of providing concise, high-signal context to avoid overwhelming the model’s context window will likely remain.

Maintaining a Decision Log

I have found it helpful to maintain a decision log in AsciiDoc when using multiple AIs, as they often come up with different solutions. Record especially the ones you have decided not to follow, and why, so you can quickly spot the suggestions they haven’t already given you.

This log also serves as valuable context for future AI interactions or onboarding new team members to AI-assisted parts of the project.
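
A minimal sketch of what such entries might look like; the dates, models, and decisions are illustrative:

```asciidoc
== Decision Log

=== 2025-05-20: Stub the rate service with realistic data, not a mock
* Suggested by: Gemini 2.5 Pro (o3 proposed a mocking framework instead)
* Status: Accepted. Deterministic data keeps regression output stable.

=== 2025-05-21: Rewrite the parser in a streaming style
* Suggested by: OpenAI o3
* Status: Rejected. The complexity outweighs the gain at current volumes.
```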

Consistent Styling

I favour using common text with lower token IDs, as these have larger sample sizes in the training data, reducing entropy. However, AI has a tendency to produce Unicode characters and uncommon words in an unstable manner, randomly and inconsistently. Common English words seem less likely to be replaced in updates.

While you can give the AI style guides, they are often ignored. For example, my style guide specifically says not to use an en-dash but a hyphen instead, yet when I format the style guide with AI, every model I have tried uses an en-dash in the phrase 'en-dash' itself, despite the explicit instruction. This underscores that while AI can assist, current models are not consistently reliable at enforcing fine-grained stylistic rules without careful output checking and potential post-processing.

Keeping Documentation, Tests, and Code in Sync

AI can help ensure all three representations of the requirements stay in sync, and it is worth maintaining all of them to ensure this happens. It can help avoid unintended changes, as you see any functional alteration in three distinct forms: English, code, and tests.

This iterative refinement across different representations can also help identify ambiguities or incompleteness in requirements that might have been missed if only one form (e.g., only code) was considered.
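
A minimal sketch of one requirement held in all three forms; the rule and names are illustrative:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Documentation (English): "Order quantities are rounded down to the
// nearest lot size of 100."
class LotRounder {
    static final int LOT_SIZE = 100;

    // Code: the same rule, stated once, concisely.
    static int roundToLot(int quantity) {
        return (quantity / LOT_SIZE) * LOT_SIZE;
    }
}

// Tests: the same rule again, stated as examples. A change that breaks
// one form without the others is a sign of an unintended alteration.
class LotRounderTest {
    @Test
    void roundsDownToNearestLot() {
        assertEquals(200, LotRounder.roundToLot(299));
        assertEquals(0, LotRounder.roundToLot(99));
    }
}
```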

Structuring for Advanced Human-AI Collaboration

Context is King: Managing the context provided to AI is paramount for large projects.

Modular Design: Breaking down complex problems and codebases into smaller, well-defined modules with clear interfaces can help provide AI tools with a more focused context.

Living Documentation & Knowledge Graphs: Maintaining up-to-date, AI-accessible documentation can provide persistent context. I use AsciiDoc with Mermaid diagrams (see the sketch after this list).

Iterative Context Refinement with AI: As discussed in the analysis section, prompt the AI to summarise or update its understanding of the evolving project context, which you can verify and use in future interactions.

Human-in-the-Loop for Critical Decisions: While AI can automate and suggest, it’s crucial to establish clear checkpoints where human expertise and sign-off are mandatory, especially for architectural changes, security-sensitive code, or features with ethical implications.

Embracing a Culture of Critical Evaluation: Foster a team culture where AI outputs are seen as a starting point or a hypothesis to be tested and validated, not an infallible oracle. Encourage developers to probe for AI limitations actively, potential biases in its suggestions, and edge cases it might not consider.
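
For the living-documentation point above, a minimal sketch of an AsciiDoc block embedding a Mermaid diagram; this assumes the asciidoctor-diagram extension, and the diagram contents are illustrative:

```asciidoc
[mermaid]
----
flowchart LR
    Req[Requirements doc] --> Code[Production code]
    Req --> Tests[Regression tests]
    Code --> Tests
----
```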

About the author

As the CEO of Chronicle Software, Peter Lawrey leads the development of cutting-edge, low-latency solutions trusted by 8 out of the top 11 global investment banks. With decades of experience in the financial technology sector, he specialises in delivering ultra-efficient enabling technology that empowers businesses to handle massive volumes of data with unparalleled speed and reliability. Peter’s deep technical expertise and passion for sharing knowledge have established him as a thought leader and mentor in the Java and FinTech communities. Follow Peter on BlueSky or Mastodon.

Collaboration

For this blog article, I used Deep Research and suggested "minimal improvements" by both OpenAI o3 and Google Gemini 2.5 Pro. I also used Grammarly to do a final check of my English.
