A 5-second demo of automatic code generation
When you're developing in Java, how much of your work truly requires creativity?
Do you sometimes find yourself following the same pattern, just swapping out data specs?
In my experience, about 50% of the work in the early stages of development requires human judgment and insight.
But in many cases, the roles of each class could be boiled down to a simple set of rules—sometimes even just a one-page guide.
To me, this kind of implementation work started to feel repetitive and mechanical,
so I decided to build a tool that automates it using LLMs.
Applying the Idea to Layered Architecture
This is my view of layered architecture.
In the diagram above, I believe that once the red components are defined, it may be possible to automate the generation of the blue components.
For example, here’s the system prompt I use to generate test code for the ___domain layer:
system prompt
## Goal
Write a unit test class for the given ___domain implementation class.
You will be provided with:
1. A ___domain-level interface that defines the expected behaviors
2. The implementation class of the interface
3. Fixtures for dependency injection (DI)
4. A set of referenced classes (e.g., data models, enums, utilities)
## Rules
- Write test methods for every method declared in the interface.
- Each test should cover at least one success case and, optionally, edge/failure cases.
- Inject fixture objects when instantiating the implementation class (use the provided Fake... classes where possible).
- Use pure Java with JUnit 5.
- Annotate each test with @DisplayName using Korean to describe the test purpose.
- Do not invent helper methods, factories, or mock behavior beyond what is defined.
## Output Format
Return only:
- One complete `.java` class
- No markdown
- No external explanations or comments
## Output Example
class DefaultUserReaderTest {

    private UserAuthRepository userAuthRepository;
    private UserInfoRepository userInfoRepository;
    private DefaultUserReader userReader;

    @BeforeEach
    void setup() {
        userAuthRepository = FakeUserAuthRepository.getInstance();
        userInfoRepository = FakeUserInfoRepository.getInstance();
        userReader = new DefaultUserReader(userAuthRepository, userInfoRepository);
    }

    @Test
    @DisplayName("read(Long userId) with valid userId should return User")
    void readById_whenValidUserId_thenReturnUser() {
        var authInfo = userAuthRepository.save(UserAuthEntity.of(
                1L,
                "username",
                "pw",
                Set.of(UserRole.NORMAL)
        ));
        var userInfo = userInfoRepository.save(UserInfoEntity.of(
                1L,
                1L,
                Location.EUNPYUNG,
                UserType.USER
        ));

        var user = userReader.read(authInfo.getId());

        assertNotNull(user);
        assertEquals(user.getId(), authInfo.getId());
        assertEquals(user.getNickname(), authInfo.getUsername());
        assertEquals(user.getUserType(), userInfo.getUserType());
        assertEquals(user.getLocation(), userInfo.getLocation());
        assertTrue(user.getRoles().contains(UserRole.NORMAL));
    }

    @Test
    @DisplayName("read(Long userId) with invalid userId should return null")
    void readById_whenInvalidUserId_thenReturnNull() {
        var userNoExist = userReader.read(Long.MAX_VALUE);

        assertNull(userNoExist);
    }
}
While it doesn't cover every case, I was able to generate code quite effectively by using a system prompt like the one above and referencing the appropriate implementation sources.
(Of course, minor adjustments like fixing import paths were needed—but it worked very well as a first draft.)
Result Example (Generated via LLM Automation)
public class DefaultUserReaderTest {

    private UserAuthRepository userAuthRepository;
    private UserInfoRepository userInfoRepository;
    private DefaultUserReader userReader;

    @BeforeEach
    void setup() {
        userAuthRepository = FakeUserAuthRepository.getInstance();
        userInfoRepository = FakeUserInfoRepository.getInstance();
        userReader = new DefaultUserReader(userAuthRepository, userInfoRepository);
    }

    @Test
    @DisplayName("read(Long userId) with valid userId should return User")
    void readById_whenValidUserId_thenReturnUser() {
        var authInfo = userAuthRepository.save(UserAuthEntity.of(
                1L,
                "username",
                "pw",
                Set.of(UserRole.NORMAL)
        ));
        var userInfo = userInfoRepository.save(UserInfoEntity.of(
                1L,
                1L,
                Location.EUNPYUNG,
                UserType.USER
        ));

        var user = userReader.read(authInfo.getId());

        assertNotNull(user);
        assertEquals(user.getId(), authInfo.getId());
        assertEquals(user.getNickname(), authInfo.getUsername());
        assertEquals(user.getUserType(), userInfo.getUserType());
        assertEquals(user.getLocation(), userInfo.getLocation());
        assertTrue(user.getRoles().contains(UserRole.NORMAL));
    }

    @Test
    @DisplayName("read(Long userId) with invalid userId should return null")
    void readById_whenInvalidUserId_thenReturnNull() {
        var userNoExist = userReader.read(Long.MAX_VALUE);

        assertNull(userNoExist);
    }
}
Turning the Idea into a Tool
To turn this idea into a working tool, I found that three key components were essential:
- Allowing users to define their own patterns
- Dynamically collecting code references at runtime
- Sending prompts to GPT-4o
Defining Custom Patterns
It’s clear that not every developer will agree with the rules in the system prompt example above.
Everyone has their own way of writing code, so allowing full customization of the rules is essential.
The image above shows the Pattern Definition Panel in my plugin.
Users can define system prompts for each code generation task using the following components:
- Goal: The objective of the task
- Rules: The generation logic
- Output Format: Desired output structure
- Example (Shot): Few-shot code examples
In addition, users can specify what information should be included in the prompt based on annotation types.
These configurations are saved under the .idea/ directory as XML files, making them reusable across sessions.
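For a concrete mental model, here is a minimal sketch of how those four components might be assembled into the kind of system prompt shown earlier. The record and method names are my own illustration, not the plugin's actual configuration schema (which lives in the XML files mentioned above).

```java
// Illustrative sketch only: the plugin persists patterns as XML under .idea/,
// and its real schema and field names may differ from this model.
public record GenerationPattern(String goal, String rules, String outputFormat, String example) {

    /** Assembles the four panel components into one system prompt, mirroring the structure shown earlier. */
    public String toSystemPrompt() {
        return """
                ## Goal
                %s

                ## Rules
                %s

                ## Output Format
                %s

                ## Output Example
                %s
                """.formatted(goal, rules, outputFormat, example);
    }
}
```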
Referencing Code Automatically
When writing a test or implementation class, developers often reference 5 to 20 other classes within a single file—often without even realizing it.
To support automation, there needed to be a mechanism to automatically collect and accurately provide those references at runtime.
The image above shows an example using the @JavaFactory annotation, which defines what role a class plays and what other classes it is related to within a given API context.
This annotation is designed to support rules like the following:
"When generating a test for a target API, include all referenced APIs and their implementation classes in the prompt."
With this approach, the LLM can receive complete and precise reference sources required for accurate code generation.
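Conceptually, the mechanism comes down to annotating a class with its role and its related types, so the tool can walk that metadata and pull exactly those sources into the prompt. The sketch below uses a made-up annotation and stub types to illustrate the idea; it is not JavaFactory's actual annotation API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation for illustration; JavaFactory's real annotations and attributes may differ.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface GenerationTarget {
    /** The role this type plays in the API context (e.g. "___domain-api", "___domain-impl", "fixture"). */
    String role();

    /** Related APIs whose interfaces and implementations should be collected into the prompt. */
    Class<?>[] references() default {};
}

// Stub types standing in for the repositories from the article's example.
interface UserAuthRepository {}
interface UserInfoRepository {}
interface User {}

// With this metadata, a rule like "include all referenced APIs and their implementations"
// becomes a simple traversal of the annotation graph at generation time.
@GenerationTarget(role = "___domain-api", references = {UserAuthRepository.class, UserInfoRepository.class})
interface UserReader {
    User read(Long userId);
}
```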
For more details on how reference collection works, see the guide below:
Sending Prompts to GPT-4o
After testing several LLM models, GPT-4o consistently produced the most stable and reliable code.
Lower-tier models often struggled with code structure and syntax, and many failed to produce even usable first drafts.
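The post doesn't show the plugin's client code, so the snippet below is only a minimal sketch of what this step amounts to: one chat-completions request to the public OpenAI endpoint, carrying the assembled system prompt plus the collected references as the user message. The class and helper names here are mine, and a real client would use a JSON library and proper error handling.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PromptSender {

    private static final String ENDPOINT = "https://api.openai.com/v1/chat/completions";

    /** Sends the system prompt and the collected reference sources to GPT-4o and returns the raw JSON response. */
    public static String send(String apiKey, String systemPrompt, String userPrompt) throws Exception {
        // Minimal JSON body; a real implementation should build this with a JSON library.
        String body = """
                {
                  "model": "gpt-4o",
                  "messages": [
                    {"role": "system", "content": %s},
                    {"role": "user", "content": %s}
                  ]
                }
                """.formatted(quote(systemPrompt), quote(userPrompt));

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        return response.body(); // the generated class is in choices[0].message.content
    }

    // Naive JSON string escaping, sufficient for this sketch only.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }
}
```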
Below is an actual result of generating an implementation and test class for a ___domain-level API:
Insights
Here are some personal takeaways from building this project.
- **Tasks with shallow reasoning, code or otherwise, will eventually be automated.** In many types of work, if the pattern for a task can be defined within just 2–3 pages of documentation, then GPT-4o is already capable of reducing the burden through partial automation. (Of course, human review is still necessary, at least for now.)
- **LLMs love interfaces.** This became especially clear when defining reference scopes for code generation. If you base your prompt on a concrete implementation, it can pull in an unbounded cascade of dependencies. When you prompt against interfaces instead, the boundary is clear and well-defined, making it easier for the LLM to reason within a stable, constrained scope (see the sketch just after this list).
- **Business process automation pays off.** (Just my opinion.)
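To make that boundary difference concrete, here is a reduced sketch using stubs loosely based on the article's example types (not the real classes). Referencing the interface hands the model one small contract; referencing the implementation pulls in every injected dependency, and everything those types reference in turn.

```java
// Reduced stubs for illustration only; the real classes carry far more detail.
record User(Long id) {}
interface UserAuthRepository { Object findAuthById(Long id); }  // would drag in UserAuthEntity, UserRole, ...
interface UserInfoRepository { Object findInfoById(Long id); }  // would drag in UserInfoEntity, Location, UserType, ...

// Prompt scope when referencing the interface: one contract and its return type.
interface UserReader {
    User read(Long userId);
}

// Prompt scope when referencing the implementation: the contract PLUS every injected
// dependency, each of which brings its own entities and enums into the prompt.
class DefaultUserReader implements UserReader {
    private final UserAuthRepository userAuthRepository;
    private final UserInfoRepository userInfoRepository;

    DefaultUserReader(UserAuthRepository auth, UserInfoRepository info) {
        this.userAuthRepository = auth;
        this.userInfoRepository = info;
    }

    @Override
    public User read(Long userId) {
        // Stand-in logic so the stub compiles; not the article's real behavior.
        if (userAuthRepository.findAuthById(userId) == null || userInfoRepository.findInfoById(userId) == null) {
            return null;
        }
        return new User(userId);
    }
}
```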
Links
If you find the project useful or interesting, a star on GitHub would be appreciated.
You're welcome to download it and try it out.
Also, I realize that the reference collection mechanism might still feel a bit complex from the user's perspective.
If you have any ideas or suggestions to make it better, I'd love to hear them.