How to Test a Complex Full-Stack Digital Journal

A few days ago, I ran an experiment with an AI-powered testing agent that lets you write test cases in plain English instead of code. I opened its natural language interface and typed four simple sentences to test google.com:

1. Go to google.com
2. There should be a long input field on the page
3. Type something and verify suggestions appear in a dropdown
4. The input field should not have any placeholder text

[Image: KaneAI's natural language test authoring interface, showing a text input field with the prompt "What do you want to test today?"]

A real browser opened Google, found the search bar, typed a query, checked for the autocomplete dropdown, and verified there was no placeholder, all from those four lines.

No Playwright selectors. No page.getByRole(). No CSS class names. Just plain English describing what a user would do.

That made me curious: what happens if I try this on something actually complex? So I tested my own full-stack app's auth endpoint the same way:

Send a GET request to /api/auth/status without any session cookie. Verify it returns 401.

Within 15 seconds, done.

The same test took me an hour to set up manually, building a session helper, separating my Express app from the server startup, seeding a test database, just so I could write five lines of Supertest code.

I ended up testing my entire application both ways: the traditional manual approach and the AI-assisted approach. Same endpoints, same assertions, completely different experience. This article is about what I learned.

But before I get into how I tested it, let's talk about what actually matters: the testing concepts themselves. Because no approach, manual or automated, will save you time or energy if you don't understand what you're testing and why.

What we'll cover:

Prerequisites
How Testing Actually Works in Full-Stack Apps
What Made This Hard
The Manual Approach
The AI-Assisted Approach
When to Use Which Approach
Conclusion

Prerequisites

To get the most out of this article, you should have a basic understanding of JavaScript and Node.js, along with some familiarity with React and Express.

Experience writing simple tests with any JavaScript testing framework like Jest or Vitest will be helpful, though I'll explain the core testing concepts as we go.

You should also have Node.js installed on your machine. If you want to follow along with the manual testing examples, you'll need Vitest (or Jest) for unit and API tests, Supertest for HTTP endpoint testing, and Playwright for end-to-end browser tests. For the AI-assisted approach, I used KaneAI by LambdaTest, which you can explore through their platform.

How Testing Actually Works in Full-Stack Apps

If you've only tested isolated React components or written a few unit tests for utility functions, full-stack testing feels like a different sport. The concepts are the same, but the complexity jumps dramatically. Here's what you actually need to know.

Three Layers, Three Different Jobs

Every full-stack application has three natural testing layers, and trying to cover everything with just one of them leads to either fragile tests or blind spots.

Unit Tests

Unit tests check that individual functions return the right output for a given input. They don't touch the database, the network, or the browser.

They run in milliseconds. If your function takes a string and returns a formatted slug, a unit test calls that function and checks the result. That's it.

it("converts a title to a slug", () => {
  expect(slugify("My First Post")).toBe("my-first-post");
});
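The slugify function itself isn't shown here. As a rough sketch only, assuming ASCII titles, a function that would satisfy this test might look like the following (the implementation is hypothetical, not the one from any particular project):

```javascript
// Hypothetical slugify: lowercases the title, collapses every run of
// non-alphanumeric characters into a single "-", and trims stray dashes.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

console.log(slugify("My First Post")); // my-first-post
```

Because the function has no dependencies, the unit test needs no setup at all, which is exactly why this layer is the cheapest one to cover.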

API Tests

API tests check that your backend endpoints return the right responses. They send real HTTP requests to your Express (or Next.js) app and verify the status codes, response shapes, and error handling.

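If your /api/auth/status endpoint should return 401 without a session cookie, an API test confirms that contract.

import request from "supertest";
import app from "./app"; // assumes app.ts default-exports the Express instance

it("returns 401 without session cookie", async () => {
  const res = await request(app).get("/api/auth/status");
  expect(res.status).toBe(401);
});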

End-to-end (E2E) Tests

End-to-end (E2E) tests open a real browser and interact with your app the way a user would. They click buttons, fill forms, navigate pages, and check that the right things appear on screen.

If your login flow should redirect to a dashboard after authentication, an E2E test walks through that entire journey.

import { test, expect } from "@playwright/test";

test("login redirects to dashboard", async ({ page }) => {
  await page.goto("/");
  await page.getByTestId("username-input").fill("ajay");
  await page.getByTestId("password-input").fill("password123");
  await page.getByTestId("login-button").click();
  await expect(page.getByTestId("dashboard")).toBeVisible();
});

The Pain Points Nobody Warns You About

Tutorials make all three layers look straightforward. In practice, each one has a trap.

First, we have the session problem. Most real apps have authentication. To test any authenticated endpoint, you need a valid session.

That means you need a helper function that logs in a test user, extracts the session cookie from the Set-Cookie header, and returns it for future requests.

This sounds simple. It took me an hour to build one that actually works with express-session. Every project reinvents this wheel.
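The fiddly part of such a helper is usually pulling the right cookie out of the login response. Here is a minimal sketch of just that step. It assumes express-session's default cookie name, connect.sid, and that the Set-Cookie values arrive as an array of strings; the function name extractSessionCookie is made up for illustration.

```javascript
// Hypothetical helper: given the Set-Cookie header values from a login
// response, return the "name=value" pair to send back in a Cookie header.
// Assumes express-session's default cookie name, "connect.sid".
function extractSessionCookie(setCookieHeaders, cookieName = "connect.sid") {
  const headers = Array.isArray(setCookieHeaders)
    ? setCookieHeaders
    : [setCookieHeaders].filter(Boolean);

  for (const header of headers) {
    // A Set-Cookie value looks like "name=value; Path=/; HttpOnly".
    // Only the first segment is sent back to the server.
    const pair = header.split(";")[0].trim();
    if (pair.startsWith(`${cookieName}=`)) {
      return pair;
    }
  }
  throw new Error(`No "${cookieName}" cookie found in Set-Cookie headers`);
}

const example = extractSessionCookie([
  "theme=dark; Path=/",
  "connect.sid=s%3Aabc123.signature; Path=/; HttpOnly",
]);
console.log(example); // connect.sid=s%3Aabc123.signature
```

In a Supertest-based helper, the returned string is what you would pass to .set("Cookie", ...) on later requests. Throwing when the cookie is missing makes a broken login fail loudly instead of producing confusing 401s further down the test file.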

Then we have the app vs. server separation issue. Supertest (the most popular API testing library) needs to import your Express app without starting a real server.

If your app.ts file has app.listen(3000) at the bottom, importing it in a test starts a real server on port 3000, and your tests will crash with a port conflict when test files run in parallel.

You have to separate your app definition from the server startup: app.ts exports the Express instance, and server.ts calls .listen(). It's a three-minute refactor, but nobody tells you about it until your tests fail.
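The shape of that split can be sketched without Express at all. The example below uses Node's built-in http module as a stand-in, so the handler plays the role of the Express instance; the route and the error body are illustrative assumptions, and both "files" are shown in one block for brevity.

```javascript
const http = require("node:http");

// app.js (sketch): defines request handling only.
// Importing this has no side effects and binds no port.
function app(req, res) {
  if (req.method === "GET" && req.url === "/api/auth/status") {
    res.writeHead(401, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ error: "Not authenticated" }));
    return;
  }
  res.writeHead(404, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ error: "Not found" }));
}

// server.js (sketch): the only place that binds a port.
// Production code would call startServer(3000); tests never call it.
function startServer(port) {
  return http.createServer(app).listen(port);
}
```

With this layout, a test file imports only the app and hands it to the testing library, so any number of test files can run in parallel without fighting over a fixed port.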

You also have the SSE and real-time nightmare. If your app uses Server-Sent Events (SSE) or WebSockets, you're testing time-dependent behavior.

You open a connection, trigger an action, and wait for an event to arrive. If the event takes too long, your test times out. If you don't set a timeout, the test hangs forever. You end up writing 30 lines of Promise wrappers, timeout handlers, and cleanup logic for a single assertion.

Finally, there's the selector fragility trap. E2E tests that use CSS selectors (.btn-primary, .card-title) break every time you rename a class.

The fix is to use data-testid attributes: stable identifiers that exist solely for testing and don't change during refactors. But retrofitting them into an existing app means touching dozens of components.

Schema Validation: The Hidden Time Sink

Here's something nobody tells you about API testing. Writing the assertion for "does this endpoint return 200" takes one line.

Writing assertions that verify the shape of the response (every field exists, every field has the right type, every enum value is valid) takes 15 to 20 lines per endpoint. Multiply that across a dozen endpoints and you're spending hours writing boilerplate like:

expect(res.body[0]).toHaveProperty("title");
expect(typeof res.body[0].title).toBe("string");
expect(res.body[0]).toHaveProperty("status");
expect(["open", "closed", "merged"]).toContain(res.body[0].status);

It's important work, though: schema validation catches real bugs when your backend changes a response shape. But the repetitiveness is what makes it a good candidate for automation, which I'll get to later.
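If you stay with hand-written tests, one way to shrink that boilerplate is a small shape-checking helper. This is only a sketch: findShapeErrors and the schema format are invented for illustration, the title and status rules mirror the assertions above, and the labels field is a hypothetical extra.

```javascript
// Hypothetical helper: checks each field of `obj` against a small schema.
// A rule is either a typeof name ("string", "number", ...), the literal
// "array", or an array listing the allowed enum values.
function findShapeErrors(obj, schema) {
  const errors = [];
  for (const [field, rule] of Object.entries(schema)) {
    if (!(field in obj)) {
      errors.push(`${field}: missing`);
    } else if (Array.isArray(rule)) {
      if (!rule.includes(obj[field])) {
        errors.push(`${field}: unexpected value "${obj[field]}"`);
      }
    } else if (rule === "array") {
      if (!Array.isArray(obj[field])) errors.push(`${field}: expected array`);
    } else if (typeof obj[field] !== rule) {
      errors.push(`${field}: expected ${rule}`);
    }
  }
  return errors;
}

const pullRequestSchema = {
  title: "string",
  status: ["open", "closed", "merged"],
  labels: "array", // hypothetical field, not part of the assertions above
};

const good = { title: "Fix login", status: "open", labels: [] };
console.log(findShapeErrors(good, pullRequestSchema)); // []

// Reports two problems: title has the wrong type, status is not allowed.
const bad = { title: 42, status: "draft", labels: [] };
console.log(findShapeErrors(bad, pullRequestSchema).length); // 2
```

Inside a test, a single expect(findShapeErrors(res.body[0], pullRequestSchema)).toEqual([]) then replaces the block of per-field assertions, and a failure message lists every mismatched field at once.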

These aren't edge cases. These are the everyday realities of testing a full-stack app. Knowing them upfront saves you from the "why is this so much harder than the tutorial??" frustration.

What Made This Hard

A few months ago, I wrote a freeCodeCamp article about testing JavaScript apps from unit tests to AI-augmented QA. That article covered testing fundamentals with clean, simple examples.

After publishing it, I kept thinking: what happens when you apply all of this to something messy?

I had the perfect candidate. Creoper (code name) is an AI-powered project management tool I built that connects GitHub with Discord.

Teams can monitor repositories, track pull requests, and query project status using natural language, all without leaving their chat platform.

I built it across two internal hackathons at CreoWis, and it won both times. What started as a simple GitHub-Discord automation bot evolved into a full product with five interconnected components:

[Image: Architecture diagram of Creoper showing six interconnected components: React dashboard, Express backend, Discord bot, PostgreSQL database, GitHub webhook handlers, and LLM layer.]

It has a React dashboard with GitHub OAuth. An Express backend with REST APIs and SSE. A Discord bot that processes natural language through an LLM intent detection layer. PostgreSQL with Prisma. GitHub webhook handlers.

But here's the thing: despite winning two hackathons, Creoper had zero test cases. The app wasn't even deployed yet. I'd been stuck on Railway monorepo deployment issues for weeks.

So I was staring at a system that had every real-world testing challenge I'd just written about (auth flows, real-time events, multiple integration points, complex business logic) and no safety net at all.

I decided to test it two different ways and document what actually happened. If you want to explore the full project, I've written two separate blogs about how I built it.

The Manual Approach

I mapped pure logic components like the intent parser and embed builder to unit tests, since they deal with straightforward input-output behavior. I assigned Express endpoints to API tests using Supertest, which let me send real HTTP requests and verify response codes and shapes.

I planned to cover the React dashboard with end-to-end tests using Playwright, simulating actual user interactions in a real browser. As for Discord bot interactions and webhook delivery, those couldn't be automated reliably yet, so I documented them and tested them manually.

Here's what each layer looked like in practice.

Unit Tests: The Easy Win

Creoper has a function that classifies Discord messages into structured intents. If someone types "list prs," it should return LIST_PRS with a high confidence score.

If the message is gibberish, it should return UNKNOWN with zero confidence. The confidence score matters because anything below a threshold triggers a safe fallback instead of executing an action.

it("detects LIST_PRS intent", () => {
  const result = parseIntent("list prs");
  expect(result.action).toBe("LIST_PRS");
  expect(result.confidence).toBeGreaterThan(0.8);
});

it("returns low confidence when repo name is missing", () => {
  const result = parseIntent("set active repo");
  expect(result.confidence).toBeLessThan(0.8);
});

Notice these aren't just "does it work" checks. They're testing a safety mechanism: the threshold between executing an action and falling back.

These are exactly the kinds of tests that need to be written by hand because you have to understand the business logic behind the numbers.
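To make the mechanism concrete, here is a toy sketch of a parser plus the gate that sits in front of it. Everything in it is an assumption for illustration: the keyword rules, the scores, the SET_ACTIVE_REPO action name, and the decide function are invented, and the 0.8 threshold is simply taken from the tests above. The real parser is not shown in this article.

```javascript
// Assumption: 0.8 matches the threshold used in the tests above.
const CONFIDENCE_THRESHOLD = 0.8;

// Toy stand-in for the real parser: keyword rules with made-up scores.
function parseIntent(message) {
  const text = message.trim().toLowerCase();
  if (/^list (prs|pull requests)$/.test(text)) {
    return { action: "LIST_PRS", confidence: 0.95 };
  }
  const match = text.match(/^set active repo(?:\s+(\S+))?$/);
  if (match) {
    // Without a repo name the intent is recognized but incomplete.
    return match[1]
      ? { action: "SET_ACTIVE_REPO", repo: match[1], confidence: 0.9 }
      : { action: "SET_ACTIVE_REPO", confidence: 0.4 };
  }
  return { action: "UNKNOWN", confidence: 0 };
}

// The safety gate: act only when the parser is confident enough.
function decide(message) {
  const intent = parseIntent(message);
  return intent.confidence >= CONFIDENCE_THRESHOLD
    ? { type: "execute", intent }
    : { type: "fallback", intent };
}

console.log(decide("list prs").type);        // execute
console.log(decide("set active repo").type); // fallback
console.log(decide("asdf qwer").type);       // fallback
```

The interesting test cases live at the boundary: a message that is recognized but incomplete must land on the fallback side, and only someone who knows why the threshold exists will think to write that case.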

I also tested the Discord embed builder the same way. Give it push event data, check that the formatted message contains the right repo name, author, branch, and commit messages.

Pure input, pure output, no external dependencies. Unit tests ran in milliseconds and caught edge cases like empty commit arrays immediately.
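A sketch of what such a builder might look like is below. The input field names follow GitHub's push webhook payload (repository.full_name, pusher.name, ref, commits[].message); the function name, the output shape, and the wording of the empty-push message are assumptions, not Creoper's actual code.

```javascript
// Hypothetical embed builder: turns a GitHub push payload into a plain
// object loosely modeled on a Discord embed (title, description, fields).
function buildPushEmbed(event) {
  const branch = event.ref.replace("refs/heads/", "");
  const commits = event.commits ?? [];

  // Show only the first line of each commit message; handle empty pushes.
  const description = commits.length > 0
    ? commits.map((c) => `- ${c.message.split("\n")[0]}`).join("\n")
    : "No commits in this push.";

  return {
    title: `${event.repository.full_name}: push to ${branch}`,
    description,
    fields: [
      { name: "Author", value: event.pusher.name },
      { name: "Commits", value: String(commits.length) },
    ],
  };
}

const embed = buildPushEmbed({
  ref: "refs/heads/main",
  repository: { full_name: "atechajay/no-javascript" },
  pusher: { name: "ajay" },
  commits: [],
});
console.log(embed.title);       // atechajay/no-javascript: push to main
console.log(embed.description); // No commits in this push.
```

The empty commits array is the kind of edge case a unit test surfaces in milliseconds, long before a real webhook delivers one.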

API Tests: Where the Friction Starts

Testing the Express endpoints required the infrastructure work I described earlier. I separated app.ts from server.ts, built the createTestSession() helper, and set up an in-memory test database so tests wouldn't touch real data.

it("returns 401 without session cookie", async () => {
  const res = await request(app).get("/api/auth/status");
  expect(res.status).toBe(401);
  expect(res.body).toHaveProperty("error");
});

it("returns user data with valid session", async () => {
  const cookie = await createTestSession();
  const res = await request(app)
    .get("/api/auth/status")
    .set("Cookie", cookie);
  expect(res.status).toBe(200);
  expect(res.body).toHaveProperty("username");
  expect(res.body).not.toHaveProperty("accessToken");
});

Five lines of test code, one hour of infrastructure to make those five lines work.

Then I had to repeat this pattern across every endpoint: repos, pull requests, issues, active repo configuration, each with happy path, error cases, and the tedious schema validation I mentioned earlier.

The SSE test was the worst. I needed a Promise wrapper, an EventSource connection, a timeout handler, an onopen callback to trigger the change, an event listener to catch the response, and cleanup for both the connection and the server. About 30 lines for a single assertion, and it took three attempts to get the timing right.
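The core of that wrapper, the part that keeps a test from hanging, can be sketched in isolation. The version below works with any EventTarget, which is the interface a browser EventSource implements; a plain EventTarget stands in for the real connection here, and the waitForEvent name and the "repo-changed" event are made up for illustration. Opening the connection and shutting down the server are left out.

```javascript
// Resolves with the first event of `type` dispatched on `target`, or
// rejects if none arrives within `timeoutMs`. Either way the listener
// and the timer are cleaned up, so the test cannot hang.
function waitForEvent(target, type, timeoutMs = 2000) {
  return new Promise((resolve, reject) => {
    const onEvent = (event) => {
      clearTimeout(timer);
      resolve(event);
    };
    const timer = setTimeout(() => {
      target.removeEventListener(type, onEvent);
      reject(new Error(`Timed out after ${timeoutMs}ms waiting for "${type}"`));
    }, timeoutMs);
    target.addEventListener(type, onEvent, { once: true });
  });
}

// Usage with a plain EventTarget standing in for an EventSource connection.
const source = new EventTarget();
waitForEvent(source, "repo-changed", 500).then((event) => {
  console.log(`received ${event.type}`); // received repo-changed
});
source.dispatchEvent(new Event("repo-changed"));
```

Rejecting with a descriptive error on timeout turns a silent hang into a readable failure, which is most of what makes this kind of test bearable to debug.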

E2E Tests: The Full Journey

Playwright's E2E tests were actually pleasant to write once I added data-testid attributes to the React components. The login flow, note creation, editing, and deletion all followed a predictable pattern.

test("login and create a note", async ({ page }) => {
  await page.goto("/");
  await page.getByTestId("username-input").fill("ajay");
  await page.getByTestId("password-input").fill("password123");
  await page.getByTestId("login-button").click();
  await expect(page.getByTestId("username-display")).toContainText("ajay");
});

The real cost wasn't writing the tests. It was maintaining them. Midway through development, I renamed a CSS class from .repo-list-item to .repository-card. Two Playwright tests broke immediately. I found the references, updated them, and re-ran. Ten minutes for a CSS rename. I can see this becoming death by a thousand cuts as the UI evolves.

The AI-Assisted Approach

Now here's the same project, tested with a fundamentally different workflow.

Instead of writing test code, you describe what you want to test in natural language. An AI agent interprets your intent, interacts with the actual application, generates assertions, and produces exportable test code.

The tool I used is KaneAI, a GenAI-native testing agent that covers web UIs, APIs, and mobile apps through natural language test authoring with real browser execution. That's the only background you need. Let me show you the workflow.

API Testing: Describing Instead of Coding

Instead of writing Supertest code, I opened the slash command menu, selected API, and pasted a curl command:

curl -X GET http://localhost:3000/api/auth/status

It fired the request through the tunnel, showed the 401 response, and I added it to my test steps. For the authenticated version, I pasted the same command with a session cookie from DevTools. No createTestSession() helper. No test database. No app separation.

For the repository endpoints, I described the flow in plain English:

1. Set active repository to "atechajay/no-javascript" via POST to /api/repos/active
2. Verify the response confirms the repository is active
3. Fetch open pull requests via GET to /api/repos/pulls
4. Verify each item has title, author, url, and status fields
5. Try an invalid repository name, verify 400 error

It generated assertions for the happy path and added schema validation I didn't ask for, checking that title is a string, labels is an array, and status is one of the expected values. That's the tedious work that ate up hours in the manual approach, generated in seconds.

E2E Testing: Plain English, Real Browser

For the React dashboard, instead of Playwright selectors, I described:

1. Navigate to localhost:3001
2. Click "Go to Dashboard"
3. Verify redirect to GitHub OAuth
4. After auth, verify the dashboard loads
5. Verify the username appears in the sidebar

It executed each step in a real cloud browser connected to my localhost. No page.getByRole(), no page.waitForURL(), no selector debugging.

After each test, I exported the generated code. It came with wait conditions and assertion logic baked in.

It wasn't perfect copy-paste: I updated environment variables, adjusted base URLs, and fixed a few field name mismatches where it expected pullRequestUrl instead of my actual url field. But it gave me roughly 70–80% of the foundation.

The Feature That Surprised Me

Midway through testing, I renamed that CSS class from .repo-list-item to .repository-card. My manual Playwright tests broke immediately.

But the AI tool's auto-healing detected the selector change, found the closest matching element based on the test's original intent, and continued the test with a review flag. No code changes needed.

For a rapidly changing MVP where class names are still in flux, that alone saved significant maintenance time.

When to Use Which Approach

After testing the same project both ways, here's my honest take.

Write tests by hand when you're testing business logic that requires domain understanding. For Creoper's intent parser, I needed to think about what "low confidence" means in the context of the application's safety mechanism.

An AI tool can generate assertions, but it can't understand why a confidence score of 0.5 should trigger a fallback instead of an action. Pure logic with meaningful edge cases is where hand-written tests earn their keep.

You should also write tests by hand when they need to run in CI without external dependencies. Vitest tests with mocked dependencies are self-contained. They run in milliseconds and don't need a tunnel, a cloud browser, or a third-party account.

Hand-written tests are also best when the team needs to maintain them. Hand-written tests are transparent. Generated code, even when exported, can feel opaque to someone who wasn't there when it was authored.

Reach for AI-assisted testing, on the other hand, when your UI changes frequently. For an MVP where CSS classes and component structure are still in flux, auto-healing prevents the "my tests broke because I renamed a div" problem. You spend less time fixing selectors and more time shipping features.

AI-assisted testing is also helpful when you need coverage fast and plan to refine later. The 70–80% foundation is a real boost when you're the only developer and you need coverage now. You can always hand-tune the exported code later.

Never rely solely on either approach to understand your system. No tool knows that an SSE connection drops after 30 seconds if the heartbeat isn't configured. No tool understands that a Discord bot should never execute a write action when confidence is below 0.8. No tool realizes the OAuth callback silently fails if the redirect_uri doesn't match precisely.

The strategy relies on you knowing which endpoints are crucial, identifying dangerous edge cases, and understanding what should occur during failures. The tool simply accelerates how quickly you can articulate and implement that strategy.

Conclusion

My full-stack app won two hackathons. But without tests, it was a house of cards. One renamed CSS class, one changed API response, and the whole system could silently break.

Testing it both ways taught me that "manual vs. AI" is the wrong question. The real skill is matching the approach to the problem.

Write unit tests by hand for business logic. Use AI-assisted testing when you're drowning in repetitive schema validation across a dozen endpoints.

Use auto-healing for E2E tests on a fast-changing UI. And for the things you can't automate yet, like Discord bot interactions or webhook delivery, document them and test them manually until you can.

If you're building something complex and thinking "I'll add tests after I deploy", flip that. Test what you can now. Document what you can't. When deployment day comes, you'll ship with confidence instead of anxiety.

Before We End

I hope you found this article insightful. I’m Ajay Yadav, a software developer and content creator.

You can connect with me on:

Twitter/X and LinkedIn, where I share insights to help you improve 0.01% each day.
Check out my GitHub for more projects.
Check out my Medium page for more blogs.
I also run a YouTube Channel where I share content about careers, software engineering, and technical writing.

See you in the next article — until then, keep learning!