Practical Guide · Building a Memory-Powered AI Writing Partner (Part 1): Multi-Agent Architecture Evolution

Shengxu included in AI DevOps

2026-01-25 About 2100 words 10 minutes

When writing a long novel, the most painful part isn’t “not being able to write”—it’s “forgetting what you’ve already written.” Did I set up that foreshadowing properly? Was that character already injured in the last chapter? When exactly was that world-building rule established? Once your manuscript crosses the hundreds-of-thousands-of-words mark, relying solely on your brain and scattered notes quickly becomes unmanageable.

FantasyNovelAgent grew out of this exact need. It started as a simple Python script, then evolved to include dynamic memory and auto-archiving, later added multi-device sync, and is now taking its first steps toward a front-end/back-end separation with cloud-native storage. This article retraces that evolution path and explains the key trade-offs, offering a reference for similar projects.

If you’d like to try the project yourself, an online demo is available (feel free to test it). To prevent abuse and runaway costs, the demo requires you to enter your own LLM API key in the settings before it will call a model.

Demo online: Model configuration and writing workspace

1. Core Features: How AI Writes Like a Partner

Before diving into the technical architecture, let’s look at what it can do. FantasyNovelAgent is not a simple “continuation tool”; it’s more like a “writing studio” staffed by multiple experts.

1.1 Brainstorming

When you’re stuck, click “Auto Brainstorm.” The system will analyze the plot direction of the last 10 chapters, unresolved plot threads (future plans), and world-building settings to provide 3 distinct plot branches. You can choose one or blend their ideas.

1.2 Writing & Polishing

Muse: Handles the “skeleton.” Based on your chosen outline, it quickly generates a ~2000-word first draft, focusing on plot progression and foreshadowing.
Stylist: Handles the “flesh.” It deeply polishes the draft, transforming a bland “he threw a punch” into “the wind howled as his fist shot forward, carrying the force of a thunderbolt…”, ensuring the style matches the “modern xianxia power fantasy” tone.

1.3 Active Memory

This is the project’s killer feature. You don’t need to manually maintain “character sheets” or “inventories.”

The Archivist works silently in the background. After you finish a chapter, it automatically analyzes the text: “The protagonist obtained the ‘Azure Cloud Sword’.” “‘Li Si’ was mortally wounded and died.”
This information is extracted as structured data and stored in the SQLite database. When writing the next chapter, the AI won’t confuse whether the protagonist is holding a sword or a knife.
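The extract-and-store step can be illustrated with a minimal sketch. The `facts` table, its columns, and the function names are assumptions made for illustration, not the project's actual schema, and the extracted triples are hard-coded where the real system would get them from a model.

```python
import sqlite3

# Hypothetical schema: one row per fact extracted from a chapter.
SCHEMA = """
CREATE TABLE IF NOT EXISTS facts (
    id INTEGER PRIMARY KEY,
    chapter INTEGER NOT NULL,
    entity TEXT NOT NULL,
    attribute TEXT NOT NULL,
    value TEXT NOT NULL
)
"""


def save_facts(conn, chapter, facts):
    """Store (entity, attribute, value) triples extracted from one chapter."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO facts (chapter, entity, attribute, value) "
            "VALUES (?, ?, ?, ?)",
            [(chapter, e, a, v) for e, a, v in facts],
        )


def current_value(conn, entity, attribute):
    """Return the most recently recorded value, or None if nothing is known."""
    row = conn.execute(
        "SELECT value FROM facts WHERE entity = ? AND attribute = ? "
        "ORDER BY chapter DESC, id DESC LIMIT 1",
        (entity, attribute),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Hard-coded stand-ins for what an extraction model would return.
save_facts(conn, 12, [
    ("protagonist", "weapon", "Azure Cloud Sword"),
    ("Li Si", "status", "dead"),
])
print(current_value(conn, "protagonist", "weapon"))
```

Keeping every fact with its chapter number, instead of overwriting a single row, lets the latest state be looked up while the history stays traceable.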

Memory bank and recap: Context sources during writing

graph TD
    User[User Input] --> Router{Intent Router}
    Router -->|Writing| Muse[Muse]
    Router -->|Polishing| Stylist[Stylist]
    Router -->|Checking| Guard[Guard]

    Context[(Context Builder)] --> Muse
    Context --> Stylist

    Muse --> Result[Generated Content]
    Result --> Archivist[Archivist]
    Archivist -->|Extract & Update| Memory[(Memory/DB)]
    Memory --> Context

1.4 Logic Guard

Want the protagonist to suddenly learn a forbidden technique from a rival sect? The Guard will immediately warn you: “Detected setting conflict: This forbidden technique requires ‘Demonic Bloodline,’ but the protagonist currently has a ‘Pure Yang Body’.”
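A rule-based sketch shows the kind of check involved. The actual Guard most likely relies on a reasoning model rather than hard-coded rules, so the `Technique` class and `check_learn` function below are purely illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    name: str
    requires: str  # trait a character must have (hypothetical rule format)


def check_learn(character_traits, technique):
    """Return a warning if the required trait is missing, otherwise None."""
    if technique.requires in character_traits:
        return None
    have = ", ".join(sorted(character_traits)) or "no recorded traits"
    return (
        f"Detected setting conflict: {technique.name} requires "
        f"'{technique.requires}', but the character currently has: {have}."
    )


forbidden = Technique("Forbidden technique", requires="Demonic Bloodline")
warning = check_learn({"Pure Yang Body"}, forbidden)
print(warning)
```

The useful property is that the check runs against recorded facts rather than against whatever the drafting model happens to remember.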

1.5 LLM Strategy

To achieve the best results, I didn’t bind to a single model. Instead, I adopted a “best tool for the job” strategy:

Task Type	Recommended Model	Reason
Logic Check / Complex Reasoning	DeepSeek R1 / OpenAI o1	These “reasoning” models perform long chain-of-thought (CoT) thinking before outputting, making them excellent for finding plot holes or designing complex intellectual battles.
Drafting / Polishing	Claude 3.5 Sonnet / GPT-4o	Excellent prose, natural language flow, especially good at environmental descriptions and emotional rendering.
Memory Extraction / Summarization	Gemini Flash / DeepSeek V3	Fast, low cost, large context window, ideal for processing large amounts of text for analysis.
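In code, the table above could reduce to a small lookup with a fallback. The task keys, the default model, and the `pick_model` helper are assumptions for illustration; the model strings are display labels from the table, not exact API identifiers.

```python
# Task keys are hypothetical; model names are labels from the table above.
MODEL_BY_TASK = {
    "logic_check": "DeepSeek R1",
    "drafting": "Claude 3.5 Sonnet",
    "polishing": "Claude 3.5 Sonnet",
    "memory_extraction": "Gemini Flash",
    "summarization": "Gemini Flash",
}
DEFAULT_MODEL = "GPT-4o"  # assumed fallback for unknown task types


def pick_model(task, overrides=None):
    """Choose a model for a task, letting user settings override the defaults."""
    overrides = overrides or {}
    return overrides.get(task, MODEL_BY_TASK.get(task, DEFAULT_MODEL))


print(pick_model("logic_check"))
print(pick_model("drafting", overrides={"drafting": "GPT-4o"}))
```

Keeping the mapping as data rather than scattering model names through the agents makes it cheap to swap a model when prices or quality change.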

LLM Management: Single model vs. multiple model switching

2. Architecture Evolution: From Files to Database

In the project’s early days, to quickly validate the idea, I used the simplest “file system storage” approach.

Chapters: Each chapter was a .txt file.
Memory: Character cards, world settings, and plot outlines were stored as character_db.json, world_settings.md, etc.
Advantages: Extremely fast development, Git-friendly version control, human-readable.
Disadvantages: As the number of chapters grew (e.g., to chapter 100), the data/ directory became cluttered with hundreds of small files. File I/O became frequent, and complex queries (like “search all chapters mentioning ‘Azure Cloud Sword’”) were difficult.

3. Feature Completion and Automation

As the core logic solidified, I introduced more engineering features:

Intent Router: Routes user commands in natural language (“Write a fight scene for me” vs. “Check this chapter for bugs”) to the appropriate Agent.
Usage Tracking: Integrated token consumption statistics for clear cost visibility.
Auto-Archiving: When the user clicks “Save,” the system not only writes the file but also triggers a series of background tasks—updating the summary chain, checking future plan completion, etc.
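As a rough sketch of the routing idea, a keyword-based router might look like the following. The keyword lists and agent keys are assumptions; a production router would more plausibly ask a model to classify the command.

```python
# Hypothetical keyword rules, checked in order; the first match wins.
ROUTES = [
    ("guard", ("check", "bug", "conflict", "consistency")),
    ("stylist", ("polish", "rewrite", "refine")),
    ("muse", ("write", "draft", "continue")),
]


def route(command, default="muse"):
    """Map a natural-language command to an agent name."""
    text = command.lower()
    for agent, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return agent
    return default


print(route("Write a fight scene for me"))
print(route("Check this chapter for bugs"))
```

The rule order matters: "rewrite" is tested under the stylist before "write" is tested under the muse, so a polishing request is not mistaken for a drafting one.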

4. Deployment: Putting AI on a Raspberry Pi

To enable writing anytime, anywhere, I deployed the project on my home Raspberry Pi.

Tunneling: Used Cloudflare Tunnel for secure access via a custom domain without needing a public IP.
Automated Ops: Wrote systemd service scripts for auto-start on boot and process monitoring.
One-Click Deploy: Developed a deploy.sh script. After writing code on my Mac, a single command automatically handles Git commit, code sync (Rsync), and remote service restart.

5. Key Turning Point: SQLite Architecture Refactoring

This was the most significant recent change to the underlying layers of the system.

As the drawbacks of the “file-as-database” model became increasingly apparent, I decided to introduce SQLite.

5.1 Why Change?

Data Integrity: The file system lacks transaction support; a write interruption could corrupt JSON files.
Query Capability: I needed more powerful retrieval to support the AI’s “long-term memory.”
Deployment Complexity: Syncing 1000 small files is far more error-prone than syncing a single .db file.
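The data-integrity point is easy to demonstrate with Python's built-in `sqlite3` module. The two tables below are invented for the demonstration and are not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chapters (number INTEGER PRIMARY KEY, body TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE summaries (number INTEGER PRIMARY KEY, summary TEXT NOT NULL)"
)
conn.commit()


def save_chapter(conn, number, body, summary):
    """Write chapter and summary together: both persist, or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO chapters VALUES (?, ?)", (number, body))
        conn.execute("INSERT INTO summaries VALUES (?, ?)", (number, summary))


save_chapter(conn, 1, "Chapter one text", "Hero leaves home")
try:
    # summary=None violates NOT NULL, so the chapter insert is rolled back too.
    save_chapter(conn, 2, "Chapter two text", None)
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM chapters").fetchone()[0]
print(count)  # → 1
```

With separate JSON and text files, the equivalent failure would leave a chapter on disk without its summary. Here the half-finished write simply disappears.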

5.2 Refactoring Plan

I designed an abstract Storage Layer:

Interface-based: Decoupled the business logic in memory_manager.py from the underlying I/O.
Data Migration: Wrote scripts to seamlessly import old JSON/TXT data into novel.db.
Hybrid Architecture:
- Core Data (chapters, memories, drafts) → SQLite
- Config & Logs (API Keys, Logs) → Separate JSON files (easier for Git to ignore and for log rotation)
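One way to picture the storage layer is an interface plus a SQLite implementation, with migration written against the interface only. `ChapterStore`, `SQLiteChapterStore`, and `migrate` are hypothetical names for this sketch; the actual contents of memory_manager.py are not shown in this article.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class ChapterStore(Protocol):
    """Interface the business logic depends on (method names are assumed)."""

    def save_chapter(self, number: int, body: str) -> None: ...
    def load_chapter(self, number: int) -> Optional[str]: ...


class SQLiteChapterStore:
    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS chapters "
            "(number INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )

    def save_chapter(self, number: int, body: str) -> None:
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO chapters (number, body) VALUES (?, ?)",
                (number, body),
            )

    def load_chapter(self, number: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT body FROM chapters WHERE number = ?", (number,)
        ).fetchone()
        return row[0] if row else None


def migrate(chapters: Dict[int, str], store: ChapterStore) -> int:
    """Copy {number: body} pairs (for example, read from old .txt files)."""
    for number, body in sorted(chapters.items()):
        store.save_chapter(number, body)
    return len(chapters)


store = SQLiteChapterStore()
migrated = migrate({1: "Chapter one text", 2: "Chapter two text"}, store)
print(migrated, store.load_chapter(2))
```

Because `migrate` only sees the interface, the same function would work unchanged against a later cloud-backed store.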

5.3 Bidirectional Sync Flow

To prevent the disaster of “writing new chapters on the Raspberry Pi, only to have them overwritten by an outdated deployment from the Mac,” I added protection against data rollback to the deployment script:

Sync Back: Before deployment, the script pulls the latest novel.db from the Raspberry Pi to the local machine.
Backup: Automatically commits the pulled data to a private repository for backup.
Push: Only pushes the new code to the Raspberry Pi after ensuring data safety.

sequenceDiagram
    participant Mac as Local Mac
    participant GitHub as Backup Repo
    participant Pi as Raspberry Pi

    Note over Mac: Run deploy.sh
    Mac->>Pi: 1. Pull remote data (Sync Back)
    Pi-->>Mac: Return latest novel.db
    Mac->>GitHub: 2. Backup data
    Mac->>Pi: 3. Push new code & DB (Rsync)
    Mac->>Pi: 4. Restart service (Systemd)
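The ordering in the sequence above can be sketched as a function that builds the command list without running anything. The host, paths, and service name are placeholders, and the actual contents of deploy.sh are not shown in this article. This sketch also assumes the code push excludes novel.db, so the copy on the Pi is never overwritten.

```python
import shlex


def deploy_plan(host, remote_dir, service, local_dir=".", backup_dir="backup"):
    """Return the ordered commands: sync back, back up, push code, restart."""
    remote_db = f"{host}:{remote_dir}/novel.db"
    return [
        # 1. Sync back: pull the latest database from the Pi first.
        ["rsync", "-az", remote_db, f"{backup_dir}/novel.db"],
        # 2. Backup: commit the pulled database to the backup repository.
        ["git", "-C", backup_dir, "add", "novel.db"],
        ["git", "-C", backup_dir, "commit", "-m", "Backup novel.db before deploy"],
        ["git", "-C", backup_dir, "push"],
        # 3. Push: send code only (assumption: the remote database is left alone).
        ["rsync", "-az", "--exclude", "novel.db",
         f"{local_dir}/", f"{host}:{remote_dir}/"],
        # 4. Restart the service on the Pi.
        ["ssh", host, "sudo", "systemctl", "restart", service],
    ]


# Placeholder host, path, and service name.
plan = deploy_plan("pi@raspberrypi.local", "/home/pi/app", "novel-agent")
for command in plan:
    print(shlex.join(command))
```

A runner built on `subprocess.run` would need to tolerate `git commit` exiting non-zero when the database has not changed since the last backup.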

6. Transition Phase: Front-End/Back-End Separation (The Great Decoupling)

Before moving towards a more “service-oriented” architecture, I realized the current Streamlit monolith was becoming bloated: UI rendering, business logic, and database operations were all crammed into one entry point.

To support potential future mobile apps or multi-user collaboration, I planned a front-end/back-end separation:

Backend API-ification: Introduced FastAPI to encapsulate the capabilities of Agents like Muse and Guard into standard REST interfaces (e.g., /api/v1/brainstorm).
Lightweight Frontend: Streamlit is reduced to a pure “frontend panel,” responsible only for display and sending requests; it could be replaced by React/Vue in the future.
Independent Deployment: The backend can run independently in a Docker container, serving multiple frontends.
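The boundary itself can be sketched without any web framework: a typed request, a typed response, and a service function with no UI code in it. All names here are hypothetical. A FastAPI route for /api/v1/brainstorm would then only parse the request, call this function, and serialize the result.

```python
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class BrainstormRequest:
    recent_summaries: List[str]
    open_threads: List[str]
    branches: int = 3


@dataclass
class BrainstormResponse:
    branches: List[str]


def brainstorm_service(
    req: BrainstormRequest, generate: Callable[[str], List[str]]
) -> BrainstormResponse:
    """Build the prompt, call the injected generator, and trim the result."""
    prompt = "\n".join(
        ["Recent chapters:", *req.recent_summaries,
         "Open threads:", *req.open_threads]
    )
    return BrainstormResponse(branches=generate(prompt)[: req.branches])


def fake_generate(prompt: str) -> List[str]:
    """Stand-in for the model call so the sketch runs offline."""
    return [f"Branch {i}" for i in range(1, 6)]


request = BrainstormRequest(["Ch. 41: duel at the gate"], ["Who stole the seal?"])
response = brainstorm_service(request, fake_generate)
print(asdict(response))
```

Injecting the generator keeps the service testable without network access, and the same function can serve Streamlit today and another frontend later.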

While this step doesn’t involve changes to the underlying storage, it’s a crucial springboard for the system to evolve from a “script” to a “platform.” Once the boundaries are clear, the system can more naturally expand towards capabilities like multi-tenancy, permission isolation, canary releases, and async tasks.

7. Future Outlook: Cloud Native Architecture

Phase 2: Retrieval Upgrade (SQLite + Vector Retrieval Dual System)

As the manuscript grows longer, simply “remembering facts” is not enough. The system needs to both maintain structured facts (who holds what, who is injured, which rules are active) and perform fuzzy recall during writing (similar passages, atmospheric text, foreshadowing/memory triggers, character voice consistency). Therefore, I define the next phase’s goal as a SQLite + Vector Retrieval Dual System:

SQLite continues to handle “facts and structured memory”: Verifiable, traceable data like character states, settings, and timelines that can be used for constraint checking.
Vector Retrieval handles “fuzzy recall”: Similar fragments, related dialogues, writing references for similar scenes, and semantically related content that can activate “foreshadowing/memories.”

The corresponding deliverables will be more engineering-focused and iterable:

A Pluggable Retrieval Module: Exposes a unified interface retrieve(query) -> passages[] to the upper layers, with swappable underlying implementations (built-in SQLite / sidecar index / remote vector store).
Context Assembly Rules: For writing/polishing/Q&A, the context is assembled uniformly with the priority: “structured facts + vector retrieval snippets (TopK) + recent chapters,” ensuring both reliability and inspiration.
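A minimal sketch of these two deliverables, assuming passages are plain strings. `KeywordRetriever` is a toy stand-in for a real vector search, and the section titles in `assemble_context` are invented for illustration.

```python
from typing import List, Protocol


class Retriever(Protocol):
    def retrieve(self, query: str) -> List[str]: ...


class KeywordRetriever:
    """Toy implementation: ranks passages by how many query words they contain."""

    def __init__(self, passages: List[str], top_k: int = 2) -> None:
        self.passages = passages
        self.top_k = top_k

    def retrieve(self, query: str) -> List[str]:
        words = set(query.lower().split())
        scored = [(sum(w in p.lower() for w in words), p) for p in self.passages]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [p for _, p in ranked[: self.top_k]]


def assemble_context(facts, retriever, query, recent_chapters):
    """Order the context: structured facts, then TopK snippets, then recent text."""
    sections = [
        ("Structured facts", facts),
        ("Retrieved snippets", retriever.retrieve(query)),
        ("Recent chapters", recent_chapters),
    ]
    lines = []
    for title, items in sections:
        if items:
            lines.append(f"## {title}")
            lines.extend(items)
    return "\n".join(lines)


retriever = KeywordRetriever([
    "The sword hummed in its sheath.",
    "Rain fell on the market.",
    "He drew the sword at dawn.",
])
context = assemble_context(
    ["protagonist.weapon = Azure Cloud Sword"], retriever, "sword", ["Ch. 41 summary"]
)
print(context)
```

Any object with a matching `retrieve` method can replace `KeywordRetriever`, which is what makes the underlying implementation swappable.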

For incremental implementation, I’ll follow a “close the loop locally first, then swap the backend” path:

Start Local: Add an embeddings table to SQLite or use a sidecar file index to first close the “vector retrieval loop,” validating chunking strategies, retrieval quality, and context assembly strategies.
Then Replace: When multi-device/multi-user/larger scale is needed, migrate to systems like pgvector/Milvus/Pinecone that are better suited for online retrieval and concurrency.
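The "start local" step might look like the sketch below: vectors stored as JSON text in an `embeddings` table and ranked by brute-force cosine similarity. The table layout is an assumption, and the three-dimensional vectors are toy values standing in for real embedding output.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings "
    "(id INTEGER PRIMARY KEY, passage TEXT NOT NULL, vector TEXT NOT NULL)"
)


def add_passage(conn, passage, vector):
    with conn:
        conn.execute(
            "INSERT INTO embeddings (passage, vector) VALUES (?, ?)",
            (passage, json.dumps(vector)),
        )


def top_k(conn, query_vector, k=2):
    """Score every stored passage and return the k most similar."""
    rows = conn.execute("SELECT passage, vector FROM embeddings").fetchall()
    scored = [(cosine(query_vector, json.loads(v)), p) for p, v in rows]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in scored[:k]]


# Toy vectors; a real system would obtain these from an embedding model.
add_passage(conn, "Sword duel on the cliff", [0.9, 0.1, 0.0])
add_passage(conn, "Tea house gossip", [0.0, 0.2, 0.9])
add_passage(conn, "Forging the blade", [0.8, 0.3, 0.1])
results = top_k(conn, [1.0, 0.0, 0.0], k=2)
print(results)
```

A full scan like this is only reasonable for a single-user corpus, which is the point at which moving to a dedicated vector store starts to pay off.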

Here are two design principles I believe must be upheld:

Chunking Strategy Matters More Than “Which Vector Store”: Chunking by paragraph, event, or dialogue often yields significantly better retrieval usability than chunking by a fixed word count (especially for tasks like “character voice consistency” and “foreshadowing payoff”).
Fact Priority (Conflict Resolution): When a vector-retrieved snippet conflicts with a structured fact from SQLite, the SQLite fact takes precedence. Vector retrieval provides inspiration and context, not the “source of truth” for modifying the world’s facts.
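To make the chunking principle concrete, here is a small sketch that compares paragraph-based and fixed-size chunking. The `max_chars` limit and both function names are arbitrary choices for illustration.

```python
import re


def chunk_by_paragraph(text, max_chars=800):
    """Split on blank lines, then merge neighbouring paragraphs up to max_chars."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk at a paragraph boundary
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def chunk_fixed(text, size=800):
    """Baseline for comparison: cut every `size` characters, even mid-sentence."""
    return [text[i:i + size] for i in range(0, len(text), size)]


sample = "Para one.\n\nPara two is here.\n\nPara three."
print(chunk_by_paragraph(sample, max_chars=20))
print(chunk_fixed(sample, size=20))
```

The paragraph version never splits inside a paragraph, so a retrieved chunk always carries a complete piece of dialogue or description. A paragraph longer than `max_chars` is kept whole rather than cut.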

Phase 3: Cloud Native Prototype (Database + Object Storage)

SQLite is just the first step. As the novel grows to millions of words, I still plan for a “Database + Object Storage” architecture:

Data Type	Storage Solution	Reason
Metadata/Index	Cloudflare D1 / AWS RDS	Chapter lists, character relationship graphs, etc., require high-frequency, complex structured queries.
Content/Materials	Cloudflare R2 / AWS S3	Novel text and illustrations are large but have simple read/write patterns; separating storage significantly reduces database load.

To make “multi-device writing + multi-device sync” truly reliable, the core of the next phase will no longer be “can it generate,” but “can it stably govern creative assets long-term”: data consistency, backup and rollback, permissions and auditing, cost and observability will gradually become the main themes of architectural evolution.

Conclusion

The evolution of FantasyNovelAgent is also a microcosm of a developer’s journey from “just make it work” to “pursuing architectural beauty.” Every refactoring is aimed at making the AI assistant more stable and smarter, allowing me to focus on the most important thing—telling a good story.
