Practical Guide · Building a Memory-Powered AI Writing Partner (Part 1): Multi-Agent Architecture Evolution
When writing a long novel, the most painful part isn’t “not being able to write”—it’s “forgetting what you’ve already written.” Did I set up that foreshadowing properly? Was that character already injured in the last chapter? When exactly was that world-building rule established? Once your manuscript crosses the hundreds-of-thousands-of-words mark, relying solely on your brain and scattered notes quickly becomes unmanageable.
FantasyNovelAgent grew out of this exact need. It started as a simple Python script, then evolved to include dynamic memory and auto-archiving, later added multi-device sync, and is now taking its first steps toward a front-end/back-end separation with cloud-native storage. This article retraces that evolution path and explains the key trade-offs, offering a reference for similar projects.
If you’d like to try the project yourself, there is an online demo (feel free to test it). To prevent abuse and runaway costs, the demo requires you to enter your own LLM API key in the settings before it will actually call the model.

1. Core Features: How AI Writes Like a Partner
Before diving into the technical architecture, let’s look at what it can do. FantasyNovelAgent is not a simple “continuation tool”; it’s more like a “writing studio” staffed by multiple experts.
1.1 Brainstorming
When you’re stuck, click “Auto Brainstorm.” The system will analyze the plot direction of the last 10 chapters, unresolved plot threads (future plans), and world-building settings to provide 3 distinct plot branches. You can choose one or blend their ideas.
1.2 Writing & Polishing
- Muse: Handles the “skeleton.” Based on your chosen outline, it quickly generates a ~2000-word first draft, focusing on plot progression and foreshadowing.
- Stylist: Handles the “flesh.” It deeply polishes the draft, transforming a bland “he threw a punch” into “the wind howled as his fist shot forward, carrying the force of a thunderbolt…”, ensuring the style matches the “modern xianxia power fantasy” tone.
1.3 Active Memory
This is the project’s killer feature. You don’t need to manually maintain “character sheets” or “inventories.”
- The Archivist works silently in the background. After you finish a chapter, it automatically analyzes the text: “The protagonist obtained the ‘Azure Cloud Sword’.” “‘Li Si’ was mortally wounded and died.”
- This information is extracted as structured data and stored in the SQLite database. When writing the next chapter, the AI won’t confuse whether the protagonist is holding a sword or a knife.
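As a rough illustration of what “structured data” could look like here, the sketch below stores Archivist output in a SQLite table keyed by entity and attribute. The `facts` table, its columns, and the helper names are assumptions made for this example, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per (entity, attribute); the latest write wins.
SCHEMA = """
CREATE TABLE IF NOT EXISTS facts (
    entity    TEXT NOT NULL,
    attribute TEXT NOT NULL,
    value     TEXT NOT NULL,
    chapter   INTEGER NOT NULL,
    PRIMARY KEY (entity, attribute)
)
"""


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the memory database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_fact(conn: sqlite3.Connection, entity: str, attribute: str,
                value: str, chapter: int) -> None:
    """Insert a fact, overwriting any earlier value for the same entity/attribute."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO facts (entity, attribute, value, chapter) "
            "VALUES (?, ?, ?, ?)",
            (entity, attribute, value, chapter),
        )


def current_state(conn: sqlite3.Connection, entity: str) -> dict[str, str]:
    """Return every recorded attribute of one entity as a plain dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM facts WHERE entity = ?", (entity,)
    )
    return dict(rows.fetchall())


memory = open_memory()
record_fact(memory, "protagonist", "weapon", "Azure Cloud Sword", chapter=12)
record_fact(memory, "Li Si", "status", "dead", chapter=12)
print(current_state(memory, "protagonist"))
```

Keeping one row per attribute means the context builder can ask a narrow question (“what is the protagonist holding?”) instead of re-reading earlier chapters.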

graph TD
User[User Input] --> Router{Intent Router}
Router -->|Writing| Muse[Muse]
Router -->|Polishing| Stylist[Stylist]
Router -->|Checking| Guard[Guard]
Context[(Context Builder)] --> Muse
Context --> Stylist
Muse --> Result[Generated Content]
Result --> Archivist[Archivist]
Archivist -->|Extract & Update| Memory[(Memory/DB)]
Memory --> Context

1.4 Logic Guard
Want the protagonist to suddenly learn a forbidden technique from a rival sect? The Guard will immediately warn you: “Detected setting conflict: This forbidden technique requires ‘Demonic Bloodline,’ but the protagonist currently has a ‘Pure Yang Body’.”
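The Guard in the article presumably relies on a reasoning model. As a deliberately simplified stand-in, the sketch below shows the deterministic part of such a check: comparing what a technique requires against the traits recorded for a character. The `Technique` class and `find_conflicts` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    name: str
    required_traits: frozenset[str]  # traits a character must have to learn it


def find_conflicts(technique: Technique, character_traits: set[str]) -> list[str]:
    """Return one warning per required trait the character does not have."""
    missing = sorted(technique.required_traits - character_traits)
    current = ", ".join(sorted(character_traits)) or "no recorded traits"
    return [
        f"Detected setting conflict: {technique.name} requires '{trait}', "
        f"but the character currently has: {current}."
        for trait in missing
    ]


forbidden = Technique("Forbidden Technique", frozenset({"Demonic Bloodline"}))
for warning in find_conflicts(forbidden, {"Pure Yang Body"}):
    print(warning)
```

A rule check like this can only catch conflicts that are already encoded as structured facts; subtler contradictions in the prose would still need the model.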
1.5 LLM Strategy
To achieve the best results, I didn’t bind to a single model. Instead, I adopted a “best tool for the job” strategy:
| Task Type | Recommended Model | Reason |
|---|---|---|
| Logic Check / Complex Reasoning | DeepSeek R1 / OpenAI o1 | These “reasoning” models work through a long chain of thought (CoT) before answering, which makes them well suited to finding plot holes or designing elaborate battles of wits. |
| Drafting / Polishing | Claude 3.5 Sonnet / GPT-4o | Excellent prose, natural language flow, especially good at environmental descriptions and emotional rendering. |
| Memory Extraction / Summarization | Gemini Flash / DeepSeek V3 | Fast, low cost, large context window, ideal for processing large amounts of text for analysis. |
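One way to express this strategy in code is a lookup from task type to model candidates. The sketch below uses the display names from the table; treating the listed order as a preference order is an assumption, and the `Task` enum and `pick_model` helper are hypothetical.

```python
from enum import Enum


class Task(Enum):
    LOGIC_CHECK = "logic_check"
    DRAFTING = "drafting"
    MEMORY_EXTRACTION = "memory_extraction"


# Display names as listed in the table. Real API model identifiers differ by
# provider and are deliberately not guessed here.
MODEL_PREFERENCES: dict[Task, list[str]] = {
    Task.LOGIC_CHECK: ["DeepSeek R1", "OpenAI o1"],
    Task.DRAFTING: ["Claude 3.5 Sonnet", "GPT-4o"],
    Task.MEMORY_EXTRACTION: ["Gemini Flash", "DeepSeek V3"],
}


def pick_model(task: Task, available: set[str]) -> str:
    """Return the first listed model for the task that the user has configured."""
    for name in MODEL_PREFERENCES[task]:
        if name in available:
            return name
    raise LookupError(f"no configured model can handle {task.value}")


print(pick_model(Task.DRAFTING, {"GPT-4o", "Gemini Flash"}))
```

Keeping the mapping in data rather than in branching code makes it cheap to swap a model when pricing or quality changes.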

2. Architecture Evolution: From Files to Database
In the project’s early days, to quickly validate the idea, I used the simplest “file system storage” approach.
- Chapters: Each chapter was a `.txt` file.
- Memory: Character cards, world settings, and plot outlines were stored as `character_db.json`, `world_settings.md`, etc.
- Advantages: Extremely fast development, Git-friendly version control, human-readable.
- Disadvantages: As the number of chapters grew (e.g., to chapter 100), the `data/` directory became cluttered with hundreds of small files. File I/O became frequent, and complex queries (like “search all chapters mentioning ‘Azure Cloud Sword’”) were difficult.
3. Feature Completion and Automation
As the core logic solidified, I introduced more engineering features:
- Intent Router: Routes user commands in natural language (“Write a fight scene for me” vs. “Check this chapter for bugs”) to the appropriate Agent.
- Usage Tracking: Integrated token consumption statistics for clear cost visibility.
- Auto-Archiving: When the user clicks “Save,” the system not only writes the file but also triggers a series of background tasks—updating the summary chain, checking future plan completion, etc.
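To make the routing idea concrete, here is a minimal keyword-based sketch. The real router may well use an LLM to classify intent; the keyword lists, agent handlers, and function names below are all assumptions for illustration.

```python
from typing import Callable

Handler = Callable[[str], str]


# Stand-in agents: each just labels the command it received.
def muse(command: str) -> str:
    return f"[Muse] drafting: {command}"


def stylist(command: str) -> str:
    return f"[Stylist] polishing: {command}"


def guard(command: str) -> str:
    return f"[Guard] checking: {command}"


HANDLERS: dict[str, Handler] = {"muse": muse, "stylist": stylist, "guard": guard}

# Checked in order. "rewrite" contains "write", so the stylist keywords must be
# tested before the muse keywords.
KEYWORDS: dict[str, tuple[str, ...]] = {
    "guard": ("check", "bug", "conflict", "consistency"),
    "stylist": ("polish", "rewrite", "refine"),
    "muse": ("write", "draft", "continue"),
}


def route(command: str, default: str = "muse") -> str:
    """Return the name of the agent that should handle a natural-language command."""
    lowered = command.lower()
    for agent, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return agent
    return default


def handle(command: str) -> str:
    """Route the command and run the chosen agent."""
    return HANDLERS[route(command)](command)


print(handle("Write a fight scene for me"))
print(handle("Check this chapter for bugs"))
```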
4. Deployment: Putting AI on a Raspberry Pi
To enable writing anytime, anywhere, I deployed the project on my home Raspberry Pi.
- Tunneling: Used Cloudflare Tunnel for secure access via a custom domain without needing a public IP.
- Automated Ops: Wrote `systemd` service scripts for auto-start on boot and process monitoring.
- One-Click Deploy: Developed a `deploy.sh` script. After writing code on my Mac, a single command automatically handles Git commit, code sync (Rsync), and remote service restart.
5. Key Turning Point: SQLite Architecture Refactoring
This was the most significant recent change, and it reached all the way down to the storage layer.
As the drawbacks of the “file-as-database” model became increasingly apparent, I decided to introduce SQLite.
5.1 Why Change?
- Data Integrity: The file system lacks transaction support; a write interruption could corrupt JSON files.
- Query Capability: I needed more powerful retrieval to support the AI’s “long-term memory.”
- Deployment Complexity: Syncing 1000 small files is far more error-prone than syncing a single `.db` file.
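The data-integrity point can be shown with Python's built-in `sqlite3` module: using the connection as a context manager commits when the block succeeds and rolls back when an exception escapes. The table names and the `fail_midway` switch below are hypothetical; the sketch only simulates an interruption between two related writes.

```python
import sqlite3


def save_chapter(conn: sqlite3.Connection, number: int, body: str,
                 summary: str, fail_midway: bool = False) -> None:
    """Write a chapter and its summary in one transaction: both land, or neither."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO chapters (number, body) VALUES (?, ?)", (number, body)
        )
        if fail_midway:
            raise RuntimeError("simulated interruption between the two writes")
        conn.execute(
            "INSERT INTO summaries (chapter, summary) VALUES (?, ?)",
            (number, summary),
        )


db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE chapters (number INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE summaries (chapter INTEGER PRIMARY KEY, summary TEXT NOT NULL);
    """
)

save_chapter(db, 1, "Chapter one text", "The hero leaves home.")
try:
    save_chapter(db, 2, "Chapter two text", "never stored", fail_midway=True)
except RuntimeError:
    pass  # the half-written chapter 2 has been rolled back

chapter_count = db.execute("SELECT COUNT(*) FROM chapters").fetchone()[0]
summary_count = db.execute("SELECT COUNT(*) FROM summaries").fetchone()[0]
print(chapter_count, summary_count)
```

With separate JSON files, the same interruption would leave a chapter on disk without its summary, and nothing would flag the mismatch.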
5.2 Refactoring Plan
I designed an abstract Storage Layer:
- Interface-based: Decoupled the business logic in `memory_manager.py` from the underlying I/O.
- Data Migration: Wrote scripts to seamlessly import old JSON/TXT data into `novel.db`.
- Hybrid Architecture:
  - Core Data (chapters, memories, drafts) → SQLite
  - Config & Logs (API Keys, Logs) → Separate JSON files (easier for Git to ignore and for log rotation)
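A minimal sketch of what an interface-based storage layer might look like, assuming a `typing.Protocol` as the boundary. The `ChapterStore` interface, both implementations, and `word_count` are hypothetical names; the actual interface in `memory_manager.py` is not shown in this article.

```python
import sqlite3
from typing import Optional, Protocol


class ChapterStore(Protocol):
    """What business logic is allowed to know about storage."""

    def save_chapter(self, number: int, body: str) -> None: ...

    def load_chapter(self, number: int) -> Optional[str]: ...


class SQLiteChapterStore:
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS chapters "
            "(number INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )

    def save_chapter(self, number: int, body: str) -> None:
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO chapters (number, body) VALUES (?, ?)",
                (number, body),
            )

    def load_chapter(self, number: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT body FROM chapters WHERE number = ?", (number,)
        ).fetchone()
        return row[0] if row else None


class InMemoryChapterStore:
    """Drop-in replacement for tests; satisfies the same protocol."""

    def __init__(self) -> None:
        self._chapters: dict[int, str] = {}

    def save_chapter(self, number: int, body: str) -> None:
        self._chapters[number] = body

    def load_chapter(self, number: int) -> Optional[str]:
        return self._chapters.get(number)


def word_count(store: ChapterStore, number: int) -> int:
    """Business logic that depends only on the interface, not on SQLite."""
    body = store.load_chapter(number)
    return len(body.split()) if body else 0


for store in (SQLiteChapterStore(), InMemoryChapterStore()):
    store.save_chapter(1, "The wind howled over the sect gate")
    print(type(store).__name__, word_count(store, 1))
```

Because callers depend only on the protocol, a later move from SQLite to another backend would not need to touch them.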
5.3 Bidirectional Sync Flow
To prevent the disaster of “writing new chapters on the Raspberry Pi, only to have them overwritten by old code on the Mac,” I added data rollback protection to the deployment script:
- Sync Back: Before deployment, the script pulls the latest `novel.db` from the Raspberry Pi to the local machine.
- Backup: Automatically commits the pulled data to a private repository for backup.
- Push: Only pushes the new code to the Raspberry Pi after ensuring data safety.
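The ordering is the important part, so the sketch below only builds the command list and prints it in dry-run mode. It is written in Python rather than shell for consistency with the rest of the project; the host, paths, and service name are hypothetical, and pulling the database straight into a local clone of the backup repository is an assumption.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployConfig:
    remote: str = "pi@raspberrypi.local"      # hypothetical SSH target
    remote_dir: str = "/home/pi/novel-agent"  # hypothetical install path
    backup_repo: str = "../novel-backup"      # hypothetical clone of the private repo
    service: str = "novel-agent"              # hypothetical systemd unit name


def build_plan(cfg: DeployConfig) -> list[list[str]]:
    """Return the deploy steps in the order that keeps remote data safe."""
    remote_db = f"{cfg.remote}:{cfg.remote_dir}/novel.db"
    return [
        # 1. Sync back: pull the latest database before touching anything remote.
        ["rsync", "-az", remote_db, f"{cfg.backup_repo}/novel.db"],
        # 2. Backup: commit the pulled data. Note that `git commit` exits
        #    non-zero when nothing changed; a real script must tolerate that.
        ["git", "-C", cfg.backup_repo, "add", "novel.db"],
        ["git", "-C", cfg.backup_repo, "commit", "-m", "Backup novel.db before deploy"],
        # 3. Push: only now send the new code to the Pi.
        ["rsync", "-az", "./", f"{cfg.remote}:{cfg.remote_dir}/"],
        # 4. Restart the service so the new code is loaded.
        ["ssh", cfg.remote, "sudo", "systemctl", "restart", cfg.service],
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for command in plan:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure


run_plan(build_plan(DeployConfig()))
```

Stopping at the first failed step matters here: if the pull or the backup fails, the push never runs, so the remote data is left untouched.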
sequenceDiagram
participant Mac as Local Mac
participant GitHub as Backup Repo
participant Pi as Raspberry Pi
Note over Mac: Run deploy.sh
Mac->>Pi: 1. Pull remote data (Sync Back)
Pi-->>Mac: Return latest novel.db
Mac->>GitHub: 2. Backup data
Mac->>Pi: 3. Push new code & DB (Rsync)
Mac->>Pi: 4. Restart service (Systemd)

6. Transition Phase: Front-End/Back-End Separation (The Great Decoupling)
Before moving towards a more “service-oriented” architecture, I realized the current Streamlit monolith was becoming bloated: UI rendering, business logic, and database operations were all crammed into one entry point.
To support potential future mobile apps or multi-user collaboration, I planned a front-end/back-end separation:
- Backend API-ification: Introduced FastAPI to encapsulate the capabilities of Agents like `Muse` and `Guard` into standard REST interfaces (e.g., `/api/v1/brainstorm`).
- Lightweight Frontend: Streamlit is reduced to a pure “frontend panel,” responsible only for display and sending requests; it could be replaced by React/Vue in the future.
- Independent Deployment: The backend can run independently in a Docker container, serving multiple frontends.
While this step doesn’t involve changes to the underlying storage, it’s a crucial springboard for the system to evolve from a “script” to a “platform.” Once the boundaries are clear, the system can more naturally expand towards capabilities like multi-tenancy, permission isolation, canary releases, and async tasks.
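The article uses FastAPI for this layer. To stay dependency-free, the sketch below shows only the boundary itself with plain functions: a route table that maps paths to handlers which accept and return JSON-compatible dicts. Only `/api/v1/brainstorm` comes from the text; the `/api/v1/check` route, the payload fields, and the handler bodies are hypothetical.

```python
import json
from typing import Callable

Handler = Callable[[dict], dict]


def brainstorm(payload: dict) -> dict:
    """Stand-in for the brainstorming agent: returns three labeled branches."""
    premise = payload.get("premise", "")
    return {"branches": [f"Branch {i}: {premise}" for i in range(1, 4)]}


def check(payload: dict) -> dict:
    """Stand-in for the Guard: reports no warnings."""
    return {"warnings": []}


ROUTES: dict[str, Handler] = {
    "/api/v1/brainstorm": brainstorm,
    "/api/v1/check": check,
}


def dispatch(path: str, body: str) -> tuple[int, str]:
    """Turn a (path, JSON body) request into a (status code, JSON body) response."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, json.dumps({"error": f"unknown route: {path}"})
    try:
        payload = json.loads(body or "{}")
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "request body is not valid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "request body must be a JSON object"})
    return 200, json.dumps(handler(payload))


print(dispatch("/api/v1/brainstorm", '{"premise": "a rival sect attacks"}'))
```

Once every capability sits behind a function of this shape, the UI no longer needs to import agent code, which is what allows the frontend to be swapped later.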
7. Future Outlook: Cloud Native Architecture
Phase 2: Retrieval Upgrade (SQLite + Vector Retrieval Dual System)
As the manuscript grows longer, simply “remembering facts” is not enough. The system needs to both maintain structured facts (who holds what, who is injured, which rules are active) and perform fuzzy recall during writing (similar passages, atmospheric text, foreshadowing/memory triggers, character voice consistency). Therefore, I define the next phase’s goal as a SQLite + Vector Retrieval Dual System:
- SQLite continues to handle “facts and structured memory”: Verifiable, traceable data like character states, settings, and timelines that can be used for constraint checking.
- Vector Retrieval handles “fuzzy recall”: Similar fragments, related dialogues, writing references for similar scenes, and semantically related content that can activate “foreshadowing/memories.”
The corresponding deliverables will be more engineering-focused and iterable:
- A Pluggable Retrieval Module: Exposes a unified interface `retrieve(query) -> passages[]` to the upper layers, with swappable underlying implementations (built-in SQLite / sidecar index / remote vector store).
- Context Assembly Rules: For writing/polishing/Q&A, the context is assembled uniformly with the priority: “structured facts + vector retrieval snippets (TopK) + recent chapters,” ensuring both reliability and inspiration.
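The two deliverables can be sketched together: a `Retriever` protocol matching the `retrieve(query) -> passages[]` shape, and an assembly function that applies the stated priority. The `Passage` type, the toy `KeywordRetriever` (word overlap standing in for vector similarity), and the section labels are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Passage:
    text: str
    score: float


class Retriever(Protocol):
    def retrieve(self, query: str) -> list[Passage]: ...


class KeywordRetriever:
    """Toy stand-in for a vector index: scores passages by words shared with the query."""

    def __init__(self, corpus: list[str]) -> None:
        self._corpus = corpus

    def retrieve(self, query: str) -> list[Passage]:
        query_words = set(query.lower().split())
        scored = [
            Passage(text, float(len(query_words & set(text.lower().split()))))
            for text in self._corpus
        ]
        hits = (p for p in scored if p.score > 0)
        return sorted(hits, key=lambda p: p.score, reverse=True)


def assemble_context(facts: list[str], retriever: Retriever, query: str,
                     recent_chapters: list[str], top_k: int = 3) -> str:
    """Build the prompt context in priority order: facts, retrieved snippets, recent text."""
    snippets = retriever.retrieve(query)[:top_k]
    sections = [
        "## Structured facts\n" + "\n".join(facts),
        "## Retrieved snippets\n" + "\n".join(p.text for p in snippets),
        "## Recent chapters\n" + "\n".join(recent_chapters),
    ]
    return "\n\n".join(sections)


retriever = KeywordRetriever([
    "The Azure Cloud Sword hummed in its sheath",
    "Rain fell on the mountain gate",
])
print(assemble_context(
    facts=["protagonist.weapon = Azure Cloud Sword"],
    retriever=retriever,
    query="azure sword",
    recent_chapters=["Chapter 12: the duel at the gate"],
))
```

Any backend that implements `retrieve` can be passed to `assemble_context` unchanged, which is the point of keeping the interface this small.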
For incremental implementation, I’ll follow a “close the loop locally first, then swap the backend” path:
- Start Local: Add an `embeddings` table to SQLite or use a sidecar file index to first close the “vector retrieval loop,” validating chunking strategies, retrieval quality, and context assembly strategies.
- Then Replace: When multi-device/multi-user/larger scale is needed, migrate to systems like pgvector/Milvus/Pinecone that are better suited for online retrieval and concurrency.
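A minimal sketch of the “start local” option, assuming vectors are stored as JSON text in an `embeddings` table and compared by brute-force cosine similarity in Python. The column layout and function names are assumptions, and the vectors here are hand-written toys; in practice they would come from an embedding model.

```python
import json
import math
import sqlite3


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        "id INTEGER PRIMARY KEY, chunk TEXT NOT NULL, vector TEXT NOT NULL)"
    )
    return conn


def add_chunk(conn: sqlite3.Connection, chunk: str, vector: list[float]) -> None:
    """Store a text chunk next to its vector, serialized as JSON."""
    with conn:
        conn.execute(
            "INSERT INTO embeddings (chunk, vector) VALUES (?, ?)",
            (chunk, json.dumps(vector)),
        )


def search(conn: sqlite3.Connection, query_vector: list[float],
           top_k: int = 3) -> list[tuple[float, str]]:
    """Scan every row and return the top_k (similarity, chunk) pairs."""
    rows = conn.execute("SELECT chunk, vector FROM embeddings").fetchall()
    scored = [(cosine(query_vector, json.loads(vec)), chunk) for chunk, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


index = open_index()
add_chunk(index, "The sword duel at the gate", [1.0, 0.0, 0.0])
add_chunk(index, "Rain over the mountain", [0.0, 1.0, 0.0])
print(search(index, [0.9, 0.1, 0.0], top_k=1))
```

A full scan is linear in the number of chunks, which is usually acceptable for a single manuscript and is exactly the part a dedicated vector store would later replace.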
Here are two design principles I believe must be upheld:
- Chunking Strategy Matters More Than “Which Vector Store”: Chunking by paragraph, event, or dialogue often yields significantly better retrieval usability than chunking by a fixed word count (especially for tasks like “character voice consistency” and “foreshadowing payoff”).
- Fact Priority (Conflict Resolution): When a vector-retrieved snippet conflicts with a structured fact from SQLite, the SQLite fact takes precedence. Vector retrieval provides inspiration and context, not the “source of truth” for modifying the world’s facts.
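To illustrate the first principle, the sketch below splits the same two lines of dialogue by paragraph and by a fixed word count. Both helper functions and the sample text are made up for this example.

```python
import re


def chunk_by_paragraph(text: str) -> list[str]:
    """Split on blank lines so each chunk is one complete paragraph."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]


def chunk_by_word_count(text: str, size: int) -> list[str]:
    """Split into runs of `size` words, ignoring paragraph boundaries."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


sample = (
    '"You kept the sword," Li Si said.\n'
    "\n"
    '"I kept my word," he replied, "and the sword came with it."'
)

for chunk in chunk_by_paragraph(sample):
    print("paragraph:", chunk)
for chunk in chunk_by_word_count(sample, 5):
    print("fixed:", chunk)
```

The fixed-size split produces a chunk that ends one speaker's line and begins the other's, so a retrieval hit on that chunk would attribute the words ambiguously. Paragraph chunks keep each speaker's line intact, which is what character-voice checks need.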
Phase 3: Cloud Native Prototype (Database + Object Storage)
SQLite is just the first step. As the novel grows to millions of words, I still plan to move to a “Database + Object Storage” architecture:
| Data Type | Storage Solution | Reason |
|---|---|---|
| Metadata/Index | Cloudflare D1 / AWS RDS | Chapter lists, character relationship graphs, etc., require high-frequency, complex structured queries. |
| Content/Materials | Cloudflare R2 / AWS S3 | Novel text and illustrations are large but have simple read/write patterns; separating storage significantly reduces database load. |
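The split can be prototyped locally before any cloud service is involved. In the sketch below, SQLite stands in for the metadata database and an in-memory dict stands in for the object bucket; the `chapter_index` table, the key format, and all function names are assumptions.

```python
import hashlib
import sqlite3


class ObjectStore:
    """In-memory stand-in for an object bucket: maps keys to bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def open_metadata(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chapter_index ("
        "number INTEGER PRIMARY KEY, title TEXT NOT NULL, "
        "object_key TEXT NOT NULL, size_bytes INTEGER NOT NULL)"
    )
    return conn


def save_chapter(conn: sqlite3.Connection, store: ObjectStore,
                 number: int, title: str, body: str) -> str:
    """Upload the text to the object store, then record where it lives."""
    data = body.encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()[:12]
    key = f"chapters/{number:04d}-{digest}.txt"
    store.put(key, data)
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO chapter_index "
            "(number, title, object_key, size_bytes) VALUES (?, ?, ?, ?)",
            (number, title, key, len(data)),
        )
    return key


def load_chapter(conn: sqlite3.Connection, store: ObjectStore, number: int) -> str:
    """Look up the object key in the metadata table, then fetch the text."""
    row = conn.execute(
        "SELECT object_key FROM chapter_index WHERE number = ?", (number,)
    ).fetchone()
    if row is None:
        raise KeyError(number)
    return store.get(row[0]).decode("utf-8")


meta, bucket = open_metadata(), ObjectStore()
print(save_chapter(meta, bucket, 1, "The Gate", "The wind howled."))
print(load_chapter(meta, bucket, 1))
```

Listing or filtering chapters touches only the small metadata rows; the large text bodies are fetched by key only when a chapter is actually opened.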
To make “multi-device writing + multi-device sync” truly reliable, the core of the next phase will no longer be “can it generate,” but “can it stably govern creative assets long-term”: data consistency, backup and rollback, permissions and auditing, cost and observability will gradually become the main themes of architectural evolution.
Conclusion
The evolution of FantasyNovelAgent is also a microcosm of a developer’s journey from “just make it work” to “pursuing architectural beauty.” Every refactoring is aimed at making the AI assistant more stable and smarter, allowing me to focus on the most important thing—telling a good story.