# StudyFlow Workflow Runbook

This runbook explains the intended use of:

- [studyflow-document-pipeline.workflow.json](/home/ovidiu/~devops-center/brainstorming/studyflow-gemini-aistudio/studyflow-document-pipeline.workflow.json)
- [studyflow-grounded-chat.workflow.json](/home/ovidiu/~devops-center/brainstorming/studyflow-gemini-aistudio/studyflow-grounded-chat.workflow.json)
- [studyflow-document-pipeline-errors.workflow.json](/home/ovidiu/~devops-center/brainstorming/studyflow-gemini-aistudio/studyflow-document-pipeline-errors.workflow.json)

## Purpose

The StudyFlow automation package now supports a clearer product model:

0. require authentication before any private document feature is accessible
1. transform uploaded documents into a friendly HTML workspace
2. enable grounded chat on top of extracted content
3. optionally generate AI study tools

This is no longer a study-pack-only pipeline.

Important scope note:

- these n8n workflows are backend support workflows
- they assume the user is already authenticated in the main application
- they also assume the user already has an active paid entitlement for the requested action
- landing page, sign-up, sign-in, sessions, and route protection are outside n8n and should be handled by the app shell/backend

Note:

- the one-document, no-plan deterministic trial should stay outside these paid/private AI workflows unless a separate trial-safe processing path is explicitly added later

## Workflow Set

### 1. StudyFlow Document Workspace Pipeline

Role:

- run only after authenticated upload has already happened
- ingest uploaded file
- extract text
- persist normalized source
- render friendly HTML
- build retrieval index
- optionally generate a study pack
- persist final workspace
- mark the document run ready

### 2. StudyFlow Grounded Chat

Role:

- run only for authenticated users with access to the target document
- accept user chat messages
- load workspace context
- retrieve relevant chunks
- load conversation history
- call grounded QA service
- persist the conversation turn
- return the answer synchronously

### 3. StudyFlow Workflow Errors

Role:

- capture failures from both workflows
- persist review-queue entries
- optionally notify Slack
- mark known runs as failed

## Document Workspace Trigger

- webhook path: `/studyflow/process-document`
- method: `POST`

Expected payload:

```json
{
  "userId": "user_123",
  "sessionUserId": "user_123",
  "documentId": "doc_456",
  "documentVersionId": "docv_001",
  "subject": "AI Tokenization",
  "fileUrl": "https://...",
  "mimeType": "application/pdf",
  "fileName": "ai-tokenization.pdf",
  "processingMode": "workspace_only",
  "aiMode": "mini"
}
```

Authentication expectation:

- `userId` should come from the verified application session, not from an untrusted browser field alone
- the upload endpoint that hands work to n8n should verify the session first
- `fileUrl` should point to a user-owned uploaded file, not an arbitrary public URL supplied without checks
- duplicate detection should already have passed before the workflow starts
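Those expectations can be sketched as a small pre-dispatch validation step. Field names follow the example payload above; the helper itself and the `run_` id format are hypothetical, not part of the workflow definition:

```python
import uuid

# Fields the document pipeline cannot run without (matches the example payload)
REQUIRED_FIELDS = {"userId", "documentId", "documentVersionId", "fileUrl", "mimeType"}

def validate_process_request(payload: dict) -> dict:
    """Validate an incoming /studyflow/process-document payload (sketch)."""
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # userId must match the session-verified user, never a bare browser value
    if payload["userId"] != payload.get("sessionUserId"):
        raise PermissionError("userId does not match verified session user")
    # default to the normal mode when none is given
    payload.setdefault("processingMode", "workspace_only")
    payload["runId"] = f"run_{uuid.uuid4().hex[:12]}"
    return payload
```

The session check mirrors the rule above: the payload is rejected whenever `userId` is not confirmed by the verified session field.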

### Meaning of `processingMode`

- `workspace_only`
  - transform + grounded chat
- `workspace_plus_study_pack`
  - transform + grounded chat + optional study tools

`workspace_only` should be the normal default.

## Grounded Chat Trigger

- webhook path: `/studyflow/chat`
- method: `POST`

Expected payload:

```json
{
  "userId": "user_123",
  "sessionUserId": "user_123",
  "documentId": "doc_456",
  "documentVersionId": "docv_001",
  "conversationId": "conv_789",
  "message": "What does the document say about tokenization?",
  "topK": 6
}
```

Authentication expectation:

- the app backend should verify that the authenticated user owns or is allowed to access `documentId`
- the grounded chat workflow should be treated as a private authenticated action, not a public chatbot endpoint
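A minimal sketch of that backend gate before dispatching to the chat webhook; the `get_document_owner` lookup is a hypothetical stand-in for whatever ownership/ACL query the app backend actually uses:

```python
def authorize_chat_request(session_user_id: str, payload: dict, get_document_owner) -> None:
    """Reject chat dispatch unless the session user may access the document (sketch)."""
    # The payload's userId must match the verified session, not a client-supplied value
    if payload.get("userId") != session_user_id:
        raise PermissionError("payload user does not match session user")
    # Hypothetical ownership lookup: documentId -> owning userId
    owner = get_document_owner(payload["documentId"])
    if owner != session_user_id:
        raise PermissionError("user may not access this document")
```

Only if this passes should the app forward the request to `/studyflow/chat`; the workflow stays a private authenticated action rather than a public chatbot endpoint.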

## High-Level Flow: Document Workspace

1. Receive an authenticated document-processing request from the app backend
2. Validate payload and create `runId`
3. Confirm user-scoped document identifiers, document version id, and storage references
4. Record initial run
5. Create any needed usage reservation before expensive metered actions
6. Download file
7. Extract text
8. Persist normalized source document for that version
9. Estimate token size and choose the internal study-pack strategy
10. Render friendly HTML
11. Build retrieval index for grounded chat
12. Optionally generate study pack
13. Persist final workspace result for that version
14. Finalize reserved usage for successful actions
15. Mark run as `ready`

### Recommended user-visible states

The workflow should emit enough run metadata for the app to present:

- `processing`
  - stage: `extracting`
  - stage: `rendering`
  - stage: `indexing`
  - stage: `study_pack_generating` when applicable
- `ready`
- `failed`

Duplicate detection should never enter this workflow in MVP.

It should be handled before dispatch and surfaced as a separate app-side state:

- `duplicate_blocked`

## High-Level Flow: Grounded Chat

1. Receive an authenticated chat request from the app backend
2. Validate message payload
3. Load workspace/document/version context
4. Verify document ownership context is present
5. Create a chat usage reservation tied to the client message id
6. Retrieve relevant extracted chunks
7. Load recent conversation history
8. Build grounded QA request
9. Generate answer using only retrieval results
10. Persist chat turn
11. Finalize reserved chat usage on success
12. Return answer with citations
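Steps 6–9 above can be sketched as assembling the grounded QA request from retrieval results only. The chunk shape follows the retrieval response documented later in this runbook; the request field names are illustrative assumptions, not the actual service contract:

```python
def build_grounded_qa_request(message: str, chunks: list, history: list) -> dict:
    """Assemble a grounded QA request that uses only retrieval results (sketch)."""
    if not chunks:
        # Nothing retrieved: the QA service should refuse rather than guess
        return {"message": message, "context": [], "mustRefuseIfUnsupported": True}
    # Highest-scoring evidence first; keep chunk ids so citations can be returned
    context = [
        {"chunkId": c["chunkId"], "text": c["text"], "page": c.get("page")}
        for c in sorted(chunks, key=lambda c: c["score"], reverse=True)
    ]
    return {
        "message": message,
        "context": context,
        "history": history[-10:],  # recent turns only, to bound prompt size
        "mustRefuseIfUnsupported": True,
    }
```

The key property is that the model's evidence is exactly the retrieved chunks; conversation history is carried for continuity but is never treated as source content.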

## Public App Flow Outside n8n

These parts are required but should not be modeled as the same workflow package:

1. Public landing page
2. Sign up
3. Sign in
4. Session creation
5. Protected app routes
6. Dashboard/document list
7. Upload handoff into the document workflow
8. Chat handoff into the grounded chat workflow

## Recommended App Boundary

Use the main app/backend to do:

- sign-up and sign-in
- session or token validation
- no-plan vs paid-plan access checks
- protected routing
- file upload ownership
- duplicate fingerprint checks before dispatch
- storage URL issuance
- forwarding verified `userId` and `documentId` into n8n
- usage reservation creation before expensive async actions
- usage finalization only after successful completion

Use `n8n` to do:

- extraction orchestration
- transform orchestration
- retrieval orchestration
- grounded AI orchestration
- optional study-pack orchestration
- failure/review handling

Important usage rule:

- `n8n` should not be the source of truth for whether quota has already been consumed
- the app/backend should create reservations before dispatch where needed
- successful workflow completion should finalize reservations into usage events
- failed workflow completion should release or expire reservations
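A sketch of that reservation lifecycle, with an in-memory dict standing in for the app database; class and method names are hypothetical:

```python
class UsageReservations:
    """Create reservations before dispatch; finalize on success, release on failure (sketch)."""

    def __init__(self):
        self._state = {}  # reservationId -> "reserved" | "consumed" | "released"

    def reserve(self, reservation_id: str) -> None:
        # App/backend calls this BEFORE dispatching the workflow
        self._state[reservation_id] = "reserved"

    def finalize(self, reservation_id: str) -> None:
        # Called only after successful workflow completion
        if self._state.get(reservation_id) != "reserved":
            raise ValueError("cannot finalize a reservation that is not active")
        self._state[reservation_id] = "consumed"  # becomes a durable usage event

    def release(self, reservation_id: str) -> None:
        # Called on failure/expiry so quota is returned to the user
        if self._state.get(reservation_id) == "reserved":
            self._state[reservation_id] = "released"

    def status(self, reservation_id: str):
        return self._state.get(reservation_id)
```

This keeps the app/backend as the single source of truth for quota, with n8n only reporting success or failure back.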

## Grounded Chat Rule

The grounded chat workflow is intentionally strict:

- answers must come only from extracted content
- if retrieval does not support an answer, the assistant should say so
- citations or chunk references should be returned whenever possible

Additional trust rules:

- document text must be treated as source evidence, not executable instruction
- prompt-injection-like text inside the document must be ignored as instruction content
- low-quality extraction should downgrade confidence or trigger refusal
- unsupported answers should be returned explicitly rather than guessed

This behavior should be implemented in the helper AI QA service, not faked in the frontend.
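Inside that QA service, the refusal rule can be sketched as a support gate over retrieval scores. The threshold and the refusal message are illustrative placeholders; a real service would call the model with the supported chunks as its only evidence:

```python
NO_ANSWER = "The document does not contain enough information to answer that."

def grounded_answer(chunks: list, min_score: float = 0.5) -> dict:
    """Refuse when retrieval does not support an answer (sketch; threshold assumed)."""
    supported = [c for c in chunks if c.get("score", 0.0) >= min_score]
    if not supported:
        # Unsupported answers are returned explicitly rather than guessed
        return {"answer": NO_ANSWER, "citations": [], "refused": True}
    # A real implementation would invoke the model here, passing only `supported`
    # as evidence and instructing it to ignore instruction-like text in the chunks.
    citations = [{"chunkId": c["chunkId"], "page": c.get("page")} for c in supported]
    return {"answer": "<model answer grounded in supported chunks>",
            "citations": citations, "refused": False}
```

Low-quality extraction shows up as uniformly low scores, which this gate converts into an explicit refusal instead of a fabricated answer.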

## Helper Services Assumed

These workflows orchestrate helper services. They do not implement those services directly.

Required environment variables:

- `STUDYFLOW_DB_API_BASE`
- `STUDYFLOW_EXTRACTOR_URL`
- `STUDYFLOW_TEMPLATE_RENDERER_URL`
- `STUDYFLOW_RETRIEVAL_URL`
- `STUDYFLOW_AI_SERVICE_URL`

Optional:

- `STUDYFLOW_SLACK_WEBHOOK_URL`

These workflows also assume that the upstream application/backend has already handled:

- user registration
- user login
- session verification
- file ownership and storage access
- file-type allowlist checks
- file-size checks
- duplicate checks
- parser-safe acceptance checks where possible

## Expected Endpoints

### Run tracking

- `POST {STUDYFLOW_DB_API_BASE}/runs`
- `PATCH {STUDYFLOW_DB_API_BASE}/runs/:runId`

### Source document persistence

- `POST {STUDYFLOW_DB_API_BASE}/documents/source`

### Workspace persistence

- `POST {STUDYFLOW_DB_API_BASE}/document-workspaces`
- `POST {STUDYFLOW_DB_API_BASE}/document-workspaces/context`

### Conversation persistence

- `POST {STUDYFLOW_DB_API_BASE}/conversations/history`
- `POST {STUDYFLOW_DB_API_BASE}/conversations/messages`

### Review queue

- `POST {STUDYFLOW_DB_API_BASE}/review-queue`

### Extraction

- `POST {STUDYFLOW_EXTRACTOR_URL}/extract`

Expected extraction response:

```json
{
  "documentId": "doc_456",
  "text": "plain extracted text ...",
  "pageCount": 200,
  "charCount": 420000,
  "mimeType": "application/pdf",
  "fileName": "ai-tokenization.pdf"
}
```

Recommended extraction failure response:

```json
{
  "error": {
    "code": "PASSWORD_PROTECTED_PDF",
    "message": "This PDF is password-protected and cannot be processed.",
    "retrySafe": false
  }
}
```

### Friendly HTML transformation

- `POST {STUDYFLOW_TEMPLATE_RENDERER_URL}/render-friendly`

Expected response:

```json
{
  "result": {
    "html": "<article>...</article>",
    "sections": [],
    "provider": "non-ai-template",
    "generationTimeMs": 1200
  }
}
```

### Retrieval indexing

- `POST {STUDYFLOW_RETRIEVAL_URL}/index`
- `POST {STUDYFLOW_RETRIEVAL_URL}/retrieve`

Expected retrieval index response:

```json
{
  "result": {
    "retrievalReady": true,
    "chunksIndexed": 128,
    "indexId": "idx_123"
  }
}
```

Expected retrieval response:

```json
{
  "result": {
    "chunks": [
      {
        "chunkId": "chunk_001",
        "text": "Relevant extracted text...",
        "page": 14,
        "score": 0.92
      }
    ]
  }
}
```

### Grounded QA

- `POST {STUDYFLOW_AI_SERVICE_URL}/grounded-chat`

Expected response:

```json
{
  "result": {
    "answer": "Based on the document...",
    "citations": [],
    "provider": "openai",
    "modelUsed": "gpt-5.4-mini",
    "generationTimeMs": 2200
  }
}
```

### Optional study-pack generation

- `POST {STUDYFLOW_AI_SERVICE_URL}/single-pass-study-pack`
- `POST {STUDYFLOW_AI_SERVICE_URL}/chunked-study-pack`

Expected response:

```json
{
  "result": {
    "digest": [],
    "flashcards": [],
    "quiz": [],
    "plan": [],
    "provider": "openai",
    "modelUsed": "gpt-5.4-mini",
    "generationTimeMs": 18000
  }
}
```

## Internal Strategy Routing

The user-facing mode should stay simple:

- `workspace_only`
- `workspace_plus_study_pack`

Internally, the workflow can still decide:

- `none`
- `single_pass_ai`
- `chunked_ai`

Suggested rule:

- `workspace_only` => `none`
- `workspace_plus_study_pack` + smaller file => `single_pass_ai`
- `workspace_plus_study_pack` + large file => `chunked_ai`
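The suggested rule maps directly onto a small routing function. The character-count threshold is an assumed placeholder for "smaller vs large file", not a documented limit:

```python
def choose_study_pack_strategy(processing_mode: str, char_count: int,
                               single_pass_limit: int = 50_000) -> str:
    """Map the user-facing mode plus document size to an internal strategy (sketch)."""
    if processing_mode == "workspace_only":
        return "none"
    if processing_mode == "workspace_plus_study_pack":
        # Threshold is an assumption; tune against real token estimates
        return "single_pass_ai" if char_count <= single_pass_limit else "chunked_ai"
    raise ValueError(f"unknown processingMode: {processing_mode}")
```

In practice the `charCount` from the extraction response (or a proper token estimate) would feed this decision.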

## Idempotency

The document workflow generates:

- `runId`
- `idempotencyKey = userId:documentVersionId`

Recommended backend rule:

- if a ready workspace already exists for `documentVersionId`, return it
- do not create duplicate active runs for the same `idempotencyKey`

For MVP duplicate behavior:

- if an already parsed project with the same fingerprint exists for the user, do not dispatch the workflow at all
- instead return the existing project/document id and a duplicate warning state
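Both the idempotency rule and the MVP duplicate rule can be sketched as one pre-dispatch gate. The `find_workspace`, `find_active_run`, and `find_by_fingerprint` lookups are hypothetical stand-ins for the backend's queries:

```python
def should_dispatch(user_id, document_version_id, fingerprint,
                    find_workspace, find_active_run, find_by_fingerprint):
    """Decide whether to start a new run, reuse, or block as duplicate (sketch)."""
    # MVP duplicate rule: same fingerprint already parsed for this user
    existing_doc = find_by_fingerprint(user_id, fingerprint)
    if existing_doc is not None:
        return {"dispatch": False, "state": "duplicate_blocked", "documentId": existing_doc}
    # Idempotency rule: a ready workspace for this version is simply returned
    workspace = find_workspace(document_version_id)
    if workspace is not None:
        return {"dispatch": False, "state": "ready", "workspace": workspace}
    # No duplicate active runs for the same idempotency key
    key = f"{user_id}:{document_version_id}"
    if find_active_run(key) is not None:
        return {"dispatch": False, "state": "processing"}
    return {"dispatch": True, "idempotencyKey": key}
```

Only the last branch actually dispatches the n8n workflow; every other branch resolves app-side.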

The chat workflow should key conversation context on `conversationId` plus `documentVersionId` to keep context stable across turns.

## Error Handling

Use the companion workflow:

- `StudyFlow Workflow Errors`

That workflow should:

1. capture failure metadata
2. write review-queue entries
3. optionally notify Slack
4. mark known runs as failed when `runId` exists

When possible, failed runs should also include:

- failed stage
- short user-facing message
- retry-safe flag
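The review-queue entry those points describe can be sketched as follows; field names beyond the three listed above (and the timestamp) are illustrative assumptions about the `/review-queue` payload:

```python
from datetime import datetime, timezone

def build_review_queue_entry(run_id, workflow, stage, error_code,
                             user_message, retry_safe):
    """Build a review-queue entry capturing failure metadata (sketch)."""
    return {
        "runId": run_id,              # may be None when the run never registered
        "workflow": workflow,
        "failedStage": stage,         # e.g. "extracting", "rendering", "indexing"
        "errorCode": error_code,      # e.g. "PASSWORD_PROTECTED_PDF"
        "userMessage": user_message,  # short user-facing message
        "retrySafe": retry_safe,
        "failedAt": datetime.now(timezone.utc).isoformat(),
    }
```

When `runId` is present, the error workflow also patches the run to `failed` so the app can surface the state.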

## Deployment Order

1. import/update `StudyFlow Document Workspace Pipeline`
2. import/create `StudyFlow Grounded Chat`
3. import/update `StudyFlow Workflow Errors`
4. configure shared env vars in n8n
5. activate the error workflow first
6. activate the document workflow
7. activate the chat workflow

## Why This Package Is Better

This package now matches the actual product:

- deterministic transform is the default value path
- grounded chat is the primary AI feature
- study-pack generation is optional and premium-feeling
- n8n coordinates the system instead of pretending to be the system

## Best Next Improvements

1. Add per-step retry rules to ingestion
2. Add citation formatting rules for grounded chat
3. Add credit accounting before optional study-pack generation
4. Add retention and cleanup workflows
5. Add document-size guardrails before expensive AI calls
6. Add a small synchronous shortcut for very small text documents
7. Replace placeholder helper endpoints with real StudyFlow services
