What is Fency
Fency is an end-to-end framework for getting AI features into production. It works with any backend and with webapps built on React (support for more webapp libraries is coming soon).
Core concepts
- Memories: Mirror your data into Fency. Your backend creates and updates memories via the Fency API (POST /v1/memories, PATCH /v1/memories/:id). Memories support text, URL, and file sources, and can be used for RAG, context injection, and more.
- Sessions: Control access to memories and API usage. Your backend creates sessions with a secret key; the client receives a short-lived clientToken. Sessions scope which memories can be accessed. Learn more in Sessions and client tokens.
- @fencyai/react SDK: Orchestrates the flow and renders progress in the webapp. It handles token fetching, session creation, streaming, and UI state, tying your backend, the Fency API, and React together. See React Integration for setup.
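The memory endpoints above are plain HTTPS requests from your backend. As a sketch, a PATCH /v1/memories/:id update could be assembled like this; the title field mirrors the create-memory example later in this guide, and any other updatable fields are assumptions, so check the API Reference for the full schema:

```typescript
// Sketch: build a PATCH /v1/memories/:id request (server-side only,
// since it uses your secret key). Updatable fields beyond `title` are
// assumptions; consult the API Reference.
function buildUpdateMemoryRequest(
  memoryId: string,
  secretKey: string,
  patch: { title?: string }
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: `https://api.fency.ai/v1/memories/${memoryId}`,
    init: {
      method: 'PATCH',
      headers: {
        Authorization: `Bearer ${secretKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(patch),
    },
  }
}

// Usage (server-side):
// const { url, init } = buildUpdateMemoryRequest('mem_123', process.env.FENCY_SECRET_KEY!, { title: 'Renamed' })
// await fetch(url, init)
```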
End-to-end flow
The diagram below shows how your backend syncs memories to Fency, then how the React SDK obtains a client token and makes API requests. See React Integration and Server Integration for implementation details.
Getting started
Fency.ai provides seamless LLM integration for React webapps. Get started in minutes with our React SDK and start building AI-powered features.
Quick start
To get started with Fency.ai in your React application:
- Log in: Log in to your Fency.ai account
- Obtain publishable and secret keys: Generate your first publishable key and secret key in the dashboard
- Install the React SDK: Follow our installation and setup guide
- Set up endpoints: Create backend endpoints for stream sessions and agent task sessions
For a fully functioning setup, see our example repositories built with Next.js: streaming chat completion, structured chat completion, and memory chat completion.
Supported models
Fency.ai supports leading LLM models from Anthropic, Google, and OpenAI:
- Anthropic: Claude Haiku 4.5, Claude Sonnet 4.5, Claude Opus 4.5, Claude Sonnet 4.6, Claude Opus 4.6
- Google: Gemini 2.5 Flash, Gemini 2.5 Flash Lite, Gemini 2.5 Pro
- OpenAI: GPT-4.1-nano, GPT-4.1-mini, GPT-4.1, GPT-5-mini, GPT-5-nano
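In the SDK, models are referenced by provider-prefixed ids such as anthropic/claude-sonnet-4.5 and openai/gpt-4.1-mini (both appear in examples later in this document). A small sketch of that id format follows; the exact id strings for models not shown elsewhere in this document are assumptions:

```typescript
// Models are passed to createAgentTask as '<provider>/<model>' ids,
// e.g. 'anthropic/claude-sonnet-4.5' or 'openai/gpt-4.1-mini'.
// Ids for other listed models presumably follow the same pattern,
// but are not confirmed here.
type Provider = 'anthropic' | 'google' | 'openai'

function parseModelId(id: string): { provider: Provider; model: string } {
  const slash = id.indexOf('/')
  if (slash <= 0) throw new Error(`Invalid model id: ${id}`)
  const provider = id.slice(0, slash)
  if (provider !== 'anthropic' && provider !== 'google' && provider !== 'openai') {
    throw new Error(`Unknown provider: ${provider}`)
  }
  return { provider, model: id.slice(slash + 1) }
}
```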
Publishable keys
Publishable keys are designed for client-side use in your React webapp. They are safe to expose in browser code. They are used in combination with sessions created with a secret key in your backend. For setup details, see Obtaining a publishable key.
Secret keys
Secret keys authenticate server-to-server requests from your backend to the Fency.ai API. They must never be exposed in client-side code or public repositories. Use secret keys when creating sessions, managing memories, or making any API calls from your server. For setup details, see Obtaining a secret key. For a full overview of operations, see the API Reference.
Sessions and client tokens
The React SDK uses client tokens to authenticate with the Fency.ai API. Client tokens are short-lived credentials that your server creates by calling the Fency API. Sessions control access to memories—you decide which memories each session can access when creating it. This architecture keeps your secret key secure on the server while allowing your React app to make authenticated requests.
How it works
- Client SDK initiates the request: When the React SDK needs to make API requests (e.g. for creating a new stream), it calls the fetchCreateStreamClientToken function you pass to FencyProvider, which requests a client token from your server endpoint (e.g. /api/stream-client-token).
- Server creates a session: Your backend receives the request and calls the Fency API with your secret key. The API returns a session object that includes a clientToken, which your endpoint returns to the frontend.
- SDK uses the token: The React SDK receives the clientToken and uses it to authenticate with the Fency.ai API.
Session types
Different session types support different features. A stream session enables basic streaming, while an agent task session is required for chat completions (streaming, structured, or memory-based). Your server endpoint should create the appropriate session type for the features your React app needs.
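As a sketch, the mapping from SDK feature to the session body your server sends to POST /v1/sessions looks like this; the body shapes mirror the server-side route examples later in this document:

```typescript
// Map an SDK feature to the body for POST https://api.fency.ai/v1/sessions.
// Body shapes mirror the server-side route examples in these docs.
type Feature = 'stream' | 'streaming-chat' | 'structured-chat' | 'memory-chat'

function sessionBodyFor(feature: Feature): object {
  switch (feature) {
    case 'stream':
      return { createStream: {} }
    case 'streaming-chat':
      return { createAgentTask: { taskType: 'STREAMING_CHAT_COMPLETION' } }
    case 'structured-chat':
      return { createAgentTask: { taskType: 'STRUCTURED_CHAT_COMPLETION' } }
    case 'memory-chat':
      // Memory chat sessions additionally take guardRails; see the
      // "Memory chat completion" server route for the full shape.
      return { createAgentTask: { taskType: 'MEMORY_CHAT_COMPLETION' } }
  }
}
```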
Security
Client tokens are scoped to a single session and expire after use or when the session ends. They never expose your secret key. By creating sessions on your server, you maintain full control over who can obtain client tokens and can enforce your own authentication and rate limiting before creating sessions.
React Integration
Fency.ai provides a dedicated React SDK that makes it easy to integrate LLM capabilities into your React applications. Our SDK includes hooks, components, and utilities designed specifically for React developers, available on npm as @fencyai/js and @fencyai/react.
Obtaining a publishable key
Publishable keys are the primary way to authenticate with Fency.ai from your React webapp. These keys are safe to expose in client-side code and provide secure access to our LLM integration services.
Creating a publishable key
To create a publishable key, navigate to your Fency.ai dashboard and go to the API Keys section. Click "Create new key" and select "Publishable key" as the key type.
Your publishable key will look something like this:
pk_1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
Allowed origins
Allowed origins (CORS) restrict which domains can use your publishable key, providing an additional layer of security. When creating or editing a publishable key, you can specify which domains are allowed to make requests:
- Development: http://localhost:3000, http://localhost:5173
- Production: https://yourdomain.com
Best practices
- Use separate publishable keys for local development and production.
- Only add origins that actually need access.
- Regularly review and update your allowed origins list, especially for production.
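As an illustration of how origin restrictions behave, matching can be thought of as an exact scheme, host, and port comparison. Fency's precise matching rules are not documented here, so treat this sketch as an assumption; the check itself is enforced by Fency when your publishable key is used:

```typescript
// Illustrative only: allowed-origin matching as an exact
// scheme + host + port comparison. Fency enforces this server-side;
// the exact matching semantics are an assumption here.
function isOriginAllowed(origin: string, allowed: string[]): boolean {
  let parsed: URL
  try {
    parsed = new URL(origin)
  } catch {
    return false // not a valid origin at all
  }
  return allowed.some((entry) => {
    try {
      return new URL(entry).origin === parsed.origin
    } catch {
      return false
    }
  })
}
```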
Installation and setup
This section contains instructions for installing and setting up Fency.ai with React.
Before you start
We assume you have a basic understanding of React and TypeScript, and have created a publishable key in the dashboard.
Installation
Install the Fency.ai SDK for JavaScript and React:
npm install --save @fencyai/js @fencyai/react
Then provide your publishable key to loadFency and wrap your app in the FencyProvider to make the Fency.ai client available to the React hooks we'll use later.
import { loadFency } from '@fencyai/js'
import { FencyProvider } from '@fencyai/react'
const fency = loadFency({ publishableKey: 'pk_your_publishable_key' })
const fetchCreateStreamClientToken = async () => {
const res = await fetch('/api/stream-client-token', { method: 'POST' })
if (!res.ok) throw new Error('Failed to create stream client token')
const data = await res.json()
if (!data.clientToken) throw new Error('No clientToken in response')
return { clientToken: data.clientToken }
}
export default function App() {
return (
<FencyProvider
fency={fency}
fetchCreateStreamClientToken={fetchCreateStreamClientToken}
>
<div />
</FencyProvider>
)
}
The /api/stream-client-token endpoint should create a stream session on your server. See Creating a stream session for implementation details.
Streaming Chat Completions
Use the useAgentTasks hook and its createAgentTask method with type: 'StreamingChatCompletion' to stream chat completions in real time, receiving tokens as they are generated.
Usage
import { AgentTaskProgress, useAgentTasks } from '@fencyai/react'
import { useState } from 'react'
const fetchCreateAgentTaskClientToken = async () => {
const res = await fetch('/api/agent-task-session', { method: 'POST' })
return res.json()
}
export default function StreamingChatCompletionExample() {
const [result, setResult] = useState('')
const { latest, createAgentTask } = useAgentTasks({
fetchCreateAgentTaskClientToken,
})
const handleSend = async () => {
const response = await createAgentTask({
type: 'StreamingChatCompletion',
messages: [{ role: 'user', content: 'Hello world!' }],
model: 'anthropic/claude-sonnet-4.5',
})
if (
response.type === 'success' &&
response.response.taskType === 'StreamingChatCompletion'
) {
setResult(response.response.text)
}
}
return (
<div>
<button onClick={handleSend}>Send Message</button>
{latest && <AgentTaskProgress agentTask={latest} />}
<div>{result}</div>
</div>
)
}
Structured Chat Completions
Use the useAgentTasks hook and its createAgentTask method with type: 'StructuredChatCompletion' to get responses in a specific format using Zod schemas. This is perfect for extracting structured data from AI responses.
Usage
import { AgentTaskProgress, useAgentTasks } from '@fencyai/react'
import { useState } from 'react'
import { z } from 'zod'
const responseSchema = z.object({
actorName: z.string().describe('Name of the actor'),
age: z.string().describe('Age of the actor'),
})
const fetchCreateAgentTaskClientToken = async () => {
const res = await fetch('/api/agent-task-session', { method: 'POST' })
return res.json()
}
export default function StructuredChatCompletionExample() {
const [result, setResult] = useState<z.infer<typeof responseSchema> | null>(null)
const { latest, createAgentTask } = useAgentTasks({
fetchCreateAgentTaskClientToken,
})
const handleSend = async () => {
const response = await createAgentTask({
type: 'StructuredChatCompletion',
messages: [{ role: 'user', content: 'Tell me about a famous actor' }],
model: 'anthropic/claude-sonnet-4.5',
jsonSchema: JSON.stringify(z.toJSONSchema(responseSchema)),
})
if (
response.type === 'success' &&
response.response.taskType === 'StructuredChatCompletion'
) {
setResult(response.response.json as z.infer<typeof responseSchema>)
}
}
return (
<div>
<button onClick={handleSend}>Send Message</button>
{latest && <AgentTaskProgress agentTask={latest} />}
{result && <div>{result.actorName} (age {result.age})</div>}
</div>
)
}
Memory Chat Completions
Use the useAgentTasks hook and its createAgentTask method with type: 'MemoryChatCompletion' to build document-aware chat interfaces. The model can answer questions grounded in memories you have uploaded — such as PDFs, contracts, or knowledge base articles.
Memory Chat Completions require a server-side agent task session that specifies which memory types the task is allowed to access via guardRails.memoryTypes.
Usage
import { AgentTaskProgress, useAgentTasks } from '@fencyai/react'
import { useState } from 'react'
const fetchCreateAgentTaskClientToken = async () => {
const res = await fetch('/api/agent-task-session', { method: 'POST' })
return res.json()
}
export default function MemoryChatCompletionExample() {
const [result, setResult] = useState('')
const { latest, createAgentTask } = useAgentTasks({
fetchCreateAgentTaskClientToken,
})
const handleSend = async () => {
const response = await createAgentTask({
type: 'MemoryChatCompletion',
messages: [{ role: 'user', content: 'Hello world!' }],
model: 'openai/gpt-4.1-mini',
})
if (
response.type === 'success' &&
response.response.taskType === 'MemoryChatCompletion'
) {
setResult(response.response.text)
}
}
return (
<div>
<button onClick={handleSend}>Send Message</button>
{latest && <AgentTaskProgress agentTask={latest} />}
<div>{result}</div>
</div>
)
}
Advanced use
The Memory Chat Completion task searches a vector database for memories defined by your guardRails. Pass chunkLimit and memoryScanLimit directly to createAgentTask to control how search results are retrieved and processed:
const response = await createAgentTask({
type: 'MemoryChatCompletion',
messages: [{ role: 'user', content: 'Summarize the key points.' }],
model: 'openai/gpt-4.1-mini',
chunkLimit: 20,
memoryScanLimit: 5,
})
- chunkLimit: Number of chunks returned for each vector search the agent makes. Chunks are ordered by highest score and may belong to many different memories.
- memoryScanLimit: Number of memories that will be fully scanned. Applied to the top N highest-scored memories from the chunk results.
These parameters give you fine-grained control over the search. For example, set memoryScanLimit to 0 to omit full scans completely and rely only on the chunk results.
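The interaction between the two limits can be illustrated with a small sketch. The retrieval itself runs inside Fency, so the chunk shape and ordering rules here are assumptions for illustration only:

```typescript
// Illustration of how chunkLimit and memoryScanLimit interact.
// The actual retrieval happens inside Fency; shapes are assumptions.
interface Chunk {
  memoryId: string
  score: number
}

function selectForScan(
  chunks: Chunk[],
  chunkLimit: number,
  memoryScanLimit: number
): { topChunks: Chunk[]; memoriesToScan: string[] } {
  // Keep the chunkLimit highest-scoring chunks across all memories.
  const topChunks = [...chunks]
    .sort((a, b) => b.score - a.score)
    .slice(0, chunkLimit)
  // Fully scan the memoryScanLimit highest-scored distinct memories.
  const memoriesToScan: string[] = []
  for (const c of topChunks) {
    if (memoriesToScan.length >= memoryScanLimit) break
    if (!memoriesToScan.includes(c.memoryId)) memoriesToScan.push(c.memoryId)
  }
  return { topChunks, memoriesToScan }
}
```

With memoryScanLimit set to 0, memoriesToScan stays empty and only the chunk results are used, matching the behavior described above.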
Obtaining a secret key
Secret keys are used to authenticate server-to-server requests from your backend to the Fency.ai API. Unlike publishable keys, secret keys must never be exposed in client-side code, browser bundles, or public repositories.
Creating a secret key
To create a secret key, navigate to your Fency.ai dashboard and go to the API Keys section. Click "Create new key" and select "Secret key" as the key type.
Your secret key will look something like this:
sk_1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
Storing your secret key
Store your secret key as a server-side environment variable. For local development, add it to your .env or .env.local file:
FENCY_SECRET_KEY=sk_...
In production, prefer injecting it via your hosting platform's environment variable settings or a dedicated secrets manager (such as AWS Secrets Manager, Google Cloud Secret Manager, or HashiCorp Vault) rather than committing it to a file.
Never expose it to the browser — do not include it in client-side bundles, public environment variables, or any other mechanism that makes it accessible from the browser.
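A small startup guard can catch the most common mistakes early, such as a missing variable or a publishable key pasted where the secret key belongs. This sketch assumes the sk_ and pk_ prefixes shown in this documentation:

```typescript
// Startup guard: fail fast if the secret key is missing or a
// publishable key was supplied by mistake. Assumes the sk_/pk_
// prefixes shown in these docs.
function requireSecretKey(key: string | undefined): string {
  if (!key) {
    throw new Error('FENCY_SECRET_KEY is not defined.')
  }
  if (key.startsWith('pk_')) {
    throw new Error('FENCY_SECRET_KEY looks like a publishable key (pk_...).')
  }
  if (!key.startsWith('sk_')) {
    throw new Error('FENCY_SECRET_KEY should start with sk_.')
  }
  return key
}

// Usage at server startup:
// const secretKey = requireSecretKey(process.env.FENCY_SECRET_KEY)
```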
Creating a stream session
The Fency.ai React SDK uses a short-lived stream client token to authenticate real-time task updates in the browser. Your server creates this token by calling the Fency sessions API and returning the result to the client. The token is then passed to FencyProvider via the fetchCreateStreamClientToken callback.
Server-side route
Create a server endpoint that calls POST https://api.fency.ai/v1/sessions with a createStream: {} body using your FENCY_SECRET_KEY:
// server route handler (e.g. POST /api/stream-session)
const secretKey = process.env.FENCY_SECRET_KEY
if (!secretKey) {
throw new Error('FENCY_SECRET_KEY is not defined.')
}
export async function POST() {
const response = await fetch('https://api.fency.ai/v1/sessions', {
method: 'POST',
headers: {
Authorization: `Bearer ${secretKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ createStream: {} }),
})
// data: { id, createdAt, type, clientToken }
const data = await response.json()
return new Response(JSON.stringify(data), {
status: response.status,
headers: { 'Content-Type': 'application/json' },
})
}
Client-side setup
Pass a callback that fetches the stream token to FencyProvider. The SDK calls this automatically before each task:
import { loadFency } from '@fencyai/js'
import { FencyProvider } from '@fencyai/react'
const fency = loadFency({
publishableKey: 'pk_1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef',
})
async function fetchCreateStreamClientToken() {
const res = await fetch('/api/stream-session', { method: 'POST' })
if (!res.ok) {
throw new Error('Failed to create stream session')
}
const data = await res.json()
if (!data.clientToken) {
throw new Error('No clientToken in session response')
}
return { clientToken: data.clientToken }
}
export default function Home() {
return (
<FencyProvider
fency={fency}
fetchCreateStreamClientToken={fetchCreateStreamClientToken}
>
<App />
</FencyProvider>
)
}
For a complete working example, see the streaming chat completion example, which demonstrates the full server-side integration pattern.
Creating an agent task session
Each call to createAgentTask in the React SDK requires a short-lived agent task client token. Your server creates this token by calling POST https://api.fency.ai/v1/sessions with a createAgentTask body. The task type and any access restrictions (such as which memories are available) are set here on the server — never on the client.
Server-side route
The createAgentTask body varies by task type. Use separate endpoints or route logic for each, or a single endpoint that accepts a task type parameter. Example routes for each task type follow below.
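A sketch of the single-endpoint variant: the client sends a taskType and the server validates it before creating the session. The { taskType } request contract is an assumption for illustration; the session body shapes mirror the per-type route examples below.

```typescript
// Single endpoint accepting a task type parameter. Validate the
// client-supplied value before forwarding it to POST /v1/sessions.
// The { taskType } request contract is an assumption for illustration.
const ALLOWED_TASK_TYPES = [
  'STREAMING_CHAT_COMPLETION',
  'STRUCTURED_CHAT_COMPLETION',
  'MEMORY_CHAT_COMPLETION',
] as const

type TaskType = (typeof ALLOWED_TASK_TYPES)[number]

function buildSessionBody(taskType: string): { createAgentTask: { taskType: TaskType } } {
  if (!ALLOWED_TASK_TYPES.includes(taskType as TaskType)) {
    throw new Error(`Unsupported taskType: ${taskType}`)
  }
  return { createAgentTask: { taskType: taskType as TaskType } }
}
```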
Client-side usage
Pass a callback that fetches the agent task token to useAgentTasks:
const fetchCreateAgentTaskClientToken = async () => {
const res = await fetch('/api/agent-task-session', { method: 'POST' })
if (!res.ok) {
throw new Error('Failed to create agent task session')
}
const data = await res.json()
if (!data.clientToken) {
throw new Error('No clientToken in session response')
}
return { clientToken: data.clientToken }
}
const { createAgentTask } = useAgentTasks({
fetchCreateAgentTaskClientToken,
})
For Memory Chat Completions, the server-side route should include guardRails.memoryTypes to specify which memory types and memory IDs the task is allowed to access. See the Memory Chat Completions section for the full client-side usage pattern.
Streaming chat completion
Creates an agent task session for streaming chat completions.
// server route handler (e.g. POST /api/agent-task-session)
const secretKey = process.env.FENCY_SECRET_KEY
if (!secretKey) {
throw new Error('FENCY_SECRET_KEY is not defined.')
}
export async function POST() {
const response = await fetch('https://api.fency.ai/v1/sessions', {
method: 'POST',
headers: {
Authorization: `Bearer ${secretKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
createAgentTask: {
taskType: 'STREAMING_CHAT_COMPLETION',
},
}),
})
const data = await response.json()
return new Response(JSON.stringify(data), {
status: response.status,
headers: { 'Content-Type': 'application/json' },
})
}
Structured chat completion
Creates an agent task session for structured chat completions.
// server route handler (e.g. POST /api/agent-task-session)
const secretKey = process.env.FENCY_SECRET_KEY
if (!secretKey) {
throw new Error('FENCY_SECRET_KEY is not defined.')
}
export async function POST() {
const response = await fetch('https://api.fency.ai/v1/sessions', {
method: 'POST',
headers: {
Authorization: `Bearer ${secretKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
createAgentTask: {
taskType: 'STRUCTURED_CHAT_COMPLETION',
},
}),
})
const data = await response.json()
return new Response(JSON.stringify(data), {
status: response.status,
headers: { 'Content-Type': 'application/json' },
})
}
Memory chat completion
Creates an agent task session for memory chat completions.
// server route handler (e.g. POST /api/agent-task-session)
const secretKey = process.env.FENCY_SECRET_KEY
if (!secretKey) {
throw new Error('FENCY_SECRET_KEY is not defined.')
}
export async function POST() {
const response = await fetch('https://api.fency.ai/v1/sessions', {
method: 'POST',
headers: {
Authorization: `Bearer ${secretKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
createAgentTask: {
taskType: 'MEMORY_CHAT_COMPLETION',
guardRails: {
memoryTypes: [
{
memoryTypeId: 'mty_...',
memoryIds: ['mem_...'],
match: { organization_id: 'org_...' },
},
],
},
},
}),
})
const data = await response.json()
return new Response(JSON.stringify(data), {
status: response.status,
headers: { 'Content-Type': 'application/json' },
})
}
Guides and Examples
Step-by-step guides and standalone example repositories to help you build with Fency.ai.
Uploading files for memories
This guide walks through the full process of uploading a file to create a memory in Fency.ai. File memories can then be used as context in Memory Chat Completions.
The upload flow consists of three steps: creating an empty file memory, obtaining a presigned upload URL, and uploading the file directly to that URL. These calls are made from your server using your secret key.
Step 1 — Create a memory
Call POST /v1/memories with sourceType: "FILE" to create an empty memory record. This returns a memoryId you will use in the next step.
Step 2 — Create an upload for the memory
Call POST /v1/memories/:id/uploads with the file's fileName, fileSize, and mimeType. This returns a presigned S3 upload URL along with the required form fields.
Step 3 — Upload the file
Post a FormData payload containing all the returned S3 fields plus the file itself to the presigned uploadUrl.
// Note: `file` below is a File/Blob you already have on the server
// (e.g. parsed from an incoming multipart upload).
// Step 1: Create a memory with sourceType FILE
const createMemoryResponse = await fetch('https://api.fency.ai/v1/memories', {
method: 'POST',
headers: {
Authorization: `Bearer ${process.env.FENCY_SECRET_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
memoryTypeId: 'mty_...',
sourceType: 'FILE',
title: 'My document',
}),
})
const { memoryId } = await createMemoryResponse.json()
// Step 2: Create an upload for the memory to get a presigned URL
const createUploadResponse = await fetch(
`https://api.fency.ai/v1/memories/${memoryId}/uploads`,
{
method: 'POST',
headers: {
Authorization: `Bearer ${process.env.FENCY_SECRET_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
fileName: file.name,
fileSize: file.size,
mimeType: file.type,
}),
}
)
const upload = await createUploadResponse.json()
// Step 3: Upload the file directly to the presigned URL
const formData = new FormData()
formData.append('key', upload.key)
formData.append('policy', upload.policy)
formData.append('x-amz-algorithm', upload.xAmzAlgorithm)
formData.append('x-amz-credential', upload.xAmzCredential)
formData.append('x-amz-date', upload.xAmzDate)
formData.append('x-amz-signature', upload.xAmzSignature)
formData.append('x-amz-security-token', upload.sessionToken)
formData.append('file', file)
await fetch(upload.uploadUrl, {
method: 'POST',
body: formData,
})
Knowing when the memory is ready
After the upload completes, Fency processes the file asynchronously. Once the memory is ready to use, a memory.updated webhook event is sent to your registered webhook endpoint. See the memory.updated event in the API reference for the full event payload.
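A minimal handler sketch for that webhook follows; the payload shape ({ type, data.memoryId }) is an assumption, so consult the memory.updated event reference for the real fields, and verify webhook signatures if Fency provides them:

```typescript
// Minimal webhook dispatch sketch. The payload shape is an assumption;
// consult the memory.updated event reference for the actual fields.
interface WebhookEvent {
  type: string
  data: { memoryId: string }
}

function handleWebhook(
  event: WebhookEvent,
  onMemoryReady: (memoryId: string) => void
): boolean {
  if (event.type === 'memory.updated') {
    // The memory has been processed and is ready for Memory Chat Completions.
    onMemoryReady(event.data.memoryId)
    return true
  }
  return false // ignore event types we don't handle
}
```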
For full details on the request parameters, see the API reference for Create memory by file and Create upload for memory.
Streaming chat completion example
An example application demonstrating how to build a streaming chat interface using Fency's Streaming Chat Completion task type. The AI streams responses token-by-token as they are generated, and conversation history is maintained across turns.
Structured chat completion example
An example application demonstrating how to get structured JSON responses from an LLM using Zod schemas and Fency's Structured Chat Completion task type. Each question is stateless and independent.
Memory chat completion example
An example application demonstrating how to build a document-aware chat interface using Fency's Memory Chat Completion task type. The AI answers questions grounded in a PDF contract loaded as a memory, with multi-turn conversation history maintained across turns.