Liberating Plaud.ai: Automating Your Audio with n8n and Gemini
Plaud's closed app ecosystem and pricey subscription are a total drag, even if the tech itself is top-tier. Here’s a crude but highly effective n8n workflow (along with a custom Python library) to liberate your audio recordings and feed them directly into the latest Gemini models, giving you total control over your own data. (Plaud, if you're reading this, please don't ban my account!)
Let's be brutally honest for a second: the physical device is actually fantastic. It's literally credit-card sized, barely thicker than a few IDs, and slips right into your wallet so it's always there when you need to record on the fly. Depending on the model, it also packs up to four microphones, which is killer for beamforming—you get incredibly clear voice capture even in noisy environments. (Standard disclaimer: Don't be a creep. Always get permission before recording people!)
But their app? It leaves a lot to be desired. The speaker diarization and recognition perform about as poorly as everywhere else. What's more frustrating is that they use Gemini under the hood anyway. Ultimately, you are much better off routing your raw audio files into Gemini yourself rather than paying their monthly subscription fee for a walled-garden experience.
The biggest issue with Plaud is that it exists in complete isolation from the rest of the productivity ecosystem. It's super annoying. To get around this, I’ve hacked together some JavaScript that runs inside n8n. This acts as a controller to fetch your files, allowing you to do whatever the fuck you want with your audio—store it in a custom library, transcribe it, format it, or build automated action items.
The Code/CLI Alternative: Python Library
If you prefer dropping into a terminal over dragging nodes around in n8n, I've packaged the API interactions into a Python library. You can grab it on my forge. It lets you fetch and process your files programmatically, either from your own scripts or from the CLI.
A Quick Disclaimer on Credentials
If you do use the n8n route, a quick warning: this method involves storing your Plaud password in plain text within an n8n Set node so the custom code can use it. It's not ideal, but because Plaud's authentication isn't standard, n8n's native credential manager won't easily expose what we need to our custom JS block. Make sure your n8n instance is locked down and secure.
The Workflow Architecture
Here is how the data flows through the pipeline:
graph TD
A[Schedule Trigger] -->|Every 1 hour| B[Set Credentials Node]
B -->|User & Pass| C[Code Node: Plaud Controller]
C -->|Login, Filter, Get S3 URLs| D[HTTP Request Node]
D -->|Download S3 File| E[Code Node: Fix OGG MIME Type]
E -->|Clean Audio Binary| F[Google Gemini Node]
Step 1: Set Credentials
Add a Set (or Edit Fields) node right after your Schedule Trigger.
- Field 1: username (your Plaud email)
- Field 2: password (your Plaud password)
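The controller in Step 2 reads these two fields straight off the first input item, so a typo in a field name only shows up later as a confusing login failure. Here is a minimal sketch of a guard you could run first, assuming the field names above. `requireCredentials` is a hypothetical helper of mine, not something n8n or Plaud provides:

```javascript
// Hypothetical helper: fail fast when the Set node fields are missing or blank.
function requireCredentials(json) {
  const missing = ['username', 'password'].filter(
    (key) => typeof json?.[key] !== 'string' || json[key].trim() === ''
  );
  if (missing.length > 0) {
    throw new Error(`Missing Plaud credential field(s): ${missing.join(', ')}`);
  }
  return { username: json.username, password: json.password };
}

// In the Code node this could stand in for the plain destructuring, e.g.:
// const { username, password } = requireCredentials($input.first().json);
```

The error names the missing field but never echoes the values, so nothing sensitive ends up in the execution log.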
Step 2: The Plaud Controller
Add a Code node, set the mode to "Run Once for All Items", and paste this exact code. This handles the login, fetches your file list, filters out what you've already processed, and grabs the temporary download URLs.
// --- Configuration ---
const { username, password } = $input.first().json;
const baseUrl = "https://api-euc1.plaud.ai";
// Random hex string used as a throwaway device identifier for this run
const deviceId = Math.random().toString(16).slice(2, 18);
// 1. Initialize Deduplication (n8n internal memory)
const staticData = $getWorkflowStaticData('global');
staticData.processedIds = staticData.processedIds || [];
// Helper for Plaud API requests.
// Written as an arrow function so `this.helpers` still points at the Code node context.
const plaudReq = async (method, endpoint, body = null, headers = {}, isForm = false) => {
  const options = {
    method,
    url: `${baseUrl}${endpoint}`,
    headers: {
      "accept": "application/json",
      "x-device-id": deviceId,
      "x-pld-tag": deviceId,
      ...headers
    },
    json: !isForm
  };
  if (body) {
    options.body = body;
    if (isForm) options.headers['Content-Type'] = 'application/x-www-form-urlencoded';
  }
  return await this.helpers.httpRequest(options);
};
// 2. Login
const form = `username=${encodeURIComponent(username)}&password=${encodeURIComponent(password)}&client_id=web&password_encrypted=false`;
const { access_token } = await plaudReq('POST', '/auth/access-token', form, {}, true);
const authHeader = { "authorization": `Bearer ${access_token}` };
// 3. Get User ID
const { data_user } = await plaudReq('GET', '/user/me', null, authHeader);
const userIdHeader = { ...authHeader, "x-pld-user": data_user.id.toString() };
// 4. List Latest Conversations
const query = 'skip=0&limit=10&is_trash=2&sort_by=start_time&is_desc=true';
const { data_file_list } = await plaudReq('GET', `/file/simple/web?${query}`, null, userIdHeader);
// 5. Filter New Files & Get S3 URLs
const newFiles = [];
for (const file of data_file_list) {
  if (!staticData.processedIds.includes(file.id)) {
    // Fetch the temporary S3 URL for this new file
    const { temp_url } = await plaudReq('GET', `/file/temp-url/${file.id}`, null, userIdHeader);
    newFiles.push({
      ...file,
      temp_url
    });
    // Mark as processed
    staticData.processedIds.push(file.id);
  }
}
// Keep the internal memory clean: trim to the most recent 500 IDs.
// (A single shift() drops one ID per run while up to 10 are added, so the list would keep growing.)
staticData.processedIds = staticData.processedIds.slice(-500);
return newFiles.map(f => ({ json: f }));
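The deduplication in part 5 of the controller is easier to reason about (and test) outside n8n. Here is a rough sketch of the same idea as a pure function: it picks out the files whose IDs have not been seen and returns the updated ID list, capped at the most recent entries. The names `selectNewFiles` and `maxRemembered` are made up for illustration; only the `id` field comes from the code above.

```javascript
// Hypothetical helper mirroring the controller's dedup step, without any n8n globals.
function selectNewFiles(files, processedIds, maxRemembered = 500) {
  const seen = new Set(processedIds);
  const fresh = [];
  for (const file of files) {
    if (!seen.has(file.id)) {
      seen.add(file.id); // also guards against duplicates within one listing
      fresh.push(file);
    }
  }
  // Append the new IDs, then keep only the newest `maxRemembered` entries.
  const updatedIds = [...processedIds, ...fresh.map((f) => f.id)].slice(-maxRemembered);
  return { fresh, updatedIds };
}

// Example: 'a' was handled on an earlier run, so only 'b' and 'c' come back.
const result = selectNewFiles([{ id: 'a' }, { id: 'b' }, { id: 'c' }], ['a']);
console.log(result.fresh.map((f) => f.id), result.updatedIds);
```

At a few hundred IDs the `Set` lookup is no faster in practice than `Array.includes`; it is there mostly to make the intent obvious.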
Step 3: Download Audio
Add an HTTP Request node connected directly to the Code node.
- Method: GET
- URL: ={{ $json.temp_url }}
- Response Format: File
- Binary Property: data
Step 3.5: Fix the Audio MIME Type
Because AWS S3 returns these files as a generic binary/octet-stream, Gemini will reject them unless we explicitly tell n8n that it's an audio file. Add another Code node, set the mode to "Run Once for Each Item", and paste this short snippet:
// Overwrite the generic binary type with the correct audio MIME type
if ($input.item.binary && $input.item.binary.data) {
  $input.item.binary.data.mimeType = 'audio/ogg';
  $input.item.binary.data.fileName = 'audio.ogg';
}
return $input.item;
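The snippet above simply assumes every download is Ogg. If you would rather check than assume, one option is to look at the first bytes of the file: Ogg streams begin with the ASCII signature `OggS`, and WAV files with `RIFF` followed by `WAVE` at byte 8. This is only a sketch. `sniffAudioMime` is a hypothetical helper, and how you get the raw buffer inside n8n is deliberately left out.

```javascript
// Hypothetical helper: guess an audio MIME type from the file's leading bytes.
// Returns null when the signature is not recognised, so the caller decides the fallback.
function sniffAudioMime(buffer) {
  const head = buffer.subarray(0, 12);
  const ascii = (start, end) => head.subarray(start, end).toString('latin1');
  if (ascii(0, 4) === 'OggS') return 'audio/ogg';
  if (ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WAVE') return 'audio/wav';
  return null;
}

// Example with a fake Ogg header (signature followed by padding bytes).
const fakeOgg = Buffer.concat([Buffer.from('OggS'), Buffer.alloc(8)]);
console.log(sniffAudioMime(fakeOgg));
```

Returning `null` rather than a default keeps the "we are guessing" case visible; in the workflow you could still fall back to `audio/ogg` as the original snippet does.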
Step 4: Process with Gemini
Now for the magic. Add a Google Gemini node. Make sure you use a current, valid model string from the API.
- Resource: Model
- Operation: Generate Text
- Model: Use gemini-3.1-pro-preview (for heavy reasoning and complex extraction) or gemini-3-flash-preview (for fast, cheap processing).
- Input Data: Enable Binary Data and set the property to data.
- Prompt: "Please summarize this meeting audio, identify the key themes, and extract a bulleted list of the main action items."
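If you ever want to skip the Gemini node and call the API from a Code or HTTP Request node instead, the request body is small. The sketch below builds a `generateContent` payload with the audio inlined as base64. `buildGeminiRequest` is a hypothetical helper, and the payload shape is the public Gemini REST format as I understand it. Inline data is only practical for smaller recordings, so check the current API documentation before relying on it.

```javascript
// Hypothetical helper: build a generateContent request body with inline audio.
// Field names (contents / parts / inline_data / mime_type) are assumed from the Gemini REST API.
function buildGeminiRequest(prompt, audioBuffer, mimeType = 'audio/ogg') {
  return {
    contents: [
      {
        parts: [
          { text: prompt },
          {
            inline_data: {
              mime_type: mimeType,
              data: audioBuffer.toString('base64'),
            },
          },
        ],
      },
    ],
  };
}

// Example with a tiny placeholder buffer instead of a real recording.
const body = buildGeminiRequest('Please summarize this meeting audio.', Buffer.from('OggS'));
console.log(JSON.stringify(body).length);
```

The prompt is the first part, followed by the audio, so the same helper works with whatever prompt text you settle on.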
Important Note on Deduplication!
The "Static Data" memory used inside the controller Code node (Step 2) to remember which files have already been downloaded only persists between production runs.
When you click "Test Workflow" manually in the n8n editor, n8n does not save this memory, so every manual run pulls the same files again. Once you toggle the workflow to Active and let the Schedule Trigger fire, it remembers the processed IDs between runs and only grabs brand new recordings.