-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(community): Added Reddit integration with tool and document loader #7300
base: main
Are you sure you want to change the base?
feat(community): Added Reddit integration with tool and document loader #7300
Conversation
Added Reddit integration to LangchainJS to bring it closer to parity with LangchainPY as per discussion langchain-ai#7043
The latest updates on your projects. Learn more about Vercel for Git ↗︎
1 Skipped Deployment
|
import dotenv from "dotenv"; | ||
import { AsyncCaller } from "@langchain/core/utils/async_caller"; | ||
|
||
dotenv.config(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import dotenv from "dotenv"; | |
import { AsyncCaller } from "@langchain/core/utils/async_caller"; | |
dotenv.config(); | |
import "dotenv/config"; | |
import { AsyncCaller } from "@langchain/core/utils/async_caller"; |
This could be changed to a side-effect import since the default configuration isn't being changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MirrorLimit I actually don't see process.env
being used here at all, can this import be removed?
import dotenv from "dotenv"; | |
import { AsyncCaller } from "@langchain/core/utils/async_caller"; | |
dotenv.config(); | |
import { AsyncCaller } from "@langchain/core/utils/async_caller"; |
private async authenticate() { | ||
if (this.token) return; | ||
|
||
const authString = btoa(`${this.clientId}:${this.clientSecret}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MirrorLimit btoa
is usually a browser-specific function IIRC and isn't meant to be used in Node. Here is the proper way to create the base64-encoded string.
const authString = btoa(`${this.clientId}:${this.clientSecret}`); | |
const authString = Buffer.from(`${this.clientId}:${this.clientSecret}`).toString('base64'); |
if (!response.ok) { | ||
throw new Error( | ||
`Error authenticating with Reddit: ${response.statusText}` | ||
); | ||
} | ||
|
||
const data = await response.json(); | ||
this.token = data.access_token; | ||
} catch (error) { | ||
console.error("Error authenticating with Reddit:", error); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am reading this correctly, if it encounters an issue, it will throw an error with the text
Error authenticating with Reddit: ${response.statusText}
but will then do console.error("Error authenticating with Reddit:", error);
, which will output something like:
Error authenticating with Reddit:
Error authenticating with Reddit: "invalid auth token"
I'd suggest just throwing the status text from Reddit instead and then logging out the authentication error in the catch instead.
if (!response.ok) { | |
throw new Error( | |
`Error authenticating with Reddit: ${response.statusText}` | |
); | |
} | |
const data = await response.json(); | |
this.token = data.access_token; | |
} catch (error) { | |
console.error("Error authenticating with Reddit:", error); | |
} | |
if (!response.ok) { | |
throw new Error(response.statusText); | |
} | |
const data = await response.json(); | |
this.token = data.access_token; | |
} catch (error) { | |
console.error("Error authenticating with Reddit:", error); | |
} |
): Promise<any> { | ||
await this.authenticate(); | ||
|
||
const url = new URL(`${this.baseUrl}${endpoint}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use the URL constructor as intended
const url = new URL(`${this.baseUrl}${endpoint}`); | |
const url = new URL(endpoint, this.baseUrl); |
sort: "new", | ||
limit: 10, | ||
time: "all" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these meant to be defaults?
sort: "new", | |
limit: 10, | |
time: "all" | |
sort = "new", | |
limit = 10, | |
time = "all" |
expect.objectContaining({ | ||
method: "POST", | ||
headers: expect.objectContaining({ | ||
Authorization: expect.stringContaining("Basic"), // Checks if Basic auth is used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MirrorLimit What are your thoughts on including the base64 string generation in the test where you set what the generated token should be and then verify that it was properly generated and used in the request?
|
||
/** | ||
* Class representing a document loader for loading Reddit posts. It extends | ||
* the BaseDocumentLoader and implements the RedditAPIConfig interface. | ||
* @example | ||
* ```typescript | ||
* const loader = new RedditPostsLoader({ | ||
* clientId: "REDDIT_CLIENT_ID", | ||
* clientSecret: "REDDIT_CLIENT_SECRET", | ||
* userAgent: "REDDIT_USER_AGENT", | ||
* searchQueries: ["LangChain", "Langchaindev"], | ||
* mode: "subreddit", | ||
* categories: ["hot", "new"], | ||
* numberPosts: 5 | ||
* }); | ||
* const docs = await loader.load(); | ||
* ``` | ||
*/ | ||
export class RedditPostsLoader | ||
extends BaseDocumentLoader | ||
implements RedditAPIConfig | ||
{ | ||
public clientId: string; | ||
|
||
public clientSecret: string; | ||
|
||
public userAgent: string; | ||
|
||
private redditApiWrapper: RedditAPIWrapper; | ||
|
||
private searchQueries: string[]; | ||
|
||
private mode: string; | ||
|
||
private categories: string[]; | ||
|
||
private numberPosts: number; | ||
|
||
constructor({ | ||
clientId = getEnvironmentVariable("REDDIT_CLIENT_ID") as string, | ||
clientSecret = getEnvironmentVariable("REDDIT_CLIENT_SECRET") as string, | ||
userAgent = getEnvironmentVariable("REDDIT_USER_AGENT") as string, | ||
searchQueries, | ||
mode, | ||
categories = ["new"], | ||
numberPosts = 10, | ||
}: RedditAPIConfig & { | ||
searchQueries: string[]; | ||
mode: string; | ||
categories?: string[]; | ||
numberPosts?: number; | ||
}) { | ||
super(); | ||
this.clientId = clientId; | ||
this.clientSecret = clientSecret; | ||
this.userAgent = userAgent; | ||
this.redditApiWrapper = new RedditAPIWrapper({ | ||
clientId: this.clientId, | ||
clientSecret: this.clientSecret, | ||
userAgent: this.userAgent, | ||
}); | ||
this.searchQueries = searchQueries; | ||
this.mode = mode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be a lot nicer if the "mode" parameter was typed instead since different values cause errors to be thrown, like so:
/** | |
* Class representing a document loader for loading Reddit posts. It extends | |
* the BaseDocumentLoader and implements the RedditAPIConfig interface. | |
* @example | |
* ```typescript | |
* const loader = new RedditPostsLoader({ | |
* clientId: "REDDIT_CLIENT_ID", | |
* clientSecret: "REDDIT_CLIENT_SECRET", | |
* userAgent: "REDDIT_USER_AGENT", | |
* searchQueries: ["LangChain", "Langchaindev"], | |
* mode: "subreddit", | |
* categories: ["hot", "new"], | |
* numberPosts: 5 | |
* }); | |
* const docs = await loader.load(); | |
* ``` | |
*/ | |
export class RedditPostsLoader | |
extends BaseDocumentLoader | |
implements RedditAPIConfig | |
{ | |
public clientId: string; | |
public clientSecret: string; | |
public userAgent: string; | |
private redditApiWrapper: RedditAPIWrapper; | |
private searchQueries: string[]; | |
private mode: string; | |
private categories: string[]; | |
private numberPosts: number; | |
constructor({ | |
clientId = getEnvironmentVariable("REDDIT_CLIENT_ID") as string, | |
clientSecret = getEnvironmentVariable("REDDIT_CLIENT_SECRET") as string, | |
userAgent = getEnvironmentVariable("REDDIT_USER_AGENT") as string, | |
searchQueries, | |
mode, | |
categories = ["new"], | |
numberPosts = 10, | |
}: RedditAPIConfig & { | |
searchQueries: string[]; | |
mode: string; | |
categories?: string[]; | |
numberPosts?: number; | |
}) { | |
super(); | |
this.clientId = clientId; | |
this.clientSecret = clientSecret; | |
this.userAgent = userAgent; | |
this.redditApiWrapper = new RedditAPIWrapper({ | |
clientId: this.clientId, | |
clientSecret: this.clientSecret, | |
userAgent: this.userAgent, | |
}); | |
this.searchQueries = searchQueries; | |
this.mode = mode; | |
export type SearchMode = "subreddit" | "username"; | |
/** | |
* Class representing a document loader for loading Reddit posts. It extends | |
* the BaseDocumentLoader and implements the RedditAPIConfig interface. | |
* @example | |
* ```typescript | |
* const loader = new RedditPostsLoader({ | |
* clientId: "REDDIT_CLIENT_ID", | |
* clientSecret: "REDDIT_CLIENT_SECRET", | |
* userAgent: "REDDIT_USER_AGENT", | |
* searchQueries: ["LangChain", "Langchaindev"], | |
* mode: "subreddit", | |
* categories: ["hot", "new"], | |
* numberPosts: 5 | |
* }); | |
* const docs = await loader.load(); | |
* ``` | |
*/ | |
export class RedditPostsLoader | |
extends BaseDocumentLoader | |
implements RedditAPIConfig | |
{ | |
public clientId: string; | |
public clientSecret: string; | |
public userAgent: string; | |
private redditApiWrapper: RedditAPIWrapper; | |
private searchQueries: string[]; | |
private mode: SearchMode; | |
private categories: string[]; | |
private numberPosts: number; | |
constructor({ | |
clientId = getEnvironmentVariable("REDDIT_CLIENT_ID") as string, | |
clientSecret = getEnvironmentVariable("REDDIT_CLIENT_SECRET") as string, | |
userAgent = getEnvironmentVariable("REDDIT_USER_AGENT") as string, | |
searchQueries, | |
mode, | |
categories = ["new"], | |
numberPosts = 10, | |
}: RedditAPIConfig & { | |
searchQueries: string[]; | |
mode: SearchMode; | |
categories?: string[]; | |
numberPosts?: number; | |
}) { | |
super(); | |
this.clientId = clientId; | |
this.clientSecret = clientSecret; | |
this.userAgent = userAgent; | |
this.redditApiWrapper = new RedditAPIWrapper({ | |
clientId: this.clientId, | |
clientSecret: this.clientSecret, | |
userAgent: this.userAgent, | |
}); | |
this.searchQueries = searchQueries; | |
this.mode = mode; |
if (this.mode === "subreddit") { | ||
posts = await this.redditApiWrapper.searchSubreddit( | ||
query, | ||
"*", | ||
category, | ||
this.numberPosts | ||
); | ||
} else if (this.mode === "username") { | ||
posts = await this.redditApiWrapper.fetchUserPosts( | ||
query, | ||
category, | ||
this.numberPosts | ||
); | ||
} else { | ||
throw new Error( | ||
"Invalid mode: please choose 'subreddit' or 'username'" | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a switch statement, which is technically faster and more consistent with what is used elsewhere in the library.
if (this.mode === "subreddit") { | |
posts = await this.redditApiWrapper.searchSubreddit( | |
query, | |
"*", | |
category, | |
this.numberPosts | |
); | |
} else if (this.mode === "username") { | |
posts = await this.redditApiWrapper.fetchUserPosts( | |
query, | |
category, | |
this.numberPosts | |
); | |
} else { | |
throw new Error( | |
"Invalid mode: please choose 'subreddit' or 'username'" | |
); | |
} | |
switch (this.mode) { | |
case "subreddit": | |
posts = await this.redditApiWrapper.searchSubreddit( | |
query, | |
"*", | |
category, | |
this.numberPosts | |
); | |
break; | |
case "username": | |
posts = await this.redditApiWrapper.fetchUserPosts( | |
query, | |
category, | |
this.numberPosts | |
); | |
break; | |
default: | |
throw new Error( | |
"Invalid mode: please choose 'subreddit' or 'username'" | |
); | |
} |
@@ -0,0 +1,123 @@ | |||
import { getEnvironmentVariable } from "@langchain/core/utils/env"; //"../../../../langchain-core/src/utils/env.js"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed unused comment
import { getEnvironmentVariable } from "@langchain/core/utils/env"; //"../../../../langchain-core/src/utils/env.js"; | |
import { getEnvironmentVariable } from "@langchain/core/utils/env"; |
const apiWrapper = new RedditAPIWrapper({ | ||
clientId: this.clientId, | ||
clientSecret: this.clientSecret, | ||
userAgent: this.userAgent, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the API wrapper client a property on the class so it doesn't need to re-auth every time and can re-use the existing client?
Here's an example of how it was implemented in the Discord tool:
langchainjs/libs/langchain-community/src/tools/discord.ts
Lines 63 to 84 in 53feade
protected client: Client; | |
constructor(fields?: DiscordGetMessagesToolParams) { | |
super(); | |
const { | |
botToken = getEnvironmentVariable("DISCORD_BOT_TOKEN"), | |
messageLimit = 10, | |
client, | |
} = fields ?? {}; | |
if (!botToken) { | |
throw new Error( | |
"Environment variable DISCORD_BOT_TOKEN missing, but is required for DiscordGetMessagesTool." | |
); | |
} | |
this.client = | |
client ?? | |
new Client({ | |
intents: [GatewayIntentBits.Guilds, GatewayIntentBits.GuildMessages], | |
}); |
//import { Document } from "@langchain/core/documents"; | ||
//import { RedditPostsLoader } from "../web/reddit.js"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed unused comments
//import { Document } from "@langchain/core/documents"; | |
//import { RedditPostsLoader } from "../web/reddit.js"; |
Adds feature as per discussion #7043
Description:
The JavaScript version of LangChain currently lacks several features available in the Python version. This PR aims to bridge that gap by adding a Reddit integration with the following features:
Made together with @HyphenHook, @baoyng, and @erinL168