feat(community): Added Reddit integration with tool and document loader #7300

Open, wants to merge 2 commits into base: main

MirrorLimit

Adds feature as per discussion #7043

Description:
The JavaScript version of LangChain currently lacks several features available in the Python version. This PR aims to bridge that gap by adding a Reddit integration with the following features:

  • A Reddit tool for searching posts from a subreddit or by a Reddit user
  • A Reddit document loader that loads posts from a subreddit or a Reddit user and formats them as structured documents
  • An API wrapper that the integration uses to communicate with Reddit

Made together with @HyphenHook, @baoyng, and @erinL168

Added Reddit integration to LangchainJS to bring it closer to parity with LangchainPY as per discussion langchain-ai#7043
dosubot bot added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) on Nov 30, 2024
dosubot bot added the auto:enhancement label (A large net-new component, integration, or chain. Use sparingly. The largest features) on Nov 30, 2024
Comment on lines +1 to +4
import dotenv from "dotenv";
import { AsyncCaller } from "@langchain/core/utils/async_caller";

dotenv.config();
Contributor

Suggested change
- import dotenv from "dotenv";
- import { AsyncCaller } from "@langchain/core/utils/async_caller";
- dotenv.config();
+ import "dotenv/config";
+ import { AsyncCaller } from "@langchain/core/utils/async_caller";

This could be changed to a side-effect import since the default configuration isn't being changed.

Contributor

@MirrorLimit I actually don't see process.env being used here at all. Can this import be removed?

Suggested change
- import dotenv from "dotenv";
- import { AsyncCaller } from "@langchain/core/utils/async_caller";
- dotenv.config();
+ import { AsyncCaller } from "@langchain/core/utils/async_caller";

private async authenticate() {
if (this.token) return;

const authString = btoa(`${this.clientId}:${this.clientSecret}`);
Contributor

@MirrorLimit btoa originated as a browser function. Node exposes it only as a legacy global, and the Node documentation recommends Buffer for base64 encoding instead. Here is the preferred way to create the base64-encoded string in Node.

Suggested change
- const authString = btoa(`${this.clientId}:${this.clientSecret}`);
+ const authString = Buffer.from(`${this.clientId}:${this.clientSecret}`).toString('base64');
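
As a rough illustration of the suggestion above, the Basic credential could be built with Buffer as sketched below. The helper name `buildBasicAuthHeader` is hypothetical and not part of the PR; the sketch only assumes, as the original line implies, that the token request expects `clientId:clientSecret` encoded as base64.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper (not from the PR): builds the value of an HTTP Basic
// Authorization header from a client id and secret.
export function buildBasicAuthHeader(
  clientId: string,
  clientSecret: string
): string {
  // Encode "id:secret" as UTF-8 bytes, then render those bytes as base64.
  const encoded = Buffer.from(`${clientId}:${clientSecret}`, "utf8").toString(
    "base64"
  );
  return `Basic ${encoded}`;
}

// buildBasicAuthHeader("id", "secret") returns "Basic aWQ6c2VjcmV0"
```

Unlike btoa, which throws on characters outside Latin-1, this encodes the UTF-8 bytes of the string, so it also works for credentials containing other characters.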

Comment on lines +67 to +77
if (!response.ok) {
throw new Error(
`Error authenticating with Reddit: ${response.statusText}`
);
}

const data = await response.json();
this.token = data.access_token;
} catch (error) {
console.error("Error authenticating with Reddit:", error);
}
Contributor

If I am reading this correctly, when the request fails it will throw an error with the text

Error authenticating with Reddit: ${response.statusText}

but the catch block then calls console.error("Error authenticating with Reddit:", error), which will output something like:

Error authenticating with Reddit:
  Error authenticating with Reddit: "invalid auth token"

I'd suggest throwing only the status text from Reddit and adding the "Error authenticating with Reddit" prefix once, when logging in the catch block.

Suggested change
  if (!response.ok) {
-   throw new Error(
-     `Error authenticating with Reddit: ${response.statusText}`
-   );
+   throw new Error(response.statusText);
  }
  const data = await response.json();
  this.token = data.access_token;
  } catch (error) {
    console.error("Error authenticating with Reddit:", error);
  }
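
One thing the suggestion leaves open is that the catch block only logs, so the caller would continue without a token. A minimal sketch of an alternative, assuming it is acceptable to rethrow, is shown below. The names `requestAccessToken`, `TokenFetcher`, and `TokenResponse` are hypothetical, and the request is injected as a function so the example runs without network access.

```typescript
// Minimal response shape this sketch relies on (a subset of the Fetch API's Response).
interface TokenResponse {
  ok: boolean;
  statusText: string;
  json(): Promise<unknown>;
}

// Hypothetical injectable request function, so no real HTTP call is needed here.
type TokenFetcher = () => Promise<TokenResponse>;

export async function requestAccessToken(
  sendRequest: TokenFetcher
): Promise<string> {
  try {
    const response = await sendRequest();
    if (!response.ok) {
      // Throw only the upstream status text; the prefix is added once, below.
      throw new Error(response.statusText);
    }
    const data = (await response.json()) as { access_token?: unknown };
    if (typeof data.access_token !== "string") {
      throw new Error("response did not contain an access_token");
    }
    return data.access_token;
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    // Rethrow instead of only logging, so callers never continue without a token.
    throw new Error(`Error authenticating with Reddit: ${reason}`);
  }
}
```

With this shape the prefix appears exactly once in the final message, and a failed authentication surfaces as a rejected promise rather than a later request with an undefined token.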

): Promise<any> {
await this.authenticate();

const url = new URL(`${this.baseUrl}${endpoint}`);
Contributor

Suggest passing the endpoint and the base URL as separate arguments, which is how the URL constructor is intended to be used.

Suggested change
- const url = new URL(`${this.baseUrl}${endpoint}`);
+ const url = new URL(endpoint, this.baseUrl);
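
The two-argument form resolves the endpoint relative to the base rather than concatenating strings, so the result depends on leading and trailing slashes. The sketch below illustrates this with a hypothetical `buildUrl` helper and placeholder `example.com` URLs; neither comes from the PR.

```typescript
// Hypothetical helper: resolves an endpoint against a base URL, then appends
// query parameters through URLSearchParams so values are escaped correctly.
export function buildUrl(
  baseUrl: string,
  endpoint: string,
  params: Record<string, string | number> = {}
): URL {
  const url = new URL(endpoint, baseUrl);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url;
}

// A bare origin as the base behaves like plain concatenation:
//   buildUrl("https://api.example.com", "/r/test/search", { limit: 10 }).href
//   is "https://api.example.com/r/test/search?limit=10"
//
// A leading slash on the endpoint replaces any path on the base:
//   buildUrl("https://api.example.com/v1", "/search").href
//   is "https://api.example.com/search"
//
// A relative endpoint keeps the base path only if the base ends with a slash:
//   buildUrl("https://api.example.com/v1/", "search").href
//   is "https://api.example.com/v1/search"
```

If the base URL in the wrapper is a bare origin, the suggested change is behavior-preserving; if it ever carries a path prefix, the second case above is the one to watch for.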

Comment on lines +116 to +118
sort: "new",
limit: 10,
time: "all"
Contributor

Are these meant to be defaults?

Suggested change
- sort: "new",
- limit: 10,
- time: "all"
+ sort = "new",
+ limit = 10,
+ time = "all"
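
For context, `=` inside a destructuring pattern supplies a default, whereas `:` renames a property (or, in a type annotation, declares a literal type). A minimal sketch of the default form follows; the `SearchOptions` interface and `resolveSearchOptions` function are hypothetical names used only for illustration.

```typescript
// Hypothetical option bag mirroring the three parameters in the snippet above.
interface SearchOptions {
  sort?: string;
  limit?: number;
  time?: string;
}

// Each `name = value` applies only when the property is missing or undefined.
export function resolveSearchOptions({
  sort = "new",
  limit = 10,
  time = "all",
}: SearchOptions = {}): Required<SearchOptions> {
  return { sort, limit, time };
}

// resolveSearchOptions({ limit: 5 }) returns { sort: "new", limit: 5, time: "all" }
```

Note that defaults are not applied for `null`, only for `undefined`, which matters if callers pass values straight from parsed JSON.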

expect.objectContaining({
method: "POST",
headers: expect.objectContaining({
Authorization: expect.stringContaining("Basic"), // Checks if Basic auth is used
Contributor

@MirrorLimit What are your thoughts on generating the base64 string in the test itself, so that the test sets the expected token and then verifies that it was generated correctly and used in the request?
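
One possible shape for that idea, sketched under the assumption that the wrapper sends a standard Basic header: compute the expected value in the test from known credentials and compare it exactly. The fixture values and the names `expectedAuthorization` and `decodeBasicAuth` are hypothetical.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical test fixtures; any non-empty strings would do.
const clientId = "test-client-id";
const clientSecret = "test-client-secret";

// Compute the header the wrapper is expected to send, independently of the wrapper.
export const expectedAuthorization = `Basic ${Buffer.from(
  `${clientId}:${clientSecret}`
).toString("base64")}`;

// Decoding a captured header back to "id:secret" makes test failures easier to read.
export function decodeBasicAuth(header: string): string {
  const encoded = header.replace(/^Basic /, "");
  return Buffer.from(encoded, "base64").toString("utf8");
}
```

In the existing assertion, `Authorization: expect.stringContaining("Basic")` could then become `Authorization: expectedAuthorization`, provided the test constructs the wrapper with the same fixture credentials.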

Comment on lines +9 to +71

/**
* Class representing a document loader for loading Reddit posts. It extends
* the BaseDocumentLoader and implements the RedditAPIConfig interface.
* @example
* ```typescript
* const loader = new RedditPostsLoader({
* clientId: "REDDIT_CLIENT_ID",
* clientSecret: "REDDIT_CLIENT_SECRET",
* userAgent: "REDDIT_USER_AGENT",
* searchQueries: ["LangChain", "Langchaindev"],
* mode: "subreddit",
* categories: ["hot", "new"],
* numberPosts: 5
* });
* const docs = await loader.load();
* ```
*/
export class RedditPostsLoader
extends BaseDocumentLoader
implements RedditAPIConfig
{
public clientId: string;

public clientSecret: string;

public userAgent: string;

private redditApiWrapper: RedditAPIWrapper;

private searchQueries: string[];

private mode: string;

private categories: string[];

private numberPosts: number;

constructor({
clientId = getEnvironmentVariable("REDDIT_CLIENT_ID") as string,
clientSecret = getEnvironmentVariable("REDDIT_CLIENT_SECRET") as string,
userAgent = getEnvironmentVariable("REDDIT_USER_AGENT") as string,
searchQueries,
mode,
categories = ["new"],
numberPosts = 10,
}: RedditAPIConfig & {
searchQueries: string[];
mode: string;
categories?: string[];
numberPosts?: number;
}) {
super();
this.clientId = clientId;
this.clientSecret = clientSecret;
this.userAgent = userAgent;
this.redditApiWrapper = new RedditAPIWrapper({
clientId: this.clientId,
clientSecret: this.clientSecret,
userAgent: this.userAgent,
});
this.searchQueries = searchQueries;
this.mode = mode;
Contributor

nick-w-nick, Dec 30, 2024

It would be nicer if the "mode" parameter were typed, since any value other than the two supported ones causes an error to be thrown. For example:

Suggested change
+ export type SearchMode = "subreddit" | "username";
  /**
   * Class representing a document loader for loading Reddit posts. It extends
   * the BaseDocumentLoader and implements the RedditAPIConfig interface.
   * @example
   * ```typescript
   * const loader = new RedditPostsLoader({
   *   clientId: "REDDIT_CLIENT_ID",
   *   clientSecret: "REDDIT_CLIENT_SECRET",
   *   userAgent: "REDDIT_USER_AGENT",
   *   searchQueries: ["LangChain", "Langchaindev"],
   *   mode: "subreddit",
   *   categories: ["hot", "new"],
   *   numberPosts: 5
   * });
   * const docs = await loader.load();
   * ```
   */
  export class RedditPostsLoader
    extends BaseDocumentLoader
    implements RedditAPIConfig
  {
    public clientId: string;
    public clientSecret: string;
    public userAgent: string;
    private redditApiWrapper: RedditAPIWrapper;
    private searchQueries: string[];
-   private mode: string;
+   private mode: SearchMode;
    private categories: string[];
    private numberPosts: number;
    constructor({
      clientId = getEnvironmentVariable("REDDIT_CLIENT_ID") as string,
      clientSecret = getEnvironmentVariable("REDDIT_CLIENT_SECRET") as string,
      userAgent = getEnvironmentVariable("REDDIT_USER_AGENT") as string,
      searchQueries,
      mode,
      categories = ["new"],
      numberPosts = 10,
    }: RedditAPIConfig & {
      searchQueries: string[];
-     mode: string;
+     mode: SearchMode;
      categories?: string[];
      numberPosts?: number;
    }) {
      super();
      this.clientId = clientId;
      this.clientSecret = clientSecret;
      this.userAgent = userAgent;
      this.redditApiWrapper = new RedditAPIWrapper({
        clientId: this.clientId,
        clientSecret: this.clientSecret,
        userAgent: this.userAgent,
      });
      this.searchQueries = searchQueries;
      this.mode = mode;

Comment on lines +88 to +105
if (this.mode === "subreddit") {
posts = await this.redditApiWrapper.searchSubreddit(
query,
"*",
category,
this.numberPosts
);
} else if (this.mode === "username") {
posts = await this.redditApiWrapper.fetchUserPosts(
query,
category,
this.numberPosts
);
} else {
throw new Error(
"Invalid mode: please choose 'subreddit' or 'username'"
);
}
Contributor

This could be a switch statement, which is more consistent with what is used elsewhere in the library. Any speed difference with only two cases would be negligible.

Suggested change
- if (this.mode === "subreddit") {
-   posts = await this.redditApiWrapper.searchSubreddit(
-     query,
-     "*",
-     category,
-     this.numberPosts
-   );
- } else if (this.mode === "username") {
-   posts = await this.redditApiWrapper.fetchUserPosts(
-     query,
-     category,
-     this.numberPosts
-   );
- } else {
-   throw new Error(
-     "Invalid mode: please choose 'subreddit' or 'username'"
-   );
- }
+ switch (this.mode) {
+   case "subreddit":
+     posts = await this.redditApiWrapper.searchSubreddit(
+       query,
+       "*",
+       category,
+       this.numberPosts
+     );
+     break;
+   case "username":
+     posts = await this.redditApiWrapper.fetchUserPosts(
+       query,
+       category,
+       this.numberPosts
+     );
+     break;
+   default:
+     throw new Error(
+       "Invalid mode: please choose 'subreddit' or 'username'"
+     );
+ }
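
If the mode is typed as a union, as suggested in the earlier comment, the switch can also act as a compile-time exhaustiveness check. The sketch below assumes that `SearchMode` union; `describeSearch` is a hypothetical stand-in that returns a label instead of calling the API wrapper.

```typescript
export type SearchMode = "subreddit" | "username";

// Hypothetical stand-in for the wrapper calls: returns a description of the
// request that would be made for each mode.
export function describeSearch(mode: SearchMode, query: string): string {
  switch (mode) {
    case "subreddit":
      return `search r/${query}`;
    case "username":
      return `posts by u/${query}`;
    default: {
      // Compile-time check: this assignment stops type-checking if a new
      // SearchMode member is added without a matching case.
      const unhandled: never = mode;
      // Runtime guard for untyped (plain JavaScript) callers.
      throw new Error(
        `Invalid mode: ${String(unhandled)}; please choose 'subreddit' or 'username'`
      );
    }
  }
}
```

The runtime error in the default branch is still worth keeping, since callers using plain JavaScript or values read from configuration are not protected by the type.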

@@ -0,0 +1,123 @@
import { getEnvironmentVariable } from "@langchain/core/utils/env"; //"../../../../langchain-core/src/utils/env.js";
Contributor

removed unused comment

Suggested change
- import { getEnvironmentVariable } from "@langchain/core/utils/env"; //"../../../../langchain-core/src/utils/env.js";
+ import { getEnvironmentVariable } from "@langchain/core/utils/env";

Comment on lines +86 to +90
const apiWrapper = new RedditAPIWrapper({
clientId: this.clientId,
clientSecret: this.clientSecret,
userAgent: this.userAgent,
});
Contributor

Can you make the API wrapper client a property on the class so it doesn't need to re-auth every time and can re-use the existing client?

Here's an example of how it was implemented in the Discord tool:

protected client: Client;
constructor(fields?: DiscordGetMessagesToolParams) {
super();
const {
botToken = getEnvironmentVariable("DISCORD_BOT_TOKEN"),
messageLimit = 10,
client,
} = fields ?? {};
if (!botToken) {
throw new Error(
"Environment variable DISCORD_BOT_TOKEN missing, but is required for DiscordGetMessagesTool."
);
}
this.client =
client ??
new Client({
intents: [GatewayIntentBits.Guilds, GatewayIntentBits.GuildMessages],
});
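
Applied to the Reddit tool, the same pattern might look roughly like the sketch below. It is not the PR's code: `FakeApiWrapper` is a hypothetical stand-in for RedditAPIWrapper that counts authentications instead of calling Reddit, and `SearchTool` is a hypothetical tool that holds one wrapper for its lifetime and optionally accepts an injected one, as the Discord tool does with `client`.

```typescript
// Hypothetical stand-in for the API wrapper: caches its token after the first
// authentication and counts how often authentication actually happens.
export class FakeApiWrapper {
  public authCount = 0;

  private token?: string;

  private async authenticate(): Promise<void> {
    if (this.token) return; // Reuse the cached token on later calls.
    this.authCount += 1;
    this.token = `token-${this.authCount}`;
  }

  async search(query: string): Promise<string> {
    await this.authenticate();
    return `${this.token}:${query}`;
  }
}

// Hypothetical tool: the wrapper is a class property created once in the
// constructor, or injected by the caller (useful for tests).
export class SearchTool {
  protected apiWrapper: FakeApiWrapper;

  constructor(apiWrapper: FakeApiWrapper = new FakeApiWrapper()) {
    this.apiWrapper = apiWrapper;
  }

  async call(query: string): Promise<string> {
    // No new wrapper per call, so the cached token is reused.
    return this.apiWrapper.search(query);
  }
}
```

Because the wrapper instance outlives individual calls, its `if (this.token) return;` guard becomes effective; constructing a new wrapper inside each call would discard the token every time.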

Comment on lines +2 to +3
//import { Document } from "@langchain/core/documents";
//import { RedditPostsLoader } from "../web/reddit.js";
Contributor

removed unused comments

Suggested change
- //import { Document } from "@langchain/core/documents";
- //import { RedditPostsLoader } from "../web/reddit.js";

Labels
auto:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features size:XL This PR changes 500-999 lines, ignoring generated files.
2 participants