Glean LangChain Integration
Glean's official LangChain integration enables you to build powerful AI agents that can search and reason over your organization's knowledge using Python and the LangChain framework.
langchain-glean
Official LangChain integration for Glean's search and chat capabilities
Installation
- pip
- poetry
- uv
pip install -U langchain-glean
poetry add langchain-glean
uv add langchain-glean
Configuration
API Tokens
You'll need Glean API credentials, and specifically a user-scoped API token. API Tokens require the following scopes: chat
, search
. You should speak to your Glean administrator to provision these tokens.
Configure Environment Variables
Configure your Glean credentials by setting the following environment variables:
export GLEAN_SUBDOMAIN="your-glean-subdomain"
export GLEAN_API_TOKEN="your-glean-api-token"
export GLEAN_ACT_AS="user@example.com" # Optional: Email to act as when making requests
Usage Examples
Using the Retriever
The GleanSearchRetriever
allows you to search and retrieve documents from Glean:
from langchain_glean.retrievers import GleanSearchRetriever
# Initialize the retriever (will use environment variables)
retriever = GleanSearchRetriever()
# Search for documents
documents = retriever.invoke("quarterly sales report")
# Process the results
for doc in documents:
print(f"Title: {doc.metadata.get('title')}")
print(f"URL: {doc.metadata.get('url')}")
print(f"Content: {doc.page_content}")
print("---")
Building an Agent with Tools
The GleanSearchTool
can be used in LangChain agents to search Glean:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_glean.retrievers import GleanSearchRetriever
from langchain_glean.tools import GleanSearchTool
# Initialize the retriever
retriever = GleanSearchRetriever()
# Create the tool
glean_tool = GleanSearchTool(
retriever=retriever,
name="glean_search",
description="Search for information in your organization's content using Glean."
)
# Create an agent with the tool
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant with access to Glean search."),
("user", "{input}")
])
agent = create_openai_tools_agent(llm, [glean_tool], prompt)
agent_executor = AgentExecutor(agent=agent, tools=[glean_tool])
# Run the agent
response = agent_executor.invoke({"input": "Find the latest quarterly report"})
print(response["output"])
RAG with LangChain Chains
You can integrate the retriever with LangChain chains for more complex workflows:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_glean.retrievers import GleanSearchRetriever
# Initialize the retriever
retriever = GleanSearchRetriever()
# Create a prompt template
prompt = ChatPromptTemplate.from_template(
"""Answer the question based only on the context provided.
Context: {context}
Question: {question}"""
)
# Initialize the language model
llm = ChatOpenAI(model="gpt-4o")
# Format documents function
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
# Create the chain
chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
# Run the chain
result = chain.invoke("What were our Q2 sales results?")
print(result)
Advanced Usage
Search Parameters
You can customize your search by passing additional parameters:
# Search with additional parameters
documents = retriever.invoke(
"quarterly sales report",
page_size=5, # Number of results to return
disable_spellcheck=True, # Disable spellcheck
max_snippet_size=200 # Maximum snippet size
)
Custom Retriever Configuration
Configure the retriever with custom settings:
# Initialize with custom settings
retriever = GleanSearchRetriever(
subdomain="your-subdomain", # Override environment variable
api_token="your-api-token", # Override environment variable
act_as="user@example.com", # Override environment variable
page_size=10, # Default number of results
max_snippet_size=300 # Default snippet size
)
For the complete API documentation and implementation details, visit the GitHub repository.