Knowledge Graphs
Interactive network visualization of research relationships using clustering, PCA, and trend analytics.
Knowledge Graphs provide interactive visual maps of your research landscape. They reveal connections between papers, authors, and topics that are difficult to see in a traditional list or table view. Use them to discover hidden relationships, identify research clusters, and track how topics evolve over time.
What a Knowledge Graph Shows
A knowledge graph is a network of nodes connected by edges:
Node types:
- Papers -- Individual publications from your library. Node size often reflects citation count or relevance.
- Authors -- Researchers associated with your documents. Connections show co-authorship relationships.
- Topics -- Research themes and concepts extracted from your documents. These reveal thematic structure.
Edges represent relationships between nodes. A paper-to-author edge means the author wrote that paper. A paper-to-paper edge indicates shared references, similar content, or co-citation. A topic-to-paper edge shows that the paper covers that topic.
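As a rough illustration of the node-and-edge structure described above, the following Python sketch models a tiny graph in memory. The class names (`Node`, `Edge`, `KnowledgeGraph`), the relation labels, and the sample data are hypothetical and are not taken from the product's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str   # "paper", "author", or "topic"
    label: str


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str   # hypothetical labels, e.g. "authored", "covers"


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, source, target, relation):
        # Refuse edges that point at nodes the graph does not contain.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(source, target, relation))

    def neighbors(self, node_id):
        """Return the ids of nodes connected to node_id in either direction."""
        found = set()
        for edge in self.edges:
            if edge.source == node_id:
                found.add(edge.target)
            elif edge.target == node_id:
                found.add(edge.source)
        return found


graph = KnowledgeGraph()
graph.add_node(Node("p1", "paper", "Graph embeddings survey"))
graph.add_node(Node("a1", "author", "A. Researcher"))
graph.add_node(Node("t1", "topic", "representation learning"))
graph.add_edge("a1", "p1", "authored")   # paper-to-author edge
graph.add_edge("t1", "p1", "covers")     # topic-to-paper edge
```

Calling `graph.neighbors("p1")` returns the author and topic attached to that paper, which is roughly the information a node's detail view would need.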
Creating a Knowledge Graph
- Navigate to the knowledge graph feature.
- Provide a name and optional description for your graph.
- Configure the graph parameters:
  - Node types to include: Choose whether to include papers, authors, topics, or any combination.
  - Similarity threshold: Set the minimum similarity score for creating edges between papers. Higher thresholds produce sparser, more focused graphs.
  - Maximum nodes: Limit the total number of nodes to keep the visualization readable.
  - Layout: Choose the spatial arrangement algorithm for positioning nodes.
- Click generate. The system analyzes your documents using vector embeddings and builds the graph.
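To make the effect of the similarity threshold and the node limit concrete, here is a minimal Python sketch that builds paper-to-paper edges from embedding vectors. It assumes cosine similarity as the score and keeps the first `max_nodes` entries when truncating; both choices, along with the function names and toy vectors, are assumptions for illustration rather than a description of how the system scores or selects nodes.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_edges(embeddings, threshold, max_nodes=None):
    """Return (paper, paper, score) edges whose score meets the threshold."""
    ids = list(embeddings)
    if max_nodes is not None:
        ids = ids[:max_nodes]   # assumption: keep the first N papers
    edges = []
    for a, b in combinations(ids, 2):
        score = cosine_similarity(embeddings[a], embeddings[b])
        if score >= threshold:
            edges.append((a, b, round(score, 3)))
    return edges


# Toy 3-dimensional "embeddings"; real vectors have far more dimensions.
embeddings = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
    "paper_d": [0.1, 0.9, 0.1],
}

loose = build_edges(embeddings, threshold=0.1)    # denser graph
strict = build_edges(embeddings, threshold=0.9)   # sparser graph
```

With these toy vectors the strict threshold keeps only the two closely related pairs, while the loose threshold also connects weakly related papers.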
Clustering Algorithms
Knowledge Graphs use machine learning to group related items into clusters:
KMeans clustering -- Divides nodes into a specified number of groups based on content similarity. Each cluster represents a thematic area in your research. This method works well when you have a rough idea of how many distinct topics exist in your corpus.
DBSCAN clustering -- Identifies clusters based on density rather than a predetermined number. Nodes that are closely grouped form clusters, while isolated nodes are flagged as outliers. This method is useful for discovering unexpected groupings or identifying papers that do not fit neatly into any category.
Both algorithms operate on the vector embeddings of your documents, so clustering reflects semantic similarity rather than surface-level keyword matching.
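The difference between the two approaches can be seen on a handful of 2-D points. The sketch below uses small pure-Python versions of both algorithms so that it runs without extra dependencies; it is not the product's implementation. The parameter names `eps` and `min_samples`, the noise label `-1` (a common convention), and the deterministic KMeans initialization are assumptions chosen for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Tiny KMeans; starts from the first k points (illustrative choice)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def dbscan(points, eps, min_samples):
    """Tiny DBSCAN; returns cluster ids, with -1 marking outliers."""
    labels = [None] * len(points)   # None means "not visited yet"

    def region(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_samples:
            labels[i] = -1          # provisional outlier
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a dense region
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)   # j is dense too, keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 20)]
kmeans_labels = kmeans(points, k=2)
dbscan_labels = dbscan(points, eps=1.5, min_samples=3)
```

In this example KMeans places the isolated point `(20, 20)` in the nearer of its two clusters, whereas DBSCAN labels it `-1`, matching the outlier behavior described above.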
PCA Dimensionality Reduction
The system uses Principal Component Analysis (PCA) to reduce high-dimensional embedding vectors to two or three dimensions for visualization. As a result, the spatial position of nodes on screen approximately reflects their content similarity -- papers near each other on the graph tend to cover related topics, even if they use different terminology.
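For readers who want to see what this projection involves, the following dependency-free Python sketch reduces a few 4-dimensional vectors to 2-D coordinates. It finds the top two principal components by power iteration with deflation, which is one simple way to compute PCA and is used here as an assumption; the function name `pca_2d` and the toy vectors are hypothetical, and a production system would more likely rely on a numerical library.

```python
import math
import random


def pca_2d(vectors, iterations=200, seed=0):
    """Project vectors onto their top two principal components."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    centered = [[v[d] - means[d] for d in range(dim)] for v in vectors]
    # Sample covariance matrix (needs at least two vectors).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]

    rng = random.Random(seed)
    components = []
    for _component in range(2):
        # Start from a random unit vector and repeatedly multiply by cov.
        vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _step in range(iterations):
            nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break   # no variance left in this direction
            vec = [x / norm for x in nxt]
        eigenvalue = sum(vec[i] * cov[i][j] * vec[j]
                         for i in range(dim) for j in range(dim))
        components.append(vec)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(dim)]
               for i in range(dim)]

    return [tuple(sum(x * c for x, c in zip(row, comp)) for comp in components)
            for row in centered]


# Two pairs of similar toy vectors.
embeddings = [
    [1.0, 0.9, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
]
coords = pca_2d(embeddings)
```

In the resulting coordinates the first two vectors land close together and far from the last two, which is the property that lets on-screen distance stand in for content similarity. The sign of each axis is arbitrary, so only relative distances are meaningful.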
Physics Simulation
Knowledge graphs use a physics-based simulation to position nodes. Nodes repel each other to prevent overlap, while edges act as springs pulling connected nodes together. The result is an organic layout where clusters of related items naturally group together.
You can interact with the simulation:
- Drag nodes to rearrange the layout manually.
- Zoom in and out to explore different regions.
- Click a node to see its details and connections.
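The repulsion-and-spring behavior described in this section can be sketched in a few lines of Python. This is a simplified force-directed layout under stated assumptions: inverse-square repulsion, linear springs with a fixed rest length, and a plain fixed-step update. The function name and all constants are hypothetical and are not the product's actual simulation parameters.

```python
import math


def force_layout(positions, edges, steps=200, repulsion=1.0,
                 spring=0.1, rest_length=1.0, step_size=0.05):
    """Return new 2-D positions after a simple force simulation."""
    pos = {name: list(xy) for name, xy in positions.items()}
    names = list(pos)
    for _ in range(steps):
        force = {name: [0.0, 0.0] for name in names}

        # Repulsion: every pair of nodes pushes apart, strongest up close.
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                push = repulsion / (d * d)
                force[a][0] += push * dx / d
                force[a][1] += push * dy / d
                force[b][0] -= push * dx / d
                force[b][1] -= push * dy / d

        # Springs: each edge pulls its endpoints toward the rest length.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            d = math.hypot(dx, dy) or 1e-9
            pull = spring * (d - rest_length)
            force[a][0] += pull * dx / d
            force[a][1] += pull * dy / d
            force[b][0] -= pull * dx / d
            force[b][1] -= pull * dy / d

        # Move each node a small step along its net force.
        for name in names:
            pos[name][0] += step_size * force[name][0]
            pos[name][1] += step_size * force[name][1]
    return {name: tuple(xy) for name, xy in pos.items()}


start = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (0.0, 3.0), "d": (3.0, 3.0)}
layout = force_layout(start, edges=[("a", "b"), ("c", "d")])
```

Starting from a square, the connected pairs `a`-`b` and `c`-`d` drift closer together while the two pairs move apart from each other, which is how clusters of related items end up grouped on screen.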
Trend Analytics
Knowledge Graphs include a trend analytics component that tracks how topics and clusters evolve over time. This feature analyzes publication dates and topic associations to show:
- Emerging topics -- Themes that appear with increasing frequency in recent publications.
- Declining topics -- Areas that were once active but have seen reduced publication activity.
- Stable topics -- Consistently active research areas.
Trend data is presented as a time series for each topic, so you can follow its trajectory over time.
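One plausible way to turn yearly publication counts into the three labels above is to fit a straight line to each topic's series and look at its slope. The Python sketch below does that with an ordinary least-squares slope; the `tolerance` cutoff, the function names, and the sample counts are assumptions for illustration, and the product's actual classification rule may differ.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)   # needs at least two data points
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def classify_trend(counts_by_year, tolerance=0.5):
    """Label a topic from its papers-per-year counts."""
    years = sorted(counts_by_year)
    change_per_year = slope([counts_by_year[y] for y in years])
    if change_per_year > tolerance:
        return "emerging"
    if change_per_year < -tolerance:
        return "declining"
    return "stable"


# Made-up counts of papers per year for three hypothetical topics.
topics = {
    "topic_a": {2020: 1, 2021: 2, 2022: 4, 2023: 7, 2024: 11},
    "topic_b": {2020: 9, 2021: 7, 2022: 6, 2023: 3, 2024: 1},
    "topic_c": {2020: 5, 2021: 6, 2022: 5, 2023: 6, 2024: 5},
}
trends = {name: classify_trend(counts) for name, counts in topics.items()}
```

Here `topic_a` gains about 2.5 papers per year and is labeled emerging, `topic_b` loses about 2 per year and is labeled declining, and `topic_c` has a slope of zero and is labeled stable.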
Tips
- Start with a lower similarity threshold and increase it if the graph is too dense. A threshold that is too high may hide meaningful connections.
- Use DBSCAN when you do not know how many clusters to expect. Its outlier detection can also help surface papers that do not fit any single cluster, such as interdisciplinary work that bridges multiple fields.
- Combine Knowledge Graphs with Multi-Document Chat. After visualizing your research landscape, use chat to ask deeper questions about the clusters or connections you discover.
- Generate graphs at different scopes (project, lab, global) to see how your research structure changes at different levels of aggregation.