Manage unstructured data in BigConnect

Unstructured data comes in many forms and formats, and is usually dispersed across the organization. Documents, images, presentations, spreadsheets, videos, and audio files are all forms of unstructured information that can carry a lot of meaning, both obvious and hidden. The obvious meaning is what you can infer by looking at a specific piece of content. The hidden meaning is everything you cannot see directly: How does one document correlate with another? Which similar documents discuss our last product release? What is the tone of our customer reports? What information do we have on a specific subject?

Hidden information is usually more difficult to get and requires that people spend more time gathering it.

A natural first thought is to index all your information in a Content Management System (CMS) so you can search and quickly find what you want. The problem is that CMSs were designed to organize content and run basic searches on it. They are not intelligent and do not learn from your content, so they cannot infer new meaning and are limited to the information explicitly available to them.

What you typically need is a system that processes content as it is loaded and records how each new piece relates to the information already in the repository. This is where BigConnect comes in.

BigConnect has a content processing pipeline in its core engine. It understands a variety of document, image, video, and audio formats and can extract a wide range of information from them:

  • Text content from documents, images, video and audio files using various techniques
  • Metadata like author, date, device, location, resolution
  • Objects from images and video files

It then feeds all this information to Machine Learning pipelines that add further valuable hidden information, such as:

  • Known Persons, Companies, Locations, Products etc. through Named Entity Recognition
  • How entities relate to each other, through Relationship Extraction (e.g. "John resigned from Apple")
  • The language in which the content was written through Language Recognition
  • The tone of text through Sentiment Analysis
  • The main topics covered in the content through Category Extraction
  • Content that is similar through Similarity Identification
  • People, brands, logos, etc. from visual content through Object Detection
  • Action intents and mental states inferred from text (e.g. secretive, relieved, nervous, sly, satisfied, excited)
  • etc.
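
To make the idea of an enrichment pipeline concrete, here is a minimal Python sketch. It is not BigConnect's API: the `ContentItem` class, the stage functions, and the tiny word lists are all hypothetical stand-ins for the trained models a real system would use. It only illustrates how each stage can attach new properties to a piece of content as it passes through.

```python
from dataclasses import dataclass, field

@dataclass
class ContentItem:
    """A piece of content plus the properties extracted from it so far."""
    text: str
    properties: dict = field(default_factory=dict)

# Toy lookup tables (assumptions for illustration only); a real pipeline
# would call sentiment and named-entity models instead.
POSITIVE = {"great", "happy", "excellent", "thanks"}
NEGATIVE = {"bad", "angry", "terrible", "broken"}
KNOWN_COMPANIES = {"Apple", "Acme"}

def tokens(text):
    """Split on whitespace and strip surrounding punctuation."""
    return [word.strip(".,!?") for word in text.split()]

def detect_sentiment(item):
    """Stage 1: label the tone of the text with a naive word count."""
    words = [w.lower() for w in tokens(item.text)]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        item.properties["sentiment"] = "positive"
    elif score < 0:
        item.properties["sentiment"] = "negative"
    else:
        item.properties["sentiment"] = "neutral"

def find_companies(item):
    """Stage 2: a stand-in for Named Entity Recognition."""
    item.properties["companies"] = sorted(set(tokens(item.text)) & KNOWN_COMPANIES)

def run_pipeline(item, stages):
    """Run every stage in order; each one enriches the same item."""
    for stage in stages:
        stage(item)
    return item

item = run_pipeline(
    ContentItem("John resigned from Apple. The product was terrible."),
    [detect_sentiment, find_companies],
)
print(item.properties)
```

Because every stage only reads the text and writes to `properties`, new extractors can be added to the list without changing the existing ones.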

All extracted information is added to the global BigConnect Knowledge Graph and can be used when performing a search. You can also find hidden patterns in your content using graph query languages such as Cypher or Gremlin, to answer questions like "Who are the top 10 people who responded inappropriately to our customer requests?"
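
As a rough illustration of what such a pattern query does, the following Python sketch walks a tiny in-memory graph. Everything here is an assumption made for the example: the vertex types, the edge labels (`wrote`, `replies_to`), and the choice to treat "inappropriate" as a negative sentiment label. It is not BigConnect's schema or query API, and in practice you would express the same pattern in Cypher or Gremlin.

```python
from collections import Counter

# Hypothetical knowledge graph: vertices keyed by id, edges as
# (source, label, target) triples.
vertices = {
    "p1": {"type": "person", "name": "Alice"},
    "p2": {"type": "person", "name": "Bob"},
    "r1": {"type": "response", "sentiment": "negative"},
    "r2": {"type": "response", "sentiment": "positive"},
    "r3": {"type": "response", "sentiment": "negative"},
    "r4": {"type": "response", "sentiment": "negative"},
    "c1": {"type": "customer_request"},
}
edges = [
    ("p1", "wrote", "r1"),
    ("p1", "wrote", "r2"),
    ("p2", "wrote", "r3"),
    ("p2", "wrote", "r4"),
    ("r1", "replies_to", "c1"),
    ("r2", "replies_to", "c1"),
    ("r3", "replies_to", "c1"),
    ("r4", "replies_to", "c1"),
]

def top_negative_responders(vertices, edges, limit=10):
    """Match person -wrote-> response -replies_to-> customer_request,
    keep negative responses, and rank people by how many they wrote."""
    # Responses that answer a customer request.
    replies = {
        src for src, label, dst in edges
        if label == "replies_to" and vertices[dst]["type"] == "customer_request"
    }
    counts = Counter()
    for src, label, dst in edges:
        if (label == "wrote" and dst in replies
                and vertices[dst]["sentiment"] == "negative"):
            counts[vertices[src]["name"]] += 1
    return counts.most_common(limit)

ranking = top_negative_responders(vertices, edges)
print(ranking)
```

The query only works because sentiment was stored on the response vertices during processing, which is the point of enriching content before it reaches the graph.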

Queries don't have to be limited to document contents, because BigConnect adds extra information such as persons, companies, sentiment, actions, mental states, and relationships.

What you can do is limited only by your imagination.
