Find relevant knowledge and discover new patterns

Groundbreaking unsupervised concept extraction

UNSILO extracts the most important semantic concepts from a document. Using Machine Learning technologies and Natural Language Processing, UNSILO understands the precise meaning of phrases within a document and automatically captures both semantic and syntactic variations. Our solution does not require any document metadata, and simply reads the full text to learn and comprehend the topics, things, and events that connect documents in a content collection. To automatically extract the semantic fingerprint of each document, our technology uses the latest advances in knowledge modeling and Deep Learning to understand the content itself. Our algorithms can also predict the usefulness of a freshly written manuscript, because they do not rely on existing traffic metrics or external user data.

XAI Solutions That Work With (or Without) Existing Taxonomies and Ontologies

Most NLP platforms use either static dictionaries or brute-force Machine Learning to identify the key concepts in a text. UNSILO can also import, leverage, and extend your existing taxonomies and ontologies, but more importantly, our technology can learn the key concepts and ideas in a completely unknown knowledge domain without human guidance or access to taxonomies or ontologies.

Legacy Text Analytics Providers tend to define their solutions in terms of the assets they produce, such as an updated taxonomy. By contrast, UNSILO is focussed on outcomes and value generation in your organisation. We work with your teams to leverage all your existing assets, and our Natural Language Understanding (NLU) can power state-of-the-art Explainable AI (XAI) solutions.

Explainable AI helps users understand and trust automated solutions, and is a key requirement for Quality Assurance and continuous improvement. XAI presents the basis and justification behind automated suggestions and provides ways for the user to improve the performance of autonomous systems.

Use of Mathematical Knowledge to Understand Semantic Similarity

UNSILO constructs a mathematical knowledge model of semantic similarity that describes every Concept identified in a client’s corpus. Such vector models are high-dimensional continuous space representations of words and phrases based on their distributional properties in a large text corpus. At the most basic level, these models can be used to identify terms that are commonly used interchangeably, and therefore likely to be synonyms. But vector knowledge models can also be used for more advanced features like query expansion and disambiguation of entities and abbreviations.
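As a rough illustration of how a vector model can surface interchangeable terms, the sketch below ranks phrases by cosine similarity to a query phrase. It is a minimal example under stated assumptions, not UNSILO's implementation: the three-dimensional vectors, the `TOY_VECTORS` table, the `expand_query` helper, and the 0.9 threshold are all invented for illustration, whereas real models use far more dimensions learned from a large corpus.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only (assumption).
TOY_VECTORS = {
    "project evaluation": [0.9, 0.1, 0.0],
    "project appraisal": [0.85, 0.15, 0.05],
    "feasibility study": [0.7, 0.3, 0.1],
    "circuit board": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def expand_query(term, vectors, threshold=0.9):
    """Return other phrases close to `term`, most similar first.

    The threshold is a hypothetical tuning parameter.
    """
    query_vec = vectors[term]
    scored = [
        (cosine_similarity(query_vec, vec), phrase)
        for phrase, vec in vectors.items()
        if phrase != term
    ]
    return [phrase for score, phrase in sorted(scored, reverse=True)
            if score >= threshold]


related = expand_query("project evaluation", TOY_VECTORS)
print(related)
```

With these toy vectors, the two project-related phrases pass the threshold and the unrelated phrase is filtered out.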

For example, when a user searches for “project evaluation”, we can also show documents that mention “project appraisal”, “feasibility study”, or “cost-benefit analysis”. And we can use vector math to accurately determine when the letters “PCB” in a specific sentence refer to a circuit board, a jet engine, or an organic compound.
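One common way to do this kind of disambiguation with vectors is to average the vectors of the surrounding words and pick the candidate sense closest to that average. The sketch below shows the idea only. The word vectors, the sense vectors, and the `disambiguate` helper are hypothetical and hand-made, and the source does not say that UNSILO uses this exact procedure.

```python
import math

# Hypothetical toy vectors; dimensions loosely mean
# (electronics, aviation, chemistry). Values are invented.
WORD_VECTORS = {
    "solder": [0.9, 0.0, 0.1],
    "copper": [0.7, 0.0, 0.3],
    "thrust": [0.0, 0.9, 0.0],
    "toxic": [0.1, 0.0, 0.9],
    "sediment": [0.0, 0.1, 0.8],
}

# One vector per candidate meaning of the abbreviation (assumption).
SENSE_VECTORS = {
    "circuit board": [1.0, 0.0, 0.0],
    "jet engine": [0.0, 1.0, 0.0],
    "organic compound": [0.0, 0.0, 1.0],
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def disambiguate(context_words):
    """Pick the sense closest to the average vector of the known context words.

    Returns None when no context word has a vector.
    """
    known = [WORD_VECTORS[w] for w in context_words if w in WORD_VECTORS]
    if not known:
        return None
    # Average the context vectors component by component.
    context = [sum(column) / len(known) for column in zip(*known)]
    return max(SENSE_VECTORS,
               key=lambda sense: cosine_similarity(context, SENSE_VECTORS[sense]))


print(disambiguate(["toxic", "sediment"]))
print(disambiguate(["solder", "copper"]))
```

In this toy setup, a sentence mentioning toxic sediment resolves “PCB” to the organic compound, while a sentence about solder and copper resolves it to the circuit board.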

Most other solutions analyse each individual word in isolation, but UNSILO analyses phrases and understands complex linguistic features like prepositions, negations, uncertainties, and attributions. This approach is considered more accurate than generalized Deep Learning solutions that use simple word representations, such as the freely available word vectors from Google, Facebook, or Stanford University.

Advanced Named Entity Recognition (NER)

During concept linking, the UNSILO pipeline identifies references to existing ontology and taxonomy terms and resolves any ambiguous references to the most likely matching entity using a mathematical knowledge model.

UNSILO also uses Deep Learning to identify concepts of specific types, including unknown phrases that resemble existing known entities and are used in a similar way. This is the same method that allows humans to deduce that Peroxytetrahydrofuran is most likely a chemical compound, just from reading the sentence “Peroxytetrahydrofuran was proposed to be the true oxidant”.

Using this approach, UNSILO’s AI-Supported Dynamic NER can identify novel concepts of any type, provided that we have access to many examples of other entities of the same type and sentences that exemplify how they are normally used. We have built high-accuracy Dynamic NER models that can detect previously unseen chemical entities, mathematical formulas, software code references, and names of people, places, and organizations.

Technology Vision

UNSILO is at the forefront of NLP and NLU research. On scientific content, we clearly outperform comparable enrichment services from Google, IBM, Microsoft, and Amazon, with more detailed concepts and much less noise. We are constantly investing in R&D, and regularly publish our contributions to the Open Source projects that we use and benefit from.

We are currently working on several new key technologies, including unsupervised extraction of “facts” (Semantic Triples/concept relationships) from scientific corpora. We have best-in-class entity extraction, but there are many ways to describe the same facts, and normalization of relationships between entities is still an open problem. The goal is to automatically construct an ontology or other knowledge model from a natural language description of a limited domain, and perform automatic reasoning to answer any question that a human would be able to answer after reading the same text.
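To make the normalization problem concrete, the sketch below represents facts as subject/predicate/object triples and collapses several surface forms of the same relationship into one canonical triple. It is only an illustration: the `Triple` type, the hand-written `CANONICAL_PREDICATES` table, and the `normalize` helper are hypothetical, and a lookup table like this is precisely what an unsupervised approach would aim to avoid.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A fact expressed as subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical hand-written table mapping surface forms to one canonical
# predicate. A real system would have to learn such equivalences.
CANONICAL_PREDICATES = {
    "inhibits": "inhibits",
    "blocks": "inhibits",
    "suppresses": "inhibits",
    "is inhibited by": "inhibits",
}

# Passive forms reverse the roles of subject and object.
PASSIVE_FORMS = {"is inhibited by"}


def normalize(triple):
    """Map a raw triple to its canonical form (unknown predicates pass through)."""
    predicate = triple.predicate.lower().strip()
    canonical = CANONICAL_PREDICATES.get(predicate, predicate)
    if predicate in PASSIVE_FORMS:
        return Triple(triple.obj, canonical, triple.subject)
    return Triple(triple.subject, canonical, triple.obj)


raw_triples = [
    Triple("aspirin", "inhibits", "cyclooxygenase"),
    Triple("aspirin", "blocks", "cyclooxygenase"),
    Triple("cyclooxygenase", "is inhibited by", "aspirin"),
]
unique_facts = {normalize(t) for t in raw_triples}
print(unique_facts)
```

All three phrasings reduce to a single fact here, which is the property a knowledge model needs before it can reason over extracted relationships.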

We are constantly improving and refining our leading technology for unsupervised concept extraction, increasing quality and precision, and pushing the limits of what automatic and unsupervised AI tools can do.

UNSILO outperforms Google, Amazon, IBM and Microsoft

UNSILO outperforms competing services from Google, Amazon, IBM and Microsoft, extracting more precise concepts with much less noise.

More information, including the underlying dataset and the detailed evaluation performed, can be found in our white paper “Comparing UNSILO concept extraction to leading NLP cloud solutions”.

F.A.Q.


A: UNSILO requires no external list of terms to extract concepts. This is a major advance compared to many earlier machine-learning tools, which necessitated the creation of a list of subject terms before the system could index documents. However, if you have a taxonomy or ontology in a domain, UNSILO can use the terms in that list and identify them where they appear.

A: One definition of “Understanding” is to identify the named entities contained within a document, such as names of people, places, and things. However, this approach is less precise than UNSILO’s multiple-word phrases. For example, “Paris”, the capital of France, is not the same thing as “Paris Hilton”, “The Paris Accord”, or “Paris, Texas”, but most entity extraction tools do not disambiguate these entities. By contrast, UNSILO disambiguates entities with high accuracy, and captures unambiguous phrases rather than individual words, to provide a much more accurate indicator of meaning.

A: Depending on client system architecture and requirements, we usually receive and process new documents every couple of hours. However, the actual processing time is considerably less, and we can scale any solution to provide a faster turnaround, or even real-time response rates where speed to publication is essential.

A: UNSILO provides tools for combining automatic and human processing. For example, when classifying documents by subject, UNSILO Classify uses machine learning to identify those documents that would benefit from manual curation. In this way we estimate we can reduce the time taken to manually create a topic collection by more than two-thirds. This means the content owner has both lower costs and improved quality.
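One simple way to combine automatic and human processing is to accept a classifier's decision when it is confident and send the rest to a curator. The sketch below illustrates that pattern only. The `route_documents` helper, the 0.8 threshold, and the example probabilities are assumptions for illustration, and the source does not describe how UNSILO Classify actually selects documents for manual curation.

```python
def route_documents(predictions, threshold=0.8):
    """Split documents into auto-classified and manual-review groups.

    `predictions` maps a document id to a {label: probability} dict.
    The threshold is a hypothetical confidence cut-off.
    """
    auto_classified = {}
    needs_review = []
    for doc_id, probabilities in predictions.items():
        # Take the most probable label and its probability.
        label, confidence = max(probabilities.items(), key=lambda item: item[1])
        if confidence >= threshold:
            auto_classified[doc_id] = label
        else:
            needs_review.append(doc_id)
    return auto_classified, needs_review


# Invented example scores for three documents.
example_predictions = {
    "doc-1": {"oncology": 0.95, "cardiology": 0.05},
    "doc-2": {"oncology": 0.55, "cardiology": 0.45},
    "doc-3": {"oncology": 0.10, "cardiology": 0.90},
}
auto_classified, needs_review = route_documents(example_predictions)
print(auto_classified)
print(needs_review)
```

Here only the ambiguous document is routed to a human, so curators spend their time where the automatic classification is least certain.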