An open set of NLP tasks in the healthcare domain

The successful development and application of deep learning methods in new fields of knowledge is impossible without specialized benchmarks and data.

The shortage of such resources is particularly acute in highly regulated subject areas; a prime example is automatic natural language processing (NLP) in medicine. The problem is especially relevant for Russian, where open medical datasets and machine learning task definitions are extremely scarce.

We present an open Russian-language NLP benchmark for testing language models on a wide range of medical tasks.


RuMedDaNet

Task goal

Measure the model's ability to "understand" a medical text and correctly answer clarifying questions about it.

Task description

A true medical AI model must have comprehensive knowledge and an "understanding" of various health-related areas. Such abilities can be partly tested by assessing the model's answers to context-sensitive questions. Each task example consists of a context passage and an associated binary question; the model must answer the question correctly, either "yes" or "no". Contexts are collected from a wide range of medicine-related fields: internal medicine, human physiology and anatomy, pharmacology, biochemistry, etc. The questions are written and verified by human assessors.
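The context/question/answer structure and accuracy scoring can be sketched as follows. This is a minimal illustration, not the official data format: the field names ("context", "question", "answer") and the toy examples are hypothetical and may differ from the released benchmark files.

```python
# Hypothetical RuMedDaNet-style examples: a context, a binary question,
# and a yes/no gold answer. Field names are assumptions for illustration.
samples = [
    {"context": "Парацетамол обладает жаропонижающим действием.",
     "question": "Снижает ли парацетамол температуру тела?",
     "answer": "yes"},
    {"context": "Инсулин секретируется бета-клетками поджелудочной железы.",
     "question": "Вырабатывается ли инсулин в печени?",
     "answer": "no"},
]

def accuracy(gold, pred):
    """Fraction of predictions that exactly match the gold answers."""
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# A trivial constant baseline: always answer "yes".
gold = [s["answer"] for s in samples]
preds = ["yes" for _ in samples]
print(accuracy(gold, preds))  # 0.5 on this toy set
```

A constant baseline like this scores near 50% on a balanced binary task, which is the floor any real model should beat.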

Metrics

Accuracy


RuMedNLI

Task goal

Determine the type of logical relationship between two natural language texts.

Task description

Natural Language Inference (NLI) tests the model's ability to make decisions based on medical records while accounting for linguistic nuances: paraphrases, idioms, abbreviations, etc.

Each example is defined by a pair of input texts: the first contains the initial statement (a fragment from a patient's medical history), and the second is the hypothesis to be tested. There are three possible outcomes:

  • the hypothesis is true, i.e. it follows logically from the initial statement;

  • the hypothesis is neutral: no unambiguous conclusion can be drawn from the given data;

  • the hypothesis clearly contradicts the initial statement.
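The three outcomes above correspond to the standard NLI label set. The sketch below shows an assumed example shape and a simple accuracy computation over the three-way labels; the field names ("ru_sentence1", "ru_sentence2", "gold_label") and the label strings follow common NLI conventions and may be encoded differently in the actual data.

```python
# Conventional three-way NLI label set (an assumption about the encoding).
LABELS = ("entailment", "neutral", "contradiction")

# Hypothetical RuMedNLI-style pair: premise, hypothesis, gold label.
pairs = [
    {"ru_sentence1": "Пациент жалуется на боль в груди при нагрузке.",
     "ru_sentence2": "У пациента есть болевой синдром.",
     "gold_label": "entailment"},
]

def evaluate(examples, predictions):
    """Accuracy of predicted labels against gold three-way labels."""
    correct = sum(ex["gold_label"] == p
                  for ex, p in zip(examples, predictions)
                  if p in LABELS)
    return correct / len(examples)

print(evaluate(pairs, ["entailment"]))  # 1.0
```

Random guessing over three balanced classes yields about 33% accuracy, the natural baseline for this task.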

Metrics

Accuracy


RuMedTest

Task goal

Test the model's "knowledge" within the specialty "General Medical Practice".

Task description

A medical AI model must be able to apply clinical reasoning and have in-depth knowledge of the core disciplines taught at leading medical universities.

Unlike the previous tasks, this one contains only a test part, with no training or validation sets. In this formulation, the task can be used to assess a large language model's ability to solve it with few or no training examples (zero- or few-shot learning).

Each item consists of a question and four possible answers, only one of which is correct.
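Zero-shot evaluation on such items reduces to scoring each of the four options and picking the best one. The sketch below is a stand-in, not the benchmark's method: the `score` function is a toy word-overlap heuristic that a real system would replace with a language model's plausibility score, and the example question is invented for illustration.

```python
# Toy scorer: counts words shared between question and option. A real
# zero-shot system would instead query a language model for a score.
def score(question, option):
    return len(set(question.lower().split()) & set(option.lower().split()))

def predict(question, options):
    """Return the 1-based index of the highest-scoring option."""
    best = max(range(len(options)), key=lambda i: score(question, options[i]))
    return best + 1

# Hypothetical test item in the question-plus-four-options format.
q = "Какой гормон снижает уровень глюкозы в крови?"
opts = ["Глюкагон повышает уровень глюкозы",
        "Инсулин снижает уровень глюкозы в крови",
        "Адреналин",
        "Кортизол"]
print(predict(q, opts))  # 2
```

With four options and one correct answer, random guessing gives 25% accuracy, the baseline against which zero- and few-shot models are compared.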

Metrics

Accuracy


Leaderboard

Rank | Team | Model | RuMedDaNet | RuMedNLI | RuMedTest | Upload date

No results yet.

Supported by:

Sber