An open set of tasks in the healthcare domain

The successful development and application of deep learning methods in new fields of knowledge is impossible without specialized benchmarks and data.

The shortage of such resources is particularly pervasive in highly regulated subject areas. A prime example is automatic natural language processing (NLP) in medicine. The problem is especially acute for the Russian language, where open medical datasets and machine learning task definitions are extremely scarce.

We offer an open benchmark for testing ML models on a wide range of medical tasks.


RuMedDaNet

Task goal

Measure the model's ability to "understand" medical text and correctly answer clarifying questions.

Task description

A true medical AI model must have comprehensive knowledge and an "understanding" of the various areas related to health. Such abilities can be partly tested by assessing the model's answers to context-dependent questions. Each task example consists of a context and an associated binary question; the model's goal is to answer the question correctly, either yes or no. Contexts are collected from a wide range of medicine-related fields: internal medicine, human physiology and anatomy, pharmacology, biochemistry, etc. The questions are written and labeled by human assessors.
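
Below is a minimal sketch of what a task example might look like; the field names and the English text are illustrative assumptions (the actual data is in Russian), not the official schema.

    # A hypothetical RuMedDaNet-style record; field names are assumptions.
    example = {
        "context": "Paracetamol is an analgesic and antipyretic agent ...",
        "question": "Does paracetamol reduce fever?",
        "answer": "yes",  # binary label: "yes" or "no"
    }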

Metrics

Accuracy
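
Accuracy here is simply the share of correctly answered questions. A minimal sketch of the computation with scikit-learn, using dummy labels for illustration:

    from sklearn.metrics import accuracy_score

    # Dummy gold and predicted "yes"/"no" labels for illustration.
    y_true = ["yes", "no", "yes", "yes"]
    y_pred = ["yes", "no", "no", "yes"]
    print(accuracy_score(y_true, y_pred))  # -> 0.75

The same computation applies to the other accuracy-scored tasks below.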


RuMedNLI

Task goal

Determine the type of logical relationship between two natural-language texts.

Task description

Natural language inference (NLI) tests the model's ability to make decisions based on medical records while accounting for linguistic nuances: paraphrases, idioms, abbreviations, etc.

Each example is defined by a pair of input texts: the first contains the initial statement (a fragment of a patient's medical history), and the second is the hypothesis to be tested. There are three possible outcomes (illustrated in the sketch after this list):

  • the hypothesis is true, i.e. it follows logically from the initial statement;

  • the hypothesis is neutral, i.e. no unambiguous conclusion can be drawn from the given data;

  • the hypothesis clearly contradicts the initial statement.
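
A minimal sketch of an example record under these three labels; the field names, label strings, and English text are illustrative assumptions, not the official schema:

    # Hypothetical RuMedNLI-style record; names and labels are assumptions.
    LABELS = ("entailment", "neutral", "contradiction")
    example = {
        "premise": "The patient was admitted with chest pain radiating to the left arm.",
        "hypothesis": "The patient complained of chest pain.",
        "label": "entailment",
    }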

Metrics

Accuracy


RuMedTest

Task goal

Check the model's "knowledge" within the specialty "General Medical Practice".

Task description

A medical AI model must be able to apply clinical reasoning and have in-depth knowledge of the core disciplines taught at leading medical universities.

Unlike the previous tasks, this one contains only a test set, with no training or validation parts. In this formulation, the task can be used to probe the ability of a large language model to solve it with a minimal number of training examples (zero- or few-shot learning).

Each item consists of a question and 4 answer options, of which only one is correct.
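
A minimal zero-shot sketch: score each answer option with a language model and choose the best one. The scoring function `lm_score` is a hypothetical stand-in for whatever LM scoring is used (e.g. the option's log-likelihood under the model):

    # Pick the option the (hypothetical) language model scores highest.
    def pick_answer(question: str, options: list[str], lm_score) -> int:
        scores = [lm_score(f"{question} {opt}") for opt in options]
        return scores.index(max(scores))  # index of the chosen option

    # Toy usage with a dummy scorer that prefers longer options.
    print(pick_answer("Question text?", ["a", "bb", "ccc", "d"], lm_score=len))  # -> 2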

Metrics

Accuracy


ECG2Pathology

Task goal

Evaluate the quality of multi-label ECG signal classification.

Task description

For over 100 years, electrocardiography has been a reliable tool for diagnosing serious heart disease. The method records the electrical impulses of the heart, usually through 12 standard leads (channels), and the resulting signal is displayed as an electrocardiogram (ECG). Analyzing such signals and detecting cardiac pathologies in them is painstaking work that demands highly qualified cardiologists. Machine learning methods for ECG analysis therefore hold great potential for faster and more accurate detection of possible heart disease.

The data for the task are ECG signals from the open PTB-XL dataset. The signals were labeled according to a thesaurus of diagnostic conclusions by three cardiologists and verified by a moderator. For each test sample, the model has to predict a subset of the 73 possible thesaurus elements (cardiac pathologies or service classes).
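
A minimal sketch of the multi-label setup; the array shapes and the 0.5 threshold are illustrative assumptions (the 12 leads and 73 classes come from the task description):

    import numpy as np

    n_leads, n_classes = 12, 73
    signal = np.zeros((n_leads, 5000), dtype=np.float32)  # e.g. 10 s at 500 Hz

    probs = np.random.rand(n_classes)        # stand-in for model outputs
    predicted = (probs >= 0.5).astype(int)   # threshold to get the label subset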

Metrics

F1-score (macro)
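
A minimal sketch of the macro-averaged F1 computation with scikit-learn; y_true and y_pred are binary indicator matrices of shape (n_samples, 73), filled with dummy values here for illustration:

    import numpy as np
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=(100, 73))
    y_pred = rng.integers(0, 2, size=(100, 73))
    print(f1_score(y_true, y_pred, average="macro", zero_division=0))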


Leaderboard

Rank | Team | Model | RuMedDaNet | RuMedNLI | RuMedTest | ECG2Pathology | Upload date

No results yet.

Supported by:

Sber