Industrial LLM Benchmark
Structure of the Benchmark File
The benchmark file is written using YAML syntax.
You can find our example benchmark file here.
Here is an overview of the benchmark file. It is meant for documentation purposes only and would fail if you tried to run our benchmark on it.
models:
# This section contains a list of model definitions, which can be used later
# in the benchmark configuration file.
# A model definition tells the benchmark how to interact with an MLLM.
graders:
# This section contains a list of grader definitions, which can be used later
# in the benchmark configuration file.
# A grader is used to grade an MLLM's answers to our tasks.
system_prompts:
# Here you can add a system prompt for specific models from the models
# section. If you use the magic model name `default`, the prompt is applied
# to every model.
# You probably don't want to specify the system prompt here, but rather in
# the tasksets or tasks sections of the configuration file.
tasksets:
# This section contains a list of tasksets. All tasks within a
# task set [TO BE DEFINED]
This is the top-level structure of the benchmark configuration file.
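To make this more concrete, here is a small filled-in sketch. The field names inside the model, grader, and taskset definitions (name, endpoint, type, prompt, and so on) are illustrative assumptions, not the benchmark's actual schema; it also assumes that system_prompts maps model names to prompt strings:

models:
  # Hypothetical model definition; the real fields depend on the benchmark's schema.
  - name: my-model
    endpoint: http://localhost:8000/v1
graders:
  # Hypothetical grader definition.
  - name: exact-match
    type: string-comparison
system_prompts:
  # The magic name `default` applies to every model.
  default: "You are a helpful assistant."
  # Assumed per-model entry, keyed by the model name from the models section.
  my-model: "Answer concisely."
tasksets:
  # Hypothetical taskset referencing the model and grader above by name.
  - name: demo-taskset
    model: my-model
    grader: exact-match
    tasks:
      - prompt: "What is 2 + 2?"
        expected: "4"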
As model and grader definitions can often be reused across benchmark configuration files,
you can extract them into their own files instead of repeating those details in each
benchmark configuration file. To do so, take the models or graders section and move
it into its own YAML file. In our example the files are models.yml and graders.yml.
Instead of having this:
models:
# This section contains a list of model definitions
graders:
# This section contains a list of grader definitions
you could write this:
includes:
- ./models.yml
- ./graders.yml
If the file paths are relative, they are resolved relative to the location of the configuration file.
You can also list several model or grader files, as long as the names of the models and graders are distinct across all of them.
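For example, the extracted files could look like this, again using the illustrative field names from the sketch above rather than the benchmark's actual schema. models.yml would contain only the models section:

models:
  - name: my-model
    endpoint: http://localhost:8000/v1

graders.yml would contain only the graders section:

graders:
  - name: exact-match
    type: string-comparison

and the benchmark configuration file itself would keep the rest:

includes:
  - ./models.yml
  - ./graders.yml
tasksets:
  # tasksets can now reference my-model and exact-match by name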