Super - Linguistic and LLM Evaluator and Author
Overview
This project consists of two approaches, which we call “workflows”:
Workflow #1: Manual SxS (Side-by-Side) Human Evaluation
In this task, you will see a user prompt and two AI-generated responses, one from each of two different AI models. You will assess each response on several dimensions:
Safety/Harmlessness
Writing Style
Verbosity
Instruction Following
Truthfulness
Overall Quality
At the end, you will select which response you think is better and explain why. Finally, you will be required to rewrite your chosen response to improve it.
Workflow #2: Quality Evaluation
In this task, you will be given an original prompt and two translations of it, each produced by a different LLM. You will read all three prompts (the original and both translations) and then rate each translation on four aspects:
Verbatim Accuracy
Formatting Preservation
Semantic Equivalence
Extraneous Information
The work includes two types of tasks:
After rating, compare both translations and add a brief comment justifying your ratings.
After rating, compare both translations and rewrite the prompt in the target language.
Purpose
Workflow #1: To compare the quality of two AI assistant responses.
Workflow #2: To assess AI-generated translations by reviewing and rating their quality against specific criteria.
Click on your language and get started!
About OneForma
OneForma brings together data, intelligence, and experiences to deliver human-centric solutions to complex business challenges.
OneForma is an equal opportunity employer and will not discriminate against any of our applicants on the grounds of race, gender, religion, or cultural background.