[RAG/SDG preprocessing][Dev] Stronger typing for seed_instruction_data #3096

@jwm4

Description


SDG's instructlab.sdg.taxonomy has a method called _read_taxonomy_file which returns a value called seed_instruction_data, a list of dictionaries of this form:

    {
        "questions_and_answers": question_answer_list,
        "context": context,
        "taxonomy_path": tax_path,
        "documents": document_contents,
        "filepaths": doc_filepaths,
        "domain": domain,
        "document_outline": contents.get("document_outline"),
    }

Anything that consumes this value then needs to know the string key for each field. The effect is essentially the same as a Python class, but without the type checking you get from a real class with named fields. It would be better to represent this as a class, perhaps using typing.NamedTuple, which is a convenient way to define a class with a simple list of fields.
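
As a rough illustration of the NamedTuple idea, here is a minimal sketch. The class name SeedInstructionData, the field types, and the shape of the question/answer entries are all assumptions made for the example; only the field names come from the dictionary above.

```python
from typing import NamedTuple, Optional


class SeedInstructionData(NamedTuple):
    """Hypothetical typed replacement for one seed_instruction_data entry.

    Field names mirror the dictionary keys; the types are guesses.
    """

    questions_and_answers: list[dict[str, str]]
    context: str
    taxonomy_path: str
    documents: Optional[list[str]]
    filepaths: Optional[list[str]]
    domain: Optional[str]
    document_outline: Optional[str]


# Illustrative values only, not taken from a real taxonomy file.
example = SeedInstructionData(
    questions_and_answers=[{"question": "What is SDG?", "answer": "Synthetic data generation."}],
    context="",
    taxonomy_path="knowledge->example",
    documents=None,
    filepaths=None,
    domain=None,
    document_outline=None,
)

# Consumers use attribute access, which a type checker can verify.
first_question = example.questions_and_answers[0]["question"]
```

With this shape, a misspelled field such as example.taxonomy_pth is flagged by a static type checker and raises AttributeError at runtime, whereas a misspelled dictionary key only fails when that line executes.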

Alternatively, we could at least replace the string labels with constants, but that seems like a less robust solution.
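
For comparison, a sketch of the constants approach might look like the following. The constant names and the describe helper are hypothetical and exist only for illustration.

```python
# Hypothetical module-level constants for the dictionary keys.
QUESTIONS_AND_ANSWERS = "questions_and_answers"
CONTEXT = "context"
TAXONOMY_PATH = "taxonomy_path"
DOCUMENTS = "documents"
FILEPATHS = "filepaths"
DOMAIN = "domain"
DOCUMENT_OUTLINE = "document_outline"


def describe(entry: dict) -> str:
    """Summarize one entry using the constants instead of string literals."""
    return f"{entry[TAXONOMY_PATH]}: {len(entry[QUESTIONS_AND_ANSWERS])} Q&A pairs"


# Illustrative entry, not taken from a real taxonomy file.
summary = describe({TAXONOMY_PATH: "a->b", QUESTIONS_AND_ANSWERS: [{}, {}]})
```

A misspelled constant name now fails as a NameError, but the values are still untyped and nothing guarantees that every entry contains every key, which is why this reads as the less robust option.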

Note that this method is called by instructlab/rag/taxonomy_utils.py in core, so changing this code will also require corresponding changes in core. It would therefore be easier to do this after the SDG preprocessing also moves to core, so that the change is contained in one repo.

Acceptance Criteria

  • The _read_taxonomy_file method returns a list of structured objects with named fields instead of a list of dictionaries with hard-coded string keys.
  • All consumers of this method are updated to use these objects.

Metadata

Assignees

No one assigned

    Labels

    RAG (RAG specific issues), SDG (SDG specific issues), stale, tech-debt (Issue or PR pertaining to technical debt)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
