
Multimodality

For MTLLM to have genuine neurosymbolic capabilities, it needs to handle multimodal inputs and outputs, understanding not only text but also images and videos. This section describes how MTLLM handles multimodal inputs.

Image

MTLLM accepts images as inputs. To pass an image to an MTLLM function or method, wrap it in the Image type provided by mtllm, as in the following example:

```jac
import:py from mtllm.llms, OpenAI;
import:py from mtllm, Image;

glob llm = OpenAI(model_name="gpt-4o");

# Semantic strings ('...') describe each enum value to the LLM.
enum Personality {
    INTROVERT: 'Person who is shy and reticent' = "Introvert",
    EXTROVERT: 'Person who is outgoing and socially confident' = "Extrovert"
}

obj 'Person'
Person {
    has full_name: str,
        yod: 'Year of Death': int,
        personality: 'Personality of the Person': Personality;
}

# 'by llm()' delegates the body of this function to the LLM,
# which fills in a Person object from the image.
can get_person_info(img: 'Image of Person': Image) -> Person
by llm();

with entry {
    person_obj = get_person_info(Image("person.png"));
    print(person_obj);
}
```

Input image: person.png

```
# Output
Person(full_name='Albert Einstein', yod=1955, personality=Personality.INTROVERT)
```

In the example above, an image of a person (Albert Einstein) is passed to the get_person_info function. The function returns a Person object containing the full name, year of death, and personality of the person in the image.
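How mtllm's Image type packages the file for the model is not covered in this section. As a rough illustration only, the Python sketch below shows one common way an image file can be prepared for a vision-capable chat API: base64-encoding it into a data URL inside an OpenAI-style `image_url` content part. The helper name `image_to_content_part` is hypothetical and is not part of mtllm; treat it as an assumption, not as mtllm's implementation.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_content_part(path: str) -> dict:
    """Hypothetical helper (NOT part of mtllm): wrap an image file as an
    OpenAI-style chat content part using a base64 data URL."""
    # Guess the MIME type from the file extension; fall back to PNG
    # when the extension is unknown or not an image type.
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        mime = "image/png"
    # Read the raw bytes and encode them as ASCII base64 text.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}"},
    }
```

For an existing file, `image_to_content_part("person.png")` returns a dict whose URL starts with `data:image/png;base64,`. Inlining the bytes this way avoids hosting the image at a public URL, at the cost of a larger request.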

Video

Similarly, MTLLM accepts videos as inputs. To pass a video to an MTLLM function or method, wrap it in the Video type provided by mtllm, as in the following example:

```jac
import:py from mtllm.llms, OpenAI;
import:py from mtllm, Video;

glob llm = OpenAI(model_name="gpt-4o");

# 'method' selects the prompting strategy; 'context' passes extra information to the LLM.
can is_aligned(video: Video, text: str) -> bool
by llm(method="Chain-of-Thoughts", context="Mugen is the moving character");

with entry {
    video = Video("mugen.mp4", 1);
    text = "Mugen jumps off and collects few coins.";
    print(is_aligned(video, text));
}
```

Input video: mugen.mp4

```
# Output
True
```

In the example above, a video featuring the character Mugen is passed to the is_aligned function together with a text description. The function checks whether the text is aligned with the video, that is, whether it accurately describes what happens in it, and returns a boolean value. The context argument tells the LLM that Mugen is the moving character.
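This section does not explain the second argument in `Video("mugen.mp4", 1)`. Assuming, for illustration only, that it sets a sampling interval in seconds (keep one frame every N seconds), the Python sketch below computes which frame indices would be sent to the model. The function `sample_frame_indices` and its parameters are hypothetical and are not mtllm's implementation. Sending a subsample instead of every frame keeps the request within a vision model's input limits.

```python
def sample_frame_indices(total_frames: int, fps: float, seconds_per_frame: float) -> list[int]:
    """Hypothetical sketch (NOT mtllm's API): indices of the frames kept when
    sampling one frame every `seconds_per_frame` seconds of video."""
    if total_frames <= 0 or fps <= 0 or seconds_per_frame <= 0:
        raise ValueError("total_frames, fps and seconds_per_frame must be positive")
    # Number of source frames between two sampled frames; never less than 1,
    # so very short intervals fall back to keeping every frame.
    step = max(1, int(fps * seconds_per_frame))
    return list(range(0, total_frames, step))
```

For example, a 10-second clip at 30 fps has 300 frames; sampling every 1 second gives a step of 30 and keeps the 10 frames at indices 0, 30, ..., 270.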

Audio

We are working on adding support for audio inputs to MTLLM. Stay tuned for updates on this feature.
