
Add Moondream VLM Architecture #2549

@BharathC0

Description


Vision Encoder: SigLIP (Sigmoid Loss for Language Image Pre-Training).

Text Decoder: Phi-1.5 (Microsoft's small 1.3B-parameter language model).

Connector: a simple projection layer that maps vision-encoder features into the decoder's embedding space.
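To make the data flow between the three components concrete, here is a minimal pure-Python sketch of the connector stage. It assumes a single linear projection (weights `W`, bias `b`) from the SigLIP patch-feature width to the Phi-1.5 hidden size. It also assumes the projected image tokens are prepended to the text embeddings. `MoondreamSketchConfig`, `ProjectionConnector`, and `build_decoder_inputs` are hypothetical names, and the toy dimensions are placeholders, not the real checkpoint sizes. The actual Moondream connector and token placement may differ.

```python
from dataclasses import dataclass
import random


@dataclass
class MoondreamSketchConfig:
    # Hypothetical toy sizes; NOT the real Moondream checkpoint dimensions.
    vision_dim: int = 4   # width of each SigLIP patch embedding
    text_dim: int = 3     # hidden size of the Phi-1.5 decoder
    num_patches: int = 2  # number of image patch tokens from the encoder


def linear(x, weight, bias):
    """Compute y = x @ W + b for one vector x (W has shape vision_dim x text_dim)."""
    return [
        sum(xi * weight[i][j] for i, xi in enumerate(x)) + bias[j]
        for j in range(len(bias))
    ]


class ProjectionConnector:
    """Hypothetical connector: one linear layer from vision space to text space."""

    def __init__(self, cfg, seed=0):
        rng = random.Random(seed)
        # Small random init; a real model would load trained weights.
        self.weight = [
            [rng.uniform(-0.1, 0.1) for _ in range(cfg.text_dim)]
            for _ in range(cfg.vision_dim)
        ]
        self.bias = [0.0] * cfg.text_dim

    def __call__(self, patch_features):
        # Project every patch feature independently into the decoder's space.
        return [linear(p, self.weight, self.bias) for p in patch_features]


def build_decoder_inputs(image_tokens, text_embeddings):
    # Assumption: image tokens are placed before the text token embeddings.
    return image_tokens + text_embeddings


cfg = MoondreamSketchConfig()
patches = [[1.0] * cfg.vision_dim for _ in range(cfg.num_patches)]  # fake encoder output
image_tokens = ProjectionConnector(cfg)(patches)
text_embeddings = [[0.0] * cfg.text_dim for _ in range(5)]          # fake text embeddings
sequence = build_decoder_inputs(image_tokens, text_embeddings)
print(len(sequence), len(sequence[0]))
```

The point of the sketch is the shape contract. After projection, every image patch becomes a vector of the decoder's hidden size. The decoder can then treat image tokens and text tokens as one sequence.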

Official Moondream model card on Hugging Face


Labels: stat:contributions welcome, type:feature
