forked from ggml-org/llama.cpp
[draft] Submit engine changes to Llama.cpp/ggml public repos. Async Memory. #44
Closed
jesusmb1995 wants to merge 23 commits into tetherto:temp-b6817 from jesusmb1995:jmb/memory_load_pr_to_upstream21
Convert llama_file to a pure virtual class that can be overridden by multiple implementations (disk, single memory buffer, ...)
5ad997a to 98d10a3
…e used on the tests.
This change adds an additional automated test that loads from disk, to ensure the existing functionality does not break.
- Ensures a char traits implementation for uint8 exists that can be used with std::basic_streambuf.
- Adds an implementation of std::basic_streambuf for a single vector. It will be used by llama.cpp and the tests when loading from a single memory buffer.
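A minimal sketch of the idea behind a stream buffer over a single vector, assuming a read-only use case. The class name `vector_streambuf` and the helper `read_tag` are hypothetical and are not taken from this PR. To stay within what the standard library guarantees, the sketch uses `char` and the standard `std::char_traits<char>`; the commit itself works with uint8 and provides the traits for it.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical name. Read-only stream buffer whose get area is the vector's storage.
// The vector must outlive the buffer and must not be resized while it is in use.
class vector_streambuf : public std::streambuf {
public:
    explicit vector_streambuf(std::vector<char> & data) {
        char * begin = data.data();
        setg(begin, begin, begin + data.size());
    }

protected:
    // Relative and absolute seeking inside the get area; returns -1 on an invalid target.
    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode which = std::ios_base::in) override {
        if ((which & std::ios_base::in) == 0) {
            return pos_type(off_type(-1));
        }
        off_type base = 0;
        if (dir == std::ios_base::cur) {
            base = gptr() - eback();
        } else if (dir == std::ios_base::end) {
            base = egptr() - eback();
        }
        const off_type target = base + off;
        if (target < 0 || target > egptr() - eback()) {
            return pos_type(off_type(-1));
        }
        setg(eback(), eback() + target, egptr());
        return pos_type(target);
    }

    pos_type seekpos(pos_type pos,
                     std::ios_base::openmode which = std::ios_base::in) override {
        return seekoff(off_type(pos), std::ios_base::beg, which);
    }
};

// Hypothetical helper: reads a 4-byte tag at `offset` through a std::istream.
// Returns an empty string when fewer than 4 bytes are available.
std::string read_tag(std::vector<char> & data, std::streamoff offset) {
    vector_streambuf buf(data);
    std::istream in(&buf);
    in.seekg(offset, std::ios_base::beg);
    std::string tag(4, '\0');
    in.read(&tag[0], 4);
    return in ? tag : std::string();
}
```

Because the buffer only points into the vector, no bytes are copied; the trade-off is that the caller has to keep the vector alive and unchanged for as long as the stream is used.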
Override the pure virtual interface with a class that can operate on a single memory buffer.
Auxiliary function to convert a list of C strings to a vector of C++ strings.
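A helper of this kind might look like the sketch below. The name `to_string_vector` and its signature are assumptions for illustration; the PR's actual function may differ, for example in how it handles null entries.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copies `count` NUL-terminated C strings into owned std::string objects.
std::vector<std::string> to_string_vector(const char * const * items, std::size_t count) {
    std::vector<std::string> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        // A null entry becomes an empty string instead of undefined behaviour.
        out.emplace_back(items[i] != nullptr ? items[i] : "");
    }
    return out;
}
```

Copying into `std::string` means the result no longer depends on the lifetime of the caller's C array, which matters when the strings cross a C API boundary.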
Add new GGUF reader implementation that can read metadata from a memory buffer.
diff --git a/examples/llama.android/app/build.gradle.kts b/examples/llama.android/app/build.gradle.kts
deleted file mode 100644
index 8d1b371..000000000
--- a/examples/llama.android/app/build.gradle.kts
+++ /dev/null
@@ -1,65 +0,0 @@
-plugins {
-    id("com.android.application")
-    id("org.jetbrains.kotlin.android")
-}
-
-android {
-    namespace = "com.example.llama"
-    compileSdk = 34
-
-    defaultConfig {
-        applicationId = "com.example.llama"
-        minSdk = 33
-        targetSdk = 34
-        versionCode = 1
-        versionName = "1.0"
-
-        testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner"
-        vectorDrawables {
-            useSupportLibrary = true
-        }
-    }
-
-    buildTypes {
-        release {
-            isMinifyEnabled = false
-            proguardFiles(
-                getDefaultProguardFile("proguard-android-optimize.txt"),
-                "proguard-rules.pro"
-            )
-        }
-    }
-    compileOptions {
-        sourceCompatibility = JavaVersion.VERSION_1_8
-        targetCompatibility = JavaVersion.VERSION_1_8
-    }
-    kotlinOptions {
-        jvmTarget = "1.8"
-    }
-    buildFeatures {
-        compose = true
-    }
-    composeOptions {
-        kotlinCompilerExtensionVersion = "1.5.1"
-    }
-}
-
-dependencies {
-
-    implementation("androidx.core:core-ktx:1.12.0")
-    implementation("androidx.lifecycle:lifecycle-runtime-ktx:2.6.2")
-    implementation("androidx.activity:activity-compose:1.8.2")
-    implementation(platform("androidx.compose:compose-bom:2023.08.00"))
-    implementation("androidx.compose.ui:ui")
-    implementation("androidx.compose.ui:ui-graphics")
-    implementation("androidx.compose.ui:ui-tooling-preview")
-    implementation("androidx.compose.material3:material3")
-    implementation(project(":llama"))
-    testImplementation("junit:junit:4.13.2")
-    androidTestImplementation("androidx.test.ext:junit:1.1.5")
-    androidTestImplementation("androidx.test.espresso:espresso-core:3.5.1")
-    androidTestImplementation(platform("androidx.compose:compose-bom:2023.08.00"))
-    androidTestImplementation("androidx.compose.ui:ui-test-junit4")
-    debugImplementation("androidx.compose.ui:ui-tooling")
-    debugImplementation("androidx.compose.ui:ui-test-manifest")
-}
diff --git a/examples/llama.android/app/proguard-rules.pro b/examples/llama.android/app/proguard-rules.pro
deleted file mode 100644
index f1b4245..000000000
--- a/examples/llama.android/app/proguard-rules.pro
+++ /dev/null
@@ -1,21 +0,0 @@
-# Add project specific ProGuard rules here.
-# You can control the set of applied configuration files using the
-# proguardFiles setting in build.gradle.
-#
-# For more details, see
-# http://developer.android.com/guide/developing/tools/proguard.html
-
-# If your project uses WebView with JS, uncomment the following
-# and specify the fully qualified class name to the JavaScript interface
-# class:
-#-keepclassmembers class fqcn.of.javascript.interface.for.webview {
-# public *;
-#}
-
-# Uncomment this to preserve the line number information for
-# debugging stack traces.
-#-keepattributes SourceFile,LineNumberTable
-
-# If you keep the line number information, uncomment this to
-# hide the original source file name.
-#-renamesourcefileattribute SourceFile diff --git a/examples/llama.android/app/src/main/AndroidManifest.xml b/examples/llama.android/app/src/main/AndroidManifest.xml deleted file mode 100644 index 41a358a..000000000 --- a/examples/llama.android/app/src/main/AndroidManifest.xml +++ /dev/null @@ -1,30 +0,0 @@ -<?xml version="1.0" encoding="utf-8"?> -<manifest xmlns:android="http://schemas.android.com/apk/res/android" - xmlns:tools="http://schemas.android.com/tools"> - - <uses-permission android:name="android.permission.INTERNET" /> - - <application - android:allowBackup="true" - android:dataExtractionRules="@xml/data_extraction_rules" - android:fullBackupContent="@xml/backup_rules" - android:icon="@mipmap/ic_launcher" - android:label="@string/app_name" - android:roundIcon="@mipmap/ic_launcher_round" - android:supportsRtl="true" - android:theme="@style/Theme.LlamaAndroid" - > - - <activity - android:name=".MainActivity" - android:exported="true" - android:theme="@style/Theme.LlamaAndroid"> - <intent-filter> - <action android:name="android.intent.action.MAIN" /> - - <category android:name="android.intent.category.LAUNCHER" /> - </intent-filter> - </activity> - </application> - -</manifest> diff --git a/examples/llama.android/app/src/main/java/com/example/llama/Downloadable.kt b/examples/llama.android/app/src/main/java/com/example/llama/Downloadable.kt deleted file mode 100644 index 78c231a..000000000 --- a/examples/llama.android/app/src/main/java/com/example/llama/Downloadable.kt +++ /dev/null @@ -1,119 +0,0 @@ -package com.example.llama - -import android.app.DownloadManager -import android.net.Uri -import android.util.Log -import androidx.compose.material3.Button -import androidx.compose.material3.Text -import androidx.compose.runtime.Composable -import androidx.compose.runtime.getValue -import androidx.compose.runtime.mutableDoubleStateOf -import androidx.compose.runtime.mutableStateOf -import androidx.compose.runtime.remember -import 
androidx.compose.runtime.rememberCoroutineScope -import androidx.compose.runtime.setValue -import androidx.core.database.getLongOrNull -import androidx.core.net.toUri -import kotlinx.coroutines.delay -import kotlinx.coroutines.launch -import java.io.File - -data class Downloadable(val name: String, val source: Uri, val destination: File) { - companion object { - @JvmStatic - private val tag: String? = this::class.qualifiedName - - sealed interface State - data object Ready: State - data class Downloading(val id: Long): State - data class Downloaded(val downloadable: Downloadable): State - data class Error(val message: String): State - - @JvmStatic - @composable - fun Button(viewModel: MainViewModel, dm: DownloadManager, item: Downloadable) { - var status: State by remember { - mutableStateOf( - if (item.destination.exists()) Downloaded(item) - else Ready - ) - } - var progress by remember { mutableDoubleStateOf(0.0) } - - val coroutineScope = rememberCoroutineScope() - - suspend fun waitForDownload(result: Downloading, item: Downloadable): State { - while (true) { - val cursor = dm.query(DownloadManager.Query().setFilterById(result.id)) - - if (cursor == null) { - Log.e(tag, "dm.query() returned null") - return Error("dm.query() returned null") - } - - if (!cursor.moveToFirst() || cursor.count < 1) { - cursor.close() - Log.i(tag, "cursor.moveToFirst() returned false or cursor.count < 1, download canceled?") - return Ready - } - - val pix = cursor.getColumnIndex(DownloadManager.COLUMN_BYTES_DOWNLOADED_SO_FAR) - val tix = cursor.getColumnIndex(DownloadManager.COLUMN_TOTAL_SIZE_BYTES) - val sofar = cursor.getLongOrNull(pix) ?: 0 - val total = cursor.getLongOrNull(tix) ?: 1 - cursor.close() - - if (sofar == total) { - return Downloaded(item) - } - - progress = (sofar * 1.0) / total - - delay(1000L) - } - } - - fun onClick() { - when (val s = status) { - is Downloaded -> { - viewModel.load(item.destination.path) - } - - is Downloading -> { - coroutineScope.launch { - 
status = waitForDownload(s, item) - } - } - - else -> { - item.destination.delete() - - val request = DownloadManager.Request(item.source).apply { - setTitle("Downloading model") - setDescription("Downloading model: ${item.name}") - setAllowedNetworkTypes(DownloadManager.Request.NETWORK_WIFI) - setDestinationUri(item.destination.toUri()) - } - - viewModel.log("Saving ${item.name} to ${item.destination.path}") - Log.i(tag, "Saving ${item.name} to ${item.destination.path}") - - val id = dm.enqueue(request) - status = Downloading(id) - onClick() - } - } - } - - Button(onClick = { onClick() }, enabled = status !is Downloading) { - when (status) { - is Downloading -> Text(text = "Downloading ${(progress * 100).toInt()}%") - is Downloaded -> Text("Load ${item.name}") - is Ready -> Text("Download ${item.name}") - is Error -> Text("Download ${item.name}") - } - } - } - - } -} diff --git a/examples/llama.android/app/src/main/java/com/example/llama/MainActivity.kt b/examples/llama.android/app/src/main/java/com/example/llama/MainActivity.kt deleted file mode 100644 index 9da04f7..000000000 --- a/examples/llama.android/app/src/main/java/com/example/llama/MainActivity.kt +++ /dev/null @@ -1,154 +0,0 @@ -package com.example.llama - -import android.app.ActivityManager -import android.app.DownloadManager -import android.content.ClipData -import android.content.ClipboardManager -import android.net.Uri -import android.os.Bundle -import android.os.StrictMode -import android.os.StrictMode.VmPolicy -import android.text.format.Formatter -import androidx.activity.ComponentActivity -import androidx.activity.compose.setContent -import androidx.activity.viewModels -import androidx.compose.foundation.layout.Box -import androidx.compose.foundation.layout.Column -import androidx.compose.foundation.layout.Row -import androidx.compose.foundation.layout.fillMaxSize -import androidx.compose.foundation.layout.padding -import androidx.compose.foundation.lazy.LazyColumn -import 
androidx.compose.foundation.lazy.items -import androidx.compose.foundation.lazy.rememberLazyListState -import androidx.compose.material3.Button -import androidx.compose.material3.LocalContentColor -import androidx.compose.material3.MaterialTheme -import androidx.compose.material3.OutlinedTextField -import androidx.compose.material3.Surface -import androidx.compose.material3.Text -import androidx.compose.runtime.Composable -import androidx.compose.ui.Modifier -import androidx.compose.ui.unit.dp -import androidx.core.content.getSystemService -import com.example.llama.ui.theme.LlamaAndroidTheme -import java.io.File - -class MainActivity( - activityManager: ActivityManager? = null, - downloadManager: DownloadManager? = null, - clipboardManager: ClipboardManager? = null, -): ComponentActivity() { - private val tag: String? = this::class.simpleName - - private val activityManager by lazy { activityManager ?: getSystemService<ActivityManager>()!! } - private val downloadManager by lazy { downloadManager ?: getSystemService<DownloadManager>()!! } - private val clipboardManager by lazy { clipboardManager ?: getSystemService<ClipboardManager>()!! } - - private val viewModel: MainViewModel by viewModels() - - // Get a MemoryInfo object for the device's current memory status. - private fun availableMemory(): ActivityManager.MemoryInfo { - return ActivityManager.MemoryInfo().also { memoryInfo -> - activityManager.getMemoryInfo(memoryInfo) - } - } - - override fun onCreate(savedInstanceState: Bundle?) 
{ - super.onCreate(savedInstanceState) - - StrictMode.setVmPolicy( - VmPolicy.Builder(StrictMode.getVmPolicy()) - .detectLeakedClosableObjects() - .build() - ) - - val free = Formatter.formatFileSize(this, availableMemory().availMem) - val total = Formatter.formatFileSize(this, availableMemory().totalMem) - - viewModel.log("Current memory: $free / $total") - viewModel.log("Downloads directory: ${getExternalFilesDir(null)}") - - val extFilesDir = getExternalFilesDir(null) - - val models = listOf( - Downloadable( - "Phi-2 7B (Q4_0, 1.6 GiB)", - Uri.parse("https://huggingface.co/ggml-org/models/resolve/main/phi-2/ggml-model-q4_0.gguf?download=true"), - File(extFilesDir, "phi-2-q4_0.gguf"), - ), - Downloadable( - "TinyLlama 1.1B (f16, 2.2 GiB)", - Uri.parse("https://huggingface.co/ggml-org/models/resolve/main/tinyllama-1.1b/ggml-model-f16.gguf?download=true"), - File(extFilesDir, "tinyllama-1.1-f16.gguf"), - ), - Downloadable( - "Phi 2 DPO (Q3_K_M, 1.48 GiB)", - Uri.parse("https://huggingface.co/TheBloke/phi-2-dpo-GGUF/resolve/main/phi-2-dpo.Q3_K_M.gguf?download=true"), - File(extFilesDir, "phi-2-dpo.Q3_K_M.gguf") - ), - ) - - setContent { - LlamaAndroidTheme { - // A surface container using the 'background' color from the theme - Surface( - modifier = Modifier.fillMaxSize(), - color = MaterialTheme.colorScheme.background - ) { - MainCompose( - viewModel, - clipboardManager, - downloadManager, - models, - ) - } - - } - } - } -} - -@composable -fun MainCompose( - viewModel: MainViewModel, - clipboard: ClipboardManager, - dm: DownloadManager, - models: List<Downloadable> -) { - Column { - val scrollState = rememberLazyListState() - - Box(modifier = Modifier.weight(1f)) { - LazyColumn(state = scrollState) { - items(viewModel.messages) { - Text( - it, - style = MaterialTheme.typography.bodyLarge.copy(color = LocalContentColor.current), - modifier = Modifier.padding(16.dp) - ) - } - } - } - OutlinedTextField( - value = viewModel.message, - onValueChange = { 
viewModel.updateMessage(it) }, - label = { Text("Message") }, - ) - Row { - Button({ viewModel.send() }) { Text("Send") } - Button({ viewModel.bench(8, 4, 1) }) { Text("Bench") } - Button({ viewModel.clear() }) { Text("Clear") } - Button({ - viewModel.messages.joinToString("\n").let { - clipboard.setPrimaryClip(ClipData.newPlainText("", it)) - } - }) { Text("Copy") } - } - - Column { - for (model in models) { - Downloadable.Button(viewModel, dm, model) - } - } - } -} diff --git a/examples/llama.android/app/src/main/java/com/example/llama/MainViewModel.kt b/examples/llama.android/app/src/main/java/com/example/llama/MainViewModel.kt deleted file mode 100644 index 45ac299..000000000 --- a/examples/llama.android/app/src/main/java/com/example/llama/MainViewModel.kt +++ /dev/null @@ -1,105 +0,0 @@ -package com.example.llama - -import android.llama.cpp.LLamaAndroid -import android.util.Log -import androidx.compose.runtime.getValue -import androidx.compose.runtime.mutableStateOf -import androidx.compose.runtime.setValue -import androidx.lifecycle.ViewModel -import androidx.lifecycle.viewModelScope -import kotlinx.coroutines.flow.catch -import kotlinx.coroutines.launch - -class MainViewModel(private val llamaAndroid: LLamaAndroid = LLamaAndroid.instance()): ViewModel() { - companion object { - @JvmStatic - private val NanosPerSecond = 1_000_000_000.0 - } - - private val tag: String? = this::class.simpleName - - var messages by mutableStateOf(listOf("Initializing...")) - private set - - var message by mutableStateOf("") - private set - - override fun onCleared() { - super.onCleared() - - viewModelScope.launch { - try { - llamaAndroid.unload() - } catch (exc: IllegalStateException) { - messages += exc.message!! - } - } - } - - fun send() { - val text = message - message = "" - - // Add to messages console. - messages += text - messages += "" - - viewModelScope.launch { - llamaAndroid.send(text) - .catch { - Log.e(tag, "send() failed", it) - messages += it.message!! 
- } - .collect { messages = messages.dropLast(1) + (messages.last() + it) } - } - } - - fun bench(pp: Int, tg: Int, pl: Int, nr: Int = 1) { - viewModelScope.launch { - try { - val start = System.nanoTime() - val warmupResult = llamaAndroid.bench(pp, tg, pl, nr) - val end = System.nanoTime() - - messages += warmupResult - - val warmup = (end - start).toDouble() / NanosPerSecond - messages += "Warm up time: $warmup seconds, please wait..." - - if (warmup > 5.0) { - messages += "Warm up took too long, aborting benchmark" - return@launch - } - - messages += llamaAndroid.bench(512, 128, 1, 3) - } catch (exc: IllegalStateException) { - Log.e(tag, "bench() failed", exc) - messages += exc.message!! - } - } - } - - fun load(pathToModel: String) { - viewModelScope.launch { - try { - llamaAndroid.load(pathToModel) - messages += "Loaded $pathToModel" - } catch (exc: IllegalStateException) { - Log.e(tag, "load() failed", exc) - messages += exc.message!! - } - } - } - - fun updateMessage(newMessage: String) { - message = newMessage - } - - fun clear() { - messages = listOf() - } - - fun log(message: String) { - messages += message - } -} diff --git a/examples/llama.android/app/src/main/java/com/example/llama/ui/theme/Color.kt b/examples/llama.android/app/src/main/java/com/example/llama/ui/theme/Color.kt deleted file mode 100644 index 40c30e8..000000000 --- a/examples/llama.android/app/src/main/java/com/example/llama/ui/theme/Color.kt +++ /dev/null @@ -1,11 +0,0 @@ -package com.example.llama.ui.theme - -import androidx.compose.ui.graphics.Color - -val Purple80 = Color(0xFFD0BCFF) -val PurpleGrey80 = Color(0xFFCCC2DC) -val Pink80 = Color(0xFFEFB8C8) - -val Purple40 = Color(0xFF6650a4) -val PurpleGrey40 = Color(0xFF625b71) -val Pink40 = Color(0xFF7D5260) diff --git a/examples/llama.android/app/src/main/java/com/example/llama/ui/theme/Theme.kt b/examples/llama.android/app/src/main/java/com/example/llama/ui/theme/Theme.kt deleted file mode 100644 index e742220..000000000 --- 
a/examples/llama.android/app/src/main/java/com/example/llama/ui/theme/Theme.kt +++ /dev/null @@ -1,70 +0,0 @@ -package com.example.llama.ui.theme - -import android.app.Activity -import android.os.Build -import androidx.compose.foundation.isSystemInDarkTheme -import androidx.compose.material3.MaterialTheme -import androidx.compose.material3.darkColorScheme -import androidx.compose.material3.dynamicDarkColorScheme -import androidx.compose.material3.dynamicLightColorScheme -import androidx.compose.material3.lightColorScheme -import androidx.compose.runtime.Composable -import androidx.compose.runtime.SideEffect -import androidx.compose.ui.graphics.toArgb -import androidx.compose.ui.platform.LocalContext -import androidx.compose.ui.platform.LocalView -import androidx.core.view.WindowCompat - -private val DarkColorScheme = darkColorScheme( - primary = Purple80, - secondary = PurpleGrey80, - tertiary = Pink80 -) - -private val LightColorScheme = lightColorScheme( - primary = Purple40, - secondary = PurpleGrey40, - tertiary = Pink40 - - /* Other default colors to override - background = Color(0xFFFFFBFE), - surface = Color(0xFFFFFBFE), - onPrimary = Color.White, - onSecondary = Color.White, - onTertiary = Color.White, - onBackground = Color(0xFF1C1B1F), - onSurface = Color(0xFF1C1B1F), - */ -) - -@composable -fun LlamaAndroidTheme( - darkTheme: Boolean = isSystemInDarkTheme(), - // Dynamic color is available on Android 12+ - dynamicColor: Boolean = true, - content: @composable () -> Unit -) { - val colorScheme = when { - dynamicColor && Build.VERSION.SDK_INT >= Build.VERSION_CODES.S -> { - val context = LocalContext.current - if (darkTheme) dynamicDarkColorScheme(context) else dynamicLightColorScheme(context) - } - - darkTheme -> DarkColorScheme - else -> LightColorScheme - } - val view = LocalView.current - if (!view.isInEditMode) { - SideEffect { - val window = (view.context as Activity).window - window.statusBarColor = colorScheme.primary.toArgb() - 
WindowCompat.getInsetsController(window, view).isAppearanceLightStatusBars = darkTheme - } - } - - MaterialTheme( - colorScheme = colorScheme, - typography = Typography, - content = content - ) -} diff --git a/examples/llama.android/app/src/main/java/com/example/llama/ui/theme/Type.kt b/examples/llama.android/app/src/main/java/com/example/llama/ui/theme/Type.kt deleted file mode 100644 index 0b87946..000000000 --- a/examples/llama.android/app/src/main/java/com/example/llama/ui/theme/Type.kt +++ /dev/null @@ -1,34 +0,0 @@ -package com.example.llama.ui.theme - -import androidx.compose.material3.Typography -import androidx.compose.ui.text.TextStyle -import androidx.compose.ui.text.font.FontFamily -import androidx.compose.ui.text.font.FontWeight -import androidx.compose.ui.unit.sp - -// Set of Material typography styles to start with -val Typography = Typography( - bodyLarge = TextStyle( - fontFamily = FontFamily.Default, - fontWeight = FontWeight.Normal, - fontSize = 16.sp, - lineHeight = 24.sp, - letterSpacing = 0.5.sp - ) - /* Other default text styles to override - titleLarge = TextStyle( - fontFamily = FontFamily.Default, - fontWeight = FontWeight.Normal, - fontSize = 22.sp, - lineHeight = 28.sp, - letterSpacing = 0.sp - ), - labelSmall = TextStyle( - fontFamily = FontFamily.Default, - fontWeight = FontWeight.Medium, - fontSize = 11.sp, - lineHeight = 16.sp, - letterSpacing = 0.5.sp - ) - */ -) diff --git a/examples/llama.android/app/src/main/res/drawable/ic_launcher_background.xml b/examples/llama.android/app/src/main/res/drawable/ic_launcher_background.xml deleted file mode 100644 index 07d5da9..000000000 --- a/examples/llama.android/app/src/main/res/drawable/ic_launcher_background.xml +++ /dev/null @@ -1,170 +0,0 @@ -<?xml version="1.0" encoding="utf-8"?> -<vector xmlns:android="http://schemas.android.com/apk/res/android" - android:width="108dp" - android:height="108dp" - android:viewportWidth="108" - android:viewportHeight="108"> - <path - 
android:fillColor="#3DDC84" - android:pathData="M0,0h108v108h-108z" /> - <path - android:fillColor="#00000000" - android:pathData="M9,0L9,108" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M19,0L19,108" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M29,0L29,108" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M39,0L39,108" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M49,0L49,108" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M59,0L59,108" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M69,0L69,108" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M79,0L79,108" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M89,0L89,108" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M99,0L99,108" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M0,9L108,9" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M0,19L108,19" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M0,29L108,29" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M0,39L108,39" - android:strokeWidth="0.8" - 
android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M0,49L108,49" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M0,59L108,59" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M0,69L108,69" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M0,79L108,79" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M0,89L108,89" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M0,99L108,99" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M19,29L89,29" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M19,39L89,39" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M19,49L89,49" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M19,59L89,59" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M19,69L89,69" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M19,79L89,79" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M29,19L29,89" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M39,19L39,89" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - 
android:fillColor="#00000000" - android:pathData="M49,19L49,89" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M59,19L59,89" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M69,19L69,89" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> - <path - android:fillColor="#00000000" - android:pathData="M79,19L79,89" - android:strokeWidth="0.8" - android:strokeColor="#33FFFFFF" /> -</vector> diff --git a/examples/llama.android/app/src/main/res/drawable/ic_launcher_foreground.xml b/examples/llama.android/app/src/main/res/drawable/ic_launcher_foreground.xml deleted file mode 100644 index 7706ab9..000000000 --- a/examples/llama.android/app/src/main/res/drawable/ic_launcher_foreground.xml +++ /dev/null @@ -1,30 +0,0 @@ -<vector xmlns:android="http://schemas.android.com/apk/res/android" - xmlns:aapt="http://schemas.android.com/aapt" - android:width="108dp" - android:height="108dp" - android:viewportWidth="108" - android:viewportHeight="108"> - <path android:pathData="M31,63.928c0,0 6.4,-11 12.1,-13.1c7.2,-2.6 26,-1.4 26,-1.4l38.1,38.1L107,108.928l-32,-1L31,63.928z"> - <aapt:attr name="android:fillColor"> - <gradient - android:endX="85.84757" - android:endY="92.4963" - android:startX="42.9492" - android:startY="49.59793" - android:type="linear"> - <item - android:color="#44000000" - android:offset="0.0" /> - <item - android:color="#00000000" - android:offset="1.0" /> - </gradient> - </aapt:attr> - </path> - <path - android:fillColor="#FFFFFF" - android:fillType="nonZero" - android:pathData="M65.3,45.828l3.8,-6.6c0.2,-0.4 0.1,-0.9 -0.3,-1.1c-0.4,-0.2 -0.9,-0.1 -1.1,0.3l-3.9,6.7c-6.3,-2.8 -13.4,-2.8 -19.7,0l-3.9,-6.7c-0.2,-0.4 -0.7,-0.5 -1.1,-0.3C38.8,38.328 38.7,38.828 38.9,39.228l3.8,6.6C36.2,49.428 31.7,56.028 31,63.928h46C76.3,56.028 71.8,49.428 65.3,45.828zM43.4,57.328c-0.8,0 -1.5,-0.5 -1.8,-1.2c-0.3,-0.7 
-0.1,-1.5 0.4,-2.1c0.5,-0.5 1.4,-0.7 2.1,-0.4c0.7,0.3 1.2,1 1.2,1.8C45.3,56.528 44.5,57.328 43.4,57.328L43.4,57.328zM64.6,57.328c-0.8,0 -1.5,-0.5 -1.8,-1.2s-0.1,-1.5 0.4,-2.1c0.5,-0.5 1.4,-0.7 2.1,-0.4c0.7,0.3 1.2,1 1.2,1.8C66.5,56.528 65.6,57.328 64.6,57.328L64.6,57.328z" - android:strokeWidth="1" - android:strokeColor="#00000000" /> -</vector> diff --git a/examples/llama.android/app/src/main/res/mipmap-anydpi/ic_launcher.xml b/examples/llama.android/app/src/main/res/mipmap-anydpi/ic_launcher.xml deleted file mode 100644 index b3e26b4..000000000 --- a/examples/llama.android/app/src/main/res/mipmap-anydpi/ic_launcher.xml +++ /dev/null @@ -1,6 +0,0 @@ -<?xml version="1.0" encoding="utf-8"?> -<adaptive-icon xmlns:android="http://schemas.android.com/apk/res/android"> - <background android:drawable="@drawable/ic_launcher_background" /> - <foreground android:drawable="@drawable/ic_launcher_foreground" /> - <monochrome android:drawable="@drawable/ic_launcher_foreground" /> -</adaptive-icon> diff --git a/examples/llama.android/app/src/main/res/mipmap-anydpi/ic_launcher_round.xml b/examples/llama.android/app/src/main/res/mipmap-anydpi/ic_launcher_round.xml deleted file mode 100644 index b3e26b4..000000000 --- a/examples/llama.android/app/src/main/res/mipmap-anydpi/ic_launcher_round.xml +++ /dev/null @@ -1,6 +0,0 @@ -<?xml version="1.0" encoding="utf-8"?> -<adaptive-icon xmlns:android="http://schemas.android.com/apk/res/android"> - <background android:drawable="@drawable/ic_launcher_background" /> - <foreground android:drawable="@drawable/ic_launcher_foreground" /> - <monochrome android:drawable="@drawable/ic_launcher_foreground" /> -</adaptive-icon> diff --git a/examples/llama.android/app/src/main/res/mipmap-hdpi/ic_launcher.webp b/examples/llama.android/app/src/main/res/mipmap-hdpi/ic_launcher.webp deleted file mode 100644 index c209e78..000000000 Binary files a/examples/llama.android/app/src/main/res/mipmap-hdpi/ic_launcher.webp and /dev/null differ diff --git 
a/examples/llama.android/app/src/main/res/mipmap-hdpi/ic_launcher_round.webp b/examples/llama.android/app/src/main/res/mipmap-hdpi/ic_launcher_round.webp deleted file mode 100644 index b2dfe3d..000000000 Binary files a/examples/llama.android/app/src/main/res/mipmap-hdpi/ic_launcher_round.webp and /dev/null differ diff --git a/examples/llama.android/app/src/main/res/mipmap-mdpi/ic_launcher.webp b/examples/llama.android/app/src/main/res/mipmap-mdpi/ic_launcher.webp deleted file mode 100644 index 4f0f1d6..000000000 Binary files a/examples/llama.android/app/src/main/res/mipmap-mdpi/ic_launcher.webp and /dev/null differ diff --git a/examples/llama.android/app/src/main/res/mipmap-mdpi/ic_launcher_round.webp b/examples/llama.android/app/src/main/res/mipmap-mdpi/ic_launcher_round.webp deleted file mode 100644 index 62b611d..000000000 Binary files a/examples/llama.android/app/src/main/res/mipmap-mdpi/ic_launcher_round.webp and /dev/null differ diff --git a/examples/llama.android/app/src/main/res/mipmap-xhdpi/ic_launcher.webp b/examples/llama.android/app/src/main/res/mipmap-xhdpi/ic_launcher.webp deleted file mode 100644 index 948a307..000000000 Binary files a/examples/llama.android/app/src/main/res/mipmap-xhdpi/ic_launcher.webp and /dev/null differ diff --git a/examples/llama.android/app/src/main/res/mipmap-xhdpi/ic_launcher_round.webp b/examples/llama.android/app/src/main/res/mipmap-xhdpi/ic_launcher_round.webp deleted file mode 100644 index 1b9a695..000000000 Binary files a/examples/llama.android/app/src/main/res/mipmap-xhdpi/ic_launcher_round.webp and /dev/null differ diff --git a/examples/llama.android/app/src/main/res/mipmap-xxhdpi/ic_launcher.webp b/examples/llama.android/app/src/main/res/mipmap-xxhdpi/ic_launcher.webp deleted file mode 100644 index 28d4b77..000000000 Binary files a/examples/llama.android/app/src/main/res/mipmap-xxhdpi/ic_launcher.webp and /dev/null differ diff --git a/examples/llama.android/app/src/main/res/mipmap-xxhdpi/ic_launcher_round.webp 
b/examples/llama.android/app/src/main/res/mipmap-xxhdpi/ic_launcher_round.webp deleted file mode 100644 index 9287f50..000000000 Binary files a/examples/llama.android/app/src/main/res/mipmap-xxhdpi/ic_launcher_round.webp and /dev/null differ diff --git a/examples/llama.android/app/src/main/res/mipmap-xxxhdpi/ic_launcher.webp b/examples/llama.android/app/src/main/res/mipmap-xxxhdpi/ic_launcher.webp deleted file mode 100644 index aa7d642..000000000 Binary files a/examples/llama.android/app/src/main/res/mipmap-xxxhdpi/ic_launcher.webp and /dev/null differ diff --git a/examples/llama.android/app/src/main/res/mipmap-xxxhdpi/ic_launcher_round.webp b/examples/llama.android/app/src/main/res/mipmap-xxxhdpi/ic_launcher_round.webp deleted file mode 100644 index 9126ae3..000000000 Binary files a/examples/llama.android/app/src/main/res/mipmap-xxxhdpi/ic_launcher_round.webp and /dev/null differ diff --git a/examples/llama.android/app/src/main/res/values/colors.xml b/examples/llama.android/app/src/main/res/values/colors.xml deleted file mode 100644 index ca1931b..000000000 --- a/examples/llama.android/app/src/main/res/values/colors.xml +++ /dev/null @@ -1,10 +0,0 @@ -<?xml version="1.0" encoding="utf-8"?> -<resources> - <color name="purple_200">#FFBB86FC</color> - <color name="purple_500">#FF6200EE</color> - <color name="purple_700">#FF3700B3</color> - <color name="teal_200">#FF03DAC5</color> - <color name="teal_700">#FF018786</color> - <color name="black">#FF000000</color> - <color name="white">#FFFFFFFF</color> -</resources> diff --git a/examples/llama.android/app/src/main/res/values/strings.xml b/examples/llama.android/app/src/main/res/values/strings.xml deleted file mode 100644 index 7a9d314..000000000 --- a/examples/llama.android/app/src/main/res/values/strings.xml +++ /dev/null @@ -1,3 +0,0 @@ -<resources> - <string name="app_name">LlamaAndroid</string> -</resources> diff --git a/examples/llama.android/app/src/main/res/values/themes.xml 
b/examples/llama.android/app/src/main/res/values/themes.xml deleted file mode 100644 index 8a24fda..000000000 --- a/examples/llama.android/app/src/main/res/values/themes.xml +++ /dev/null @@ -1,5 +0,0 @@ -<?xml version="1.0" encoding="utf-8"?> -<resources> - - <style name="Theme.LlamaAndroid" parent="android:Theme.Material.Light.NoActionBar" /> -</resources> diff --git a/examples/llama.android/app/src/main/res/xml/backup_rules.xml b/examples/llama.android/app/src/main/res/xml/backup_rules.xml deleted file mode 100644 index 148c18b..000000000 --- a/examples/llama.android/app/src/main/res/xml/backup_rules.xml +++ /dev/null @@ -1,13 +0,0 @@ -<?xml version="1.0" encoding="utf-8"?><!-- - Sample backup rules file; uncomment and customize as necessary. - See https://developer.android.com/guide/topics/data/autobackup - for details. - Note: This file is ignored for devices older that API 31 - See https://developer.android.com/about/versions/12/backup-restore ---> -<full-backup-content> - <!-- - <include domain="sharedpref" path="."/> - <exclude domain="sharedpref" path="device.xml"/> ---> -</full-backup-content> diff --git a/examples/llama.android/app/src/main/res/xml/data_extraction_rules.xml b/examples/llama.android/app/src/main/res/xml/data_extraction_rules.xml deleted file mode 100644 index 0c4f95c..000000000 --- a/examples/llama.android/app/src/main/res/xml/data_extraction_rules.xml +++ /dev/null @@ -1,19 +0,0 @@ -<?xml version="1.0" encoding="utf-8"?><!-- - Sample data extraction rules file; uncomment and customize as necessary. - See https://developer.android.com/about/versions/12/backup-restore#xml-changes - for details. ---> -<data-extraction-rules> - <cloud-backup> - <!-- TODO: Use <include> and <exclude> to control what is backed up. 
- <include .../> - <exclude .../> - --> - </cloud-backup> - <!-- - <device-transfer> - <include .../> - <exclude .../> - </device-transfer> - --> -</data-extraction-rules> diff --git a/ggml/include/gguf.h b/ggml/include/gguf.h index 79ee202..5bc1a7a 100644 --- a/ggml/include/gguf.h +++ b/ggml/include/gguf.h @@ -78,7 +78,6 @@ extern "C" { GGML_API struct gguf_context * gguf_init_empty(void); GGML_API struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_params params); - //GGML_API struct gguf_context * gguf_init_from_buffer(..); GGML_API void gguf_free(struct gguf_context * ctx); @@ -200,3 +199,8 @@ extern "C" { #ifdef __cplusplus } #endif + +#if defined(__cplusplus) +#include <streambuf> +GGML_API struct gguf_context * gguf_init_from_buffer(std::basic_streambuf<char>& streambuf, struct gguf_init_params params); +#endif diff --git a/ggml/src/gguf.cpp b/ggml/src/gguf.cpp index 8cc4ef1..58d06dd 100644 --- a/ggml/src/gguf.cpp +++ b/ggml/src/gguf.cpp @@ -216,14 +216,79 @@ struct gguf_context { void * data = nullptr; }; -struct gguf_reader { +struct gguf_bytes_reader { + /// @brief Reads up to `count` objects into the array `buffer`. + /// The position of the underlying stream implementation is advanced + /// by the number of characters read. + /// + /// @note If an error occurs, the resulting value of the underlying stream + /// position indicator is indeterminate. + virtual size_t read(void * buffer, size_t size, size_t count) = 0; + + /// @brief Seeks to a position aligned to the given alignment boundary. + /// @return The current position after alignment, or 0 on error. 
+ virtual size_t align(size_t alignment) = 0; + + virtual ~gguf_bytes_reader() = 0; +}; + +gguf_bytes_reader::~gguf_bytes_reader() {} + +struct gguf_bytes_buffer_reader : public gguf_bytes_reader { + gguf_bytes_buffer_reader(std::basic_streambuf<char> & streambuf) : streambuf(streambuf), offset(0) {} + + ~gguf_bytes_buffer_reader() {} + + size_t read(void * buffer, size_t size, size_t count) override { + size_t total_size = size * count; + auto bytes_read = streambuf.sgetn(static_cast<char*>(buffer), total_size); + offset += bytes_read; + return bytes_read; + } + + size_t align(size_t alignment) override { + size_t new_offset = GGML_PAD(offset, alignment); + size_t seek_offset = new_offset - offset; + + auto result = streambuf.pubseekoff(seek_offset, std::ios_base::cur); + if (result == std::streampos(-1)) { + return 0; + } + offset = new_offset; + return offset; + } + + private: + std::basic_streambuf<char> & streambuf; + size_t offset; +}; + +struct gguf_bytes_file_reader : public gguf_bytes_reader { + gguf_bytes_file_reader(FILE * file) : file(file) {} + + ~gguf_bytes_file_reader() {} + + size_t read(void * buffer, size_t size, size_t count) override { return fread(buffer, 1, size * count, file); } + + size_t align(size_t alignment) override { + if (fseek(file, GGML_PAD(ftell(file), alignment), SEEK_SET) != 0) { + return 0; + } + return ftell(file); + } + + private: FILE * file; +}; - gguf_reader(FILE * file) : file(file) {} +struct gguf_reader { + gguf_bytes_reader& bytes_reader; + + gguf_reader(gguf_bytes_reader& bytes_reader) : bytes_reader(bytes_reader) {} template <typename T> bool read(T & dst) const { - return fread(&dst, 1, sizeof(dst), file) == sizeof(dst); + return bytes_reader.read(&dst, 1, sizeof(dst)) == sizeof(dst); } template <typename T> @@ -278,11 +343,11 @@ struct gguf_reader { return false; } dst.resize(size); - return fread(dst.data(), 1, dst.length(), file) == dst.length(); + return bytes_reader.read(dst.data(), 1, dst.length()) == 
dst.length(); } bool read(void * dst, const size_t size) const { - return fread(dst, 1, size, file) == size; + return bytes_reader.read(dst, 1, size) == size; } }; @@ -316,8 +381,8 @@ bool gguf_read_emplace_helper(const struct gguf_reader & gr, std::vector<struct return true; } -struct gguf_context * gguf_init_from_file_impl(FILE * file, struct gguf_init_params params) { - const struct gguf_reader gr(file); +namespace { +struct gguf_context * gguf_init_from_reader_impl(const struct gguf_reader& gr, struct gguf_init_params params) { struct gguf_context * ctx = new gguf_context; bool ok = true; @@ -610,15 +675,14 @@ struct gguf_context * gguf_init_from_file_impl(FILE * file, struct gguf_init_par GGML_ASSERT(int64_t(ctx->info.size()) == n_tensors); // we require the data section to be aligned, so take into account any padding - if (fseek(file, GGML_PAD(ftell(file), ctx->alignment), SEEK_SET) != 0) { - GGML_LOG_ERROR("%s: failed to seek to beginning of data section\n", __func__); + // store the current file offset - this is where the data section starts + ctx->offset = gr.bytes_reader.align(ctx->alignment); + if (ctx->offset == 0) { + GGML_LOG_ERROR("%s: failed to align data section\n", __func__); gguf_free(ctx); return nullptr; } - // store the current file offset - this is where the data section starts - ctx->offset = ftell(file); - // compute the total size of the data section, taking into account the alignment { ctx->size = 0; @@ -729,6 +793,13 @@ struct gguf_context * gguf_init_from_file_impl(FILE * file, struct gguf_init_par return ctx; } +} + +struct gguf_context * gguf_init_from_file_impl(FILE * file, struct gguf_init_params params) { + gguf_bytes_file_reader bytes_reader(file); + gguf_reader reader(bytes_reader); + return gguf_init_from_reader_impl(reader, params); +} struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_params params) { FILE * file = ggml_fopen(fname, "rb"); @@ -743,6 +814,12 @@ struct gguf_context * 
gguf_init_from_file(const char * fname, struct gguf_init_p return result; } +struct gguf_context * gguf_init_from_buffer(std::basic_streambuf<char> & streambuf, struct gguf_init_params params) { + gguf_bytes_buffer_reader bytes_reader(streambuf); + gguf_reader reader(bytes_reader); + return gguf_init_from_reader_impl(reader, params); +} + void gguf_free(struct gguf_context * ctx) { if (ctx == nullptr) { return;
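The reader abstraction in the diff above can be illustrated in isolation. The following is a minimal, standalone sketch of the same idea (read bytes through a `std::streambuf`, track the offset, and skip padding up to an alignment boundary). The names `pad_to` and `streambuf_reader` are hypothetical and are not part of ggml; `pad_to` only stands in for `GGML_PAD`. Unlike the diff, this sketch reports alignment failure with a `bool` rather than a zero offset.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for GGML_PAD: round `x` up to a multiple of `n`.
static std::size_t pad_to(std::size_t x, std::size_t n) {
    assert(n > 0);
    return (x + n - 1) / n * n;
}

// Illustrative reader over a std::streambuf; not the ggml implementation.
class streambuf_reader {
  public:
    explicit streambuf_reader(std::streambuf & sb) : sb_(sb) {}

    // Reads up to `n` bytes into `dst` and returns how many were actually read.
    std::size_t read(void * dst, std::size_t n) {
        const std::streamsize got =
            sb_.sgetn(static_cast<char *>(dst), static_cast<std::streamsize>(n));
        offset_ += static_cast<std::size_t>(got);
        return static_cast<std::size_t>(got);
    }

    // Skips padding so the next read starts on an `alignment` boundary.
    // Returns false (and leaves the tracked offset unchanged) if the seek fails.
    bool align(std::size_t alignment) {
        const std::size_t target = pad_to(offset_, alignment);
        const auto        skip   = static_cast<std::streamoff>(target - offset_);
        // Seek only the input sequence, relative to the current position.
        const std::streampos pos = sb_.pubseekoff(skip, std::ios_base::cur, std::ios_base::in);
        if (pos == std::streampos(std::streamoff(-1))) {
            return false;
        }
        offset_ = target;
        return true;
    }

    std::size_t offset() const { return offset_; }

  private:
    std::streambuf & sb_;
    std::size_t      offset_ = 0;
};
```

One design note: the sketch passes `std::ios_base::in` explicitly because `std::stringbuf` rejects a relative (`cur`) seek that targets both the input and output sequences at once. Whether that matters for the PR depends on the streambuf implementation being passed in.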
- Add code to be able to load a gguf file from a variant (memory or disk).
- Some structs simplify how to load a file and keep track of the pointers (which are now in the same struct).
Move the loader code that processes a file after it has been loaded into memory, and populates its own attributes, into a reusable method.
Add new C++ function to Llama main header to load from a single memory buffer, and propagate changes to internal calls/constructors.
Add a submodule with re-usable code for memory loading tests (single buffer).
Add some automatic tests that load from memory (single buffer)
The gguf-split utility now generates a `.txt` listing all tensors. Useful both for manual inspection/debugging and for incremental tensor loading, where it is not possible to know which tensors are present in other split files (this information is critical for handling optional tensors).
Add a flag to the tool to ensure that some tensor names are always followed by another tensor and never placed at the end of a shard. This ensures the shard will not be released when the tensor is processed, and avoids missing-file failures for duplicate tensors that are re-referenced a few tensors later (typically token_embd.weight / output).
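As an illustration of the constraint described above, here is a minimal sketch of a shard planner that refuses to close a shard on a protected tensor. It is an assumption-laden toy, not the gguf-split implementation: the function name `plan_shards`, the count-based shard limit, and the policy of letting a shard grow by one extra tensor are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical planner: groups tensor names into shards of at most
// `max_per_shard` entries, but keeps a shard open past that limit when
// closing it would leave a tensor from `not_last` as its final entry.
std::vector<std::vector<std::string>> plan_shards(const std::vector<std::string> & names,
                                                  std::size_t                      max_per_shard,
                                                  const std::set<std::string> &    not_last) {
    assert(max_per_shard > 0);
    std::vector<std::vector<std::string>> shards;
    std::vector<std::string>              current;
    for (std::size_t i = 0; i < names.size(); ++i) {
        current.push_back(names[i]);
        const bool full           = current.size() >= max_per_shard;
        const bool has_next       = i + 1 < names.size();
        const bool protected_last = not_last.count(names[i]) != 0;
        // Close the shard only if that does not strand a protected tensor at its end.
        if (full && !(protected_last && has_next)) {
            shards.push_back(std::move(current));
            current.clear();
        }
    }
    if (!current.empty()) {
        shards.push_back(std::move(current));
    }
    return shards;
}
```

With this policy a protected tensor can still end up last if it is the final tensor of the whole model, since there is nothing left to place after it.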
Show which shard each tensor belongs to.
A file buffer that can be fulfilled using string keys. The extract method waits until the file is provided.
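The behaviour described above (a buffer fulfilled by string key, with an extract call that waits) can be sketched with a mutex and a condition variable. This is only a rough illustration under stated assumptions: the class name `keyed_file_buffer`, its `bytes` alias, and the exact semantics of `fulfill` and `extract` are hypothetical and do not mirror the class added by this PR.

```cpp
#include <cassert>
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical keyed buffer: producers call fulfill(), consumers call extract().
class keyed_file_buffer {
  public:
    using bytes = std::vector<unsigned char>;

    // Stores the contents for `key`. Returns false if contents for that key
    // are already stored and have not been extracted yet.
    bool fulfill(const std::string & key, bytes data) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (files_.count(key) != 0) {
                return false;
            }
            files_.emplace(key, std::move(data));
        }
        ready_.notify_all();  // wake any consumer waiting in extract()
        return true;
    }

    // Blocks until contents for `key` are available, then moves them out
    // and removes the entry.
    bytes extract(const std::string & key) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [&] { return files_.count(key) != 0; });
        auto  it   = files_.find(key);
        bytes data = std::move(it->second);
        files_.erase(it);
        return data;
    }

  private:
    std::mutex                   mutex_;
    std::condition_variable      ready_;
    std::map<std::string, bytes> files_;
};
```

In a setup like the split-loading test, one thread would call `fulfill` for each shard as it arrives while the loader thread calls `extract` for the shard it needs next. A production version would likely also need a timeout or cancellation path so that `extract` cannot wait forever on a file that never arrives.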
Handles the logic for incrementally loading files and tensors in model shards.
Refactor backend buffer creation (for model loading) into functions.
- The function now takes size_data instead of the member attribute.
- Sanity checks of file pointer handles.

These two changes will be useful when calling `load_all_data` multiple times during incremental shard load.
Adapt the loader and model load to incrementally load files and upload tensors.
Add functions to Llama.cpp public headers to asynchronously load shards.
Add a submodule with re-usable code for memory loading tests (multiple buffers).
Add some automatic tests that load from memory (multiple async splits) diff --git a/ggml/include/gguf.h b/ggml/include/gguf.h index 3471d4b..377fc60de 100644 --- a/ggml/include/gguf.h +++ b/ggml/include/gguf.h @@ -200,7 +200,7 @@ extern "C" { } #endif -#if defined(__cplusplus) && __cplusplus >= 201703L +#if defined(__cplusplus) #include <ios> GGML_API struct gguf_context * gguf_init_from_buffer(std::basic_streambuf<char>& streambuf, struct gguf_init_params params); #endif diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index f1e5c22..87127a1 100644 --- a/tests/CMakeLists.txt +++ b/tests/CMakeLists.txt @@ -202,6 +202,7 @@ llama_build_and_test(test-backend-ops.cpp) llama_build_and_test(test-model-load-cancel.cpp LABEL "model") llama_build_and_test(test-model-load-disk.cpp LABEL "model") llama_build_and_test(test-model-load-memory.cpp LABEL "model") +llama_build_and_test(test-model-load-memory-split.cpp LABEL "model") llama_build_and_test(test-autorelease.cpp LABEL "model") if (NOT GGML_BACKEND_DL) diff --git a/tests/test-model-load-memory-split.cpp b/tests/test-model-load-memory-split.cpp new file mode 100644 index 000000000..2d3dd21 --- /dev/null +++ b/tests/test-model-load-memory-split.cpp @@ -0,0 +1,76 @@ +#include "get-model.h" +#include "llama-cpp.h" +#include "load-into-memory.h" + +#include <cstdlib> +#include <thread> +#include <vector> + +using namespace common_load_into_memory; + +int main(int argc, char * argv[]) { + auto * model_path = get_model_or_exit(argc, argv); + + if (!is_split_file(model_path)) { + printf("Skipping not-split model %s\n", model_path); + return EXIT_SUCCESS; + } + + // Manually load into a memory buffer first + llama_file_entry tensor_list_file = load_tensor_list_file(model_path); + std::vector<llama_file_entry> files = load_files_into_streambuf(model_path); + + llama_backend_init(); + auto params = llama_model_params{}; + params.use_mmap = false; + params.progress_callback = [](float progress, void * ctx) { + (void) ctx; + 
fprintf(stderr, "%.2f%% ", progress * 100.0f); + // true means: Don't cancel the load + return true; + }; + + printf("Loading model from %zu files\n", files.size()); + + std::vector<const char *> file_paths; + for (size_t i = 0; i < files.size(); i++) { + printf("Found file %s \n", files[i].path.c_str()); + file_paths.push_back(files[i].path.c_str()); + } + + const char * async_load_context = "test-model-load"; + std::thread fulfill_thread([&files, &tensor_list_file, &async_load_context]() { + const bool success = llama_model_load_fulfill_split_future(tensor_list_file.path.c_str(), async_load_context, + std::move(tensor_list_file.streambuf)); + printf("Fulfilling tensor list file %s: %s\n", tensor_list_file.path.c_str(), success ? "success" : "failure"); + if (!success) { + exit(EXIT_FAILURE); + } + for (size_t i = 0; i < files.size(); i++) { + const bool success = llama_model_load_fulfill_split_future(files[i].path.c_str(), async_load_context, + std::move(files[i].streambuf)); + printf("Fulfilling file %s: %s\n", files[i].path.c_str(), success ? "success" : "failure"); + if (!success) { + exit(EXIT_FAILURE); + } + } + }); + fprintf(stderr, "Loading model from splits\n"); + auto * model = llama_model_load_from_split_futures(file_paths.data(), file_paths.size(), async_load_context, + tensor_list_file.path.c_str(), params); + fulfill_thread.join(); + + fprintf(stderr, "\n"); + + if (model == nullptr) { + fprintf(stderr, "Failed to load model\n"); + llama_backend_free(); + return EXIT_FAILURE; + } + + fprintf(stderr, "Model loaded successfully\n"); + llama_model_free(model); + llama_backend_free(); + + return EXIT_SUCCESS; +}
98d10a3 to
71c63a4
Compare
Author
Closing since the intent of this PR was to test that CI is green (see run https://github.com/tetherto/qvac-ext-lib-llama.cpp/actions/runs/18721917256).
Test that CI passes for the changes we want to submit to the upstream ggml project.
Based on the latest published version of Llama.cpp (as of 22nd Oct 2025).