Business datasets are often too large to process in full. What is the best method for sampling the data to build an efficient vector store?