Machine Learning Upgrade: A Data Scientist's Guide to MLOps, LLMs, and ML Infrastructure by Kristen Kehrer & Caleb Kaiser

Author: Kristen Kehrer & Caleb Kaiser
Language: eng
Format: epub
Publisher: John Wiley & Sons, Incorporated
Published: 2024-07-04


The second thing you'll need to improve your inference pipeline is a metric to optimize. With LLMs, this is a tricky task. Many of the traditional natural language processing metrics, like Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores, are just too simple in their heuristics to accurately score LLMs. ROUGE is a set of metrics used to evaluate the quality of automatic summarization systems. There are multiple variations, including ROUGE-N, ROUGE-L, ROUGE-W, and more. One of the best approaches research teams have taken recently is to use humans as direct evaluators, but this too creates problems, not the least of which is the associated cost of manually scoring samples.
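For intuition, here is a minimal sketch of what a ROUGE computation looks like in practice, using the open source rouge-score package (the library choice and example sentences are assumptions for illustration, not something this chapter prescribes):

# Minimal ROUGE sketch; the rouge-score package is an illustrative choice.
from rouge_score import rouge_scorer

reference = "The function returns a sorted list of integers."
candidate = "The function gives back the integers in sorted order."

# ROUGE-1 counts unigram overlap; ROUGE-L uses the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)
print(scores["rouge1"].fmeasure, scores["rougeL"].fmeasure)

Even this tiny example hints at the problem: the two sentences mean roughly the same thing, but n-gram overlap rewards surface similarity rather than semantic or functional correctness.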

Because of this, most researchers are stuck implementing custom scoring functions for their particular task, often combining different metrics like BERTScore, ROUGE, and custom benchmarks. With code generation, you have the advantage of being able to use unit tests to evaluate whether the code works, and that is exactly what you'll be doing in this next exercise.
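Before turning to that exercise, here is a rough sketch of what combining metrics into a custom scoring function can look like; the bert-score and rouge-score packages, the 50/50 weighting, and the function name are illustrative assumptions rather than the book's method:

# Illustrative custom metric: a weighted blend of BERTScore F1 and ROUGE-L.
from bert_score import score as bert_score
from rouge_score import rouge_scorer

def combined_score(prediction: str, reference: str) -> float:
    # BERTScore compares contextual embeddings; returns precision, recall, F1.
    _, _, f1 = bert_score([prediction], [reference], lang="en")
    # ROUGE-L measures longest-common-subsequence overlap.
    rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(reference, prediction)["rougeL"].fmeasure
    # The 50/50 weighting is an arbitrary assumption.
    return 0.5 * float(f1[0]) + 0.5 * rouge_l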

Your task is to build a pipeline that, given a description of a Python function and some associated unit tests, will generate an acceptable piece of code.

To test your pipeline, you'll use the following prompt template:

code_gen_template = """#INSTRUCTION: Write a Python function named {name} that {description}. Make sure to include all necessary imports.
#RESPONSE
"""

code_gen_template_w_tests = """#INSTRUCTION: Write a Python function named {name} that {description}. Make sure to include all necessary imports. The function {name} will be evaluated with the following unit tests:
{tests}
#RESPONSE
"""
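To make the substitution concrete, here is a hypothetical rendering of the test-aware template for the first task defined below (image_tests is the unit-test source string introduced in a moment):

# Hypothetical rendering of the test-aware template for the image task.
prompt = code_gen_template_w_tests.format(
    name="generate_image(dimensions)",
    description="takes a string containing the dimensions of an image, "
                "like '200x300', and generates an image of those dimensions "
                "using 3 random colors, before finally returning the image object",
    tests=image_tests,  # unit-test source code as a string, defined below
)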

You'll also need some prompts and associated unit tests. The full code for the unit tests is available at this book's GitHub, but in general, the unit tests look like this:

class TestGenerateImage(unittest.TestCase):
    def test_valid_input(self):
        width, height = 200, 300
        image = generate_image(f'{width}x{height}')
        self.assertEqual(image.size, (width, height))

They are accompanied by a variable containing all of the code for the unit tests as a string. You can store all of this information, along with your prompts, in a list like so:

TESTS = [
    {
        "name": "generate_image(dimensions)",
        "description": "takes a string containing the dimensions of an image, like '200x300', and generates an image of those dimensions using 3 random colors, before finally returning the image object.",
        "tests": image_tests,
        "tests_class": TestGenerateImage
    },
    {
        "name": "evaluate_expression(expression)",
        "description": "takes a string containing a mathematical equation, parses the equation, and returns its evaluated result.",
        "tests": math_tests,
        "tests_class": TestEvaluateExpression
    },
    {
        "name": "merge_k_lists(lists)",
        "description": "takes an array of k linked-lists lists, each sorted in ascending order, and merges all the linked-lists into one sorted linked-list, returning the final sorted linked-list.",
        "tests": merge_k_tests,
        "tests_class": TestMergeKLists
    }
]
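To turn one of these entries into a score, the generated code has to be executed and the entry's test class run against it. The helper below is a minimal sketch of that idea, assuming exec() into the module globals so the test class can find the newly defined function; it is an illustration, not necessarily the book's implementation:

import unittest

def score_generation(generated_code: str, entry: dict) -> bool:
    # Define the generated function (e.g. generate_image) at module scope
    # so the entry's test class can resolve it by name.
    exec(generated_code, globals())
    # Load and run the entry's unittest class; True only if every test passed.
    suite = unittest.TestLoader().loadTestsFromTestCase(entry["tests_class"])
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()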

Now, to perform inference, you'll need a pipeline, including nodes for your prompt and for evaluating your output, as shown in Listing 4.3.
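Listing 4.3 itself is not reproduced in this excerpt; purely to sketch the shape such a pipeline can take, the loop below renders a prompt for each entry, asks an LLM for code, and scores the result with the score_generation helper sketched above. The OpenAI client, the model name, and the assumption that the model returns bare Python are illustrative choices, not the book's listing:

# Illustrative pipeline only -- not the book's Listing 4.3.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_code(entry: dict) -> str:
    # Prompt node: render the template for this entry and call the model.
    prompt = code_gen_template_w_tests.format(
        name=entry["name"],
        description=entry["description"],
        tests=entry["tests"],
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Evaluation node: run each generation against its unit tests.
for entry in TESTS:
    code = generate_code(entry)
    passed = score_generation(code, entry)  # helper sketched above
    print(f"{entry['name']}: {'pass' if passed else 'fail'}")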


