信息提取是一种高级的 AI 应用场景,它可以从复杂的数据或文本中提取出关键信息,使信息更加结构化和易于理解。这个场景在数据处理和信息管理中非常有价值,特别适用于以下情形:
- 从长篇内容中提取关键数据:例如,将网页内容转换为表格形式,或者从大量文本中提取重要信息。
- 按特定格式分类信息:比如对文章内容进行信息分类和标签化。
信息提取示例
在信息提取应用中,AI 可以按照用户的需求对大量文本进行结构化处理。例如,将一篇长篇文章中的关键信息提取并按照特定格式呈现。这不仅能提高信息的可读性,也有助于后续的数据分析和处理。
复制Extract the important entities mentioned in the article below. First extract all company names, then extract all people names, then extract specific topics which fit the content and finally extract general overarching themes Desired format: Company names: <comma_separated_list_of_company_names> People names: -||- Specific topics: -||- General themes: -||- Text: """Powering Next Generation Applications with OpenAI Codex Codex is now powering 70 different applications across a variety of use cases through the OpenAI API. May 24, 2022 4 minute read OpenAI Codex, a natural language-to-code system based on GPT-3, helps turn simple English instructions into over a dozen popular coding languages. Codex was released last August through our API and is the principal building block of GitHub Copilot. Warp is a Rust-based terminal, reimagined from the ground up to help both individuals and teams be more productive in the command-line. Terminal commands are typically difficult to remember, find and construct. Users often have to leave the terminal and search the web for answers and even then the results might not give them the right command to execute. Warp uses Codex to allow users to run a natural language command to search directly from within the terminal and get a result they can immediately use. “Codex allows Warp to make the terminal more accessible and powerful. Developers search for entire commands using natural language rather than trying to remember them or assemble them piecemeal. Codex-powered command search has become one of our game changing features.” —Zach Lloyd, Founder, Warp Machinet helps professional Java developers write quality code by using Codex to generate intelligent unit test templates. Machinet was able to accelerate their development several-fold by switching from building their own machine learning systems to using Codex. The flexibility of Codex allows for the ability to easily add new features and capabilities saving their users time and helping them be more productive. “Codex is an amazing tool in our arsenal. Not only does it allow us to generate more meaningful code, but it has also helped us find a new design of product architecture and got us out of a local maximum.” —Vladislav Yanchenko, Founder, Machinet"""
Extract the important entities mentioned in the article below. First extract all company names, then extract all people names, then extract specific topics which fit the content and finally extract general overarching themes Desired format: Company names: <comma_separated_list_of_company_names> People names: -||- Specific topics: -||- General themes: -||- Text: “””Powering Next Generation Applications with OpenAI Codex Codex is now powering 70 different applications across a variety of use cases through the OpenAI API. May 24, 2022 4 minute read OpenAI Codex, a natural language-to-code system based on GPT-3, helps turn simple English instructions into over a dozen popular coding languages. Codex was released last August through our API and is the principal building block of GitHub Copilot. Warp is a Rust-based terminal, reimagined from the ground up to help both individuals and teams be more productive in the command-line. Terminal commands are typically difficult to remember, find and construct. Users often have to leave the terminal and search the web for answers and even then the results might not give them the right command to execute. Warp uses Codex to allow users to run a natural language command to search directly from within the terminal and get a result they can immediately use. “Codex allows Warp to make the terminal more accessible and powerful. Developers search for entire commands using natural language rather than trying to remember them or assemble them piecemeal. Codex-powered command search has become one of our game changing features.” —Zach Lloyd, Founder, Warp Machinet helps professional Java developers write quality code by using Codex to generate intelligent unit test templates. Machinet was able to accelerate their development several-fold by switching from building their own machine learning systems to using Codex. The flexibility of Codex allows for the ability to easily add new features and capabilities saving their users time and helping them be more productive. “Codex is an amazing tool in our arsenal. Not only does it allow us to generate more meaningful code, but it has also helped us find a new design of product architecture and got us out of a local maximum.” —Vladislav Yanchenko, Founder, Machinet”””
输出
通过格式词阐述所需输出格式
在信息提取过程中,一个重要的技巧是在 prompt 中明确指出所需的输出格式。这种做法可以指导 AI 更准确地提取和组织信息。
例如,以下是一个信息提取的 prompt 示例,要求 AI 将文章中的信息提取并按特定格式输出:
复制Extract the important entities mentioned in the article below. First extract all company names, then extract all people names, then extract specific topics which fit the content and finally extract general overarching themes Desired format: Company names: <comma_separated_list_of_company_names> People names: <comma_separated_list_of_people_names> Specific topics: <comma_separated_list_of_specific_topics> General themes: <comma_separated_list_of_general_themes> Text: """ Thank you so much, Fred, for that lovely introduction. And thanks to the Atlantic Council for hosting me today. The course of the global economy over the past two years has been shaped by COVID-19 and our efforts to fight the pandemic. It’s now evident, though, that the war between Russia and Ukraine has redrawn the contours of the world economic outlook. Vladimir Putin’s unprovoked attack on Ukraine and its people is taking a devastating human toll, with lives tragically lost, families internally displaced or becoming refugees, and communities and cities destroyed. ... """
Extract the important entities mentioned in the article below. First extract all company names, then extract all people names, then extract specific topics which fit the content and finally extract general overarching themes Desired format: Company names: <comma_separated_list_of_company_names> People names: <comma_separated_list_of_people_names> Specific topics: <comma_separated_list_of_specific_topics> General themes: <comma_separated_list_of_general_themes> Text: “”” Thank you so much, Fred, for that lovely introduction. And thanks to the Atlantic Council for hosting me today. The course of the global economy over the past two years has been shaped by COVID-19 and our efforts to fight the pandemic. It’s now evident, though, that the war between Russia and Ukraine has redrawn the contours of the world economic outlook. Vladimir Putin’s unprovoked attack on Ukraine and its people is taking a devastating human toll, with lives tragically lost, families internally displaced or becoming refugees, and communities and cities destroyed. … “””
这种方法不仅适用于信息总结,还可以用于更复杂的数据提取和分类任务。
使用特殊符号分隔指令和文本
在处理大量文本时,使用特殊符号如三引号“`”来区分指令和待处理文本是一个有效的技巧。这可以帮助 AI 更清晰地理解用户的需求,并减少错误或不相关的输出。
例如,以下是一个改进后的信息提取 prompt:
复制Please extract key information from the following text and summarize it. Text: """ (长篇文章内容) """
Please extract key information from the following text and summarize it. Text: “”” (长篇文章内容) “””
在开发允许用户输入并要求 AI 提取信息的应用时,这种技巧尤其有用。
结合其他技巧实现高级信息提取
通过结合前面介绍的技巧,比如增加示例、设定特定角色或格式,可以进一步提升 AI 在信息提取任务中的效果。这样的组合使用可以使输出更加符合用户的具体需求,从而提升整体的应用体验和效率。
阅读全文
温馨提示: