
Progress and Challenges in Open-Domain Long-Form Question Answering

Tensorflowers · Source: Google Research · 2021-05-31

Posted by Aurko Roy, Research Scientist, Google Research

Open-domain long-form question answering (LFQA) is a fundamental challenge in natural language processing (NLP) that involves retrieving documents relevant to a given question and using them to generate a detailed, paragraph-length answer. In factoid open-domain question answering (QA), a simple phrase or entity is often enough to answer the question. While there has been remarkable recent progress on factoid QA, far less progress has been made on long-form answers. LFQA is nevertheless an important task, especially because it provides a testbed for measuring the factuality of generative text models. But are current benchmarks and evaluation metrics really capable of measuring progress on LFQA?

In "Hurdles to Progress in Long-form Question Answering" (to appear at NAACL 2021), we present a new open-domain long-form question answering system that leverages two recent advances in NLP:

1. State-of-the-art sparse attention models, such as the Routing Transformer (RT), which allow attention-based models to scale to long sequences;

2. Retrieval-based models, such as REALM, which facilitate retrieval of Wikipedia articles relevant to a given query.

Routing Transformer

https://www.mitpressjournals.org/doi/full/10.1162/tacl_a_00353

To encourage more factual grounding, our system combines information from several retrieved Wikipedia articles relevant to the given question before generating an answer. It achieves a new state of the art on ELI5, the only large-scale, publicly available dataset for long-form question answering.

ELI5

https://ai.facebook.com/blog/longform-qa/

However, while this system tops the public leaderboard, we discovered several troubling trends in the ELI5 dataset and its associated evaluation metrics. In particular, we found 1) little evidence that models actually use the retrievals they are conditioned on; 2) that trivial baselines (e.g., input copying) beat modern systems such as RAG and BART + DPR; and 3) significant train/validation overlap in the dataset. Our paper proposes mitigation strategies for each of these issues.

Input copying

https://eval.ai/web/challenges/challenge-page/689/leaderboard/1908#leaderboardrank-6

Text generation

At the core of NLP models is the Transformer architecture, in which every token in a sequence attends to every other token, yielding a model whose cost grows quadratically with sequence length. The RT model introduces a dynamic, content-based sparse attention mechanism that reduces the attention complexity of the Transformer from n^2 to n^1.5 (where n is the sequence length), enabling it to scale to long sequences. This allows each word to attend to relevant words anywhere in the text, unlike methods such as Transformer-XL, where a word can only attend to words in its immediate vicinity.
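To make the n^2-versus-n^1.5 comparison concrete, here is a toy operation count (constants and the local-attention term are ignored; a back-of-the-envelope sketch, not a measurement of the actual model):

```python
def full_attention_ops(n: int) -> int:
    """Every token attends to every token: n * n score computations."""
    return n * n

def routing_attention_ops(n: int) -> int:
    """With ~sqrt(n) clusters of ~sqrt(n) tokens each, every token
    attends to the ~sqrt(n) tokens in its cluster: n * sqrt(n) scores."""
    return int(n * n ** 0.5)

for n in (1_024, 8_192, 65_536):
    ratio = full_attention_ops(n) / routing_attention_ops(n)
    print(f"n={n:>6}: full={full_attention_ops(n):>13,} "
          f"routed={routing_attention_ops(n):>11,} (~{ratio:.0f}x fewer)")
```

At n = 65,536 the sparse pattern needs roughly 256 times fewer attention scores, which is what makes long sequences tractable.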

The key insight behind RT is that the attention each token pays to every other token is typically redundant and can be approximated by a combination of local and global attention. Local attention lets each token build up a local representation over several layers of the model, with each token attending to a local neighborhood, providing local coherence and fluency. Complementing local attention, the RT model also uses mini-batch k-means clustering so that each token attends only to the set of tokens most relevant to it.
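The clustered-attention idea above can be sketched roughly as follows. This is an illustrative simplification, not the RT implementation: it runs plain k-means once over the queries and keys and omits local attention, multiple heads, and the online mini-batch centroid updates the model actually uses.

```python
import numpy as np

def clustered_attention(q, k, v, n_clusters=4, n_iters=10, seed=0):
    """Each token attends only to keys assigned to the same cluster as
    its query -- a toy sketch of content-based sparse attention."""
    rng = np.random.default_rng(seed)
    n, d = q.shape
    centroids = rng.standard_normal((n_clusters, d))
    # crude k-means over the union of queries and keys
    for _ in range(n_iters):
        qa = np.argmin(((q[:, None] - centroids) ** 2).sum(-1), axis=1)
        ka = np.argmin(((k[:, None] - centroids) ** 2).sum(-1), axis=1)
        pts = np.concatenate([q, k])
        asg = np.concatenate([qa, ka])
        for c in range(n_clusters):
            if (asg == c).any():
                centroids[c] = pts[asg == c].mean(0)
    out = np.zeros_like(v)
    for i in range(n):
        mask = ka == qa[i]                 # keys sharing this query's cluster
        if not mask.any():
            continue                       # no keys in cluster: zero output
        scores = q[i] @ k[mask].T / np.sqrt(d)
        w = np.exp(scores - scores.max())  # softmax over the cluster only
        w /= w.sum()
        out[i] = w @ v[mask]
    return out
```

Because each softmax runs over only one cluster's keys rather than all n keys, the per-token cost drops from O(n) to roughly O(n / n_clusters).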

We pre-trained an RT model on the Project Gutenberg (PG-19) dataset with a language modeling objective, i.e., given all previous words, the model learns to predict the next word, enabling it to generate fluent, paragraph-length text.

Project Gutenberg (PG-19)

https://deepmind.com/blog/article/A_new_model_and_dataset_for_long-range_memory

Information retrieval

To demonstrate the effectiveness of the RT model on the LFQA task, we combine it with retrievals from REALM. The REALM model (Guu et al., 2020) is a retrieval-based model that uses maximum inner product search to retrieve Wikipedia articles relevant to a particular query or question; it was fine-tuned for factoid question answering on the Natural Questions dataset. REALM uses a BERT model to learn good representations of a question and ScaNN to retrieve Wikipedia articles with high topical similarity to the question representation. The system is then trained end-to-end to maximize the log-likelihood on the QA task.

We further improved the quality of REALM retrievals using a contrastive loss. The idea is to pull a question's representation closer to its ground-truth answer while pushing it away from the other answers in the mini-batch. This ensures that when the system retrieves relevant items using this question representation, it returns articles "similar" to the ground-truth answer. We call this retriever contrastive-REALM, or c-REALM.

Contrastive loss

https://towardsdatascience.com/contrastive-loss-explaned-159f2d4a87ec
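The in-batch contrastive idea behind c-REALM can be sketched as follows, assuming each question's gold answer shares its index within the mini-batch. This is an InfoNCE-style illustration, not the exact training objective, and `temperature` is a hypothetical hyperparameter:

```python
import numpy as np

def contrastive_loss(q_emb, a_emb, temperature=0.1):
    """In-batch contrastive loss: pull each question embedding toward
    its gold answer embedding (the diagonal of the similarity matrix)
    and away from the other answers in the mini-batch."""
    qn = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    an = a_emb / np.linalg.norm(a_emb, axis=1, keepdims=True)
    logits = qn @ an.T / temperature              # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # cross-entropy on the diagonal
```

When questions already sit next to their gold answers in embedding space the loss is near zero; mismatched pairs drive it up, which is exactly the gradient signal that reshapes the retriever's question representations.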

Evaluation

We tested the model on long-form question answering using the ELI5 dataset, which is part of the KILT benchmark and is the only publicly available large-scale LFQA dataset. The KILT benchmark measures text retrieval quality with R-Precision (R-Prec) and text generation quality with ROUGE-L. The two scores are combined into a KILT R-L score, which determines a model's ranking on the leaderboard. We fine-tuned the pre-trained RT model together with c-REALM retrievals on the ELI5 portion of KILT.
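ROUGE-L, the generation half of the KILT score, is the longest-common-subsequence F-measure between a generated answer and a reference. A minimal sketch (the official scorer adds tokenization, stemming, and multi-reference details):

```python
def rouge_l(candidate: str, reference: str, beta: float = 1.2) -> float:
    """ROUGE-L F-score from the longest common subsequence of tokens."""
    c, r = candidate.split(), reference.split()
    # classic dynamic-programming LCS length
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, ct in enumerate(c, 1):
        for j, rt in enumerate(r, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if ct == rt else max(dp[i-1][j], dp[i][j-1])
    lcs = dp[len(c)][len(r)]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return (1 + beta**2) * prec * rec / (rec + beta**2 * prec)
```

Because the LCS only needs tokens in order, not contiguous, long answers that merely share common words with the reference can score surprisingly well, which foreshadows the metric problems discussed later in this post.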

Our submission tops the KILT leaderboard for long-form question answering on ELI5 with a combined KILT R-L score of 2.36, an improvement over the previous leaderboard entry of BART + DPR (KILT R-L score of 1.9), while having a similar number of parameters to the other models on the leaderboard. In terms of text generation quality, we see improvements of 4.11, 5.78, and 9.14 ROUGE-L over T5, BART + DPR, and RAG, respectively.

Leaderboard

https://eval.ai/web/challenges/challenge-page/689/leaderboard/1908

Example generations from the RT model

Q1

Why are almost all boats white?

Boats are generally painted white for obvious reasons, as they are inexpensive and easy to clean. Ships are also often painted to provide a visible visual contrast to the sea. Boats also often have high visibility ports to allow for easy international trade.

In addition, many boats have an air-breathing function that makes them easier to spot and stay afloat, which is a good thing if you're a pirate or trying to get into some country's waters. Lastly, many ships are built in the last few decades, so they are not as exotic looking as they were thirty years ago.

And lastly, many ships are made out of materials that are easy to work with, and so many of them are painted white as a practical matter. There are many, many more reasons for why ships are white, but those are the ones I know of. I'm sure there are many other reasons, but those are the ones I know of.

Edit: I forgot to mention that the other answers about visibility and easy to see ports are also true. Ships are painted to appear to be larger, and more expensive. That makes them appear more impressive, and so they are more likely to be bought by people who want to show off their wealth and status.

Q2

What causes the trail behind jets at high altitude?

The same thing that causes the clouds you see at the start of a storm to form in the first place. The air pressure changes as the jet moves through the air, causing it to cool down and condense into water droplets. These droplets are in fact a lot bigger than the water droplets in the clouds, so when the plane passes through them, they can cause the air pressure to change, causing the water droplets to condense into clouds.

This is what you see when you look at a cloud formation from the ground. In addition to this, when the plane passes through the clouds, it also moves through the air, causing it to change its temperature, causing the water droplets to condense and form clouds.

These clouds are often quite large, and can block out the sun. This is what you see when you look at a contrail. Finally, the water vapor that forms at high altitudes is cooled by the air above, and as it cools, it condenses into clouds. This is what you see when you look at a cumulonimbus cloud. These clouds are even larger than the water droplets in the clouds, and they block out the sun.

Q3

How do shampoo and conditioner work?

The key ingredients in shampoo and conditioner are usually surfactants, they tend to attract oil from your skin. Shampoo will usually contain an anti-dandruff agent (dandruff is caused by fungi in your skin which cause dandruff). They also contain some moisturizing agents (moisturizers are things that stay in your skin longer and protect your scalp from cleaning),

And a lot of anti-dandruff (dandruff is a side effect of high doses of both surfactants and moisturizers) Conditioner will contain moisturizing agents, like a skin moisturizer. They tend to cause oils to be separated from your hair (e.g. wash your hair with conditioner instead of shampoo). They both have a foaming agent that makes them foam up (usually Hydrogen Peroxide). They also have a foaming agent that stops dirt from attaching to your hair. Some conditioner will also suppress the action of the shampoo (to help prevent dirt from attaching to your hair).

Q4

When glancing at a clock, why does the first second after glancing at it sometimes feel longer than the rest?

This is known as chronostasis. Your brain gets used to seeing the second hand move after the first, so it ignores the first one. If you were to look at the clock a lot, you'd see the second hand move a lot more than the first, and you'd be able to tell the difference between the second and the first.

The second hand will appear to move much slower than the first hand because your brain is used to seeing the second hand move a lot more than the first. If you were to go on a long trip, you'd get used to the second hand moving a lot more than the first, and you'd have no problem telling the difference between the second and the first hand.

It's kind of like how you don't notice your nose when you're driving, but if you look at it for a while, you start to notice it. Your brain is just getting used to seeing the nose move a lot more than the other, and it can't tell the difference between the two. It's kind of like that with your eyes.

Q5

Why do Britain and other English empire countries still bow to monarchs? What real purpose does the queen serve?

Depends what you define as "purpose". There is no specific purpose for the British monarch - rather, they are figureheads that serve a role. The role is to have a high degree of influence over a country, though the rules of the monarchy can somewhat vary from country to country, and the ceremony of "kissing the royal corpse" does not always involve kissing the monarch. Whether that's the only reason for the custom, or if it was the only reason,

I can't say, but that's the reasoning. When the U.K. was in full power, the monarch was a puppet of Parliament, and the powers of the monarch were transferred to the Prime Minister and thus to the Prime Minister's deputy, who then became the Prime Minister. Since then, Parliament has been able to vote on legislation that goes through the monarch,

although they may still act as the monarch's representative in negotiating treaties, which can have very very deep consequences. The Queen's role, as a representative of Britain, doesn't necessarily involve her formal approval of any of the laws or legislation that goes through Parliament, though.

Q6

What exactly is fire, in detail? How can light and heat come from something we can't really touch?

Fire is the chemical reaction of fuel (oxygen) with (what we call) impurities (ash, soot, oil etc). These impurities are created due to a number of things. If you were to study the chemical composition of a log (or a candle) and analyse it you would discover the chemical formula (A, C, and G) which is a simplified version of what actually happens. There is always fuel, and there is always the ash/soot. Wood is made up of mostly carbon and hydrogen and these are the fuels that create the fire. Hydrogen and oxygen are reactive, and they form water (H2O) and CO2 (CO2). If we split a log into smaller pieces, we start to see the process of combustion as it occurs. The carbon is burned, and with the explosion of CO, H2O and H2CO2, we get water and carbon dioxide (mostly). If we smash a log together, we start to see what happens when we also hit another log with another log...imagine smashing two sand castles together. We see how the sand castles and brick castles form and collapse at different rates...then it starts to burn. The smoke, ash and flames are the actual combustion of fuel.

Q7

What goes on in those tall tower buildings owned by major banks?

The actual buildings are not there for profit but for show. There are many real world historical buildings that have little to no effect other than being a symbol of an economic entity or symbol of an era or location, and nothing more. For example look at Sears, Sears didn't care what went on inside, it was all about the \_appearance\_ of its location, the prestige of the location, the facilities and so on. It didn't care about how long it took it to operate, it was about how much people would pay to go see it. Sears was a landmark as a cultural movement and other big companies followed suit, so if you want to see a building you've never seen before, you have to go see Sears, just like you have to see a Toyota Camry for Toyota Camry. They used to be all about building new factories, some of them if I recall, but now that they're bigger, that means that more factory jobs are coming to them. You've probably seen them in stores as stores where people buy and sell stuff, so there aren't that many places for them to come from. Instead, it's just for show, a symbol of rich people.

Hurdles to progress in LFQA

However, while the RT system described here tops the public leaderboard, a detailed analysis of the model and the ELI5 dataset reveals some concerning trends.

Train/valid overlap: many held-out questions are paraphrased in the training set; the best answer to a similar training question achieves 27.4 ROUGE-L.

Lack of grounding: conditioning answer generation on random documents instead of relevant ones does not measurably impact factual correctness; longer outputs get higher ROUGE-L.

We found little to no evidence that models actually ground their text generation in the retrieved documents: a fine-tuned RT model paired with random retrievals from Wikipedia (i.e., random retrieval + RT) performs nearly as well as the c-REALM + RT model (24.2 vs. 24.4 ROUGE-L). We also found significant overlap between the ELI5 train, validation, and test sets (several questions are paraphrases of one another), which may make retrieval unnecessary altogether. The KILT benchmark measures retrieval and generation quality separately, without checking whether text generation actually uses the retrievals.
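Train/validation overlap of the kind described above can be surfaced even with a crude lexical check. The paper itself uses a trained QQP paraphrase classifier for this, so the token-Jaccard heuristic and the `threshold` below are stand-in assumptions for illustration only:

```python
def overlapping_pairs(train_qs, valid_qs, threshold=0.6):
    """Flag validation questions that are near-duplicates of training
    questions by token-set Jaccard similarity -- a crude stand-in for
    a paraphrase classifier."""
    hits = []
    for vq in valid_qs:
        vt = set(vq.lower().split())
        for tq in train_qs:
            tt = set(tq.lower().split())
            # Jaccard = |intersection| / |union| of the token sets
            if vt | tt and len(vt & tt) / len(vt | tt) >= threshold:
                hits.append((vq, tq))
                break
    return hits
```

Running a check like this before trusting held-out scores is cheap insurance: any flagged pair means the "held-out" question may be answerable by memorization alone.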

Trivial baselines achieve higher ROUGE-L scores than RAG and BART + DPR

Moreover, we identified issues with using ROUGE-L to evaluate text generation quality: trivial, nonsensical baselines such as a random training-set answer or simply copying the input achieve relatively high ROUGE-L scores, even beating BART + DPR and RAG.

Conclusion

我們?yōu)榛?Routing Transformers 和 REALM 的長格式問答推出了一個系統(tǒng),該系統(tǒng)在關(guān)于 ELI5 的 KILT 排行榜中名列前茅。但是,詳細的分析揭示了存在的一些問題,即無法使用基準來顯示有意義的建模進展。我們希望社區(qū)共同合作,一起解決這些問題,以便研究人員向正確的高峰攀登,在這個充滿挑戰(zhàn)但十分重要的任務中取得有意義的進展。

Acknowledgements

The Routing Transformer work is the result of a team effort by Aurko Roy, Mohammad Saffar, Ashish Vaswani, and David Grangier. The follow-up work on open-domain long-form question answering is a collaboration between Kalpesh Krishna, Aurko Roy, and Mohit Iyyer. We wish to thank Vidhisha Balachandran, Niki Parmar, and Ashish Vaswani for several helpful discussions, and the REALM team (Kenton Lee, Kelvin Guu, Ming-Wei Chang, and Zora Tung) for help with their codebase and several useful discussions that helped us improve our experiments.

We are grateful to Tu Vu for help with the QQP classifier used to detect paraphrases between the ELI5 train and test sets. We thank Jules Gagnon-Marchand and Sewon Min for suggesting useful experiments on checking ROUGE-L bounds. Finally, we thank Shufan Wang, Andrew Drozdov, Nader Akoury, and the rest of the UMass NLP group for helpful comments and suggestions at various stages of the project.

Editor: jq


Original title: Progress and Challenges in Open-Domain Long-Form Question Answering Systems

Source: WeChat official account Tensorflowers (ID: tensorflowers). Please credit the source when reposting.
