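Python NLP: Loading Sample Documents from the NLTK Reuters News Corpus
- The corpus is a sample collection of Reuters news articles.
- Each file (row) is one news document.
- Each document can be assigned one or more categories.
- The corpus contains 10,788 news documents in total.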
In [1]:
# Requires the corpus data; run nltk.download("reuters") once if it is missing
from nltk.corpus import reuters
In [2]:
# List the file IDs of the news documents
file_name = reuters.fileids()
len(file_name)
Out[2]:
10788
In [3]:
# Use the categories and raw methods to check each document's categories and text
for file in file_name[:10]:
    print(reuters.categories(file))
    print(reuters.raw(file))
    print("-"*50)
['trade'] ASIAN EXPORTERS FEAR DAMAGE FROM U.S.-JAPAN RIFT Mounting trade friction between the U.S. And Japan has raised fears among many of Asia's exporting nations that the row could inflict far-reaching economic damage, businessmen and officials said. They told Reuter correspondents in Asian capitals a U.S. Move against Japan might boost protectionist sentiment in the U.S. And lead to curbs on American imports of their products. But some exporters said that while the conflict would hurt them in the long-run, in the short-term Tokyo's loss might be their gain. The U.S. Has said it will impose 300 mln dlrs of tariffs on imports of Japanese electronics goods on April 17, in retaliation for Japan's alleged failure to stick to a pact not to sell semiconductors on world markets at below cost. Unofficial Japanese estimates put the impact of the tariffs at 10 billion dlrs and spokesmen for major electronics firms said they would virtually halt exports of products hit by the new taxes. "We wouldn't be able to do business," said a spokesman for leading Japanese electronics firm Matsushita Electric Industrial Co Ltd <MC.T>. "If the tariffs remain in place for any length of time beyond a few months it will mean the complete erosion of exports (of goods subject to tariffs) to the U.S.," said Tom Murtha, a stock analyst at the Tokyo office of broker <James Capel and Co>. In Taiwan, businessmen and officials are also worried. "We are aware of the seriousness of the U.S. Threat against Japan because it serves as a warning to us," said a senior Taiwanese trade official who asked not to be named. Taiwan had a trade trade surplus of 15.6 billion dlrs last year, 95 pct of it with the U.S. The surplus helped swell Taiwan's foreign exchange reserves to 53 billion dlrs, among the world's largest. "We must quickly open our markets, remove trade barriers and cut import tariffs to allow imports of U.S. Products, if we want to defuse problems from possible U.S. 
Retaliation," said Paul Sheen, chairman of textile exporters <Taiwan Safe Group>. A senior official of South Korea's trade promotion association said the trade dispute between the U.S. And Japan might also lead to pressure on South Korea, whose chief exports are similar to those of Japan. Last year South Korea had a trade surplus of 7.1 billion dlrs with the U.S., Up from 4.9 billion dlrs in 1985. In Malaysia, trade officers and businessmen said tough curbs against Japan might allow hard-hit producers of semiconductors in third countries to expand their sales to the U.S. In Hong Kong, where newspapers have alleged Japan has been selling below-cost semiconductors, some electronics manufacturers share that view. But other businessmen said such a short-term commercial advantage would be outweighed by further U.S. Pressure to block imports. "That is a very short-term view," said Lawrence Mills, director-general of the Federation of Hong Kong Industry. "If the whole purpose is to prevent imports, one day it will be extended to other sources. Much more serious for Hong Kong is the disadvantage of action restraining trade," he said. The U.S. Last year was Hong Kong's biggest export market, accounting for over 30 pct of domestically produced exports. The Australian government is awaiting the outcome of trade talks between the U.S. And Japan with interest and concern, Industry Minister John Button said in Canberra last Friday. "This kind of deterioration in trade relations between two countries which are major trading partners of ours is a very serious matter," Button said. He said Australia's concerns centred on coal and beef, Australia's two largest exports to Japan and also significant U.S. Exports to that country. Meanwhile U.S.-Japanese diplomatic manoeuvres to solve the trade stand-off continue. Japan's ruling Liberal Democratic Party yesterday outlined a package of economic measures to boost the Japanese economy. 
The measures proposed include a large supplementary budget and record public works spending in the first half of the financial year. They also call for stepped-up spending as an emergency measure to stimulate the economy despite Prime Minister Yasuhiro Nakasone's avowed fiscal reform program. Deputy U.S. Trade Representative Michael Smith and Makoto Kuroda, Japan's deputy minister of International Trade and Industry (MITI), are due to meet in Washington this week in an effort to end the dispute. -------------------------------------------------- ['grain'] CHINA DAILY SAYS VERMIN EAT 7-12 PCT GRAIN STOCKS A survey of 19 provinces and seven cities showed vermin consume between seven and 12 pct of China's grain stocks, the China Daily said. It also said that each year 1.575 mln tonnes, or 25 pct, of China's fruit output are left to rot, and 2.1 mln tonnes, or up to 30 pct, of its vegetables. The paper blamed the waste on inadequate storage and bad preservation methods. It said the government had launched a national programme to reduce waste, calling for improved technology in storage and preservation, and greater production of additives. The paper gave no further details. -------------------------------------------------- ['crude', 'nat-gas'] JAPAN TO REVISE LONG-TERM ENERGY DEMAND DOWNWARDS The Ministry of International Trade and Industry (MITI) will revise its long-term energy supply/demand outlook by August to meet a forecast downtrend in Japanese energy demand, ministry officials said. MITI is expected to lower the projection for primary energy supplies in the year 2000 to 550 mln kilolitres (kl) from 600 mln, they said. The decision follows the emergence of structural changes in Japanese industry following the rise in the value of the yen and a decline in domestic electric power demand. MITI is planning to work out a revised energy supply/demand outlook through deliberations of committee meetings of the Agency of Natural Resources and Energy, the officials said. 
They said MITI will also review the breakdown of energy supply sources, including oil, nuclear, coal and natural gas. Nuclear energy provided the bulk of Japan's electric power in the fiscal year ended March 31, supplying an estimated 27 pct on a kilowatt/hour basis, followed by oil (23 pct) and liquefied natural gas (21 pct), they noted. -------------------------------------------------- ['corn', 'grain', 'rice', 'rubber', 'sugar', 'tin', 'trade'] THAI TRADE DEFICIT WIDENS IN FIRST QUARTER Thailand's trade deficit widened to 4.5 billion baht in the first quarter of 1987 from 2.1 billion a year ago, the Business Economics Department said. It said Janunary/March imports rose to 65.1 billion baht from 58.7 billion. Thailand's improved business climate this year resulted in a 27 pct increase in imports of raw materials and semi-finished products. The country's oil import bill, however, fell 23 pct in the first quarter due to lower oil prices. The department said first quarter exports expanded to 60.6 billion baht from 56.6 billion. Export growth was smaller than expected due to lower earnings from many key commodities including rice whose earnings declined 18 pct, maize 66 pct, sugar 45 pct, tin 26 pct and canned pineapples seven pct. Products registering high export growth were jewellery up 64 pct, clothing 57 pct and rubber 35 pct. -------------------------------------------------- ['palm-oil', 'veg-oil'] INDONESIA SEES CPO PRICE RISING SHARPLY Indonesia expects crude palm oil (CPO) prices to rise sharply to between 450 and 550 dlrs a tonne FOB sometime this year because of better European demand and a fall in Malaysian output, Hasrul Harahap, junior minister for tree crops, told Indonesian reporters. Prices of Malaysian and Sumatran CPO are now around 332 dlrs a tonne CIF for delivery in Rotterdam, traders said. 
Harahap said Indonesia would maintain its exports, despite making recent palm oil purchases from Malaysia, so that it could possibly increase its international market share. Indonesia, the world's second largest producer of palm oil after Malaysia, has been forced to import palm oil to ensure supplies during the Moslem fasting month of Ramadan. Harahap said it was better to import to cover a temporary shortage than to lose export markets. Indonesian exports of CPO in calendar 1986 were 530,500 tonnes, against 468,500 in 1985, according to central bank figures. -------------------------------------------------- ['ship'] AUSTRALIAN FOREIGN SHIP BAN ENDS BUT NSW PORTS HIT Tug crews in New South Wales (NSW), Victoria and Western Australia yesterday lifted their ban on foreign-flag ships carrying containers but NSW ports are still being disrupted by a separate dispute, shipping sources said. The ban, imposed a week ago over a pay claim, had prevented the movement in or out of port of nearly 20 vessels, they said. The pay dispute went before a hearing of the Arbitration Commission today. Meanwhile, disruption began today to cargo handling in the ports of Sydney, Newcastle and Port Kembla, they said. The industrial action at the NSW ports is part of the week of action called by the NSW Trades and Labour Council to protest changes to the state's workers' compensation laws. The shipping sources said the various port unions appear to be taking it in turn to work for a short time at the start of each shift and then to walk off. Cargo handling in the ports has been disrupted, with container movements most affected, but has not stopped altogether, they said. They said they could not say how long the disruption will go on and what effect it will have on shipping movements. 
-------------------------------------------------- ['coffee', 'lumber', 'palm-oil', 'rubber', 'veg-oil'] INDONESIAN COMMODITY EXCHANGE MAY EXPAND The Indonesian Commodity Exchange is likely to start trading in at least one new commodity, and possibly two, during calendar 1987, exchange chairman Paian Nainggolan said. He told Reuters in a telephone interview that trading in palm oil, sawn timber, pepper or tobacco was being considered. Trading in either crude palm oil (CPO) or refined palm oil may also be introduced. But he said the question was still being considered by Trade Minister Rachmat Saleh and no decision on when to go ahead had been made. The fledgling exchange currently trades coffee and rubber physicals on an open outcry system four days a week. "Several factors make us move cautiously," Nainggolan said. "We want to move slowly and safely so that we do not make a mistake and undermine confidence in the exchange." Physical rubber trading was launched in 1985, with coffee added in January 1986. Rubber contracts are traded FOB, up to five months forward. Robusta coffee grades four and five are traded for prompt delivery and up to five months forward, exchange officials said. The trade ministry and exchange board are considering the introduction of futures trading later for rubber, but one official said a feasibility study was needed first. No decisions are likely until after Indonesia's elections on April 23, traders said. Trade Minister Saleh said on Monday that Indonesia, as the world's second largest producer of natural rubber, should expand its rubber marketing effort and he hoped development of the exchange would help this. Nainggolan said that the exchange was trying to boost overseas interest by building up contacts with end-users. He said teams had already been to South Korea and Taiwan to encourage direct use of the exchange, while a delegation would also visit Europe, Mexico and some Latin American states to encourage participation. 
Officials say the infant exchange has made a good start although trading in coffee has been disappointing. Transactions in rubber between the start of trading in April 1985 and December 1986 totalled 9,595 tonnes, worth 6.9 mln dlrs FOB, plus 184.3 mln rupiah for rubber delivered locally, the latest exchange report said. Trading in coffee in calendar 1986 amounted to only 1,905 tonnes in 381 lots, valued at 6.87 billion rupiah. Total membership of the exchange is now nine brokers and 44 traders. -------------------------------------------------- ['grain', 'wheat'] SRI LANKA GETS USDA APPROVAL FOR WHEAT PRICE Food Department officials said the U.S. Department of Agriculture approved the Continental Grain Co sale of 52,500 tonnes of soft wheat at 89 U.S. Dlrs a tonne C and F from Pacific Northwest to Colombo. They said the shipment was for April 8 to 20 delivery. -------------------------------------------------- ['gold'] WESTERN MINING TO OPEN NEW GOLD MINE IN AUSTRALIA Western Mining Corp Holdings Ltd <WMNG.S> (WMC) said it will establish a new joint venture gold mine in the Northern Territory at a cost of about 21 mln dlrs. The mine, to be known as the Goodall project, will be owned 60 pct by WMC and 40 pct by a local W.R. Grace and Co <GRA> unit. It is located 30 kms east of the Adelaide River at Mt. Bundey, WMC said in a statement It said the open-pit mine, with a conventional leach treatment plant, is expected to produce about 50,000 ounces of gold in its first year of production from mid-1988. Annual ore capacity will be about 750,000 tonnes. -------------------------------------------------- ['acq'] SUMITOMO BANK AIMS AT QUICK RECOVERY FROM MERGER Sumitomo Bank Ltd <SUMI.T> is certain to lose its status as Japan's most profitable bank as a result of its merger with the Heiwa Sogo Bank, financial analysts said. 
Osaka-based Sumitomo, with desposits of around 23.9 trillion yen, merged with Heiwa Sogo, a small, struggling bank with an estimated 1.29 billion dlrs in unrecoverable loans, in October. But despite the link-up, Sumitomo President Koh Komatsu told Reuters he is confident his bank can quickly regain its position. "We'll be back in position in first place within three years," Komatsu said in an interview. He said that while the merger will initially reduce Sumitomo's profitability and efficiency, it will vastly expand Sumitomo's branch network in the Tokyo metropolitan area where it has been relatively weak. But financial analysts are divided on whether and how quickly the gamble will pay off. Some said Sumitomo may have paid too much for Heiwa Sogo in view of the smaller bank's large debts. Others argue the merger was more cost effective than creating a comparable branch network from scratch. The analysts agreed the bank was aggressive. It has expanded overseas, entered the lucrative securities business and geared up for domestic competition, but they questioned the wisdom of some of those moves. "They've made bold moves to put everything in place. Now it's largely out of their hands," said Kleinwort Benson Ltd financial analyst Simon Smithson. Among Sumitomo's problems are limits placed on its move to enter U.S. Securities business by taking a share in American investment bank Goldman, Sachs and Co. Sumitomo last August agreed to pay 500 mln dlrs for a 12.5 pct limited partnership in the bank, but for the time being at least, the Federal Reserve Board has forbidden them to exchange personnel, or increase the business they do with each other. "The tie-up is widely looked on as a lame duck because the Fed was stricter than Sumitomo expected," said one analyst. But Komatsu said the move will pay off in time. "U.S. Regulations will change in the near future and if so, we can do various things. 
We only have to wait two or three years, not until the 21st century," Komatsu said. Komatsu is also willing to be patient about possible routes into the securities business at home. Article 65 of the Securities and Exchange Act, Japan's version of the U.S. Glass-Steagall Act, separates commercial from investment banking. But the walls between the two are crumbling and Komatsu said he hopes further deregulation will create new opportunities. "We need to find new business chances," Komatsu said. "In some cases these will be securities related, in some cases trust bank related. That's the kind of deregulation we want." Until such changes occur, Sumitomo will focus on such domestic securities business as profitable government bond dealing and strengthening relations with Meiko Securities Co Ltd, in which it holds a five pct share, Komatsu said. He said Sumitomo is cautiously optimistic about entering the securities business here through its Swiss universal bank subsidiary, Banca del Gottardo. The Finance Ministry is expected to grant licences to securities subsidiaries of U.S. Commercial banks soon, following a similar decision for subsidiaries of European universal banks in which the parent holds a less than 50 pct. But Komatsu is reluctant to push hard for a similar decision on a Gottardo subsidiary. "We don't want to make waves. We expect this will be allowed in two or three years," he said. Like other city banks, Sumitomo is also pushing to expand lending to individuals and small and medium businesses to replace disappearing demand from big business, he added. The analysts said Sumitomo will have to devote a lot of time to digesting its most recent initiatives, including the merger with ailing Heiwa Sogo. "It's (Sumitomo) been bold in its strategies," said Kleinwort's Smithson. "After that, it's a question of absorbing and juggling around. It will be the next decade before we see if the strategy is right or wrong." --------------------------------------------------
Word Count Analysis Example
- Collect only the news documents whose categories include trade
In [4]:
trade_list = []
for file in file_name:
    if "trade" in reuters.categories(file):
        trade_list.append(reuters.raw(file))
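The loop above calls `reuters.categories` once per file. NLTK's categorized corpus readers also accept a `categories` argument on `fileids`, so the same selection can be sketched more directly. The helper name `collect_raw` and the tiny stand-in reader `TinyCorpus` are hypothetical, used only so the snippet runs without the corpus installed; with the real data, `reuters` would be passed instead.

```python
def collect_raw(corpus, category):
    """Return the raw text of every document tagged with `category`.

    `corpus` is any object with NLTK-style fileids(categories=...) and
    raw(fileid) methods, e.g. nltk.corpus.reuters.
    """
    return [corpus.raw(fid) for fid in corpus.fileids(categories=category)]


class TinyCorpus:
    """Hypothetical stand-in for a categorized corpus reader (illustration only)."""

    def __init__(self, docs):
        # docs maps fileid -> (list of categories, raw text)
        self._docs = docs

    def fileids(self, categories=None):
        if categories is None:
            return list(self._docs)
        wanted = {categories} if isinstance(categories, str) else set(categories)
        return [fid for fid, (cats, _) in self._docs.items() if wanted & set(cats)]

    def raw(self, fileid):
        return self._docs[fileid][1]


# Made-up sample documents, not actual corpus entries
demo = TinyCorpus({
    "test/1": (["trade"], "ASIAN EXPORTERS FEAR DAMAGE"),
    "test/2": (["grain"], "CHINA DAILY SAYS VERMIN EAT GRAIN"),
    "test/3": (["rice", "trade"], "THAI TRADE DEFICIT WIDENS"),
})
trade_docs = collect_raw(demo, "trade")
print(len(trade_docs))
```

With the corpus installed, `collect_raw(reuters, "trade")` should yield the same documents as `trade_list`, since both select every file whose categories include trade.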
- Tokenize > tag parts of speech > extract nouns
In [5]:
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
# Tokenize
token_list = word_tokenize(",".join(trade_list))
# Tag parts of speech
pos_tag_list = pos_tag(token_list)
# Extract nouns ("NN" matches singular or mass common nouns only, not NNS/NNP/NNPS)
nouns_list = [word[0] for word in pos_tag_list if word[1] == "NN"]
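The filter above keeps only tokens tagged exactly `NN`. In the Penn Treebank tag set that `pos_tag` uses by default for English, plural and proper nouns carry the tags `NNS`, `NNP` and `NNPS`, so a prefix test keeps those as well. The sketch below uses a small hand-written list of (token, tag) pairs in place of real `pos_tag` output; the sample pairs and the name `extract_nouns` are illustrative assumptions, not results from the corpus.

```python
def extract_nouns(tagged_tokens, include_all_noun_tags=True):
    """Pick noun tokens out of (token, tag) pairs such as pos_tag returns."""
    if include_all_noun_tags:
        # NN, NNS, NNP and NNPS all start with "NN"
        return [token for token, tag in tagged_tokens if tag.startswith("NN")]
    # Same behaviour as the exact-match filter: singular/mass common nouns only
    return [token for token, tag in tagged_tokens if tag == "NN"]


# Hand-written sample pairs (assumed tags, for illustration only)
sample_tagged = [
    ("Japan", "NNP"), ("raised", "VBD"), ("tariffs", "NNS"),
    ("on", "IN"), ("trade", "NN"), (".", "."),
]
all_nouns = extract_nouns(sample_tagged)
nn_only = extract_nouns(sample_tagged, include_all_noun_tags=False)
print(all_nouns)
print(nn_only)
```

Which variant fits depends on the goal: the prefix test gives a broader noun count, while the exact match avoids proper nouns such as country and company names dominating the result.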
- Word count
In [6]:
from nltk import Text
import matplotlib.pyplot as plt
text = Text(nouns_list)
# Plot the frequencies of the 20 most common nouns
text.plot(20)
plt.show()
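The plot shows the counts graphically. When the numbers themselves are needed, the same tally can be sketched with `collections.Counter` from the standard library. The short noun list here is made-up sample data standing in for `nouns_list`.

```python
from collections import Counter

# Made-up sample standing in for nouns_list
sample_nouns = ["trade", "deficit", "Trade", "year", "surplus", "trade", "year"]

# Lowercase first so "Trade" and "trade" are counted together
counts = Counter(noun.lower() for noun in sample_nouns)

# The two most frequent nouns with their counts
top = counts.most_common(2)
for word, n in top:
    print(f"{word}\t{n}")
```

NLTK's `FreqDist` is a `Counter` subclass, so with the real data `text.vocab().most_common(20)` should return the counts behind the plot.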