Posts in the 'Python/Crawling' category

Python, crawling, bs4, openpyxl, datetime, string, fetching rising stocks 2019.10.06
Python, crawling, bs4, pandas, fetching stock data as Excel 2019.10.02
Python, crawling, selenium, logging in 2019.10.02
Python, crawling, bs4, fetching popular search keywords 2019.10.02

Python, crawling, bs4, openpyxl, datetime, string, fetching rising stocks

까리남 2019. 10. 6. 20:51

Code:

import bs4
import urllib.request
import openpyxl
from datetime import date
import string  # imported in the original post; not used below

today = date.today()
a = []  # company names
b = []  # text of every "number" cell on the page
c = []  # prices only, with the thousands separators removed

def get_company():
    a.clear()  # start fresh so a second call does not pile up old names
    url = "https://finance.naver.com/sise/sise_rise.nhn?sosok=1"
    html = urllib.request.urlopen(url)
    bsobj = bs4.BeautifulSoup(html, "html.parser")
    tltle = bsobj.find_all("a", {"class": "tltle"})  # class name spelled as in the original code
    for i in tltle:
        a.append(i.text)
    return

def get_price():
    b.clear()  # start fresh, otherwise the second call keeps the first call's cells
    c.clear()
    url = "https://finance.naver.com/sise/sise_rise.nhn?sosok=1"
    html = urllib.request.urlopen(url)
    bsobj = bs4.BeautifulSoup(html, "html.parser")
    price = bsobj.find_all("td", {"class": "number"})
    for i in price:
        b.append(i.text)
    # The original takes every 10th "number" cell as the price column
    for index, value in enumerate(b):
        if index % 10 == 0:
            c.append(value.replace(",", ""))
    return


def upload_every_morning():
    get_company()
    get_price()
    wb = openpyxl.Workbook()
    sheet = wb.active
    sheet.append(["company", "price"])
    # 320 rows are assumed here; a page with fewer rows raises IndexError
    for i in range(0, 320):
        sheet.append([a[i], c[i]])
    wb.save("E:\\3_2\\python\\Tkinter\\stock_data\\" + str(today) + "_morning.xlsx")
    return

def upload_every_final():
    get_company()
    get_price()
    wb = openpyxl.Workbook()
    sheet = wb.active
    sheet.append(["company", "price"])
    # 320 rows are assumed here; a page with fewer rows raises IndexError
    for i in range(0, 320):
        sheet.append([a[i], c[i]])
    wb.save("E:\\3_2\\python\\Tkinter\\stock_data\\" + str(today) + "_final.xlsx")
    return

upload_every_morning()
upload_every_final()
 
Result:

Explanation:

today = date.today()

date.today() returns today's date, which is stored in the variable today.

get_company()

get_price()

These crawl the company names and the stock prices. For a more detailed explanation, see this post:

Python, crawling, bs4, pandas, fetching stock data as Excel

coding-0830.tistory.com
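The price-selection step inside get_price() can be tried in isolation, without any network access. This is only a minimal sketch: the helper name pick_prices and the sample cells are made up for illustration, and the stride of 10 simply mirrors the index%10 check in the code above.

```python
def pick_prices(cells, stride=10):
    # Keep the first cell of every group of `stride` cells and drop the commas.
    return [value.replace(",", "") for value in cells[::stride]]


# Two made-up table rows, ten "number" cells each; only the first cell is a price.
sample_cells = ["1,200"] + ["-"] * 9 + ["35,000"] + ["-"] * 9
print(pick_prices(sample_cells))
```

If the page layout changes and a row no longer has ten such cells, the stride has to change with it.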

wb = openpyxl.Workbook()

openpyxl.Workbook() creates a new, empty workbook in memory. This is where the Excel work starts.

sheet = wb.active

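wb.active returns the workbook's active worksheet. In a newly created workbook that is its first sheet, and the rows below are written to it.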

sheet.append(["company", "price"])

append() adds a row directly below the last row that already has content. The sheet is still empty at this point, so the header goes into the top row.

for i in range(0, 320):

    sheet.append([a[i], c[i]])

Writes the company names stored in list a and the prices stored in list c, one row per company.
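The loop above relies on both lists having at least 320 entries. One possible alternative, shown here only as a sketch with made-up sample data and a hypothetical helper name build_rows, is to pair the two lists with zip(), which stops at the shorter one.

```python
def build_rows(names, prices):
    rows = [["company", "price"]]
    # zip() stops at the shorter list, so no index can run past the end.
    for name, price in zip(names, prices):
        rows.append([name, price])
    return rows


sample_names = ["Alpha", "Beta", "Gamma"]
sample_prices = ["1200", "35000"]
for row in build_rows(sample_names, sample_prices):
    print(row)
```

Each of these rows could then be handed to sheet.append() exactly as in the post.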

wb.save("E:\\3_2\\python\\Tkinter\\stock_data\\" + str(today) + "_final.xlsx")

save() writes the workbook to a file at the given path.
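Saving fails if the target folder does not exist yet, so it can help to build the path with pathlib and create the folder first. The sketch below is an assumption-laden illustration: build_save_path is a made-up helper, and a temporary folder stands in for the real stock_data folder.

```python
import tempfile
from datetime import date
from pathlib import Path


def build_save_path(base_dir, day, suffix):
    folder = Path(base_dir)
    # Create the folder (and any missing parents) if it does not exist yet.
    folder.mkdir(parents=True, exist_ok=True)
    return folder / (str(day) + "_" + suffix + ".xlsx")


# A temporary folder is used here so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = build_save_path(Path(tmp) / "stock_data", date(2019, 10, 6), "final")
    print(path.name)
```

The returned path can be passed to wb.save() after converting it with str().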

Python, crawling, bs4, pandas, fetching stock data as Excel

까리남 2019. 10. 2. 20:55

I implemented downloading the Excel files published by the Korea Exchange (KRX).

Code:

import calendar
import requests
import pandas as pd
from io import BytesIO

def get_Excel(tdate):
    # Step 1: request a one-time code for the file download
    gen_req_url = "http://marketdata.krx.co.kr/contents/COM/GenerateOTP.jspx"
    query_str_params = {
        'name': 'fileDown',
        'filetype': 'xls',
        'url': 'MKD/13/1302/13020402/mkd13020402',
        'market_gubun': 'ALL',
        'lmt_tp': '1',
        'sect_tp_cd': 'ALL',
        'schdate': tdate,
        'pagePath': '/contents/MKD/13/1302/13020402/MKD13020402.jsp'
    }
    r = requests.get(gen_req_url, params=query_str_params)
    # Step 2: post that code to the download endpoint to receive the Excel file
    gen_req_url = 'http://file.krx.co.kr/download.jspx'
    headers = {
        'Referer': 'http://marketdata.krx.co.kr/mdi'
    }
    form_data = {
        'code': r.content
    }
    r = requests.post(gen_req_url, data=form_data, headers=headers)
    df = pd.read_excel(BytesIO(r.content))
    file_dir = "E:/3_2/python/Crawlling/data/"
    # Saved as .xlsx because pandas 2.0 and later can no longer write the legacy .xls format
    file_name = str(tdate) + '.xlsx'
    df.to_excel(file_dir + file_name)
    print(tdate, "crawled")
    return

# The end of range() is exclusive, so 2020 is needed for the loop to reach 2019
for year in range(2018, 2020):
    for month in range(1, 13):
        # monthrange() gives the number of days in the month, so dates like 20180231 are never requested
        for day in range(1, calendar.monthrange(year, month)[1] + 1):
            tdate = year * 10000 + month * 100 + day
            if tdate <= 20191002:
                get_Excel(tdate)
Result:

The files being downloaded.

What the resulting Excel file looks like.
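The date loop in the code above builds YYYYMMDD numbers by arithmetic. An alternative sketch, using only the standard library's datetime module, walks real calendar days instead, so impossible dates never appear. The function name iter_dates and its weekdays_only flag are made up for this example. Public holidays are not handled.

```python
from datetime import date, timedelta


def iter_dates(start, end, weekdays_only=False):
    day = start
    while day <= end:
        # weekday() is 0 for Monday through 6 for Sunday.
        if not weekdays_only or day.weekday() < 5:
            yield int(day.strftime("%Y%m%d"))
        day += timedelta(days=1)


print(list(iter_dates(date(2018, 2, 27), date(2018, 3, 2))))
```

Each yielded number has the same form as the tdate value passed to get_Excel().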

Python, crawling, selenium, logging in

까리남 2019. 10. 2. 20:54

Before the fix

Code:


from selenium import webdriver
driver = webdriver.Chrome(r"C:\Users\JW\Desktop\chromedriver_win32\chromedriver.exe")
driver.get("https://www.hansung.ac.kr/web/www/login")
driver.find_element_by_name('_58_login').send_keys('1433047')
driver.find_element_by_name('_58_password').send_keys('')
driver.find_element_by_class_name('btn_login').click()
 
Result: it did not work.
================= RESTART: C:\Users\JW\Desktop\python\1-1.py ================= 
Traceback (most recent call last): 
  File "C:\Users\JW\Desktop\python\1-1.py", line 6, in <module>
    driver.find_element_by_class_name('btn_login').click() 
  File "C:\Users\JW\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 564, in find_element_by_class_name 
    return self.find_element(by=By.CLASS_NAME, value=name) 
  File "C:\Users\JW\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element 
    'value': value})['value'] 
  File "C:\Users\JW\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute 
    self.error_handler.check_response(response) 
  File "C:\Users\JW\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response 
    raise exception_class(message, screen, stacktrace) 
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".btn_login"} 
  (Session info: chrome=77.0.3865.90)

After the fix

Code:

from selenium import webdriver
driver = webdriver.Chrome(r"C:\Users\JW\Desktop\chromedriver_win32\chromedriver.exe")
driver.get("https://www.hansung.ac.kr/web/www/login")
driver.find_element_by_name('_58_login').send_keys('1433047')
driver.find_element_by_name('_58_password').send_keys('')
driver.find_element_by_xpath("""//*[@id="loginUnited"]/form/input[7]""").click()
 
 
Result: login succeeded

Explanation:
from selenium import webdriver
Uses the selenium module.
 
driver = webdriver.Chrome(r"C:\Users\JW\Desktop\chromedriver_win32\chromedriver.exe")
Starts Chrome through its web driver and stores the driver object in the variable driver.
 
driver.get("https://www.hansung.ac.kr/web/www/login")
get() loads the page at the URL passed as its argument.
 
I opened the page and used Inspect to find the element for the ID input box. It has a name attribute, so I look it up by name.
driver.find_element_by_name('_58_login').send_keys('1433047')
find_element_by_name() finds the element whose name attribute matches the argument, and send_keys() types its argument into that element.
 
driver.find_element_by_name('_58_password').send_keys('')
The password field is handled the same way.
 
driver.find_element_by_xpath("""//*[@id="loginUnited"]/form/input[7]""").click()
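This is the part that failed before the fix. find_element_by_xpath() finds the button and click() clicks it.
To get the XPath, right-click the element in Chrome's Inspect panel and choose Copy > Copy XPath.
Copying it this way gives something like:
//*[@id="loginUnited"]/form/input[7]
The XPath itself contains double quotes, so it cannot sit inside an ordinary double-quoted string as it is. Wrapping it in triple quotes (""" """), as in the code above, or in single quotes works.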
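The quoting point can be checked in plain Python without a browser. This small sketch writes the same XPath three ways; the variable names are made up for illustration.

```python
# The XPath contains double quotes, so the surrounding quotes must differ
# from them, or the inner quotes must be escaped.
triple = """//*[@id="loginUnited"]/form/input[7]"""
single = '//*[@id="loginUnited"]/form/input[7]'
escaped = "//*[@id=\"loginUnited\"]/form/input[7]"

# All three spellings produce exactly the same string.
print(triple == single == escaped)
```

Triple quotes are therefore not required; they are just one convenient way to avoid escaping.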
 
 
 

Python, crawling, bs4, fetching popular search keywords

까리남 2019. 10. 2. 20:53

Code:

import bs4
import urllib.request
url = "http://naver.com"
html = urllib.request.urlopen(url)
bsobj = bs4.BeautifulSoup(html, "html.parser")
realtime_hotkeyword = bsobj.find_all("span", {"class":"ah_k"})
for keyword in realtime_hotkeyword:
    print(keyword.text)
 
Result:

Explanation:

import bs4

import urllib.request

Uses the bs4 and urllib.request modules.

url = "http://naver.com"

html = urllib.request.urlopen(url)

urllib.request.urlopen() opens the URL, and the response is stored in the variable html.

bsobj = bs4.BeautifulSoup(html, "html.parser")

bs4.BeautifulSoup(html, "html.parser") parses the response, and the result is stored in the variable bsobj. At this point bsobj holds the whole HTML document of the page, which can be checked with print(bsobj).

realtime_hotkeyword = bsobj.find_all("span", {"class":"ah_k"})

Inspecting the popular-keyword area of the page shows that each keyword sits in a span tag with a fixed class name. bsobj.find_all() collects those tags and stores them in the variable realtime_hotkeyword.

for keyword in realtime_hotkeyword:

    print(keyword.text)

The for loop prints every keyword.
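To try the "find every span with a given class" idea without network access or third-party packages, the sketch below swaps bs4 for the standard library's html.parser.HTMLParser. It is not the post's method, only an offline illustration: the class ClassTextCollector, the helper find_all_text, the sample HTML, and the ah_r class in it are all made up for this example, and nested tags of the same name are not handled.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every <tag> element that carries a given class."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.inside = False  # True while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.class_name in classes:
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.texts[-1] += data


def find_all_text(html, tag, class_name):
    collector = ClassTextCollector(tag, class_name)
    collector.feed(html)
    return collector.texts


# Made-up markup that imitates a keyword list.
SAMPLE_HTML = """
<ul>
  <li><span class="ah_r">1</span><span class="ah_k">first keyword</span></li>
  <li><span class="ah_r">2</span><span class="ah_k">second keyword</span></li>
</ul>
"""

for keyword in find_all_text(SAMPLE_HTML, "span", "ah_k"):
    print(keyword)
```

With bs4 installed, bsobj.find_all("span", {"class": "ah_k"}) from the post does the same selection in one call.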

컴공과 최지웅
