Growing as a data engineer 🌱


Python Basics 8

์จ๋ฐ 2023. 2. 26. 19:08

 

✍🏻 What I learned

 

I didn't really understand CSRFToken before, but this time I learned why it is used and how to handle it. I also crawled the GS25 convenience store locator, and the ways of getting at the data were fairly difficult, so I need to study this more...

 

 

 

 


 

 

 

Introduction

 

This post is based on crawling the GS25 convenience store search page. Please keep that in mind :)

 

 

 

 

 

CSRFToken

 

This time, we will fetch data about GS25 convenience stores.

 

 

 

 

In the case of GS25, if you look at the request URL you will see a parameter called CSRFToken.

 

CSRF (cross-site request forgery) is a type of cyber attack. To prevent it, the server issues a CSRF token to the user and checks the token sent back with each request to decide whether the request is legitimate.

 

So when you crawl a site and see CSRFToken in the URL, you have to fetch this value dynamically on each run instead of hard-coding it.

 

 

import requests
from bs4 import BeautifulSoup as BS

url = "http://gs25.gsretail.com/gscvs/ko/store-services/locations#;"

r = requests.get(url)

# Name the parser explicitly so bs4 does not have to guess one
bs = BS(r.text, "html.parser")

# A new key is issued on every visit.
csrf = bs.find("form", id="CSRFForm").find("input")['value']
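The lookup above raises an unhelpful AttributeError if the page ever stops containing the form. The following is a minimal sketch of a more defensive extraction, written with only the standard library so it can be run offline. The class name `CSRFFormParser`, the function `extract_csrf_token`, and the sample HTML are all made up for illustration; only the form id `CSRFForm` comes from the code above.

```python
from html.parser import HTMLParser


class CSRFFormParser(HTMLParser):
    """Collect the value of the first <input> inside <form id="CSRFForm">."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "CSRFForm":
            self.in_form = True
        elif tag == "input" and self.in_form and self.token is None:
            self.token = attrs.get("value")

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def extract_csrf_token(html):
    """Return the token, or raise ValueError if the form is missing."""
    parser = CSRFFormParser()
    parser.feed(html)
    if parser.token is None:
        raise ValueError("CSRFForm input not found; the page layout may have changed")
    return parser.token


# Hypothetical page fragment, made up for this example.
sample_html = """
<form id="CSRFForm" method="post">
  <input type="hidden" name="CSRFToken" value="abc-123" />
</form>
"""
print(extract_csrf_token(sample_html))
```

With a real response, the same function would be called on `r.text`. A clear error message makes it easier to notice when the site changes its markup.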

 

A CSRF token is only valid while its session is maintained, so the session has to be kept alive across requests.

 

 

payload = {"pageNum"  : "1",
"pageSize"  : "100",
"searchShopName"  : "",
"searchSido"  : "11",
"searchGugun"  : "",
"searchDong"  : "",
"searchType"  : "",
"searchTypeService"  : "0",
"searchTypeToto"  : "0",
"searchTypeCafe25"  : "0",
"searchTypeInstant"  : "0",
"searchTypeDrug"  : "0",
"searchTypeSelf25"  : "0",
"searchTypePost"  : "0",
"searchTypeATM"  : "0",
"searchTypeWithdrawal"  : "0",
"searchTypeTaxrefund"  : "0",
"searchTypeSmartAtm"  : "0",
"searchTypeSelfCookingUtensils"  : "0",
"searchTypeDeliveryService"  : "0",
}

post_url = "http://gs25.gsretail.com/gscvs/ko/store-services/locationList?CSRFToken={}"

with requests.Session() as s: # keep the session open across requests
    r = s.get(url)
    bs = BS(r.text, "html.parser")
    csrf = bs.find("form", id="CSRFForm").find("input")['value']
    payload['pageSize'] = 5000
    r2 = s.post(post_url.format(csrf), data=payload)

 

 

 

Looking up GS25 stores nationwide by region

 

Putting the above together and querying each region looks like this.

 

import json
import pandas as pd

# Store the 'city/province' (si/do) names and codes as key/value pairs
master = {x.text : x['value'] for x in bs.find("select", id="stb1").find_all("option")[1:]}

post_url = "http://gs25.gsretail.com/gscvs/ko/store-services/locationList?CSRFToken={}"
total = []

with requests.Session() as s:
    r = s.get(url)
    bs = BS(r.text, "html.parser")
    csrf = bs.find("form", id="CSRFForm").find("input")['value']
    
    for code in master.values():
        payload['pageSize'] = 5000
        payload['searchSido'] = code
        r2 = s.post(post_url.format(csrf), data=payload)
        # r2.json() appears to return a str here, so it is decoded a second time
        total.append(pd.DataFrame(json.loads(r2.json())['results']))
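The call `json.loads(r2.json())` only works if the response body is double-encoded, that is, a JSON string whose content is itself JSON. That is an inference from the code above, not something verified here. The sketch below reproduces the situation offline with a made-up body; `decode_results` is a hypothetical helper that accepts either form.

```python
import json


def decode_results(parsed):
    """Accept either a dict or a JSON string that still needs decoding."""
    if isinstance(parsed, str):
        parsed = json.loads(parsed)
    return parsed["results"]


# Made-up payload; only the "results" and "address" keys appear in the post.
inner = {"results": [{"address": "Seoul Gangnam-gu"}]}
body = json.dumps(json.dumps(inner))  # double-encoded response body

first = json.loads(body)   # what r2.json() would hand back: still a str
print(type(first).__name__)

rows = decode_results(first)  # the second decode yields the list of stores
print(rows[0]["address"])
```

Checking the type before the second decode keeps the code working if the server ever switches to plain JSON.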

 

 

Keep the session alive and store the search results.

 

# Combine the per-region DataFrames in total into gs
# (ignore_index=True gives every row a unique index label)
gs = pd.concat(total, ignore_index=True)

# Keep only the city/province name from the address ('시' = city)
gs['시'] = gs['address'].apply(lambda x : x.split()[0])

gs['시'].value_counts() # number of GS25 stores per city/province

 

This way, we can also get the number of GS25 stores per city/province.
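As a small offline illustration of the split-and-count step, here is a sketch on made-up addresses. The rows are invented and the real address format may differ. It uses the `.str` accessor instead of `apply`, which tolerates missing addresses.

```python
import pandas as pd

# Made-up addresses, only to illustrate the split-and-count step.
toy = pd.DataFrame({"address": [
    "서울특별시 강남구 역삼동",
    "서울특별시 마포구 서교동",
    "부산광역시 해운대구 우동",
    None,  # a store with no address on record
]})

# The first whitespace-separated word is the city/province name.
toy["시"] = toy["address"].str.split().str[0]

# value_counts() skips missing values by default.
counts = toy["시"].value_counts()
print(counts)
```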

 

 

 

 

Checking the services offered by GS25 stores nationwide

 

By accessing individual columns, we can pull out the data we want and discover new information.

 

# Fails with an error because of missing values (NaN) in the column:
# gs['offeringService'].apply(lambda x : 'drug' in x)

# So only touch rows where offeringService is present ('의료' = medical)
has_service = gs['offeringService'].notnull()
gs.loc[has_service, '의료'] = gs.loc[has_service, 'offeringService'].apply(lambda x : 'Y' if 'drug' in x else 'N')

gs['의료'].value_counts(normalize=True) # share of GS25 stores nationwide that sell medicine

# Find the stores that offer all 11 services (kept separate so '의료' is not overwritten)
all_services = gs[gs['offeringService'].str.len() == 11]

# See what those 11 services are
all_services['offeringService'].values
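The pattern of skipping rows with missing values can be tried offline on a toy frame. This is a sketch under the assumption that `offeringService` holds a list of service codes per store. Apart from `'drug'`, the codes below are made up.

```python
import pandas as pd

# Made-up rows; None stands for a store with no service information.
toy = pd.DataFrame({"offeringService": [
    ["drug", "atm"],
    ["cafe25"],
    None,
    ["drug"],
]})

# Only apply the membership test where a value is present.
has_service = toy["offeringService"].notnull()
sells_drug = toy.loc[has_service, "offeringService"].apply(lambda s: "drug" in s)
print(sells_drug.mean())  # share among stores that have service info

# .str.len() also works on lists and yields NaN for missing rows,
# so counting services needs no explicit mask.
n_services = toy["offeringService"].str.len()
two_services = toy[n_services == 2]
print(two_services["offeringService"].tolist())
```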
