Growing as a Data Engineer 🌱


[Convenience Store Crawling] 7-ELEVEN

์จ๋ฐ 2023. 3. 24. 12:15

🚩 PLAN

 

 

1. Use Python to fetch data from the 7-Eleven store information web page.

 

2. Build a DataFrame from the fetched data and save it locally as a pickle file.

 

 

 

 

[ Target Site ]

 

http://www.7-eleven.co.kr/

 

 

 

 


 

 

First, I looked at how the target site is structured.

 

 

 

In 7-Eleven's case, clicking the store finder button on the main page opens a popup that provides the store information.

 

As with CU, the data has to be retrieved by searching region by region (province/city > district).
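The region-by-region search can be pictured as a nested walk over provinces and their districts. The sketch below is only an illustration, not the site's API: `iter_queries` and the sample `regions` mapping are hypothetical, and the rule that a province with no districts is queried once with an empty district mirrors how the code later in this post handles Sejong.

```python
def iter_queries(regions):
    """Yield (sido, gugun) pairs, one per search request.

    `regions` maps a province/city name to its list of districts.
    A province with no districts yields a single pair with an empty
    district string.
    """
    for sido, guguns in regions.items():
        if not guguns:
            yield (sido, "")
        else:
            for gugun in guguns:
                yield (sido, gugun)


# Hypothetical sample data, not the full region list from the site
regions = {
    "서울": ["중구", "강남구"],
    "세종": [],
}

queries = list(iter_queries(regions))
print(queries)
```

Each pair corresponds to one search request, so the total number of requests is roughly the number of districts nationwide.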

 

With the browser's developer tools open, I arbitrarily selected Seoul > Jung-gu and clicked the search button, which reveals the data we want.

 

Looking at the payload shows which keys need to be changed for each request, and the Preview tab shows in advance what form the returned information takes.
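To make the payload concrete, here is a rough sketch of the form fields for a single search. The key names (`storeLaySido`, `storeLayGu`, `hiddentext`) are the ones used in the code later in this post; `build_payload` is a hypothetical helper, and the values 서울 and 중구 are assumed to match the labels the site uses.

```python
from urllib.parse import urlencode


def build_payload(sido, gugun=""):
    """Return the form fields for one store search (hypothetical helper)."""
    return {
        "storeLaySido": sido,   # province/city, changes per request
        "storeLayGu": gugun,    # district, changes per request
        "hiddentext": "none",   # fixed value, as in the original code
    }


payload = build_payload("서울", "중구")

# The form-encoded body, roughly as it would be sent in a POST request
body = urlencode(payload)
print(body)
```

Only the first two keys vary between requests, which is why the crawler can reuse one dictionary and overwrite those two values in a loop.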

 

When the data is requested, the response comes back as HTML, as shown above.

The province/city and district lists can be obtained through an API, as in the image on the right.
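Judging from the code later in this post, the district lookup returns a list of `<option>` elements. The sketch below shows one way to pull the names out of such a response, using the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages. The `sample` markup and the `OptionCollector` class are assumptions for illustration; the real response may differ.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect the text of every <option> element."""

    def __init__(self):
        super().__init__()
        self._in_option = False
        self.options = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._in_option = True
            self.options.append("")

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False

    def handle_data(self, data):
        # Only keep text that appears inside an <option> element
        if self._in_option:
            self.options[-1] += data


# Hypothetical response body, shaped like a district dropdown
sample = (
    '<select name="storeLayGu">'
    '<option value="">선택</option>'
    '<option value="강남구">강남구</option>'
    '<option value="중구">중구</option>'
    "</select>"
)

parser = OptionCollector()
parser.feed(sample)
parser.close()

# Skip the first option, which is the placeholder in this sample
gugun_total = [name.strip() for name in parser.options[1:]]
print(gugun_total)
```

Skipping the first entry matches the `[1:]` slices used in the crawler code.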

 

 

๋‚˜๋Š” ์ด๋Ÿฐ flow ๋กœ ์ ‘๊ทผํ•˜์—ฌ ์•„๋ž˜์™€ ๊ฐ™์€ ์ฝ”๋“œ๋กœ ์ž‘์—…ํ–ˆ๋‹ค.

 

 

⌨️ Code

 

import requests
import pandas as pd
from tqdm import tqdm
from bs4 import BeautifulSoup as BS


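# Flow: province/city (sido) data -> district (si/gun/gu) data -> API data

url = "https://www.7-eleven.co.kr/util/storeLayerPop.asp"

se = requests.get(url)

# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
bs = BS(se.text, "html.parser")

sido = bs.select("#storeLaySido > option")

sido_total = []

# Skip the first <option> and collect the remaining province/city names
for x in sido[1:]:
    sido_total.append(x.string)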


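# District (si/gun/gu) data

gugun_url = "https://www.7-eleven.co.kr/library/asp/StoreGetGugun.asp"
gugun_pay = {
    "Sido": "",
    "selName": "storeLayGu",
}

# Form fields for the store search; the loop below fills in
# "storeLaySido" and "storeLayGu" for each request
payload = {
    "storeLaySido": "",
    "storeLayGu": "",
    "hiddentext" : "none",
}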

store_name = []
store_address = []
store_service = []

store_df = pd.DataFrame()

for y in tqdm(sido_total):
    gugun_pay['Sido'] = y

    # Fetch the <option> list of districts for this province/city
    se = requests.post(gugun_url, data=gugun_pay)
    bs = BS(se.text, "html.parser")

    gugun = bs.select("option")

    # Skip the first <option> and collect the remaining district names
    gugun_total = [z.string for z in gugun[1:]]

    # Sejong is a special case (it has no si/gun/gu level),
    # so query it once with an empty district
    if y == '세종':
        gugun_total = [""]

    # Fetch the convenience store data
    for k in tqdm(gugun_total):
        payload['storeLaySido'] = y
        payload['storeLayGu'] = k

        se = requests.post(url, data=payload)
        bs = BS(se.text, "html.parser")

        # The class name "list_stroe" is kept exactly as in the original code
        for t in bs.find("div", class_="list_stroe").find_all("li"):
            spans = t.find_all("span")

            # Stop when the page shows the "no search results" message
            if spans[0].text.strip() == "검색 결과가 없습니다.":
                break

            # Store name
            store_name.append(spans[0].text.strip())

            # Store address
            store_address.append(spans[1].string.strip())

            # Store services (alt text of each icon in the first <span>)
            service = [s['alt'] for s in t.find("span").find_all("img")]
            store_service.append(service)


store_df['매장명'] = store_name      # store name
store_df['주소'] = store_address     # address
store_df['서비스'] = store_service   # services

store_df.to_pickle("./7-ELEVEN_store_service.pkl")

 

 

📋 DataFrame

 

With the code above, the final DataFrame is saved in the form shown below.