
Python basics 9

์จ๋ฐ 2023. 3. 1. 18:57

✍🏻 What I learned

 

I learned that, when crawling, using a regular expression to reach a specific value is more convenient than searching for it by hand.

I am gradually getting a better understanding of crawling.

 

 

 

 


 

 

 

Introduction

This post covers using regular expressions to pick out the values you want while crawling, with the CU convenience store's store-finder web page as the working example.

 

 

 

 

 

Regular expression basics

 

 

.  → any single character (except a newline)

^  → matches the start of the string, or negates the set when placed first inside [ ]

$  → matches the end of the string

[ ]  → character set (matches any one of the characters inside)

|  → or (OR)

( )  → groups part of a pattern

\w  → a letter, digit, or underscore (in Python 3 this also matches Hangul unless re.ASCII is set)

\d  → a digit

\D  → any character that is not a digit

\s  → whitespace (space, tab, newline)

\S  → any character that is not whitespace

+  → one or more repetitions

*  → zero or more repetitions

?  → zero or one occurrence

{i}  → exactly i repetitions

{i,j}  → between i and j repetitions (write it without spaces; Python treats "{i, j}" as literal text)
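A few of the symbols above can be tried directly in Python. This is only a small sketch with made-up sample strings; it shows the anchors, a negated set, grouping with OR, the optional marker, how \w treats Hangul, and what happens when a space slips into a {i,j} quantifier.

```python
import re

# ^ anchors at the start; inside [ ] a leading ^ negates the set
starts_with_digits = re.findall(r"^\d+", "2019 and 2020")  # only the leading run
non_digits = re.findall(r"[^0-9]+", "ab12cd")

# | is OR and ( ) groups; findall returns what the group captured
animals = re.findall(r"(cat|dog)s", "cats dogs birds")

# ? makes the preceding item optional
spellings = re.findall(r"colou?r", "color colour")

# \w covers Hangul in Python 3 unless re.ASCII is passed
unicode_words = re.findall(r"\w+", "가평읍 CU_1")
ascii_words = re.findall(r"\w+", "가평읍 CU_1", flags=re.ASCII)

# {i,j} must be written without spaces
tight = re.fullmatch(r"a{2,4}", "aaa")         # quantifier: 2 to 4 "a"
spaced = re.fullmatch(r"a{2, 4}", "aaa")       # read as literal text, so no match
literal = re.fullmatch(r"a{2, 4}", "a{2, 4}")  # matches the literal braces

print(starts_with_digits, non_digits, animals, spellings)
print(unicode_words, ascii_words)
print(tight is not None, spaced is None, literal is not None)
```

The last three lines are the reason the e-mail pattern later in this post has to be written with {2,4} rather than {2, 4}.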

 

 

 

 

Regular expression practice

I worked through a few exercises to practice regular expressions.

 

 

text_famous = """201901 Dost thou love life? Then do not squander time, for that is the stuff life is made of. (Benjamin Franklin) 그대는 인생을 사랑하는가? 그렇다면 시간을 낭비하지 말라, 시간이야말로 인생을 형성하는 재료이기 때문이다. (벤자민 프랭클린)
201902 Life is like riding a bicycle. To keep your balance you must keep moving. (Albert Einstein) 인생은 자전거를 타는 것과 같다. 균형을 잡으려면 움직여야 한다. (알버트 아인슈타인)
201903 Life is a tragedy when seen in close-up, but a comedy in long-shot. (Charlie Chaplin) 인생은 가까이서 보면 비극이지만 멀리서 보면 희극이다 (찰리 채플린)
201904 Dream as if you'll live forever. Live as if you'll die today. (James Dean) 영원히 살 것처럼 꿈꾸고 오늘 죽을 것처럼 살아라. (제임스 딘)
201905 Life is an endless series of trainwrecks with only brief commercial like breaks of happiness. (Deadpool) 인생이란 괴로움의 연속이고, 행복은 광고처럼 짧다. (데드풀)"""

 

Let's try out regular expressions on text_famous.

 

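import re

# an English letter, one or more of any character,
# another English letter, then a literal "."
p2 = re.compile(r"[a-zA-Z].+[a-zA-Z][.]") # compile and keep the pattern
p2.findall(text_famous) # find everything in text_famous that matches p2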

 

p2 matches an English letter, then one or more of any character, then another English letter, and finally a literal period. Because .+ is greedy and . does not match a newline, each match runs from the first English letter on a line to the last English letter on that line that is followed by a period.

findall returns every such match in text_famous as a list (= the English sentences ending in a period, one entry per line).

 

 

import re

# the same pattern, with Hangul syllables in place of English letters
p3 = re.compile(r"[가-힣].+[가-힣][.]") # compile and keep the pattern
p3.findall(text_famous) # find everything in text_famous that matches p3

 

p3 matches a Hangul syllable, then one or more of any character, then another Hangul syllable, and finally a literal period.

findall returns every such match in text_famous as a list (= the Korean sentences ending in a period). The Korean translation on the 201903 line has no period, so that line produces no match.
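The greedy behavior of .+ is easier to see on a short sample. The sketch below uses a made-up two-line string (not text_famous) and compares the greedy patterns with a lazy variant, .+?, which is an addition for illustration and not part of the exercise above.

```python
import re

# Made-up sample: line 1 has two sentences per language, line 2 has no periods.
sample = (
    "01 Keep moving. Stay balanced. (Someone) 계속 움직여라. 균형을 잡아라. (누군가)\n"
    "02 No period here (Someone) 마침표가 없다 (누군가)"
)

english = re.compile(r"[a-zA-Z].+[a-zA-Z][.]")
korean = re.compile(r"[가-힣].+[가-힣][.]")
english_lazy = re.compile(r"[a-zA-Z].+?[a-zA-Z][.]")  # .+? stops at the first period

greedy_en = english.findall(sample)    # one match spanning both English sentences
greedy_ko = korean.findall(sample)     # one match spanning both Korean sentences
lazy_en = english_lazy.findall(sample) # each English sentence separately

print(greedy_en)
print(greedy_ko)
print(lazy_en)
```

Line 2 contributes nothing to any of the three lists, because without a period the pattern cannot complete and . never crosses the newline.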

 

 

 

 

 

 

Extracting reporter e-mail addresses from a daum article with a regex

Let's use a Python regular expression to pull out the reporter e-mail addresses written in a daum article.

 

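import requests
import re

url = "https://v.daum.net/v/20230227100502684"
r = requests.get(url)

# no space inside {2,4}: Python reads "{2, 4}" as literal text, not a quantifier
# [a-zA-Z] instead of [a-zA-z], which would also admit [ \ ] ^ _ `
p = re.compile(r"[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4}")

list(set(p.findall(r.text))) # de-duplicate the addresses found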

 

For an e-mail regex, think about the e-mail formats we commonly see.

For example, besides abc@naver.com there are addresses such as abc@hanyang.ac.kr, where the domain part has a different shape, so the pattern should account for that as well.
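The pattern can be checked offline against both domain shapes before pointing it at a real page. This is a minimal sketch on a made-up sentence; it also shows that the variant with a space inside the braces finds nothing.

```python
import re

EMAIL = re.compile(r"[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4}")
EMAIL_SPACED = re.compile(r"[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2, 4}")  # "{2, 4}" is literal text here

# Made-up text for illustration only.
text = "Contact abc@naver.com or abc@hanyang.ac.kr (not an address: foo@bar)"

found = sorted(set(EMAIL.findall(text)))
found_spaced = EMAIL_SPACED.findall(text)

print(found)
print(found_spaced)
```

Since \w also matches Hangul in Python 3, an address written directly next to Korean text may pick up neighboring characters; passing re.ASCII is one way to rule that out.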

 

 

 

 

 

Fetching CU convenience store information: store name, address, phone number, services

On the CU convenience store website, go to 매장안내 > 매장찾기 (Store Info > Find a Store), choose any region, and click the search button.

With the browser's developer tools open, the Network tab shows a new entry named list_Ajax.do, along with its Request URL and Request Method.

Looking at the Payload section of that entry shows that this information is served through an API.

 

 

import requests

cu_url = "https://cu.bgfretail.com/store/list_Ajax.do"

payload = {"pageIndex" : "1",
"listType" : "",
"jumpoCode" : "",
"jumpoLotto" : "",
"jumpoToto" : "",
"jumpoCash" : "",
"jumpoHour" : "",
"jumpoCafe" : "",
"jumpoDelivery" : "",
"jumpoBakery" : "",
"jumpoFry" : "",
"jumpoMultiDevice" : "",
"jumpoPosCash" : "",
"jumpoBattery" : "",
"jumpoAdderss" : "",
"jumpoSido" : "경기도",
"jumpoGugun" : "가평군",
"jumpodong" : "가평읍",
"user_id" : "",
"jumpoName" : "",}

r = requests.post(cu_url, data=payload)

r.text # for checking
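When a dict is passed as data=, requests sends it as a form-encoded body. The sketch below uses the standard library's urlencode on a shortened copy of the payload to show roughly what that body looks like; it is an illustration of form encoding, not a capture of the actual request.

```python
from urllib.parse import urlencode, parse_qs

# Shortened copy of the payload above, for illustration.
payload = {
    "pageIndex": "1",
    "jumpoSido": "경기도",
    "jumpoGugun": "가평군",
    "jumpodong": "가평읍",
    "jumpoName": "",
}

body = urlencode(payload)  # Hangul is percent-encoded as UTF-8 bytes
decoded = parse_qs(body, keep_blank_values=True)  # round-trip back to values

print(body)
print(decoded["jumpodong"])
```

Empty values are still sent as key= pairs, which is why the payload lists every field even when it is blank.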

 

With the response in hand, the next step is to parse it with bs4 and locate the data we want.

 

 

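from bs4 import BeautifulSoup as BS

bs = BS(r.text, "html.parser") # name the parser explicitly

bs # for checking


# second <tr> of the store table (index 0 is presumably the header row)
tmp = bs.find("div", class_="detail_store").find_all("tr")[1]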

 

The data we need is inside tmp, but we still have to work out which services each store offers.

 

 

Looking at the markup, some class names include on and some do not.

A class name that includes on means the branch offers that service.
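The on check can be tried without any network access. The sketch below swaps bs4 for the standard library's html.parser so it runs without extra packages, and the HTML snippet is hypothetical, written only to mimic the class names described above.

```python
import re
from html.parser import HTMLParser

# The spelling "sevice" follows the site's markup.
ON_CLASS = re.compile(r"sevice\d{2} on")


class ServiceCollector(HTMLParser):
    """Collect the class attribute of every <li> whose class ends in 'on'."""

    def __init__(self):
        super().__init__()
        self.active = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if tag == "li" and ON_CLASS.fullmatch(css_class):
            self.active.append(css_class)


# Hypothetical markup, not copied from the real page.
html = (
    '<ul><li class="sevice01 on"></li>'
    '<li class="sevice02"></li>'
    '<li class="sevice03 on"></li></ul>'
)

collector = ServiceCollector()
collector.feed(html)
print(collector.active)
```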

 

Note that the response only covers page 1 of the store list; to get the rest, the next page has to be requested and the results fetched again. For now I only pull the information on page 1. (I will put the full collection together properly later.)
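One way the later pages could be collected is to keep requesting until a page comes back empty. The sketch below keeps the network call out of the loop: fetch_page is a hypothetical callable that would post the payload with "pageIndex" set to the page number and return the parsed rows, and the stop condition (an empty page) is an assumption about how the site behaves.

```python
from typing import Callable, Iterator, List


def iter_pages(
    fetch_page: Callable[[int], List[str]], max_pages: int = 100
) -> Iterator[List[str]]:
    """Yield rows page by page until a page is empty or max_pages is reached."""
    for page_index in range(1, max_pages + 1):
        rows = fetch_page(page_index)
        if not rows:
            break
        yield rows


# Stand-in for the real request, so the loop can be tried offline.
fake_site = {1: ["store A", "store B"], 2: ["store C"]}


def fake_fetch(page_index: int) -> List[str]:
    return fake_site.get(page_index, [])


all_rows = [row for rows in iter_pages(fake_fetch) for row in rows]
print(all_rows)
```

The max_pages cap is there so that a site which never returns an empty page cannot make the loop run forever.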

 

import re


# The site's markup spells the class "sevice", not "service".

p = re.compile(r"sevice[0-9]{2} on") # class of a service the store offers

tmp.find_all("li", class_=p) # for checking

p2 = re.compile(r"_([0-9a-zA-Z]+)\.png") # capture the service name from the icon file name

p2.findall(str(tmp.find_all("li", class_=p))) # names taken from the .png files

# first name; the group already excludes ".png", so split(".") leaves it unchanged
p2.findall(str(tmp.find_all("li", class_=p)))[0].split(".")[0]
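The name-capturing pattern can be tried on a small string first. The file names in this sketch are hypothetical; the point is that the group keeps only the last underscore-separated part before .png.

```python
import re

NAME = re.compile(r"_([0-9a-zA-Z]+)\.png")

# Hypothetical markup; the real icon file names may differ.
snippet = (
    '<li class="sevice01 on"><img src="/img/icon_lotto.png"></li>'
    '<li class="sevice05 on"><img src="/img/icon_store_24h.png"></li>'
)

names = NAME.findall(snippet)
print(names)

# The capture group already stops before ".png", so splitting on "." changes nothing.
print(names[0].split(".")[0])
```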

 

Now that we know how to extract the service items, all that is left is to extract and store them.

 

 

service = [p2.findall(str(x))[0] for x in tmp.find_all("li", class_=p)]

for y in bs.find("div", class_="detail_store").find_all("tr")[1:]:
    print([p2.findall(str(x))[0] for x in y.find_all("li", class_=p)])

 

Before storing the services, check this way that sensible values come out for every row.

 

 

service_t = []
for y in bs.find("div", class_="detail_store").find_all("tr")[1:]:
    service_t.append([p2.findall(str(x))[0] for x in y.find_all("li", class_=p)])

 

Now the services each store offers are stored as well.

Next, let's get the store name, phone number, and address.

 

 

 

 

Inspecting the HTML shows that the store information is laid out as a table.

For this, pandas provides the read_html() function.

 

 

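from io import StringIO

import pandas as pd

# pandas 2.1+ deprecates passing raw HTML text, so wrap it in StringIO
store = pd.read_html(StringIO(r.text))[0]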

 

 

It parses the table structure in the HTML and builds a DataFrame from the data.

Now let's add the services we collected above to the store DataFrame.

 

 

store['서비스'] = service_t

A '서비스' (services) column is added to store, and the entries of service_t are stored in it in row order (service_t needs one entry per row).

 

 

 

On the CU page, the store name and phone number share one column, as do the address and services, so they have to be separated.

First, let's separate the store name and the phone number.

 

# regex for the store phone number
phone = re.compile(r"[0-9]{2,3}-[0-9]{3,4}-[0-9]{3,4}")

store['매장명 / 연락처'].apply(lambda x : phone.findall(x)) # an empty list when there is no phone number

# add a 연락처 (phone number) column
# and handle stores that have no phone number
# by storing None for them
store['연락처'] = store['매장명 / 연락처'].apply(lambda x : phone.findall(x)[0] if len(phone.findall(x)) > 0 else None)

 

 

The phone number is extracted with the regex and checked, and None is stored when a store has no phone number.
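The "first match or None" logic can also be written as a small helper that searches once instead of calling findall twice. This is a sketch with a hypothetical helper name (first_phone) and made-up cell values.

```python
import re
from typing import Optional

PHONE = re.compile(r"[0-9]{2,3}-[0-9]{3,4}-[0-9]{3,4}")


def first_phone(cell: str) -> Optional[str]:
    """Return the first phone number in the cell, or None if there is none."""
    match = PHONE.search(cell)
    return match.group(0) if match else None


# Made-up cell values for illustration.
with_phone = first_phone("가평역점 031-123-4567")
seoul_style = first_phone("서울역점 02-123-4567")
without_phone = first_phone("가평역점")

print(with_phone, seoul_style, without_phone)
```

A helper like this could be passed straight to apply in place of the lambda.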

 

 

This time, let's separate out the store name.

 

# regex for the store name: a run of Hangul ending in 점
store_p = re.compile("[가-힣]+점")

# add a 매장명 (store name) column the same way
store['매장명'] = store['매장명 / 연락처'].apply(lambda x : store_p.findall(x)[0])

 

The store name is stored the same way.

Now that the store name and phone number live in their own columns, drop the original combined column.

 

# axis=1 applies to columns (axis=0 applies to the index, i.e. rows)
store.drop("매장명 / 연락처", axis = 1, inplace = True)

 

One thing to watch out for: with inplace set to True, the change is applied to store directly.

So make a habit of checking the result before applying it in place.

Now rename the address-and-services column to 주소 (address).

 

# rename the column 주소 / 매장 유형 및 서비스 -> 주소
store.rename(columns={"주소 / 매장 유형 및 서비스" : "주소"}, inplace=True)

 

To change a column's name, use the rename() method.

Finally, to put the columns in a specific order,

 

store = store[['매장명', '주소', '서비스']]

 

just list them in the order you want. Only the listed columns are kept, so add '연락처' to the list if the phone numbers should stay.

At this point, store holds the store name, address, and services, in that order.

 

 

 

 

 

Fetching CU convenience store information: 시/도 (province or city), 시/군/구 (city, county, or district), 읍/면/동 (town, township, or neighborhood)

Now let's get the list of 시/도.

The 시/도 values are present in the web page's HTML, so they are easy to reach.

 

cu_url = "https://cu.bgfretail.com/store/list.do?category=store"

# BeautifulSoup was imported as BS above
city = [x.text for x in BS(requests.get(cu_url).text, "html.parser")\
    .find("div", class_="search_wrap").find_all("option")][1:-2]

 

Parsing the HTML with BS gives us the 시/도 list like this.

Next, let's get the list of 시/군/구.

Selecting any 시/도 on the web page makes a new GugunList.do entry appear in the Network tab.

Its Request URL and Request Method are shown there.

The payload also contains the 시/도 value we need.

 

 

city = "https://cu.bgfretail.com/store/GugunList.do" # note: rebinds city, which held the 시/도 list above

city_pay = {"pageIndex" : "1",
"listType" : "",
"jumpoCode" : "",
"jumpoLotto" : "",
"jumpoToto" : "",
"jumpoCash" : "",
"jumpoHour" : "",
"jumpoCafe" : "",
"jumpoDelivery" : "",
"jumpoBakery" : "",
"jumpoFry" : "",
"jumpoMultiDevice" : "",
"jumpoPosCash" : "",
"jumpoBattery" : "",
"jumpoAdderss" : "",
"jumpodong" : "",
"user_id" : "",
"sido" : "서울특별시",
"Gugun" : "",
"jumpoName" : "",}


requests.post(city, data=city_pay).text # the response body is JSON
# at the top level, GugunList is the key and a list is the value
# inside the list, CODE_NAME is the key and the district name is the value

requests.post(city, data=city_pay).json()['GugunList'] # parse the JSON and look at the value
# this gives the list

# read the CODE_NAME values to get the districts of 서울특별시
[x['CODE_NAME'] for x in requests.post(city, data=city_pay).json()['GugunList']]

 

With this, we can get as far as the list of districts (구) in 서울특별시.
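The key structure described in the comments can be tried on a hand-written JSON string. The body below is hypothetical: it only mirrors the GugunList / CODE_NAME shape noted above, and a real response may carry more fields.

```python
import json

# Hypothetical response body, shaped like the structure described above.
raw = '{"GugunList": [{"CODE_NAME": "강남구"}, {"CODE_NAME": "강동구"}]}'

data = json.loads(raw)  # what .json() would return for this body
names = [item["CODE_NAME"] for item in data.get("GugunList", [])]

print(names)
```

Using .get with an empty list as the default keeps the comprehension from failing if the key is missing.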

 

 

 

Now let's get the 읍/면/동 names.

The 읍/면/동 list is fetched the same way as the 시/군/구 list.

 

dong = "https://cu.bgfretail.com/store/DongList.do"

dong_pay = {"pageIndex": "1",
"listType": "",
"jumpoCode": "",
"jumpoLotto": "",
"jumpoToto": "",
"jumpoCash": "",
"jumpoHour": "",
"jumpoCafe": "",
"jumpoDelivery": "",
"jumpoBakery": "",
"jumpoFry": "",
"jumpoMultiDevice": "",
"jumpoPosCash": "",
"jumpoBattery": "",
"jumpoAdderss": "",
"jumpoSido": "경기도",
"jumpoGugun": "가평군",
"jumpodong": "",
"user_id": "",
"sido": "경기도",
"Gugun": "가평군",
"jumpoName": "",}


# get the 동 names for 경기도 가평군
[x['CODE_NAME'] for x in requests.post(dong, data=dong_pay).json()['GugunList']]
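The three payloads in this post share most of their keys, so they could be built from one helper. The sketch below is a possible refactoring, not part of the original code: make_payload and COMMON_KEYS are hypothetical names, and the key spellings are copied from the payloads above.

```python
# Keys that are blank in the payloads above (spellings as sent by the site).
COMMON_KEYS = [
    "listType", "jumpoCode", "jumpoLotto", "jumpoToto", "jumpoCash",
    "jumpoHour", "jumpoCafe", "jumpoDelivery", "jumpoBakery", "jumpoFry",
    "jumpoMultiDevice", "jumpoPosCash", "jumpoBattery", "jumpoAdderss",
    "jumpodong", "user_id", "jumpoName",
]


def make_payload(**fields: str) -> dict:
    """Start from page 1 with blank common fields, then apply the given fields."""
    payload = {"pageIndex": "1"}
    payload.update({key: "" for key in COMMON_KEYS})
    payload.update(fields)
    return payload


gugun_payload = make_payload(sido="서울특별시", Gugun="")
dong_payload = make_payload(
    jumpoSido="경기도", jumpoGugun="가평군", sido="경기도", Gugun="가평군"
)

print(len(gugun_payload), len(dong_payload))
```

Each result has the same keys and values as the hand-written city_pay and dong_pay dictionaries.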

 

 

 

 
