Heard that everyone learning Python font anti-crawling opens this blog: Ziroom font anti-crawling, image-based font anti-crawling

### Ziroom: the real-world scenario

We ran into a site that uses font anti-crawling: Ziroom. Its anti-crawling is not implemented with a font file. It uses an image plus CSS instead, as shown in the figure below.

![Ziroom prices rendered with an image plus CSS](//img.inotgo.com/imagesLocal/202207/30/202207302340499919_0.png)

The digits are displayed using the CSS background offset technique (`background-position`).

The "font" is the digit image shown below.

![Digit sprite image](//img.inotgo.com/imagesLocal/202207/30/202207302340499919_1.png)

The image is `300 × 28` pixels (width × height). Its 300-pixel width holds 10 digits, so each digit takes up 30 pixels of the image. Measured on the page, however, the interval between digits is 21.4 pixels.

This value serves as the reference for telling the digits apart later.

Next, we also need to check whether the image changes on every refresh.

It does change after a refresh ￣-￣||

The principle stays the same, though: get the image, parse it, and recognize the text with OCR.

### Ziroom: the actual code

First, fetch the page source and parse out the image address.

```python
import requests
from lxml import etree

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36"
}

res = requests.get('https://www.ziroom.com/z/', headers=headers)
tree = etree.HTML(res.text)
# The style looks like: background-image: url(//...png);background-position: -42.8px
img_style = tree.xpath("//span[@class='num']/@style")[0]
# No regex here: measure the fixed prefix and suffix lengths, then slice the string directly
print(len('background-image: url(//'))        # 24
print(len(');background-position: -42.8px'))  # 30
img_src = img_style[24: len(img_style) - 30]
```

Then recognize the digits with OCR software and extract them.

```python
# Download the image file and recognize the digits with OCR
# (continues from the snippet above: requests, headers, img_src)
import ddddocr

ocr = ddddocr.DdddOcr()

res = requests.get('https://' + img_src, headers=headers)
# print(res.content)
# with open('./images/num_img1.png', 'wb') as f:
#     f.write(res.content)
res = ocr.classification(res.content)
print(res)
```

In the test, the recognized digits were `5471380629`. The next step is to split them apart.

**During testing I found that the captured image address is sometimes wrong, so I recommend extracting it with a regular expression instead.**

Finally, the coordinates map to digits as follows:

- `21.4`: the first number
- `42.8`: the second number
- `64.2`: the third number

The rest follow the same principle.

**You are reading the blog of "Dream Eraser". If you found it useful, a like would be appreciated. If you spot any errors, please point them out directly in the comments. This is Eraser's 670th original blog post.**
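The sprite image changes after a refresh, so an OCR result from one page load should not be reused for another. The sketch below checks for a change by comparing SHA-256 digests of two downloads. It assumes you keep the downloaded bytes (`res.content`) from two page loads. `sprite_fingerprint` and `sprite_changed` are hypothetical helper names, and the byte strings in the usage line are placeholders, not real images.

```python
import hashlib


def sprite_fingerprint(image_bytes: bytes) -> str:
    """Return a SHA-256 hex digest that identifies one downloaded sprite."""
    return hashlib.sha256(image_bytes).hexdigest()


def sprite_changed(before: bytes, after: bytes) -> bool:
    """Return True if two downloads (e.g. before and after a refresh) differ."""
    return sprite_fingerprint(before) != sprite_fingerprint(after)


# Placeholder bytes standing in for res.content from two page loads
print(sprite_changed(b"sprite bytes from load 1", b"sprite bytes from load 2"))
```

One possible design is to store the digest next to the OCR string. That way, recognition runs only once per distinct sprite.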
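The author recommends a regular expression for getting the image address. The sketch below uses one to pull both the sprite URL and the horizontal offset out of a `span.num` style attribute. It assumes the attribute has the shape implied by the slicing code, `background-image: url(//...);background-position: -42.8px`. The pattern, the helper name `parse_num_style`, and the sample host `static.example.com` are hypothetical. The fixed slice `img_style[24: len(img_style) - 30]` only works when the offset string has exactly as many characters as `-42.8px`. This version does not depend on that length, so an offset such as `-171.2px` also parses.

```python
import re
from typing import Tuple

# Hypothetical pattern for a style such as
# "background-image: url(//host/path.png);background-position: -42.8px"
STYLE_RE = re.compile(
    r"url\((?P<url>[^)]+)\)"                          # text inside url(...)
    r".*?background-position:\s*(?P<x>-?[\d.]+)px"   # horizontal offset in px
)


def parse_num_style(style: str) -> Tuple[str, float]:
    """Return (absolute sprite URL, horizontal offset in px) from a style string."""
    match = STYLE_RE.search(style)
    if match is None:
        raise ValueError(f"unexpected style attribute: {style!r}")
    url = match.group("url").strip("'\" ")  # drop optional quotes around the URL
    if url.startswith("//"):
        url = "https:" + url  # protocol-relative URL -> explicit https
    return url, float(match.group("x"))


sample = ("background-image: url(//static.example.com/price/sprite.png);"
          "background-position: -171.2px")
print(parse_num_style(sample))
```

The returned URL already carries a scheme, so it can be passed straight to `requests.get(url, headers=headers)` instead of building `'https://' + img_src`.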
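Once OCR has read the sprite as a string such as `5471380629`, each price digit can be decoded by turning its span's offset into a position in that string. The sketch below assumes the 21.4 px step measured above. It also assumes that an offset of 0 px shows the leftmost digit of the sprite, so `-n × 21.4` px selects position `n` (0-based). The coordinate list above calls 21.4 px "the first number", so check the base index against the live page before relying on it. `STEP_PX`, `offset_to_index`, and `decode_price` are hypothetical names.

```python
from typing import Iterable

STEP_PX = 21.4  # measured spacing between digits on the page


def offset_to_index(offset_px: float, step: float = STEP_PX) -> int:
    """Map a background-position offset such as -42.8 to a 0-based sprite slot.

    Assumes 0 px shows the leftmost digit and each step moves one digit right.
    """
    index = round(abs(offset_px) / step)
    if not 0 <= index <= 9:
        raise ValueError(f"offset {offset_px} is outside the 10-digit sprite")
    return index


def decode_price(ocr_digits: str, offsets: Iterable[float]) -> str:
    """Decode one price from the OCR'd sprite string and its spans' offsets."""
    if len(ocr_digits) != 10 or not ocr_digits.isdigit():
        raise ValueError(f"expected 10 OCR digits, got {ocr_digits!r}")
    return "".join(ocr_digits[offset_to_index(x)] for x in offsets)


# Offsets for slots 0..9 (rounded for display)
print([round(n * STEP_PX, 1) for n in range(10)])
print(decode_price("5471380629", [-21.4, -42.8, -64.2, -0.0]))  # -> 4715
```

Rounding `abs(offset) / step` absorbs floating-point noise in the measured offsets. OCR can occasionally misread a digit, so the length and digit checks fail loudly instead of quietly producing a wrong price.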
Copyright © 程式師世界 All Rights Reserved