[VB] How to extract data from a web page
Visual Basic code that does the job with XMLHTTP and HTMLDocument
    Private Sub Command1_Click()
        Dim XMLObject As Object, HTMLDoc As Object
        Dim SendStr As String, HTMLStr As String
        Dim DataInfo As String, S As Long, E As Long
        Dim Info(66) As String, TempArray() As String
        Dim X As Long, Y As Long, I As Long, TempStr As String
        Dim TitleMaxByte As Long, TitleByte As Long

        ' Initialize variables
        Y = 0
        I = 0
        TitleMaxByte = 0
        TempStr = ""

        ' Fetch the page content through XMLHTTP
        Set XMLObject = CreateObject("Microsoft.XMLHTTP")
        Set HTMLDoc = CreateObject("htmlfile")
        XMLObject.open "GET", "http://quotes.money.163.com/corp/1034/code=600221.html", False
        XMLObject.setRequestHeader "CONTENT-TYPE", "application/x-www-form-urlencoded"
        XMLObject.Send SendStr
        ' ResponseBody is a byte array; StrConv decodes it using the system ANSI code page
        HTMLStr = StrConv(XMLObject.ResponseBody, vbUnicode)

        ' Use the HTMLDocument object to extract the text contained in the page
        HTMLDoc.body.innerHTML = HTMLStr
        DataInfo = HTMLDoc.body.innerText ' all text from the page

        ' Locate the section of interest. The two markers mean "report date" and
        ' "editor's mailbox"; they must match the page text character for character.
        S = InStr(1, DataInfo, "報表日期")
        If S = 0 Then Exit Sub ' start marker not found (InStr would fail with start = 0)
        E = InStr(S, DataInfo, "主編信箱")
        If E <= S + 4 Then Exit Sub ' end marker missing or too close to the start marker

        ' Extract the section text
        DataInfo = Mid(DataInfo, S, E - S - 4)

        ' Split the text into an array of lines
        TempArray = Split(DataInfo, vbCrLf)
        If UBound(TempArray) < 66 Then Exit Sub ' fewer than the 67 expected field titles

        ' To make the final output line up, find the largest byte length among
        ' the field titles and use it as the formatting width
        For X = 0 To 66
            Info(X) = RTrim(TempArray(X)) ' strip trailing spaces
            TitleByte = LenB(StrConv(Info(X), vbFromUnicode)) ' byte length of the field title
            If TitleByte > TitleMaxByte Then TitleMaxByte = TitleByte ' record the maximum
        Next X

        ' Pad every title with spaces up to the maximum byte length
        For X = 0 To 66
            ' Category headings (ending in a colon) are left alone
            If Right(Info(X), 1) <> ":" Then
                TitleByte = LenB(StrConv(Info(X), vbFromUnicode)) ' byte length of this title
                Info(X) = Info(X) & String(TitleMaxByte - TitleByte, " ") & vbTab ' pad with spaces
            End If
        Next X

        ' Append the data values to their field rows
        For X = 67 To UBound(TempArray)
            If Y >= 67 Then Y = 0: I = I + 1
            ' Category headings (ending in a colon) are left alone
            If Right(Info(Y), 1) <> ":" Then
                If I = 0 Then
                    Info(Y) = Info(Y) & TempArray(X)
                Else
                    Info(Y) = Info(Y) & "," & TempArray(X)
                End If
            End If
            Y = Y + 1
        Next X

        ' Join the processed rows into a single text variable
        For X = 0 To UBound(Info)
            If Len(TempStr) = 0 Then
                TempStr = Info(X)
            Else
                TempStr = TempStr & vbCrLf & Info(X)
            End If
        Next X

        ' Output the text
        Text1.Text = TempStr
    End Sub

In practice the two approaches perform about the same; this one only saves the time spent downloading images and rendering the page. In my tests the WebBrowser approach took 7 seconds and this approach took 5 seconds. In theory, though, this method should be the faster one.

This article was edited on 2014/3/25 0:19:13.
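The post does not say how the 7-second and 5-second figures were measured. One possible way to time the XMLHTTP request is sketched below using VB's built-in Timer function. The helper name TimeXmlHttpFetch is hypothetical and not part of the original code, and the sketch times only the download itself, not the text processing that follows it.

```vb
' Hypothetical helper (not from the original post): returns how many
' seconds a synchronous XMLHTTP GET of Url takes.
Private Function TimeXmlHttpFetch(ByVal Url As String) As Single
    Dim Http As Object
    Dim StartTime As Single, Elapsed As Single

    Set Http = CreateObject("Microsoft.XMLHTTP")

    StartTime = Timer               ' seconds elapsed since midnight
    Http.open "GET", Url, False     ' False = synchronous: send blocks until the response arrives
    Http.send
    Elapsed = Timer - StartTime

    ' Timer restarts from 0 at midnight, so correct a negative difference
    If Elapsed < 0 Then Elapsed = Elapsed + 86400

    TimeXmlHttpFetch = Elapsed
End Function
```

It could be called from the Immediate window or a button handler, for example Debug.Print TimeXmlHttpFetch("http://quotes.money.163.com/corp/1034/code=600221.html"). Network conditions and caching vary between runs, so averaging several runs gives a fairer comparison with the WebBrowser approach.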