<pre>
public static void validateMP3File(byte[] song) throws IOException, InvalidFileTypeException {
    InputStream file = new ByteArrayInputStream(song);
    // Read the start of the file, where an ID3v2 tag begins with the marker "ID3"
    byte[] startOfFile = new byte[1024];
    file.read(startOfFile);
    String id3 = new String(startOfFile);
    String tag = id3.substring(0, 3);
    if ( ! "ID3".equals(tag) ) {
        throw new InvalidFileTypeException();
    }
}
</pre>
<h2>Saving the File</h2>
<p>All that remains is to save the file. We can use the link text as the file name, appending the .mp3 extension if it is missing. A FileOutputStream then writes the bytes to the new file.</p>
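As a rough sketch of the saving step described above, the snippet below isolates its two operations: deriving a file name and writing the bytes. The class and method names (SaveDownloadSketch, toMp3FileName, save) are hypothetical and not part of the demo class, the placeholder bytes stand in for a real download, and writing to a temporary directory is an assumption made only to keep the example tidy.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the names are illustrative and not part of the demo.
class SaveDownloadSketch {

    // Returns the name unchanged if it already ends in ".mp3", otherwise appends it.
    // Strings are immutable, so the result of concat() must be returned or assigned.
    static String toMp3FileName(String linkText) {
        String name = linkText.trim();
        return name.endsWith(".mp3") ? name : name.concat(".mp3");
    }

    // Writes the bytes to the named file. try-with-resources closes the stream
    // even if write() throws.
    static void save(String fileName, byte[] bytes) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            fos.write(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder content standing in for the downloaded MP3 bytes.
        byte[] bytes = "ID3 placeholder bytes".getBytes(StandardCharsets.US_ASCII);

        // Assumption: save into a temporary directory rather than the working directory.
        Path dir = Files.createTempDirectory("jsoup-sketch");
        String savedFileName = dir.resolve(toMp3FileName("Example Track")).toString();

        save(savedFileName, bytes);
        System.out.println("Saved " + bytes.length + " bytes to " + savedFileName);
    }
}
```

Two details are worth noting: the value returned by concat() has to be used, because the original string is never modified, and try-with-resources releases the file handle even when the write fails partway through.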
<pre>
try {
    validateMP3File(bytes);

    String savedFileName = link.text();
    // Strings are immutable, so the result of concat() must be assigned back
    if (!savedFileName.endsWith(".mp3")) savedFileName = savedFileName.concat(".mp3");
    FileOutputStream fos = new FileOutputStream(savedFileName);
    fos.write(bytes);
    fos.close();

    System.out.println("File has been downloaded.");
} catch (IOException e) {
//... 
</pre>
<p>Here is the full source for the JsoupDemoTest class:</p>
<pre>
package com.robgravelle.jsoupdemo;

import static org.jsoup.Jsoup.parse;

import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupDemoTest {
    private final static String URL_TO_PARSE = "http://robgravelle.com/albums/";
    private final static String LINK = "t=60";
    @SuppressWarnings("serial")
    private static class InvalidFileTypeException extends Exception {}

    public static void main(String[] args) throws IOException {
        //these two lines are only required if your Internet
        //connection uses a proxy server
        //System.setProperty("http.proxyHost", "my.proxy.server");
        //System.setProperty("http.proxyPort", "8081");
        URL url = new URL(URL_TO_PARSE);
        Document doc = parse(url, 30000);

        Elements links = doc.select("a[href$=" + LINK + "]");
        int linksSize = links.size();
        if (linksSize > 0) {
            if (linksSize > 1) {
                System.out.println("Warning: more than one link found. Downloading first match.");
            }
            Element link = links.first();
            String linkUrl = link.attr("abs:href");
            //Thanks to Jeremy Chung for maxBodySize solution
            //http://jmchung.github.io/blog/2013/10/25/how-to-solve-jsoup-does-not-get-complete-html-document/
            byte[] bytes = Jsoup.connect(linkUrl)
                .header("Accept-Encoding", "gzip, deflate")
                .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0")
                .referrer(URL_TO_PARSE)
                .ignoreContentType(true)
                .maxBodySize(0)
                .timeout(600000)
                .execute()
                .bodyAsBytes();

            try {
                validateMP3File(bytes);

                String savedFileName = link.text();
                //Strings are immutable, so the result of concat() must be assigned back
                if (!savedFileName.endsWith(".mp3")) savedFileName = savedFileName.concat(".mp3");
                FileOutputStream fos = new FileOutputStream(savedFileName);
                fos.write(bytes);
                fos.close();

                System.out.println("File has been downloaded.");
            } catch (IOException e) {
                System.err.println("Could not read the file at '" + linkUrl + "'.");
            }
            catch (InvalidFileTypeException e) {
                System.err.println("'" + linkUrl + "' does not appear to point to an MP3 file.");
            }
        }
        else {
            System.out.println("Could not find the link ending with '" + LINK + "' in web page.");
        }
    }

    public static void validateMP3File(byte[] song) throws IOException, InvalidFileTypeException {
        InputStream file = new ByteArrayInputStream(song);
        byte[] startOfFile = new byte[6];
        file.read(startOfFile);
        //Compare only the first three bytes, decoded as ASCII; the full
        //six-byte string could never equal "ID3"
        String tag = new String(startOfFile, 0, 3, StandardCharsets.US_ASCII);
        if ( ! "ID3".equals(tag) ) {
            throw new InvalidFileTypeException();
        }
    }

    //validateMP3File() is based on this method
    public static void getMP3Metadata(byte[] song) {
        try {
            InputStream file = new ByteArrayInputStream(song);
            int size = (int)song.length;
            byte[] startOfFile = new byte[1024];
            file.read(startOfFile);
            String id3 = new String(startOfFile);
            String tag = id3.substring(0, 3);
            if ("ID3".equals(tag)) {
                System.out.println("Title: " + id3.substring(3, 32));
                System.out.println("Artist: " + id3.substring(33, 62));
                System.out.println("Album: " + id3.substring(63, 91));
                System.out.println("Year: " + id3.substring(93, 97));
            } else
                System.out.println("does not contain" + " ID3 information.");
            file.close();
        } catch (Exception e) {
            System.out.println("Error - " + e.toString());
        }
    }
}
</pre>
<h2>Conclusion</h2>
<p>The Jsoup library lends itself to a wide range of page-scraping and resource-fetching tasks driven by a website's hyperlinks. If you’ve come up with your own creative uses for it, by all means share. It might just get featured in an upcoming article!</p>