Asked  10 Months ago    Answers:  5   Viewed   6 times

I need to download a file, save it in a folder while keeping the original filename from the website.

url <- "http://www.seg-social.es/prdi00/idcplg?IdcService=GET_FILE&dID=187112&dDocName=197533&allowInterrupt=1"

From a web browser, if you click on that link, you get to download an excel file with this filename:

AfiliadosMuni-02-2015.xlsx

I know I can easily download it with the command download.file in R like this:

download.file(url, "test.xlsx", method = "curl")

But what I really need for my script is to download it keeping the original filename intact. I also know I can do this with curl from my console like this.

curl -O -J $"http://www.seg-social.es/prdi00/idcplg?IdcService=GET_FILE&dID=187112&dDocName=197533&allowInterrupt=1"

But, again, I need this within an R script. Is there a way similar to the one above but in R? I have looked into the RCurl package but I couldn't find a solution.

 Answers

3

You could always do something like:

library(httr)
library(stringr)

# alternate way to "download.file"
fil <- GET("http://www.seg-social.es/prdi00/idcplg?IdcService=GET_FILE&dID=187112&dDocName=197533&allowInterrupt=1", 
           write_disk("tmp.fil"))
# get the name the site suggests it should be
fname <- str_match(headers(fil)$`content-disposition`, "\"(.*)\"")[2]
# rename
file.rename("tmp.fil", fname)
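
If you would rather not depend on stringr, or the header may be missing or unquoted, the extraction step can be sketched in base R. This is only a sketch: `filename_from_disposition` is a made-up helper name, and the header values used with it are illustrative, not captured from the site.

```r
# Hypothetical helper: extract the suggested file name from a
# Content-Disposition header value. Returns NA when none is found.
filename_from_disposition <- function(header) {
  # Missing header (NULL), NA, or anything that is not a single value
  if (length(header) != 1 || is.na(header)) return(NA_character_)
  # Match filename="quoted name" or filename=bare-token, case-insensitively
  m <- regmatches(
    header,
    regexec('filename="?([^";]+)"?', header, ignore.case = TRUE)
  )[[1]]
  if (length(m) < 2) return(NA_character_)
  # basename() drops any directory components the server might send
  basename(trimws(m[2]))
}
```

With the httr response from above it would be called as `filename_from_disposition(headers(fil)$"content-disposition")`, and the `file.rename()` step should only run when the result is not `NA`.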
Monday, August 23, 2021
 
Sean
 
4

In short, I'd like to implement a WebDav system and would think the easiest solution would be to store uploaded files with their original name

This is a pretty broad question, but to keep the answer short: NEVER TRUST USER-PROVIDED DATA. You must always validate and sanitize on the server side, otherwise you will be hacked sooner or later.

The original file name is sent by the client, so it can be anything. Here are some ideas of what I'd try to send you as "original" file names, knowing you are so carefree: ../../../../etc/passwd or ../../config/db.php. Handle as-it-comes. Enjoy :)

EDIT

I should have mentioned things I've considered -- sanitizing filenames

A sanitized file name is no longer the original file name. However, there is an approach you could consider here to meet your goal and still stay safe. You could validate/sanitize the original file name, and if the result is still identical to what came from the user, you can keep the file and retain the original name. If it is not, you should reject the file upload as a whole. At the end of the day you will only have files that can safely be exposed under their original names via other APIs/interfaces.
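
The "accept only if sanitizing changes nothing" check can be sketched as follows. It is written in R to match the other code on this page, although the question itself concerns a WebDAV server. The helper name `is_safe_filename`, the character whitelist, and the 255-character limit are all assumptions chosen for illustration, not a vetted security policy.

```r
# Hypothetical helper: TRUE only if `name` survives sanitization unchanged,
# i.e. it is already a plain file name built from whitelisted characters.
is_safe_filename <- function(name, max_len = 255) {
  if (!is.character(name) || length(name) != 1 || is.na(name)) return(FALSE)
  # Sanitize: replace every character outside the whitelist with "_"
  sanitized <- gsub("[^A-Za-z0-9._ -]", "_", name)
  # Keep the name only if nothing had to be changed, it is non-empty and
  # not too long, and it is not a dot-file or a "." / ".." entry
  identical(sanitized, name) &&
    nchar(name) >= 1 && nchar(name) <= max_len &&
    !startsWith(name, ".")
}
```

An upload named `../../config/db.php` fails the check because `/` is not whitelisted, so the whole upload would be rejected rather than stored under a rewritten name.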

EDIT

I've considered chmod-ing the containing folder to prevent execution

This is bad security. Instead, you should keep the files in a folder that is not directly accessible.

Thursday, April 1, 2021
 
vuliad
 
3

java.io.File methods will refer to files on the master where Jenkins is running, and not in the current workspace on the slave machine (in case this job indeed runs on a slave machine).

To refer to files on the slave machine, you should use the readFile method:

def log = readFile("${WORKSPACE}/${LOG}");
Thursday, July 29, 2021
 
1

FINAL EDIT: After emailing the great Duncan Temple Lang I have a solution:

library(RCurl)
library(RJSONIO)
postForm("http://api.website/v1/access?",
         .opts = list(postfields = toJSON(list(text = "Hello World!", level = "Noob")),
                      httpheader = c('Content-Type' = 'application/json', Accept = 'application/json'),
                      userpwd = "Username:Password",
                      ssl.verifypeer = FALSE))
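
For comparison, roughly the same request can be sketched with the httr package instead of RCurl. This is an alternative technique, not the solution described above. `post_json` is a made-up wrapper name, and the URL and credentials are placeholders supplied by the caller. Unlike the RCurl call, it leaves SSL peer verification switched on.

```r
# Sketch of a JSON POST with HTTP basic auth using httr (swapped in for RCurl).
# Assumes the httr package is installed; nothing is sent until it is called.
post_json <- function(url, payload, user, password) {
  resp <- httr::POST(
    url,
    body = payload,                     # a named list, serialized to JSON
    encode = "json",                    # sends Content-Type: application/json
    httr::accept_json(),                # sends Accept: application/json
    httr::authenticate(user, password)  # HTTP basic authentication
  )
  httr::stop_for_status(resp)           # turn 4xx/5xx responses into R errors
  httr::content(resp, as = "text", encoding = "UTF-8")
}
```

A call would look like `post_json("http://api.website/v1/access?", list(text = "Hello World!", level = "Noob"), "Username", "Password")`.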

My previous post is below but it has all now been superseded by the edit above (I've left the below for reference):

I would like to know the answer to this question myself. Even after looking at TARehman's answer I still can't work out how to do it (my lack of understanding is no doubt the root cause).

If you download curl from its website (I used the generic-win binary version), then here's one way of doing it in R via the 'system' command, which is like using the Windows command-line window but from R:

# \\" in the R string reaches curl as \", an escaped quote inside the -d argument
output <- system('C:/+/curl-7.27.0-rtmp-ssh2-ssl-sspi-zlib-idn-static-bin-w32/curl  -k -u "username:password" -d "{\\"text\\":\\"Hello World!\\",\\"level\\":\\"Noob\\"}" -H "Content-Type: application/json" -H "Accept: application/json" "http://api.website/v1/access?"', intern = TRUE)

The above works with one of the APIs I use. You can then use

I really hope you find an answer because it would be useful for me to know too.

EDIT: My attempt using an actual API with RCurl via @TARehman's example:

Here is the curl code which works fine:

x = system('C:/+/curl-7.27.0-rtmp-ssh2-ssl-sspi-zlib-idn-static-bin-w32/curl  -k -u "USERNAME:PASSWORD" -d "{\\"text\\":\\"Have a nice day!\\"}" -H "Content-Type: application/json" -H "Accept: application/json" "http://api.theysay.io:60000/v1/sentiment?"', intern = TRUE)

And here is how I've rewritten it in RCurl:

curl.opts <- list(userpwd = "username:password", 
                  httpheader = "Content-Type: application/json",
                  httpheader = "Accept: application/json",
                  timeout = 20, 
                  connecttimeout = 20, 
                  verbose = TRUE, 
                  useragent = "RCurl",
                  cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))

postForm("http://api.theysay.io:60000/v1/sentiment?", .params = c(data = '{\"text\":\"Have a nice day!\"}'), .opts = curl.opts)

This however did not work and results in the following (clearly I have not understood something correctly):

* About to connect() to api.theysay.io port 60000 (#0)
*   Trying 163.1.94.100... * connected
* Connected to api.theysay.io (163.1.94.100) port 60000 (#0)
> POST /v1/sentiment? HTTP/1.1
User-Agent: RCurl
Host: api.theysay.io:60000
Accept: application/json
Content-Length: 193
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------7620b19c7a5d

< HTTP/1.1 100 Continue
< HTTP/1.1 401 Unauthorized
< WWW-Authenticate: Basic realm="Secured Resource"
< Content-Type: text/plain
< Server: spray-can/1.0-M2.1
< Date: Wed, 12 Sep 2012 12:26:06 GMT
< Content-Length: 77
< 
* Ignoring the response-body
* Connection #0 to host api.theysay.io left intact
* Issue another request to this URL: 'http://api.theysay.io:60000/v1/sentiment?'
* Re-using existing connection! (#0) with host api.theysay.io
* Connected to api.theysay.io (163.1.94.100) port 60000 (#0)
* Server auth using Basic with user '[USERNAME]'
> POST /v1/sentiment? HTTP/1.1
Authorization: Basic [KEY]==
User-Agent: RCurl
Host: api.theysay.io:60000
Accept: application/json
Content-Length: 193
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------949d191bdaeb

< HTTP/1.1 100 Continue
< HTTP/1.1 415 Unsupported Media Type
< Content-Type: text/plain
< Server: spray-can/1.0-M2.1
< Date: Wed, 12 Sep 2012 12:26:06 GMT
< Content-Length: 93
< 
* Connection #0 to host api.theysay.io left intact
[1] "There was a problem with the requests Content-Type:\nExpected 'text/xml' or 'application/json'"
attr(,"Content-Type")

"text/plain" 
Monday, September 6, 2021
 
Rawr
 
5

Based on Juba's suggestion, here is a working RCurl template.

The code emulates a browser behaviour, as it:

  1. retrieves cookies on a login screen and
  2. reuses them on the following page requests containing the actual data.


### RCurl login and browse private pages ###

library("RCurl")

loginurl ="http://www.*****"
mainurl  ="http://www.*****"
agent    ="Mozilla/5.0"

#User account data and other login pars
pars=list(
     RURL="http://www.*****",
     Username="*****",
     Password="*****"
)

#RCurl pars     
curl = getCurlHandle()
curlSetOpt(cookiejar="cookiesk.txt",  useragent = agent, followlocation = TRUE, curl=curl)
#or simply
#curlSetOpt(cookiejar="", useragent = agent, followlocation = TRUE, curl=curl)

#post login form
web=postForm(loginurl, .params = pars, curl=curl)

#go to main url with real data
web=getURL(mainurl, curl=curl)

#parse/print content of web
#..... etc. etc.


#This has the side effect of saving cookie data to the cookiejar file 
rm(curl)
gc()
Friday, September 17, 2021
 
Apollo
 