Python web scraping for JavaScript-generated content

I am trying to use Python 3 to return the BibTeX citation generated by doi2bib.org. The URLs are predictable, so the script can work out the URL without having to interact with the web page. I have tried using selenium, bs4, etc., but I can't get the text inside the box.

url = "#/doi/10.1007/s00425-007-0544-9"
import urllib.request
from bs4 import BeautifulSoup
text = BeautifulSoup(urllib.request.urlopen(url).read())

Can anyone suggest a way of returning the bibtex citation as a string (or whatever) in python?


You don't need BeautifulSoup here. The page fills in the BibTeX citation by sending an additional XHR request to the server; simulate that request, for example, with requests:

import requests

bibtex_id = '10.1007/s00425-007-0544-9'

url = "http://www.doi2bib.org/#/doi/{id}".format(id=bibtex_id)
xhr_url = 'http://www.doi2bib.org/doi2bib'

with requests.Session() as session:
    session.get(url)  # visit the page first, as the browser would
    response = session.get(xhr_url, params={'id': bibtex_id})
    print(response.text)


This prints the BibTeX citation:

@article{Burgert_2007,
    doi = {10.1007/s00425-007-0544-9},
    url = {},
    year = 2007,
    month = {jun},
    publisher = {Springer Science $\mathplus$ Business Media},
    volume = {226},
    number = {4},
    pages = {981--987},
    author = {Ingo Burgert and Michaela Eder and Notburga Gierlinger and Peter Fratzl},
    title = {Tensile and compressive stresses in tracheids are induced by swelling based on geometrical constraints of the wood cell},
    journal = {Planta}
}

You can also solve it with selenium. The key trick here is to use an Explicit Wait to wait for the citation to become visible:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9")

# wait up to 10 seconds for the citation <pre> block to become visible
element = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.XPATH, '//pre[@ng-show="bib"]')))
print(element.text)


Prints the same as the above solution.
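As an aside, and not part of either answer above: since the input here is a DOI, you may be able to skip scraping entirely. The doi.org resolver supports content negotiation, so asking for the `application/x-bibtex` media type returns a BibTeX entry directly. A minimal sketch using only the standard library (`build_bibtex_request` and `bibtex_for_doi` are names I made up for this example):

```python
import urllib.request

def build_bibtex_request(doi):
    """Build a doi.org request that asks for BibTeX via content negotiation."""
    return urllib.request.Request(
        'https://doi.org/' + doi,
        headers={'Accept': 'application/x-bibtex'},
    )

def bibtex_for_doi(doi, timeout=10):
    """Resolve a DOI and return the BibTeX entry as a string."""
    with urllib.request.urlopen(build_bibtex_request(doi), timeout=timeout) as resp:
        return resp.read().decode('utf-8')
```

With network access, `print(bibtex_for_doi('10.1007/s00425-007-0544-9'))` should print an `@article` entry much like the one above, with no browser or XHR sniffing involved.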

